Search results for: logistic regression model
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 18229

18109 Generalized Additive Model Approach for the Chilean Hake Population in a Bio-Economic Context

Authors: Selin Guney, Andres Riquelme

Abstract:

The traditional bio-economic method for fisheries modeling uses some estimate of the growth parameters and the system carrying capacity from a biological model for the population dynamics (usually a logistic population growth model) which is then analyzed as a traditional production function. The stock dynamic is transformed into a revenue function and then compared with the extraction costs to estimate the maximum economic yield. In this paper, the logistic population growth model for the population is combined with a forecast of the abundance and location of the stock by using a generalized additive model approach. The paper focuses on the Chilean hake population. This method allows for the incorporation of climatic variables and the interaction with other marine species, which in turn will increase the reliability of the estimates and generate better extraction paths for different conservation objectives, such as the maximum biological yield or the maximum economic yield.
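The comparison between maximum biological yield and maximum economic yield can be illustrated with a minimal sketch. It assumes the textbook Gordon-Schaefer formulation (logistic growth, harvest proportional to effort and stock, constant price and unit effort cost), which the abstract does not name. All parameter names and values are hypothetical, not estimates for the Chilean hake.

```python
def logistic_growth(stock, r, k):
    """Surplus growth of the stock under the logistic model: r * x * (1 - x / k)."""
    return r * stock * (1.0 - stock / k)


def equilibrium_yield(effort, r, k, q):
    """Sustainable harvest at a given effort (Gordon-Schaefer assumption).

    With harvest h = q * E * x, the equilibrium stock is x = k * (1 - q * E / r).
    """
    return q * effort * k * (1.0 - q * effort / r)


def equilibrium_profit(effort, r, k, q, price, cost):
    """Revenue from the sustainable harvest minus a linear extraction cost."""
    return price * equilibrium_yield(effort, r, k, q) - cost * effort


def effort_at_msy(r, q):
    """Effort that maximizes the biological (sustainable) yield."""
    return r / (2.0 * q)


def effort_at_mey(r, k, q, price, cost):
    """Effort that maximizes equilibrium profit (maximum economic yield)."""
    return (r / (2.0 * q)) * (1.0 - cost / (price * q * k))


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    r, k, q, price, cost = 0.5, 1000.0, 0.01, 10.0, 20.0
    for label, effort in (("MSY", effort_at_msy(r, q)),
                          ("MEY", effort_at_mey(r, k, q, price, cost))):
        print(label, effort,
              equilibrium_yield(effort, r, k, q),
              equilibrium_profit(effort, r, k, q, price, cost))
```

Under these assumptions the profit-maximizing effort is lower than the yield-maximizing effort, which is why the two conservation objectives imply different extraction paths.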

Keywords: bio-economic, fisheries, GAM, production

Procedia PDF Downloads 216
18108 A Predictive Machine Learning Model of the Survival of Female-led and Co-Led Small and Medium Enterprises in the UK

Authors: Mais Khader, Xingjie Wei

Abstract:

This research sheds light on female entrepreneurs by providing new insights on the survival predictions of companies led by females in the UK. This study aims to build a predictive machine learning model of the survival of female-led & co-led small & medium enterprises (SMEs) in the UK over the period 2000-2020. The predictive model built utilised a combination of financial and non-financial features related to both companies and their directors to predict SMEs' survival. These features were studied in terms of their contribution to the resultant predictive model. Five machine learning models are used in the modelling: Decision tree, AdaBoost, Naïve Bayes, Logistic regression and SVM. The AdaBoost model had the highest performance of the five models, with an accuracy of 73% and an AUC of 80%. The results show high feature importance in predicting companies' survival for company size, management experience, financial performance, industry, region, and females' percentage in management.

Keywords: company survival, entrepreneurship, females, machine learning, SMEs

Procedia PDF Downloads 51
18107 Regression Model Evaluation on Depth Camera Data for Gaze Estimation

Authors: James Purnama, Riri Fitri Sari

Abstract:

We investigate the machine learning algorithm selection problem in the context of depth-image-based eye gaze estimation, with respect to its essential difficulty in reducing the number of required training samples and the duration of training. Statistics-based measures of prediction accuracy are increasingly used to assess and evaluate prediction or estimation in gaze estimation. This article uses Root Mean Squared Error (RMSE) and R-Squared to assess machine learning methods on depth camera data for gaze estimation. Four machine learning methods were evaluated: Random Forest Regression, Regression Tree, Support Vector Machine (SVM), and Linear Regression. The experimental results show that Random Forest Regression has the lowest RMSE and the highest R-Squared, which means that it is the best of the methods evaluated.
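The two evaluation statistics named in this abstract can be computed as in the following sketch. It uses the standard definitions of RMSE and the coefficient of determination; the function names and the sample values are illustrative and not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up gaze angles (degrees) and predictions, for illustration only.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 2.0, 3.0, 3.0]
    print(rmse(observed, predicted), r_squared(observed, predicted))
```

A lower RMSE and a higher R-Squared both indicate a closer fit, which is the criterion the abstract applies when ranking the four methods.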

Keywords: gaze estimation, gaze tracking, eye tracking, kinect, regression model, orange python

Procedia PDF Downloads 503
18106 Farmers’ Access to Agricultural Extension Services Delivery Systems: Evidence from a Field Study in India

Authors: Ankit Nagar, Dinesh Kumar Nauriyal, Sukhpal Singh

Abstract:

This paper examines the key determinants of farmers’ access to agricultural extension services and the sources of agricultural extension services preferred and accessed by the farmers. An ordered logistic regression model was used to analyse the data of the 360 sample households based on a primary survey conducted in western Uttar Pradesh, India. The study finds that farmers' decision to engage in the agricultural extension programme is significantly influenced by factors such as education level, gender, farming experience, social group, group membership, farm size, credit access, awareness about the extension scheme, farmers' perception, and distance from extension sources. The most intriguing finding of this study is that the progressive farmers, who have long been regarded as a major source of knowledge diffusion, are the most distrusted sources of information as they are suspected of withholding vital information from potential beneficiaries. The positive relationship between farm size and ‘Access’ underlines that the extension services should revisit their strategies for targeting more marginal and small farmers, who constitute over 85 percent of the agricultural households, by incorporating their priorities in their outreach programs. The study suggests that marginal and small farmers' productive potential could still be greatly augmented by the appropriate technology, advisory services, guidance, and improved market access. Also, the perception of poor quality of the public extension services can be corrected by initiatives aimed at building up extension workers' capacity.

Keywords: agriculture, access, extension services, ordered logistic regression

Procedia PDF Downloads 170
18105 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to classify time series using meta-clusters in order to improve the quality of regression models. Clustering was used to obtain normally distributed subgroups of time series data from inflow data for a wastewater treatment plant, which is composed of several groups differing in mean value. Two simple algorithms, K-means and EM, were chosen as clustering methods. The Rand index was used to measure the similarity. After simple meta-clustering, a regression model was fitted for each subgroup. The final model was the sum of the subgroup models. The quality of the obtained model was compared with that of a regression model built using the same explanatory variables but without clustering the data. Results were compared using the coefficient of determination (R2), the mean absolute percentage error (MAPE) as a measure of prediction accuracy, and a comparison on a line chart. Preliminary results indicate the potential of the presented technique.
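The MAPE measure used in this comparison can be sketched as follows. This is the standard definition, not code from the paper. The inflow values are invented for illustration, and observations equal to zero are assumed not to occur because the ratio is undefined for them.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent.

    Assumes no actual value is zero (the ratio is undefined there).
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical daily inflow observations and model predictions.
    observed_inflow = [100.0, 200.0]
    predicted_inflow = [110.0, 180.0]
    print(mape(observed_inflow, predicted_inflow))
```

Because MAPE is scale-free, it allows the clustered and unclustered regression models to be compared even when the subgroups differ in mean value.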

Keywords: clustering, data analysis, data mining, predictive models

Procedia PDF Downloads 430
18104 Logistic Model Tree and Expectation-Maximization for Pollen Recognition and Grouping

Authors: Endrick Barnacin, Jean-Luc Henry, Jack Molinié, Jimmy Nagau, Hélène Delatte, Gérard Lebreton

Abstract:

Palynology is a field of interest for many disciplines. It has multiple applications such as chronological dating, climatology, allergy treatment, and even honey characterization. Unfortunately, the analysis of a pollen slide is a complicated and time-consuming task that requires the intervention of experts in the field, who are becoming increasingly rare due to economic and social conditions. The automation of this task is therefore a necessity. Pollen slide analysis is mainly a visual process, as it is carried out with the naked eye. For this reason, a primary method to automate palynology is the use of digital image processing. This method has the lowest cost and relatively good accuracy in pollen retrieval. In this work, we propose a system combining recognition and grouping of pollen. It uses a Logistic Model Tree to classify pollen already known by the proposed system while detecting any unknown species. The unknown pollen species are then divided using a cluster-based approach. Success rates for the recognition of known species have been achieved, and automated clustering seems to be a promising approach.

Keywords: pollen recognition, logistic model tree, expectation-maximization, local binary pattern

Procedia PDF Downloads 147
18103 Using Machine Learning to Classify Different Body Parts and Determine Healthiness

Authors: Zachary Pan

Abstract:

Our general mission is to solve the problem of classifying images into different body part types and deciding if each of them is healthy or not. However, for now, we will determine healthiness for only one-sixth of the body parts, specifically the chest. We will detect pneumonia in X-ray scans of those chest images. Doctors can use this type of AI as a second opinion when they are taking CT or X-ray scans of their patients. Another advantage of using this machine learning classifier is that it has no human weaknesses like fatigue. The overall approach to this problem is to split the problem into two parts: first, classify the image, then determine if it is healthy. In order to classify the image into a specific body part class, the body parts dataset must be split into test and training sets. We can then use many models, like neural networks or logistic regression models, and fit them using the training set. Using the test set, we can then obtain a realistic estimate of the accuracy the models will have on images in the real world, since these testing images have never been seen by the models before. In order to increase this testing accuracy, we can also apply many complex algorithms to the models, like multiplicative weight update. For the second part of the problem, to determine if the body part is healthy, we can have another dataset consisting of healthy and non-healthy images of the specific body part and once again split that into the test and training sets. We then use another neural network to train on those training set images and use the testing set to figure out its accuracy. We will do this process only for the chest images. A major conclusion reached is that convolutional neural networks are the most reliable and accurate at image classification.
In classifying the images, the logistic regression model, the neural network, neural networks with multiplicative weight update, neural networks with the black box algorithm, and the convolutional neural network achieved 96.83 percent accuracy, 97.33 percent accuracy, 97.83 percent accuracy, 96.67 percent accuracy, and 98.83 percent accuracy, respectively. On the other hand, the overall accuracy of the model that determines if the images are healthy or not is around 78.37 percent.

Keywords: body part, healthcare, machine learning, neural networks

Procedia PDF Downloads 67
18102 Factors Affecting Students' Performance in the Examination

Authors: Amylyn F. Labasano

Abstract:

A significant number of empirical studies have been carried out to investigate factors affecting college students’ performance in academic examinations. Given the wide array of findings supported by the literature and previous studies, this study is limited to the students’ probability of passing periodical exams as associated with students’ gender, absences from class, use of a reference book, and hours of study. Binary logistic regression was the technique used in the analysis. The research is based on the students’ records and data collected through a survey. The results reveal that gender, use of a reference book and hours of study are significant predictors of passing an examination, while students’ absenteeism is an insignificant predictor. Females have a 45% likelihood of passing the exam compared with their male classmates. Students who use and read their reference book are 38 times more likely to pass the exam than those who do not. Those who spent more than 3 hours studying are four times more likely to pass the exam than those who spent only 3 hours or less.
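Statements such as "four times more likely" are typically read off a binary logistic regression as odds ratios, that is, exponentiated coefficients. The sketch below shows that relationship under the standard logit model. The coefficient values are hypothetical placeholders chosen to match the form of the reported results, not the fitted values from the study.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient: exp(beta)."""
    return math.exp(coefficient)


def pass_probability(intercept, coefficients, features):
    """Predicted probability from a logit model: 1 / (1 + exp(-z))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical coefficient for "studies more than 3 hours" (1 = yes, 0 = no).
    beta_hours = math.log(4.0)
    print(odds_ratio(beta_hours))
    # Hypothetical intercept; shows how the odds ratio shifts the probability.
    print(pass_probability(-1.0, [beta_hours], [0]))
    print(pass_probability(-1.0, [beta_hours], [1]))
```

An odds ratio multiplies the odds of passing, not the probability, so the change in probability depends on the baseline set by the intercept and the other predictors.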

Keywords: absences, binary logistic regression, gender, hours of study, prediction-causation method, periodical exams, random sampling, reference book

Procedia PDF Downloads 273
18101 The Alarming Caesarean-Section Delivery Rate in Addis Ababa, Ethiopia

Authors: Yibeltal T. Bayou, Yohana S. Mashalla, Gloria Thupayagale-Tshweneagae

Abstract:

Background: According to the World Health Organization, caesarean section delivery rates above 10-15% in any specific geographic region in the world are not justifiable. The aim of the study was to describe the level and analyse the determinants of caesarean section delivery in Addis Ababa. Methods: Data was collected in Addis Ababa using a structured questionnaire administered to 901 women aged 15-49 years through a stratified two-stage cluster sampling technique. A binary logistic regression model was employed to identify predictors of caesarean section delivery. Results: Among the 835 women who delivered their last birth at healthcare facilities, 19.2% gave birth by caesarean section. About 9.0% of the caesarean section births were due to the mother’s request or the service provider’s influence without any medical indication. The caesarean section delivery rate was much higher than the recommended rate, particularly among non-slum residents (27.2%); clients of private healthcare facilities (41.1%); currently married women (20.6%); women with secondary (22.2%) and tertiary (33.6%) level of education; and women belonging to the highest wealth quintile household (28.2%). The majority (65.8%) of the caesarean section clients were not informed about the consequences of caesarean section delivery by service providers. The logistic regression model shows that older age (30-49), secondary and above education, non-slum residence, high-risk pregnancy and receiving adequate antenatal care were significantly positively associated with caesarean section delivery. Conclusion: Despite the unreserved effort towards achieving MDG 5 through safe skilled delivery assistance, among others, the caesarean section rate beyond the recommended limit and the finding that caesarean sections were done without medical indication are alarming.
The government and city administration should take appropriate measures before the problems become setbacks in healthcare provision. Further investigations should focus on the effect of caesarean section delivery on maternal and child health outcomes in the study area.

Keywords: Addis Ababa, caesarean section, mode of delivery, slum residence

Procedia PDF Downloads 372
18100 Effect of Drying on the Concrete Structures

Authors: A. Brahma

Abstract:

The drying of hydraulic materials is unavoidable and leads to significant spontaneous deformations. In this study, we show that it is possible to describe the drying shrinkage of high-performance concrete by a simple expression. A multiple regression model was developed to predict the drying shrinkage of high-performance concrete. The proposed model was assessed with a set of statistical tests. The model takes into consideration the main parameters of manufacture and curing. There was very good agreement between the drying shrinkage predicted by the multiple regression model and the experimental results. The developed model adjusts easily to all hydraulic concrete types.

Keywords: hydraulic concretes, drying, shrinkage, prediction, modeling

Procedia PDF Downloads 335
18099 Myers-Briggs Type Index Personality Type Classification Based on an Individual’s Spotify Playlists

Authors: Sefik Can Karakaya, Ibrahim Demir

Abstract:

In this study, the relationship between musical preferences and personality traits has been investigated in terms of Spotify audio analysis features. The aim of this paper is to build a classifier capable of assigning people to their Myers-Briggs Type Index (MBTI) personality type based on their Spotify playlists. Music takes an important place in the lives of people all over the world, and online music streaming platforms make it easier to reach musical content. In this context, the motivation for building such a classifier is to allow people to gain access to their MBTI personality type, perhaps more reliably and more quickly. For this purpose, logistic regression and deep neural networks were selected as classifiers and their performances were compared. In conclusion, it has been found that musical preferences differ statistically between personality traits, and the evaluated models are able to distinguish personality types based on the given musical data structure with an accuracy of over 60%.

Keywords: myers-briggs type indicator, music psychology, Spotify, behavioural user profiling, deep neural networks, logistic regression

Procedia PDF Downloads 97
18098 Exploring Syntactic and Semantic Features for Text-Based Authorship Attribution

Authors: Haiyan Wu, Ying Liu, Shaoyun Shi

Abstract:

Authorship attribution is the task of extracting features to identify the authors of anonymous documents. Many previous works on authorship attribution focus on statistical style features (e.g., sentence/word length) and content features (e.g., frequent words, n-grams). Modeling these features by regression or some transparent machine learning methods gives a portrait of the authors' writing style. But these methods do not capture syntactic (e.g., dependency relationship) or semantic (e.g., topics) information. In recent years, some researchers have modeled syntactic trees or latent semantic information with neural networks. However, few works combine the two. Besides, predictions by neural networks are difficult to explain, and explainability is vital in authorship attribution tasks. In this paper, we not only utilize the statistical style and content features but also take advantage of both syntactic and semantic features. Unlike in an end-to-end neural model, feature selection and prediction are two separate steps in our method. An attentive n-gram network is utilized to select useful features, and logistic regression is applied to give the prediction and an understandable representation of writing style. Experiments show that our extracted features can improve the state-of-the-art methods on three benchmark datasets.

Keywords: authorship attribution, attention mechanism, syntactic feature, feature extraction

Procedia PDF Downloads 97
18097 A Fuzzy Nonlinear Regression Model for Interval Type-2 Fuzzy Sets

Authors: O. Poleshchuk, E. Komarov

Abstract:

This paper presents a regression model for interval type-2 fuzzy sets based on the least squares estimation technique. Unknown coefficients are assumed to be triangular fuzzy numbers. The basic idea is to determine aggregation intervals for the type-1 fuzzy sets whose membership functions are the lower membership function and the upper membership function of an interval type-2 fuzzy set. These aggregation intervals are called weighted intervals. The lower and upper membership functions of the input and output interval type-2 fuzzy sets for the developed regression models are considered to be piecewise linear functions.

Keywords: interval type-2 fuzzy sets, fuzzy regression, weighted interval

Procedia PDF Downloads 331
18096 A Location Routing Model for the Logistic System in the Mining Collection Centers of the Northern Region of Boyacá-Colombia

Authors: Erika Ruíz, Luis Amaya, Diego Carreño

Abstract:

The main objective of this study is to design a mathematical model for the logistics of mining collection centers in the northern region of the department of Boyacá (Colombia), determining the structure that facilitates the flow of products along the supply chain. In order to achieve this, it is necessary to define a suitable design of the distribution network, taking into account the products, customers’ characteristics and the availability of information. Likewise, some other aspects must be defined, such as the number and capacity of collection centers to establish and the routes that must be taken to deliver products to the customers, among others. This research uses an operations research problem applied in the design of distribution networks, known as the Location Routing Problem (LRP).

Keywords: location routing problem, logistic, mining collection, model

Procedia PDF Downloads 184
18095 An Investigation of the Relevant Factors of Unplanned Readmission within 14 Days of Discharge in a Regional Teaching Hospital in South Taiwan

Authors: Xuan Hua Huang, Shu Fen Wu, Yi Ting Huang, Pi Yueh Lee

Abstract:

Background: In Taiwan, the Taiwan Healthcare Indicator Series regards the rate of hospital readmission as an important indicator of healthcare quality. Unplanned readmission not only affects the patient’s condition but also increases the healthcare utilization rate and healthcare costs. Purpose: The purpose of this study was to explore the factors associated with adult unplanned readmission within 14 days of discharge at a regional teaching hospital in South Taiwan. Methods: A retrospective review design was used. A total of 495 participants with unplanned readmissions and 878 without readmission within 14 days were recruited from a regional teaching hospital in Southern Taiwan. The instruments used included the Charlson Comorbidity Index, demographic characteristics, and disease-related variables. Statistical analyses were performed with SPSS version 22.0. Descriptive statistics (means, standard deviations, and percentages) were used, and the inferential statistics were the t-test, the chi-square test and logistic regression. Results: The rate of unplanned readmission within 14 days was 36%. The majority were male (268, 54.1%), 318 (64.2%) were aged >65, and the mean age was 68.8±14.65 years (23-98 years). The mean score for the comorbidities was 3.77±2.73. The top three diagnoses at readmission were digestive diseases (32.7%), respiratory diseases (15.2%), and genitourinary diseases (10.5%). There were significant relationships among gender, age, marriage, comorbidity status, and discharge planning services (χ2: 3.816-16.474, p: 0.051~0.000). Logistic regression analysis showed that patients who were older (OR = 1.012, 95% CI: 1.003, 1.021), had multi-morbidity (OR = 0.712~4.040, 95% CI: 0.559~8.522), or had consulted discharge planning services (OR = 1.696, 95% CI: 1.105, 2.061) had a higher risk of readmission.
Conclusions: This study finds that multi-morbidity was an independent risk factor for unplanned readmission at 14 days. It is recommended that the medical team provide integrated care for multi-morbidity to improve the patient's self-care ability and reduce the 14-day unplanned readmission rate.

Keywords: unplanned readmission, comorbidities, Charlson comorbidity index, logistic regression

Procedia PDF Downloads 118
18094 Predictors of Glycaemic Variability and Its Association with Mortality in Critically Ill Patients with or without Diabetes

Authors: Haoming Ma, Guo Yu, Peiru Zhou

Abstract:

Background: Previous studies show that dysglycemia, mainly hyperglycemia, hypoglycemia and glycemic variability (GV), is associated with excess mortality in critically ill patients, especially those without diabetes. Glycemic variability is an increasingly important measure of glucose control in the intensive care unit (ICU) due to this association. However, there is limited data pertaining to the relationship between different clinical factors and glycemic variability and clinical outcomes categorized by DM status. This retrospective study of 958 ICU patients was conducted to investigate the relationship between GV and outcome in critically ill patients and to determine the significant factors that contribute to glycemic variability. Aim: We hypothesize that the factors contributing to mortality and glycemic variability differ between critically ill patients with and without diabetes. The primary aim of this study was to determine which dysglycemia (hyperglycemia/hypoglycemia/glycemic variability) is independently associated with an increase in mortality among critically ill patients in different groups (DM/Non-DM). The secondary objective was to investigate the factors affecting glycemic variability in the two groups. Method: A total of 958 diabetic and non-diabetic patients with severe diseases in the ICU were selected for this retrospective analysis. Glycemic variability was defined as the coefficient of variation (CV) of blood glucose. The main outcome was death during hospitalization. The secondary outcome was GV. A logistic regression model was used to identify factors associated with mortality. The relationships between GV and other variables were investigated using linear regression analysis. Results: Information on age, APACHE II score, GV, gender, in-ICU treatment and nutrition was available for 958 subjects.
Predictors remaining in the final logistic regression model for mortality were significantly different in the DM and Non-DM groups. Glycemic variability was associated with an increase in mortality in both the DM group (odds ratio 1.05; 95% CI: 1.03-1.08, p<0.001) and the Non-DM group (odds ratio 1.07; 95% CI: 1.03-1.11, p=0.002). For critically ill patients without diabetes, factors associated with glycemic variability included APACHE II score (regression coefficient, 95% CI: 0.29, 0.22-0.36, p<0.001), mean BG (0.73, 0.46-1.01, p<0.001), total parenteral nutrition (2.87, 1.57-4.17, p<0.001), serum albumin (-0.18, -0.271 to -0.082, p<0.001), insulin treatment (2.18, 0.81-3.55, p=0.002) and duration of ventilation (0.006, 0.002-1.010, p=0.003). However, for patients with diabetes, APACHE II score (0.203, 0.096-0.310, p<0.001), mean BG (0.503, 0.138-0.869, p=0.007) and duration of diabetes (0.167, 0.033-0.301, p=0.015) remained as independent risk factors of GV. Conclusion: We found that the relation between dysglycemia and mortality is different in the diabetes and non-diabetes groups, and we confirm that GV was associated with excess mortality in both DM and Non-DM patients. Furthermore, APACHE II score, mean BG, total parenteral nutrition, serum albumin, insulin treatment and duration of ventilation were significantly associated with GV in Non-DM patients, while APACHE II score, mean BG and duration of diabetes (years) remained as independent risk factors of increased GV in DM patients. These findings provide important context for further prospective trials investigating the effect of different clinical factors in critically ill patients with or without diabetes.
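The glycemic variability measure defined in this abstract, the coefficient of variation of blood glucose, can be computed as in the sketch below. The abstract does not say whether the sample or population standard deviation was used, so the sample version is an assumption here. The glucose readings are invented for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variation in percent: 100 * SD / mean.

    Uses the sample standard deviation (n - 1 denominator), which is an
    assumption; the abstract does not specify the variant.
    """
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


if __name__ == "__main__":
    # Hypothetical blood glucose readings for one ICU stay (mg/dL).
    readings = [90.0, 110.0, 100.0, 130.0, 95.0]
    print(coefficient_of_variation(readings))
```

Dividing by the mean makes the measure comparable between patients whose average glucose levels differ, which matters when comparing the DM and Non-DM groups.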

Keywords: diabetes, glycemic variability, predictors, severe disease

Procedia PDF Downloads 148
18093 Exploring Factors Related to Unplanning Readmission of Elderly Patients in Taiwan

Authors: Hui-Yen Lee, Hsiu-Yun Wei, Guey-Jen Lin, Pi-Yueh Lee Lee

Abstract:

Background: Unplanned hospital readmissions increase healthcare costs and have been considered a marker of poor healthcare performance. The elderly face a higher risk of unplanned readmission due to elderly-specific characteristics such as deteriorating body functions and the relatively high incidence of complications after treatment of acute diseases. Purpose: The aim of this study was to explore the factors related to the unplanned readmission of elderly patients within 14 days of discharge at our hospital in southern Taiwan. Methods: We retrospectively reviewed the medical records of patients aged ≥65 years who had been re-admitted between January 2018 and December 2018. The Charlson Comorbidity score was calculated using a previously used method. Related factors that affected the rate of unplanned readmission within 14 days of discharge were screened and analyzed using the chi-squared test and logistic regression analysis. Results: This study enrolled 829 subjects aged more than 65 years. There were 318 patients with unplanned readmission within 14 days and 511 without. In 2018, the rate of unplanned 14-day readmission among elderly patients was 38.4%. The majority of patients were female (166 cases, 52.2%), with an average age of 77.6 ± 7.90 years (65-98). The average Charlson Comorbidity score was 4.42±2.76. Using logistic regression analysis, we found that gastric or peptic ulcer (OR = 1.917, P < 0.002), diabetes (OR = 0.722, P < 0.043), hemiplegia (OR = 2.292, P < 0.015), metastatic solid tumor (OR = 2.204, P < 0.025), hypertension (OR = 0.696, P < 0.044), and skin ulcer/cellulitis (OR = 2.747, P < 0.022) were significantly associated with the risk of 14-day readmission. Conclusion: The results of the present study may help healthcare teams to understand the factors that may affect unplanned readmission in the elderly.
We recommend that these teams adopt an efficient approach in their medical practice and provide timely health education for the elderly and integrative healthcare for chronic diseases in order to reduce unplanned readmissions.

Keywords: unplanning readmission, elderly, Charlson comorbidity score, logistic regression analysis

Procedia PDF Downloads 104
18092 Customer Churn Prediction by Using Four Machine Learning Algorithms Integrating Features Selection and Normalization in the Telecom Sector

Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh

Abstract:

A crucial component of maintaining a customer-oriented business, as in the telecom industry, is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years. It has become more important to understand customers’ needs in this highly competitive market, especially the needs of those who are looking to switch service providers. Churn prediction is therefore now a mandatory requirement for retaining those customers, and machine learning can be utilized to accomplish it. Churn prediction has become a very important machine learning classification topic in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important for building an effective churn prediction model. This paper aims to predict churn and identify the factors of customers’ churn based on their past service usage history. To this end, the study makes use of feature selection, normalization, and feature engineering. The study then compares the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Performance was evaluated using the F1 score and ROC-AUC. Compared with existing models, the approach in this study produced better results. The results showed that Gradient Boosting with the feature selection technique outperformed the other models, achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.
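The F1 score used for evaluation in this abstract is the harmonic mean of precision and recall. The sketch below computes it from label lists using the standard definition. The labels are made up for illustration and do not come from the Orange dataset.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 score for the positive class: 2 * P * R / (P + R)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels: 1 = customer churned, 0 = customer stayed.
    actual = [1, 1, 1, 0, 0]
    predicted = [1, 1, 0, 1, 0]
    print(f1_score(actual, predicted))
```

F1 is often preferred over plain accuracy for churn data because churners are usually a minority class, and accuracy can look high even when few of them are detected.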

Keywords: machine learning, gradient boosting, logistic regression, churn, random forest, decision tree, ROC, AUC, F1-score

Procedia PDF Downloads 98
18091 An Efficient Machine Learning Model to Detect Metastatic Cancer in Pathology Scans Using Principal Component Analysis Algorithm, Genetic Algorithm, and Classification Algorithms

Authors: Bliss Singhal

Abstract:

Machine learning (ML) is a branch of Artificial Intelligence (AI) where computers analyze data and find patterns in the data. The study focuses on the detection of metastatic cancer using ML. Metastatic cancer is the stage where cancer has spread to other parts of the body and is the cause of approximately 90% of cancer-related deaths. Normally, pathologists spend hours each day manually classifying whether tumors are benign or malignant. This tedious task contributes to metastasis being mislabeled over 60% of the time and emphasizes the importance of being aware of human error and other inefficiencies. ML is a good candidate to improve the correct identification of metastatic cancer, saving thousands of lives, and can also improve the speed and efficiency of the process, thereby taking fewer resources and less time. So far, the deep learning methodology of AI has been used in research to detect cancer. This study is a novel approach to determining the potential of using preprocessing algorithms combined with classification algorithms in detecting metastatic cancer. The study used two preprocessing algorithms, principal component analysis (PCA) and the genetic algorithm, to reduce the dimensionality of the dataset and then used three classification algorithms, logistic regression, decision tree classifier, and k-nearest neighbors, to detect metastatic cancer in the pathology scans. The highest accuracy of 71.14% was produced by the ML pipeline comprising PCA, the genetic algorithm, and the k-nearest neighbors algorithm, suggesting that preprocessing and classification algorithms have great potential for detecting metastatic cancer.

Keywords: breast cancer, principal component analysis, genetic algorithm, k-nearest neighbors, decision tree classifier, logistic regression

Procedia PDF Downloads 47
18090 Efficient Model Selection in Linear and Non-Linear Quantile Regression by Cross-Validation

Authors: Yoonsuh Jung, Steven N. MacEachern

Abstract:

The check loss function is used to define quantile regression. In the context of cross-validation, it is also employed as a validation function when the underlying truth is unknown. However, our empirical study indicates that validation with the check loss often leads to choosing an overestimated fit. In this work, we suggest a modified, or L2-adjusted, check loss that rounds the sharp corner in the middle of the check loss. This has a substantial effect in guarding, to some extent, against overfitted models. Through various simulation settings of linear and non-linear regressions, the improvement of the check loss by the L2 adjustment is empirically examined. This adjustment is devised to shrink to zero as the sample size grows.
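
For readers unfamiliar with the check loss, the sketch below shows the standard (unadjusted) form, rho_tau(r) = r * (tau - 1[r < 0]), and how it could be averaged over a validation set. The function names are illustrative assumptions. The L2-adjusted variant proposed by the authors is not reproduced here, because the abstract does not give its formula.

```python
def check_loss(residual, tau):
    """Standard check (pinball) loss for quantile level tau in (0, 1)."""
    # Positive residuals are weighted by tau, negative ones by (1 - tau)
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def mean_check_loss(y_true, y_pred, tau):
    """Average check loss of predicted quantiles over a validation set."""
    return sum(check_loss(y - q, tau) for y, q in zip(y_true, y_pred)) / len(y_true)


# Under-prediction (positive residual) is penalised lightly when tau is small
print(check_loss(2.0, 0.25))
# Over-prediction (negative residual) is penalised more heavily
print(check_loss(-2.0, 0.25))
```

The loss is piecewise linear with a kink at zero residual, which is the "sharp corner" that the adjustment described above is meant to round.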

Keywords: cross-validation, model selection, quantile regression, tuning parameter selection

Procedia PDF Downloads 403
18089 An Experimental Machine Learning Analysis on Adaptive Thermal Comfort and Energy Management in Hospitals

Authors: Ibrahim Khan, Waqas Khalid

Abstract:

The healthcare sector is known to account for a high proportion of total energy consumption in the HVAC market, owing to the extensive cooling and heating required to maintain human thermal comfort indoors for patients undergoing treatment in hospital wards, rooms, and intensive care units. The indoor thermal comfort conditions in selected hospitals of Islamabad, Pakistan, were measured in real time, with first-hand experimental data collected using calibrated sensors measuring ambient temperature, wet bulb globe temperature, relative humidity, air velocity, light intensity, and CO2 levels. The experimental data were analyzed in conjunction with thermal comfort questionnaire surveys, in which the participants, including patients, doctors, nurses, and hospital staff, were assessed on their thermal sensation, acceptability, preference, and comfort responses. The recorded dataset, including experimental and survey-based responses, was further analyzed to develop a correlation between operative temperature, operative relative humidity, and other measured operative parameters and the predicted mean vote and adaptive predicted mean vote, with the adaptive temperature and adaptive relative humidity estimated using the seasonal dataset gathered for both summer (hot and dry, and hot and humid) and winter (cold and dry, and cold and humid) climate conditions. A machine learning logistic regression algorithm was trained on the operative experimental data parameters to develop a correlation between patient sensations and the thermal environmental parameters, on the basis of which a new ML-based adaptive thermal comfort model was proposed and developed in our study. Finally, the accuracy of our model was determined using K-fold cross-validation.
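
The K-fold cross-validation step mentioned above can be sketched as follows. This is a generic outline under stated assumptions, not the study's code: the helper names are hypothetical, the folds are contiguous (no shuffling), and a majority-class predictor stands in for the logistic regression model so that the example stays self-contained. The labels are made up.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(X, y, k, fit, predict):
    """Mean held-out accuracy over k folds for any fit/predict pair."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Placeholder model (assumption): always predicts the majority training label
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = list(range(10))                      # dummy features
y = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]       # made-up comfort labels
print(cross_val_accuracy(X, y, 5, fit_majority, predict_majority))
```

In practice the data would normally be shuffled before splitting, and the `fit` and `predict` callables would wrap the actual classifier.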

Keywords: predicted mean vote, thermal comfort, energy management, logistic regression, machine learning

Procedia PDF Downloads 17
18088 Unveiling Comorbidities in Irritable Bowel Syndrome: A UK BioBank Study utilizing Supervised Machine Learning

Authors: Uswah Ahmad Khan, Muhammad Moazam Fraz, Humayoon Shafique Satti, Qasim Aziz

Abstract:

Approximately 10-14% of the global population experiences a functional disorder known as irritable bowel syndrome (IBS). The disorder is defined by persistent abdominal pain and an irregular bowel pattern. IBS significantly impairs work productivity and disrupts patients' daily lives and activities. Although IBS is widespread, there is still an incomplete understanding of its underlying pathophysiology. This study aims to help characterize the phenotype of IBS patients by differentiating the comorbidities found in IBS patients from those in non-IBS patients using machine learning algorithms. In this study, we extracted samples coding for IBS from the UK BioBank cohort and randomly selected patients without a code for IBS to create a total sample size of 18,000. We selected the codes for comorbidities of these cases from 2 years before and after their IBS diagnosis and compared them to the comorbidities in the non-IBS cohort. Machine learning models, including Decision Trees, Gradient Boosting, Support Vector Machine (SVM), AdaBoost, Logistic Regression, and XGBoost, were employed to assess their accuracy in predicting IBS. The most accurate model was then chosen to identify the features associated with IBS. In our case, we used XGBoost feature importance as a feature selection method. We applied different models to the top 10% of features, which numbered 50. The Gradient Boosting, Logistic Regression, and XGBoost algorithms yielded a diagnosis of IBS with optimal accuracies of 71.08%, 71.427%, and 71.53%, respectively. The comorbidities most closely associated with IBS included gut diseases (haemorrhoids, diverticular diseases), atopic conditions (asthma), and psychiatric comorbidities (depressive episodes or disorder, anxiety).
This finding emphasizes the need for a comprehensive approach when evaluating the phenotype of IBS, suggesting the possibility of identifying new subsets of IBS rather than relying solely on the conventional classification based on stool type. Additionally, our study demonstrates the potential of machine learning algorithms in predicting the development of IBS based on comorbidities, which may enhance diagnosis and facilitate better management of modifiable risk factors for IBS. Further research is necessary to confirm our findings and establish cause and effect. Alternative feature selection methods and even larger and more diverse datasets may lead to more accurate classification models. Despite these limitations, our findings highlight the effectiveness of Logistic Regression and XGBoost in predicting IBS diagnosis.

Keywords: comorbidities, disease association, irritable bowel syndrome (IBS), predictive analytics

Procedia PDF Downloads 82
18087 Association Between Advanced Parental Age and Implantation Failure: A Prospective Cohort Study in Anhui, China

Authors: Jiaqian Yin, Ruoling Chen, David Churchill, Huijuan Zou, Peipei Guo, Chunmei Liang, Xiaoqing Peng, Zhikang Zhang, Weiju Zhou, Yunxia Cao

Abstract:

Purpose: This study aimed to explore the interaction of male and female age on implantation failure from in vitro fertilisation (IVF)/intracytoplasmic sperm injection (ICSI) treatments in couples following their first cycles, using the Anhui Maternal-Child Health Study (AMCHS). Methods: The AMCHS recruited 2042 infertile couples who were physically fit for IVF or ICSI treatment at the Reproductive Centre of the First Affiliated Hospital of Anhui Medical University between May 2017 and April 2021. This prospective cohort study analysed data from 1910 cohort couples for the current paper. A multivariate logistic regression model was used to identify the effect of male and female age on implantation failure after controlling for confounding factors. Male age and female age were examined as continuous and categorical (male age: 20-<25, 25-<30, 30-<35, 35-<40, ≥40; female age: 20-<25, 25-<30, 30-<35, 35-<40, ≥40) predictors. Results: Logistic regression indicated that advanced maternal age was associated with increased implantation failure (P<0.001). There was evidence of an interaction between maternal age (30-<35 and ≥35) and paternal age (≥35) on implantation failure (P<0.05). Only when the male was ≥35 years was increased maternal age associated with the risk of implantation failure. Conclusion: There was an additive effect of advanced parental age on implantation failure. The impact of advanced maternal age was only seen in the older paternal age group. The delay of childbearing in both men and women will be a serious public issue that may contribute to a higher risk of implantation failure in patients needing assisted reproductive technology (ART).

Keywords: parental age, infertility, cohort study, IVF

Procedia PDF Downloads 113
18086 Prediction of Bariatric Surgery Publications by Using Different Machine Learning Algorithms

Authors: Senol Dogan, Gunay Karli

Abstract:

Identification of relevant publications based on a Medline query is time-consuming and error-prone. An all based process has the potential to solve this problem without any manual work. To the best of our knowledge, our study is the first to investigate the ability of machine learning to identify relevant articles accurately. Five different machine learning algorithms were tested using 23 predictors based on several metadata fields attached to publications. We find that the boosted model is the best-performing algorithm, with an overall accuracy of 96%. In addition, the specificity and sensitivity of the algorithm are 97% and 93%, respectively. As a result of this work, we concluded that the same procedure can be applied to understanding big data on cancer gene expression.
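
Accuracy, sensitivity, and specificity such as those reported above are all derived from the four cells of a confusion matrix. The sketch below shows the arithmetic. The counts are invented purely for illustration and are not taken from the study, and the function name is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all predictions that are correct
    sensitivity = tp / (tp + fn)                 # true positive rate (recall)
    specificity = tn / (tn + fp)                 # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts: 50 relevant and 50 irrelevant publications
acc, sens, spec = classification_metrics(tp=40, fp=5, tn=45, fn=10)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting all three figures matters when relevant publications are rare, because accuracy alone can look high even if most relevant items are missed.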

Keywords: prediction of publications, machine learning, algorithms, bariatric surgery, comparison of algorithms, boosted, tree, logistic regression, ANN model

Procedia PDF Downloads 174
18085 An Overbooking Model for Car Rental Service with Different Types of Cars

Authors: Naragain Phumchusri, Kittitach Pongpairoj

Abstract:

Overbooking is a very useful revenue management technique that can help reduce costs caused by either undersales or oversales. In this paper, we propose an overbooking model for two types of cars that minimizes the total cost of a car rental service. With two types of cars, there is the possibility of an upgrade from the lower type to the upper type. This makes the model more complex than the single-car-type scenario. We have found that convexity can be proved in this case. A sensitivity analysis is conducted to observe the effects of relevant parameters on the optimal solution. Model simplification is proposed using multiple linear regression analysis, which can help estimate the optimal overbooking level using appropriate independent variables. The results show that the overbooking level from the multiple linear regression model is relatively close to the optimal solution (with an adjusted R-squared value of at least 72.8%). To evaluate the performance of the proposed model, the total cost was compared with the case where the decision maker uses a naïve method for the overbooking level. It was found that the total cost from the optimal solution is only 0.5 to 1 percent (on average) lower than the cost from the regression model, while it is approximately 67% lower than the cost obtained by the naïve method. This indicates that our proposed simplification method using regression analysis can perform effectively in estimating the overbooking level.

Keywords: overbooking, car rental industry, revenue management, stochastic model

Procedia PDF Downloads 139
18084 Arabic Character Recognition Using Regression Curves with the Expectation Maximization Algorithm

Authors: Abdullah A. AlShaher

Abstract:

In this paper, we demonstrate how regression curves can be used to recognize 2D non-rigid handwritten shapes. Each shape is represented by a set of non-overlapping, uniformly distributed landmarks. The underlying models use second-order polynomials to model shapes within a training set. To estimate the regression models, we need to extract the coefficients that describe the variations within a shape class. Hence, a least squares method is used to estimate such modes. We then proceed by training these coefficients using the apparatus of the Expectation Maximization algorithm. Recognition is carried out by finding the least-error landmark displacement with respect to the model curves. Handwritten isolated Arabic characters are used to evaluate our approach.
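
The least squares step for a second-order polynomial can be sketched as below. This is a generic illustration, not the authors' implementation: it assumes each landmark is an (x, y) pair and that y is regressed on x as y = a + b*x + c*x^2, and it solves the 3x3 normal equations directly. The function name and the sample landmarks are invented, and the Expectation Maximization stage is not shown.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Normal equations (V^T V) coeffs = V^T y, where V has rows [1, x, x^2]
    s = [sum(x ** p for x in xs) for p in range(5)]                 # power sums of x
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]   # augmented matrix
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for j in range(col, 4):
                m[row][j] -= factor * m[col][j]
    # Back substitution
    coeffs = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rhs = m[i][3] - sum(m[i][j] * coeffs[j] for j in range(i + 1, 3))
        coeffs[i] = rhs / m[i][i]
    return tuple(coeffs)


# Made-up landmarks lying exactly on y = 1 + 2x + 3x^2
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [9.0, 2.0, 1.0, 6.0, 17.0]
print(fit_quadratic(xs, ys))
```

At least three landmarks with distinct x values are needed for the system to have a unique solution.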

Keywords: character recognition, regression curves, handwritten Arabic letters, expectation maximization algorithm

Procedia PDF Downloads 109
18083 Estimate of Maximum Expected Intensity of One-Half-Wave Lines Dancing

Authors: A. Bekbaev, M. Dzhamanbaev, R. Abitaeva, A. Karbozova, G. Nabyeva

Abstract:

In this paper, the regression dependence of dancing intensity on wind speed and span length was established from statistical data obtained from multi-year observations of line wire dancing accumulated by the power systems of Kazakhstan and the Russian Federation. The lower and upper limits of the equation parameters were estimated, as well as the adequacy of the regression model. The constructed model will be used in research on dancing phenomena for the development of methods and means of protection against dancing and for zoning territories according to line wire dancing.

Keywords: power lines, line wire dancing, dancing intensity, regression equation, dancing area intensity

Procedia PDF Downloads 280
18082 Predicting Survival in Cancer: How Cox Regression Model Compares to Artificial Neural Networks?

Authors: Dalia Rimawi, Walid Salameh, Amal Al-Omari, Hadeel AbdelKhaleq

Abstract:

Prediction of the survival time of patients with cancer is a core factor that influences oncologists' decisions in several respects, such as the treatment plans offered, patients’ quality of life, and medication development. For a long time, proportional hazards Cox regression (PH Cox) has been, and still is, the best-known statistical method for predicting survival outcomes. However, with the revolution in data science, new prediction models have been employed and have proved to be more flexible and to provide higher accuracy in this type of study. The artificial neural network is one such model that is suitable for time-to-event prediction. In this study, we aim to compare PH Cox regression with the artificial neural network method in terms of data handling and the accuracy of each model.

Keywords: Cox regression, neural networks, survival, cancer.

Procedia PDF Downloads 150
18081 Survival and Hazard Maximum Likelihood Estimator with Covariate Based on Right Censored Data of Weibull Distribution

Authors: Al Omari Mohammed Ahmed

Abstract:

This paper focuses on the maximum likelihood estimator with covariates. Covariates are incorporated into the Weibull model. Under this regression model, the maximum likelihood estimator is used to estimate the covariate parameters, shape parameter, survival function, and hazard rate of the Weibull regression distribution with right-censored data. The mean square error (MSE) and absolute bias are used to assess the performance of the Weibull regression distribution. For the simulation comparison, the study used various sample sizes and several specific values of the Weibull shape parameter.
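
To make the quantities above concrete, the sketch below writes out the Weibull survival function S(t) = exp(-(t/scale)^shape), the hazard h(t) = (shape/scale) * (t/scale)^(shape-1), and the right-censored log-likelihood. The way the covariate enters, scale = exp(beta0 + beta1 * x), is an assumption made for the example, since the abstract does not state the link used. All names and data values are illustrative.

```python
import math


def weibull_survival(t, shape, scale):
    """S(t): probability of surviving beyond time t."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """h(t): instantaneous failure rate at time t."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def log_likelihood(times, events, covariates, shape, beta0, beta1):
    """Right-censored Weibull log-likelihood with a log-linear scale (assumed link)."""
    total = 0.0
    for t, observed, x in zip(times, events, covariates):
        scale = math.exp(beta0 + beta1 * x)
        if observed:
            # An observed failure contributes the density f(t) = h(t) * S(t)
            total += math.log(weibull_hazard(t, shape, scale))
        # Every subject, censored or not, contributes log S(t) = -(t/scale)^shape
        total -= (t / scale) ** shape
    return total


# Made-up data: event flag 1 = failure observed, 0 = right-censored
times = [1.0, 2.0]
events = [1, 0]
covariates = [0.0, 0.0]
print(log_likelihood(times, events, covariates, shape=1.0, beta0=0.0, beta1=0.0))
```

The maximum likelihood estimates are the values of `shape`, `beta0`, and `beta1` that maximise this function, typically found with a numerical optimiser.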

Keywords: weibull regression distribution, maximum likelihood estimator, survival function, hazard rate, right censoring

Procedia PDF Downloads 407
18080 Radio Frequency Identification Encryption via Modified Two Dimensional Logistic Map

Authors: Hongmin Deng, Qionghua Wang

Abstract:

A modified two-dimensional (2D) logistic map based on cross feedback control is proposed. In a statistical characteristics analysis, this 2D map exhibits more random chaotic dynamical properties than the classic one-dimensional (1D) logistic map. It is therefore used as a pseudo-random (PN) sequence generator: the real-valued PN sequence obtained is first quantized and then applied to a radio frequency identification (RFID) communication system. The system is experimentally validated on a Cortex-M0 development board, which demonstrates its effectiveness in terms of key generation, key space size, and security. Finally, further cryptanalysis is carried out using the test suite of the National Institute of Standards and Technology (NIST).
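
The idea of quantizing a chaotic orbit into a bit sequence can be illustrated with the classic 1D logistic map, x_{n+1} = r * x_n * (1 - x_n). The sketch below is a stand-in only: the authors' modified 2D map and its cross feedback terms are not specified in the abstract, and the threshold quantizer, parameter values, and function name are assumptions. It is not suitable for real cryptographic use.

```python
def logistic_map_bits(x0, r, n_bits, burn_in=100):
    """Generate bits by thresholding a classic 1D logistic map orbit at 0.5."""
    x = x0
    # Discard the initial transient so the output depends less visibly on x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    bits = []
    for _ in range(n_bits):
        x = r * x * (1.0 - x)
        bits.append(1 if x >= 0.5 else 0)   # one-bit quantization of the real value
    return bits


# The seed x0 and the parameter r act as the key material (illustrative values)
stream = logistic_map_bits(x0=0.3, r=3.99, n_bits=32)
print("".join(str(b) for b in stream))
```

The same `x0` and `r` always reproduce the same sequence, which is what lets two parties holding the key generate an identical stream.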

Keywords: chaos encryption, logistic map, pseudo-random sequence, RFID

Procedia PDF Downloads 369