Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 18335

Search results for: regression model

17675 Unsupervised Learning and Similarity Comparison of Water Mass Characteristics with Gaussian Mixture Model for Visualizing Ocean Data

Abstract:

The temperature-salinity relationship is one of the most important characteristics used for identifying water masses in marine research. Temperature-salinity characteristics, however, may change dynamically with respect to the geographic location and is quite sensitive to the depth at the same location. When depth is taken into consideration, however, it is not easy to compare the characteristics of different water masses efficiently for a wide range of areas of the ocean. In this paper, the Gaussian mixture model was proposed to analyze the temperature-salinity-depth characteristics of water masses, based on which comparison between water masses may be conducted. Gaussian mixture model could model the distribution of a random vector and is formulated as the weighting sum for a set of multivariate normal distributions. The temperature-salinity-depth data for different locations are first used to train a set of Gaussian mixture models individually. The distance between two Gaussian mixture models can then be defined as the weighting sum of pairwise Bhattacharyya distances among the Gaussian distributions. Consequently, the distance between two water masses may be measured fast, which allows the automatic and efficient comparison of the water masses for a wide range area. The proposed approach not only can approximate the distribution of temperature, salinity, and depth directly without the prior knowledge for assuming the regression family, but may restrict the complexity by controlling the number of mixtures when the amounts of samples are unevenly distributed. In addition, it is critical for knowledge discovery in marine research to represent, manage and share the temperature-salinity-depth characteristics flexibly and responsively. The proposed approach has been applied to a real-time visualization system of ocean data, which may facilitate the comparison of water masses by aggregating the data without degrading the discriminating capabilities. This system provides an interface for querying geographic locations with similar temperature-salinity-depth characteristics interactively and for tracking specific patterns of water masses, such as the Kuroshio near Taiwan or those in the South China Sea.

Keywords: water mass, Gaussian mixture model, data visualization, system framework

Procedia PDF Downloads 125

17674 Hospital Malnutrition and its Impact on 30-day Mortality in Hospitalized General Medicine Patients in a Tertiary Hospital in South India

Authors: Vineet Agrawal, Deepanjali S., Medha R., Subitha L.

Abstract:

Background. Hospital malnutrition is a highly prevalent issue and is known to increase the morbidity, mortality, length of hospital stay, and cost of care. In India, studies on hospital malnutrition have been restricted to ICU, post-surgical, and cancer patients. We designed this study to assess the impact of hospital malnutrition on 30-day post-discharge and in-hospital mortality in patients admitted in the general medicine department, irrespective of diagnosis. Methodology. All patients aged above 18 years admitted in the medicine wards, excluding medico-legal cases, were enrolled in the study. Nutritional assessment was done within 72 h of admission, using Subjective Global Assessment (SGA), which classifies patients into three categories: Severely malnourished, Mildly/moderately malnourished, and Normal/well-nourished. Anthropometric measurements like Body Mass Index (BMI), Triceps skin-fold thickness (TSF), and Mid-upper arm circumference (MUAC) were also performed. Patients were followed-up during hospital stay and 30 days after discharge through telephonic interview, and their final diagnosis, comorbidities, and cause of death were noted. Multivariate logistic regression and cox regression model were used to determine if the nutritional status at admission independently impacted mortality at one month. Results. The prevalence of malnourishment by SGA in our study was 67.3% among 395 hospitalized patients, of which 155 patients (39.2%) were moderately malnourished, and 111 (28.1%) were severely malnourished. Of 395 patients, 61 patients (15.4%) expired, of which 30 died in the hospital, and 31 died within 1 month of discharge from hospital. On univariate analysis, malnourished patients had significantly higher morality (24.3% in 111 Cat C patients) than well-nourished patients (10.1% in 129 Cat A patients), with OR 9.17, p-value 0.007. On multivariate logistic regression, age and higher Charlson Comorbidity Index (CCI) were independently associated with mortality. Higher CCI indicates higher burden of comorbidities on admission, and the CCI in the expired patient group (mean=4.38) was significantly higher than that of the alive cohort (mean=2.85). Though malnutrition significantly contributed to higher mortality on univariate analysis, it was not an independent predictor of outcome on multivariate logistic regression. Length of hospitalisation was also longer in the malnourished group (mean= 9.4 d) compared to the well-nourished group (mean= 8.03 d) with a trend towards significance (p=0.061). None of the anthropometric measurements like BMI, MUAC, or TSF showed any association with mortality or length of hospitalisation. Inference. The results of our study highlight the issue of hospital malnutrition in medicine wards and reiterate that malnutrition contributes significantly to patient outcomes. We found that SGA performs better than anthropometric measurements in assessing under-nutrition. We are of the opinion that the heterogeneity of the study population by diagnosis was probably the primary reason why malnutrition by SGA was not found to be an independent risk factor for mortality. Strategies to identify high-risk patients at admission and treat malnutrition in the hospital and post-discharge are needed.

Keywords: hospitalization outcome, length of hospital stay, mortality, malnutrition, subjective global assessment (SGA)

Procedia PDF Downloads 133

17673 Neighborhood Linking Social Capital as a Predictor of Drug Abuse: A Swedish National Cohort Study

Authors: X. Li, J. Sundquist, C. Sjöstedt, M. Winkleby, K. S. Kendler, K. Sundquist

Abstract:

Aims: This study examines the association between the incidence of drug abuse (DA) and linking (communal) social capital, a theoretical concept describing the amount of trust between individuals and societal institutions. Methods: We present results from an 8-year population-based cohort study that followed all residents in Sweden, aged 15-44, from 2003 through 2010, for a total of 1,700,896 men and 1,642,798 women. Social capital was conceptualized as the proportion of people in a geographically defined neighborhood who voted in local government elections. Multilevel logistic regression was used to estimate odds ratios (ORs) and between-neighborhood variance. Results: We found robust associations between linking social capital (scored as a three level variable) and DA in men and women. For men, the OR for DA in the crude model was 2.11 [95% confidence interval (CI) 2.02-2.21] for those living in areas with the lowest vs. highest level of social capital. After accounting for neighborhood-level deprivation, the OR fell to 1.59 (1.51-1-68), indicating that neighborhood deprivation lies in the pathway between linking social capital and DA. The ORs remained significant after accounting for age, sex, family income, marital status, country of birth, education level, and region of residence, and after further accounting for comorbidities and family history of comorbidities and family history of DA. For women, the OR decreased from 2.15 (2.03-2.27) in the crude model to 1.31 (1.22-1.40) in the final model, adjusted for multiple neighborhood-level and individual-level variables. Conclusions: Our study suggests that low linking social capital may have important independent effects on DA.

Keywords: drug abuse, social linking capital, environment, family

Procedia PDF Downloads 457

17672 Customer Churn Prediction by Using Four Machine Learning Algorithms Integrating Features Selection and Normalization in the Telecom Sector

Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh

Abstract:

A crucial component of maintaining a customer-oriented business as in the telecom industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years. It has become more important to understand customers’ needs in this strong market of telecom industries, especially for those who are looking to turn over their service providers. So, predictive churn is now a mandatory requirement for retaining those customers. Machine learning can be utilized to accomplish this. Churn Prediction has become a very important topic in terms of machine learning classification in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important to building an effective churn prediction model. This paper aims to predict churn and identify factors of customers’ churn based on their past service usage history. Aiming at this objective, the study makes use of feature selection, normalization, and feature engineering. Then, this study compared the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Evaluation of the performance was conducted by using the F1 score and ROC-AUC. Comparing the results of this study with existing models has proven to produce better results. The results showed the Gradients Boosting with feature selection technique outperformed in this study by achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.

Keywords: machine learning, gradient boosting, logistic regression, churn, random forest, decision tree, ROC, AUC, F1-score

Procedia PDF Downloads 121

17671 Using Predictive Analytics to Identify First-Year Engineering Students at Risk of Failing

Authors: Beng Yew Low, Cher Liang Cha, Cheng Yong Teoh

Abstract:

Due to a lack of continual assessment or grade related data, identifying first-year engineering students in a polytechnic education at risk of failing is challenging. Our experience over the years tells us that there is no strong correlation between having good entry grades in Mathematics and the Sciences and excelling in hardcore engineering subjects. Hence, identifying students at risk of failure cannot be on the basis of entry grades in Mathematics and the Sciences alone. These factors compound the difficulty of early identification and intervention. This paper describes the development of a predictive analytics model in the early detection of students at risk of failing and evaluates its effectiveness. Data from continual assessments conducted in term one, supplemented by data of student psychological profiles such as interests and study habits, were used. Three classification techniques, namely Logistic Regression, K Nearest Neighbour, and Random Forest, were used in our predictive model. Based on our findings, Random Forest was determined to be the strongest predictor with an Area Under the Curve (AUC) value of 0.994. Correspondingly, the Accuracy, Precision, Recall, and F-Score were also highest among these three classifiers. Using this Random Forest Classification technique, students at risk of failure could be identified at the end of term one. They could then be assigned to a Learning Support Programme at the beginning of term two. This paper gathers the results of our findings. It also proposes further improvements that can be made to the model.

Keywords: continual assessment, predictive analytics, random forest, student psychological profile

Procedia PDF Downloads 114

17670 Machine Learning Models for the Prediction of Heating and Cooling Loads of a Residential Building

Authors: Aaditya U. Jhamb

Abstract:

Due to the current energy crisis that many countries are battling, energy-efficient buildings are the subject of extensive research in the modern technological era because of growing worries about energy consumption and its effects on the environment. The paper explores 8 factors that help determine energy efficiency for a building: (relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area, and glazing area distribution), with Tsanas and Xifara providing a dataset. The data set employed 768 different residential building models to anticipate heating and cooling loads with a low mean squared error. By optimizing these characteristics, machine learning algorithms may assess and properly forecast a building's heating and cooling loads, lowering energy usage while increasing the quality of people's lives. As a result, the paper studied the magnitude of the correlation between these input factors and the two output variables using various statistical methods of analysis after determining which input variable was most closely associated with the output loads. The most conclusive model was the Decision Tree Regressor, which had a mean squared error of 0.258, whilst the least definitive model was the Isotonic Regressor, which had a mean squared error of 21.68. This paper also investigated the KNN Regressor and the Linear Regression, which had to mean squared errors of 3.349 and 18.141, respectively. In conclusion, the model, given the 8 input variables, was able to predict the heating and cooling loads of a residential building accurately and precisely.

Keywords: energy efficient buildings, heating load, cooling load, machine learning models

Procedia PDF Downloads 78

17669 Modeling Spatio-Temporal Variation in Rainfall Using a Hierarchical Bayesian Regression Model

Authors: Sabyasachi Mukhopadhyay, Joseph Ogutu, Gundula Bartzke, Hans-Peter Piepho

Abstract:

Rainfall is a critical component of climate governing vegetation growth and production, forage availability and quality for herbivores. However, reliable rainfall measurements are not always available, making it necessary to predict rainfall values for particular locations through time. Predicting rainfall in space and time can be a complex and challenging task, especially where the rain gauge network is sparse and measurements are not recorded consistently for all rain gauges, leading to many missing values. Here, we develop a flexible Bayesian model for predicting rainfall in space and time and apply it to Narok County, situated in southwestern Kenya, using data collected at 23 rain gauges from 1965 to 2015. Narok County encompasses the Maasai Mara ecosystem, the northern-most section of the Mara-Serengeti ecosystem, famous for its diverse and abundant large mammal populations and spectacular migration of enormous herds of wildebeest, zebra and Thomson's gazelle. The model incorporates geographical and meteorological predictor variables, including elevation, distance to Lake Victoria and minimum temperature. We assess the efficiency of the model by comparing it empirically with the established Gaussian process, Kriging, simple linear and Bayesian linear models. We use the model to predict total monthly rainfall and its standard error for all 5 * 5 km grid cells in Narok County. Using the Monte Carlo integration method, we estimate seasonal and annual rainfall and their standard errors for 29 sub-regions in Narok. Finally, we use the predicted rainfall to predict large herbivore biomass in the Maasai Mara ecosystem on a 5 * 5 km grid for both the wet and dry seasons. We show that herbivore biomass increases with rainfall in both seasons. The model can handle data from a sparse network of observations with many missing values and performs at least as well as or better than four established and widely used models, on the Narok data set. The model produces rainfall predictions consistent with expectation and in good agreement with the blended station and satellite rainfall values. The predictions are precise enough for most practical purposes. The model is very general and applicable to other variables besides rainfall.

Keywords: non-stationary covariance function, gaussian process, ungulate biomass, MCMC, maasai mara ecosystem

Procedia PDF Downloads 277

17668 An Efficient Machine Learning Model to Detect Metastatic Cancer in Pathology Scans Using Principal Component Analysis Algorithm, Genetic Algorithm, and Classification Algorithms

Authors: Bliss Singhal

Abstract:

Machine learning (ML) is a branch of Artificial Intelligence (AI) where computers analyze data and find patterns in the data. The study focuses on the detection of metastatic cancer using ML. Metastatic cancer is the stage where cancer has spread to other parts of the body and is the cause of approximately 90% of cancer-related deaths. Normally, pathologists spend hours each day to manually classifying whether tumors are benign or malignant. This tedious task contributes to mislabeling metastasis being over 60% of the time and emphasizes the importance of being aware of human error and other inefficiencies. ML is a good candidate to improve the correct identification of metastatic cancer, saving thousands of lives and can also improve the speed and efficiency of the process, thereby taking fewer resources and time. So far, the deep learning methodology of AI has been used in research to detect cancer. This study is a novel approach to determining the potential of using preprocessing algorithms combined with classification algorithms in detecting metastatic cancer. The study used two preprocessing algorithms: principal component analysis (PCA) and the genetic algorithm, to reduce the dimensionality of the dataset and then used three classification algorithms: logistic regression, decision tree classifier, and k-nearest neighbors to detect metastatic cancer in the pathology scans. The highest accuracy of 71.14% was produced by the ML pipeline comprising of PCA, the genetic algorithm, and the k-nearest neighbor algorithm, suggesting that preprocessing and classification algorithms have great potential for detecting metastatic cancer.

Keywords: breast cancer, principal component analysis, genetic algorithm, k-nearest neighbors, decision tree classifier, logistic regression

Procedia PDF Downloads 65

17667 A New Mathematical Model of Human Olfaction

Authors: H. Namazi, H. T. N. Kuan

Abstract:

It is known that in humans, the adaptation to a given odor occurs within a quite short span of time (typically one minute) after the odor is presented to the brain. Different models of human olfaction have been developed by scientists but none of these models consider the diffusion phenomenon in olfaction. A novel microscopic model of the human olfaction is presented in this paper. We develop this model by incorporating the transient diffusivity. In fact, the mathematical model is written based on diffusion of the odorant within the mucus layer. By the use of the model developed in this paper, it becomes possible to provide quantification of the objective strength of odor.

Keywords: diffusion, microscopic model, mucus layer, olfaction

Procedia PDF Downloads 490

17666 Investigations in Machining of Hot Work Tool Steel with Mixed Ceramic Tool

Authors: B. Varaprasad, C. Srinivasa Rao

Abstract:

Hard turning has been explored as an alternative to the conventional one used for manufacture of Parts using tool steels. In the present study, the effects of cutting speed, feed rate and Depth of Cut (DOC) on cutting forces, specific cutting force, power and surface roughness in the hard turning are experimentally investigated. Experiments are carried out using mixed ceramic(Al2O3+TiC) cutting tool of corner radius 0.8mm, in turning operations on AISI H13 tool steel, heat treated to a hardness of 62 HRC. Based on Design of Experiments (DOE), a total of 20 tests are carried out. The range of each one of the three parameters is set at three different levels, viz, low, medium and high. The validity of the model is checked by Analysis of variance (ANOVA). Predicted models are derived from regression analysis. Comparison of experimental and predicted values of specific cutting force, power and surface roughness shows that good agreement has been achieved between them. Therefore, the developed model may be recommended to be used for predicting specific cutting force, power and surface roughness in hard turning of tool steel that is AISI H13 steel.

Keywords: hard turning, specific cutting force, power, surface roughness, AISI H13, mixed ceramic

Procedia PDF Downloads 684

17665 An Output Oriented Super-Efficiency Model for Considering Time Lag Effect

Authors: Yanshuang Zhang, Byungho Jeong

Abstract:

There exists some time lag between the consumption of inputs and the production of outputs. This time lag effect should be considered in calculating efficiency of decision making units (DMU). Recently, a couple of DEA models were developed for considering time lag effect in efficiency evaluation of research activities. However, these models can’t discriminate efficient DMUs because of the nature of basic DEA model in which efficiency scores are limited to ‘1’. This problem can be resolved a super-efficiency model. However, a super efficiency model sometimes causes infeasibility problem. This paper suggests an output oriented super-efficiency model for efficiency evaluation under the consideration of time lag effect. A case example using a long term research project is given to compare the suggested model with the MpO model

Keywords: DEA, Super-efficiency, Time Lag, research activities

Procedia PDF Downloads 638

17664 Estimation of Foliar Nitrogen in Selected Vegetation Communities of Uttrakhand Himalayas Using Hyperspectral Satellite Remote Sensing

Authors: Yogita Mishra, Arijit Roy, Dhruval Bhavsar

Abstract:

The study estimates the nitrogen concentration in selected vegetation community’s i.e. chir pine (pinusroxburghii) by using hyperspectral satellite data and also identified the appropriate spectral bands and nitrogen indices. The Short Wave InfraRed reflectance spectrum at 1790 nm and 1680 nm shows the maximum possible absorption by nitrogen in selected species. Among the nitrogen indices, log normalized nitrogen index performed positively and negatively too. The strong positive correlation is taken out from 1510 nm and 760 nm for the pinusroxburghii for leaf nitrogen concentration and leaf nitrogen mass while using NDNI. The regression value of R² developed by using linear equation achieved maximum at 0.7525 for the analysis of satellite image data and R² is maximum at 0.547 for ground truth data for pinusroxburghii respectively.

Keywords: hyperspectral, NDNI, nitrogen concentration, regression value

Procedia PDF Downloads 277

17663 CNN-Based Compressor Mass Flow Estimator in Industrial Aircraft Vapor Cycle System

Authors: Justin Reverdi, Sixin Zhang, Saïd Aoues, Fabrice Gamboa, Serge Gratton, Thomas Pellegrini

Abstract:

In vapor cycle systems, the mass flow sensor plays a key role for different monitoring and control purposes. However, physical sensors can be inaccurate, heavy, cumbersome, expensive, or highly sensitive to vibrations, which is especially problematic when embedded into an aircraft. The conception of a virtual sensor, based on other standard sensors, is a good alternative. This paper has two main objectives. Firstly, a data-driven model using a convolutional neural network is proposed to estimate the mass flow of the compressor. We show that it significantly outperforms the standard polynomial regression model (thermodynamic maps) in terms of the standard MSE metric and engineer performance metrics. Secondly, a semi-automatic segmentation method is proposed to compute the engineer performance metrics for real datasets, as the standard MSE metric may pose risks in analyzing the dynamic behavior of vapor cycle systems.

Keywords: deep learning, convolutional neural network, vapor cycle system, virtual sensor

Procedia PDF Downloads 39

17662 Validation of the Formal Model of Web Services Applications for Digital Reference Service of Library Information System

Authors: Zainab Magaji Musa, Nordin M. A. Rahman, Julaily Aida Jusoh

Abstract:

The web services applications for digital reference service (WSDRS) of LIS model is an informal model that claims to reduce the problems of digital reference services in libraries. It uses web services technology to provide efficient way of satisfying users’ needs in the reference section of libraries. The formal WSDRS model consists of the Z specifications of all the informal specifications of the model. This paper discusses the formal validation of the Z specifications of WSDRS model. The authors formally verify and thus validate the properties of the model using Z/EVES theorem prover.

Keywords: validation, verification, formal, theorem prover

Procedia PDF Downloads 495

17661 Forecasting the Sea Level Change in Strait of Hormuz

Authors: Hamid Goharnejad, Amir Hossein Eghbali

Abstract:

Recent investigations have demonstrated the global sea level rise due to climate change impacts. In this study climate changes study the effects of increasing water level in the strait of Hormuz. The probable changes of sea level rise should be investigated to employ the adaption strategies. The climatic output data of a GCM (General Circulation Model) named CGCM3 under climate change scenario of A1b and A2 were used. Among different variables simulated by this model, those of maximum correlation with sea level changes in the study region and least redundancy among themselves were selected for sea level rise prediction by using stepwise regression. One models of Discrete Wavelet artificial Neural Network (DWNN) was developed to explore the relationship between climatic variables and sea level changes. In these models, wavelet was used to disaggregate the time series of input and output data into different components and then ANN was used to relate the disaggregated components of predictors and predictands to each other. The results showed in the Shahid Rajae Station for scenario A1B sea level rise is among 64 to 75 cm and for the A2 Scenario sea level rise is among 90 to 105 cm. Furthermore the result showed a significant increase of sea level at the study region under climate change impacts, which should be incorporated in coastal areas management.

Keywords: climate change scenarios, sea-level rise, strait of Hormuz, forecasting

Procedia PDF Downloads 249

17660 Linear MIMO Model Identification Using an Extended Kalman Filter

Authors: Matthew C. Best

Abstract:

Linear Multi-Input Multi-Output (MIMO) dynamic models can be identified, with no a priori knowledge of model structure or order, using a new Generalised Identifying Filter (GIF). Based on an Extended Kalman Filter, the new filter identifies the model iteratively, in a continuous modal canonical form, using only input and output time histories. The filter’s self-propagating state error covariance matrix allows easy determination of convergence and conditioning, and by progressively increasing model order, the best fitting reduced-order model can be identified. The method is shown to be resistant to noise and can easily be extended to identification of smoothly nonlinear systems.

Keywords: system identification, Kalman filter, linear model, MIMO, model order reduction

Procedia PDF Downloads 576

17659 Response Surface Methodology for Optimum Hardness of TiN on Steel Substrate

Authors: R. Joseph Raviselvan, K. Ramanathan, P. Perumal, M. R. Thansekhar

Abstract:

Hard coatings are widely used in cutting and forming tool industries. Titanium Nitride (TiN) possesses good hardness, strength and corrosion resistant. The coating properties are influenced by many process parameters. The coatings were deposited on steel substrate by changing the process parameters such as substrate temperature, nitrogen flow rate and target power in a D.C planer magnetron sputtering. The structure of coatings were analysed using XRD. The hardness of coatings was found using Micro hardness tester. From the experimental data, a regression model was developed and the optimum response was determined using Response Surface Methodology (RSM).

Keywords: hardness, RSM, sputtering, TiN XRD

Procedia PDF Downloads 299

17658 Income Inequality among Selected Entrepreneurs in Ondo State, Nigeria

Authors: O.O. Ehinmowo, A.I. Fatuase, D.F. Oke

Abstract:

Nigeria is endowed with resources that could boost the economy as well as generate income and provide jobs to the teaming populace. One of the keys of attaining this is by making the environment conducive for the entrepreneurs to excel in their respective enterprises so that more income could be accrued to the entrepreneurs. This study therefore examines income inequality among selected entrepreneurs in Ondo State, Nigeria using primary data. A multistage sampling technique was used to select 200 respondents for the study with the aid of structured questionnaire and personal interview. The data collected were subjected to descriptive statistics, Lorenz curve, Gini coefficient and Double - Log regression model. Results revealed that majority of the entrepreneurs (63%) were males and 90% were married with an average age of 44 years. About 40% of the respondents spent at most 12 years in school with 81% of the respondents had 4-6 members per household, while hair dressing (43.5%) and fashion designing (31.5%) were the most common enterprises among the sampled respondents. The findings also showed that majority of the entrepreneurs in hairdressing, fashion designing and laundry service earned below N200,000 per annum while the majority of those in restaurant and food vending earned between N400,000 – N600,000 followed by the entrepreneurs in pure water enterprise where majority earned N800,000 and above per annum. The result of the Gini coefficient (0.58) indicated that there was presence of inequality among the entrepreneurs which was also affirmed by the Lorenz curve. The Regression results showed that gender, household size and number of employees significantly affected the income of the entrepreneurs in the study area. Therefore, more female households should be encouraged into entrepreneurial businesses and government should give incentive cum conductive environment that could bridge the disparity in the income of the entrepreneurs in their various enterprises.

Keywords: entrepreneurs, Gini coefficient, income inequality, Lorenz curve

Procedia PDF Downloads 329

17657 Comparison of Different Machine Learning Algorithms for Solubility Prediction

Authors: Muhammet Baldan, Emel Timuçin

Abstract:

Molecular solubility prediction plays a crucial role in various fields, such as drug discovery, environmental science, and material science. In this study, we compare the performance of five machine learning algorithms—linear regression, support vector machines (SVM), random forests, gradient boosting machines (GBM), and neural networks—for predicting molecular solubility using the AqSolDB dataset. The dataset consists of 9981 data points with their corresponding solubility values. MACCS keys (166 bits), RDKit properties (20 properties), and structural properties(3) features are extracted for every smile representation in the dataset. A total of 189 features were used for training and testing for every molecule. Each algorithm is trained on a subset of the dataset and evaluated using metrics accuracy scores. Additionally, computational time for training and testing is recorded to assess the efficiency of each algorithm. Our results demonstrate that random forest model outperformed other algorithms in terms of predictive accuracy, achieving an 0.93 accuracy score. Gradient boosting machines and neural networks also exhibit strong performance, closely followed by support vector machines. Linear regression, while simpler in nature, demonstrates competitive performance but with slightly higher errors compared to ensemble methods. Overall, this study provides valuable insights into the performance of machine learning algorithms for molecular solubility prediction, highlighting the importance of algorithm selection in achieving accurate and efficient predictions in practical applications.

Keywords: random forest, machine learning, comparison, feature extraction

Procedia PDF Downloads 20

17656 Towards Efficient Reasoning about Families of Class Diagrams Using Union Models

Authors: Tejush Badal, Sanaa Alwidian

Abstract:

Class diagrams are useful tools within the Unified Modelling Language (UML) to model and visualize the relationships between, and properties of objects within a system. As a system evolves over time and space (e.g., products), a series of models with several commonalities and variabilities create what is known as a model family. In circumstances where there are several versions of a model, examining each model individually, becomes expensive in terms of computation resources. To avoid performing redundant operations, this paper proposes an approach for representing a family of class diagrams into Union Models to represent model families using a single generic model. The paper aims to analyze and reason about a family of class diagrams using union models as opposed to individual analysis of each member model in the family. The union algorithm provides a holistic view of the model family, where the latter cannot be otherwise obtained from an individual analysis approach, this in turn, enhances the analysis performed in terms of speeding up the time needed to analyze a family of models together as opposed to analyzing individual models, one model at a time.

Keywords: analysis, class diagram, model family, unified modeling language, union model

Procedia PDF Downloads 56

17655 Infestation in Omani Date Palm Orchards by Dubas Bug Is Related to Tree Density

Authors: Lalit Kumar, Rashid Al Shidi

Abstract:

Phoenix dactylifera (date palm) is a major crop in many middle-eastern countries, including Oman. The Dubas bug Ommatissus lybicus is the main pest that affects date palm crops. However not all plantations are infested. It is still uncertain why some plantations get infested while others are not. This research investigated whether tree density and the system of planting (random versus systematic) had any relationship with infestation and levels of infestation. Remote Sensing and Geographic Information Systems were used to determine the density of trees (number of trees per unit area) while infestation levels were determined by manual counting of insects on 40 leaflets from two fronds on each tree, with a total of 20-60 trees in each village. The infestation was recorded as the average number of insects per leaflet. For tree density estimation, WorldView-3 scenes, with eight bands and 2m spatial resolution, were used. The Local maxima method, which depends on locating of the pixel of highest brightness inside a certain exploration window, was used to identify the trees in the image and delineating individual trees. This information was then used to determine whether the plantation was random or systematic. The ordinary least square regression (OLS) was used to test the global correlation between tree density and infestation level and the Geographic Weight Regression (GWR) was used to find the local spatial relationship. The accuracy of detecting trees varied from 83–99% in agricultural lands with systematic planting patterns to 50–70% in natural forest areas. Results revealed that the density of the trees in most of the villages was higher than the recommended planting number (120–125 trees/hectare). For infestation correlations, the GWR model showed a good positive significant relationship between infestation and tree density in the spring season with R² = 0.60 and medium positive significant relationship in the autumn season, with R² = 0.30. In contrast, the OLS model results showed a weaker positive significant relationship in the spring season with R² = 0.02, p < 0.05 and insignificant relationship in the autumn season with R² = 0.01, p > 0.05. The results showed a positive correlation between infestation and tree density, which suggests the infestation severity increased as the density of date palm trees increased. The correlation result showed that the density alone was responsible for about 60% of the increase in the infestation. This information can be used by the relevant authorities to better control infestations as well as to manage their pesticide spraying programs.

Keywords: dubas bug, date palm, tree density, infestation levels

Procedia PDF Downloads 171

17654 Impact Assessment of Climate Change on Water Resources in the Kabul River Basin

Authors: Tayib Bromand, Keisuke Sato

Abstract:

This paper presents the introduction to current water balance and climate change assessment in the Kabul river basin. The historical and future impacts of climate change on different components of water resources and hydrology in the Kabul river basin. The eastern part of Afghanistan, the Kabul river basin was chosen due to rapid population growth and land degradation to quantify the potential influence of Gobal Climate Change on its hydrodynamic characteristics. Luck of observed meteorological data was the main limitation of present research, few existed precipitation stations in the plain area of Kabul basin selected to compare with TRMM precipitation records, the result has been evaluated satisfactory based on regression and normal ratio methods. So the TRMM daily precipitation and NCEP temperature data set applied in the SWAT model to evaluate water balance for 2008 to 2012. Middle of the twenty – first century (2064) selected as the target period to assess impacts of climate change on hydrology aspects in the Kabul river basin. For this purpose three emission scenarios, A2, A1B and B1 and four GCMs, such as MIROC 3.2 (Med), CGCM 3.1 (T47), GFDL-CM2.0 and CNRM-CM3 have been selected, to estimate the future initial conditions of the proposed model. The outputs of the model compared and calibrated based on (R2) satisfactory. The assessed hydrodynamic characteristics and precipitation pattern. The results show that there will be significant impacts on precipitation patter such as decreasing of snowfall in the mountainous area of the basin in the Winter season due to increasing of 2.9°C mean annual temperature and land degradation due to deforestation.

Keywords: climate change, emission scenarios, hydrological components, Kabul river basin, SWAT model

Procedia PDF Downloads 445

17653 Application of a Confirmatory Composite Model for Assessing the Extent of Agricultural Digitalization: A Case of Proactive Land Acquisition Strategy (PLAS) Farmers in South Africa

Authors: Mazwane S., Makhura M. N., Ginege A.

Abstract:

Digitalization in South Africa has received considerable attention from policymakers. The support for the development of the digital economy by the South African government has been demonstrated through the enactment of various national policies and strategies. This study sought to develop an index for agricultural digitalization by applying composite confirmatory analysis (CCA). Another aim was to determine the factors that affect the development of digitalization in PLAS farms. Data on the indicators of the three dimensions of digitalization were collected from 300 Proactive Land Acquisition Strategy (PLAS) farms in South Africa using semi-structured questionnaires. Confirmatory composite analysis (CCA) was employed to reduce the items into three digitalization dimensions and ultimately to a digitalization index. Standardized digitalization index scores were extracted and fitted to a linear regression model to determine the factors affecting digitalization development. The results revealed that the model shows practical validity and can be used to measure digitalization development as measures of fit (geodesic distance, standardized root mean square residual, and squared Euclidean distance) were all below their respective 95%quantiles of bootstrap discrepancies (HI95 values). Therefore, digitalization is an emergent variable that can be measured using CCA. The average level of digitalization in PLAS farms was 0.2 and varied significantly across provinces. The factors that significantly influence digitalization development in PLAS land reform farms were age, gender, farm type, network type, and cellular data type. This should enable researchers and policymakers to understand the level of digitalization and patterns of development, as well as correctly attribute digitalization development to the contributing factors.

Keywords: agriculture, digitalization, confirmatory composite model, land reform, proactive land acquisition strategy, South Africa

Procedia PDF Downloads 38

17652 Electrical Load Estimation Using Estimated Fuzzy Linear Parameters

Authors: Bader Alkandari, Jamal Y. Madouh, Ahmad M. Alkandari, Anwar A. Alnaqi

Abstract:

A new formulation of fuzzy linear estimation problem is presented. It is formulated as a linear programming problem. The objective is to minimize the spread of the data points, taking into consideration the type of the membership function of the fuzzy parameters to satisfy the constraints on each measurement point and to insure that the original membership is included in the estimated membership. Different models are developed for a fuzzy triangular membership. The proposed models are applied to different examples from the area of fuzzy linear regression and finally to different examples for estimating the electrical load on a busbar. It had been found that the proposed technique is more suited for electrical load estimation, since the nature of the load is characterized by the uncertainty and vagueness.

Keywords: fuzzy regression, load estimation, fuzzy linear parameters, electrical load estimation

Procedia PDF Downloads 520

17651 Pre-Operative Psychological Factors Significantly Add to the Predictability of Chronic Narcotic Use: A Two Year Prospective Study

Authors: Dana El-Mughayyar, Neil Manson, Erin Bigney, Eden Richardson, Dean Tripp, Edward Abraham

Abstract:

Use of narcotics to treat pain has increased over the past two decades and is a contributing factor to the current public health crisis. Understanding the pre-operative risks of chronic narcotic use may be aided through investigation of psychological measures. The objective of the reported study is to determine predictors of narcotic use two years post-surgery in a thoracolumbar spine surgery population, including an array of psychological factors. A prospective observational study of 191 consecutively enrolled adult patients having undergone thoracolumbar spine surgery is presented. Baseline measures of interest included the Pain Catastrophizing Scale (PCS), Tampa Scale for Kinesiophobia, Multidimensional Scale for Perceived Social Support (MSPSS), Chronic Pain Acceptance Questionnaire (CPAQ-8), Oswestry Disability Index (ODI), Numeric Rating Scales for back and leg pain (NRS-B/L), SF-12’s Mental Component Summary (MCS), narcotic use and demographic variables. The post-operative measure of interest is narcotic use at 2-year follow-up. Narcotic use is collapsed into binary categories of use and no use. Descriptive statistics are run. Chi Square analysis is used for categorical variables and an ANOVA for continuous variables. Significant variables are built into a hierarchical logistic regression to determine predictors of post-operative narcotic use. Significance is set at α < 0.05. Results: A total of 27.23% of the sample were using narcotics two years after surgery. The regression model included ODI, NRS-Leg, time with condition, chief complaint, pre-operative drug use, gender, MCS, PCS subscale helplessness, and CPAQ subscale pain willingness and was significant χ² (13, N=191)= 54.99; p = .000. The model accounted for 39.6% of the variance in narcotic use and correctly predicted in 79.7% of cases. Psychological variables accounted for 9.6% of the variance over and above the other predictors. Conclusions: Managing chronic narcotic usage is central to the patient’s overall health and quality of life. Psychological factors in the preoperative period are significant predictors of narcotic use 2 years post-operatively. The psychological variables are malleable, potentially allowing surgeons to direct their patients to preventative resources prior to surgery.

Keywords: narcotics, psychological factors, quality of life, spine surgery

Procedia PDF Downloads 120

17650 Uncovering the Relationship between EFL Students' Self-Concept and Their Willingness to Communicate in Language Classes

Authors: Seyedeh Khadijeh Amirian, Seyed Mohammad Reza Amirian, Narges Hekmati

Abstract:

The current study aims at examining the relationship between English as a foreign language (EFL) students' self-concept and their willingness to communicate (WTC) in EFL classes. To this effect, two questionnaires, namely 'Willingness to Communicate' (MacIntyre et al., 2001) and 'Self-Concept Scale' (Liu and Wang, 2005), were distributed among 174 (45 males and 129 females) Iranian EFL university students. Correlation and regression analyses were conducted to examine the relationship between the two variables. The results indicated that there was a significantly positive correlation between EFL students' self-concept and their WTC in EFL classes (p < .0.05). Moreover, regression analyses indicated that self-concept has a significantly positive influence on students’ WTC in language classes (B= .302, p < .0.05) and explains .302 percent of the variance in the dependent variable (WTC). The results are discussed with regards to the individual differences in educational contexts, and implications are offered.

Keywords: EFL students, language classes, willingness to communicate, self-concept

Procedia PDF Downloads 106

17649 The Influence of Interest, Beliefs, and Identity with Mathematics on Achievement

Authors: Asma Alzahrani, Elizabeth Stojanovski

Abstract:

This study investigated factors that influence mathematics achievement based on a sample of ninth-grade students (N = 21,444) from the High School Longitudinal Study of 2009 (HSLS09). Key aspects studied included efficacy in mathematics, interest and enjoyment of mathematics, identity with mathematics and future utility beliefs and how these influence mathematics achievement. The predictability of mathematics achievement based on these factors was assessed using correlation coefficients and multiple linear regression. Spearman rank correlations and multiple regression analyses indicated positive and statistically significant relationships between the explanatory variables: mathematics efficacy, identity with mathematics, interest in and future utility beliefs with the response variable, achievement in mathematics.

Keywords: Mathematics achievement, math efficacy, mathematics interest, factors influence

Procedia PDF Downloads 130

17648 Metacognitive Processing in Early Readers: The Role of Metacognition in Monitoring Linguistic and Non-Linguistic Performance and Regulating Students' Learning

Authors: Ioanna Taouki, Marie Lallier, David Soto

Abstract:

Metacognition refers to the capacity to reflect upon our own cognitive processes. Although there is an ongoing discussion in the literature on the role of metacognition in learning and academic achievement, little is known about its neurodevelopmental trajectories in early childhood, when children begin to receive formal education in reading. Here, we evaluate the metacognitive ability, estimated under a recently developed Signal Detection Theory model, of a cohort of children aged between 6 and 7 (N=60), who performed three two-alternative-forced-choice tasks (two linguistic: lexical decision task, visual attention span task, and one non-linguistic: emotion recognition task) including trial-by-trial confidence judgements. Our study has three aims. First, we investigated how metacognitive ability (i.e., how confidence ratings track accuracy in the task) relates to performance in general standardized tasks related to students' reading and general cognitive abilities using Spearman's and Bayesian correlation analysis. Second, we assessed whether or not young children recruit common mechanisms supporting metacognition across the different task domains or whether there is evidence for domain-specific metacognition at this early stage of development. This was done by examining correlations in metacognitive measures across different task domains and evaluating cross-task covariance by applying a hierarchical Bayesian model. Third, using robust linear regression and Bayesian regression models, we assessed whether metacognitive ability in this early stage is related to the longitudinal learning of children in a linguistic and a non-linguistic task. Notably, we did not observe any association between students’ reading skills and metacognitive processing in this early stage of reading acquisition. Some evidence consistent with domain-general metacognition was found, with significant positive correlations between metacognitive efficiency between lexical and emotion recognition tasks and substantial covariance indicated by the Bayesian model. However, no reliable correlations were found between metacognitive performance in the visual attention span and the remaining tasks. Remarkably, metacognitive ability significantly predicted children's learning in linguistic and non-linguistic domains a year later. These results suggest that metacognitive skill may be dissociated to some extent from general (i.e., language and attention) abilities and further stress the importance of creating educational programs that foster students’ metacognitive ability as a tool for long term learning. More research is crucial to understand whether these programs can enhance metacognitive ability as a transferable skill across distinct domains or whether unique domains should be targeted separately.

Keywords: confidence ratings, development, metacognitive efficiency, reading acquisition

Procedia PDF Downloads 133

17647 Performance of the Cmip5 Models in Simulation of the Present and Future Precipitation over the Lake Victoria Basin

Authors: M. A. Wanzala, L. A. Ogallo, F. J. Opijah, J. N. Mutemi

Abstract:

The usefulness and limitations in climate information are due to uncertainty inherent in the climate system. For any given region to have sustainable development it is important to apply climate information into its socio-economic strategic plans. The overall objective of the study was to assess the performance of the Coupled Model Inter-comparison Project (CMIP5) over the Lake Victoria Basin. The datasets used included the observed point station data, gridded rainfall data from Climate Research Unit (CRU) and hindcast data from eight CMIP5. The methodology included trend analysis, spatial analysis, correlation analysis, Principal Component Analysis (PCA) regression analysis, and categorical statistical skill score. Analysis of the trends in the observed rainfall records indicated an increase in rainfall variability both in space and time for all the seasons. The spatial patterns of the individual models output from the models of MPI, MIROC, EC-EARTH and CNRM were closest to the observed rainfall patterns.

Keywords: categorical statistics, coupled model inter-comparison project, principal component analysis, statistical downscaling

Procedia PDF Downloads 352

17646 The Social Change Leadership Model for Administrators and Teachers Development in Northeast Thailand

Authors: D. Thawinkarn, S. Wongbutlee

Abstract:

The Social Change Leadership model is strongly aligned with administration’s mission. This research aims to examine the elements of social change leadership, build and develop leadership for social change, and evaluate effectiveness of leadership development model for social change. The research operation has 3 phases: model studies by in-depth interviews and survey research; drafting and creating model which verified by the experts; and trial of model in schools. The results showed that administrators and teachers have the elements of leadership for social change in moderate level. These elements are ranged descending from consciousness of self, common purpose, congruence, collaboration, commitment, citizenship, and controversy with civility. Model of leadership for social change is included the principles, objectives, content, process. Workshop process: Results show that the model of leadership development for social change in administrators and teachers leads to higher score in leadership evaluation prior to administering the operation.

Keywords: leadership, social change model, organization, administrators

Procedia PDF Downloads 400