Search results for: classification and regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5051

Search results for: classification and regression

4571 Dual-Channel Reliable Breast Ultrasound Image Classification Based on Explainable Attribution and Uncertainty Quantification

Authors: Haonan Hu, Shuge Lei, Dasheng Sun, Huabin Zhang, Kehong Yuan, Jian Dai, Jijun Tang

Abstract:

This paper focuses on the classification task of breast ultrasound images and conducts research on the reliability measurement of classification results. A dual-channel evaluation framework was developed based on the proposed inference reliability and predictive reliability scores. For the inference reliability evaluation, human-aligned and doctor-agreed inference rationals based on the improved feature attribution algorithm SP-RISA are gracefully applied. Uncertainty quantification is used to evaluate the predictive reliability via the test time enhancement. The effectiveness of this reliability evaluation framework has been verified on the breast ultrasound clinical dataset YBUS, and its robustness is verified on the public dataset BUSI. The expected calibration errors on both datasets are significantly lower than traditional evaluation methods, which proves the effectiveness of the proposed reliability measurement.

Keywords: medical imaging, ultrasound imaging, XAI, uncertainty measurement, trustworthy AI

Procedia PDF Downloads 67
4570 Development of Generalized Correlation for Liquid Thermal Conductivity of N-Alkane and Olefin

Authors: A. Ishag Mohamed, A. A. Rabah

Abstract:

The objective of this research is to develop a generalized correlation for the prediction of thermal conductivity of n-Alkanes and Alkenes. There is a minority of research and lack of correlation for thermal conductivity of liquids in the open literature. The available experimental data are collected covering the groups of n-Alkanes and Alkenes.The data were assumed to correlate to temperature using Filippov correlation. Nonparametric regression of Grace Algorithm was used to develop the generalized correlation model. A spread sheet program based on Microsoft Excel was used to plot and calculate the value of the coefficients. The results obtained were compared with the data that found in Perry's Chemical Engineering Hand Book. The experimental data correlated to the temperature ranged "between" 273.15 to 673.15 K, with R2 = 0.99.The developed correlation reproduced experimental data that which were not included in regression with absolute average percent deviation (AAPD) of less than 7 %. Thus the spread sheet was quite accurate which produces reliable data.

Keywords: N-Alkanes, N-Alkenes, nonparametric, regression

Procedia PDF Downloads 636
4569 A Multi-Output Network with U-Net Enhanced Class Activation Map and Robust Classification Performance for Medical Imaging Analysis

Authors: Jaiden Xuan Schraut, Leon Liu, Yiqiao Yin

Abstract:

Computer vision in medical diagnosis has achieved a high level of success in diagnosing diseases with high accuracy. However, conventional classifiers that produce an image to-label result provides insufficient information for medical professionals to judge and raise concerns over the trust and reliability of a model with results that cannot be explained. In order to gain local insight into cancerous regions, separate tasks such as imaging segmentation need to be implemented to aid the doctors in treating patients, which doubles the training time and costs which renders the diagnosis system inefficient and difficult to be accepted by the public. To tackle this issue and drive AI-first medical solutions further, this paper proposes a multi-output network that follows a U-Net architecture for image segmentation output and features an additional convolutional neural networks (CNN) module for auxiliary classification output. Class activation maps are a method of providing insight into a convolutional neural network’s feature maps that leads to its classification but in the case of lung diseases, the region of interest is enhanced by U-net-assisted Class Activation Map (CAM) visualization. Therefore, our proposed model combines image segmentation models and classifiers to crop out only the lung region of a chest X-ray’s class activation map to provide a visualization that improves the explainability and is able to generate classification results simultaneously which builds trust for AI-led diagnosis systems. The proposed U-Net model achieves 97.61% accuracy and a dice coefficient of 0.97 on testing data from the COVID-QU-Ex Dataset which includes both diseased and healthy lungs.

Keywords: multi-output network model, U-net, class activation map, image classification, medical imaging analysis

Procedia PDF Downloads 174
4568 Urbanization in Delhi: A Multiparameter Study

Authors: Ishu Surender, M. Amez Khair, Ishan Singh

Abstract:

Urbanization is a multidimensional phenomenon. It is an indication of the long-term process for the shift of economics to industrial from rural. The significance of urbanization in modernization, socio-economic development, and poverty eradication is relevant in modern times. This paper aims to study the urbanization index model in the capital of India, Delhi using aspects such as demographic aspect, infrastructural development aspect, and economic development aspect. The urbanization index of all the nine districts of Delhi will be determined using multiple parameters such as population density and the availability of health and education facilities. The definition of the urban area varies from city to city and requires periodic classification which makes direct comparisons difficult. The urbanization index calculated in this paper can be employed to measure the urbanization of a district and compare the level of urbanization in different districts.

Keywords: multiparameter, population density, multiple regression, normalized urbanization index

Procedia PDF Downloads 91
4567 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on $k$-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms.

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 142
4566 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Darlington Mapiye, Mpho Mokoatle, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0 %, 80.5 %, 80.5 %, 63.6 %, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 129
4565 Classification of EEG Signals Based on Dynamic Connectivity Analysis

Authors: Zoran Šverko, Saša Vlahinić, Nino Stojković, Ivan Markovinović

Abstract:

In this article, the classification of target letters is performed using data from the EEG P300 Speller paradigm. Neural networks trained with the results of dynamic connectivity analysis between different brain regions are used for classification. Dynamic connectivity analysis is based on the adaptive window size and the imaginary part of the complex Pearson correlation coefficient. Brain dynamics are analysed using the relative intersection of confidence intervals for the imaginary component of the complex Pearson correlation coefficient method (RICI-imCPCC). The RICI-imCPCC method overcomes the shortcomings of currently used dynamical connectivity analysis methods, such as the low reliability and low temporal precision for short connectivity intervals encountered in constant sliding window analysis with wide window size and the high susceptibility to noise encountered in constant sliding window analysis with narrow window size. This method overcomes these shortcomings by dynamically adjusting the window size using the RICI rule. This method extracts information about brain connections for each time sample. Seventy percent of the extracted brain connectivity information is used for training and thirty percent for validation. Classification of the target word is also done and based on the same analysis method. As far as we know, through this research, we have shown for the first time that dynamic connectivity can be used as a parameter for classifying EEG signals.

Keywords: dynamic connectivity analysis, EEG, neural networks, Pearson correlation coefficients

Procedia PDF Downloads 186
4564 Accuracy Analysis of the American Society of Anesthesiologists Classification Using ChatGPT

Authors: Jae Ni Jang, Young Uk Kim

Abstract:

Background: Chat Generative Pre-training Transformer-3 (ChatGPT; San Francisco, California, Open Artificial Intelligence) is an artificial intelligence chatbot based on a large language model designed to generate human-like text. As the usage of ChatGPT is increasing among less knowledgeable patients, medical students, and anesthesia and pain medicine residents or trainees, we aimed to evaluate the accuracy of ChatGPT-3 responses to questions about the American Society of Anesthesiologists (ASA) classification based on patients’ underlying diseases and assess the quality of the generated responses. Methods: A total of 47 questions were submitted to ChatGPT using textual prompts. The questions were designed for ChatGPT-3 to provide answers regarding ASA classification in response to common underlying diseases frequently observed in adult patients. In addition, we created 18 questions regarding the ASA classification for pediatric patients and pregnant women. The accuracy of ChatGPT’s responses was evaluated by cross-referencing with Miller’s Anesthesia, Morgan & Mikhail’s Clinical Anesthesiology, and the American Society of Anesthesiologists’ ASA Physical Status Classification System (2020). Results: Out of the 47 questions pertaining to adults, ChatGPT -3 provided correct answers for only 23, resulting in an accuracy rate of 48.9%. Furthermore, the responses provided by ChatGPT-3 regarding children and pregnant women were mostly inaccurate, as indicated by a 28% accuracy rate (5 out of 18). Conclusions: ChatGPT provided correct responses to questions relevant to the daily clinical routine of anesthesiologists in approximately half of the cases, while the remaining responses contained errors. Therefore, caution is advised when using ChatGPT to retrieve anesthesia-related information. Although ChatGPT may not yet be suitable for clinical settings, we anticipate significant improvements in ChatGPT and other large language models in the near future. Regular assessments of ChatGPT's ASA classification accuracy are essential due to the evolving nature of ChatGPT as an artificial intelligence entity. This is especially important because ChatGPT has a clinically unacceptable rate of error and hallucination, particularly in pediatric patients and pregnant women. The methodology established in this study may be used to continue evaluating ChatGPT.

Keywords: American Society of Anesthesiologists, artificial intelligence, Chat Generative Pre-training Transformer-3, ChatGPT

Procedia PDF Downloads 19
4563 Response Surface Methodology for the Optimization of Paddy Husker by Medium Brown Rice Peeling Machine 6 Rubber Type

Authors: S. Bangphan, P. Bangphan, C. Ketsombun, T. Sammana

Abstract:

Optimization of response surface methodology (RSM) was employed to study the effects of three factor (rubber of clearance, spindle of speed, and rice of moisture) in brown rice peeling machine of the optimal good rice yield (99.67, average of three repeats). The optimized composition derived from RSM regression was analyzed using Regression analysis and Analysis of Variance (ANOVA). At a significant level α=0.05, the values of Regression coefficient, R2 adjust were 96.55% and standard deviation were 1.05056. The independent variables are initial rubber of clearance, spindle of speed and rice of moisture parameters namely. The investigating responses are final rubber clearance, spindle of speed and moisture of rice.

Keywords: brown rice, response surface methodology (RSM), peeling machine, optimization, paddy husker

Procedia PDF Downloads 551
4562 Automatic Identification and Classification of Contaminated Biodegradable Plastics using Machine Learning Algorithms and Hyperspectral Imaging Technology

Authors: Nutcha Taneepanichskul, Helen C. Hailes, Mark Miodownik

Abstract:

Plastic waste has emerged as a critical global environmental challenge, primarily driven by the prevalent use of conventional plastics derived from petrochemical refining and manufacturing processes in modern packaging. While these plastics serve vital functions, their persistence in the environment post-disposal poses significant threats to ecosystems. Addressing this issue necessitates approaches, one of which involves the development of biodegradable plastics designed to degrade under controlled conditions, such as industrial composting facilities. It is imperative to note that compostable plastics are engineered for degradation within specific environments and are not suited for uncontrolled settings, including natural landscapes and aquatic ecosystems. The full benefits of compostable packaging are realized when subjected to industrial composting, preventing environmental contamination and waste stream pollution. Therefore, effective sorting technologies are essential to enhance composting rates for these materials and diminish the risk of contaminating recycling streams. In this study, it leverage hyperspectral imaging technology (HSI) coupled with advanced machine learning algorithms to accurately identify various types of plastics, encompassing conventional variants like Polyethylene terephthalate (PET), Polypropylene (PP), Low density polyethylene (LDPE), High density polyethylene (HDPE) and biodegradable alternatives such as Polybutylene adipate terephthalate (PBAT), Polylactic acid (PLA), and Polyhydroxyalkanoates (PHA). The dataset is partitioned into three subsets: a training dataset comprising uncontaminated conventional and biodegradable plastics, a validation dataset encompassing contaminated plastics of both types, and a testing dataset featuring real-world packaging items in both pristine and contaminated states. Five distinct machine learning algorithms, namely Partial Least Squares Discriminant Analysis (PLS-DA), Support Vector Machine (SVM), Convolutional Neural Network (CNN), Logistic Regression, and Decision Tree Algorithm, were developed and evaluated for their classification performance. Remarkably, the Logistic Regression and CNN model exhibited the most promising outcomes, achieving a perfect accuracy rate of 100% for the training and validation datasets. Notably, the testing dataset yielded an accuracy exceeding 80%. The successful implementation of this sorting technology within recycling and composting facilities holds the potential to significantly elevate recycling and composting rates. As a result, the envisioned circular economy for plastics can be established, thereby offering a viable solution to mitigate plastic pollution.

Keywords: biodegradable plastics, sorting technology, hyperspectral imaging technology, machine learning algorithms

Procedia PDF Downloads 52
4561 The Impact on the Composition of Survey Refusals΄ Demographic Profile When Implementing Different Classifications

Authors: Eva Tsouparopoulou, Maria Symeonaki

Abstract:

The internationally documented declining survey response rates of the last two decades are mainly attributed to refusals. In fieldwork, a refusal may be obtained not only from the respondent himself/herself, but from other sources on the respondent’s behalf, such as other household members, apartment building residents or administrator(s), and neighborhood residents. In this paper, we investigate how the composition of the demographic profile of survey refusals changes when different classifications are implemented and the classification issues arising from that. The analysis is based on the 2002-2018 European Social Survey (ESS) datasets for Belgium, Germany, and United Kingdom. For these three countries, the size of selected sample units coded as a type of refusal for all nine under investigation rounds was large enough to meet the purposes of the analysis. The results indicate the existence of four different possible classifications that can be implemented and the significance of choosing the one that strengthens the contrasts of the different types of respondents' demographic profiles. Since the foundation of social quantitative research lies in the triptych of definition, classification, and measurement, this study aims to identify the multiplicity of the definition of survey refusals as a methodological tool for the continually growing research on non-response.

Keywords: non-response, refusals, European social survey, classification

Procedia PDF Downloads 65
4560 Disease Level Assessment in Wheat Plots Using a Residual Deep Learning Algorithm

Authors: Felipe A. Guth, Shane Ward, Kevin McDonnell

Abstract:

The assessment of disease levels in crop fields is an important and time-consuming task that generally relies on expert knowledge of trained individuals. Image classification in agriculture problems historically has been based on classical machine learning strategies that make use of hand-engineered features in the top of a classification algorithm. This approach tends to not produce results with high accuracy and generalization to the classes classified by the system when the nature of the elements has a significant variability. The advent of deep convolutional neural networks has revolutionized the field of machine learning, especially in computer vision tasks. These networks have great resourcefulness of learning and have been applied successfully to image classification and object detection tasks in the last years. The objective of this work was to propose a new method based on deep learning convolutional neural networks towards the task of disease level monitoring. Common RGB images of winter wheat were obtained during a growing season. Five categories of disease levels presence were produced, in collaboration with agronomists, for the algorithm classification. Disease level tasks performed by experts provided ground truth data for the disease score of the same winter wheat plots were RGB images were acquired. The system had an overall accuracy of 84% on the discrimination of the disease level classes.

Keywords: crop disease assessment, deep learning, precision agriculture, residual neural networks

Procedia PDF Downloads 303
4559 On the Performance of Improvised Generalized M-Estimator in the Presence of High Leverage Collinearity Enhancing Observations

Authors: Habshah Midi, Mohammed A. Mohammed, Sohel Rana

Abstract:

Multicollinearity occurs when two or more independent variables in a multiple linear regression model are highly correlated. The ridge regression is the commonly used method to rectify this problem. However, the ridge regression cannot handle the problem of multicollinearity which is caused by high leverage collinearity enhancing observation (HLCEO). Since high leverage points (HLPs) are responsible for inducing multicollinearity, the effect of HLPs needs to be reduced by using Generalized M estimator. The existing GM6 estimator is based on the Minimum Volume Ellipsoid (MVE) which tends to swamp some low leverage points. Hence an improvised GM (MGM) estimator is presented to improve the precision of the GM6 estimator. Numerical example and simulation study are presented to show how HLPs can cause multicollinearity. The numerical results show that our MGM estimator is the most efficient method compared to some existing methods.

Keywords: identification, high leverage points, multicollinearity, GM-estimator, DRGP, DFFITS

Procedia PDF Downloads 235
4558 Population Dynamics and Land Use/Land Cover Change on the Chilalo-Galama Mountain Range, Ethiopia

Authors: Yusuf Jundi Sado

Abstract:

Changes in land use are mostly credited to human actions that result in negative impacts on biodiversity and ecosystem functions. This study aims to analyze the dynamics of land use and land cover changes for sustainable natural resources planning and management. Chilalo-Galama Mountain Range, Ethiopia. This study used Thematic Mapper 05 (TM) for 1986, 2001 and Landsat 8 (OLI) data 2017. Additionally, data from the Central Statistics Agency on human population growth were analyzed. Semi-Automatic classification plugin (SCP) in QGIS 3.2.3 software was used for image classification. Global positioning system, field observations and focus group discussions were used for ground verification. Land Use Land Cover (LU/LC) change analysis was using maximum likelihood supervised classification and changes were calculated for the 1986–2001 and the 2001–2017 and 1986-2017 periods. The results show that agricultural land increased from 27.85% (1986) to 44.43% and 51.32% in 2001 and 2017, respectively with the overall accuracies of 92% (1986), 90.36% (2001), and 88% (2017). On the other hand, forests decreased from 8.51% (1986) to 7.64 (2001) and 4.46% (2017), and grassland decreased from 37.47% (1986) to 15.22%, and 15.01% in 2001 and 2017, respectively. It indicates for the years 1986–2017 the largest area cover gain of agricultural land was obtained from grassland. The matrix also shows that shrubland gained land from agricultural land, afro-alpine, and forest land. Population dynamics is found to be one of the major driving forces for the LU/LU changes in the study area.

Keywords: Landsat, LU/LC change, Semi-Automatic classification plugin, population dynamics, Ethiopia

Procedia PDF Downloads 60
4557 Clinical Feature Analysis and Prediction on Recurrence in Cervical Cancer

Authors: Ravinder Bahl, Jamini Sharma

Abstract:

The paper demonstrates analysis of the cervical cancer based on a probabilistic model. It involves technique for classification and prediction by recognizing typical and diagnostically most important test features relating to cervical cancer. The main contributions of the research include predicting the probability of recurrences in no recurrence (first time detection) cases. The combination of the conventional statistical and machine learning tools is applied for the analysis. Experimental study with real data demonstrates the feasibility and potential of the proposed approach for the said cause.

Keywords: cervical cancer, recurrence, no recurrence, probabilistic, classification, prediction, machine learning

Procedia PDF Downloads 342
4556 A Machine Learning Approach for Classification of Directional Valve Leakage in the Hydraulic Final Test

Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter

Abstract:

Due to increasing cost pressure in global markets, artificial intelligence is becoming a technology that is decisive for competition. Predictive quality enables machinery and plant manufacturers to ensure product quality by using data-driven forecasts via machine learning models as a decision-making basis for test results. The use of cross-process Bosch production data along the value chain of hydraulic valves is a promising approach to classifying the quality characteristics of workpieces.

Keywords: predictive quality, hydraulics, machine learning, classification, supervised learning

Procedia PDF Downloads 209
4555 Time-Frequency Feature Extraction Method Based on Micro-Doppler Signature of Ground Moving Targets

Authors: Ke Ren, Huiruo Shi, Linsen Li, Baoshuai Wang, Yu Zhou

Abstract:

Since some discriminative features are required for ground moving targets classification, we propose a new feature extraction method based on micro-Doppler signature. Firstly, the time-frequency analysis of measured data indicates that the time-frequency spectrograms of the three kinds of ground moving targets, i.e., single walking person, two people walking and a moving wheeled vehicle, are discriminative. Then, a three-dimensional time-frequency feature vector is extracted from the time-frequency spectrograms to depict these differences. At last, a Support Vector Machine (SVM) classifier is trained with the proposed three-dimensional feature vector. The classification accuracy to categorize ground moving targets into the three kinds of the measured data is found to be over 96%, which demonstrates the good discriminative ability of the proposed micro-Doppler feature.

Keywords: micro-doppler, time-frequency analysis, feature extraction, radar target classification

Procedia PDF Downloads 384
4554 Neural Network Modelling for Turkey Railway Load Carrying Demand

Authors: Humeyra Bolakar Tosun

Abstract:

The transport sector has an undisputed place in human life. People need transport access to continuous increase day by day with growing population. The number of rail network, urban transport planning, infrastructure improvements, transportation management and other related areas is a key factor affecting our country made it quite necessary to improve the work of transportation. In this context, it plays an important role in domestic rail freight demand planning. Alternatives that the increase in the transportation field and has made it mandatory requirements such as the demand for improving transport quality. In this study generally is known and used in studies by the definition, rail freight transport, railway line length, population, energy consumption. In this study, Iron Road Load Net Demand was modeled by multiple regression and ANN methods. In this study, model dependent variable (Output) is Iron Road Load Net demand and 6 entries variable was determined. These outcome values extracted from the model using ANN and regression model results. In the regression model, some parameters are considered as determinative parameters, and the coefficients of the determinants give meaningful results. As a result, ANN model has been shown to be more successful than traditional regression model.

Keywords: railway load carrying, neural network, modelling transport, transportation

Procedia PDF Downloads 128
4553 Clustering the Wheat Seeds Using SOM Artificial Neural Networks

Authors: Salah Ghamari

Abstract:

In this study, the ability of self organizing map artificial (SOM) neural networks in clustering the wheat seeds varieties according to morphological properties of them was considered. The SOM is one type of unsupervised competitive learning. Experimentally, five morphological features of 300 seeds (including three varieties: gaskozhen, Md and sardari) were obtained using image processing technique. The results show that the artificial neural network has a good performance (90.33% accuracy) in classification of the wheat varieties despite of high similarity in them. The highest classification accuracy (100%) was achieved for sardari.

Keywords: artificial neural networks, clustering, self organizing map, wheat variety

Procedia PDF Downloads 621
4552 SEM Image Classification Using CNN Architectures

Authors: Güzi̇n Ti̇rkeş, Özge Teki̇n, Kerem Kurtuluş, Y. Yekta Yurtseven, Murat Baran

Abstract:

A scanning electron microscope (SEM) is a type of electron microscope mainly used in nanoscience and nanotechnology areas. Automatic image recognition and classification are among the general areas of application concerning SEM. In line with these usages, the present paper proposes a deep learning algorithm that classifies SEM images into nine categories by means of an online application to simplify the process. The NFFA-EUROPE - 100% SEM data set, containing approximately 21,000 images, was used to train and test the algorithm at 80% and 20%, respectively. Validation was carried out using a separate data set obtained from the Middle East Technical University (METU) in Turkey. To increase the accuracy in the results, the Inception ResNet-V2 model was used in view of the Fine-Tuning approach. By using a confusion matrix, it was observed that the coated-surface category has a negative effect on the accuracy of the results since it contains other categories in the data set, thereby confusing the model when detecting category-specific patterns. For this reason, the coated-surface category was removed from the train data set, hence increasing accuracy by up to 96.5%.

Keywords: convolutional neural networks, deep learning, image classification, scanning electron microscope

Procedia PDF Downloads 95
4551 In Silico Modeling of Drugs Milk/Plasma Ratio in Human Breast Milk Using Structures Descriptors

Authors: Navid Kaboudi, Ali Shayanfar

Abstract:

Introduction: Feeding infants with safe milk from the beginning of their life is an important issue. Drugs which are used by mothers can affect the composition of milk in a way that is not only unsuitable, but also toxic for infants. Consuming permeable drugs during that sensitive period by mother could lead to serious side effects to the infant. Due to the ethical restrictions of drug testing on humans, especially women, during their lactation period, computational approaches based on structural parameters could be useful. The aim of this study is to develop mechanistic models to predict the M/P ratio of drugs during breastfeeding period based on their structural descriptors. Methods: Two hundred and nine different chemicals with their M/P ratio were used in this study. All drugs were categorized into two groups based on their M/P value as Malone classification: 1: Drugs with M/P>1, which are considered as high risk 2: Drugs with M/P>1, which are considered as low risk Thirty eight chemical descriptors were calculated by ACD/labs 6.00 and Data warrior software in order to assess the penetration during breastfeeding period. Later on, four specific models based on the number of hydrogen bond acceptors, polar surface area, total surface area, and number of acidic oxygen were established for the prediction. The mentioned descriptors can predict the penetration with an acceptable accuracy. For the remaining compounds (N= 147, 158, 160, and 174 for models 1 to 4, respectively) of each model binary regression with SPSS 21 was done in order to give us a model to predict the penetration ratio of compounds. Only structural descriptors with p-value<0.1 remained in the final model. Results and discussion: Four different models based on the number of hydrogen bond acceptors, polar surface area, and total surface area were obtained in order to predict the penetration of drugs into human milk during breastfeeding period About 3-4% of milk consists of lipids, and the amount of lipid after parturition increases. Lipid soluble drugs diffuse alongside with fats from plasma to mammary glands. lipophilicity plays a vital role in predicting the penetration class of drugs during lactation period. It was shown in the logistic regression models that compounds with number of hydrogen bond acceptors, PSA and TSA above 5, 90 and 25 respectively, are less permeable to milk because they are less soluble in the amount of fats in milk. The pH of milk is acidic and due to that, basic compounds tend to be concentrated in milk than plasma while acidic compounds may consist lower concentrations in milk than plasma. Conclusion: In this study, we developed four regression-based models to predict the penetration class of drugs during the lactation period. The obtained models can lead to a higher speed in drug development process, saving energy, and costs. Milk/plasma ratio assessment of drugs requires multiple steps of animal testing, which has its own ethical issues. QSAR modeling could help scientist to reduce the amount of animal testing, and our models are also eligible to do that.

Keywords: logistic regression, breastfeeding, descriptors, penetration

Procedia PDF Downloads 46
4550 Mixed Integer Programming-Based One-Class Classification Method for Process Monitoring

Authors: Younghoon Kim, Seoung Bum Kim

Abstract:

One-class classification plays an important role in detecting outlier and abnormality from normal observations. In the previous research, several attempts were made to extend the scope of application of the one-class classification techniques to statistical process control problems. For most previous approaches, such as support vector data description (SVDD) control chart, the design of the control limits is commonly based on the assumption that the proportion of abnormal observations is approximately equal to an expected Type I error rate in Phase I process. Because of the limitation of the one-class classification techniques based on convex optimization, we cannot make the proportion of abnormal observations exactly equal to expected Type I error rate: controlling Type I error rate requires to optimize constraints with integer decision variables, but convex optimization cannot satisfy the requirement. This limitation would be undesirable in theoretical and practical perspective to construct effective control charts. In this work, to address the limitation of previous approaches, we propose the one-class classification algorithm based on the mixed integer programming technique, which can solve problems formulated with continuous and integer decision variables. The proposed method minimizes the radius of a spherically shaped boundary subject to the number of normal data to be equal to a constant value specified by users. By modifying this constant value, users can exactly control the proportion of normal data described by the spherically shaped boundary. Thus, the proportion of abnormal observations can be made theoretically equal to an expected Type I error rate in Phase I process. Moreover, analogous to SVDD, the boundary can be made to describe complex structures by using some kernel functions. New multivariate control chart applying the effectiveness of the algorithm is proposed. This chart uses a monitoring statistic to characterize the degree of being an abnormal point as obtained through the proposed one-class classification. The control limit of the proposed chart is established by the radius of the boundary. The usefulness of the proposed method was demonstrated through experiments with simulated and real process data from a thin film transistor-liquid crystal display.

Keywords: control chart, mixed integer programming, one-class classification, support vector data description

Procedia PDF Downloads 157
4549 Using the Bootstrap for Problems Statistics

Authors: Brahim Boukabcha, Amar Rebbouh

Abstract:

The bootstrap method based on the idea of exploiting all the information provided by the initial sample, allows us to study the properties of estimators. In this article we will present a theoretical study on the different methods of bootstrapping and using the technique of re-sampling in statistics inference to calculate the standard error of means of an estimator and determining a confidence interval for an estimated parameter. We apply these methods tested in the regression models and Pareto model, giving the best approximations.

Keywords: bootstrap, error standard, bias, jackknife, mean, median, variance, confidence interval, regression models

Procedia PDF Downloads 364
4548 Enhancing Predictive Accuracy in Pharmaceutical Sales through an Ensemble Kernel Gaussian Process Regression Approach

Authors: Shahin Mirshekari, Mohammadreza Moradi, Hossein Jafari, Mehdi Jafari, Mohammad Ensaf

Abstract:

This research employs Gaussian Process Regression (GPR) with an ensemble kernel, integrating Exponential Squared, Revised Matern, and Rational Quadratic kernels to analyze pharmaceutical sales data. Bayesian optimization was used to identify optimal kernel weights: 0.76 for Exponential Squared, 0.21 for Revised Matern, and 0.13 for Rational Quadratic. The ensemble kernel demonstrated superior performance in predictive accuracy, achieving an R² score near 1.0, and significantly lower values in MSE, MAE, and RMSE. These findings highlight the efficacy of ensemble kernels in GPR for predictive analytics in complex pharmaceutical sales datasets.

Keywords: Gaussian process regression, ensemble kernels, bayesian optimization, pharmaceutical sales analysis, time series forecasting, data analysis

Procedia PDF Downloads 51
4547 A Meta Regression Analysis to Detect Price Premium Threshold for Eco-Labeled Seafood

Authors: Cristina Giosuè, Federica Biondo, Sergio Vitale

Abstract:

In the last years, the consumers' awareness for environmental concerns has been increasing, and seafood eco-labels are considered as a possible instrument to improve both seafood markets and sustainable fishing management. In this direction, the aim of this study was to carry out a meta-analysis on consumers’ willingness to pay (WTP) for eco-labeled wild seafood, by a meta-regression. Therefore, only papers published on ISI journals were searched on “Web of Knowledge” and “SciVerse Scopus” platforms, using the combinations of the following key words: seafood, ecolabel, eco-label, willingness, WTP and premium. The dataset was built considering: paper’s and survey’s codes, year of publication, first author’s nationality, species’ taxa and family, sample size, survey’s continent and country, data collection (where and how), gender and age of consumers, brand and ΔWTP. From analysis the interest on eco labeled seafood emerged clearly, in particular in developed countries. In general, consumers declared greater willingness to pay than that actually applied for eco-label products, with difference related to taxa and brand.

Keywords: eco label, meta regression, seafood, willingness to pay

Procedia PDF Downloads 101
4546 A Proposed Optimized and Efficient Intrusion Detection System for Wireless Sensor Network

Authors: Abdulaziz Alsadhan, Naveed Khan

Abstract:

In recent years intrusions on computer network are the major security threat. Hence, it is important to impede such intrusions. The hindrance of such intrusions entirely relies on its detection, which is primary concern of any security tool like Intrusion Detection System (IDS). Therefore, it is imperative to accurately detect network attack. Numerous intrusion detection techniques are available but the main issue is their performance. The performance of IDS can be improved by increasing the accurate detection rate and reducing false positive. The existing intrusion detection techniques have the limitation of usage of raw data set for classification. The classifier may get jumble due to redundancy, which results incorrect classification. To minimize this problem, Principle Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Local Binary Pattern (LBP) can be applied to transform raw features into principle features space and select the features based on their sensitivity. Eigen values can be used to determine the sensitivity. To further classify, the selected features greedy search, back elimination, and Particle Swarm Optimization (PSO) can be used to obtain a subset of features with optimal sensitivity and highest discriminatory power. These optimal feature subset used to perform classification. For classification purpose, Support Vector Machine (SVM) and Multilayer Perceptron (MLP) used due to its proven ability in classification. The Knowledge Discovery and Data mining (KDD’99) cup dataset was considered as a benchmark for evaluating security detection mechanisms. The proposed approach can provide an optimal intrusion detection mechanism that outperforms the existing approaches and has the capability to minimize the number of features and maximize the detection rates.

Keywords: Particle Swarm Optimization (PSO), Principle Component Analysis (PCA), Linear Discriminant Analysis (LDA), Local Binary Pattern (LBP), Support Vector Machine (SVM), Multilayer Perceptron (MLP)

Procedia PDF Downloads 345
4545 Classification of Germinatable Mung Bean by Near Infrared Hyperspectral Imaging

Authors: Kaewkarn Phuangsombat, Arthit Phuangsombat, Anupun Terdwongworakul

Abstract:

Hard seeds will not grow and can cause mold in sprouting process. Thus, the hard seeds need to be separated from the normal seeds. Near infrared hyperspectral imaging in a range of 900 to 1700 nm was implemented to develop a model by partial least squares discriminant analysis to discriminate the hard seeds from the normal seeds. The orientation of the seeds was also studied to compare the performance of the models. The model based on hilum-up orientation achieved the best result giving the coefficient of determination of 0.98, and root mean square error of prediction of 0.07 with classification accuracy was equal to 100%.

Keywords: mung bean, near infrared, germinatability, hard seed

Procedia PDF Downloads 279
4544 Performance Study of Classification Algorithms for Consumer Online Shopping Attitudes and Behavior Using Data Mining

Authors: Rana Alaa El-Deen Ahmed, M. Elemam Shehab, Shereen Morsy, Nermeen Mekawie

Abstract:

With the growing popularity and acceptance of e-commerce platforms, users face an ever increasing burden in actually choosing the right product from the large number of online offers. Thus, techniques for personalization and shopping guides are needed by users. For a pleasant and successful shopping experience, users need to know easily which products to buy with high confidence. Since selling a wide variety of products has become easier due to the popularity of online stores, online retailers are able to sell more products than a physical store. The disadvantage is that the customers might not find products they need. In this research the customer will be able to find the products he is searching for, because recommender systems are used in some ecommerce web sites. Recommender system learns from the information about customers and products and provides appropriate personalized recommendations to customers to find the needed product. In this paper eleven classification algorithms are comparatively tested to find the best classifier fit for consumer online shopping attitudes and behavior in the experimented dataset. The WEKA knowledge analysis tool, which is an open source data mining workbench software used in comparing conventional classifiers to get the best classifier was used in this research. In this research by using the data mining tool (WEKA) with the experimented classifiers the results show that decision table and filtered classifier gives the highest accuracy and the lowest accuracy classification via clustering and simple cart.

Keywords: classification, data mining, machine learning, online shopping, WEKA

Procedia PDF Downloads 333
4543 Comparative Study Using WEKA for Red Blood Cells Classification

Authors: Jameela Ali, Hamid A. Jalab, Loay E. George, Abdul Rahim Ahmad, Azizah Suliman, Karim Al-Jashamy

Abstract:

Red blood cells (RBC) are the most common types of blood cells and are the most intensively studied in cell biology. The lack of RBCs is a condition in which the amount of hemoglobin level is lower than normal and is referred to as “anemia”. Abnormalities in RBCs will affect the exchange of oxygen. This paper presents a comparative study for various techniques for classifying the RBCs as normal, or abnormal (anemic) using WEKA. WEKA is an open source consists of different machine learning algorithms for data mining applications. The algorithm tested are Radial Basis Function neural network, Support vector machine, and K-Nearest Neighbors algorithm. Two sets of combined features were utilized for classification of blood cells images. The first set, exclusively consist of geometrical features, was used to identify whether the tested blood cell has a spherical shape or non-spherical cells. While the second set, consist mainly of textural features was used to recognize the types of the spherical cells. We have provided an evaluation based on applying these classification methods to our RBCs image dataset which were obtained from Serdang Hospital-alaysia, and measuring the accuracy of test results. The best achieved classification rates are 97%, 98%, and 79% for Support vector machines, Radial Basis Function neural network, and K-Nearest Neighbors algorithm respectively.

Keywords: K-nearest neighbors algorithm, radial basis function neural network, red blood cells, support vector machine

Procedia PDF Downloads 389
4542 Application of Fuzzy Clustering on Classification Agile Supply Chain

Authors: Hamidreza Fallah Lajimi , Elham Karami, Fatemeh Ali nasab, Mostafa Mahdavikia

Abstract:

Being responsive is an increasingly important skill for firms in today’s global economy; thus firms must be agile. Naturally, it follows that an organization’s agility depends on its supply chain being agile. However, achieving supply chain agility is a function of other abilities within the organization. This paper analyses results from a survey of 71 Iran manufacturing companies in order to identify some of the factors for agile organizations in managing their supply chains. Then we classification this company in four cluster with fuzzy c-mean technique and with four validations functional determine automatically the optimal number of clusters.

Keywords: agile supply chain, clustering, fuzzy clustering

Procedia PDF Downloads 444