Search results for: microarray datasets
106 A Hybrid Approach for Quantification of Novelty in Rule Discovery
Authors: Vasudha Bhatnagar, Ahmed Sultan Al-Hegami, Naveen Kumar
Abstract:
Rule Discovery is an important technique for mining knowledge from large databases. Use of objective measures for discovering interesting rules lead to another data mining problem, although of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules to ultimately improve the overall efficiency of KDD process. In this paper we study novelty of the discovered rules as a subjective measure of interestingness. We propose a hybrid approach that uses objective and subjective measures to quantify novelty of the discovered rules in terms of their deviations from the known rules. We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to the user specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are quite promising.
Keywords: Knowledge Discovery in Databases (KDD), Data Mining, Rule Discovery, Interestingness, Subjective Measures, Novelty Measure.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1354105 Medical Image Segmentation Using Deformable Model and Local Fitting Binary: Thoracic Aorta
Authors: B. Bagheri Nakhjavanlo, T. S. Ellis, P.Raoofi, Sh.ziari
Abstract:
This paper presents an application of level sets for the segmentation of abdominal and thoracic aortic aneurysms in CTA datasets. An important challenge in reliably detecting aortic is the need to overcome problems associated with intensity inhomogeneities. Level sets are part of an important class of methods that utilize partial differential equations (PDEs) and have been extensively applied in image segmentation. A kernel function in the level set formulation aids the suppression of noise in the extracted regions of interest and then guides the motion of the evolving contour for the detection of weak boundaries. The speed of curve evolution has been significantly improved with a resulting decrease in segmentation time compared with previous implementations of level sets, and are shown to be more effective than other approaches in coping with intensity inhomogeneities. We have applied the Courant Friedrichs Levy (CFL) condition as stability criterion for our algorithm.Keywords: Image segmentation, Level-sets, Local fitting binary, Thoracic aorta.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1456104 3D Oil Reservoir Visualisation Using Octree Compression Techniques Utilising Logical Grid Co-Ordinates
Authors: S. Mulholland
Abstract:
Octree compression techniques have been used for several years for compressing large three dimensional data sets into homogeneous regions. This compression technique is ideally suited to datasets which have similar values in clusters. Oil engineers represent reservoirs as a three dimensional grid where hydrocarbons occur naturally in clusters. This research looks at the efficiency of storing these grids using octree compression techniques where grid cells are broken into active and inactive regions. Initial experiments yielded high compression ratios as only active leaf nodes and their ancestor, header nodes are stored as a bitstream to file on disk. Savings in computational time and memory were possible at decompression, as only active leaf nodes are sent to the graphics card eliminating the need of reconstructing the original matrix. This results in a more compact vertex table, which can be loaded into the graphics card quicker and generating shorter refresh delay times.
Keywords: 3D visualisation, compressed vertex tables, octree compression techniques, oil reservoir grids.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1737103 GeNS: a Biological Data Integration Platform
Authors: Joel Arrais, João E. Pereira, João Fernandes, José Luís Oliveira
Abstract:
The scientific achievements coming from molecular biology depend greatly on the capability of computational applications to analyze the laboratorial results. A comprehensive analysis of an experiment requires typically the simultaneous study of the obtained dataset with data that is available in several distinct public databases. Nevertheless, developing a centralized access to these distributed databases rises up a set of challenges such as: what is the best integration strategy, how to solve nomenclature clashes, how to solve database overlapping data and how to deal with huge datasets. In this paper we present GeNS, a system that uses a simple and yet innovative approach to address several biological data integration issues. Compared with existing systems, the main advantages of GeNS are related to its maintenance simplicity and to its coverage and scalability, in terms of number of supported databases and data types. To support our claims we present the current use of GeNS in two concrete applications. GeNS currently contains more than 140 million of biological relations and it can be publicly downloaded or remotely access through SOAP web services.Keywords: Data integration, biological databases
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1632102 A Large Dataset Imputation Approach Applied to Country Conflict Prediction Data
Authors: Benjamin D. Leiby, Darryl K. Ahner
Abstract:
This study demonstrates an alternative stochastic imputation approach for large datasets when preferred commercial packages struggle to iterate due to numerical problems. A large country conflict dataset motivates the search to impute missing values well over a common threshold of 20% missingness. The methodology capitalizes on correlation while using model residuals to provide the uncertainty in estimating unknown values. Examination of the methodology provides insight toward choosing linear or nonlinear modeling terms. Static tolerances common in most packages are replaced with tailorable tolerances that exploit residuals to fit each data element. The methodology evaluation includes observing computation time, model fit, and the comparison of known values to replaced values created through imputation. Overall, the country conflict dataset illustrates promise with modeling first-order interactions, while presenting a need for further refinement that mimics predictive mean matching.
Keywords: Correlation, country conflict, imputation, stochastic regression.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 418101 Image Retrieval Based on Multi-Feature Fusion for Heterogeneous Image Databases
Authors: N. W. U. D. Chathurani, Shlomo Geva, Vinod Chandran, Proboda Rajapaksha
Abstract:
Selecting an appropriate image representation is the most important factor in implementing an effective Content-Based Image Retrieval (CBIR) system. This paper presents a multi-feature fusion approach for efficient CBIR, based on the distance distribution of features and relative feature weights at the time of query processing. It is a simple yet effective approach, which is free from the effect of features' dimensions, ranges, internal feature normalization and the distance measure. This approach can easily be adopted in any feature combination to improve retrieval quality. The proposed approach is empirically evaluated using two benchmark datasets for image classification (a subset of the Corel dataset and Oliva and Torralba) and compared with existing approaches. The performance of the proposed approach is confirmed with the significantly improved performance in comparison with the independently evaluated baseline of the previously proposed feature fusion approaches.
Keywords: Feature fusion, image retrieval, membership function, normalization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1345100 Medical Image Segmentation Using Deformable Models and Local Fitting Binary
Authors: B. Bagheri Nakhjavanlo, T. J. Ellis, P. Raoofi, J. Dehmeshki
Abstract:
This paper presents a customized deformable model for the segmentation of abdominal and thoracic aortic aneurysms in CTA datasets. An important challenge in reliably detecting aortic aneurysm is the need to overcome problems associated with intensity inhomogeneities and image noise. Level sets are part of an important class of methods that utilize partial differential equations (PDEs) and have been extensively applied in image segmentation. A Gaussian kernel function in the level set formulation, which extracts the local intensity information, aids the suppression of noise in the extracted regions of interest and then guides the motion of the evolving contour for the detection of weak boundaries. The speed of curve evolution has been significantly improved with a resulting decrease in segmentation time compared with previous implementations of level sets. The results indicate the method is more effective than other approaches in coping with intensity inhomogeneities.Keywords: Abdominal and thoracic aortic aneurysms, intensityinhomogeneity, level sets, local fitting binary.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 181699 PmSPARQL: Extended SPARQL for Multi-paradigm Path Extraction
Authors: Thabet Slimani, Boutheina Ben Yaghlane, Khaled Mellouli
Abstract:
In the last few years, the Semantic Web gained scientific acceptance as a means of relationships identification in knowledge base, widely known by semantic association. Query about complex relationships between entities is a strong requirement for many applications in analytical domains. In bioinformatics for example, it is critical to extract exchanges between proteins. Currently, the widely known result of such queries is to provide paths between connected entities from data graph. However, they do not always give good results while facing the user need by the best association or a set of limited best association, because they only consider all existing paths but ignore the path evaluation. In this paper, we present an approach for supporting association discovery queries. Our proposal includes (i) a query language PmSPRQL which provides a multiparadigm query expressions for association extraction and (ii) some quantification measures making easy the process of association ranking. The originality of our proposal is demonstrated by a performance evaluation of our approach on real world datasets.
Keywords: Association extraction, query Language, relationships, knowledge base, multi-paradigm query.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 144698 Detecting Remote Protein Evolutionary Relationships via String Scoring Method
Authors: Nazar Zaki, Safaai Deris
Abstract:
The amount of the information being churned out by the field of biology has jumped manifold and now requires the extensive use of computer techniques for the management of this information. The predominance of biological information such as protein sequence similarity in the biological information sea is key information for detecting protein evolutionary relationship. Protein sequence similarity typically implies homology, which in turn may imply structural and functional similarities. In this work, we propose, a learning method for detecting remote protein homology. The proposed method uses a transformation that converts protein sequence into fixed-dimensional representative feature vectors. Each feature vector records the sensitivity of a protein sequence to a set of amino acids substrings generated from the protein sequences of interest. These features are then used in conjunction with support vector machines for the detection of the protein remote homology. The proposed method is tested and evaluated on two different benchmark protein datasets and it-s able to deliver improvements over most of the existing homology detection methods.
Keywords: Protein homology detection; support vectormachine; string kernel.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 139297 User Intention Generation with Large Language Models Using Chain-of-Thought Prompting
Authors: Gangmin Li, Fan Yang
Abstract:
Personalized recommendation is crucial for any recommendation system. One of the techniques for personalized recommendation is to identify the intention. Traditional user intention identification uses the user’s selection when facing multiple items. This modeling relies primarily on historical behavior data resulting in challenges such as the cold start, unintended choice, and failure to capture intention when items are new. Motivated by recent advancements in Large Language Models (LLMs) like ChatGPT, we present an approach for user intention identification by embracing LLMs with Chain-of-Thought (CoT) prompting. We use the initial user profile as input to LLMs and design a collection of prompts to align the LLM's response through various recommendation tasks encompassing rating prediction, search and browse history, user clarification, etc. Our tests on real-world datasets demonstrate the improvements in recommendation by explicit user intention identification and, with that intention, merged into a user model.
Keywords: Personalized recommendation, generative user modeling, user intention identification, large language models, chain-of-thought prompting.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8796 Human Action Recognition Based on Ridgelet Transform and SVM
Authors: A. Ouanane, A. Serir
Abstract:
In this paper, a novel algorithm based on Ridgelet Transform and support vector machine is proposed for human action recognition. The Ridgelet transform is a directional multi-resolution transform and it is more suitable for describing the human action by performing its directional information to form spatial features vectors. The dynamic transition between the spatial features is carried out using both the Principal Component Analysis and clustering algorithm K-means. First, the Principal Component Analysis is used to reduce the dimensionality of the obtained vectors. Then, the kmeans algorithm is then used to perform the obtained vectors to form the spatio-temporal pattern, called set-of-labels, according to given periodicity of human action. Finally, a Support Machine classifier is used to discriminate between the different human actions. Different tests are conducted on popular Datasets, such as Weizmann and KTH. The obtained results show that the proposed method provides more significant accuracy rate and it drives more robustness in very challenging situations such as lighting changes, scaling and dynamic environmentKeywords: Human action, Ridgelet Transform, PCA, K-means, SVM.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 207095 A Comparative Study of Malware Detection Techniques Using Machine Learning Methods
Authors: Cristina Vatamanu, Doina Cosovan, Dragoş Gavriluţ, Henri Luchian
Abstract:
In the past few years, the amount of malicious software increased exponentially and, therefore, machine learning algorithms became instrumental in identifying clean and malware files through (semi)-automated classification. When working with very large datasets, the major challenge is to reach both a very high malware detection rate and a very low false positive rate. Another challenge is to minimize the time needed for the machine learning algorithm to do so. This paper presents a comparative study between different machine learning techniques such as linear classifiers, ensembles, decision trees or various hybrids thereof. The training dataset consists of approximately 2 million clean files and 200.000 infected files, which is a realistic quantitative mixture. The paper investigates the above mentioned methods with respect to both their performance (detection rate and false positive rate) and their practicability.Keywords: Detection Rate, False Positives, Perceptron, One Side Class, Ensembles, Decision Tree, Hybrid methods, Feature Selection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 328094 3D Point Cloud Model Color Adjustment by Combining Terrestrial Laser Scanner and Close Range Photogrammetry Datasets
Authors: M. Pepe, S. Ackermann, L. Fregonese, C. Achille
Abstract:
3D models obtained with advanced survey techniques such as close-range photogrammetry and laser scanner are nowadays particularly appreciated in Cultural Heritage and Archaeology fields. In order to produce high quality models representing archaeological evidences and anthropological artifacts, the appearance of the model (i.e. color) beyond the geometric accuracy, is not a negligible aspect. The integration of the close-range photogrammetry survey techniques with the laser scanner is still a topic of study and research. By combining point cloud data sets of the same object generated with both technologies, or with the same technology but registered in different moment and/or natural light condition, could construct a final point cloud with accentuated color dissimilarities. In this paper, a methodology to uniform the different data sets, to improve the chromatic quality and to highlight further details by balancing the point color will be presented.
Keywords: Color models, cultural heritage, laser scanner, photogrammetry, point cloud color.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 163193 Improved Rare Species Identification Using Focal Loss Based Deep Learning Models
Authors: Chad Goldsworthy, B. Rajeswari Matam
Abstract:
The use of deep learning for species identification in camera trap images has revolutionised our ability to study, conserve and monitor species in a highly efficient and unobtrusive manner, with state-of-the-art models achieving accuracies surpassing the accuracy of manual human classification. The high imbalance of camera trap datasets, however, results in poor accuracies for minority (rare or endangered) species due to their relative insignificance to the overall model accuracy. This paper investigates the use of Focal Loss, in comparison to the traditional Cross Entropy Loss function, to improve the identification of minority species in the “255 Bird Species” dataset from Kaggle. The results show that, although Focal Loss slightly decreased the accuracy of the majority species, it was able to increase the F1-score by 0.06 and improve the identification of the bottom two, five and ten (minority) species by 37.5%, 15.7% and 10.8%, respectively, as well as resulting in an improved overall accuracy of 2.96%.
Keywords: Convolutional neural networks, data imbalance, deep learning, focal loss, species classification, wildlife conservation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 141992 Adaptive Network Intrusion Detection Learning: Attribute Selection and Classification
Authors: Dewan Md. Farid, Jerome Darmont, Nouria Harbi, Nguyen Huu Hoa, Mohammad Zahidur Rahman
Abstract:
In this paper, a new learning approach for network intrusion detection using naïve Bayesian classifier and ID3 algorithm is presented, which identifies effective attributes from the training dataset, calculates the conditional probabilities for the best attribute values, and then correctly classifies all the examples of training and testing dataset. Most of the current intrusion detection datasets are dynamic, complex and contain large number of attributes. Some of the attributes may be redundant or contribute little for detection making. It has been successfully tested that significant attribute selection is important to design a real world intrusion detection systems (IDS). The purpose of this study is to identify effective attributes from the training dataset to build a classifier for network intrusion detection using data mining algorithms. The experimental results on KDD99 benchmark intrusion detection dataset demonstrate that this new approach achieves high classification rates and reduce false positives using limited computational resources.Keywords: Attributes selection, Conditional probabilities, information gain, network intrusion detection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 269891 Optimized Preprocessing for Accurate and Efficient Bioassay Prediction with Machine Learning Algorithms
Authors: Jeff Clarine, Chang-Shyh Peng, Daisy Sang
Abstract:
Bioassay is the measurement of the potency of a chemical substance by its effect on a living animal or plant tissue. Bioassay data and chemical structures from pharmacokinetic and drug metabolism screening are mined from and housed in multiple databases. Bioassay prediction is calculated accordingly to determine further advancement. This paper proposes a four-step preprocessing of datasets for improving the bioassay predictions. The first step is instance selection in which dataset is categorized into training, testing, and validation sets. The second step is discretization that partitions the data in consideration of accuracy vs. precision. The third step is normalization where data are normalized between 0 and 1 for subsequent machine learning processing. The fourth step is feature selection where key chemical properties and attributes are generated. The streamlined results are then analyzed for the prediction of effectiveness by various machine learning algorithms including Pipeline Pilot, R, Weka, and Excel. Experiments and evaluations reveal the effectiveness of various combination of preprocessing steps and machine learning algorithms in more consistent and accurate prediction.
Keywords: Bioassay, machine learning, preprocessing, virtual screen.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 98190 Building an Integrated Relational Database from Swiss Nutrition National Survey and Swiss Health Datasets for Data Mining Purposes
Authors: Ilona Mewes, Helena Jenzer, Farshideh Einsele
Abstract:
Objective: The objective of the study was to integrate two big databases from Swiss nutrition national survey (menuCH) and Swiss health national survey 2012 for data mining purposes. Each database has a demographic base data. An integrated Swiss database is built to later discover critical food consumption patterns linked with lifestyle diseases known to be strongly tied with food consumption. Design: Swiss nutrition national survey (menuCH) with approx. 2000 respondents from two different surveys, one by Phone and the other by questionnaire along with Swiss health national survey 2012 with 21500 respondents were pre-processed, cleaned and finally integrated to a unique relational database. Results: The result of this study is an integrated relational database from the Swiss nutritional and health databases.
Keywords: Health informatics, data mining, nutritional and health databases, nutritional and chronical databases.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 172589 Decision Trees for Predicting Risk of Mortality using Routinely Collected Data
Authors: Tessy Badriyah, Jim S. Briggs, Dave R. Prytherch
Abstract:
It is well known that Logistic Regression is the gold standard method for predicting clinical outcome, especially predicting risk of mortality. In this paper, the Decision Tree method has been proposed to solve specific problems that commonly use Logistic Regression as a solution. The Biochemistry and Haematology Outcome Model (BHOM) dataset obtained from Portsmouth NHS Hospital from 1 January to 31 December 2001 was divided into four subsets. One subset of training data was used to generate a model, and the model obtained was then applied to three testing datasets. The performance of each model from both methods was then compared using calibration (the χ2 test or chi-test) and discrimination (area under ROC curve or c-index). The experiment presented that both methods have reasonable results in the case of the c-index. However, in some cases the calibration value (χ2) obtained quite a high result. After conducting experiments and investigating the advantages and disadvantages of each method, we can conclude that Decision Trees can be seen as a worthy alternative to Logistic Regression in the area of Data Mining.Keywords: Decision Trees, Logistic Regression, clinical outcome, risk of mortality.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 252388 Automatic Classification of Periodic Heart Sounds Using Convolutional Neural Network
Authors: Jia Xin Low, Keng Wah Choo
Abstract:
This paper presents an automatic normal and abnormal heart sound classification model developed based on deep learning algorithm. MITHSDB heart sounds datasets obtained from the 2016 PhysioNet/Computing in Cardiology Challenge database were used in this research with the assumption that the electrocardiograms (ECG) were recorded simultaneously with the heart sounds (phonocardiogram, PCG). The PCG time series are segmented per heart beat, and each sub-segment is converted to form a square intensity matrix, and classified using convolutional neural network (CNN) models. This approach removes the need to provide classification features for the supervised machine learning algorithm. Instead, the features are determined automatically through training, from the time series provided. The result proves that the prediction model is able to provide reasonable and comparable classification accuracy despite simple implementation. This approach can be used for real-time classification of heart sounds in Internet of Medical Things (IoMT), e.g. remote monitoring applications of PCG signal.Keywords: Convolutional neural network, discrete wavelet transform, deep learning, heart sound classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 114787 Improved Predictive Models for the IRMA Network Using Nonlinear Optimisation
Authors: Vishwesh Kulkarni, Nikhil Bellarykar
Abstract:
Cellular complexity stems from the interactions among thousands of different molecular species. Thanks to the emerging fields of systems and synthetic biology, scientists are beginning to unravel these regulatory, signaling, and metabolic interactions and to understand their coordinated action. Reverse engineering of biological networks has has several benefits but a poor quality of data combined with the difficulty in reproducing it limits the applicability of these methods. A few years back, many of the commonly used predictive algorithms were tested on a network constructed in the yeast Saccharomyces cerevisiae (S. cerevisiae) to resolve this issue. The network was a synthetic network of five genes regulating each other for the so-called in vivo reverse-engineering and modeling assessment (IRMA). The network was constructed in S. cereviase since it is a simple and well characterized organism. The synthetic network included a variety of regulatory interactions, thus capturing the behaviour of larger eukaryotic gene networks on a smaller scale. We derive a new set of algorithms by solving a nonlinear optimization problem and show how these algorithms outperform other algorithms on these datasets.Keywords: Synthetic gene network, network identification, nonlinear modeling, optimization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 80086 Improved Feature Extraction Technique for Handling Occlusion in Automatic Facial Expression Recognition
Authors: Khadijat T. Bamigbade, Olufade F. W. Onifade
Abstract:
The field of automatic facial expression analysis has been an active research area in the last two decades. Its vast applicability in various domains has drawn so much attention into developing techniques and dataset that mirror real life scenarios. Many techniques such as Local Binary Patterns and its variants (CLBP, LBP-TOP) and lately, deep learning techniques, have been used for facial expression recognition. However, the problem of occlusion has not been sufficiently handled, making their results not applicable in real life situations. This paper develops a simple, yet highly efficient method tagged Local Binary Pattern-Histogram of Gradient (LBP-HOG) with occlusion detection in face image, using a multi-class SVM for Action Unit and in turn expression recognition. Our method was evaluated on three publicly available datasets which are JAFFE, CK, SFEW. Experimental results showed that our approach performed considerably well when compared with state-of-the-art algorithms and gave insight to occlusion detection as a key step to handling expression in wild.
Keywords: Automatic facial expression analysis, local binary pattern, LBP-HOG, occlusion detection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 78285 Combining Fuzzy Logic and Neural Networks in Modeling Landfill Gas Production
Authors: Mohamed Abdallah, Mostafa Warith, Roberto Narbaitz, Emil Petriu, Kevin Kennedy
Abstract:
Heterogeneity of solid waste characteristics as well as the complex processes taking place within the landfill ecosystem motivated the implementation of soft computing methodologies such as artificial neural networks (ANN), fuzzy logic (FL), and their combination. The present work uses a hybrid ANN-FL model that employs knowledge-based FL to describe the process qualitatively and implements the learning algorithm of ANN to optimize model parameters. The model was developed to simulate and predict the landfill gas production at a given time based on operational parameters. The experimental data used were compiled from lab-scale experiment that involved various operating scenarios. The developed model was validated and statistically analyzed using F-test, linear regression between actual and predicted data, and mean squared error measures. Overall, the simulated landfill gas production rates demonstrated reasonable agreement with actual data. The discussion focused on the effect of the size of training datasets and number of training epochs.
Keywords: Adaptive neural fuzzy inference system (ANFIS), gas production, landfill
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 241584 A Genetic Algorithm Based Permutation and Non-Permutation Scheduling Heuristics for Finite Capacity Material Requirement Planning Problem
Authors: Watchara Songserm, Teeradej Wuttipornpun
Abstract:
This paper presents a genetic algorithm based permutation and non-permutation scheduling heuristics (GAPNP) to solve a multi-stage finite capacity material requirement planning (FCMRP) problem in automotive assembly flow shop with unrelated parallel machines. In the algorithm, the sequences of orders are iteratively improved by the GA characteristics, whereas the required operations are scheduled based on the presented permutation and non-permutation heuristics. Finally, a linear programming is applied to minimize the total cost. The presented GAPNP algorithm is evaluated by using real datasets from automotive companies. The required parameters for GAPNP are intently tuned to obtain a common parameter setting for all case studies. The results show that GAPNP significantly outperforms the benchmark algorithm about 30% on average.
Keywords: Finite capacity MRP, genetic algorithm, linear programming, flow shop, unrelated parallel machines, application in industries.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 110783 An Approach to Measure Snow Depth of Winter Accumulation at Basin Scale Using Satellite Data
Authors: M. Geetha Priya, D. Krishnaveni
Abstract:
Snow depth estimation and monitoring studies have been carried out for decades using empirical relationship or extrapolation of point measurements carried out in field. With the development of advanced satellite based remote sensing techniques, a modified approach is proposed in the present study to estimate the winter accumulated snow depth at basin scale. Assessment of snow depth by differencing Digital Elevation Model (DEM) generated at the beginning and end of winter season can be experimented for the region of interest (Himalayan and polar regions) accounting for winter accumulation (solid precipitation). The proposed approach is based on existing geodetic method that is being used for glacier mass balance estimation. Considering the satellite datasets purely acquired during beginning and end of winter season, it is possible to estimate the change in depth or thickness for the snow that is accumulated during the winter as it takes one year for the snow to get transformed into firn (snow that has survived one summer or one-year old snow).
Keywords: Digital elevation model, snow depth, geodetic method, snow cover.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 71682 Malaria Parasite Detection Using Deep Learning Methods
Authors: Kaustubh Chakradeo, Michael Delves, Sofya Titarenko
Abstract:
Malaria is a serious disease which affects hundreds of millions of people around the world, each year. If not treated in time, it can be fatal. Despite recent developments in malaria diagnostics, the microscopy method to detect malaria remains the most common. Unfortunately, the accuracy of microscopic diagnostics is dependent on the skill of the microscopist and limits the throughput of malaria diagnosis. With the development of Artificial Intelligence tools and Deep Learning techniques in particular, it is possible to lower the cost, while achieving an overall higher accuracy. In this paper, we present a VGG-based model and compare it with previously developed models for identifying infected cells. Our model surpasses most previously developed models in a range of the accuracy metrics. The model has an advantage of being constructed from a relatively small number of layers. This reduces the computer resources and computational time. Moreover, we test our model on two types of datasets and argue that the currently developed deep-learning-based methods cannot efficiently distinguish between infected and contaminated cells. A more precise study of suspicious regions is required.Keywords: Malaria, deep learning, DL, convolution neural network, CNN, thin blood smears.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 65581 A Study on Early Prediction of Fault Proneness in Software Modules using Genetic Algorithm
Authors: Parvinder S. Sandhu, Sunil Khullar, Satpreet Singh, Simranjit K. Bains, Manpreet Kaur, Gurvinder Singh
Abstract:
Fault-proneness of a software module is the probability that the module contains faults. To predict faultproneness of modules different techniques have been proposed which includes statistical methods, machine learning techniques, neural network techniques and clustering techniques. The aim of proposed study is to explore whether metrics available in the early lifecycle (i.e. requirement metrics), metrics available in the late lifecycle (i.e. code metrics) and metrics available in the early lifecycle (i.e. requirement metrics) combined with metrics available in the late lifecycle (i.e. code metrics) can be used to identify fault prone modules using Genetic Algorithm technique. This approach has been tested with real time defect C Programming language datasets of NASA software projects. The results show that the fusion of requirement and code metric is the best prediction model for detecting the faults as compared with commonly used code based model.Keywords: Genetic Algorithm, Fault Proneness, Software Faultand Software Quality.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 198480 Artificial Neural Network-Based Short-Term Load Forecasting for Mymensingh Area of Bangladesh
Authors: S. M. Anowarul Haque, Md. Asiful Islam
Abstract:
Electrical load forecasting is considered to be one of the most indispensable parts of a modern-day electrical power system. To ensure a reliable and efficient supply of electric energy, special emphasis should have been put on the predictive feature of electricity supply. Artificial Neural Network-based approaches have emerged to be a significant area of interest for electric load forecasting research. This paper proposed an Artificial Neural Network model based on the particle swarm optimization algorithm for improved electric load forecasting for Mymensingh, Bangladesh. The forecasting model is developed and simulated on the MATLAB environment with a large number of training datasets. The model is trained based on eight input parameters including historical load and weather data. The predicted load data are then compared with an available dataset for validation. The proposed neural network model is proved to be more reliable in terms of day-wise load forecasting for Mymensingh, Bangladesh.Keywords: Load forecasting, artificial neural network, particle swarm optimization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 68679 Identifying Factors Contributing to the Spread of Lyme Disease: A Regression Analysis of Virginia’s Data
Authors: Fatemeh Valizadeh Gamchi, Edward L. Boone
Abstract:
This research focuses on Lyme disease, a widespread infectious condition in the United States caused by the bacterium Borrelia burgdorferi sensu stricto. It is critical to identify environmental and economic elements that are contributing to the spread of the disease. This study examined data from Virginia to identify a subset of explanatory variables significant for Lyme disease case numbers. To identify relevant variables and avoid overfitting, linear poisson, and regularization regression methods such as ridge, lasso, and elastic net penalty were employed. Cross-validation was performed to acquire tuning parameters. The methods proposed can automatically identify relevant disease count covariates. The efficacy of the techniques was assessed using four criteria on three simulated datasets. Finally, using the Virginia Department of Health’s Lyme disease dataset, the study successfully identified key factors, and the results were consistent with previous studies.
Keywords: Lyme disease, Poisson generalized linear model, Ridge regression, Lasso Regression, elastic net regression.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12278 DCBOR: A Density Clustering Based on Outlier Removal
Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan
Abstract:
Data clustering is an important data exploration technique with many applications in data mining. We present an enhanced version of the well known single link clustering algorithm. We will refer to this algorithm as DCBOR. The proposed algorithm alleviates the chain effect by removing the outliers from the given dataset. So this algorithm provides outlier detection and data clustering simultaneously. This algorithm does not need to update the distance matrix, since the algorithm depends on merging the most k-nearest objects in one step and the cluster continues grow as long as possible under specified condition. So the algorithm consists of two phases; at the first phase, it removes the outliers from the input dataset. At the second phase, it performs the clustering process. This algorithm discovers clusters of different shapes, sizes, densities and requires only one input parameter; this parameter represents a threshold for outlier points. The value of the input parameter is ranging from 0 to 1. The algorithm supports the user in determining an appropriate value for it. We have tested this algorithm on different datasets contain outlier and connecting clusters by chain of density points, and the algorithm discovers the correct clusters. The results of our experiments demonstrate the effectiveness and the efficiency of DCBOR.Keywords: Data Clustering, Clustering Algorithms, Handling Noise, Arbitrary Shape of Clusters.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 193377 Convergence Analysis of Training Two-Hidden-Layer Partially Over-Parameterized ReLU Networks via Gradient Descent
Authors: Zhifeng Kong
Abstract:
Over-parameterized neural networks have attracted a great deal of attention in recent deep learning theory research, as they challenge the classic perspective of over-fitting when the model has excessive parameters and have gained empirical success in various settings. While a number of theoretical works have been presented to demystify properties of such models, the convergence properties of such models are still far from being thoroughly understood. In this work, we study the convergence properties of training two-hidden-layer partially over-parameterized fully connected networks with the Rectified Linear Unit activation via gradient descent. To our knowledge, this is the first theoretical work to understand convergence properties of deep over-parameterized networks without the equally-wide-hidden-layer assumption and other unrealistic assumptions. We provide a probabilistic lower bound of the widths of hidden layers and proved linear convergence rate of gradient descent. We also conducted experiments on synthetic and real-world datasets to validate our theory.Keywords: Over-parameterization, Rectified Linear Units (ReLU), convergence, gradient descent, neural networks.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 897