Search results for: Waveform Datasets
99 A Comparative Study of Malware Detection Techniques Using Machine Learning Methods
Authors: Cristina Vatamanu, Doina Cosovan, Dragoş Gavriluţ, Henri Luchian
Abstract:
In the past few years, the amount of malicious software increased exponentially and, therefore, machine learning algorithms became instrumental in identifying clean and malware files through (semi)-automated classification. When working with very large datasets, the major challenge is to reach both a very high malware detection rate and a very low false positive rate. Another challenge is to minimize the time needed for the machine learning algorithm to do so. This paper presents a comparative study between different machine learning techniques such as linear classifiers, ensembles, decision trees or various hybrids thereof. The training dataset consists of approximately 2 million clean files and 200.000 infected files, which is a realistic quantitative mixture. The paper investigates the above mentioned methods with respect to both their performance (detection rate and false positive rate) and their practicability.Keywords: Detection Rate, False Positives, Perceptron, One Side Class, Ensembles, Decision Tree, Hybrid methods, Feature Selection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 328098 3D Point Cloud Model Color Adjustment by Combining Terrestrial Laser Scanner and Close Range Photogrammetry Datasets
Authors: M. Pepe, S. Ackermann, L. Fregonese, C. Achille
Abstract:
3D models obtained with advanced survey techniques such as close-range photogrammetry and laser scanner are nowadays particularly appreciated in Cultural Heritage and Archaeology fields. In order to produce high quality models representing archaeological evidences and anthropological artifacts, the appearance of the model (i.e. color) beyond the geometric accuracy, is not a negligible aspect. The integration of the close-range photogrammetry survey techniques with the laser scanner is still a topic of study and research. By combining point cloud data sets of the same object generated with both technologies, or with the same technology but registered in different moment and/or natural light condition, could construct a final point cloud with accentuated color dissimilarities. In this paper, a methodology to uniform the different data sets, to improve the chromatic quality and to highlight further details by balancing the point color will be presented.
Keywords: Color models, cultural heritage, laser scanner, photogrammetry, point cloud color.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 163197 Improved Rare Species Identification Using Focal Loss Based Deep Learning Models
Authors: Chad Goldsworthy, B. Rajeswari Matam
Abstract:
The use of deep learning for species identification in camera trap images has revolutionised our ability to study, conserve and monitor species in a highly efficient and unobtrusive manner, with state-of-the-art models achieving accuracies surpassing the accuracy of manual human classification. The high imbalance of camera trap datasets, however, results in poor accuracies for minority (rare or endangered) species due to their relative insignificance to the overall model accuracy. This paper investigates the use of Focal Loss, in comparison to the traditional Cross Entropy Loss function, to improve the identification of minority species in the “255 Bird Species” dataset from Kaggle. The results show that, although Focal Loss slightly decreased the accuracy of the majority species, it was able to increase the F1-score by 0.06 and improve the identification of the bottom two, five and ten (minority) species by 37.5%, 15.7% and 10.8%, respectively, as well as resulting in an improved overall accuracy of 2.96%.
Keywords: Convolutional neural networks, data imbalance, deep learning, focal loss, species classification, wildlife conservation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 141996 Adaptive Network Intrusion Detection Learning: Attribute Selection and Classification
Authors: Dewan Md. Farid, Jerome Darmont, Nouria Harbi, Nguyen Huu Hoa, Mohammad Zahidur Rahman
Abstract:
In this paper, a new learning approach for network intrusion detection using naïve Bayesian classifier and ID3 algorithm is presented, which identifies effective attributes from the training dataset, calculates the conditional probabilities for the best attribute values, and then correctly classifies all the examples of training and testing dataset. Most of the current intrusion detection datasets are dynamic, complex and contain large number of attributes. Some of the attributes may be redundant or contribute little for detection making. It has been successfully tested that significant attribute selection is important to design a real world intrusion detection systems (IDS). The purpose of this study is to identify effective attributes from the training dataset to build a classifier for network intrusion detection using data mining algorithms. The experimental results on KDD99 benchmark intrusion detection dataset demonstrate that this new approach achieves high classification rates and reduce false positives using limited computational resources.Keywords: Attributes selection, Conditional probabilities, information gain, network intrusion detection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 269895 Optimized Preprocessing for Accurate and Efficient Bioassay Prediction with Machine Learning Algorithms
Authors: Jeff Clarine, Chang-Shyh Peng, Daisy Sang
Abstract:
Bioassay is the measurement of the potency of a chemical substance by its effect on a living animal or plant tissue. Bioassay data and chemical structures from pharmacokinetic and drug metabolism screening are mined from and housed in multiple databases. Bioassay prediction is calculated accordingly to determine further advancement. This paper proposes a four-step preprocessing of datasets for improving the bioassay predictions. The first step is instance selection in which dataset is categorized into training, testing, and validation sets. The second step is discretization that partitions the data in consideration of accuracy vs. precision. The third step is normalization where data are normalized between 0 and 1 for subsequent machine learning processing. The fourth step is feature selection where key chemical properties and attributes are generated. The streamlined results are then analyzed for the prediction of effectiveness by various machine learning algorithms including Pipeline Pilot, R, Weka, and Excel. Experiments and evaluations reveal the effectiveness of various combination of preprocessing steps and machine learning algorithms in more consistent and accurate prediction.
Keywords: Bioassay, machine learning, preprocessing, virtual screen.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 98194 Building an Integrated Relational Database from Swiss Nutrition National Survey and Swiss Health Datasets for Data Mining Purposes
Authors: Ilona Mewes, Helena Jenzer, Farshideh Einsele
Abstract:
Objective: The objective of the study was to integrate two big databases from Swiss nutrition national survey (menuCH) and Swiss health national survey 2012 for data mining purposes. Each database has a demographic base data. An integrated Swiss database is built to later discover critical food consumption patterns linked with lifestyle diseases known to be strongly tied with food consumption. Design: Swiss nutrition national survey (menuCH) with approx. 2000 respondents from two different surveys, one by Phone and the other by questionnaire along with Swiss health national survey 2012 with 21500 respondents were pre-processed, cleaned and finally integrated to a unique relational database. Results: The result of this study is an integrated relational database from the Swiss nutritional and health databases.
Keywords: Health informatics, data mining, nutritional and health databases, nutritional and chronical databases.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 172593 Decision Trees for Predicting Risk of Mortality using Routinely Collected Data
Authors: Tessy Badriyah, Jim S. Briggs, Dave R. Prytherch
Abstract:
It is well known that Logistic Regression is the gold standard method for predicting clinical outcome, especially predicting risk of mortality. In this paper, the Decision Tree method has been proposed to solve specific problems that commonly use Logistic Regression as a solution. The Biochemistry and Haematology Outcome Model (BHOM) dataset obtained from Portsmouth NHS Hospital from 1 January to 31 December 2001 was divided into four subsets. One subset of training data was used to generate a model, and the model obtained was then applied to three testing datasets. The performance of each model from both methods was then compared using calibration (the χ2 test or chi-test) and discrimination (area under ROC curve or c-index). The experiment presented that both methods have reasonable results in the case of the c-index. However, in some cases the calibration value (χ2) obtained quite a high result. After conducting experiments and investigating the advantages and disadvantages of each method, we can conclude that Decision Trees can be seen as a worthy alternative to Logistic Regression in the area of Data Mining.Keywords: Decision Trees, Logistic Regression, clinical outcome, risk of mortality.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 252392 Automatic Classification of Periodic Heart Sounds Using Convolutional Neural Network
Authors: Jia Xin Low, Keng Wah Choo
Abstract:
This paper presents an automatic normal and abnormal heart sound classification model developed based on deep learning algorithm. MITHSDB heart sounds datasets obtained from the 2016 PhysioNet/Computing in Cardiology Challenge database were used in this research with the assumption that the electrocardiograms (ECG) were recorded simultaneously with the heart sounds (phonocardiogram, PCG). The PCG time series are segmented per heart beat, and each sub-segment is converted to form a square intensity matrix, and classified using convolutional neural network (CNN) models. This approach removes the need to provide classification features for the supervised machine learning algorithm. Instead, the features are determined automatically through training, from the time series provided. The result proves that the prediction model is able to provide reasonable and comparable classification accuracy despite simple implementation. This approach can be used for real-time classification of heart sounds in Internet of Medical Things (IoMT), e.g. remote monitoring applications of PCG signal.Keywords: Convolutional neural network, discrete wavelet transform, deep learning, heart sound classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 114691 Improved Predictive Models for the IRMA Network Using Nonlinear Optimisation
Authors: Vishwesh Kulkarni, Nikhil Bellarykar
Abstract:
Cellular complexity stems from the interactions among thousands of different molecular species. Thanks to the emerging fields of systems and synthetic biology, scientists are beginning to unravel these regulatory, signaling, and metabolic interactions and to understand their coordinated action. Reverse engineering of biological networks has has several benefits but a poor quality of data combined with the difficulty in reproducing it limits the applicability of these methods. A few years back, many of the commonly used predictive algorithms were tested on a network constructed in the yeast Saccharomyces cerevisiae (S. cerevisiae) to resolve this issue. The network was a synthetic network of five genes regulating each other for the so-called in vivo reverse-engineering and modeling assessment (IRMA). The network was constructed in S. cereviase since it is a simple and well characterized organism. The synthetic network included a variety of regulatory interactions, thus capturing the behaviour of larger eukaryotic gene networks on a smaller scale. We derive a new set of algorithms by solving a nonlinear optimization problem and show how these algorithms outperform other algorithms on these datasets.Keywords: Synthetic gene network, network identification, nonlinear modeling, optimization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 80090 Improved Feature Extraction Technique for Handling Occlusion in Automatic Facial Expression Recognition
Authors: Khadijat T. Bamigbade, Olufade F. W. Onifade
Abstract:
The field of automatic facial expression analysis has been an active research area in the last two decades. Its vast applicability in various domains has drawn so much attention into developing techniques and dataset that mirror real life scenarios. Many techniques such as Local Binary Patterns and its variants (CLBP, LBP-TOP) and lately, deep learning techniques, have been used for facial expression recognition. However, the problem of occlusion has not been sufficiently handled, making their results not applicable in real life situations. This paper develops a simple, yet highly efficient method tagged Local Binary Pattern-Histogram of Gradient (LBP-HOG) with occlusion detection in face image, using a multi-class SVM for Action Unit and in turn expression recognition. Our method was evaluated on three publicly available datasets which are JAFFE, CK, SFEW. Experimental results showed that our approach performed considerably well when compared with state-of-the-art algorithms and gave insight to occlusion detection as a key step to handling expression in wild.
Keywords: Automatic facial expression analysis, local binary pattern, LBP-HOG, occlusion detection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 78189 Combining Fuzzy Logic and Neural Networks in Modeling Landfill Gas Production
Authors: Mohamed Abdallah, Mostafa Warith, Roberto Narbaitz, Emil Petriu, Kevin Kennedy
Abstract:
Heterogeneity of solid waste characteristics as well as the complex processes taking place within the landfill ecosystem motivated the implementation of soft computing methodologies such as artificial neural networks (ANN), fuzzy logic (FL), and their combination. The present work uses a hybrid ANN-FL model that employs knowledge-based FL to describe the process qualitatively and implements the learning algorithm of ANN to optimize model parameters. The model was developed to simulate and predict the landfill gas production at a given time based on operational parameters. The experimental data used were compiled from lab-scale experiment that involved various operating scenarios. The developed model was validated and statistically analyzed using F-test, linear regression between actual and predicted data, and mean squared error measures. Overall, the simulated landfill gas production rates demonstrated reasonable agreement with actual data. The discussion focused on the effect of the size of training datasets and number of training epochs.
Keywords: Adaptive neural fuzzy inference system (ANFIS), gas production, landfill
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 241588 A Genetic Algorithm Based Permutation and Non-Permutation Scheduling Heuristics for Finite Capacity Material Requirement Planning Problem
Authors: Watchara Songserm, Teeradej Wuttipornpun
Abstract:
This paper presents a genetic algorithm based permutation and non-permutation scheduling heuristics (GAPNP) to solve a multi-stage finite capacity material requirement planning (FCMRP) problem in automotive assembly flow shop with unrelated parallel machines. In the algorithm, the sequences of orders are iteratively improved by the GA characteristics, whereas the required operations are scheduled based on the presented permutation and non-permutation heuristics. Finally, a linear programming is applied to minimize the total cost. The presented GAPNP algorithm is evaluated by using real datasets from automotive companies. The required parameters for GAPNP are intently tuned to obtain a common parameter setting for all case studies. The results show that GAPNP significantly outperforms the benchmark algorithm about 30% on average.
Keywords: Finite capacity MRP, genetic algorithm, linear programming, flow shop, unrelated parallel machines, application in industries.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 110787 An Approach to Measure Snow Depth of Winter Accumulation at Basin Scale Using Satellite Data
Authors: M. Geetha Priya, D. Krishnaveni
Abstract:
Snow depth estimation and monitoring studies have been carried out for decades using empirical relationship or extrapolation of point measurements carried out in field. With the development of advanced satellite based remote sensing techniques, a modified approach is proposed in the present study to estimate the winter accumulated snow depth at basin scale. Assessment of snow depth by differencing Digital Elevation Model (DEM) generated at the beginning and end of winter season can be experimented for the region of interest (Himalayan and polar regions) accounting for winter accumulation (solid precipitation). The proposed approach is based on existing geodetic method that is being used for glacier mass balance estimation. Considering the satellite datasets purely acquired during beginning and end of winter season, it is possible to estimate the change in depth or thickness for the snow that is accumulated during the winter as it takes one year for the snow to get transformed into firn (snow that has survived one summer or one-year old snow).
Keywords: Digital elevation model, snow depth, geodetic method, snow cover.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 71686 Malaria Parasite Detection Using Deep Learning Methods
Authors: Kaustubh Chakradeo, Michael Delves, Sofya Titarenko
Abstract:
Malaria is a serious disease which affects hundreds of millions of people around the world, each year. If not treated in time, it can be fatal. Despite recent developments in malaria diagnostics, the microscopy method to detect malaria remains the most common. Unfortunately, the accuracy of microscopic diagnostics is dependent on the skill of the microscopist and limits the throughput of malaria diagnosis. With the development of Artificial Intelligence tools and Deep Learning techniques in particular, it is possible to lower the cost, while achieving an overall higher accuracy. In this paper, we present a VGG-based model and compare it with previously developed models for identifying infected cells. Our model surpasses most previously developed models in a range of the accuracy metrics. The model has an advantage of being constructed from a relatively small number of layers. This reduces the computer resources and computational time. Moreover, we test our model on two types of datasets and argue that the currently developed deep-learning-based methods cannot efficiently distinguish between infected and contaminated cells. A more precise study of suspicious regions is required.Keywords: Malaria, deep learning, DL, convolution neural network, CNN, thin blood smears.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 65585 A Study on Early Prediction of Fault Proneness in Software Modules using Genetic Algorithm
Authors: Parvinder S. Sandhu, Sunil Khullar, Satpreet Singh, Simranjit K. Bains, Manpreet Kaur, Gurvinder Singh
Abstract:
Fault-proneness of a software module is the probability that the module contains faults. To predict faultproneness of modules different techniques have been proposed which includes statistical methods, machine learning techniques, neural network techniques and clustering techniques. The aim of proposed study is to explore whether metrics available in the early lifecycle (i.e. requirement metrics), metrics available in the late lifecycle (i.e. code metrics) and metrics available in the early lifecycle (i.e. requirement metrics) combined with metrics available in the late lifecycle (i.e. code metrics) can be used to identify fault prone modules using Genetic Algorithm technique. This approach has been tested with real time defect C Programming language datasets of NASA software projects. The results show that the fusion of requirement and code metric is the best prediction model for detecting the faults as compared with commonly used code based model.Keywords: Genetic Algorithm, Fault Proneness, Software Faultand Software Quality.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 198484 Artificial Neural Network-Based Short-Term Load Forecasting for Mymensingh Area of Bangladesh
Authors: S. M. Anowarul Haque, Md. Asiful Islam
Abstract:
Electrical load forecasting is considered to be one of the most indispensable parts of a modern-day electrical power system. To ensure a reliable and efficient supply of electric energy, special emphasis should have been put on the predictive feature of electricity supply. Artificial Neural Network-based approaches have emerged to be a significant area of interest for electric load forecasting research. This paper proposed an Artificial Neural Network model based on the particle swarm optimization algorithm for improved electric load forecasting for Mymensingh, Bangladesh. The forecasting model is developed and simulated on the MATLAB environment with a large number of training datasets. The model is trained based on eight input parameters including historical load and weather data. The predicted load data are then compared with an available dataset for validation. The proposed neural network model is proved to be more reliable in terms of day-wise load forecasting for Mymensingh, Bangladesh.Keywords: Load forecasting, artificial neural network, particle swarm optimization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 68683 Identifying Factors Contributing to the Spread of Lyme Disease: A Regression Analysis of Virginia’s Data
Authors: Fatemeh Valizadeh Gamchi, Edward L. Boone
Abstract:
This research focuses on Lyme disease, a widespread infectious condition in the United States caused by the bacterium Borrelia burgdorferi sensu stricto. It is critical to identify environmental and economic elements that are contributing to the spread of the disease. This study examined data from Virginia to identify a subset of explanatory variables significant for Lyme disease case numbers. To identify relevant variables and avoid overfitting, linear poisson, and regularization regression methods such as ridge, lasso, and elastic net penalty were employed. Cross-validation was performed to acquire tuning parameters. The methods proposed can automatically identify relevant disease count covariates. The efficacy of the techniques was assessed using four criteria on three simulated datasets. Finally, using the Virginia Department of Health’s Lyme disease dataset, the study successfully identified key factors, and the results were consistent with previous studies.
Keywords: Lyme disease, Poisson generalized linear model, Ridge regression, Lasso Regression, elastic net regression.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12282 A Comparison of SVM-based Criteria in Evolutionary Method for Gene Selection and Classification of Microarray Data
Authors: Rameswar Debnath, Haruhisa Takahashi
Abstract:
An evolutionary method whose selection and recombination operations are based on generalization error-bounds of support vector machine (SVM) can select a subset of potentially informative genes for SVM classifier very efficiently [7]. In this paper, we will use the derivative of error-bound (first-order criteria) to select and recombine gene features in the evolutionary process, and compare the performance of the derivative of error-bound with the error-bound itself (zero-order) in the evolutionary process. We also investigate several error-bounds and their derivatives to compare the performance, and find the best criteria for gene selection and classification. We use 7 cancer-related human gene expression datasets to evaluate the performance of the zero-order and first-order criteria of error-bounds. Though both criteria have the same strategy in theoretically, experimental results demonstrate the best criterion for microarray gene expression data.Keywords: support vector machine, generalization error-bound, feature selection, evolutionary algorithm, microarray data
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 153681 DCBOR: A Density Clustering Based on Outlier Removal
Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan
Abstract:
Data clustering is an important data exploration technique with many applications in data mining. We present an enhanced version of the well known single link clustering algorithm. We will refer to this algorithm as DCBOR. The proposed algorithm alleviates the chain effect by removing the outliers from the given dataset. So this algorithm provides outlier detection and data clustering simultaneously. This algorithm does not need to update the distance matrix, since the algorithm depends on merging the most k-nearest objects in one step and the cluster continues grow as long as possible under specified condition. So the algorithm consists of two phases; at the first phase, it removes the outliers from the input dataset. At the second phase, it performs the clustering process. This algorithm discovers clusters of different shapes, sizes, densities and requires only one input parameter; this parameter represents a threshold for outlier points. The value of the input parameter is ranging from 0 to 1. The algorithm supports the user in determining an appropriate value for it. We have tested this algorithm on different datasets contain outlier and connecting clusters by chain of density points, and the algorithm discovers the correct clusters. The results of our experiments demonstrate the effectiveness and the efficiency of DCBOR.Keywords: Data Clustering, Clustering Algorithms, Handling Noise, Arbitrary Shape of Clusters.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 193380 Convergence Analysis of Training Two-Hidden-Layer Partially Over-Parameterized ReLU Networks via Gradient Descent
Authors: Zhifeng Kong
Abstract:
Over-parameterized neural networks have attracted a great deal of attention in recent deep learning theory research, as they challenge the classic perspective of over-fitting when the model has excessive parameters and have gained empirical success in various settings. While a number of theoretical works have been presented to demystify properties of such models, the convergence properties of such models are still far from being thoroughly understood. In this work, we study the convergence properties of training two-hidden-layer partially over-parameterized fully connected networks with the Rectified Linear Unit activation via gradient descent. To our knowledge, this is the first theoretical work to understand convergence properties of deep over-parameterized networks without the equally-wide-hidden-layer assumption and other unrealistic assumptions. We provide a probabilistic lower bound of the widths of hidden layers and proved linear convergence rate of gradient descent. We also conducted experiments on synthetic and real-world datasets to validate our theory.Keywords: Over-parameterization, Rectified Linear Units (ReLU), convergence, gradient descent, neural networks.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 89679 Cross Project Software Fault Prediction at Design Phase
Authors: Pradeep Singh, Shrish Verma
Abstract:
Software fault prediction models are created by using the source code, processed metrics from the same or previous version of code and related fault data. Some company do not store and keep track of all artifacts which are required for software fault prediction. To construct fault prediction model for such company, the training data from the other projects can be one potential solution. Earlier we predicted the fault the less cost it requires to correct. The training data consists of metrics data and related fault data at function/module level. This paper investigates fault predictions at early stage using the cross-project data focusing on the design metrics. In this study, empirical analysis is carried out to validate design metrics for cross project fault prediction. The machine learning techniques used for evaluation is Naïve Bayes. The design phase metrics of other projects can be used as initial guideline for the projects where no previous fault data is available. We analyze seven datasets from NASA Metrics Data Program which offer design as well as code metrics. Overall, the results of cross project is comparable to the within company data learning.Keywords: Software Metrics, Fault prediction, Cross project, Within project.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 254678 Swarmed Discriminant Analysis for Multifunction Prosthesis Control
Authors: Rami N. Khushaba, Ahmed Al-Ani, Adel Al-Jumaily
Abstract:
One of the approaches enabling people with amputated limbs to establish some sort of interface with the real world includes the utilization of the myoelectric signal (MES) from the remaining muscles of those limbs. The MES can be used as a control input to a multifunction prosthetic device. In this control scheme, known as the myoelectric control, a pattern recognition approach is usually utilized to discriminate between the MES signals that belong to different classes of the forearm movements. Since the MES is recorded using multiple channels, the feature vector size can become very large. In order to reduce the computational cost and enhance the generalization capability of the classifier, a dimensionality reduction method is needed to identify an informative yet moderate size feature set. This paper proposes a new fuzzy version of the well known Fisher-s Linear Discriminant Analysis (LDA) feature projection technique. Furthermore, based on the fact that certain muscles might contribute more to the discrimination process, a novel feature weighting scheme is also presented by employing Particle Swarm Optimization (PSO) for estimating the weight of each feature. The new method, called PSOFLDA, is tested on real MES datasets and compared with other techniques to prove its superiority.Keywords: Discriminant Analysis, Pattern Recognition, SignalProcessing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 155677 Segmentation of Lungs from CT Scan Images for Early Diagnosis of Lung Cancer
Authors: Nisar Ahmed Memon, Anwar Majid Mirza, S.A.M. Gilani
Abstract:
Segmentation is an important step in medical image analysis and classification for radiological evaluation or computer aided diagnosis. The CAD (Computer Aided Diagnosis ) of lung CT generally first segment the area of interest (lung) and then analyze the separately obtained area for nodule detection in order to diagnosis the disease. For normal lung, segmentation can be performed by making use of excellent contrast between air and surrounding tissues. However this approach fails when lung is affected by high density pathology. Dense pathologies are present in approximately a fifth of clinical scans, and for computer analysis such as detection and quantification of abnormal areas it is vital that the entire and perfectly lung part of the image is provided and no part, as present in the original image be eradicated. In this paper we have proposed a lung segmentation technique which accurately segment the lung parenchyma from lung CT Scan images. The algorithm was tested against the 25 datasets of different patients received from Ackron Univeristy, USA and AGA Khan Medical University, Karachi, Pakistan.Keywords: Computer Aided Diagnosis, Medical ImageProcessing, Region Growing, Segmentation, Thresholding,
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 260076 Product Features Extraction from Opinions According to Time
Authors: Kamal Amarouche, Houda Benbrahim, Ismail Kassou
Abstract:
Nowadays, e-commerce shopping websites have experienced noticeable growth. These websites have gained consumers’ trust. After purchasing a product, many consumers share comments where opinions are usually embedded about the given product. Research on the automatic management of opinions that gives suggestions to potential consumers and portrays an image of the product to manufactures has been growing recently. After launching the product in the market, the reviews generated around it do not usually contain helpful information or generic opinions about this product (e.g. telephone: great phone...); in the sense that the product is still in the launching phase in the market. Within time, the product becomes old. Therefore, consumers perceive the advantages/ disadvantages about each specific product feature. Therefore, they will generate comments that contain their sentiments about these features. In this paper, we present an unsupervised method to extract different product features hidden in the opinions which influence its purchase, and that combines Time Weighting (TW) which depends on the time opinions were expressed with Term Frequency-Inverse Document Frequency (TF-IDF). We conduct several experiments using two different datasets about cell phones and hotels. The results show the effectiveness of our automatic feature extraction, as well as its domain independent characteristic.
Keywords: Opinion mining, product feature extraction, sentiment analysis, SentiWordNet.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 130075 An In-Depth Analysis of Open Data Portals as an Emerging Public E-Service
Authors: Martin Lnenicka
Abstract:
Governments collect and produce large amounts of data. Increasingly, governments worldwide have started to implement open data initiatives and also launch open data portals to enable the release of these data in open and reusable formats. Therefore, a large number of open data repositories, catalogues and portals have been emerging in the world. The greater availability of interoperable and linkable open government data catalyzes secondary use of such data, so they can be used for building useful applications which leverage their value, allow insight, provide access to government services, and support transparency. The efficient development of successful open data portals makes it necessary to evaluate them systematic, in order to understand them better and assess the various types of value they generate, and identify the required improvements for increasing this value. Thus, the attention of this paper is directed particularly to the field of open data portals. The main aim of this paper is to compare the selected open data portals on the national level using content analysis and propose a new evaluation framework, which further improves the quality of these portals. It also establishes a set of considerations for involving businesses and citizens to create eservices and applications that leverage on the datasets available from these portals.
Keywords: Big data, content analysis, criteria comparison, data quality, open data, open data portals, public sector.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 308274 ANN Based Model Development for Material Removal Rate in Dry Turning in Indian Context
Authors: Mangesh R. Phate, V. H. Tatwawadi
Abstract:
This paper is intended to develop an artificial neural network (ANN) based model of material removal rate (MRR) in the turning of ferrous and nonferrous material in a Indian small-scale industry. MRR of the formulated model was proved with the testing data and artificial neural network (ANN) model was developed for the analysis and prediction of the relationship between inputs and output parameters during the turning of ferrous and nonferrous materials. The input parameters of this model are operator, work-piece, cutting process, cutting tool, machine and the environment.
The ANN model consists of a three layered feedforward back propagation neural network. The network is trained with pairs of independent/dependent datasets generated when machining ferrous and nonferrous material. A very good performance of the neural network, in terms of contract with experimental data, was achieved. The model may be used for the testing and forecast of the complex relationship between dependent and the independent parameters in turning operations.
Keywords: Field data based model, Artificial neural network, Simulation, Convectional Turning, Material removal rate.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 197073 Overall Student Satisfaction at Tabor School of Education: An Examination of Key Factors Based on the AUSSE SEQ
Authors: Francisco Ben, Tracey Price, Chad Morrison, Victoria Warren, Willy Gollan, Robyn Dunbar, Frank Davies, Mark Sorrell
Abstract:
This paper focuses particularly on the educational aspects that contribute to the overall educational satisfaction rated by Tabor School of Education students who participated in the Australasian Survey of Student Engagement (AUSSE) conducted by the Australian Council for Educational Research (ACER) in 2010, 2012 and 2013. In all three years of participation, Tabor ranked first especially in the area of overall student satisfaction. By using a single level path analysis in relation to the AUSSE datasets collected using the Student Engagement Questionnaire (SEQ) for Tabor School of Education, seven aspects that contribute to overall student satisfaction have been identified. There appears to be a direct causal link between aspects of the Supportive Learning Environment, Work Integrated Learning, Career Readiness, Academic Challenge, and overall educational satisfaction levels. A further three aspects, being Student and Staff Interactions, Active Learning, and Enriching Educational Experiences, indirectly influence overall educational satisfaction levels.Keywords: Tertiary student satisfaction, tertiary educational experience, pre-service teacher education, path analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 123472 Predication Model for Leukemia Diseases Based on Data Mining Classification Algorithms with Best Accuracy
Authors: Fahd Sabry Esmail, M. Badr Senousy, Mohamed Ragaie
Abstract:
In recent years, there has been an explosion in the rate of using technology that help discovering the diseases. For example, DNA microarrays allow us for the first time to obtain a "global" view of the cell. It has great potential to provide accurate medical diagnosis, to help in finding the right treatment and cure for many diseases. Various classification algorithms can be applied on such micro-array datasets to devise methods that can predict the occurrence of Leukemia disease. In this study, we compared the classification accuracy and response time among eleven decision tree methods and six rule classifier methods using five performance criteria. The experiment results show that the performance of Random Tree is producing better result. Also it takes lowest time to build model in tree classifier. The classification rules algorithms such as nearest- neighbor-like algorithm (NNge) is the best algorithm due to the high accuracy and it takes lowest time to build model in classification.
Keywords: Data mining, classification techniques, decision tree, classification rule, leukemia diseases, microarray data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 255771 Development of Fake News Model Using Machine Learning through Natural Language Processing
Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini
Abstract:
Fake news detection research is still in the early stage as this is a relatively new phenomenon in the interest raised by society. Machine learning helps to solve complex problems and to build AI systems nowadays and especially in those cases where we have tacit knowledge or the knowledge that is not known. We used machine learning algorithms and for identification of fake news; we applied three classifiers; Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not completely correct in fake news detection because classification methods are not specialized for fake news. With the integration of machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of text and after that incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake due to the unavailability of corpora. We applied three different machine learning classifiers on two publicly available datasets. Experimental analysis based on the existing dataset indicates a very encouraging and improved performance.
Keywords: Fake news detection, types of fake news, machine learning, natural language processing, classification techniques.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 151270 A Neuron Model of Facial Recognition and Detection of an Authorized Entity Using Machine Learning System
Authors: J. K. Adedeji, M. O. Oyekanmi
Abstract:
This paper has critically examined the use of Machine Learning procedures in curbing unauthorized access into valuable areas of an organization. The use of passwords, pin codes, user’s identification in recent times has been partially successful in curbing crimes involving identities, hence the need for the design of a system which incorporates biometric characteristics such as DNA and pattern recognition of variations in facial expressions. The facial model used is the OpenCV library which is based on the use of certain physiological features, the Raspberry Pi 3 module is used to compile the OpenCV library, which extracts and stores the detected faces into the datasets directory through the use of camera. The model is trained with 50 epoch run in the database and recognized by the Local Binary Pattern Histogram (LBPH) recognizer contained in the OpenCV. The training algorithm used by the neural network is back propagation coded using python algorithmic language with 200 epoch runs to identify specific resemblance in the exclusive OR (XOR) output neurons. The research however confirmed that physiological parameters are better effective measures to curb crimes relating to identities.
Keywords: Biometric characters, facial recognition, neural network, OpenCV.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 695