Search results for: classifier ensemble
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 498

Search results for: classifier ensemble

438 A Machine Learning Approach to Detecting Evasive PDF Malware

Authors: Vareesha Masood, Ammara Gul, Nabeeha Areej, Muhammad Asif Masood, Hamna Imran

Abstract:

The universal use of PDF files has prompted hackers to use them for malicious intent by hiding malicious codes in their victim’s PDF machines. Machine learning has proven to be the most efficient in identifying benign files and detecting files with PDF malware. This paper has proposed an approach using a decision tree classifier with parameters. A modern, inclusive dataset CIC-Evasive-PDFMal2022, produced by Lockheed Martin’s Cyber Security wing is used. It is one of the most reliable datasets to use in this field. We designed a PDF malware detection system that achieved 99.2%. Comparing the suggested model to other cutting-edge models in the same study field, it has a great performance in detecting PDF malware. Accordingly, we provide the fastest, most reliable, and most efficient PDF Malware detection approach in this paper.

Keywords: PDF, PDF malware, decision tree classifier, random forest classifier

Procedia PDF Downloads 55
437 Ensemble Machine Learning Approach for Estimating Missing Data from CO₂ Time Series

Authors: Atbin Mahabbati, Jason Beringer, Matthias Leopold

Abstract:

To address the global challenges of climate and environmental changes, there is a need for quantifying and reducing uncertainties in environmental data, including observations of carbon, water, and energy. Global eddy covariance flux tower networks (FLUXNET), and their regional counterparts (i.e., OzFlux, AmeriFlux, China Flux, etc.) were established in the late 1990s and early 2000s to address the demand. Despite the capability of eddy covariance in validating process modelling analyses, field surveys and remote sensing assessments, there are some serious concerns regarding the challenges associated with the technique, e.g. data gaps and uncertainties. To address these concerns, this research has developed an ensemble model to fill the data gaps of CO₂ flux to avoid the limitations of using a single algorithm, and therefore, provide less error and decline the uncertainties associated with the gap-filling process. In this study, the data of five towers in the OzFlux Network (Alice Springs Mulga, Calperum, Gingin, Howard Springs and Tumbarumba) during 2013 were used to develop an ensemble machine learning model, using five feedforward neural networks (FFNN) with different structures combined with an eXtreme Gradient Boosting (XGB) algorithm. The former methods, FFNN, provided the primary estimations in the first layer, while the later, XGB, used the outputs of the first layer as its input to provide the final estimations of CO₂ flux. The introduced model showed slight superiority over each single FFNN and the XGB, while each of these two methods was used individually, overall RMSE: 2.64, 2.91, and 3.54 g C m⁻² yr⁻¹ respectively (3.54 provided by the best FFNN). The most significant improvement happened to the estimation of the extreme diurnal values (during midday and sunrise), as well as nocturnal estimations, which is generally considered as one of the most challenging parts of CO₂ flux gap-filling. The towers, as well as seasonality, showed different levels of sensitivity to improvements provided by the ensemble model. For instance, Tumbarumba showed more sensitivity compared to Calperum, where the differences between the Ensemble model on the one hand and the FFNNs and XGB, on the other hand, were the least of all 5 sites. Besides, the performance difference between the ensemble model and its components individually were more significant during the warm season (Jan, Feb, Mar, Oct, Nov, and Dec) compared to the cold season (Apr, May, Jun, Jul, Aug, and Sep) due to the higher amount of photosynthesis of plants, which led to a larger range of CO₂ exchange. In conclusion, the introduced ensemble model slightly improved the accuracy of CO₂ flux gap-filling and robustness of the model. Therefore, using ensemble machine learning models is potentially capable of improving data estimation and regression outcome when it seems to be no more room for improvement while using a single algorithm.

Keywords: carbon flux, Eddy covariance, extreme gradient boosting, gap-filling comparison, hybrid model, OzFlux network

Procedia PDF Downloads 109
436 Measurement of Coal Fineness, Air Fuel Ratio, and Fuel Weight Distribution in a Vertical Spindle Mill’s Pulverized Fuel Pipes at Classifier Vane 40%

Authors: Jayasiler Kunasagaram

Abstract:

In power generation, coal fineness is crucial to maintain flame stability, ensure combustion efficiency, and lower emissions to the environment. In order for the pulverized coal to react effectively in the boiler furnace, the size of coal particles needs to be at least 70% finer than 74 μm. This paper presents the experiment results of coal fineness, air fuel ratio and fuel weight distribution in pulverized fuel pipes at classifier vane 40%. The aim of this experiment is to extract the pulverized coal is kinetically and investigate the data accordingly. Dirty air velocity, coal sample extraction, and coal sieving experiments were performed to measure coal fineness. The experiment results show that required coal fineness can be achieved at 40 % classifier vane. However, this does not surpass the desired value by a great margin.

Keywords: coal power, emissions, isokinetic sampling, power generation

Procedia PDF Downloads 572
435 Triangular Geometric Feature for Offline Signature Verification

Authors: Zuraidasahana Zulkarnain, Mohd Shafry Mohd Rahim, Nor Anita Fairos Ismail, Mohd Azhar M. Arsad

Abstract:

Handwritten signature is accepted widely as a biometric characteristic for personal authentication. The use of appropriate features plays an important role in determining accuracy of signature verification; therefore, this paper presents a feature based on the geometrical concept. To achieve the aim, triangle attributes are exploited to design a new feature since the triangle possesses orientation, angle and transformation that would improve accuracy. The proposed feature uses triangulation geometric set comprising of sides, angles and perimeter of a triangle which is derived from the center of gravity of a signature image. For classification purpose, Euclidean classifier along with Voting-based classifier is used to verify the tendency of forgery signature. This classification process is experimented using triangular geometric feature and selected global features. Based on an experiment that was validated using Grupo de Senales 960 (GPDS-960) signature database, the proposed triangular geometric feature achieves a lower Average Error Rates (AER) value with a percentage of 34% as compared to 43% of the selected global feature. As a conclusion, the proposed triangular geometric feature proves to be a more reliable feature for accurate signature verification.

Keywords: biometrics, euclidean classifier, features extraction, offline signature verification, voting-based classifier

Procedia PDF Downloads 347
434 Classifier for Liver Ultrasound Images

Authors: Soumya Sajjan

Abstract:

Liver cancer is the most common cancer disease worldwide in men and women, and is one of the few cancers still on the rise. Liver disease is the 4th leading cause of death. According to new NHS (National Health Service) figures, deaths from liver diseases have reached record levels, rising by 25% in less than a decade; heavy drinking, obesity, and hepatitis are believed to be behind the rise. In this study, we focus on Development of Diagnostic Classifier for Ultrasound liver lesion. Ultrasound (US) Sonography is an easy-to-use and widely popular imaging modality because of its ability to visualize many human soft tissues/organs without any harmful effect. This paper will provide an overview of underlying concepts, along with algorithms for processing of liver ultrasound images Naturaly, Ultrasound liver lesion images are having more spackle noise. Developing classifier for ultrasound liver lesion image is a challenging task. We approach fully automatic machine learning system for developing this classifier. First, we segment the liver image by calculating the textural features from co-occurrence matrix and run length method. For classification, Support Vector Machine is used based on the risk bounds of statistical learning theory. The textural features for different features methods are given as input to the SVM individually. Performance analysis train and test datasets carried out separately using SVM Model. Whenever an ultrasonic liver lesion image is given to the SVM classifier system, the features are calculated, classified, as normal and diseased liver lesion. We hope the result will be helpful to the physician to identify the liver cancer in non-invasive method.

Keywords: segmentation, Support Vector Machine, ultrasound liver lesion, co-occurance Matrix

Procedia PDF Downloads 382
433 Non-Targeted Adversarial Image Classification Attack-Region Modification Methods

Authors: Bandar Alahmadi, Lethia Jackson

Abstract:

Machine Learning model is used today in many real-life applications. The safety and security of such model is important, so the results of the model are as accurate as possible. One challenge of machine learning model security is the adversarial examples attack. Adversarial examples are designed by the attacker to cause the machine learning model to misclassify the input. We propose a method to generate adversarial examples to attack image classifiers. We are modifying the successfully classified images, so a classifier misclassifies them after the modification. In our method, we do not update the whole image, but instead we detect the important region, modify it, place it back to the original image, and then run it through a classifier. The algorithm modifies the detected region using two methods. First, it will add abstract image matrix on back of the detected image matrix. Then, it will perform a rotation attack to rotate the detected region around its axes, and embed the trace of image in image background. Finally, the attacked region is placed in its original position, from where it was removed, and a smoothing filter is applied to smooth the background with foreground. We test our method in cascade classifier, and the algorithm is efficient, the classifier confident has dropped to almost zero. We also try it in CNN (Convolutional neural network) with higher setting and the algorithm was successfully worked.

Keywords: adversarial examples, attack, computer vision, image processing

Procedia PDF Downloads 309
432 An ANN Approach for Detection and Localization of Fatigue Damage in Aircraft Structures

Authors: Reza Rezaeipour Honarmandzad

Abstract:

In this paper we propose an ANN for detection and localization of fatigue damage in aircraft structures. We used network of piezoelectric transducers for Lamb-wave measurements in order to calculate damage indices. Data gathered by the sensors was given to neural network classifier. A set of neural network electors of different architecture cooperates to achieve consensus concerning the state of each monitored path. Sensed signal variations in the ROI, detected by the networks at each path, were used to assess the state of the structure as well as to localize detected damage and to filter out ambient changes. The classifier has been extensively tested on large data sets acquired in the tests of specimens with artificially introduced notches as well as the results of numerous fatigue experiments. Effect of the classifier structure and test data used for training on the results was evaluated.

Keywords: ANN, fatigue damage, aircraft structures, piezoelectric transducers, lamb-wave measurements

Procedia PDF Downloads 386
431 Breast Cancer Detection Using Machine Learning Algorithms

Authors: Jiwan Kumar, Pooja, Sandeep Negi, Anjum Rouf, Amit Kumar, Naveen Lakra

Abstract:

In modern times where, health issues are increasing day by day, breast cancer is also one of them, which is very crucial and really important to find in the early stages. Doctors can use this model in order to tell their patients whether a cancer is not harmful (benign) or harmful (malignant). We have used the knowledge of machine learning in order to produce the model. we have used algorithms like Logistic Regression, Random forest, support Vector Classifier, Bayesian Network and Radial Basis Function. We tried to use the data of crucial parts and show them the results in pictures in order to make it easier for doctors. By doing this, we're making ML better at finding breast cancer, which can lead to saving more lives and better health care.

Keywords: Bayesian network, radial basis function, ensemble learning, understandable, data making better, random forest, logistic regression, breast cancer

Procedia PDF Downloads 11
430 Study of Functional Relevant Conformational Mobility of β-2 Adrenoreceptor by Means of Molecular Dynamics Simulation

Authors: G. V. Novikov, V. S. Sivozhelezov, S. S. Kolesnikov, K. V. Shaitan

Abstract:

The study reports about the influence of binding of orthosteric ligands as well as point mutations on the conformational dynamics of β-2-adrenoreceptor. Using molecular dynamics simulation we found that there was a little fraction of active states of the receptor in its apo (ligand free) ensemble corresponded to its constitutive activity. Analysis of MD trajectories indicated that such spontaneous activation of the receptor is accompanied by the motion in intracellular part of its alpha-helices. Thus receptor’s constitutive activity directly results from its conformational dynamics. On the other hand the binding of a full agonist resulted in a significant shift of the initial equilibrium towards its active state. Finally, the binding of the inverse agonist stabilized the receptor in its inactive state. It is likely that the binding of inverse agonists might be a universal way of constitutive activity inhibition in vivo. Our results indicate that ligand binding redistribute pre-existing conformational degrees of freedom (in accordance to the Monod-Wyman-Changeux-Model) of the receptor rather than cause induced fit in it. Therefore, the ensemble of biologically relevant receptor conformations is encoded in its spatial structure, and individual conformations from that ensemble might be used by the cell in conformity with the physiological behaviour.

Keywords: seven-transmembrane receptors, constitutive activity, activation, x-ray crystallography, principal component analysis, molecular dynamics simulation

Procedia PDF Downloads 228
429 A Study on the Acquisition of Chinese Classifiers by Vietnamese Learners

Authors: Quoc Hung Le Pham

Abstract:

In the field of language study, classifier is an interesting research feature. In the world’s languages, some languages have classifier system, some do not. Mandarin Chinese and Vietnamese languages are a rich classifier system, however, because of the language system, the cognitive, cultural differences, so that the syntactic structure of classifier of them also dissimilar. When using Mandarin Chinese classifiers must collocate with nouns or verbs, in the lexical category it is not like nouns or verbs, belong to the open class. But some scholars believe that Mandarin Chinese measure words are similar to English and other Indo European languages. The word hanging on the structure and word formation (suffix), is a closed class. Compared to other languages, such as Chinese, Vietnamese, Thai and other Asian languages are still belonging to the classifier language’s second type, this type of language is classifier, it is in the majority of quantity must exist, and following deictic, anaphoric or quantity appearing together, not separation between its modified noun, also known as numeral classifier language. Main syntactic structure of Chinese classifiers are as follows: ‘quantity+measure+noun’, ‘pronoun+measure+noun’, ‘pronoun+quantity+measure+noun’, ‘prefix+quantity+measure +noun’, ‘quantity +adjective + measure +noun’, ‘ quantity (above 10 whole number), + duo (多)measure +noun’, ‘ quantity (around 10) + measure + duo (多) +noun’. Main syntactic structure of Vietnamese classifiers are: ‘quantity+measure+noun’, ‘ measure+noun+pronoun’, ‘quantity+measure+noun+pronoun’, ‘measure+noun+prefix+ quantity’, ‘quantity+measure+noun+adjective', ‘duo (多) +quanlity+measure+noun’, ‘quantity+measure+adjective+pronoun (quantity word could not be 1)’, ‘measure+adjective+pronoun’, ‘measure+pronoun’. In daily life, classifiers are commonly used, if Chinese learners failed to standardize this using catergory, because the negative impact might occur on their verbal communication. The richness of the Chinese classifier system contributes to the complexity in the study of the system by foreign learners, especially in the inter language of Vietnamese learners. As above mentioned, Vietnamese language also has a rich system of classifiers, however, the basic structure order of two languages are similar but both still have differences. These similarities and dissimilarities between Chinese and Vietnamese classifier systems contribute significantly to the common errors made by Vietnamese students while they acquire Chinese, which are distinct from the errors made by students from the other language background. This article from a comparative perspective of language, has an orientation towards Chinese and Vietnamese languages commonly used in classifiers semantics and structural form two aspects. This comparative study aims to identity Vietnamese students while learning Chinese classifiers may face some negative transference of mother language, beside that through the analysis of the classifiers questionnaire, find out the causes and patterns of the errors they made. As the preliminary analysis shows, Vietnamese students while learning Chinese classifiers made some errors such as: overuse classifier ‘ge’(个); misuse the other classifiers ‘*yi zhang ri ji’(yi pian ri ji), ‘*yi zuo fang zi’(yi jian fang zi), ‘*si zhang jin pai’(si mei jin pai); homonym words ‘dui, shuang, fu, tao’ (对、双、副、套), ‘ke, li’ (颗、粒).

Keywords: acquisition, classifiers, negative transfer, Vietnamse learners

Procedia PDF Downloads 420
428 Detecting Music Enjoyment Level Using Electroencephalogram Signals and Machine Learning Techniques

Authors: Raymond Feng, Shadi Ghiasi

Abstract:

An electroencephalogram (EEG) is a non-invasive technique that records electrical activity in the brain using scalp electrodes. Researchers have studied the use of EEG to detect emotions and moods by collecting signals from participants and analyzing how those signals correlate with their activities. In this study, researchers investigated the relationship between EEG signals and music enjoyment. Participants listened to music while data was collected. During the signal-processing phase, power spectral densities (PSDs) were computed from the signals, and dominant brainwave frequencies were extracted from the PSDs to form a comprehensive feature matrix. A machine learning approach was then taken to find correlations between the processed data and the music enjoyment level indicated by the participants. To improve on previous research, multiple machine learning models were employed, including K-Nearest Neighbors Classifier, Support Vector Classifier, and Decision Tree Classifier. Hyperparameters were used to fine-tune each model to further increase its performance. The experiments showed that a strong correlation exists, with the Decision Tree Classifier with hyperparameters yielding 85% accuracy. This study proves that EEG is a reliable means to detect music enjoyment and has future applications, including personalized music recommendation, mood adjustment, and mental health therapy.

Keywords: EEG, electroencephalogram, machine learning, mood, music enjoyment, physiological signals

Procedia PDF Downloads 21
427 Artificial Intelligence-Based Detection of Individuals Suffering from Vestibular Disorder

Authors: Dua Hişam, Serhat İkizoğlu

Abstract:

Identifying the problem behind balance disorder is one of the most interesting topics in the medical literature. This study has considerably enhanced the development of artificial intelligence (AI) algorithms applying multiple machine learning (ML) models to sensory data on gait collected from humans to classify between normal people and those suffering from Vestibular System (VS) problems. Although AI is widely utilized as a diagnostic tool in medicine, AI models have not been used to perform feature extraction and identify VS disorders through training on raw data. In this study, three machine learning (ML) models, the Random Forest Classifier (RF), Extreme Gradient Boosting (XGB), and K-Nearest Neighbor (KNN), have been trained to detect VS disorder, and the performance comparison of the algorithms has been made using accuracy, recall, precision, and f1-score. With an accuracy of 95.28 %, Random Forest Classifier (RF) was the most accurate model.

Keywords: vestibular disorder, machine learning, random forest classifier, k-nearest neighbor, extreme gradient boosting

Procedia PDF Downloads 42
426 Ethnic Identity Formation in Diaspora of Bajau Samah: An Ethnomusicological Study of Bertitik Music Ensemble in the Northwest Coast of Sabah, Malaysia

Authors: Mohd Hassan Abdullah, Mohd Azam Sulong, Mohd Nizam Nasrifan, Nor Azman Mohd Ramli, Suflan Faidzal Arshad

Abstract:

The Bajau Samah is a maritime ethnic community that inhabits the west coast of Sabah, Malaysia. The majority of these ethnicities embrace Islam and practice their own culture. Bertitik music ensemble is one of the musical practices performed in various social events, especially weddings. The ensemble, which combines several musical instruments including gongs, drums and kulintangan is played by six musicians to accompany various social events in the community. The position of the Bajau Samah in a multi-ethnic community such as Kadazandusun, Rungus, Suluk, Malay, Iranun and others exposes to the cultural activities with various artistic elements of the surrounding community. Western influences have also played an important role in the process of hybridity and acculturation in this society. Cultural change and the influx of foreign cultures have threatened the sustainability of this musical practice. This study aims to musicologically analyze the elements of bertitik ensemble that form the uniqueness of the cultural identity of the Bajau Samah Ethnic group. An ethnomusicological approach has been used to parse the essence of the bertitik music repertoire in depth. Ethnographic study design which comprises fieldwork, interviews, observations and document analysis as the main methods were utilized to collect data. Music recordings were transcribed in the form of musical notation and then analyzed based on the theory of "the norms of musical styles". This study reveals that musical elements featured in the ensemble represent the symbol and cultural identity to this ethnic group. The findings of the study were documented in the form of musicological analysis, audio and video as well as transcriptions of the musical notation of the repertoire of the music ensemble. This study is in line with the National cultural policy gazetted by the government, which is "Conservation, preservation and development of culture towards strengthening the foundations of National Culture through joint research, development, education, expansion and cultural relations" It will benefit various parties including students, teachers, academics, cultural arts activists and so on towards preserving the nation's cultural heritage as well as strengthening the spirit of nationhood among the people of various races and ethnic group in Malaysia.

Keywords: ethnomusicology, ethnic music, Malaysian music, cultural identity

Procedia PDF Downloads 110
425 Multiple Relaxation Times in the Gibbs Ensemble Monte Carlo Simulation of Phase Separation

Authors: Bina Kumari, Subir K. Sarkar, Pradipta Bandyopadhyay

Abstract:

The autocorrelation function of the density fluctuation is studied in each of the two phases in a Gibbs Ensemble Monte Carlo (GEMC) simulation of the problem of phase separation for a square well potential with various values of its range. We find that the normalized autocorrelation function is described very well as a linear combination of an exponential function with a time scale τ₂ and a stretched exponential function with a time scale τ₁ and an exponent α. Dependence of (α, τ₁, τ₂) on the parameters of the GEMC algorithm and the range of the square well potential is investigated and interpreted. We also analyse the issue of how to choose the parameters of the GEMC simulation optimally.

Keywords: autocorrelation function, density fluctuation, GEMC, simulation

Procedia PDF Downloads 156
424 Design of an Ensemble Learning Behavior Anomaly Detection Framework

Authors: Abdoulaye Diop, Nahid Emad, Thierry Winter, Mohamed Hilia

Abstract:

Data assets protection is a crucial issue in the cybersecurity field. Companies use logical access control tools to vault their information assets and protect them against external threats, but they lack solutions to counter insider threats. Nowadays, insider threats are the most significant concern of security analysts. They are mainly individuals with legitimate access to companies information systems, which use their rights with malicious intents. In several fields, behavior anomaly detection is the method used by cyber specialists to counter the threats of user malicious activities effectively. In this paper, we present the step toward the construction of a user and entity behavior analysis framework by proposing a behavior anomaly detection model. This model combines machine learning classification techniques and graph-based methods, relying on linear algebra and parallel computing techniques. We show the utility of an ensemble learning approach in this context. We present some detection methods tests results on an representative access control dataset. The use of some explored classifiers gives results up to 99% of accuracy.

Keywords: cybersecurity, data protection, access control, insider threat, user behavior analysis, ensemble learning, high performance computing

Procedia PDF Downloads 100
423 The Optimization of Decision Rules in Multimodal Decision-Level Fusion Scheme

Authors: Andrey V. Timofeev, Dmitry V. Egorov

Abstract:

This paper introduces an original method of parametric optimization of the structure for multimodal decision-level fusion scheme which combines the results of the partial solution of the classification task obtained from assembly of the mono-modal classifiers. As a result, a multimodal fusion classifier which has the minimum value of the total error rate has been obtained.

Keywords: classification accuracy, fusion solution, total error rate, multimodal fusion classifier

Procedia PDF Downloads 434
422 Pipat Ensemble and Music for Ligkey in Amphur Muaeng, Chachoengsao Province

Authors: Prasan Briboonnanggoul

Abstract:

The major objective of this research study was to explore some aspects of the performance culture of musical folk drama called Ligkey. This study was undertaken in an effect to focus on the specific functions of orchestra which accompanied Ligkey on Thai musical instruments in Chachoengsao Province. The process of study and exploration consisted of questionnaire, interview, a tape recording of an interview and photographs of performances which all of them were analyzed for the finding. The information obtained from the study indicated that Ligkey still received stable attention from people despite lesser performances affected by economics crisis. Almost all of the performances were organized and supported by both the public sector and the private sector. Based on the summary and finding of this study, a) there were ten Ligkey ensemble and ten orchestra which were Mon orchestra, not the precedent and the predecessor known as Thai orchestra; b) a variety of functions performed by musicians must harmonize discipline, punctuality, patience, no negligence, proficiency in performance; c) folklore melodies known as Plengnapad were performed as usual, but folklore melodies and songs known as Plangsongchan got lesser and got a tendency towards extinction because of the plot which corresponded with a market-driven entertainment. Therefore, a purpose-built schema of the preservation of Thai folklore songs was that they should have been recognized by both the performers and the audiences and patronized by the public sector via the government media to publicize the value of popular art form.

Keywords: Pipat Ensemble, Ligkey, Amphur Muaeng, Chachoengsao Province

Procedia PDF Downloads 301
421 SEMCPRA-Sar-Esembled Model for Climate Prediction in Remote Area

Authors: Kamalpreet Kaur, Renu Dhir

Abstract:

Climate prediction is an essential component of climate research, which helps evaluate possible effects on economies, communities, and ecosystems. Climate prediction involves short-term weather prediction, seasonal prediction, and long-term climate change prediction. Climate prediction can use the information gathered from satellites, ground-based stations, and ocean buoys, among other sources. The paper's four architectures, such as ResNet50, VGG19, Inception-v3, and Xception, have been combined using an ensemble approach for overall performance and robustness. An ensemble of different models makes a prediction, and the majority vote determines the final prediction. The various architectures such as ResNet50, VGG19, Inception-v3, and Xception efficiently classify the dataset RSI-CB256, which contains satellite images into cloudy and non-cloudy. The generated ensembled S-E model (Sar-ensembled model) provides an accuracy of 99.25%.

Keywords: climate, satellite images, prediction, classification

Procedia PDF Downloads 33
420 A Supervised Approach for Word Sense Disambiguation Based on Arabic Diacritics

Authors: Alaa Alrakaf, Sk. Md. Mizanur Rahman

Abstract:

Since the last two decades’ Arabic natural language processing (ANLP) has become increasingly much more important. One of the key issues related to ANLP is ambiguity. In Arabic language different pronunciation of one word may have a different meaning. Furthermore, ambiguity also has an impact on the effectiveness and efficiency of Machine Translation (MT). The issue of ambiguity has limited the usefulness and accuracy of the translation from Arabic to English. The lack of Arabic resources makes ambiguity problem more complicated. Additionally, the orthographic level of representation cannot specify the exact meaning of the word. This paper looked at the diacritics of Arabic language and used them to disambiguate a word. The proposed approach of word sense disambiguation used Diacritizer application to Diacritize Arabic text then found the most accurate sense of an ambiguous word using Naïve Bayes Classifier. Our Experimental study proves that using Arabic Diacritics with Naïve Bayes Classifier enhances the accuracy of choosing the appropriate sense by 23% and also decreases the ambiguity in machine translation.

Keywords: Arabic natural language processing, machine learning, machine translation, Naive bayes classifier, word sense disambiguation

Procedia PDF Downloads 327
419 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy

Authors: Kemal Polat

Abstract:

In this paper, three feature weighting methods have been used to improve the classification performance of diabetic retinopathy (DR). To classify the diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific and anatomical components, have been used and fed them into the classifier algorithms. The dataset used in this study has been taken from University of California, Irvine (UCI) machine learning repository. Feature weighting methods including the fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting, have been used and compered with each other in the classification of DR. After feature weighting, five different classifier algorithms comprising multi-layer perceptron (MLP), k- nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes have been used. The hybrid method based on combination of subtractive clustering based feature weighting and decision tree classifier has been obtained the classification accuracy of 100% in the screening of DR. These results have demonstrated that the proposed hybrid scheme is very promising in the medical data set classification.

Keywords: machine learning, data weighting, classification, data mining

Procedia PDF Downloads 301
418 Automated Localization of Palpebral Conjunctiva and Hemoglobin Determination Using Smart Phone Camera

Authors: Faraz Tahir, M. Usman Akram, Albab Ahmad Khan, Mujahid Abbass, Ahmad Tariq, Nuzhat Qaiser

Abstract:

The objective of this study was to evaluate the Degree of anemia by taking the picture of the palpebral conjunctiva using Smartphone Camera. We have first localized the region of interest from the image and then extracted certain features from that Region of interest and trained SVM classifier on those features and then, as a result, our system classifies the image in real-time on their level of hemoglobin. The proposed system has given an accuracy of 70%. We have trained our classifier on a locally gathered dataset of 30 patients.

Keywords: anemia, palpebral conjunctiva, SVM, smartphone

Procedia PDF Downloads 453
417 Machine Learning Model to Predict TB Bacteria-Resistant Drugs from TB Isolates

Authors: Rosa Tsegaye Aga, Xuan Jiang, Pavel Vazquez Faci, Siqing Liu, Simon Rayner, Endalkachew Alemu, Markos Abebe

Abstract:

Tuberculosis (TB) is a major cause of disease globally. In most cases, TB is treatable and curable, but only with the proper treatment. There is a time when drug-resistant TB occurs when bacteria become resistant to the drugs that are used to treat TB. Current strategies to identify drug-resistant TB bacteria are laboratory-based, and it takes a longer time to identify the drug-resistant bacteria and treat the patient accordingly. But machine learning (ML) and data science approaches can offer new approaches to the problem. In this study, we propose to develop an ML-based model to predict the antibiotic resistance phenotypes of TB isolates in minutes and give the right treatment to the patient immediately. The study has been using the whole genome sequence (WGS) of TB isolates as training data that have been extracted from the NCBI repository and contain different countries’ samples to build the ML models. The reason that different countries’ samples have been included is to generalize the large group of TB isolates from different regions in the world. This supports the model to train different behaviors of the TB bacteria and makes the model robust. The model training has been considering three pieces of information that have been extracted from the WGS data to train the model. These are all variants that have been found within the candidate genes (F1), predetermined resistance-associated variants (F2), and only resistance-associated gene information for the particular drug. Two major datasets have been constructed using these three information. F1 and F2 information have been considered as two independent datasets, and the third information is used as a class to label the two datasets. Five machine learning algorithms have been considered to train the model. These are Support Vector Machine (SVM), Random forest (RF), Logistic regression (LR), Gradient Boosting, and Ada boost algorithms. The models have been trained on the datasets F1, F2, and F1F2 that is the F1 and the F2 dataset merged. Additionally, an ensemble approach has been used to train the model. The ensemble approach has been considered to run F1 and F2 datasets on gradient boosting algorithm and use the output as one dataset that is called F1F2 ensemble dataset and train a model using this dataset on the five algorithms. As the experiment shows, the ensemble approach model that has been trained on the Gradient Boosting algorithm outperformed the rest of the models. In conclusion, this study suggests the ensemble approach, that is, the RF + Gradient boosting model, to predict the antibiotic resistance phenotypes of TB isolates by outperforming the rest of the models.

Keywords: machine learning, MTB, WGS, drug resistant TB

Procedia PDF Downloads 21
416 Optimizing Approach for Sifting Process to Solve a Common Type of Empirical Mode Decomposition Mode Mixing

Authors: Saad Al-Baddai, Karema Al-Subari, Elmar Lang, Bernd Ludwig

Abstract:

Empirical mode decomposition (EMD), a new data-driven of time-series decomposition, has the advantage of supposing that a time series is non-linear or non-stationary, as is implicitly achieved in Fourier decomposition. However, the EMD suffers of mode mixing problem in some cases. The aim of this paper is to present a solution for a common type of signals causing of EMD mode mixing problem, in case a signal suffers of an intermittency. By an artificial example, the solution shows superior performance in terms of cope EMD mode mixing problem comparing with the conventional EMD and Ensemble Empirical Mode decomposition (EEMD). Furthermore, the over-sifting problem is also completely avoided; and computation load is reduced roughly six times compared with EEMD, an ensemble number of 50.

Keywords: empirical mode decomposition (EMD), mode mixing, sifting process, over-sifting

Procedia PDF Downloads 362
415 Adversarial Disentanglement Using Latent Classifier for Pose-Independent Representation

Authors: Hamed Alqahtani, Manolya Kavakli-Thorne

Abstract:

The large pose discrepancy is one of the critical challenges in face recognition during video surveillance. Due to the entanglement of pose attributes with identity information, the conventional approaches for pose-independent representation lack in providing quality results in recognizing largely posed faces. In this paper, we propose a practical approach to disentangle the pose attribute from the identity information followed by synthesis of a face using a classifier network in latent space. The proposed approach employs a modified generative adversarial network framework consisting of an encoder-decoder structure embedded with a classifier in manifold space for carrying out factorization on the latent encoding. It can be further generalized to other face and non-face attributes for real-life video frames containing faces with significant attribute variations. Experimental results and comparison with state of the art in the field prove that the learned representation of the proposed approach synthesizes more compelling perceptual images through a combination of adversarial and classification losses.

Keywords: disentanglement, face detection, generative adversarial networks, video surveillance

Procedia PDF Downloads 92
414 The Design of a Vehicle Traffic Flow Prediction Model for a Gauteng Freeway Based on an Ensemble of Multi-Layer Perceptron

Authors: Tebogo Emma Makaba, Barnabas Ndlovu Gatsheni

Abstract:

The cities of Johannesburg and Pretoria both located in the Gauteng province are separated by a distance of 58 km. The traffic queues on the Ben Schoeman freeway which connects these two cities can stretch for almost 1.5 km. Vehicle traffic congestion impacts negatively on the business and the commuter’s quality of life. The goal of this paper is to identify variables that influence the flow of traffic and to design a vehicle traffic prediction model, which will predict the traffic flow pattern in advance. The model will unable motorist to be able to make appropriate travel decisions ahead of time. The data used was collected by Mikro’s Traffic Monitoring (MTM). Multi-Layer perceptron (MLP) was used individually to construct the model and the MLP was also combined with Bagging ensemble method to training the data. The cross—validation method was used for evaluating the models. The results obtained from the techniques were compared using predictive and prediction costs. The cost was computed using combination of the loss matrix and the confusion matrix. The predicted models designed shows that the status of the traffic flow on the freeway can be predicted using the following parameters travel time, average speed, traffic volume and day of month. The implications of this work is that commuters will be able to spend less time travelling on the route and spend time with their families. The logistics industry will save more than twice what they are currently spending.

Keywords: bagging ensemble methods, confusion matrix, multi-layer perceptron, vehicle traffic flow

Procedia PDF Downloads 313
413 Enhanced Extra Trees Classifier for Epileptic Seizure Prediction

Authors: Maurice Ntahobari, Levin Kuhlmann, Mario Boley, Zhinoos Razavi Hesabi

Abstract:

For machine learning based epileptic seizure prediction, it is important for the model to be implemented in small implantable or wearable devices that can be used to monitor epilepsy patients; however, current state-of-the-art methods are complex and computationally intensive. We use Shapley Additive Explanation (SHAP) to find relevant intracranial electroencephalogram (iEEG) features and improve the computational efficiency of a state-of-the-art seizure prediction method based on the extra trees classifier while maintaining prediction performance. Results for a small contest dataset and a much larger dataset with continuous recordings of up to 3 years per patient from 15 patients yield better than chance prediction performance (p < 0.004). Moreover, while the performance of the SHAP-based model is comparable to that of the benchmark, the overall training and prediction time of the model has been reduced by a factor of 1.83. It can also be noted that the feature called zero crossing value is the best EEG feature for seizure prediction. These results suggest state-of-the-art seizure prediction performance can be achieved using efficient methods based on optimal feature selection.

Keywords: machine learning, seizure prediction, extra tree classifier, SHAP, epilepsy

Procedia PDF Downloads 79
412 Enhancing Sell-In and Sell-Out Forecasting Using Ensemble Machine Learning Method

Authors: Vishal Das, Tianyi Mao, Zhicheng Geng, Carmen Flores, Diego Pelloso, Fang Wang

Abstract:

Accurate sell-in and sell-out forecasting is a ubiquitous problem in the retail industry. It is an important element of any demand planning activity. As a global food and beverage company, Nestlé has hundreds of products in each geographical location that they operate in. Each product has its sell-in and sell-out time series data, which are forecasted on a weekly and monthly scale for demand and financial planning. To address this challenge, Nestlé Chilein collaboration with Amazon Machine Learning Solutions Labhas developed their in-house solution of using machine learning models for forecasting. Similar products are combined together such that there is one model for each product category. In this way, the models learn from a larger set of data, and there are fewer models to maintain. The solution is scalable to all product categories and is developed to be flexible enough to include any new product or eliminate any existing product in a product category based on requirements. We show how we can use the machine learning development environment on Amazon Web Services (AWS) to explore a set of forecasting models and create business intelligence dashboards that can be used with the existing demand planning tools in Nestlé. We explored recent deep learning networks (DNN), which show promising results for a variety of time series forecasting problems. Specifically, we used a DeepAR autoregressive model that can group similar time series together and provide robust predictions. To further enhance the accuracy of the predictions and include domain-specific knowledge, we designed an ensemble approach using DeepAR and XGBoost regression model. As part of the ensemble approach, we interlinked the sell-out and sell-in information to ensure that a future sell-out influences the current sell-in predictions. Our approach outperforms the benchmark statistical models by more than 50%. The machine learning (ML) pipeline implemented in the cloud is currently being extended for other product categories and is getting adopted by other geomarkets.

Keywords: sell-in and sell-out forecasting, demand planning, DeepAR, retail, ensemble machine learning, time-series

Procedia PDF Downloads 213
411 Myers-Briggs Type Index Personality Type Classification Based on an Individual’s Spotify Playlists

Authors: Sefik Can Karakaya, Ibrahim Demir

Abstract:

In this study, the relationship between musical preferences and personality traits has been investigated in terms of Spotify audio analysis features. The aim of this paper is to build such a classifier capable of segmenting people into their Myers-Briggs Type Index (MBTI) personality type based on their Spotify playlists. Music takes an important place in the lives of people all over the world and online music streaming platforms make it easier to reach musical contents. In this context, the motivation to build such a classifier is allowing people to gain access to their MBTI personality type and perhaps for more reliably and more quickly. For this purpose, logistic regression and deep neural networks have been selected for classifier and their performances are compared. In conclusion, it has been found that musical preferences differ statistically between personality traits, and evaluated models are able to distinguish personality types based on given musical data structure with over %60 accuracy rate.

Keywords: myers-briggs type indicator, music psychology, Spotify, behavioural user profiling, deep neural networks, logistic regression

Procedia PDF Downloads 103
410 Intelligent Rheumatoid Arthritis Identification System Based Image Processing and Neural Classifier

Authors: Abdulkader Helwan

Abstract:

Rheumatoid joint inflammation is characterized as a perpetual incendiary issue which influences the joints by hurting body tissues Therefore, there is an urgent need for an effective intelligent identification system of knee Rheumatoid arthritis especially in its early stages. This paper is to develop a new intelligent system for the identification of Rheumatoid arthritis of the knee utilizing image processing techniques and neural classifier. The system involves two principle stages. The first one is the image processing stage in which the images are processed using some techniques such as RGB to gryascale conversion, rescaling, median filtering, background extracting, images subtracting, segmentation using canny edge detection, and features extraction using pattern averaging. The extracted features are used then as inputs for the neural network which classifies the X-ray knee images as normal or abnormal (arthritic) based on a backpropagation learning algorithm which involves training of the network on 400 X-ray normal and abnormal knee images. The system was tested on 400 x-ray images and the network shows good performance during that phase, resulting in a good identification rate 97%.

Keywords: rheumatoid arthritis, intelligent identification, neural classifier, segmentation, backpropoagation

Procedia PDF Downloads 508
409 Coal Preparation Plant:Technology Overview and New Adaptations

Authors: Amit Kumar Sinha

Abstract:

A coal preparation plant typically operates with multiple beneficiation circuits to process individual size fractions of coal obtained from mine so that the targeted overall plant efficiency in terms of yield and ash is achieved. Conventional coal beneficiation plant in India or overseas operates generally in two methods of processing; coarse beneficiation with treatment in dense medium cyclones or in baths and fines beneficiation with treatment in flotation cell. This paper seeks to address the proven application of intermediate circuit along with coarse and fines circuit in Jamadoba New Coal Preparation Plant of capacity 2 Mt/y to treat -0.5 mm+0.25 mm size particles in reflux classifier. Previously this size of particles was treated directly in Flotation cell which had operational and metallurgical limitations which will be discussed in brief in this paper. The paper also details test work results performed on the representative samples of TSL coal washeries to determine the top size of intermediate and fines circuit and discusses about the overlapping process of intermediate circuit and how it is process wise suitable to beneficiate misplaced particles from coarse circuit and fines circuit. This paper also compares the separation efficiency (Ep) of various intermediate circuit process equipment and tries to validate the use of reflux classifier over fine coal DMC or spirals. An overview of Modern coal preparation plant treating Indian coal especially Washery Grade IV coal with reference to Jamadoba New Coal Preparation Plant which was commissioned in 2018 with basis of selection of equipment and plant profile, application of reflux classifier in intermediate circuit and process design criteria is also outlined in this paper.

Keywords: intermediate circuit, overlapping process, reflux classifier

Procedia PDF Downloads 107