Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 340

Search results for: supervised classification

280 A Review of Deep Learning Methods in Computer-Aided Detection and Diagnosis Systems based on Whole Mammogram and Ultrasound Scan Classification

Abstract:

Breast cancer remains one of the deadliest cancers for women worldwide, with the risk of developing tumors being as high as 50 percent in Sub-Saharan African countries such as Kenya. As many as 42 percent of these cases are likely to be diagnosed late, when the cancer has metastasized and/or the prognosis has become terminal. Full Field Digital [FFD] Mammography therefore remains an effective screening technique: it enables early detection, at which point successful interventions can often be made to control or eliminate the tumors altogether. FFD mammograms have been shown to be considerably more effective when used together with Computer-Aided Detection and Diagnosis [CADe] systems. These systems rely on algorithmic implementations of Deep Learning techniques in Computer Vision to carry out deep pattern recognition comparable to that of a human radiologist. They determine whether specific areas of interest in the mammogram show abnormalities and, if so, whether these abnormalities indicate a benign or malignant tumor. In this paper, we review emergent Deep Learning techniques that are expected to be relevant to the development of state-of-the-art FFD mammogram CADe systems. These techniques span self-supervised learning for context-encoded occlusion, self-supervised learning for pre-processing and labeling automation, and the creation of a standardized large-scale mammography dataset as a benchmark for evaluating CADe systems. Finally, existing practices that pre-date these techniques are compared with the way CADe systems that incorporate them will be developed.
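
As a hedged illustration of the self-supervised "context-encoded occlusion" idea mentioned above, the sketch below builds one training pair for a context-encoder-style pretext task. A square patch is hidden, and it becomes the reconstruction target, so no radiologist labels are needed. Several details are assumptions for illustration, not the reviewed papers' implementation: the helper name `make_context_encoder_pair`, the mean-value fill, and the random array standing in for a mammogram tile.

```python
import numpy as np

def make_context_encoder_pair(image, patch_size, rng):
    """Hypothetical helper: hide a random square patch of a 2-D image.

    Returns (masked_input, target_patch, (row, col)). A context encoder would be
    trained to reconstruct target_patch from masked_input.
    """
    h, w = image.shape
    row = int(rng.integers(0, h - patch_size + 1))
    col = int(rng.integers(0, w - patch_size + 1))
    target = image[row:row + patch_size, col:col + patch_size].copy()
    masked = image.copy()
    # Fill the hidden region with the image mean so its content is not visible
    masked[row:row + patch_size, col:col + patch_size] = image.mean()
    return masked, target, (row, col)

rng = np.random.default_rng(0)
scan = rng.random((64, 64))  # stand-in for a pre-processed mammogram tile
masked, target, (r, c) = make_context_encoder_pair(scan, 16, rng)
print(target.shape)  # (16, 16)
```

A network trained on many such pairs learns image features from unlabeled scans. Those features could later be fine-tuned for the supervised benign/malignant decision.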

Keywords: breast cancer diagnosis, computer aided detection and diagnosis, deep learning, whole mammogram classification, ultrasound classification, computer vision

279 Early Gastric Cancer Prediction from Diet and Epidemiological Data Using Machine Learning in Mizoram Population

Authors: Brindha Senthil Kumar, Payel Chakraborty, Senthil Kumar Nachimuthu, Arindam Maitra, Prem Nath

Abstract:

Compared with other cancer types, gastric cancer is predominantly caused by demographic and dietary factors. The aim of the study is to predict Early Gastric Cancer (EGC) from diet and lifestyle factors using supervised machine learning algorithms. For this study, 160 healthy individuals and 80 cases were selected, all of whom had been followed for 3 years (2016-2019) at Civil Hospital, Aizawl, Mizoram. A dataset of 11 features that are core risk factors for gastric cancer was extracted. Five supervised machine learning algorithms (Logistic Regression, Naive Bayes, Support Vector Machine (SVM), Multilayer Perceptron, and Random Forest) were used to analyze the dataset in Python Jupyter Notebook Version 3. The classification results were evaluated using the following metrics: minimum_false_positives, brier_score, accuracy, precision, recall, F1_score, and the Receiver Operating Characteristic (ROC) curve. The accuracy (in percent) and Brier score of each algorithm were: Naive Bayes, 88 and 0.11; Random Forest, 83 and 0.16; SVM, 77 and 0.22; Logistic Regression, 75 and 0.25; and Multilayer Perceptron, 72 and 0.27. The Naive Bayes algorithm performed best, with very low false positive rates, a low Brier score, and good accuracy. Using only diet and lifestyle factors, its results in predicting EGC were very satisfactory. These results will help physicians educate patients and the public, so this knowledge mining work may help reduce or avoid gastric cancer mortality.
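
The Brier score reported above is the mean squared difference between a classifier's predicted probability of the positive class (a cancer case) and the observed 0/1 outcome. Lower is better, and 0 is a perfect probabilistic forecast. Below is a minimal pure-Python sketch. The function names are illustrative, and the numbers are made up rather than taken from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)

# Illustrative values only (not data from the study)
y = [1, 0, 1, 1, 0]
p = [0.9, 0.2, 0.6, 0.4, 0.1]
print(round(brier_score(y, p), 3))  # 0.116
print(accuracy(y, p))               # 0.8
```

Unlike accuracy, the Brier score also penalizes confident wrong probabilities. This is why it can separate models whose accuracies are close.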

Keywords: Early Gastric cancer, Machine Learning, Diet, Lifestyle Characteristics

278 An Integrated Label Propagation Network for Structural Condition Assessment

Authors: Qingsong Xiong, Cheng Yuan, Qingzhao Kong, Haibei Xiong

Abstract:

Deep-learning-driven approaches based on vibration responses have attracted increasing attention for rapid structural condition assessment. However, obtaining sufficient measured training data with corresponding labels is costly, and often impossible, in practical engineering. This study proposes an integrated label propagation network for structural condition assessment. The network propagates labels from continuously generated measurements of the intact structure to the unlabeled measurements of damage scenarios. It combines damage-sensitive feature extraction by a deep autoencoder with pseudo-label propagation by optimized fuzzy clustering, and its architecture and mechanism are described in detail. Through a sophisticated network design and specific strategies for improving performance, the network integrates the strengths of self-supervised representation learning, unsupervised fuzzy clustering, and supervised classification algorithms to assess damage conditions. Both numerical simulations and full-scale laboratory shaking table tests of a two-story building structure were conducted to validate its capability of detecting post-earthquake damage. The identification accuracy of the network was 0.95 in the numerical validations and averaged 0.86 in the laboratory case studies. Notably, training all models in the network does not rely on any labeled data from damage scenarios, only on several samples from the intact structure. This indicates strong adaptability and practical applicability.
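
The pseudo-label step can be pictured with standard fuzzy c-means. Each sample receives a graded membership in every cluster, and the cluster with the highest membership serves as its pseudo-label. This is a minimal NumPy sketch of textbook fuzzy c-means, not the authors' "optimized" variant. The fuzzifier `m=2`, the two synthetic feature clusters standing in for autoencoder outputs, and all names are assumptions.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means. Returns (centers, U), where U[i, k] is the
    membership of sample i in cluster k and each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centres are membership-weighted means of the samples
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Sample-to-centre distances; the epsilon avoids division by zero
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

rng = np.random.default_rng(1)
intact = rng.normal(0.0, 0.1, size=(20, 2))    # stand-in features, intact state
damaged = rng.normal(3.0, 0.1, size=(20, 2))   # stand-in features, unknown state
X = np.vstack([intact, damaged])
centers, U = fuzzy_c_means(X, n_clusters=2)
pseudo_labels = U.argmax(axis=1)
```

The cluster that contains the known intact samples can be named "intact". The other cluster becomes a pseudo-labelled damage class on which a supervised classifier could then be trained.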

Keywords: autoencoder, condition assessment, fuzzy clustering, label propagation

277 Normalizing Flow to Augmented Posterior: Conditional Density Estimation with Interpretable Dimension Reduction for High Dimensional Data

Authors: Cheng Zeng, George Michailidis, Hitoshi Iyatomi, Leo L. Duan

Abstract:

The conditional density characterizes the distribution of a response variable y given a predictor x and plays a key role in many statistical tasks, including classification and outlier detection. There has been abundant work on Conditional Density Estimation (CDE) for a low-dimensional response in the presence of a high-dimensional predictor. In contrast, little work has been done for high-dimensional responses such as images. The promising performance of normalizing flow (NF) neural networks in unconditional density estimation serves as a motivating starting point. In this work, the authors extend NF neural networks to the setting where an external x is present. Specifically, they use the NF to parameterize a one-to-one transform between a high-dimensional y and a latent z that comprises two components [zₚ, zₙ]. The zₚ component is a low-dimensional subvector obtained from the posterior distribution of an elementary predictive model for x, such as logistic/linear regression. The zₙ component is a high-dimensional independent Gaussian vector, which explains the variations in y that are unrelated or only weakly related to x. The proposed approach is coined Augmented Posterior CDE (AP-CDE). Unlike existing CDE methods, it requires only a simple modification of the common normalizing flow framework. At the same time, it significantly improves the interpretability of the latent component, since zₚ represents a supervised dimension reduction. In image analytics applications, AP-CDE shows good separation of x-related variations, due to factors such as lighting condition and subject ID, from the other random variations. Further, the experiments show that an unconditional NF neural network based on an unsupervised model of z, such as a Gaussian mixture, fails to generate interpretable results.
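
For readers who want the density such a model evaluates, the change-of-variables formula for a bijective flow f can be written as below. This is a sketch under several stated assumptions:

- The flow maps y to the latent [z_p, z_n], the abstract's [zₚ, zₙ].
- z_p and z_n are independent given x.
- z_n is standard Gaussian.
- The posterior of z_p comes from an elementary predictive model p(x | z_p).

The exact AP-CDE formulation may differ.

```latex
\begin{aligned}
[\mathbf{z}_p, \mathbf{z}_n] &= f(\mathbf{y}), \\
p(\mathbf{y} \mid \mathbf{x}) &= p(\mathbf{z}_p \mid \mathbf{x})\,
  \mathcal{N}(\mathbf{z}_n;\, \mathbf{0}, \mathbf{I})\,
  \left| \det \frac{\partial f(\mathbf{y})}{\partial \mathbf{y}} \right|, \\
p(\mathbf{z}_p \mid \mathbf{x}) &\propto p(\mathbf{x} \mid \mathbf{z}_p)\, p(\mathbf{z}_p).
\end{aligned}
```

Dropping x and replacing p(z_p | x) with an unconditional prior recovers an ordinary normalizing flow.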

Keywords: conditional density estimation, image generation, normalizing flow, supervised dimension reduction

276 Machine Learning Techniques for COVID-19 Detection: A Comparative Analysis

Authors: Abeer A. Aljohani

Abstract:

The spread of COVID-19 has been one of the most severe pandemics across the globe. COVID-19 is a contagious disease caused by a coronavirus that continuously mutates into numerous variants. At the time of writing, the B.1.1.529 variant, labeled Omicron, had been detected in South Africa. The huge spread of COVID-19 has affected many lives and has placed exceptional pressure on healthcare systems worldwide. Everyday life and the global economy have also been at stake. This research aims to predict COVID-19 at its initial stage to reduce the death count. Machine learning (ML) is now used in almost every area. The large number of COVID-19 cases has placed a huge burden on hospitals and health workers. To reduce this burden, this paper predicts COVID-19 from the symptoms and medical history of the patient. This research presents a novel architecture for COVID-19 detection using ML techniques integrated with feature dimensionality reduction. It uses a standard UCI dataset comprising the symptoms of 5434 patients, and it compares several supervised ML techniques with the presented architecture. The architecture uses 10-fold cross-validation for generalization and principal component analysis (PCA) for feature reduction. Standard metrics are used to evaluate the proposed architecture: F1-score, precision, accuracy, recall, the receiver operating characteristic (ROC), and the area under the curve (AUC). The results show that decision tree, random forest, and neural network models outperform the other state-of-the-art ML techniques compared. These results can help identify COVID-19 infection cases effectively.
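
The 10-fold cross-validation step can be sketched as follows. The samples are split into ten folds, and each fold serves once as the test set while the other nine are used for training. The function name and the contiguous fold layout are assumptions. In practice the samples would be shuffled (or stratified by class) first. PCA would also be fitted on the training indices only, so that the test fold does not leak into the feature reduction.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size.

    Every sample appears in exactly one test fold.
    """
    # The first n_samples % k folds receive one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# 5434 patients, as in the dataset described above
folds = list(k_fold_indices(5434, k=10))
print([len(test) for _, test in folds])
# [544, 544, 544, 544, 543, 543, 543, 543, 543, 543]
```

The reported metrics would then be averaged over the ten test folds.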

Keywords: supervised machine learning, COVID-19 prediction, healthcare analytics, random forest, neural network

275 Predicting Loss of Containment in Surface Pipeline using Computational Fluid Dynamics and Supervised Machine Learning Model to Improve Process Safety in Oil and Gas Operations

Authors: Muhammmad Riandhy Anindika Yudhy, Harry Patria, Ramadhani Santoso

Abstract:

Loss of containment is the primary hazard that process safety management is concerned with in the oil and gas industry. All escalation to more serious consequences begins with loss of containment. Oil and gas released through leakage or spillage from primary containment can result in pool fires, jet fires, and even explosions when it meets the various ignition sources present in operations. Therefore, the heart of process safety management is avoiding loss of containment and mitigating its impact through the implementation of safeguards. The most effective safeguard in this case is an early detection system that alerts Operations to take action before a potential loss of containment occurs. The value of such a system increases when it is applied to a long surface pipeline. Such a pipeline is naturally difficult to monitor at all times and is exposed to multiple causes of loss of containment, from natural corrosion to illegal tapping. Prior research has shown that detecting loss of containment accurately in surface pipelines is difficult. The trade-off between cost-effectiveness and high accuracy has been the main issue when selecting a traditional detection method. The current best-performing method, the Real-Time Transient Model (RTTM), requires analysis of closely positioned pressure, flow, and temperature (PVT) points in the pipeline to be accurate. Installing multiple adjacent PVT sensors along the pipeline is expensive and hence generally not viable from an economic standpoint. A conceptual approach that combines mathematical modeling using computational fluid dynamics with a supervised machine learning model has shown promising results for predicting leakage in pipelines. Mathematical modeling is used to generate simulation data, which are then used to train the leak detection and localization models. Mathematical models and simulation software have also been shown to provide results comparable to experimental data, with very high levels of accuracy.
A supervised machine learning model requires a large training dataset to be accurate, and mathematical modeling has been shown to be able to generate the required datasets. This justifies applying data analytics to the development of model-based leak detection systems for petroleum pipelines. This paper reviews key leak detection strategies for oil and gas pipelines, with a specific focus on crude oil applications. It also presents the opportunities for using data analytics tools and mathematical modeling to develop robust real-time leak detection and localization systems for surface pipelines. A case study is also presented.

Keywords: pipeline, leakage, detection, AI

274 DNA Methylation Score Development for In utero Exposure to Paternal Smoking Using a Supervised Machine Learning Approach

Authors: Cristy Stagnar, Nina Hubig, Diana Ivankovic

Abstract:

The epigenome is a compelling candidate for mediating long-term responses to environmental effects that modify disease risk. The main goal of this research is to develop a machine-learning-based DNA methylation score. The score will be valuable in delineating the unique contribution of paternal epigenetic modifications to the germline that impact childhood health outcomes. It will also be a useful tool for validating self-reports of nonsmoking and for adjusting epigenome-wide DNA methylation association studies for this early-life exposure. The score uses secondary data from two population-based methylation profiling studies. It is based on CpG DNA methylation measurements from cord blood gathered from children whose fathers smoked pre- and peri-conceptually. Each child’s mother and father were assigned one of three class labels in the accompanying questionnaires: never smoker, former smoker, or current smoker. Different machine learning algorithms were applied to the Accessible Resource for Integrated Epigenomic Studies (ARIES) sub-study of the Avon Longitudinal Study of Parents and Children (ALSPAC) data set, which we used for training and testing our model. The best-performing algorithm for classifying the father-smoker and mother-never-smoker group was selected based on Cohen’s κ. Model error was identified and minimized. The final DNA methylation score was further tested and validated in an independent data set. This yielded a linear combination, via a logistic link function, of the methylation values of the selected probes that contributed most to classification, and it accurately classified each group. The result is a unique, robust DNA methylation score that combines information on DNA methylation and early-life exposure of offspring to paternal smoking during pregnancy. The score may be used to examine the paternal contribution to offspring health outcomes.
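
Two ingredients named above can be sketched in a few lines: Cohen's κ, used to choose the best algorithm, and a score formed by passing a linear combination of methylation values through a logistic link. This is a minimal pure-Python sketch. The probe weights, intercept, labels, and the function name `methylation_score` are hypothetical and are not the published score.

```python
import math
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance agreement."""
    n = len(y_true)
    observed = sum(a == b for a, b in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)

def methylation_score(beta_values, weights, intercept):
    """Hypothetical score: logistic link applied to a weighted sum of CpG values."""
    linear = intercept + sum(w * b for w, b in zip(weights, beta_values))
    return 1.0 / (1.0 + math.exp(-linear))

# Illustrative labels only: 1 = father smoker and mother never smoker, 0 = other
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
print(round(cohens_kappa(y_true, y_pred), 3))  # 0.333
```

κ discounts the agreement expected by chance, so it is less flattering than raw accuracy when class sizes are unbalanced.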

Keywords: epigenome, health outcomes, paternal preconception environmental exposures, supervised machine learning

273 Assessment of Rangeland Condition in a Dryland System Using UAV-Based Multispectral Imagery

Authors: Vistorina Amputu, Katja Tielboerger, Nichola Knox

Abstract:

Primary productivity in dry savannahs is constrained by moisture availability and is under increasing anthropogenic pressure. Climate change and the unprecedented pace and scale of rangeland deterioration place demands on assessment methods. Methods for assessing the status of such rangelands should be easy to apply and should yield reliable, repeatable results over large spatial scales. Rangelands are currently monitored at the global scale through satellite data and at the local scale through labor-intensive field measurements. Both are limited in their ability to capture the spatiotemporal heterogeneity of vegetation dynamics, and thus to provide the information needed to detect degradation in its early stages. Fortunately, newly emerging techniques provide an opportunity to transcend these limitations: unmanned aerial vehicles (UAVs), associated miniaturized sensors, and improving digital photogrammetric software. Yet these techniques have not been extensively calibrated in natural systems to capture their complexities, which is necessary if they are to be integrated into long-term monitoring. Limited research using drone technology has been conducted in arid savannas, for example to assess the health status of this dynamic two-layer vegetation ecosystem. In our study, we fill this gap by testing the relationship between UAV-estimated cover of rangeland functional attributes and field data collected in discrete sample plots. The plots lie along a degradation gradient in a Namibian dryland savannah. The first results are based on a supervised classification of the ultra-high-resolution multispectral imagery that distinguishes between rangeland functional attributes (bare, non-woody, and woody). This classification matches the field observations relatively well. Integrating UAV-based observations into rangeland monitoring could greatly assist climate-adapted rangeland management.
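
Comparing a supervised classification with plot data typically comes down to per-class cover fractions. The sketch below computes them from a classified image. The class codes 0/1/2 for bare, non-woody, and woody, the function name, and the toy plot are hypothetical. The field-side estimate would come from the plot surveys.

```python
import numpy as np

# Hypothetical class codes for the classified UAV image
CLASS_NAMES = {0: "bare", 1: "non-woody", 2: "woody"}

def cover_fractions(class_map):
    """Share of pixels assigned to each functional attribute in one plot."""
    counts = np.bincount(class_map.ravel(), minlength=len(CLASS_NAMES))
    return {name: float(counts[code]) / class_map.size
            for code, name in CLASS_NAMES.items()}

plot = np.array([[0, 0, 1, 2],
                 [1, 1, 2, 2]])  # toy 2 x 4 classified plot
print(cover_fractions(plot))  # {'bare': 0.25, 'non-woody': 0.375, 'woody': 0.375}
```

These fractions could then be compared, plot by plot, with the cover recorded in the field, for example through a correlation or a mean absolute difference.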

Keywords: arid savannah, degradation gradient, field observations, narrow-band sensor, supervised classification

272 An Approximation Technique to Automate Tron

Authors: P. Jayashree, S. Rajkumar

Abstract:

Virtual and augmented reality environments are booming as a way to provide lifelike experiences, and gaming is a major tool in supporting such learning environments. In this work, a variant of Voronoi heuristics employing supervised learning is proposed for the TRON game. The paper discusses the features that would be most useful when a machine learning bot is used as an opponent against a human player. The paper provides various game scenarios, the nature of the bot, and experimental results for the proposed variant. These show that the approach is better than those currently in use.
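
The classic Voronoi heuristic for Tron scores a position by counting the free cells each player can reach before the other; the proposed method is a variant of this heuristic. Below is a minimal pure-Python sketch using breadth-first search on a grid. The grid encoding ('.' free, anything else blocked) and the function names are assumptions. The supervised-learning component of the paper's variant is not shown.

```python
from collections import deque

def bfs_distances(grid, start):
    """Shortest path length from start to every reachable free cell ('.')."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == "." and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def voronoi_score(grid, me, opponent):
    """Cells I reach strictly first minus cells the opponent reaches strictly first."""
    mine = bfs_distances(grid, me)
    theirs = bfs_distances(grid, opponent)
    score = 0
    for cell in set(mine) | set(theirs):
        if cell in (me, opponent):
            continue  # the players' own head cells are not territory
        d_me = mine.get(cell, float("inf"))
        d_op = theirs.get(cell, float("inf"))
        if d_me < d_op:
            score += 1
        elif d_op < d_me:
            score -= 1
    return score

grid = [
    "A.#..",
    "..#..",
    "....B",
]
print(voronoi_score(grid, (0, 0), (2, 4)))  # -2
```

A game bot would evaluate this score for each candidate move and prefer moves that enlarge its own region. A learned variant could replace or weight this count.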

Keywords: artificial intelligence, automation, machine learning, TRON game, Voronoi heuristics

271 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling

Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal

Abstract:

Datasets, or collections, are becoming important assets in themselves and can now be accepted as a primary intellectual output of research. The quality and usage of datasets depend mainly on the context under which they have been collected, processed, analyzed, validated, and interpreted. This paper presents a collection of program educational objectives mapped to student outcomes, collected from self-study reports prepared by 32 engineering programs accredited by ABET. Manual mapping (classification) of this data is a notoriously tedious, time-consuming process. In addition, it requires experts in the area, who are mostly unavailable. The operational settings under which the collection was produced are described. The collection was cleansed and preprocessed, some features were selected, and preliminary exploratory data analysis was performed to illustrate the properties and usefulness of the collection. Finally, the collection was benchmarked using nine of the most widely used supervised multi-label classification techniques. These are Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors, and Back-Propagation Multi-Label Learning. The techniques were compared with each other using five well-known measures, including Accuracy, Hamming Loss, Micro-F, and Macro-F. The Ensemble of Classifier Chains and Ensemble of Pruned Sets achieved encouraging performance compared with the other multi-label classification methods tested, while the Classifier Chains method showed the worst performance. In summary, the benchmark achieved promising results. Together with the preliminary exploratory data analysis of the collection, it suggests new research directions and provides a baseline for future studies.
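
Three of the measures named above, Hamming Loss, Micro-F, and Macro-F, can be computed directly from binary label matrices. In these matrices, row i is an educational objective and column j is an outcome it maps to. Below is a minimal pure-Python sketch. The function names and the toy matrices are hypothetical.

```python
def hamming_loss(Y_true, Y_pred):
    """Fraction of (sample, label) slots predicted incorrectly."""
    n_samples, n_labels = len(Y_true), len(Y_true[0])
    wrong = sum(t != p
                for row_t, row_p in zip(Y_true, Y_pred)
                for t, p in zip(row_t, row_p))
    return wrong / (n_samples * n_labels)

def _f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(Y_true, Y_pred):
    """Micro-F pools counts over labels; Macro-F averages per-label F1."""
    counts = []
    for j in range(len(Y_true[0])):
        pairs = [(t[j], p[j]) for t, p in zip(Y_true, Y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        counts.append((tp, fp, fn))
    micro = _f1(*(sum(c[i] for c in counts) for i in range(3)))
    macro = sum(_f1(*c) for c in counts) / len(counts)
    return micro, macro

# Rows: hypothetical objectives; columns: three outcomes (1 = mapped)
Y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
Y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(round(hamming_loss(Y_true, Y_pred), 3))  # 0.222
micro, macro = micro_macro_f1(Y_true, Y_pred)
print(micro, round(macro, 3))  # 0.75 0.556
```

Macro-F falls below Micro-F here because the third outcome is never predicted correctly, and macro averaging weights every label equally.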

Keywords: ABET, accreditation, benchmark collection, machine learning, program educational objectives, student outcomes, supervised multi-class classification, text mining

270 Towards Learning Query Expansion

Authors: Ahlem Bouziri, Chiraz Latiri, Eric Gaussier

Abstract:

The steady growth in the size of textual document collections is a key driver of progress for modern information retrieval techniques, whose effectiveness and efficiency are constantly challenged. Given a user query, the number of retrieved documents can be overwhelmingly large, hampering their efficient exploitation by the user. In addition, retaining only relevant documents in a query answer is of paramount importance for effectively meeting user needs. In this situation, query expansion offers an interesting solution for obtaining a complete answer while preserving the quality of the retained documents. It relies mainly on an accurate choice of the terms added to the initial query. Interestingly, query expansion takes advantage of large text volumes. It extracts statistical information about index term co-occurrences and uses it to make user queries better fit the real information needs. In this respect, a promising track consists of applying data mining methods to extract dependencies between terms, namely a generic basis of association rules between terms. The key feature of our approach is a better trade-off between the size of the mining result and the conveyed knowledge. Mining produces a huge number of association rules, and the optimal combination of query terms must be selected from the generic basis. We therefore propose to model the problem as a classification problem and solve it using a learning algorithm such as SVM or k-means. For this purpose, we first generate a training set using a genetic-algorithm-based approach that explores the association rule space to find an optimal set of expansion terms, improving the MAP of the search results. The experiments were performed on the SDA 95 collection, a data collection for information retrieval. The results improved in terms of both MAP and NDCG.
The main observation is that intelligently hybridizing text mining techniques with query expansion allows us to combine the strengths of both. As this is a preliminary attempt in this direction, there is considerable scope for enhancing the proposed method.
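
MAP, used above both to guide the genetic search and to evaluate the results, is the mean over queries of each query's average precision. Below is a minimal pure-Python sketch. The document IDs and rankings are hypothetical. A genetic algorithm could use `mean_average_precision` over training queries as the fitness of a candidate set of expansion terms; that use is an assumption, not the authors' stated fitness function.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """AP for one query: mean of precision@k at the rank k of each relevant document."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)

def mean_average_precision(runs):
    """MAP over queries; runs holds one (ranking, relevant_ids) pair per query."""
    return sum(average_precision(ranking, rel) for ranking, rel in runs) / len(runs)

# Hypothetical rankings for one query before and after adding expansion terms
relevant = {"d1", "d2"}
print(average_precision(["d3", "d1", "d7", "d2"], relevant))  # 0.5
print(average_precision(["d1", "d2", "d3", "d7"], relevant))  # 1.0
```

Relevant documents missing from the ranking still count in the denominator, so AP also rewards recall.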

Keywords: supervised learning, classification, query expansion, association rules

269 Flood Hazard Assessment and Land Cover Dynamics of the Orai Khola Watershed, Bardiya, Nepal

Authors: Loonibha Manandhar, Rajendra Bhandari, Kumud Raj Kafle

Abstract:

Nepal’s Terai region is part of the Ganges river basin, one of the most disaster-prone areas of the world. There, recurrent monsoon flooding causes millions in damage and the death and displacement of hundreds of people and households every year. The vulnerability of human settlements to natural disasters such as floods is increasing. Mapping changes in land use practices and hydro-geological parameters is therefore essential for developing resilient communities and strong disaster management policies. The objective of this study was to develop a flood hazard zonation map of the Orai Khola watershed and to map the decadal land use/land cover dynamics of the watershed. The watershed area was delineated using the SRTM DEM. LANDSAT images were classified into five land use classes (forest, grassland, sediment and bare land, settlement area and cropland, and water body) using pixel-based, semi-automated supervised maximum likelihood classification. Decadal changes in each class were then quantified using spatial modelling. For flood hazard mapping, weights were assigned to four factors (slope, rainfall distribution, distance from the river, and land use/land cover) according to their estimated influence on flood hazard. Weighted overlay analysis was then performed to identify highly vulnerable areas. From 1996 to 2016, forest and grassland coverage increased by 11.53 km² (3.8%) and 1.43 km² (0.47%), respectively. Over the same period, sediment and bare land areas decreased by 12.45 km² (4.12%), whereas settlement and cropland areas showed a consistent increase to 14.22 km² (4.7%). Water body coverage also increased to 0.3 km² (0.09%) from 1996 to 2016. The total watershed area was categorized into five hazard zones: very low, 1.27% (3.65 km²); low, 20.94% (60.31 km²); moderate, 37.59% (108.3 km²); high, 29.25% (84.27 km²); and very high, 10.95% (31.55 km²), comprising 31 villages.
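
The weighted overlay step can be sketched as a weighted sum of factor rasters that have each been reclassified onto a common hazard scale, followed by slicing the result into zones. This is a minimal NumPy sketch under stated assumptions. The abstract does not give the actual weights, scores, or class breaks, so the factor names, weights, toy 2 x 2 rasters, and zone edges below are hypothetical.

```python
import numpy as np

def weighted_overlay(factor_rasters, weights):
    """Weighted sum of reclassified factor rasters (all the same shape).

    Weights are normalised to sum to 1, so the index stays on the factor scale.
    """
    total = sum(weights.values())
    first = next(iter(factor_rasters.values()))
    hazard = np.zeros(first.shape, dtype=float)
    for name, raster in factor_rasters.items():
        hazard += (weights[name] / total) * raster
    return hazard

def to_zones(hazard, edges):
    """Zone codes 0..len(edges); with four edges, 0 = very low and 4 = very high."""
    return np.digitize(hazard, edges)

# Toy 2 x 2 rasters, each factor already scored 1 (low) to 5 (high hazard)
factors = {
    "slope": np.array([[5, 4], [2, 1]]),
    "rainfall": np.array([[3, 3], [3, 3]]),
    "distance_to_river": np.array([[5, 5], [1, 1]]),
    "land_use": np.array([[4, 2], [2, 1]]),
}
weights = {"slope": 0.25, "rainfall": 0.25, "distance_to_river": 0.3, "land_use": 0.2}
hazard = weighted_overlay(factors, weights)
zones = to_zones(hazard, edges=[1.8, 2.6, 3.4, 4.2])
print(zones.tolist())  # [[4, 3], [1, 0]]
```

Zone areas then follow from multiplying the pixel count of each zone code by the pixel area.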

Keywords: flood hazard, land use/land cover, Orai river, supervised maximum likelihood classification, weighted overlay analysis

268 Deleterious SNP’s Detection Using Machine Learning

Authors: Hamza Zidoum

Abstract:

This paper investigates the impact of human genetic variation on the function of human proteins using machine learning algorithms. Single-nucleotide polymorphisms (SNPs) represent the most common form of human genome variation. We focus on single amino-acid polymorphisms located in the coding region, as they can affect protein function and lead to pathological phenotypic changes. We use several supervised machine learning methods to identify structural properties correlated with an increased risk of a missense mutation being damaging. SVM combined with Principal Component Analysis gives the best performance.
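
PCA, which the abstract pairs with SVM, can be sketched in NumPy via the singular value decomposition of the centred feature matrix. The resulting scores would then be passed to an SVM, which is omitted here. The function name and the toy feature matrix are assumptions and do not represent the paper's structural features.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project the rows of X onto its top principal components (via SVD)."""
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA operates on centred features
    # Rows of Vt are principal directions, sorted by decreasing singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = (s ** 2 / np.sum(s ** 2))[:n_components]
    return Xc @ components.T, components, mean, explained_ratio

rng = np.random.default_rng(0)
# Toy stand-in for per-variant features: one dominant direction plus small noise
X = rng.normal(size=(50, 1)) @ np.array([[1.0, 2.0, 3.0]]) + 0.01 * rng.normal(size=(50, 3))
scores, components, mean, ratio = pca_fit_transform(X, n_components=2)
print(scores.shape)  # (50, 2)
```

New variants would be projected with `(x_new - mean) @ components.T` before being classified, so that training and test data share the same reduced space.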

Keywords: single-nucleotide polymorphism, machine learning, feature selection, SVM

267 Identification and Classification of Fiber-Fortified Semolina by Near-Infrared Spectroscopy (NIR)

Authors: Amanda T. Badaró, Douglas F. Barbin, Sofia T. Garcia, Maria Teresa P. S. Clerici, Amanda R. Ferreira

Abstract:

Food fortification is the intentional addition of a nutrient to a food matrix. It has been widely used to overcome nutrient deficiencies in the diet or to increase the nutritional value of food. Fortified food must meet the demands of the population, taking into account their habits and the risks that these foods may pose. Wheat and its by-products, such as semolina, have been strongly recommended as food vehicles, since they are widely consumed and used in the production of other foods. These products have been strategically used to add nutrients such as fibers. Methods for analyzing and quantifying these kinds of components are destructive and require lengthy sample preparation and analysis. Therefore, the industry has searched for faster and less invasive methods, such as Near-Infrared Spectroscopy (NIR). NIR is a rapid and cost-effective method; however, it is based on indirect measurements and yields a large amount of data. NIR spectroscopy therefore requires calibration with mathematical and statistical tools (Chemometrics) to extract analytical information from the corresponding spectra. Examples are Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). PCA is well suited for NIR, since it can handle many spectra at a time and can be used for unsupervised classification. An advantage of PCA, which is also a data reduction technique, is that it reduces the spectral data to a smaller number of latent variables for further interpretation. LDA, on the other hand, is a supervised method that searches for the Canonical Variables (CV) with the maximum separation among different categories. In LDA, the first CV is the direction that maximizes the ratio of between-class to within-class variance. The present work used a portable near-infrared (NIR) spectrometer for the identification and classification of pure and fiber-fortified semolina samples.
The fiber was added to semolina at two different concentrations. After spectral acquisition, the data were used for PCA and LDA to identify and discriminate the samples. The results showed that NIR spectroscopy combined with PCA was very effective in identifying pure and fiber-fortified semolina. Additionally, LDA classification rates ranged from 78.3% to 95% for calibration and from 75% to 95% for cross-validation. Thus, the multivariate analysis with PCA and LDA verified that NIR combined with chemometric methods can identify and classify the different samples quickly and non-destructively.
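
The canonical variables described above can be computed as the leading eigenvectors of the within-class scatter inverse times the between-class scatter (S_w⁻¹ S_b). This is a minimal NumPy sketch of textbook Fisher LDA, not the authors' chemometrics software. The toy three-group data stand for PCA scores of pure semolina and the two fiber levels and are not real spectra. In practice LDA on NIR data is usually run on PCA scores, because raw spectra have more wavelengths than samples, which makes S_w singular.

```python
import numpy as np

def lda_canonical_variables(X, y, n_components):
    """Fisher LDA: directions maximising between-class / within-class variance."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))  # within-class scatter
    S_b = np.zeros((n_features, n_features))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centred = Xc - mean_c
        S_w += centred.T @ centred
        diff = (mean_c - overall_mean)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)
    # Eigenvectors of S_w^-1 S_b; the largest eigenvalue gives the first CV
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:n_components]].real
    return X @ W, W

rng = np.random.default_rng(0)
# Toy "PCA scores" for three groups (pure, fiber level 1, fiber level 2)
means = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
y = np.repeat([0, 1, 2], 30)
X = means[y] + 0.3 * rng.normal(size=(90, 3))
Z, W = lda_canonical_variables(X, y, n_components=2)
```

With three groups, at most two canonical variables carry class information, which is why `n_components=2` is used here.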

Keywords: Chemometrics, fiber, linear discriminant analysis, near-infrared spectroscopy, principal component analysis, semolina

266 Child Protection Decision Making in England and Finland: A Comparative Analysis

Authors: Rachel Falconer

Abstract:

Background: The United Nations Convention on the Rights of the Child sets out the duties placed on signatory nations to take measures to protect children from all forms of violence, abuse, neglect and maltreatment. The systems for ensuring this protection vary globally, shaped by national welfare policies. In England and Finland, past research has highlighted differences in how child protection issues are framed and how state agencies respond. However, less is known about how such differences impact processes of social work judgment and decision making in practice. Method: Data were collected in three stages as part of a wider PhD project. First, social workers in sites across England and Finland were asked to complete a short questionnaire. Participants were then asked to comment on two constructed case vignettes, and were interviewed about their experiences of child protection decision making at the point of referral. Interviews were analyzed using NVivo to draw out key themes. Findings: There were similarities in how the English and Finnish social workers responded to the case vignettes; for example, participants in both countries expressed concerns about similar risk factors and all felt further assessment was needed. Differences were observed particularly in the sources of support and guidance participants referred to. The English social workers appeared to rely more on managerial input for their decisions than the Finnish social workers did. These findings point to two distinct decision-making approaches: ‘supervised’ and ‘supported’ judgement. Implications for practice: The findings are relevant to the conference theme of research and evaluation of social work practice. They also support previous studies that have emphasized the significance of organizational factors in child protection decision making.
The comparative methodology has also helped to demonstrate how organizational factors can influence practice in different child protection system ‘orientations’. The presentation will discuss the potential practice implications of ‘supervised’, manager-led approaches to decision making as contrasted with ‘supported’, team-led approaches, inviting discussion about the relevance of these findings for social work in other countries.

Keywords: child protection, comparative research, decision making, social work, vignettes

265 Extracting Attributes for Twitter Hashtag Communities

Authors: Ashwaq Alsulami, Jianhua Shao

Abstract:

Various organisations often need to understand discussions on social media, such as what the trending topics are and the characteristics of the people engaged in the discussion. A number of approaches have been proposed to extract attributes that characterise a discussion group. However, these approaches are largely based on supervised learning and, as such, require a large amount of labelled data. In this paper, we propose an approach that does not require labelled data but instead relies on lexical sources to detect meaningful attributes of online discussion groups. Our findings show an acceptable level of accuracy in detecting attributes for Twitter discussion groups.

Keywords: attributed community, attribute detection, community, social network

264 Landsat Data from Pre Crop Season to Estimate the Area to Be Planted with Summer Crops

Authors: Valdir Moura, Raniele dos Anjos de Souza, Fernando Gomes de Souza, Jose Vagner da Silva, Jerry Adriani Johann

Abstract:

Estimates of the area of land to be planted with annual crops, and of its stratification by municipality, are important variables in crop forecasting. In Brazil, this information is currently obtained by the Brazilian Institute of Geography and Statistics (IBGE) and published in the report Assessment of the Agricultural Production. Because of high cloud cover during the main crop growing season (October to March), it is difficult to acquire good orbital images. One alternative is therefore to work with remote sensing data from dates before the crop growing season. This work presents the use of multitemporal Landsat data acquired in July and September (before the summer growing season) to estimate the area of land to be planted with summer crops in an area of São Paulo State, Brazil. Geographic Information Systems (GIS) and digital image processing techniques were applied to the available data. Supervised and unsupervised classifications were applied to data in digital number and reflectance formats and to multitemporal Normalized Difference Vegetation Index (NDVI) images. The objective was to discriminate the tracts with the highest probability of being planted with summer crops. Classification accuracies were evaluated using a sampling system developed specifically for this study region, and the estimated areas were corrected using the error matrix derived from these evaluations. The classification techniques achieved an excellent level of agreement according to the kappa index. The proportion of each crop, stratified by municipality, was derived from fieldwork during the crop growing season. These proportion coefficients were applied to the area of land to be planted with summer crops (derived from Landsat data), making it possible to derive the area of each summer crop by municipality. The discrepancies between official statistics and our results were attributed to the sampling and stratification procedures. 
Nevertheless, this methodology can be improved in order to provide good crop area estimates using remote sensing data, despite the cloud cover during the growing season.
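The kappa index and the error-matrix area correction mentioned above can be sketched as follows. The 2-by-2 matrix and the mapped areas are invented for illustration. The area correction uses one common stratified estimator (each map class's share of the map times its row proportions); the abstract does not say which correction the authors applied.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square error matrix
    (rows: mapped class, columns: reference class)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


def error_adjusted_areas(matrix, mapped_area):
    """Estimate the area of each reference class from the mapped areas,
    weighting each map stratum's row proportions by its share of the map."""
    k = len(matrix)
    total = sum(mapped_area)
    adjusted = []
    for j in range(k):
        share = sum(
            (mapped_area[i] / total) * matrix[i][j] / sum(matrix[i])
            for i in range(k)
        )
        adjusted.append(share * total)
    return adjusted


# Hypothetical sample counts: class 0 = "to be planted", class 1 = "other".
error_matrix = [[45, 5], [10, 40]]
mapped_ha = [1000.0, 3000.0]  # hypothetical mapped areas in hectares

kappa = cohen_kappa(error_matrix)
areas = error_adjusted_areas(error_matrix, mapped_ha)
```

With these made-up numbers kappa is 0.7, and the "to be planted" estimate rises from 1000 ha to 1500 ha because ten reference "planted" samples were mapped as "other".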

Keywords: area intended for summer culture, estimated area planted, agriculture, Landsat, planting schedule

263 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we use data mining to extract biomedical knowledge. Complex biomedical data collected in population studies are generally treated with statistical methods; although robust, these methods are not sufficient on their own to exploit the potential wealth of the data. To this end, we used two learning algorithms: Decision Trees and Support Vector Machines (SVM). These supervised classification methods are used to diagnose thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.
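To make the decision-tree half of the approach concrete, here is a minimal CART-style sketch that splits on the threshold minimising weighted Gini impurity. It is not the authors' implementation, the SVM is not shown, and the two feature columns and their values are made up with no clinical meaning; `build_tree`, `predict` and `max_depth` are hypothetical names.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) pair that minimises weighted Gini
    impurity, or None if no split improves on the parent node."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a tree of (feature, threshold, left, right) tuples; leaves are labels."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))


def predict(tree, row):
    """Follow the thresholds down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Toy two-feature records with hypothetical diagnostic labels.
X = [[0.5, 120], [1.0, 110], [8.0, 60], [9.5, 55], [0.1, 200], [0.2, 210]]
y = ["normal", "normal", "hypo", "hypo", "hyper", "hyper"]
tree = build_tree(X, y)
```

A practical study would normally use a tested library implementation of both classifiers; the point here is only that each node is a readable threshold rule, which is what makes decision trees attractive for symbolic knowledge extraction.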

Keywords: biomedical data, learning, classifier, algorithms decision tree, knowledge extraction

262 A Machine Learning Approach for Classification of Directional Valve Leakage in the Hydraulic Final Test

Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter

Abstract:

Due to increasing cost pressure in global markets, artificial intelligence is becoming a decisive competitive technology. Predictive quality enables machinery and plant manufacturers to ensure product quality by using data-driven forecasts from machine learning models as a basis for decisions on test results. Using cross-process Bosch production data along the value chain of hydraulic valves is a promising approach to classifying the quality characteristics of workpieces.

Keywords: predictive quality, hydraulics, machine learning, classification, supervised learning

261 Remote Sensing and GIS for Land Use Change Assessment: Case Study of Oued Bou Hamed Watershed, Southern Tunisia

Authors: Ouerchefani Dalel, Mahdhaoui Basma

Abstract:

Land use change is an important factor for later evaluating the impact of human activity on land degradation. This work presents the application of a remote sensing methodology for evaluating land use change in an arid region of Tunisia. The methodology uses Landsat TM and ETM+ images to produce land use maps by supervised classification based on ground-truth regions of interest. The study showed that the radiometric values of the pixels could be relied on to define each land use class in the field. It was also possible to generate three land use classes for the same study area between 1988 and 2011.
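Once two dates have been classified, land use change is usually summarised as a cross-tabulation (transition matrix) of the two maps. The sketch below assumes two co-registered classified rasters stored as lists of rows; the class names are hypothetical because the abstract does not name its three classes. Pixel counts are converted to hectares assuming 30 m Landsat TM/ETM+ multispectral pixels.

```python
from collections import Counter


def change_matrix(map_before, map_after, classes):
    """Cross-tabulate two co-registered classified maps.
    Rows: class at the first date; columns: class at the second date."""
    counts = Counter()
    for row_a, row_b in zip(map_before, map_after):
        for a, b in zip(row_a, row_b):
            counts[(a, b)] += 1
    return [[counts[(a, b)] for b in classes] for a in classes]


def to_hectares(matrix, pixel_size_m=30):
    """Convert pixel counts to hectares (1 ha = 10,000 m^2)."""
    return [[c * pixel_size_m ** 2 / 10_000 for c in row] for row in matrix]


# Hypothetical 3 x 3 classified maps for two dates.
classes = ["agri", "range", "bare"]
before = [["agri", "agri", "range"],
          ["range", "range", "bare"],
          ["bare", "bare", "bare"]]
after = [["agri", "range", "range"],
         ["range", "bare", "bare"],
         ["bare", "bare", "agri"]]

m = change_matrix(before, after, classes)
ha = to_hectares(m)
```

Diagonal cells count unchanged pixels; off-diagonal cells show which class each pixel moved to between the two dates.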

Keywords: land use, change, remote sensing, GIS

260 Determination of Carbofuran Residue in Brinjal (Solanum melongena L.) and Soil of Brinjal Field

Authors: R. Islam, M. A. Haque, K. H. Kabir

Abstract:

A supervised trial was set up with brinjal at the research field of the Entomology Division, Bangladesh Agricultural Research Institute, Joydebpur, Gazipur, to determine the residue of Carbofuran in soil and fruit samples at different days after application (DAA) of Furadan 5 G @ 2 kg AI/ha. Field-collected samples were analyzed by GCMS-EI. Results indicated the presence of Carbofuran residue up to 60 DAA in soil samples and 25 DAA in brinjal fruit samples. In soil samples, the detected residues were 7.04, 2.78, 0.79, 0.43, 0.12, 0.06 and 0.05 ppm at 0, 2, 5, 10, 20, 30 and 60 DAA, respectively. In brinjal fruit samples, Carbofuran residues were 0.005, 0.095, 0.084, 0.065, 0.063, 0.056, 0.050, 0.030 and 0.016 ppm at 0, 2, 4, 6, 8, 10, 12, 15 and 25 DAA, respectively. None of the fruit residues exceeded the recommended MRL (0.1 mg/kg crop) of Carbofuran for agricultural crops.
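The residue figures above can be checked against the MRL, and a dissipation rate can be estimated from the soil series. The sketch assumes simple first-order decay, fitted by least squares on ln(C), which is only an assumption: the abstract reports no kinetic model, and the soil data look biphasic, so a single half-life is a rough summary. The fruit series is not fitted because residues rise between 0 and 2 DAA. Residue values in ppm equal mg/kg.

```python
import math

# Residue data as reported in the abstract (ppm = mg/kg).
DAYS_SOIL = [0, 2, 5, 10, 20, 30, 60]
SOIL_PPM = [7.04, 2.78, 0.79, 0.43, 0.12, 0.06, 0.05]
DAYS_FRUIT = [0, 2, 4, 6, 8, 10, 12, 15, 25]
FRUIT_PPM = [0.005, 0.095, 0.084, 0.065, 0.063, 0.056, 0.050, 0.030, 0.016]
MRL_MG_PER_KG = 0.1


def first_order_fit(days, conc):
    """Least-squares fit of ln(C) = ln(C0) - k*t.
    Returns (C0, k, half_life_in_days)."""
    ys = [math.log(c) for c in conc]
    n = len(days)
    t_mean = sum(days) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(days, ys))
    sxx = sum((t - t_mean) ** 2 for t in days)
    slope = sxy / sxx
    k = -slope
    c0 = math.exp(y_mean - slope * t_mean)
    return c0, k, math.log(2) / k


def exceeds_mrl(conc, mrl=MRL_MG_PER_KG):
    """Flag each residue value that is above the maximum residue limit."""
    return [c > mrl for c in conc]


c0, k, half_life = first_order_fit(DAYS_SOIL, SOIL_PPM)
fruit_flags = exceeds_mrl(FRUIT_PPM)
```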

Keywords: brinjal, carbofuran, MRL, residue

259 Human Identification Using Local Roughness Patterns in Heartbeat Signal

Authors: Md. Khayrul Bashar, Md. Saiful Islam, Kimiko Yamashita, Yano Midori

Abstract:

Despite progress in human authentication, conventional biometrics (e.g., facial features, fingerprints, retinal scans, gait, voice patterns) are not robust against falsification because they are neither confidential nor secret to an individual. As a non-invasive tool, the electrocardiogram (ECG) has recently shown great potential for human recognition owing to its unique rhythms, which reflect the variability of human heart structures (chest geometry, sizes, and positions). Moreover, ECG has a real-time vitality characteristic that signals that the subject is alive, which helps ensure that a legitimate individual is identified. However, the detection accuracy of current ECG-based methods is not sufficient because an individual's heartbeats vary considerably over time. These variations may be caused by muscle flexure, changes in mental or emotional state, and changes in sensor position or long-term baseline shift during ECG recording. In this study, a new method is proposed for human identification based on extracting the local roughness of ECG heartbeat signals. First, the ECG signal is preprocessed using a second-order band-pass Butterworth filter with cut-off frequencies of 0.00025 and 0.04. A number of local binary patterns are then extracted by moving a neighborhood window along the ECG signal. At each instant of the ECG signal, the pattern is formed by comparing the ECG intensities at neighboring time points with the central intensity in the moving window. Binary weights are then multiplied with the pattern to produce the local roughness description of the signal. Finally, histograms are constructed that describe the heartbeat signals of individual subjects in the database. One advantage of the proposed feature is that, unlike conventional methods, it does not depend on the accuracy of QRS complex detection. 
Supervised recognition methods are then designed using minimum-distance-to-mean and Bayesian classifiers to identify authentic human subjects. An experiment with sixty (60) ECG signals from sixty adult subjects in the PTB database of the National Metrology Institute of Germany (NMIG) showed that the proposed method is promising compared with a conventional interval- and amplitude-based feature method.
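The feature pipeline described above (neighbour-versus-centre comparison, binary weighting, histogram, minimum distance to mean) can be sketched in a few lines. Assumptions: a neighbourhood of two samples on each side, a `>=` comparison, no band-pass filtering, and synthetic signals standing in for ECG; the Bayesian classifier is not shown. All function names are hypothetical.

```python
import math


def local_binary_codes(signal, radius=2):
    """1-D local binary pattern: compare each neighbour with the centre
    sample and weight the comparison bits by powers of two."""
    offsets = [o for o in range(-radius, radius + 1) if o != 0]
    codes = []
    for i in range(radius, len(signal) - radius):
        centre = signal[i]
        code = 0
        for bit, o in enumerate(offsets):
            if signal[i + o] >= centre:
                code += 1 << bit  # binary weight 2**bit
        codes.append(code)
    return codes


def lbp_histogram(signal, radius=2):
    """Normalised histogram of LBP codes (the 'local roughness' descriptor)."""
    codes = local_binary_codes(signal, radius)
    hist = [0] * (1 << (2 * radius))
    for c in codes:
        hist[c] += 1
    return [h / len(codes) for h in hist]


def train_min_distance(templates):
    """templates: {subject: [histogram, ...]} -> {subject: mean histogram}."""
    return {s: [sum(col) / len(h) for col in zip(*h)] for s, h in templates.items()}


def identify(hist, means):
    """Minimum-distance-to-mean decision (Euclidean distance)."""
    return min(means, key=lambda s: math.dist(hist, means[s]))


# Synthetic stand-ins for two enrolled subjects.
subject_a = [math.sin(0.3 * i) for i in range(200)]
subject_b = [float(i % 10) for i in range(200)]
means = train_min_distance({"A": [lbp_histogram(subject_a)],
                            "B": [lbp_histogram(subject_b)]})
query = [math.sin(0.3 * i + 0.1) for i in range(200)]  # phase-shifted subject A
```

Because the codes only record whether neighbours are higher or lower than the centre, a small phase shift or amplitude change leaves the histogram nearly unchanged, which mirrors the abstract's point that no QRS detection is needed.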

Keywords: human identification, ECG biometrics, local roughness patterns, supervised classification

258 Leveraging Unannotated Data to Improve Question Answering for French Contract Analysis

Authors: Touila Ahmed, Elie Louis, Hamza Gharbi

Abstract:

State-of-the-art question answering models have recently shown impressive performance, especially in a zero-shot setting. This approach is particularly useful in a highly diverse domain such as the legal field, where it is increasingly difficult to build a dataset covering every notion and concept. In this work, we propose a flexible generative question answering approach to contract analysis, together with a weakly supervised procedure that leverages unannotated data to boost our models’ performance in general and their zero-shot performance in particular.

Keywords: question answering, contract analysis, zero-shot, natural language processing, generative models, self-supervision

257 A Machine Learning Approach for the Leakage Classification in the Hydraulic Final Test

Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter

Abstract:

The widespread use of machine learning applications in production is significantly accelerated by improved computing power and increasing data availability. Predictive quality enables the assurance of product quality by using machine learning models as a basis for decisions on test results. The use of real Bosch production data based on geometric gauge blocks from machining, mating data from assembly and hydraulic measurement data from final testing of directional valves is a promising approach to classifying the quality characteristics of workpieces.

Keywords: machine learning, classification, predictive quality, hydraulics, supervised learning

256 Double Clustering as an Unsupervised Approach for Order Picking of Distributed Warehouses

Authors: Hsin-Yi Huang, Ming-Sheng Liu, Jiun-Yan Shiau

Abstract:

Planning the order-picking lists of warehouses so as to limit the impact of logistics costs on operational performance is a significant challenge. In the e-commerce era, this task is especially important because the demands on productive processes are high. Nowadays, many order planning techniques employ supervised machine learning algorithms. However, deciding which features such algorithms should process is not a simple task, yet it is crucial to the technique's success. Against this background, we consider whether unsupervised algorithms can enhance the planning of order-picking lists. A Zone2 picking approach, which applies clustering algorithms twice, is developed. A simplified example is given to demonstrate the merit of our approach.
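The abstract does not detail the Zone2 procedure. One hypothetical reading of "clustering twice" is sketched below: first cluster SKUs by their shelf coordinates into picking zones, then describe each order by the share of its lines in each zone and cluster the orders into picking batches. The k-means here uses a deterministic first-k initialisation for reproducibility; a real system would more likely use k-means++ and real layout data. All names and coordinates are illustrative.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def double_clustering(sku_locations, orders, n_zones, n_batches):
    """Stage 1: SKUs -> zones by location. Stage 2: orders -> batches
    by the fraction of each order's lines that fall in each zone."""
    skus = list(sku_locations)
    zone_labels = kmeans([sku_locations[s] for s in skus], n_zones)
    zone_of = dict(zip(skus, zone_labels))
    vectors = []
    for order in orders:
        v = [0.0] * n_zones
        for sku in order:
            v[zone_of[sku]] += 1 / len(order)
        vectors.append(v)
    return zone_of, kmeans(vectors, n_batches)


# Hypothetical layout (x, y in metres) and orders.
locations = {"a": (0, 0), "b": (1, 0), "c": (10, 10), "d": (11, 10)}
orders = [["a", "b"], ["c", "d"], ["a"], ["d"]]
zone_of, batches = double_clustering(locations, orders, n_zones=2, n_batches=2)
```

Orders that draw from the same zone end up in the same batch, so a picker assigned to a batch stays mostly within one area of the warehouse.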

Keywords: order picking, warehouse, clustering, unsupervised learning

255 Optimize Data Evaluation Metrics for Fraud Detection Using Machine Learning

Authors: Jennifer Leach, Umashanger Thayasivam

Abstract:

The use of technology has benefited society in more ways than anyone thought possible. Unfortunately, as society's knowledge of technology has advanced, so has its knowledge of ways to use technology to manipulate people, leading to a simultaneous advance in fraud. Machine learning techniques may help counter this trend. This research explores how various machine learning techniques can aid in detecting fraudulent activity across two different types of fraud data; the accuracy, precision, recall, and F1 score were recorded for each method. Each machine learning model was also tested across five different training/testing splits to determine which split and technique gave the best results.
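The four reported metrics and the repeated train/test splitting can be sketched with the standard library. The split ratios in `SPLITS` are hypothetical because the abstract does not list the five it used, and the label vectors are toy data with `1` marking fraud.

```python
import random


def split_data(rows, train_fraction, seed=0):
    """Shuffle rows reproducibly and split them into train and test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with `positive` as the fraud class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical split ratios; the abstract does not list the five it used.
SPLITS = [0.5, 0.6, 0.7, 0.8, 0.9]

metrics = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0],
                                 [1, 0, 0, 1, 0, 1, 1, 0])
train, test = split_data(range(100), 0.7)
```

Fraudulent cases are usually rare, so a model that flags nothing can still score a high accuracy; precision, recall and F1 expose that failure, which is why recording all four is worthwhile.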

Keywords: data science, fraud detection, machine learning, supervised learning

254 Identification of Damage Mechanisms in Interlock Reinforced Composites Using a Pattern Recognition Approach of Acoustic Emission Data

Authors: M. Kharrat, G. Moreau, Z. Aboura

Abstract:

The latest advances in the weaving industry, combined with increasingly sophisticated means of materials processing, have made it possible to produce complex 3D composite structures. Used mainly in aeronautics, composite materials with 3D architecture offer better mechanical properties than 2D reinforced composites. Nevertheless, these materials require a good understanding of their behavior. Because of their complexity, the damage mechanisms are multiple, and the scenario of their appearance and evolution depends on the nature of the applied loading. The acoustic emission (AE) technique is a well-established tool for discriminating between damage mechanisms: suitable sensors monitor the structural health of the material during the mechanical test, relevant AE features are extracted from the recorded signals, and the data are then analyzed using pattern recognition techniques. To better understand the damage scenarios of interlock composite materials, a multi-instrumentation setup was used in this work to track damage initiation and development, especially in the vicinity of the first significant damage, called macro-damage. The deployed instrumentation includes video-microscopy, Digital Image Correlation, AE and micro-tomography. In this study, a multi-variable AE data analysis approach was developed to discriminate between the signal classes representing the different emission sources during testing. An unsupervised classification technique was adopted to cluster the AE data without a priori knowledge. The multi-instrumentation and the clustered data served to label the different signal families and to build a learning database, which is then used to construct a supervised classifier for automatic recognition of AE signals. Several materials with different constituents were tested under various loading conditions to feed and enrich the learning database. 
The methodology presented in this work was useful for refining the damage threshold for the new generation of materials. The damage mechanisms around this threshold were highlighted, and the obtained signal classes were assigned to the different mechanisms. Isolating a 'noise' class makes it possible to discriminate the signals emitted by damage without resorting to spatial filtering or raising the AE detection threshold. The approach was validated on different material configurations. For the same material and the same type of loading, the identified classes are reproducible and only slightly perturbed. The supervised classifier constructed from the learning database was able to predict the labels of the classified signals.
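The two-stage workflow above (unsupervised clustering, expert labelling of clusters, then a supervised classifier trained on the labelled database) can be sketched as follows. The abstract names neither the algorithms nor the features, so this assumes standardised amplitude/duration features, k-means with farthest-point initialisation, and a k-nearest-neighbour classifier. The feature values and the names `mechanism_1`/`mechanism_2` are hypothetical; in the study, clusters are assigned to mechanisms by correlating them with the other instruments.

```python
import math
from collections import Counter


def standardise(rows):
    """Column means and (population) standard deviations."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def kmeans(points, k, iters=50):
    """k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def knn_predict(train_x, train_y, x, k=3):
    """Majority label among the k nearest labelled signals."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Hypothetical AE hits: [peak amplitude (dB), duration (microseconds)].
raw = [[45, 100], [47, 120], [44, 90], [80, 2000], [82, 2100], [79, 1900]]
means, stds = standardise(raw)
X = [scale(r, means, stds) for r in raw]
labels = kmeans(X, 2)  # stage 1: unsupervised clustering
names = {labels[0]: "mechanism_1", labels[3]: "mechanism_2"}  # expert labelling
y = [names[lab] for lab in labels]  # the "learning database"
pred = knn_predict(X, y, scale([46, 110], means, stds))  # stage 2: supervised
```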

Keywords: acoustic emission, classifier, damage mechanisms, first damage threshold, interlock composite materials, pattern recognition

253 Unsupervised Learning of Spatiotemporally Coherent Metrics

Authors: Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun

Abstract:

Current state-of-the-art classification and detection algorithms rely on supervised training. In this work, we study unsupervised feature learning in the context of temporally coherent video data. We focus on feature learning from unlabeled video, using the assumption that adjacent video frames contain semantically similar information. This assumption is exploited to train a convolutional pooling auto-encoder regularized by slowness and sparsity. We establish a connection between slow feature learning and metric learning, and show that the trained encoder can be used to define a more temporally and semantically coherent metric.
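To show how slowness and sparsity enter such an objective, here is a simplified, fully connected sketch of a loss for one pair of adjacent frames: squared reconstruction error, an L1 sparsity penalty on the codes, and an L1 slowness penalty between L2-pooled codes. This is an assumption about the general form only; the abstract does not give the paper's exact architecture, pooling or weights (`alpha`, `beta` and the toy matrices are hypothetical), and no training loop is shown.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def encode(W_enc, x):
    """Linear encoder followed by ReLU (an assumption for this sketch)."""
    return [max(0.0, v) for v in matvec(W_enc, x)]


def l2_pool(z, group_size):
    """L2-pool the code over consecutive groups of `group_size` units."""
    return [math.sqrt(sum(v * v for v in z[i:i + group_size]))
            for i in range(0, len(z), group_size)]


def pair_loss(x_t, x_next, W_enc, W_dec, alpha=0.1, beta=0.5, group_size=2):
    """Reconstruction + L1 sparsity for both frames, plus an L1 slowness
    penalty between the pooled codes of the two adjacent frames."""
    loss = 0.0
    pooled = []
    for x in (x_t, x_next):
        z = encode(W_enc, x)
        recon = matvec(W_dec, z)
        loss += sum((a - b) ** 2 for a, b in zip(x, recon))  # reconstruction
        loss += alpha * sum(abs(v) for v in z)               # sparsity
        pooled.append(l2_pool(z, group_size))
    loss += beta * sum(abs(a - b) for a, b in zip(*pooled))  # slowness
    return loss


# Toy weights: 2-D input, 4-unit code, decoder reads back the first two units.
W_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
W_dec = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
same = pair_loss([1.0, 2.0], [1.0, 2.0], W_enc, W_dec)
```

Pooling is what lets the slowness term tolerate some change: frames `[1, 2]` and `[2, 1]` activate the first pool in a different order but give the same pooled value, so they incur no slowness penalty here, whereas a frame that switches the code off entirely does.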

Keywords: machine learning, pattern clustering, pooling, classification

252 Spatial and Temporal Analysis of Forest Cover Change with Special Reference to Anthropogenic Activities in Kullu Valley, North-Western Indian Himalayan Region

Authors: Krisala Joshi, Sayanta Ghosh, Renu Lata, Jagdish C. Kuniyal

Abstract:

Throughout the world, monitoring and estimating the changing pattern of forests across diverse landscapes through remote sensing is instrumental in understanding how human activities and the ecological environment interact with a changing climate. Forest change detection using satellite imagery has emerged as an important means of gathering information at a regional scale. Kullu valley in Himachal Pradesh, India, is situated in a transitional zone between the lesser and the greater Himalayas. It therefore presents typically rugged mountainous terrain at moderate to high altitude, ranging from 1200 meters to over 6000 meters. Owing to changes in agricultural cropping patterns, urbanization, industrialization, hydropower generation, climate change, tourism, and anthropogenic forest fires, the valley's forest cover has undergone a tremendous transformation over the past three decades. The loss and degradation of forest cover result in soil erosion, loss of biodiversity (including damage to wildlife habitats), degradation of watershed areas, and deterioration of the overall quality of nature and life. Supervised classification of LANDSAT satellite data was performed to assess the changes in forest cover in Kullu valley from 2000 to 2020. The Normalized Burn Ratio (NBR) was calculated to discriminate between burned and unburned forest areas. Our study reveals that in Kullu valley the number of forest fire incidents, specifically those due to anthropogenic activities, has risen each year. The main objective of the present study is therefore to estimate the change in the forest cover of Kullu valley and to address the various social aspects responsible for anthropogenic forest fires. A further objective is to assess the impact of these changes on regional climatic factors, specifically temperature, humidity, and precipitation, over three decades, using satellite imagery and ground data. 
We believe the main outcome of the paper will help the administration make a quantitative assessment of forest cover changes due to anthropogenic activities and devise long-term measures for raising awareness among the local people of the area.
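The Normalized Burn Ratio is computed per pixel as (NIR - SWIR2) / (NIR + SWIR2); for Landsat 5 TM and Landsat 7 ETM+ these are bands 4 and 7, and for Landsat 8 OLI bands 5 and 7. Burned areas are commonly found by differencing pre- and post-fire NBR (dNBR). The sketch below flags a pixel as burned when dNBR exceeds 0.1, a commonly used lower bound for low burn severity; the abstract does not state the authors' threshold, and the reflectance values are invented.

```python
def nbr(nir, swir2):
    """Normalized Burn Ratio for one pixel: (NIR - SWIR2) / (NIR + SWIR2)."""
    denom = nir + swir2
    return (nir - swir2) / denom if denom else 0.0


def burned_mask(pre_fire, post_fire, threshold=0.1):
    """Flag pixels whose dNBR = NBR_pre - NBR_post exceeds `threshold`.

    pre_fire / post_fire: sequences of (nir, swir2) surface reflectance pairs.
    """
    return [nbr(*a) - nbr(*b) > threshold for a, b in zip(pre_fire, post_fire)]


# Hypothetical reflectances for two pixels: one burned, one unchanged.
pre = [(0.40, 0.10), (0.40, 0.10)]
post = [(0.20, 0.25), (0.38, 0.10)]
mask = burned_mask(pre, post)
```

Healthy vegetation reflects strongly in NIR and weakly in SWIR2, so its NBR is high; burning reverses that balance, which is why a large positive dNBR marks burned forest.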

Keywords: Anthropogenic Activities, Forest Change Detection, Normalized Burn Ratio (NBR), Supervised Classification

251 Contextual Sentiment Analysis with Untrained Annotators

Authors: Lucas A. Silva, Carla R. Aguiar

Abstract:

This work presents a proposal for contextual sentiment analysis using a supervised learning algorithm without extensive training of annotators. To achieve this goal, a web platform was developed to carry out the entire procedure outlined in this paper. The main contribution of the pipeline described here is to simplify and automate the annotation process through a system that analyzes the agreement between annotations. This ensured satisfactory results even without annotators specialized in the research context, avoiding the generation of biased training data for the classifiers. A case study was conducted on an entrepreneurship blog. The experimental results were consistent with those reported in the literature for annotation performed by experts following a formalized process.
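A minimal version of this pipeline can be sketched as follows: keep only texts whose majority label reaches an agreement threshold across untrained annotators, then train a multinomial Naive Bayes classifier (named in the keywords) on the surviving labels. The two-thirds agreement rule, the tokenizer and the example comments are assumptions; the abstract does not describe how its congruence analysis is computed.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def agreed_labels(annotations, min_agreement=2 / 3):
    """Keep texts whose majority label reaches `min_agreement` of the votes.

    annotations: {text: [label from each annotator, ...]}
    """
    kept = {}
    for text, labels in annotations.items():
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            kept[text] = label
    return kept


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            total = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1)
                                  / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = label, score
        return best


# Hypothetical blog comments, each labelled by three untrained annotators.
annotations = {
    "great idea, loved it": ["pos", "pos", "neg"],
    "terrible service, awful": ["neg", "neg", "neg"],
    "ok I guess": ["pos", "neg", "neu"],
    "loved the great mentoring": ["pos", "pos", "pos"],
    "awful plan, terrible": ["neg", "neg", "pos"],
}
kept = agreed_labels(annotations)
model = NaiveBayes().fit(list(kept), list(kept.values()))
```

The comment on which the three annotators all disagree is dropped before training, so disputed items cannot inject noisy labels into the classifier.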

Keywords: sentiment analysis, untrained annotators, naive bayes, entrepreneurship, contextualized classifier
