Search results for: effect-range classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2088

1248 A Qualitative Research of Online Fraud Decision-Making Process

Authors: Semire Yekta

Abstract:

Many online retailers set up manual review teams to overcome the limitations of automated online fraud detection systems. This study critically examines the strategies these teams adopt in their decision-making process to distinguish fraudulent individuals from non-fraudulent online shoppers. The study uses a mixed-methods research approach: 32 in-depth interviews were conducted alongside participant observation and auto-ethnography. The study found that all steps of the decision-making process are significantly affected by a level of subjectivity, personal understandings of online fraud, preferences and judgments, and not necessarily by objectively identifiable facts. Rather than clearly knowing who the fraudulent individuals are, the team members have to predict whether they think the customer might be a fraudster. Common strategies are relying on the classification and fraud scores in the automated fraud detection systems, weighing up arguments for and against the customer and making a decision, using cancellation to test customers’ reactions, and making use of personal experience and “the sixth sense”. The interaction in the team also plays a significant role, given that some decisions turn into a group discussion. While customer data represent the basis for decision-making, fraud management teams frequently use Google Search and Google Maps to find additional information about the customer and verify whether the customer is the person they claim to be. This raises ethical concerns; in addition, viewing the customer’s address and area in Google Street View puts customers living in less privileged housing and areas at a higher risk of being classified as fraudsters. Phone validation is used as a final measure to decide for or against the customer when previous strategies and Google Search do not suffice. However, phone validation is also characterized by individuals’ subjectivity, personal views and judgment of the customer’s reaction on the phone, which results in a final classification as genuine or fraudulent.

Keywords: online fraud, data mining, manual review, social construction

Procedia PDF Downloads 327
1247 Histopathological Features of Basal Cell Carcinoma: A Ten Year Retrospective Statistical Study in Egypt

Authors: Hala M. El-hanbuli, Mohammed F. Darweesh

Abstract:

The incidence rates of any tumor vary hugely with geographical location. Basal Cell Carcinoma (BCC) is one of the most common skin cancers and has many histopathologic subtypes. Objective: The aim was to study the histopathological features of BCC cases received in the Pathology Department, Kasr El-Aini hospital, Cairo University, Egypt during the period from Jan 2004 to Dec 2013 and to evaluate the clinical characteristics through the patient data available in the request sheets. Methods: Slides and data of BCC cases were collected from the archives of the pathology department, Kasr El-Aini hospital. All available slides were revised and BCC was classified histologically according to WHO (2006). Results: A total of 310 cases of BCC were found, representing about 65% of the total number of malignant skin tumors examined in the department during the 10-year period. The age ranged from 8 to 84 years, and the mean age was 55.7 ± 15.5 years. Most of the patients (85%) were above the age of 40 years. There was a slight male predominance (55%). Ulcerated BCC was the most common gross picture (60%), followed by nodular lesion (30%) and finally the ulcerated nodule (10%). Most of the lesions were situated in the high-risk sites (77%), where the nose was the most common site (35%), followed by the periocular area (22%), then periauricular (15%) and finally perioral (5%). No lesion was reported outside the head. The tumor size was less than 2 centimeters in 65% of cases, and 2-5 centimeters in the lesion’s greatest dimension in the rest of the cases. Histopathological reclassification revealed that nodular BCC was the most common subtype (68%), followed by pigmented nodular BCC (18.75%). The histologic high-risk groups represented 7.5% of cases, about half of them (3.75%) being basosquamous carcinoma. The total incidence of multiple BCC and second primary tumors was 12%. Recurrent BCC represented 8%. All of the recurrent lesions of BCC belonged to the histologic high-risk group. Conclusion: Basal Cell Carcinoma was the most common skin cancer in the 10-year survey. Histopathological diagnosis and classification of BCC cases are essential for determining the tumor type and its biological behavior.

Keywords: basal cell carcinoma, high risk, histopathological features, statistical analysis

Procedia PDF Downloads 134
1246 A Methodology for Developing New Technology Ideas to Avoid Patent Infringement: F-Term Based Patent Analysis

Authors: Kisik Song, Sungjoo Lee

Abstract:

With the recent growth in the importance of intangible assets, the impact of patent infringement on a company’s business has become more evident. Accordingly, it is essential for firms to estimate the risk of patent infringement before developing a technology and to create new technology ideas that avoid the risk. Recognizing this need, several attempts have been made to help develop new technology opportunities, and most of them have focused on identifying emerging vacant technologies from patent analysis. In these studies, the IPC (International Patent Classification) system or keywords obtained by applying text mining to patent documents were generally used to define vacant technologies. Unlike those studies, this study adopted F-term, which classifies patent documents according to the technical features of the inventions described in them. Since F-term analyzes the technical features from various perspectives, it provides more detailed information about technologies than IPC and more systematic information than keywords. Therefore, if well utilized, it can be a useful guideline for creating a new technology idea. Recognizing the potential of F-term, this paper suggests a novel F-term-based approach to developing new technology ideas that avoid patent infringement. For this purpose, we first collected data about F-term and then applied text mining to the descriptions of classification criteria and attributes. From the text-mining results, we could identify other technologies with technical features similar to those of the existing one, the patented technology. Finally, we compared the technologies and extracted the technical features that are commonly used in other technologies but have not been used in the existing one. These features are presented in terms of “purpose”, “function”, “structure”, “material”, “method”, “processing and operation procedure” and “control means”, and so are useful for creating new technology ideas that help avoid infringing the patent rights of other companies. Theoretically, this is one of the earliest attempts to adopt F-term for patent analysis; the proposed methodology can show how best to take advantage of F-term and its wealth of technical information. In practice, the proposed methodology can be valuable in the ideation process for successful product and service innovation without infringing the patents of other companies.

Keywords: patent infringement, new technology ideas, patent analysis, F-term

Procedia PDF Downloads 252
1245 Categorical Metadata Encoding Schemes for Arteriovenous Fistula Blood Flow Sound Classification: Scaling Numerical Representations Leads to Improved Performance

Authors: George Zhou, Yunchan Chen, Candace Chien

Abstract:

Kidney replacement therapy is the current standard of care for end-stage renal diseases. In-center or home hemodialysis remains an integral component of the therapeutic regimen. Arteriovenous fistulas (AVF) make up the vascular circuit through which blood is filtered and returned. Naturally, AVF patency determines whether adequate clearance and filtration can be achieved and directly influences clinical outcomes. Our aim was to build a deep learning model for automated AVF stenosis screening based on the sound of blood flow through the AVF. A total of 311 patients with AVF were enrolled in this study. Blood flow sounds were collected using a digital stethoscope. For each patient, blood flow sounds were collected at 6 different locations along the patient’s AVF. The 6 locations are artery, anastomosis, distal vein, middle vein, proximal vein, and venous arch. A total of 1866 sounds were collected. The blood flow sounds are labeled as “patent” (normal) or “stenotic” (abnormal). The labels are validated by concurrent ultrasound. Our dataset included 1527 “patent” and 339 “stenotic” sounds. We show that blood flow sounds vary significantly along the AVF. For example, the blood flow sound is loudest at the anastomosis site and softest at the cephalic arch. Contextualizing the sound with location metadata significantly improves classification performance. How to encode and incorporate categorical metadata is an active area of research. Herein, we study ordinal (i.e., integer) encoding schemes. The numerical representation is concatenated to the flattened feature vector. We train a vision transformer (ViT) on spectrogram image representations of the sound and demonstrate that using scalar multiples of our integer encodings improves classification performance. Models are evaluated using a 10-fold cross-validation procedure. The baseline performance of our ViT without any location metadata achieves an AuROC and AuPRC of 0.68 ± 0.05 and 0.28 ± 0.09, respectively. Using the encodings Artery: 0; Arch: 1; Proximal: 2; Middle: 3; Distal: 4; Anastomosis: 5, the ViT achieves an AuROC and AuPRC of 0.69 ± 0.06 and 0.30 ± 0.10, respectively. Using the encodings Artery: 0; Arch: 10; Proximal: 20; Middle: 30; Distal: 40; Anastomosis: 50, the ViT achieves an AuROC and AuPRC of 0.74 ± 0.06 and 0.38 ± 0.10, respectively. Using the encodings Artery: 0; Arch: 100; Proximal: 200; Middle: 300; Distal: 400; Anastomosis: 500, the ViT achieves an AuROC and AuPRC of 0.78 ± 0.06 and 0.43 ± 0.11, respectively. Interestingly, we see that using increasing scalar multiples of our integer encoding scheme (i.e., encoding “venous arch” as 1, 10, 100) results in progressively improved performance. In theory, the integer values do not matter since we are optimizing the same loss function; the model can learn to increase or decrease the weights associated with location encodings and converge on the same solution. However, in the setting of limited data and computation resources, increasing the importance at initialization either leads to faster convergence or helps the model escape a local minimum.
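The scaled ordinal encoding described in this abstract can be illustrated with a minimal sketch. It assumes the location order listed in the abstract (Artery: 0 through Anastomosis: 5); the function names `encode_location` and `append_metadata` are hypothetical and are not taken from the paper, and the feature vector is a stand-in for the flattened ViT features.

```python
from typing import List, Sequence

# Location order taken from the encodings listed in the abstract.
LOCATIONS = ["artery", "arch", "proximal", "middle", "distal", "anastomosis"]


def encode_location(location: str, scale: float = 1.0) -> float:
    """Return the ordinal index of a location multiplied by a scale factor."""
    return LOCATIONS.index(location) * scale


def append_metadata(features: Sequence[float], location: str,
                    scale: float = 1.0) -> List[float]:
    """Concatenate the scaled location code to a flattened feature vector."""
    return list(features) + [encode_location(location, scale)]


# Hypothetical usage: the same location encoded at the three scales studied.
for scale in (1, 10, 100):
    print(append_metadata([0.1, 0.2], "arch", scale))
```

With `scale=100` the venous arch is encoded as 100 and the anastomosis as 500, matching the largest scheme reported in the abstract.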

Keywords: arteriovenous fistula, blood flow sounds, metadata encoding, deep learning

Procedia PDF Downloads 63
1244 Study and Analysis of the Factors Affecting Road Safety Using Decision Tree Algorithms

Authors: Naina Mahajan, Bikram Pal Kaur

Abstract:

The purpose of traffic accident analysis is to find the possible causes of an accident. Road accidents cannot be totally prevented, but suitable traffic engineering and management can reduce the accident rate to a certain extent. This paper discusses the classification techniques C4.5 and ID3 using the WEKA data mining tool. These techniques are applied to the NH (National Highway) dataset. The C4.5 and ID3 techniques give the best results, with high accuracy, low computation time and a low error rate.

Keywords: C4.5, ID3, NH(National highway), WEKA data mining tool

Procedia PDF Downloads 311
1243 Fault-Tolerant Control Study and Classification: Case Study of a Hydraulic-Press Model Simulated in Real-Time

Authors: Jorge Rodriguez-Guerra, Carlos Calleja, Aron Pujana, Iker Elorza, Ana Maria Macarulla

Abstract:

Society demands more reliable manufacturing processes capable of producing high-quality products in shorter production cycles. New control algorithms have been studied to satisfy this paradigm, in which Fault-Tolerant Control (FTC) plays a significant role. FTC is suited to detecting and isolating a fault and adapting a system when a harmful or faulty situation appears. In this paper, a general overview of FTC characteristics is presented, highlighting the properties a system must ensure to be considered faultless. In addition, the main FTC techniques are identified and classified by their characteristics into two main groups: Active Fault-Tolerant Controllers (AFTCs) and Passive Fault-Tolerant Controllers (PFTCs). AFTC encompasses the techniques capable of re-configuring the process control algorithm after the fault has been detected, while PFTC comprises the algorithms robust enough to bypass the fault without further modifications. This re-configuration requires two stages: one focused on detection, isolation and identification of the fault source, and the other in charge of re-designing the control algorithm by one of two approaches, fault accommodation or control re-design. From the algorithms studied, one has been selected and applied to a case study based on an industrial hydraulic press. The developed model has been embedded in a real-time validation platform, which allows testing the FTC algorithms and analysing how the system will respond when a fault arises under conditions similar to those a machine would face in a factory. One AFTC approach has been selected as the methodology the system will follow in the fault recovery process. First, the fault is detected, isolated and identified by means of a neural network. Second, the control algorithm is re-configured to overcome the fault and continue working without human interaction.

Keywords: fault-tolerant control, electro-hydraulic actuator, fault detection and isolation, control re-design, real-time

Procedia PDF Downloads 158
1242 Mondoc: Informal Lightweight Ontology for Faceted Semantic Classification of Hypernymy

Authors: M. Regina Carreira-Lopez

Abstract:

Lightweight ontologies seek to make concrete the union relationships between a parent node and a secondary node, also called a "child node". This logic relation (L) can be formally defined as a triple ontological relation (LO) equivalent to LO in ⟨LN, LE, LC⟩, where LN represents a finite set of nodes (N); LE is a set of entities (E), each of which represents a relationship between nodes to form a rooted tree of ⟨LN, LE⟩; and LC is a finite set of concepts (C), encoded in a formal language (FL). Mondoc enables more refined searches on semantic and classified facets for retrieving specialized knowledge about Atlantic migrations, from the Declaration of Independence of the United States of America (1776) to the end of the Spanish Civil War (1939). The model aims to increase documentary relevance by applying an inverse frequency of co-occurring hypernymy phenomena to a concrete dataset of textual corpora, using the RMySQL package. Mondoc profiles archival utilities implemented in SQL code and allows data export to XML schemas, to achieve semantic and faceted analysis of speech by analyzing keywords in context (KWIC). The methodology applies random and unrestricted sampling techniques with RMySQL to verify the resonance phenomena of inverse documentary relevance between the number of co-occurrences of the same term (t) in more than two documents of a set of texts (D). Secondly, the research also shows that co-associations between (t) and their corresponding synonyms and antonyms (synsets) are also inverse. The results from grouping facets or polysemic words with synsets in more than two textual corpora within their syntagmatic context (nouns, verbs, adjectives, etc.) indicate how to proceed with semantic indexing of hypernymy phenomena for subject-heading lists and for authority lists for documentary and archival purposes. Mondoc contributes to the development of web directories and seems to achieve a proper and more selective search of e-documents (classification ontology). It can also foster the production of on-line catalogs for semantic authorities, or concepts, through XML schemas, because its applications could be used for implementing data models after adapting the base ontology to structured meta-languages such as OWL and RDF (descriptive ontology). Mondoc serves the classification of concepts and applies a semantic indexing approach to facets. It enables information retrieval, as well as quantitative and qualitative data interpretation. The model reproduces a triple tuple ⟨LN, LE, LT, LCF L, BKF⟩ where LN is a set of entities that connect with other nodes to make concrete a rooted tree in ⟨LN, LE⟩. LT specifies a set of terms, and LCF acts as a finite set of concepts, encoded in a formal language, L. Mondoc resolves only partial problems of linguistic ambiguity (in the case of synonymy and antonymy); neither the pragmatic dimension of natural language nor the cognitive perspective is addressed. To achieve this goal, forthcoming programming developments should target oriented meta-languages with structured documents in XML.

Keywords: hypernymy, information retrieval, lightweight ontology, resonance

Procedia PDF Downloads 110
1241 The Effects of Lithofacies on Oil Enrichment in Lucaogou Formation Fine-Grained Sedimentary Rocks in Santanghu Basin, China

Authors: Guoheng Liu, Zhilong Huang

Abstract:

For more than ten years, oil and gas have been produced from marine shale such as the Barnett shale. In recent years, major breakthroughs have also been made in lacustrine shale gas exploration, such as in the Yanchang Formation of the Ordos Basin in China. Lucaogou Formation shale, which is also lacustrine shale, has likewise yielded high production in recent years, with wells M1, M6 and ML2 yielding a daily oil production of 5.6 tons, 37.4 tons and 13.56 tons, respectively. Lithologic identification and classification of reservoirs are the basis of, and the key to, oil and gas exploration. Lithology and lithofacies clearly control the distribution of oil and gas in lithological reservoirs, so it is of great significance to describe the lithology and lithofacies of reservoirs in fine detail. Lithofacies is an intrinsic property of rock formed under certain conditions of sedimentation. Fine-grained sedimentary rocks such as shale formed under different sedimentary conditions display great particularity and distinctiveness. Hence, to the best of our knowledge, no constant and unified criteria and methods exist for the definition and classification of lithofacies in fine-grained sedimentary rocks. Consequently, multiple parameters and multiple disciplines are necessary. A series of qualitative descriptions and quantitative analyses were used to determine the lithofacies characteristics of Lucaogou Formation fine-grained sedimentary rocks in the Santanghu Basin and their effect on oil accumulation. The qualitative description includes core description, petrographic thin-section observation, fluorescent thin-section observation, cathode luminescence observation and scanning electron microscope observation. The quantitative analyses include X-ray diffraction, total organic content analysis, Rock-Eval II methodology, Soxhlet extraction, porosity and permeability analysis and oil saturation analysis. Three types of lithofacies are well developed in the study area: organic-rich massive shale lithofacies, organic-rich laminated and cloddy hybrid sedimentary lithofacies, and organic-lean massive carbonate lithofacies. Organic-rich massive shale lithofacies mainly include massive shale and tuffaceous shale, of which quartz and clay minerals are the major components. Organic-rich laminated and cloddy hybrid sedimentary lithofacies contain lamina and cloddy structure. Rocks from this lithofacies chiefly consist of dolomite and quartz. Organic-lean massive carbonate lithofacies mainly contain massive-bedded fine-grained carbonate rocks, of which fine-grained dolomite accounts for the main part. Organic-rich massive shale lithofacies contain the highest content of free hydrocarbon and solid organic matter. Moreover, more pores are developed in organic-rich massive shale lithofacies. Organic-lean massive carbonate lithofacies contain the lowest content of solid organic matter and develop the fewest pores. Organic-rich laminated and cloddy hybrid sedimentary lithofacies develop the largest number of cracks and fractures. To sum up, organic-rich massive shale lithofacies is the most favorable type of lithofacies, while organic-lean massive carbonate lithofacies cannot support large-scale oil accumulation.

Keywords: lithofacies classification, tuffaceous shale, oil enrichment, Lucaogou formation

Procedia PDF Downloads 193
1240 Reconnaissance Investigation of Thermal Springs in the Middle Benue Trough, Nigeria by Remote Sensing

Authors: N. Tochukwu, M. Mukhopadhyay, A. Mohamed

Abstract:

It is not news that Nigeria faces a continual power shortage due to the power demand of its vast population and its heavy reliance on nonrenewable forms of energy such as thermal power or fossil fuel. Many researchers have recommended geothermal energy as an alternative; however, past studies focus on the geophysical and geochemical investigation of this energy in the sedimentary and basement complex, and only a few studies have incorporated remote sensing methods. Therefore, in this study, a preliminary examination of geothermal resources in the Middle Benue was carried out using satellite imagery in ArcMap. A Landsat 8 scene (TIR, NIR and Red spectral bands) was used to estimate the Land Surface Temperature (LST). The Maximum Likelihood Classification (MLC) technique was used to classify sites with very low, low, moderate, and high LST. The moderate and high classes are possible geothermal zones, and they occupy 49% of the study area (38,077 km²). River lines were superimposed on the LST layer, and the identification tool was used to locate high-temperature sites. Streams that overlap the selected sites were regarded as geothermal springs. Surprisingly, the LST results show lower temperatures (<36°C) at the famous thermal springs (Awe & Wukari) than at some unknown rivers/streams found in Kwande (38°C), Ussa (38°C), Gwer East (37°C), Yola Cross & Ogoja (36°C). Studies have revealed that temperature increases with depth. This result therefore shows excellent geothermal resource potential, as it is expected to exceed the minimum geothermal gradient of 25.47 with an increase in depth. Further investigation is required to estimate the depth of the causative body, the geothermal gradients, and the sustainability of the reservoirs by geophysical and field exploration. This method has proven to be cost-effective in locating geothermal resources in the study area. Consequently, the same procedure is recommended for other regions of the Precambrian basement complex and the sedimentary basins in Nigeria to save the cost of a preliminary field survey.

Keywords: ArcMap, geothermal resources, Landsat 8, LST, thermal springs, MLC

Procedia PDF Downloads 161
1239 Constructing a Semi-Supervised Model for Network Intrusion Detection

Authors: Tigabu Dagne Akal

Abstract:

While advances in computer and communications technology have made the network ubiquitous, they have also rendered networked systems vulnerable to malicious attacks devised from a distance. These attacks or intrusions start with attackers infiltrating a network through a vulnerable host and then launching further attacks on the local network or intranet. Nowadays, system administrators and network professionals can attempt to prevent such attacks by developing intrusion detection tools and systems using data mining technology. In this study, the experiments were conducted following the Knowledge Discovery in Database Process Model, which starts with the selection of the datasets. The dataset used in this study was taken from the Massachusetts Institute of Technology Lincoln Laboratory. After collection, the data were pre-processed. The major pre-processing activities were filling in missing values, removing outliers, resolving inconsistencies, integrating data that contain both labelled and unlabelled datasets, dimensionality reduction, size reduction and data transformation tasks such as discretization. A total of 21,533 intrusion records were used for training the models. To validate the performance of the selected model, a separate set of 3,397 records was used as a testing set. To build a predictive model for intrusion detection, the J48 decision tree and Naïve Bayes algorithms were tested as classification approaches, both with and without feature selection. The model created using 10-fold cross-validation with the J48 decision tree algorithm and the default parameter values showed the best classification accuracy. The model has a prediction accuracy of 96.11% on the training datasets and 93.2% on the test dataset in classifying new instances as normal, DOS, U2R, R2L and probe classes. The findings of this study show that data mining methods generate interesting rules that are crucial for intrusion detection and prevention in the networking industry. Future research directions are proposed toward an applicable system in the area of the study.

Keywords: intrusion detection, data mining, computer science

Procedia PDF Downloads 277
1238 Recurrent Neural Networks for Classifying Outliers in Electronic Health Record Clinical Text

Authors: Duncan Wallace, M-Tahar Kechadi

Abstract:

In recent years, Machine Learning (ML) approaches have been successfully applied to an analysis of patient symptom data in the context of disease diagnosis, at least where such data is well codified. However, much of the data present in Electronic Health Records (EHR) are unlikely to prove suitable for classic ML approaches. Furthermore, as scores of data are widely spread across both hospitals and individuals, a decentralized, computationally scalable methodology is a priority. The focus of this paper is to develop a method to predict outliers in an out-of-hours healthcare provision center (OOHC). In particular, our research is based upon the early identification of patients who have underlying conditions which will cause them to repeatedly require medical attention. OOHC act as an ad-hoc delivery of triage and treatment, where interactions occur without recourse to a full medical history of the patient in question. Medical histories, relating to patients contacting an OOHC, may reside in several distinct EHR systems in multiple hospitals or surgeries, which are unavailable to the OOHC in question. As such, although a local solution is optimal for this problem, it follows that the data under investigation is incomplete, heterogeneous, and comprised mostly of noisy textual notes compiled during routine OOHC activities. Through the use of Deep Learning methodologies, the aim of this paper is to provide the means to identify patient cases, upon initial contact, which are likely to relate to such outliers. To this end, we compare the performance of Long Short-Term Memory, Gated Recurrent Units, and combinations of both with Convolutional Neural Networks. A further aim of this paper is to elucidate the discovery of such outliers by examining the exact terms which provide a strong indication of positive and negative case entries. While free-text is the principal data extracted from EHRs for classification, EHRs also contain normalized features. 
Although the specific demographical features treated within our corpus are relatively limited in scope, we examine whether it is beneficial to include such features among the inputs to our neural network, or whether these features are more successfully exploited in conjunction with a different form of a classifier. In this section, we compare the performance of randomly generated regression trees and support vector machines and determine the extent to which our classification program can be improved upon by using either of these machine learning approaches in conjunction with the output of our Recurrent Neural Network application. The output of our neural network is also used to help determine the most significant lexemes present within the corpus for determining high-risk patients. By combining the confidence of our classification program in relation to lexemes within true positive and true negative cases, with an inverse document frequency of the lexemes related to these cases, we can determine what features act as the primary indicators of frequent-attender and non-frequent-attender cases, providing a human interpretable appreciation of how our program classifies cases.

Keywords: artificial neural networks, data-mining, machine learning, medical informatics

Procedia PDF Downloads 108
1237 A Kruskal Based Heuxistic for the Application of Spanning Tree

Authors: Anjan Naidu

Abstract:

In this paper, we first discuss the minimum spanning tree and then use Kruskal's algorithm to obtain it. Based on Kruskal's algorithm, we propose a Kruskal-based heuristic for an application that finds the minimum cost using the concept of a spanning tree.
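For reference, the standard Kruskal algorithm that this abstract builds on can be sketched as follows. This is a textbook version with a union-find structure, not the heuristic proposed in the paper; the function name `kruskal` and the edge format `(weight, u, v)` are assumptions made for illustration.

```python
def kruskal(num_nodes, edges):
    """Return (total_cost, tree_edges) of a minimum spanning forest.

    edges is a list of (weight, u, v) tuples with nodes numbered
    0 .. num_nodes - 1.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent links to the set representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total_cost = 0
    tree_edges = []
    # Consider edges from cheapest to most expensive.
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so no cycle
            parent[root_u] = root_v
            total_cost += weight
            tree_edges.append((u, v, weight))
    return total_cost, tree_edges


# Hypothetical example graph with 4 nodes.
example_edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
cost, tree = kruskal(4, example_edges)
print(cost, tree)
```

Sorting the edges dominates the running time, so the sketch runs in O(E log E) for E edges.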

Keywords: Minimum Spanning tree, algorithm, Heuxistic, application, classification of Sub 97K90

Procedia PDF Downloads 424
1236 Towards Learning Query Expansion

Authors: Ahlem Bouziri, Chiraz Latiri, Eric Gaussier

Abstract:

The steady growth in the size of textual document collections is a key driver of progress for modern information retrieval techniques, whose effectiveness and efficiency are constantly challenged. Given a user query, the number of retrieved documents can be overwhelmingly large, hampering their efficient exploitation by the user. In addition, retaining only relevant documents in a query answer is of paramount importance for effectively meeting the user's needs. In this situation, the query expansion technique offers an interesting solution for obtaining a complete answer while preserving the quality of the retained documents. This mainly relies on an accurate choice of the terms added to an initial query. Interestingly enough, query expansion takes advantage of large text volumes by extracting statistical information about index term co-occurrences and using it to make user queries better fit the real information needs. In this respect, a promising track consists in applying data mining methods to extract dependencies between terms, namely a generic basis of association rules between terms. The key feature of our approach is a better trade-off between the size of the mining result and the conveyed knowledge. Thus, faced with the huge number of derived association rules, and in order to select the optimal combination of query terms from the generic basis, we propose to model the problem as a classification problem and solve it using a supervised learning algorithm such as SVM or k-means. For this purpose, we first generate a training set using a genetic-algorithm-based approach that explores the association rule space in order to find an optimal set of expansion terms, improving the MAP of the search results. The experiments were performed on the SDA 95 collection, a data collection for information retrieval. The results were found to be better in terms of both MAP and NDCG. The main observation is that hybridizing text mining techniques and query expansion in an intelligent way allows us to incorporate the good features of all of them. As this is a preliminary attempt in this direction, there is large scope for enhancing the proposed method.
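The idea of expanding a query with terms linked by association rules can be sketched in a simplified form. The example below uses plain confidence-based rules between single terms; it does not compute the generic basis of rules or the learned term selection described in the abstract. The function names `term_rules` and `expand_query`, the threshold `min_conf`, and the toy documents are all assumptions for illustration.

```python
from collections import Counter
from itertools import permutations


def term_rules(docs, min_conf=0.6):
    """Return {(a, b): confidence} for rules a -> b between terms.

    confidence(a -> b) = (documents containing a and b) / (documents containing a)
    """
    doc_sets = [set(doc) for doc in docs]
    term_count = Counter(term for doc in doc_sets for term in doc)
    pair_count = Counter(
        pair for doc in doc_sets for pair in permutations(sorted(doc), 2)
    )
    rules = {}
    for (a, b), together in pair_count.items():
        confidence = together / term_count[a]
        if confidence >= min_conf:
            rules[(a, b)] = confidence
    return rules


def expand_query(query, rules):
    """Append every term implied by a rule whose premise is in the query."""
    added = {b for (a, b) in rules if a in query and b not in query}
    return list(query) + sorted(added)


# Hypothetical toy collection of tokenised documents.
docs = [
    ["patent", "claim", "law"],
    ["patent", "claim"],
    ["patent", "tree"],
    ["tree", "graph"],
]
rules = term_rules(docs)
print(expand_query(["patent"], rules))
```

Raising `min_conf` keeps fewer rules and so adds fewer expansion terms, which is one simple way to trade the size of the rule set against the quality of the added terms.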

Keywords: supervised learning, classification, query expansion, association rules

Procedia PDF Downloads 306
1235 Human Gait Recognition Using Moment with Fuzzy

Authors: Jyoti Bharti, Navneet Manjhi, M. K. Gupta, Bimi Jain

Abstract:

Reliable gait features are required to extract gait sequences from images. This paper suggests a simple method for gait identification based on moments. Moment values are extracted from different numbers of frames of gray-scale and silhouette images from the CASIA database. These moment values are used as feature values. Fuzzy logic and a nearest neighbour classifier are used for classification. Both achieved high recognition rates.
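
The abstract does not say which moments are used, so the following is only an illustrative sketch: it computes a few second-order central moments of a binary silhouette and classifies by nearest neighbour. The choice of moment orders, the normalisation by the silhouette area, and the tiny example silhouettes are all assumptions.

```python
import math


def central_moments(image, orders=((2, 0), (0, 2), (1, 1))):
    """Area-normalised central moments of a 2D list of pixel values."""
    m00 = sum(sum(row) for row in image)
    if m00 == 0:
        raise ValueError("empty silhouette")
    # Centroid of the silhouette.
    cx = sum(x * v for row in image for x, v in enumerate(row)) / m00
    cy = sum(y * v for y, row in enumerate(image) for v in row) / m00
    features = []
    for p, q in orders:
        mu = sum(
            ((x - cx) ** p) * ((y - cy) ** q) * v
            for y, row in enumerate(image)
            for x, v in enumerate(row)
        )
        features.append(mu / m00)  # assumed normalisation
    return features


def nearest_neighbour(query, gallery):
    """gallery: list of (feature_vector, label); returns the closest label."""
    return min(gallery, key=lambda item: math.dist(query, item[0]))[1]


# Hypothetical gallery of two silhouette shapes.
gallery = [
    (central_moments([[1, 1, 1, 1]]), "wide"),
    (central_moments([[1], [1], [1], [1]]), "tall"),
]
print(nearest_neighbour(central_moments([[1, 1, 1]]), gallery))
```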

Keywords: gait, fuzzy logic, nearest neighbour, recognition rate, moments

Procedia PDF Downloads 731
1234 Adolescent-Parent Relationship as the Most Important Factor in Preventing Mood Disorders in Adolescents: An Application of Artificial Intelligence to Social Studies

Authors: Elżbieta Turska

Abstract:

Introduction: One of the most difficult times in a person’s life is adolescence. The experiences in this period may shape the future life of this person to a large extent. This is the reason why many young people experience sadness, dejection, hopelessness, a sense of worthlessness, as well as a loss of interest in various activities and social relationships, all of which are often classified as mood disorders. As many as 15-40% of adolescents experience depressed moods, and for most of them these resolve and are not carried into adulthood. However, 5-6% of those affected by mood disorders develop the depressive syndrome and as many as 1-3% develop full-blown clinical depression. Materials: A large questionnaire was given to 2508 students, aged 13–16 years old, and one of its parts was the Burns checklist, i.e. the standard test for identifying depressed mood. The questionnaire asked about many aspects of the student’s life; it included a total of 53 questions, most of which had subquestions. It is important to note that the data suffered from many problems, the most important of which were missing data and collinearity. Aim: In order to identify the correlates of mood disorders we built predictive models which were then trained and validated. Our aim was not to be able to predict which students suffer from mood disorders but rather to explore the factors influencing mood disorders. Methods: The problems with the data described above practically ruled out the use of classical statistical methods. For this reason, we attempted to use the following Artificial Intelligence (AI) methods: classification trees with surrogate variables, random forests and xgboost. All analyses were carried out with the use of the mlr package for the R programming language. Results: The predictive model built by the classification trees algorithm outperformed the other algorithms by a large margin.
As a result, we were able to rank the variables (questions and subquestions from the questionnaire) from the most to least influential as far as protection against mood disorder is concerned. Thirteen out of twenty most important variables reflect the relationships with parents. This seems to be a really significant result both from the cognitive point of view and also from the practical point of view, i.e. as far as interventions to correct mood disorders are concerned.

Keywords: mood disorders, adolescents, family, artificial intelligence

Procedia PDF Downloads 85
1233 Detecting Covid-19 Fake News Using Deep Learning Technique

Authors: AnjalI A. Prasad

Abstract:

Nowadays, social media plays an important role in spreading misinformation and fake news. This study analyzes fake news related to the COVID-19 pandemic spread on social media. The paper aims to evaluate and compare different approaches used to mitigate this issue, including popular deep learning approaches to classification such as CNN, RNN, LSTM, and BERT. To evaluate the models’ performance, we used accuracy, precision, recall, and F1-score as the evaluation metrics. Finally, we compare the four algorithms to determine which gives the best result.
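
The four evaluation metrics named above can be computed from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the label names and the small example are hypothetical and do not come from the study.

```python
def binary_metrics(y_true, y_pred, positive="fake"):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions from one classifier.
y_true = ["fake", "fake", "real", "real", "fake"]
y_pred = ["fake", "real", "real", "fake", "fake"]
print(binary_metrics(y_true, y_pred))
```

Running the same function on the predictions of each model gives a like-for-like comparison across the four architectures.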

Keywords: BERT, CNN, LSTM, RNN

Procedia PDF Downloads 190
1232 Assessing the Utility of Unmanned Aerial Vehicle-Borne Hyperspectral Image and Photogrammetry Derived 3D Data for Wetland Species Distribution Quick Mapping

Authors: Qiaosi Li, Frankie Kwan Kit Wong, Tung Fung

Abstract:

Lightweight unmanned aerial vehicles (UAVs) carrying novel sensors offer a low-cost approach to data acquisition in complex environments. This study established a framework for applying a UAV system to quick mapping of complex environments and assessed the performance of UAV-based hyperspectral imagery and a digital surface model (DSM) derived from photogrammetric point clouds for classifying 13 species in the wetland area of the Mai Po Inner Deep Bay Ramsar Site, Hong Kong. The study area was part of a shallow bay with flat terrain, and the major species included reedbed and four mangroves: Kandelia obovata, Aegiceras corniculatum, Acrostichum aureum and Acanthus ilicifolius. Other species included various graminaceous plants, arbor, shrub and the invasive species Mikania micrantha. In particular, the invasive species climbed up to the mangrove canopy, causing damage and morphological change that might make species harder to distinguish. Hyperspectral images were acquired by a Headwall Nano sensor with a spectral range from 400 nm to 1000 nm and 0.06 m spatial resolution. A sequence of multi-view RGB images was captured with 0.02 m spatial resolution and 75% overlap. The hyperspectral image was corrected for radiometric and geometric distortion, while the high-resolution RGB images were matched to generate maximally dense point clouds. Furthermore, a 5 cm grid digital surface model (DSM) was derived from the dense point clouds. Multiple feature reduction methods were compared to identify the most efficient method and to explore the significant spectral bands for distinguishing different species. The examined methods included stepwise discriminant analysis (DA), support vector machine (SVM) and minimum noise fraction (MNF) transformation.
Subsequently, spectral subsets composed of the 20 most important bands extracted by SVM, DA and MNF, and multi-source subsets adding the DSM to the 20 spectral bands, served as input to a maximum likelihood classifier (MLC) and an SVM classifier to compare the classification results. The classification results showed that the feature reduction methods, from best to worst, are MNF transformation, DA and SVM. The accuracy with MNF transformation was even higher than that obtained with all bands as input. The selected bands frequently lay along the green peak, red edge and near infrared. Additionally, DA found that the chlorophyll absorption red band and the yellow band were also important for species classification. In terms of 3D data, the DSM enhanced the discriminant capacity among low plants, arbor and mangrove. Meanwhile, the DSM largely reduced misclassification due to the shadow effect and inter-species morphological variation. With respect to the classifier, the nonparametric SVM outperformed MLC for high-dimensional and multi-source data in this study. The SVM classifier tended to produce higher overall accuracy and fewer scattered patches, although it takes more time than MLC. The best result was obtained by combining MNF components and the DSM in the SVM classifier. This study offers a precise species distribution survey solution for inaccessible wetland areas at a low cost in time and labour. In addition, the findings on the positive effect of the DSM and on spectral feature identification indicate that UAV-borne hyperspectral and photogrammetry-derived 3D data are promising for further research on wetland species, such as bio-parameter modelling and biological invasion monitoring.

Keywords: digital surface model (DSM), feature reduction, hyperspectral, photogrammetric point cloud, species mapping, unmanned aerial vehicle (UAV)

Procedia PDF Downloads 237
1231 Flood Hazard Assessment and Land Cover Dynamics of the Orai Khola Watershed, Bardiya, Nepal

Authors: Loonibha Manandhar, Rajendra Bhandari, Kumud Raj Kafle

Abstract:

Nepal’s Terai region is a part of the Ganges river basin which is one of the most disaster-prone areas of the world, with recurrent monsoon flooding causing millions in damage and the death and displacement of hundreds of people and households every year. The vulnerability of human settlements to natural disasters such as floods is increasing, and mapping changes in land use practices and hydro-geological parameters is essential in developing resilient communities and strong disaster management policies. The objective of this study was to develop a flood hazard zonation map of the Orai Khola watershed and map the decadal land use/land cover dynamics of the watershed. The watershed area was delineated using SRTM DEM, and LANDSAT images were classified into five land use classes (forest, grassland, sediment and bare land, settlement area and cropland, and water body) using pixel-based semi-automated supervised maximum likelihood classification. Decadal changes in each class were then quantified using spatial modelling. Flood hazard mapping was performed by assigning weights to the factors slope, rainfall distribution, distance from the river and land use/land cover on the basis of their estimated influence in causing flood hazard, and performing weighted overlay analysis to identify areas that are highly vulnerable. The forest and grassland coverage increased by 11.53 km² (3.8%) and 1.43 km² (0.47%) from 1996 to 2016. The sediment and bare land areas decreased by 12.45 km² (4.12%) from 1996 to 2016 whereas settlement and cropland areas showed a consistent increase to 14.22 km² (4.7%). Waterbody coverage also increased to 0.3 km² (0.09%) from 1996 to 2016. Of the total watershed area, 1.27% (3.65 km²) was categorized into the very low hazard zone, 20.94% (60.31 km²) into the low hazard zone, 37.59% (108.3 km²) into the moderate hazard zone and 29.25% (84.27 km²) into the high hazard zone, while 31 villages comprising 10.95% (31.55 km²) were categorized into the very high hazard zone.
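
The weighted overlay step described above can be sketched as a cell-by-cell weighted sum of reclassified factor rasters. The abstract gives neither the weights nor the score scale, so the weights, the 1 to 5 scores, and the 2x2 grids below are hypothetical placeholders.

```python
def weighted_overlay(layers, weights):
    """Cell-wise weighted sum of equally shaped 2D score grids.

    layers:  dict mapping factor name -> 2D list of hazard scores
    weights: dict mapping factor name -> weight (weights must sum to 1)
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    names = list(layers)
    rows = len(layers[names[0]])
    cols = len(layers[names[0]][0])
    return [
        [sum(weights[n] * layers[n][r][c] for n in names) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical scores (1 = low hazard contribution, 5 = high) and weights.
layers = {
    "slope": [[5, 1], [3, 2]],
    "rainfall": [[4, 2], [3, 3]],
    "river_distance": [[5, 1], [2, 4]],
    "land_cover": [[3, 1], [4, 2]],
}
weights = {"slope": 0.3, "rainfall": 0.2, "river_distance": 0.3, "land_cover": 0.2}
hazard = weighted_overlay(layers, weights)
print(hazard)
```

The resulting hazard scores would then be sliced into zones (for example very low to very high) using class breaks chosen by the analyst.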

Keywords: flood hazard, land use/land cover, Orai river, supervised maximum likelihood classification, weighted overlay analysis

Procedia PDF Downloads 323
1230 Characterization of Agroforestry Systems in Burkina Faso Using an Earth Observation Data Cube

Authors: Dan Kanmegne

Abstract:

Africa will become the most populated continent by the end of the century, with around 4 billion inhabitants. Food security and climate change will become continental issues since agricultural practices depend on climate but also contribute to global emissions and land degradation. Agroforestry has been identified as a cost-efficient and reliable strategy to address these two issues. It is defined as the integrated management of trees and crops/animals in the same land unit. Agroforestry provides benefits in terms of goods (fruits, medicine, wood, etc.) and services (windbreaks, fertility, etc.), and is acknowledged to have a great potential for carbon sequestration; therefore it can be integrated into reduction mechanisms of carbon emissions. Particularly in sub-Saharan Africa, the constraint lies in the lack of information about both the areas under agroforestry and the characterization (composition, structure, and management) of each agroforestry system at the country level. This study describes and quantifies “what is where?”, prior to the quantification of carbon stock in different systems. Remote sensing (RS) is the most efficient approach to map such a dynamic technology as agroforestry since it gives relatively adequate and consistent information over a large area at nearly no cost. RS data fulfill the good practice guidelines of the Intergovernmental Panel on Climate Change (IPCC) for data to be used in carbon estimation. Satellite data are getting more and more accessible, and the archives are growing exponentially. To retrieve useful information to support decision-making out of this large amount of data, satellite data need to be organized so as to ensure fast processing, quick accessibility, and ease of use. A new solution is a data cube, which can be understood as a multi-dimensional stack (space, time, data type) of spatially aligned pixels used for efficient access and analysis.
A data cube for Burkina Faso has been set up from the cooperation project between the international service provider WASCAL and Germany, which provides an accessible exploitation architecture of multi-temporal satellite data. The aim of this study is to map and characterize agroforestry systems using the Burkina Faso earth observation data cube. The approach in its initial stage is based on an unsupervised image classification of a normalized difference vegetation index (NDVI) time series from 2010 to 2018, to stratify the country based on the vegetation. Fifteen strata were identified, and four samples per location were randomly assigned to define the sampling units. For safety reasons, the northern part will not be part of the fieldwork. A total of 52 locations will be visited by the end of the dry season in February-March 2020. The field campaigns will consist of identifying and describing different agroforestry systems and qualitative interviews. A multi-temporal supervised image classification will be done with a random forest algorithm, and the field data will be used for both training the algorithm and accuracy assessment. The expected outputs are (i) map(s) of agroforestry dynamics, (ii) characteristics of different systems (main species, management, area, etc.); (iii) assessment report of Burkina Faso data cube.
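
The NDVI time series used for stratification is built from the standard index NDVI = (NIR - Red) / (NIR + Red). The sketch below computes it for one pixel over several dates; the reflectance values are hypothetical, and access to the actual data cube is not modelled.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    denominator = nir + red
    # Guard against division by zero for no-data pixels.
    return (nir - red) / denominator if denominator else 0.0


def ndvi_series(nir_values, red_values):
    """NDVI for one pixel across a sequence of acquisition dates."""
    return [ndvi(n, r) for n, r in zip(nir_values, red_values)]


# Hypothetical surface reflectances for one pixel on three dates.
nir_values = [0.50, 0.42, 0.30]
red_values = [0.10, 0.14, 0.20]
print(ndvi_series(nir_values, red_values))
```

Pixels with similar NDVI trajectories over time can then be grouped by an unsupervised classifier to form vegetation strata.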

Keywords: agroforestry systems, Burkina Faso, earth observation data cube, multi-temporal image classification

Procedia PDF Downloads 124
1229 Fault Diagnosis of Manufacturing Systems Using AntTreeStoch with Parameter Optimization by ACO

Authors: Ouahab Kadri, Leila Hayet Mouss

Abstract:

In this paper, we present three diagnostic modules for complex and dynamic systems. These modules are based on three ant colony algorithms: AntTreeStoch, Lumer & Faieta, and Binary ant colony. We chose these algorithms for their simplicity and wide range of application. However, we cannot use these algorithms in their basic forms, as they have several limitations. To use them in a diagnostic system, we propose three variants. We tested these algorithms on datasets from two industrial systems: a clinkering system and a pasteurization system.

Keywords: ant colony algorithms, complex and dynamic systems, diagnosis, classification, optimization

Procedia PDF Downloads 284
1228 Vertical and Horizantal Distribution Patterns of Major and Trace Elements: Surface and Subsurface Sediments of Endhorheic Lake Acigol Basin, Denizli Turkey

Authors: M. Budakoglu, M. Karaman

Abstract:

Although Lake Acıgöl is located in an area with limited influence from urban and industrial pollution sources, there is nevertheless a need to understand all potential lithological and anthropogenic sources of priority contaminants in this closed basin. This study discusses the vertical and horizontal distribution patterns of major and trace elements in recent lake sediments to better understand their current geochemical analogy with lithological units in the Lake Acıgöl basin. The study also provides reliable background levels for the region based on detailed data from the surface lithological units. The detailed results for surface, subsurface and shallow core sediments from this relatively unperturbed ecosystem highlight its importance as a conservation area, despite the large-scale industrial salt production activity. While the P2O5/TiO2 versus MgO/CaO classification diagram indicates a magmatic and sedimentary origin for the lake sediment, the Log(SiO2/Al2O3) versus Log(Na2O/K2O) classification diagrams indicate lithological assemblages of shale, iron-shale, wacke and arkose. The plots of TiO2 vs. SiO2 and P2O5/TiO2 vs. MgO/CaO also support the origin of the primary magma source. The average compositions of the 20 different lithological units were used as a proxy for the geochemical background in the study area. As expected from weathered rock materials, there is a large variation in the major element content for all analyzed lake samples. The A-CN-K and A-CNK-FM ternary diagrams were used to deduce weathering trends. Surface and subsurface sediments display an intense weathering history according to these ternary diagrams. Most of the sediment samples plot around UCC and TTG, suggesting a low to moderate weathering history for the provenance. The sediments plot in a region clearly suggesting Al2O3, CaO, Na2O, and K2O contents relatively similar to those of the lithological samples.

Keywords: Lake Acıgöl, recent lake sediment, geochemical speciation of major and trace elements, heavy metals, Denizli, Turkey

Procedia PDF Downloads 384
1227 Spatial Patterns of Urban Expansion in Kuwait City between 1989 and 2001

Authors: Saad Algharib, Jay Lee

Abstract:

Urbanization is a complex phenomenon that occurs during a city’s development from one form to another. In other words, it is the process by which land use/land cover activities change from rural to urban. Since the start of oil exploration, Kuwait City has been growing rapidly due to urbanization and population growth from both natural increase and inward immigration. The main objective of this study is to detect changes in urban land use/land cover and to examine the changing spatial patterns of urban growth in and around Kuwait City between 1989 and 2001. In addition, this study evaluates the spatial patterns of the changes detected and how they can be related to the spatial configuration of the city. Recently, remote sensing and geographic information systems have become very useful and important tools in urban studies because their integration allows analysts and planners to detect, monitor and analyze urban growth in a region effectively. Moreover, both planners and users can predict future trends of growth in urban areas with remotely sensed and GIS data because these data can be effectively updated with the required precision levels. In order to identify the new urban areas between 1989 and 2001, the study uses satellite images of the study area and remote sensing technology to classify these images. An unsupervised classification method was applied to classify the images into land use and land cover data layers. After the unsupervised classification, a GIS overlay function was applied to the classified images to detect the locations and patterns of the new urban areas that developed during the study period. GIS was also utilized to evaluate the distribution of the spatial patterns. For example, Moran’s index was applied to all data inputs to examine the urban growth distribution.
Furthermore, this study assesses whether the spatial patterns and processes of these changes occur randomly or follow certain identifiable trends. The results indicate that, during the study period, the urban area expanded by 10 percentage points, from 32.4% in 1989 to 42.4% in 2001. The results also revealed that the largest increase in urban area occurred between the major highways beyond the fourth ring road from the center of Kuwait City. Moreover, the spatial distribution of urban growth was clustered.
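
Moran’s index, mentioned in this abstract as the test for clustered versus random growth, can be illustrated with the standard global Moran's I formula. The four-cell strip, its adjacency weights, and the urban/non-urban values below are hypothetical; the study's actual spatial weights are not given.

```python
def morans_i(values, weights):
    """Global Moran's I for values with a spatial weight matrix.

    values:  list of n observations (e.g. urban share per zone)
    weights: n x n matrix, weights[i][j] > 0 when zones i and j are neighbours
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    denominator = sum(d * d for d in deviations)
    if weight_sum == 0 or denominator == 0:
        raise ValueError("Moran's I is undefined for these inputs")
    numerator = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / weight_sum) * (numerator / denominator)


# Four zones in a row; each zone neighbours the zones next to it.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 1, 0, 0], adjacency))  # clustered pattern, positive
print(morans_i([1, 0, 1, 0], adjacency))  # alternating pattern, negative
```

Positive values indicate that similar values sit next to each other (clustering), values near zero suggest a random arrangement, and negative values indicate dispersion.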

Keywords: geographic information systems, remote sensing, urbanization, urban growth

Procedia PDF Downloads 149
1226 Normalized Compression Distance Based Scene Alteration Analysis of a Video

Authors: Lakshay Kharbanda, Aabhas Chauhan

Abstract:

In this paper, an application of Normalized Compression Distance (NCD) to detect notable scene alterations occurring in videos is presented. Several research groups have been developing methods to perform image classification using NCD, a computable approximation to Normalized Information Distance (NID), by studying the degree of similarity in images. The timeframes where significant aberrations between the frames of a video occur are identified by obtaining a threshold NCD value using two compressors, LZMA and BZIP2, and by defining scene alterations using Pixel Difference Percentage metrics.
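
NCD is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The sketch below applies that definition with Python's standard `lzma` and `bz2` modules; the synthetic "frames", the threshold of 0.5, and the helper name `scene_changes` are assumptions for illustration, not the authors' pipeline.

```python
import bz2
import lzma
import random


def ncd(x, y, compressor=lzma.compress):
    """Normalized Compression Distance between two byte strings."""
    cx = len(compressor(x))
    cy = len(compressor(y))
    cxy = len(compressor(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def scene_changes(frames, threshold, compressor=lzma.compress):
    """Indices of frames whose NCD to the previous frame exceeds threshold."""
    return [
        i
        for i in range(1, len(frames))
        if ncd(frames[i - 1], frames[i], compressor) > threshold
    ]


# Synthetic stand-ins for raw frame bytes: a repeating pattern and noise.
rng = random.Random(0)
frame_a = bytes(range(256)) * 20
frame_b = bytes(rng.randrange(256) for _ in range(len(frame_a)))

print(ncd(frame_a, frame_a), ncd(frame_a, frame_b))
print(ncd(frame_a, frame_b, compressor=bz2.compress))
print(scene_changes([frame_a, frame_a, frame_b], threshold=0.5))
```

Near-identical frames compress well together and give an NCD close to 0, while unrelated frames give a value close to 1, so a threshold between the two separates scene changes from ordinary frame-to-frame variation.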

Keywords: image compression, Kolmogorov complexity, normalized compression distance, root mean square error

Procedia PDF Downloads 313
1225 Recognition of Tifinagh Characters with Missing Parts Using Neural Network

Authors: El Mahdi Barrah, Said Safi, Abdessamad Malaoui

Abstract:

In this paper, we present an algorithm for the reconstruction of Tifinagh characters from incomplete 2D scans. The algorithm is based on the correlation between the lost block and its neighbors. The proposed system contains three main parts: pre-processing, feature extraction and recognition. In the first step, we construct a database of Tifinagh characters. In the second step, we apply a “shape analysis algorithm”. In the classification step, we use a neural network. The simulation results demonstrate that the proposed method gives good results.

Keywords: Tifinagh character recognition, neural networks, local cost computation, ANN

Procedia PDF Downloads 312
1224 Classification of Sturm-Liouville Problems at Infinity

Authors: Kishor J. shinde

Abstract:

We determine the values of k and p such that the Sturm-Liouville differential operator τu=-(d^2 u)/(dx^2) + kx^p u is in the limit point case or the limit circle case at infinity. In particular, it is shown that τ is in the limit point case (i) for p=2 and all k, (ii) for all p and k=0, (iii) for all p and k>0, (iv) for 0≤p≤2 and k<0, and (v) for p<0 and k<0. τ is in the limit circle case for p>2 and k<0.
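
The cases listed above can be consolidated as follows. The typeset form below restates the abstract's classification; the sufficient condition in the comment is a standard textbook criterion quoted for orientation and is not taken from the abstract.

```latex
% Operator and potential
\tau u = -\frac{d^{2}u}{dx^{2}} + k x^{p}\, u, \qquad q(x) = k x^{p}.

% Standard sufficient condition (assumption, not from the abstract):
% if q(x) \ge -C x^{2} for some C > 0 and all sufficiently large x,
% then \tau is in the limit point case at infinity.

% Consolidated classification at infinity
\tau \text{ is limit point} \iff k \ge 0 \ (\text{any } p) \ \text{ or } \ (k < 0 \text{ and } p \le 2),
\qquad
\tau \text{ is limit circle} \iff k < 0 \text{ and } p > 2.
```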

Keywords: limit point case, limit circle case, Sturm-Liouville, infinity

Procedia PDF Downloads 344
1223 Rice Area Determination Using Landsat-Based Indices and Land Surface Temperature Values

Authors: Burçin Saltık, Levent Genç

Abstract:

This study aimed to determine a procedure for identifying rice cultivation areas within the Thrace and Marmara regions of Turkey using remote sensing and GIS. Landsat 8 (OLI-TIRS) imagery acquired in the 2013 production season with Path/Row number 181/32 was used. Four different seasonal images were generated utilizing original bands and different transformation techniques. All images were classified individually using supervised classification techniques, and Land Use Land Cover (LULC) maps were generated with 8 classes. The area (ha, %) of each class was calculated. In addition, district-based rice distribution maps were developed, and the results of these maps were compared with the actual rice cultivation area records of the Turkish Statistical Institute (TurkSTAT; TSI). Accuracy assessments were conducted, and the most accurate map was selected depending on the accuracy assessment and coherency with TSI results. Additionally, rice areas on slopes over 4° were considered misclassified pixels and were eliminated using a slope map and GIS tools. Finally, randomized rice zones were selected to obtain the maximum-minimum value ranges of the NDVI, LSWI, and LST images for each date (May, June, July, August, and September images separately) to test whether they may be used for rice area determination via the raster calculator tool of ArcGIS. The most accurate classification for rice determination was obtained from the seasonal LSWI LULC map, and misclassified pixels were eliminated from this map considering TSI data and accuracy assessment results. According to the results, 83151.5 ha of rice areas exist within the study area. However, this result is higher than TSI records with an area of 12702.3 ha. The use of the maximum-minimum ranges of rice area NDVI, LSWI, and LST was tested in Meric district. Using the value ranges obtained from the July imagery gave the closest results to TSI records, with a difference of only 206.4 ha.
This difference is expected given the relatively low resolution of the images. Thus, the use of images with higher spectral, spatial, temporal and radiometric resolutions may provide more reliable results.
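
The maximum-minimum range test described in this abstract amounts to sampling index values in known rice zones and then keeping only pixels whose NDVI, LSWI and LST all fall inside the sampled ranges. The sketch below mimics that raster-calculator logic in plain Python; all sample values and pixels are hypothetical.

```python
def value_range(samples):
    """(minimum, maximum) of index values sampled in known rice zones."""
    return min(samples), max(samples)


def is_rice_candidate(pixel, ranges):
    """True when every index value of the pixel lies inside its range."""
    return all(low <= pixel[name] <= high for name, (low, high) in ranges.items())


# Hypothetical July samples taken inside known rice zones.
rice_samples = {
    "NDVI": [0.62, 0.70, 0.75],
    "LSWI": [0.30, 0.41, 0.38],
    "LST": [24.5, 26.0, 25.1],
}
ranges = {name: value_range(values) for name, values in rice_samples.items()}

# Two hypothetical pixels to classify.
pixels = [
    {"NDVI": 0.68, "LSWI": 0.35, "LST": 25.0},
    {"NDVI": 0.40, "LSWI": 0.35, "LST": 25.0},
]
mask = [is_rice_candidate(p, ranges) for p in pixels]
print(mask)
```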

Keywords: landsat 8 (OLI-TIRS), LST, LSWI, LULC, NDVI, rice

Procedia PDF Downloads 206
1222 Comprehensive Machine Learning-Based Glucose Sensing from Near-Infrared Spectra

Authors: Bitewulign Mekonnen

Abstract:

Context: This scientific paper focuses on the use of near-infrared (NIR) spectroscopy to determine glucose concentration in aqueous solutions accurately and rapidly. The study compares six different machine learning methods for predicting glucose concentration and also explores the development of a deep learning model for classifying NIR spectra. The objective is to optimize the detection model and improve the accuracy of glucose prediction. This research is important because it provides a comprehensive analysis of various machine-learning techniques for estimating aqueous glucose concentrations. Research Aim: The aim of this study is to compare and evaluate different machine-learning methods for predicting glucose concentration from NIR spectra. Additionally, the study aims to develop and assess a deep-learning model for classifying NIR spectra. Methodology: The research methodology involves the use of machine learning and deep learning techniques. Six machine learning regression models, including support vector machine regression, partial least squares regression, extra tree regression, random forest regression, extreme gradient boosting, and principal component analysis-neural network, are employed to predict glucose concentration. The NIR spectra data is randomly divided into train and test sets, and the process is repeated ten times to increase generalization ability. In addition, a convolutional neural network is developed for classifying NIR spectra. Findings: The study reveals that the SVMR, ETR, and PCA-NN models exhibit excellent performance in predicting glucose concentration, with correlation coefficients (R) > 0.99 and determination coefficients (R²)> 0.985. The deep learning model achieves high macro-averaging scores for precision, recall, and F1-measure. These findings demonstrate the effectiveness of machine learning and deep learning methods in optimizing the detection model and improving glucose prediction accuracy. 
Theoretical Importance: This research contributes to the field by providing a comprehensive analysis of various machine-learning techniques for estimating glucose concentrations from NIR spectra. It also explores the use of deep learning for the classification of indistinguishable NIR spectra. The findings highlight the potential of machine learning and deep learning in enhancing the prediction accuracy of glucose-relevant features. Data Collection and Analysis Procedures: The NIR spectra and corresponding references for glucose concentration are measured in increments of 20 mg/dl. The data is randomly divided into train and test sets, and the models are evaluated using regression analysis and classification metrics. The performance of each model is assessed based on correlation coefficients, determination coefficients, precision, recall, and F1-measure. Question Addressed: The study addresses the question of whether machine learning and deep learning methods can optimize the detection model and improve the accuracy of glucose prediction from NIR spectra. Conclusion: The research demonstrates that machine learning and deep learning methods can effectively predict glucose concentration from NIR spectra. The SVMR, ETR, and PCA-NN models exhibit superior performance, while the deep learning model achieves high classification scores. These findings suggest that machine learning and deep learning techniques can be used to improve the prediction accuracy of glucose-relevant features. Further research is needed to explore their clinical utility in analyzing complex matrices, such as blood glucose levels.
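
The two regression scores reported in this abstract, the correlation coefficient R and the determination coefficient R², can be computed as in the sketch below. It uses the standard definitions; the reference and predicted concentrations are hypothetical values on a 20 mg/dl grid and are not data from the study.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between references and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted glucose concentrations (mg/dl).
reference = [20, 40, 60, 80]
predicted = [22, 38, 61, 79]
print(pearson_r(reference, predicted), r_squared(reference, predicted))
```

Averaging these scores over the ten random train/test splits mentioned in the methodology would give a more stable estimate than a single split.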

Keywords: machine learning, signal processing, near-infrared spectroscopy, support vector machine, neural network

Procedia PDF Downloads 69
1221 Hybrid GNN Based Machine Learning Forecasting Model For Industrial IoT Applications

Authors: Atish Bagchi, Siva Chandrasekaran

Abstract:

Background: According to World Bank national accounts data, the estimated global manufacturing value-added output in 2020 was 13.74 trillion USD. These manufacturing processes are monitored, modelled, and controlled by advanced, real-time, computer-based systems, e.g., Industrial IoT, PLC, SCADA, etc. These systems measure and manipulate a set of physical variables, e.g., temperature, pressure, etc. Despite the use of IoT, SCADA etc., in manufacturing, studies suggest that unplanned downtime leads to economic losses of approximately 864 billion USD each year. Therefore, real-time, accurate detection, classification and prediction of machine behaviour are needed to minimise financial losses. Although vast literature exists on time-series data processing using machine learning, the challenges faced by the industries that lead to unplanned downtimes are: The current algorithms do not efficiently handle the high-volume streaming data from industrial IoT sensors and were tested on static and simulated datasets. While the existing algorithms can detect significant 'point' outliers, most do not handle contextual outliers (e.g., values within normal range but happening at an unexpected time of day) or subtle changes in machine behaviour. Machines are revamped periodically as part of planned maintenance programmes, which change the assumptions on which original AI models were created and trained. Aim: This research study aims to deliver a Graph Neural Network (GNN) based hybrid forecasting model that interfaces with the real-time machine control system and can detect and predict machine behaviour and behavioural changes (anomalies) in real-time. This research will help manufacturing industries and utilities, e.g., water, electricity etc., reduce unplanned downtimes and consequential financial losses.
Method: The data stored within a process control system, e.g., Industrial-IoT, Data Historian, is generally sampled during data acquisition from the sensor (source) and when persisting in the Data Historian to optimise storage and query performance. The sampling may inadvertently discard values that might contain subtle aspects of behavioural changes in machines. This research proposed a hybrid forecasting and classification model which combines the expressive and extrapolation capability of GNN enhanced with the estimates of entropy and spectral changes in the sampled data and additional temporal contexts to reconstruct the likely temporal trajectory of machine behavioural changes. The proposed real-time model belongs to the Deep Learning category of machine learning and interfaces with the sensors directly or through 'Process Data Historian', SCADA etc., to perform forecasting and classification tasks. Results: The model was interfaced with a Data Historian holding time-series data from 4 flow sensors within a water treatment plant for 45 days. The recorded sampling interval for a sensor varied from 10 sec to 30 min. Approximately 65% of the available data was used for training the model, 20% for validation, and the rest for testing. The model identified the anomalies within the water treatment plant and predicted the plant's performance. These results were compared with the data reported by the plant SCADA-Historian system and the official data reported by the plant authorities. The model's accuracy was much higher (20%) than that reported by the SCADA-Historian system and matched the validated results declared by the plant auditors. Conclusions: The research demonstrates that a hybrid GNN based approach enhanced with entropy calculation and spectral information can effectively detect and predict a machine's behavioural changes.
The model can interface with a plant's 'process control system' in real time to perform forecasting and classification tasks, helping asset management engineers operate their machines more efficiently and reduce unplanned downtime. A series of trials is planned for this model in other manufacturing industries.
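The data split and the entropy estimate mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the chronological ordering of the split, the histogram-based Shannon entropy and the bin count are all assumptions made for illustration, and the spectral features and the GNN itself are not shown.

```python
import math
from collections import Counter


def chronological_split(samples, train_frac=0.65, val_frac=0.20):
    """Split an ordered series into train/validation/test parts.

    Assumption: the split is chronological (no shuffling), which is a
    common choice for time-series data; the abstract does not say so.
    """
    n = len(samples)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test


def shannon_entropy(window, n_bins=8):
    """Histogram-based Shannon entropy (in bits) of one window of readings.

    Assumption: equal-width bins over the window's own range; the
    abstract does not specify which entropy estimator is used.
    """
    lo, hi = min(window), max(window)
    if hi == lo:
        return 0.0  # a constant window carries no information
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A sudden rise or fall in the entropy of successive windows is one possible signal of a behavioural change, which is why such an estimate can serve as an extra input feature alongside the raw sensor values.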

Keywords: GNN, Entropy, anomaly detection, industrial time-series, AI, IoT, Industry 4.0, Machine Learning

Procedia PDF Downloads 129
1220 Automatic Target Recognition in SAR Images Based on Sparse Representation Technique

Authors: Ahmet Karagoz, Irfan Karagoz

Abstract:

Synthetic Aperture Radar (SAR) is a radar mechanism that can be integrated into manned and unmanned aerial vehicles to create high-resolution images in all weather conditions, regardless of day and night. In this study, SAR images of military vehicles with different azimuth and descent angles are pre-processed at the first stage. The main purpose here is to reduce the high speckle noise found in SAR images. For this, the Wiener adaptive filter, the mean filter, and the median filter are used to reduce the amount of speckle noise in the images without causing loss of data. During the image segmentation phase, pixel values are ordered so that the target vehicle region is separated from other regions containing unnecessary information. The target image is binarized by setting the brightest 20% of pixels to 255 and all other pixels to 0. In addition, a segmentation comparison is performed using appropriate parameters of the statistical region merging algorithm. In the feature extraction step, the feature vectors belonging to the vehicles are obtained by using Gabor filters with different orientation, frequency and angle values. A bank of Gabor filters is created by varying these parameters to extract the important features of the images that form their distinctive parts. Finally, images are classified by the sparse representation method. In the study, l₁ norm analysis of sparse representation is used. The feature vectors generated from the target images of each military vehicle type are placed side by side to form a joint database, which is then expressed in matrix form. To classify the vehicles, the test images of each vehicle are likewise converted to vector form, and l₁ norm analysis of the sparse representation method is applied against the existing database matrix.
As a result, correct recognition is achieved by matching the target images of military vehicles with the test images by means of the sparse representation method. A classification success rate of 97% is obtained for SAR images of different military vehicle types.
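The segmentation rule described in this abstract (brightest 20% of pixels set to 255, the rest to 0) can be sketched as follows. This is an illustrative sketch only: the function name, the nested-list image representation and the handling of ties at the threshold are assumptions, not details from the paper.

```python
def binarize_brightest(image, fraction=0.20):
    """Set the brightest `fraction` of pixels to 255 and all others to 0.

    `image` is a list of rows of grey-level values (hypothetical format).
    Pixels that tie with the threshold value are also set to 255, so the
    bright region can slightly exceed `fraction` when values repeat.
    """
    # Order all pixel values from brightest to darkest.
    flat = sorted((p for row in image for p in row), reverse=True)
    k = max(1, round(len(flat) * fraction))
    threshold = flat[k - 1]  # grey level of the k-th brightest pixel
    return [[255 if p >= threshold else 0 for p in row] for row in image]
```

For example, on a 2 x 5 image with ten distinct grey levels, only the two brightest pixels are kept as the target region.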

Keywords: automatic target recognition, sparse representation, image classification, SAR images

Procedia PDF Downloads 345
1219 Regeneration of Geological Models Using Support Vector Machine Assisted by Principal Component Analysis

Authors: H. Jung, N. Kim, B. Kang, J. Choe

Abstract:

History matching is a crucial procedure for predicting reservoir performance and making future decisions. However, it is difficult due to uncertainties in the initial reservoir models. Therefore, it is important to have reliable initial models for successful history matching of highly heterogeneous reservoirs such as channel reservoirs. In this paper, we propose a novel scheme for regenerating geological models using support vector machine (SVM) and principal component analysis (PCA). First, we perform PCA to identify the main geological characteristics of the models. Through this procedure, the permeability values of each model are transformed into new parameters by the principal components that have eigenvalues of large magnitude. Secondly, the parameters are projected onto a two-dimensional plane by multi-dimensional scaling (MDS) based on Euclidean distances. Finally, we train an SVM classifier using the 20% of models whose well oil production rates (WOPR) are most similar or most dissimilar to the true values (10% for each). Then, the other 80% of the models are classified by the trained SVM. We select the models on the side of low WOPR errors. One hundred channel reservoir models are initially generated by single normal equation simulation. By repeating the classification process, we can select models that have a geological trend similar to the true reservoir model. The average field of the selected models is utilized as a probability map for regeneration. Newly generated models can preserve correct channel features and exclude wrong geological properties while maintaining suitable uncertainty ranges. History matching with the initial models cannot provide trustworthy results, as it fails to identify the correct geological features of the true model. However, history matching with the regenerated ensemble offers reliable characterization results by identifying the proper channel trend. Furthermore, it gives dependable predictions of future performance with reduced uncertainties.
We propose a novel classification scheme that integrates PCA, MDS, and SVM for regenerating reservoir models. The scheme can easily sort out reliable models that have a channel trend similar to the reference in a reduced-dimension space.
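The training-set selection step described in this abstract (the 10% of models with the lowest WOPR error and the 10% with the highest, leaving 80% to be classified) can be sketched as below. This is a hedged illustration: the function name, the label convention (1 for low error, 0 for high error) and the use of a single scalar error per model are assumptions, and the PCA, MDS and SVM stages are not shown.

```python
def select_training_models(wopr_errors, fraction=0.10):
    """Pick the most and least similar models as labelled training data.

    `wopr_errors[i]` is a hypothetical scalar WOPR mismatch for model i.
    Returns (labels, unlabeled): `labels` maps model index to 1 (low
    error) or 0 (high error); `unlabeled` lists the remaining indices,
    which would be classified by the trained SVM.
    """
    n = len(wopr_errors)
    k = max(1, round(n * fraction))
    # Model indices ordered from smallest to largest WOPR error.
    order = sorted(range(n), key=lambda i: wopr_errors[i])
    labels = {i: 1 for i in order[:k]}
    labels.update({i: 0 for i in order[-k:]})
    unlabeled = [i for i in range(n) if i not in labels]
    return labels, unlabeled
```

With one hundred models, as in the abstract, this yields ten models labelled 1, ten labelled 0, and eighty left for the classifier.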

Keywords: history matching, principal component analysis, reservoir modelling, support vector machine

Procedia PDF Downloads 141