Search results for: Predictive Data Mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7676

Search results for: Predictive Data Mining

7226 Prediction of a Human Facial Image by ANN using Image Data and its Content on Web Pages

Authors: Chutimon Thitipornvanid, Siripun Sanguansintukul

Abstract:

Choosing the right metadata is a critical, as good information (metadata) attached to an image will facilitate its visibility from a pile of other images. The image-s value is enhanced not only by the quality of attached metadata but also by the technique of the search. This study proposes a technique that is simple but efficient to predict a single human image from a website using the basic image data and the embedded metadata of the image-s content appearing on web pages. The result is very encouraging with the prediction accuracy of 95%. This technique may become a great assist to librarians, researchers and many others for automatically and efficiently identifying a set of human images out of a greater set of images.

Keywords: Metadata, Prediction, Multi-layer perceptron, Human facial image, Image mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1193
7225 Field Trial of Resin-Based Composite Materials for the Treatment of Surface Collapses Associated with Former Shallow Coal Mining

Authors: Philip T. Broughton, Mark P. Bettney, Isla L. Smail

Abstract:

Effective treatment of ground instability is essential when managing the impacts associated with historic mining. A field trial was undertaken by the Coal Authority to investigate the geotechnical performance and potential use of composite materials comprising resin and fill or stone to safely treat surface collapses, such as crown-holes, associated with shallow mining. Test pits were loosely filled with various granular fill materials. The fill material was injected with commercially available silicate and polyurethane resin foam products. In situ and laboratory testing was undertaken to assess the geotechnical properties of the resultant composite materials. The test pits were subsequently excavated to assess resin permeation. Drilling and resin injection was easiest through clean limestone fill materials. Recycled building waste fill material proved difficult to inject with resin; this material is thus considered unsuitable for use in resin composites. Incomplete resin permeation in several of the test pits created irregular ‘blocks’ of composite. Injected resin foams significantly improve the stiffness and resistance (strength) of the un-compacted fill material. The stiffness of the treated fill material appears to be a function of the stone particle size, its associated compaction characteristics (under loose tipping) and the proportion of resin foam matrix. The type of fill material is more critical than the type of resin to the geotechnical properties of the composite materials. Resin composites can effectively support typical design imposed loads. Compared to other traditional treatment options, such as cement grouting, the use of resin composites is potentially less disruptive, particularly for sites with limited access, and thus likely to achieve significant reinstatement cost savings. The use of resin composites is considered a suitable option for the future treatment of shallow mining collapses.

Keywords: Composite material, ground improvement, mining legacy, resin.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1515
7224 Opinion Mining Framework in the Education Domain

Authors: A. M. H. Elyasir, K. S. M. Anbananthen

Abstract:

The internet is growing larger and becoming the most popular platform for the people to share their opinion in different interests. We choose the education domain specifically comparing some Malaysian universities against each other. This comparison produces benchmark based on different criteria shared by the online users in various online resources including Twitter, Facebook and web pages. The comparison is accomplished using opinion mining framework to extract, process the unstructured text and classify the result to positive, negative or neutral (polarity). Hence, we divide our framework to three main stages; opinion collection (extraction), unstructured text processing and polarity classification. The extraction stage includes web crawling, HTML parsing, Sentence segmentation for punctuation classification, Part of Speech (POS) tagging, the second stage processes the unstructured text with stemming and stop words removal and finally prepare the raw text for classification using Named Entity Recognition (NER). Last phase is to classify the polarity and present overall result for the comparison among the Malaysian universities. The final result is useful for those who are interested to study in Malaysia, in which our final output declares clear winners based on the public opinions all over the web.

Keywords: Entity Recognition, Education Domain, Opinion Mining, Unstructured Text.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2947
7223 A Note on Metallurgy at Khanak: An Indus Site in Tosham Mining Area, Haryana

Authors: Ravindra N. Singh, Dheerendra P. Singh

Abstract:

Recent discoveries of Bronze Age artefacts, tin slag, furnaces and crucibles, together with new geological evidence on tin deposits in Tosham area of Bhiwani district in Haryana (India) provide the opportunity to survey the evidence for possible sources of tin and the use of bronze in the Harappan sites of north western India. Earlier, Afghanistan emerged as the most promising eastern source of tin utilized by Indus Civilization copper-smiths. Our excavations conducted at Khanak near Tosham mining area during 2014 and 2016 revealed ample evidence of metallurgical activities as attested by the occurrence of slag, ores and evidences of ashes and fragments of furnaces in addition to the bronze objects. We have conducted petrological, XRD, EDAX, TEM, SEM and metallography on the slag, ores, crucible fragments and bronze objects samples recovered from Khanak excavations. This has given positive indication of mining and metallurgy of poly-mettalic Tin at the site; however, it can only be ascertained after the detailed scientific examination of the materials which is underway. In view of the importance of site, we intend to excavate the site horizontally in future so as to obtain more samples for scientific studies.

Keywords: Archaeometallurgy, problem of tin, metallography, Indus civilization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1975
7222 A Brain Inspired Approach for Multi-View Patterns Identification

Authors: Yee Ling Boo, Damminda Alahakoon

Abstract:

Biologically human brain processes information in both unimodal and multimodal approaches. In fact, information is progressively abstracted and seamlessly fused. Subsequently, the fusion of multimodal inputs allows a holistic understanding of a problem. The proliferation of technology has exponentially produced various sources of data, which could be likened to being the state of multimodality in human brain. Therefore, this is an inspiration to develop a methodology for exploring multimodal data and further identifying multi-view patterns. Specifically, we propose a brain inspired conceptual model that allows exploration and identification of patterns at different levels of granularity, different types of hierarchies and different types of modalities. A structurally adaptive neural network is deployed to implement the proposed model. Furthermore, the acquisition of multi-view patterns with the proposed model is demonstrated and discussed with some experimental results.

Keywords: Multimodal, Granularity, Hierarchical Clustering, Growing Self Organising Maps, Data Mining

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1523
7221 A Distance Function for Data with Missing Values and Its Application

Authors: Loai AbdAllah, Ilan Shimshoni

Abstract:

Missing values in data are common in real world applications. Since the performance of many data mining algorithms depend critically on it being given a good metric over the input space, we decided in this paper to define a distance function for unlabeled datasets with missing values. We use the Bhattacharyya distance, which measures the similarity of two probability distributions, to define our new distance function. According to this distance, the distance between two points without missing attributes values is simply the Mahalanobis distance. When on the other hand there is a missing value of one of the coordinates, the distance is computed according to the distribution of the missing coordinate. Our distance is general and can be used as part of any algorithm that computes the distance between data points. Because its performance depends strongly on the chosen distance measure, we opted for the k nearest neighbor classifier to evaluate its ability to accurately reflect object similarity. We experimented on standard numerical datasets from the UCI repository from different fields. On these datasets we simulated missing values and compared the performance of the kNN classifier using our distance to other three basic methods. Our  experiments show that kNN using our distance function outperforms the kNN using other methods. Moreover, the runtime performance of our method is only slightly higher than the other methods.

Keywords: Missing values, Distance metric, Bhattacharyya distance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2726
7220 Design and Development of Real-Time Optimal Energy Management System for Hybrid Electric Vehicles

Authors: Masood Roohi, Amir Taghavipour

Abstract:

This paper describes a strategy to develop an energy management system (EMS) for a charge-sustaining power-split hybrid electric vehicle. This kind of hybrid electric vehicles (HEVs) benefit from the advantages of both parallel and series architecture. However, it gets relatively more complicated to manage power flow between the battery and the engine optimally. The applied strategy in this paper is based on nonlinear model predictive control approach. First of all, an appropriate control-oriented model which was accurate enough and simple was derived. Towards utilization of this controller in real-time, the problem was solved off-line for a vast area of reference signals and initial conditions and stored the computed manipulated variables inside look-up tables. Look-up tables take a little amount of memory. Also, the computational load dramatically decreased, because to find required manipulated variables the controller just needed a simple interpolation between tables.

Keywords: Hybrid electric vehicles, energy management system, nonlinear model predictive control, real-time.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1343
7219 Periodic Control of a Wastewater Treatment Process to Improve Productivity

Authors: Muhammad Rizwan Azhar, Emadadeen Ali

Abstract:

In this paper, periodic force operation of a wastewater treatment process has been studied for the improved process performance. A previously developed dynamic model for the process is used to conduct the performance analysis. The static version of the model was utilized first to determine the optimal productivity conditions for the process. Then, feed flow rate in terms of dilution rate i.e. (D) is transformed into sinusoidal function. Nonlinear model predictive control algorithm is utilized to regulate the amplitude and period of the sinusoidal function. The parameters of the feed cyclic functions are determined which resulted in improved productivity than the optimal productivity under steady state conditions. The improvement in productivity is found to be marginal and is satisfactory in substrate conversion compared to that of the optimal condition and to the steady state condition, which corresponds to the average value of the periodic function. Successful results were also obtained in the presence of modeling errors and external disturbances.

Keywords: Dilution rate, nonlinear model predictive control, sinusoidal function, wastewater treatment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2190
7218 A Study on the Differential Diagnostic Model for Newborn Hearing Loss Screening

Authors: Chun-Lang Chang

Abstract:

According to the statistics, the prevalence of congenital hearing loss in Taiwan is approximately six thousandths; furthermore, one thousandths of infants have severe hearing impairment. Hearing ability during infancy has significant impact in the development of children-s oral expressions, language maturity, cognitive performance, education ability and social behaviors in the future. Although most children born with hearing impairment have sensorineural hearing loss, almost every child more or less still retains some residual hearing. If provided with a hearing aid or cochlear implant (a bionic ear) timely in addition to hearing speech training, even severely hearing-impaired children can still learn to talk. On the other hand, those who failed to be diagnosed and thus unable to begin hearing and speech rehabilitations on a timely manner might lose an important opportunity to live a complete and healthy life. Eventually, the lack of hearing and speaking ability will affect the development of both mental and physical functions, intelligence, and social adaptability. Not only will this problem result in an irreparable regret to the hearing-impaired child for the life time, but also create a heavy burden for the family and society. Therefore, it is necessary to establish a set of computer-assisted predictive model that can accurately detect and help diagnose newborn hearing loss so that early interventions can be provided timely to eliminate waste of medical resources. This study uses information from the neonatal database of the case hospital as the subjects, adopting two different analysis methods of using support vector machine (SVM) for model predictions and using logistic regression to conduct factor screening prior to model predictions in SVM to examine the results. The results indicate that prediction accuracy is as high as 96.43% when the factors are screened and selected through logistic regression. Hence, the model constructed in this study will have real help in clinical diagnosis for the physicians and actually beneficial to the early interventions of newborn hearing impairment.

Keywords: Data mining, Hearing impairment, Logistic regression analysis, Support vector machines

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1785
7217 Combined Model Predictive Controller Technique for Enhancing NAO Gait Stabilization

Authors: Brahim Brahmi, Mohammed Hamza Laraki, Mohammad Habibur Rahman, Islam M. Rasedul, M. Assad Uz-Zaman

Abstract:

The humanoid robot, specifically the NAO robot must be able to provide a highly dynamic performance on the soccer field. Maintaining the balance of the humanoid robot during the required motion is considered as one of a challenging problems especially when the robot is subject to external disturbances, as contact with other robots. In this paper, a dynamic controller is proposed in order to ensure a robust walking (stabilization) and to improve the dynamic balance of the robot during its contact with the environment (external disturbances). The generation of the trajectory of the center of mass (CoM) is done by a model predictive controller (MPC) conjoined with zero moment point (ZMP) technique. Taking into account the properties of the rotational dynamics of the whole-body system, a modified previous control mixed with feedback control is employed to manage the angular momentum and the CoM’s acceleration, respectively. This latter is dedicated to provide a robust gait of the robot in the presence of the external disturbances. Simulation results are presented to show the feasibility of the proposed strategy.

Keywords: Preview control, walking, stabilization, humanoid robot.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 571
7216 Assessing Habitat-Suitability Models with a Virtual Species at Khao Nan National Park, Thailand

Authors: W. Srisang, K. Jaroensutasinee, M. Jaroensutasinee

Abstract:

This study examined a habitat-suitability assessment method namely the Ecological Niche Factor Analysis (ENFA). A virtual species was created and then dispatched in a geographic information system model of a real landscape in three historic scenarios: (1) spreading, (2) equilibrium, and (3) overabundance. In each scenario, the virtual species was sampled and these simulated data sets were used as inputs for the ENFA to reconstruct the habitat suitability model. The 'equilibrium' scenario gives the highest quantity and quality among three scenarios. ENFA was sensitive to the distribution scenarios but not sensitive to sample sizes. The use of a virtual species proved to be a very efficient method, allowing one to fully control the quality of the input data as well as to accurately evaluate the predictive power of the analyses.

Keywords: Habitat-Suitability Models, Ecological niche factoranalysis, Climatic factors, Geographic information system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1790
7215 Learning to Order Terms: Supervised Interestingness Measures in Terminology Extraction

Authors: Jérôme Azé, Mathieu Roche, Yves Kodratoff, Michèle Sebag

Abstract:

Term Extraction, a key data preparation step in Text Mining, extracts the terms, i.e. relevant collocation of words, attached to specific concepts (e.g. genetic-algorithms and decisiontrees are terms associated to the concept “Machine Learning" ). In this paper, the task of extracting interesting collocations is achieved through a supervised learning algorithm, exploiting a few collocations manually labelled as interesting/not interesting. From these examples, the ROGER algorithm learns a numerical function, inducing some ranking on the collocations. This ranking is optimized using genetic algorithms, maximizing the trade-off between the false positive and true positive rates (Area Under the ROC curve). This approach uses a particular representation for the word collocations, namely the vector of values corresponding to the standard statistical interestingness measures attached to this collocation. As this representation is general (over corpora and natural languages), generality tests were performed by experimenting the ranking function learned from an English corpus in Biology, onto a French corpus of Curriculum Vitae, and vice versa, showing a good robustness of the approaches compared to the state-of-the-art Support Vector Machine (SVM).

Keywords: Text-mining, Terminology Extraction, Evolutionary algorithm, ROC Curve.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1643
7214 Predictive Factors of Exercise Behaviors of Junior High School Students in Chonburi Province

Authors: Tanida Julvanichpong

Abstract:

Exercise has been regarded as a necessary and important aspect to enhance physical performance and psychology health. Body weight statistics of students in junior high school students in Chonburi Province beyond a standard risk of obesity. Promoting exercise among Junior high school students in Chonburi Province, essential knowledge concerning factors influencing exercise is needed. Therefore, this study aims to (1) determine the levels of perceived exercise behavior, exercise behavior in the past, perceived barriers to exercise, perceived benefits of exercise, perceived self-efficacy to exercise, feelings associated with exercise behavior, influence of the family to exercise, influence of friends to exercise, and the perceived influence of the environment on exercise. (2) examine the predicting ability of each of the above factors while including personal factors (sex, educational level) for exercise behavior. Pender’s Health Promotion Model was used as a guide for the study. Sample included 652 students in junior high schools, Chonburi Provience. The samples were selected by Multi-Stage Random Sampling. Data Collection has been done by using self-administered questionnaires. Data were analyzed using descriptive statistics, Pearson’s product moment correlation coefficient, Eta, and stepwise multiple regression analysis. The research results showed that: 1. Perceived benefits of exercise, influence of teacher, influence of environmental, feelings associated with exercise behavior were at a high level. Influence of the family to exercise, exercise behavior, exercise behavior in the past, perceived self-efficacy to exercise and influence of friends were at a moderate level. Perceived barriers to exercise were at a low level. 2. Exercise behavior was positively significant related to perceived benefits of exercise, influence of the family to exercise, exercise behavior in the past, perceived self-efficacy to exercise, influence of friends, influence of teacher, influence of environmental and feelings associated with exercise behavior (p < .01, respectively) and was negatively significant related to educational level and perceived barriers to exercise (p < .01, respectively). Exercise behavior was significant related to sex (Eta = 0.243, p=.000). 3. Exercise behavior in the past, influence of the family to exercise significantly contributed 60.10 percent of the variance to the prediction of exercise behavior in male students (p < .01). Exercise behavior in the past, perceived self-efficacy to exercise, perceived barriers to exercise, and educational level significantly contributed 52.60 percent of the variance to the prediction of exercise behavior in female students (p < .01).

Keywords: Predictive factors, exercise behaviors, junior high school.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1161
7213 Hydrogeological Risk and Mining Tunnels: the Fontane-Rodoretto Mine Turin (Italy)

Authors: Paola Gattinoni, Laura Scesi, Elena Cerino Adbin, Daniele Cremonesi

Abstract:

The interaction of tunneling or mining with groundwater has become a very relevant problem not only due to the need to guarantee the safety of workers and to assure the efficiency of the tunnel drainage systems, but also to safeguard water resources from impoverishment and pollution risk. Therefore it is very important to forecast the drainage processes (i.e., the evaluation of drained discharge and drawdown caused by the excavation). The aim of this study was to know better the system and to quantify the flow drained from the Fontane mines, located in Val Germanasca (Turin, Italy). This allowed to understand the hydrogeological local changes in time. The work has therefore been structured as follows: the reconstruction of the conceptual model with the geological, hydrogeological and geological-structural study; the calculation of the tunnel inflows (through the use of structural methods) and the comparison with the measured flow rates; the water balance at the basin scale. In this way it was possible to understand what are the relationships between rainfall, groundwater level variations and the effect of the presence of tunnels as a means of draining water. Subsequently, it the effects produced by the excavation of the mining tunnels was quantified, through numerical modeling. In particular, the modeling made it possible to observe the drawdown variation as a function of number, excavation depth and different mines linings.

Keywords: Groundwater, Italy, numerical model, tunneling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1902
7212 An Engineering Approach to Forecast Volatility of Financial Indices

Authors: Irwin Ma, Tony Wong, Thiagas Sankar

Abstract:

By systematically applying different engineering methods, difficult financial problems become approachable. Using a combination of theory and techniques such as wavelet transform, time series data mining, Markov chain based discrete stochastic optimization, and evolutionary algorithms, this work formulated a strategy to characterize and forecast non-linear time series. It attempted to extract typical features from the volatility data sets of S&P100 and S&P500 indices that include abrupt drops, jumps and other non-linearity. As a result, accuracy of forecasting has reached an average of over 75% surpassing any other publicly available results on the forecast of any financial index.

Keywords: Discrete stochastic optimization, genetic algorithms, genetic programming, volatility forecast

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1613
7211 On the Efficient Implementation of a Serial and Parallel Decomposition Algorithm for Fast Support Vector Machine Training Including a Multi-Parameter Kernel

Authors: Tatjana Eitrich, Bruno Lang

Abstract:

This work deals with aspects of support vector machine learning for large-scale data mining tasks. Based on a decomposition algorithm for support vector machine training that can be run in serial as well as shared memory parallel mode we introduce a transformation of the training data that allows for the usage of an expensive generalized kernel without additional costs. We present experiments for the Gaussian kernel, but usage of other kernel functions is possible, too. In order to further speed up the decomposition algorithm we analyze the critical problem of working set selection for large training data sets. In addition, we analyze the influence of the working set sizes onto the scalability of the parallel decomposition scheme. Our tests and conclusions led to several modifications of the algorithm and the improvement of overall support vector machine learning performance. Our method allows for using extensive parameter search methods to optimize classification accuracy.

Keywords: Support Vector Machine Training, Multi-ParameterKernels, Shared Memory Parallel Computing, Large Data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1420
7210 Utilization of Process Mapping Tool to Enhance Production Drilling in Underground Metal Mining Operations

Authors: Sidharth Talan, Sanjay Kumar Sharma, Eoin Joseph Wallace, Nikita Agrawal

Abstract:

Underground mining is at the core of rapidly evolving metals and minerals sector due to the increasing mineral consumption globally. Even though the surface mines are still more abundant on earth, the scales of industry are slowly tipping towards underground mining due to rising depth and complexities of orebodies. Thus, the efficient and productive functioning of underground operations depends significantly on the synchronized performance of key elements such as operating site, mining equipment, manpower and mine services. Production drilling is the process of conducting long hole drilling for the purpose of charging and blasting these holes for the production of ore in underground metal mines. Thus, production drilling is the crucial segment in the underground metal mining value chain. This paper presents the process mapping tool to evaluate the production drilling process in the underground metal mining operation by dividing the given process into three segments namely Input, Process and Output. The three segments are further segregated into factors and sub-factors. As per the study, the major input factors crucial for the efficient functioning of production drilling process are power, drilling water, geotechnical support of the drilling site, skilled drilling operators, services installation crew, oils and drill accessories for drilling machine, survey markings at drill site, proper housekeeping, regular maintenance of drill machine, suitable transportation for reaching the drilling site and finally proper ventilation. The major outputs for the production drilling process are ore, waste as a result of dilution, timely reporting and investigation of unsafe practices, optimized process time and finally well fragmented blasted material within specifications set by the mining company. The paper also exhibits the drilling loss matrix, which is utilized to appraise the loss in planned production meters per day in a mine on account of availability loss in the machine due to breakdowns, underutilization of the machine and productivity loss in the machine measured in drilling meters per unit of percussion hour with respect to its planned productivity for the day. The given three losses would be essential to detect the bottlenecks in the process map of production drilling operation so as to instigate the action plan to suppress or prevent the causes leading to the operational performance deficiency. The given tool is beneficial to mine management to focus on the critical factors negatively impacting the production drilling operation and design necessary operational and maintenance strategies to mitigate them. 

Keywords: Process map, drilling loss matrix, availability, utilization, productivity, percussion rate.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1057
7209 Sequential Partitioning Brainbow Image Segmentation Using Bayesian

Authors: Yayun Hsu, Henry Horng-Shing Lu

Abstract:

This paper proposes a data-driven, biology-inspired neural segmentation method of 3D drosophila Brainbow images. We use Bayesian Sequential Partitioning algorithm for probabilistic modeling, which can be used to detect somas and to eliminate crosstalk effects. This work attempts to develop an automatic methodology for neuron image segmentation, which nowadays still lacks a complete solution due to the complexity of the image. The proposed method does not need any predetermined, risk-prone thresholds, since biological information is inherently included inside the image processing procedure. Therefore, it is less sensitive to variations in neuron morphology; meanwhile, its flexibility would be beneficial for tracing the intertwining structure of neurons.

Keywords: Brainbow, 3D imaging, image segmentation, neuron morphology, biological data mining, non-parametric learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2230
7208 Inverse Sets-based Recognition of Video Clips

Authors: Alexei M. Mikhailov

Abstract:

The paper discusses the mathematics of pattern indexing and its applications to recognition of visual patterns that are found in video clips. It is shown that (a) pattern indexes can be represented by collections of inverted patterns, (b) solutions to pattern classification problems can be found as intersections and histograms of inverted patterns and, thus, matching of original patterns avoided.

Keywords: Artificial neural cortex, computational biology, data mining, pattern recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2097
7207 Influenza Pattern Analysis System through Mining Weblogs

Authors: Pei Lin Khoo, Yunli Lee

Abstract:

Weblogs are resource of social structure to discover and track the various type of information written by blogger. In this paper, we proposed to use mining weblogs technique for identifying the trends of influenza where blogger had disseminated their opinion for the anomaly disease. In order to identify the trends, web crawler is applied to perform a search and generated a list of visited links based on a set of influenza keywords. This information is used to implement the analytics report system for monitoring and analyzing the pattern and trends of influenza (H1N1). Statistical and graphical analysis reports are generated. Both types of the report have shown satisfactory reports that reflect the awareness of Malaysian on the issue of influenza outbreak through blogs.

Keywords: H1N1, Weblogs, Web Crawler, Analytics Report System.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2447
7206 An Artificial Neural Network Based Model for Predicting H2 Production Rates in a Sucrose-Based Bioreactor System

Authors: Nikhil, Bestamin Özkaya, Ari Visa, Chiu-Yue Lin, Jaakko A. Puhakka, Olli Yli-Harja

Abstract:

The performance of a sucrose-based H2 production in a completely stirred tank reactor (CSTR) was modeled by neural network back-propagation (BP) algorithm. The H2 production was monitored over a period of 450 days at 35±1 ºC. The proposed model predicts H2 production rates based on hydraulic retention time (HRT), recycle ratio, sucrose concentration and degradation, biomass concentrations, pH, alkalinity, oxidation-reduction potential (ORP), acids and alcohols concentrations. Artificial neural networks (ANNs) have an ability to capture non-linear information very efficiently. In this study, a predictive controller was proposed for management and operation of large scale H2-fermenting systems. The relevant control strategies can be activated by this method. BP based ANNs modeling results was very successful and an excellent match was obtained between the measured and the predicted rates. The efficient H2 production and system control can be provided by predictive control method combined with the robust BP based ANN modeling tool.

Keywords: Back-propagation, biohydrogen, bioprocessmodeling, neural networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1745
7205 IT Systems of the US Federal Courts, Justice, and Governance

Authors: Joseph Zernik

Abstract:

Validity, integrity, and impacts of the IT systems of the US federal courts have been studied as part of the Human Rights Alert-NGO (HRA) submission for the 2015 Universal Periodic Review (UPR) of human rights in the United States by the Human Rights Council (HRC) of the United Nations (UN). The current report includes overview of IT system analysis, data-mining and case studies. System analysis and data-mining show: Development and implementation with no lawful authority, servers of unverified identity, invalidity in implementation of electronic signatures, authentication instruments and procedures, authorities and permissions; discrimination in access against the public and unrepresented (pro se) parties and in favor of attorneys; widespread publication of invalid judicial records and dockets, leading to their false representation and false enforcement. A series of case studies documents the impacts on individuals' human rights, on banking regulation, and on international matters. Significance is discussed in the context of various media and expert reports, which opine unprecedented corruption of the US justice system today, and which question, whether the US Constitution was in fact suspended. Similar findings were previously reported in IT systems of the State of California and the State of Israel, which were incorporated, subject to professional HRC staff review, into the UN UPR reports (2010 and 2013). Solutions are proposed, based on the principles of publicity of the law and the separation of power: Reliance on US IT and legal experts under accountability to the legislative branch, enhancing transparency, ongoing vigilance by human rights and internet activists. IT experts should assume more prominent civic duties in the safeguard of civil society in our era.

Keywords: E-justice, federal courts, United States, human rights, banking regulation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2120
7204 The Robust Clustering with Reduction Dimension

Authors: Dyah E. Herwindiati

Abstract:

A clustering is process to identify a homogeneous groups of object called as cluster. Clustering is one interesting topic on data mining. A group or class behaves similarly characteristics. This paper discusses a robust clustering process for data images with two reduction dimension approaches; i.e. the two dimensional principal component analysis (2DPCA) and principal component analysis (PCA). A standard approach to overcome this problem is dimension reduction, which transforms a high-dimensional data into a lower-dimensional space with limited loss of information. One of the most common forms of dimensionality reduction is the principal components analysis (PCA). The 2DPCA is often called a variant of principal component (PCA), the image matrices were directly treated as 2D matrices; they do not need to be transformed into a vector so that the covariance matrix of image can be constructed directly using the original image matrices. The decomposed classical covariance matrix is very sensitive to outlying observations. The objective of paper is to compare the performance of robust minimizing vector variance (MVV) in the two dimensional projection PCA (2DPCA) and the PCA for clustering on an arbitrary data image when outliers are hiden in the data set. The simulation aspects of robustness and the illustration of clustering images are discussed in the end of paper

Keywords: Breakdown point, Consistency, 2DPCA, PCA, Outlier, Vector Variance

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1681
7203 Improved C-Fuzzy Decision Tree for Intrusion Detection

Authors: Krishnamoorthi Makkithaya, N. V. Subba Reddy, U. Dinesh Acharya

Abstract:

As the number of networked computers grows, intrusion detection is an essential component in keeping networks secure. Various approaches for intrusion detection are currently being in use with each one has its own merits and demerits. This paper presents our work to test and improve the performance of a new class of decision tree c-fuzzy decision tree to detect intrusion. The work also includes identifying best candidate feature sub set to build the efficient c-fuzzy decision tree based Intrusion Detection System (IDS). We investigated the usefulness of c-fuzzy decision tree for developing IDS with a data partition based on horizontal fragmentation. Empirical results indicate the usefulness of our approach in developing the efficient IDS.

Keywords: Data mining, Decision tree, Feature selection, Fuzzyc- means clustering, Intrusion detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1559
7202 Comparison of Machine Learning Techniques for Single Imputation on Audiograms

Authors: Sarah Beaver, Renee Bryce

Abstract:

Audiograms detect hearing impairment, but missing values pose problems. This work explores imputations in an attempt to improve accuracy. This work implements Linear Regression, Lasso, Linear Support Vector Regression, Bayesian Ridge, K Nearest Neighbors (KNN), and Random Forest machine learning techniques to impute audiogram frequencies ranging from 125 Hz to 8000 Hz. The data contain patients who had or were candidates for cochlear implants. Accuracy is compared across two different Nested Cross-Validation k values. Over 4000 audiograms were used from 800 unique patients. Additionally, training on data combines and compares left and right ear audiograms versus single ear side audiograms. The accuracy achieved using Root Mean Square Error (RMSE) values for the best models for Random Forest ranges from 4.74 to 6.37. The R2 values for the best models for Random Forest ranges from .91 to .96. The accuracy achieved using RMSE values for the best models for KNN ranges from 5.00 to 7.72. The R2 values for the best models for KNN ranges from .89 to .95. The best imputation models received R2 between .89 to .96 and RMSE values less than 8dB. We also show that the accuracy of classification predictive models performed better with our imputation models versus constant imputations by a two percent increase.

Keywords: Machine Learning, audiograms, data imputations, single imputations.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 99
7201 A Large Dataset Imputation Approach Applied to Country Conflict Prediction Data

Authors: Benjamin D. Leiby, Darryl K. Ahner

Abstract:

This study demonstrates an alternative stochastic imputation approach for large datasets when preferred commercial packages struggle to iterate due to numerical problems. A large country conflict dataset motivates the search to impute missing values well over a common threshold of 20% missingness. The methodology capitalizes on correlation while using model residuals to provide the uncertainty in estimating unknown values. Examination of the methodology provides insight toward choosing linear or nonlinear modeling terms. Static tolerances common in most packages are replaced with tailorable tolerances that exploit residuals to fit each data element. The methodology evaluation includes observing computation time, model fit, and the comparison of known  values to replaced values created through imputation. Overall, the country conflict dataset illustrates promise with modeling first-order interactions, while presenting a need for further refinement that mimics predictive mean matching.

Keywords: Correlation, country conflict, imputation, stochastic regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 389
7200 Integration of Support Vector Machine and Bayesian Neural Network for Data Mining and Classification

Authors: Essam Al-Daoud

Abstract:

Several combinations of the preprocessing algorithms, feature selection techniques and classifiers can be applied to the data classification tasks. This study introduces a new accurate classifier, the proposed classifier consist from four components: Signal-to- Noise as a feature selection technique, support vector machine, Bayesian neural network and AdaBoost as an ensemble algorithm. To verify the effectiveness of the proposed classifier, seven well known classifiers are applied to four datasets. The experiments show that using the suggested classifier enhances the classification rates for all datasets.

Keywords: AdaBoost, Bayesian neural network, Signal-to-Noise, support vector machine, MCMC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1999
7199 Time Series Simulation by Conditional Generative Adversarial Net

Authors: Rao Fu, Jie Chen, Shutian Zeng, Yiping Zhuang, Agus Sudjianto

Abstract:

Generative Adversarial Net (GAN) has proved to be a powerful machine learning tool in image data analysis and generation. In this paper, we propose to use Conditional Generative Adversarial Net (CGAN) to learn and simulate time series data. The conditions include both categorical and continuous variables with different auxiliary information. Our simulation studies show that CGAN has the capability to learn different types of normal and heavy-tailed distributions, as well as dependent structures of different time series. It also has the capability to generate conditional predictive distributions consistent with training data distributions. We also provide an in-depth discussion on the rationale behind GAN and the neural networks as hierarchical splines to establish a clear connection with existing statistical methods of distribution generation. In practice, CGAN has a wide range of applications in market risk and counterparty risk analysis: it can be applied to learn historical data and generate scenarios for the calculation of Value-at-Risk (VaR) and Expected Shortfall (ES), and it can also predict the movement of the market risk factors. We present a real data analysis including a backtesting to demonstrate that CGAN can outperform Historical Simulation (HS), a popular method in market risk analysis to calculate VaR. CGAN can also be applied in economic time series modeling and forecasting. In this regard, we have included an example of hypothetical shock analysis for economic models and the generation of potential CCAR scenarios by CGAN at the end of the paper.

Keywords: Conditional Generative Adversarial Net, market and credit risk management, neural network, time series.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1157
7198 A Comparison of Artificial Neural Networks for Prediction of Suspended Sediment Discharge in River- A Case Study in Malaysia

Authors: M.R. Mustafa, M.H. Isa, R.B. Rezaur

Abstract:

Prediction of highly non linear behavior of suspended sediment flow in rivers has prime importance in the field of water resources engineering. In this study the predictive performance of two Artificial Neural Networks (ANNs) namely, the Radial Basis Function (RBF) Network and the Multi Layer Feed Forward (MLFF) Network have been compared. Time series data of daily suspended sediment discharge and water discharge at Pari River was used for training and testing the networks. A number of statistical parameters i.e. root mean square error (RMSE), mean absolute error (MAE), coefficient of efficiency (CE) and coefficient of determination (R2) were used for performance evaluation of the models. Both the models produced satisfactory results and showed a good agreement between the predicted and observed data. The RBF network model provided slightly better results than the MLFF network model in predicting suspended sediment discharge.

Keywords: ANN, discharge, modeling, prediction, suspendedsediment,

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1704
7197 Implementing an Intuitive Reasoner with a Large Weather Database

Authors: Yung-Chien Sun, O. Grant Clark

Abstract:

In this paper, the implementation of a rule-based intuitive reasoner is presented. The implementation included two parts: the rule induction module and the intuitive reasoner. A large weather database was acquired as the data source. Twelve weather variables from those data were chosen as the “target variables" whose values were predicted by the intuitive reasoner. A “complex" situation was simulated by making only subsets of the data available to the rule induction module. As a result, the rules induced were based on incomplete information with variable levels of certainty. The certainty level was modeled by a metric called "Strength of Belief", which was assigned to each rule or datum as ancillary information about the confidence in its accuracy. Two techniques were employed to induce rules from the data subsets: decision tree and multi-polynomial regression, respectively for the discrete and the continuous type of target variables. The intuitive reasoner was tested for its ability to use the induced rules to predict the classes of the discrete target variables and the values of the continuous target variables. The intuitive reasoner implemented two types of reasoning: fast and broad where, by analogy to human thought, the former corresponds to fast decision making and the latter to deeper contemplation. . For reference, a weather data analysis approach which had been applied on similar tasks was adopted to analyze the complete database and create predictive models for the same 12 target variables. The values predicted by the intuitive reasoner and the reference approach were compared with actual data. The intuitive reasoner reached near-100% accuracy for two continuous target variables. For the discrete target variables, the intuitive reasoner predicted at least 70% as accurately as the reference reasoner. Since the intuitive reasoner operated on rules derived from only about 10% of the total data, it demonstrated the potential advantages in dealing with sparse data sets as compared with conventional methods.

Keywords: Artificial intelligence, intuition, knowledge acquisition, limited certainty.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1365