Search results for: Missing Data Techniques.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 9101

Search results for: Missing Data Techniques.

9011 Microgrid: Low Power Network Topology and Control

Authors: Amit Sachan

Abstract:

The network designing and data modeling developments which are the two significant research tasks in direction to tolerate power control of Microgrid concluded using IEC 61850 data models and facilities. The current casing areas of IEC 61580 include infrastructures in substation automation systems, among substations and to DERs. So, for LV microgrid power control, previously using the IEC 61850 amenities to control the smart electrical devices, we have to model those devices as IEC 61850 data models and design a network topology to maintenance all-in-one communiqué amid those devices. In adding, though IEC 61850 assists modeling a portion by open-handed several object models for common functions similar measurement, metering, monitoring…etc., there are motionless certain missing smithereens for building a multiplicity of functions for household appliances like tuning the temperature of an electric heater or refrigerator.

Keywords: IEC 61850, RCMC, HCMC, DER Unit Controller.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2461
9010 Machine Scoring Model Using Data Mining Techniques

Authors: Wimalin S. Laosiritaworn, Pongsak Holimchayachotikul

Abstract:

this article proposed a methodology for computer numerical control (CNC) machine scoring. The case study company is a manufacturer of hard disk drive parts in Thailand. In this company, sample of parts manufactured from CNC machine are usually taken randomly for quality inspection. These inspection data were used to make a decision to shut down the machine if it has tendency to produce parts that are out of specification. Large amount of data are produced in this process and data mining could be very useful technique in analyzing them. In this research, data mining techniques were used to construct a machine scoring model called 'machine priority assessment model (MPAM)'. This model helps to ensure that the machine with higher risk of producing defective parts be inspected before those with lower risk. If the defective prone machine is identified sooner, defective part and rework could be reduced hence improving the overall productivity. The results showed that the proposed method can be successfully implemented and approximately 351,000 baht of opportunity cost could have saved in the case study company.

Keywords: Computer Numerical Control, Data Mining, HardDisk Drive.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1356
9009 Multivariate Analysis of Spectroscopic Data for Agriculture Applications

Authors: Asmaa M. Hussein, Amr Wassal, Ahmed Farouk Al-Sadek, A. F. Abd El-Rahman

Abstract:

In this study, a multivariate analysis of potato spectroscopic data was presented to detect the presence of brown rot disease or not. Near-Infrared (NIR) spectroscopy (1,350-2,500 nm) combined with multivariate analysis was used as a rapid, non-destructive technique for the detection of brown rot disease in potatoes. Spectral measurements were performed in 565 samples, which were chosen randomly at the infection place in the potato slice. In this study, 254 infected and 311 uninfected (brown rot-free) samples were analyzed using different advanced statistical analysis techniques. The discrimination performance of different multivariate analysis techniques, including classification, pre-processing, and dimension reduction, were compared. Applying a random forest algorithm classifier with different pre-processing techniques to raw spectra had the best performance as the total classification accuracy of 98.7% was achieved in discriminating infected potatoes from control.

Keywords: Brown rot disease, NIR spectroscopy, potato, random forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 815
9008 The Relevance of Data Warehousing and Data Mining in the Field of Evidence-based Medicine to Support Healthcare Decision Making

Authors: Nevena Stolba, A Min Tjoa

Abstract:

Evidence-based medicine is a new direction in modern healthcare. Its task is to prevent, diagnose and medicate diseases using medical evidence. Medical data about a large patient population is analyzed to perform healthcare management and medical research. In order to obtain the best evidence for a given disease, external clinical expertise as well as internal clinical experience must be available to the healthcare practitioners at right time and in the right manner. External evidence-based knowledge can not be applied directly to the patient without adjusting it to the patient-s health condition. We propose a data warehouse based approach as a suitable solution for the integration of external evidence-based data sources into the existing clinical information system and data mining techniques for finding appropriate therapy for a given patient and a given disease. Through integration of data warehousing, OLAP and data mining techniques in the healthcare area, an easy to use decision support platform, which supports decision making process of care givers and clinical managers, is built. We present three case studies, which show, that a clinical data warehouse that facilitates evidence-based medicine is a reliable, powerful and user-friendly platform for strategic decision making, which has a great relevance for the practice and acceptance of evidence-based medicine.

Keywords: data mining, data warehousing, decision-support systems, evidence-based medicine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3738
9007 Web Content Mining: A Solution to Consumer's Product Hunt

Authors: Syed Salman Ahmed, Zahid Halim, Rauf Baig, Shariq Bashir

Abstract:

With the rapid growth in business size, today's businesses orient towards electronic technologies. Amazon.com and e-bay.com are some of the major stakeholders in this regard. Unfortunately the enormous size and hugely unstructured data on the web, even for a single commodity, has become a cause of ambiguity for consumers. Extracting valuable information from such an everincreasing data is an extremely tedious task and is fast becoming critical towards the success of businesses. Web content mining can play a major role in solving these issues. It involves using efficient algorithmic techniques to search and retrieve the desired information from a seemingly impossible to search unstructured data on the Internet. Application of web content mining can be very encouraging in the areas of Customer Relations Modeling, billing records, logistics investigations, product cataloguing and quality management. In this paper we present a review of some very interesting, efficient yet implementable techniques from the field of web content mining and study their impact in the area specific to business user needs focusing both on the customer as well as the producer. The techniques we would be reviewing include, mining by developing a knowledge-base repository of the domain, iterative refinement of user queries for personalized search, using a graphbased approach for the development of a web-crawler and filtering information for personalized search using website captions. These techniques have been analyzed and compared on the basis of their execution time and relevance of the result they produced against a particular search.

Keywords: Data mining, web mining, search engines, knowledge discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2008
9006 Assessing and Visualizing the Stability of Feature Selectors: A Case Study with Spectral Data

Authors: R.Guzman-Martinez, Oscar Garcia-Olalla, R.Alaiz-Rodriguez

Abstract:

Feature selection plays an important role in applications with high dimensional data. The assessment of the stability of feature selection/ranking algorithms becomes an important issue when the dataset is small and the aim is to gain insight into the underlying process by analyzing the most relevant features. In this work, we propose a graphical approach that enables to analyze the similarity between feature ranking techniques as well as their individual stability. Moreover, it works with whatever stability metric (Canberra distance, Spearman's rank correlation coefficient, Kuncheva's stability index,...). We illustrate this visualization technique evaluating the stability of several feature selection techniques on a spectral binary dataset. Experimental results with a neural-based classifier show that stability and ranking quality may not be linked together and both issues have to be studied jointly in order to offer answers to the domain experts.

Keywords: Feature Selection Stability, Spectral data, Data visualization

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1480
9005 Estimation of Missing or Incomplete Data in Road Performance Measurement Systems

Authors: Kristjan Kuhi, Kati K. Kaare, Ott Koppel

Abstract:

Modern management in most fields is performance based; both planning and implementation of maintenance and operational activities are driven by appropriately defined performance indicators. Continuous real-time data collection for management is becoming feasible due to technological advancements. Outdated and insufficient input data may result in incorrect decisions. When using deterministic models the uncertainty of the object state is not visible thus applying the deterministic models are more likely to give false diagnosis. Constructing structured probabilistic models of the performance indicators taking into consideration the surrounding indicator environment enables to estimate the trustworthiness of the indicator values. It also assists to fill gaps in data to improve the quality of the performance analysis and management decisions. In this paper authors discuss the application of probabilistic graphical models in the road performance measurement and propose a high-level conceptual model that enables analyzing and predicting more precisely future pavement deterioration based on road utilization.

Keywords: Probabilistic graphical models, performance indicators, road performance management, data collection

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1779
9004 Normalization Discriminant Independent Component Analysis

Authors: Liew Yee Ping, Pang Ying Han, Lau Siong Hoe, Ooi Shih Yin, Housam Khalifa Bashier Babiker

Abstract:

In face recognition, feature extraction techniques attempts to search for appropriate representation of the data. However, when the feature dimension is larger than the samples size, it brings performance degradation. Hence, we propose a method called Normalization Discriminant Independent Component Analysis (NDICA). The input data will be regularized to obtain the most reliable features from the data and processed using Independent Component Analysis (ICA). The proposed method is evaluated on three face databases, Olivetti Research Ltd (ORL), Face Recognition Technology (FERET) and Face Recognition Grand Challenge (FRGC). NDICA showed it effectiveness compared with other unsupervised and supervised techniques.

Keywords: Face recognition, small sample size, regularization, independent component analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1920
9003 Tools and Techniques in Risk Assessment in Public Risk Management Organisations

Authors: Atousa Khodadadyan, Gabe Mythen, Hirbod Assa, Beverley Bishop

Abstract:

Risk assessment and the knowledge provided through this process is a crucial part of any decision-making process in the management of risks and uncertainties. Failure in assessment of risks can cause inadequacy in the entire process of risk management, which in turn can lead to failure in achieving organisational objectives as well as having significant damaging consequences on populations affected by the potential risks being assessed. The choice of tools and techniques in risk assessment can influence the degree and scope of decision-making and subsequently the risk response strategy. There are various available qualitative and quantitative tools and techniques that are deployed within the broad process of risk assessment. The sheer diversity of tools and techniques available to practitioners makes it difficult for organisations to consistently employ the most appropriate methods. This tools and techniques adaptation is rendered more difficult in public risk regulation organisations due to the sensitive and complex nature of their activities. This is particularly the case in areas relating to the environment, food, and human health and safety, when organisational goals are tied up with societal, political and individuals’ goals at national and international levels. Hence, recognising, analysing and evaluating different decision support tools and techniques employed in assessing risks in public risk management organisations was considered. This research is part of a mixed method study which aimed to examine the perception of risk assessment and the extent to which organisations practise risk assessment’ tools and techniques. The study adopted a semi-structured questionnaire with qualitative and quantitative data analysis to include a range of public risk regulation organisations from the UK, Germany, France, Belgium and the Netherlands. The results indicated the public risk management organisations mainly use diverse tools and techniques in the risk assessment process. The primary hazard analysis; brainstorming; hazard analysis and critical control points were described as the most practiced risk identification techniques. Within qualitative and quantitative risk analysis, the participants named the expert judgement, risk probability and impact assessment, sensitivity analysis and data gathering and representation as the most practised techniques.

Keywords: Decision-making, public risk management organisations, risk assessment, tools and techniques.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1571
9002 Estimation of PM2.5 Emissions and Source Apportionment Using Receptor and Dispersion Models

Authors: Swetha Priya Darshini Thammadi, Sateesh Kumar Pisini, Sanjay Kumar Shukla

Abstract:

Source apportionment using Dispersion model depends primarily on the quality of Emission Inventory. In the present study, a CMB receptor model has been used to identify the sources of PM2.5, while the AERMOD dispersion model has been used to account for missing sources of PM2.5 in the Emission Inventory. A statistical approach has been developed to quantify the missing sources not considered in the Emission Inventory. The inventory of each grid was improved by adjusting emissions based on road lengths and deficit in measured and modelled concentrations. The results showed that in CMB analyses, fugitive sources - soil and road dust - contribute significantly to ambient PM2.5 pollution. As a result, AERMOD significantly underestimated the ambient air concentration at most locations. The revised Emission Inventory showed a significant improvement in AERMOD performance which is evident through statistical tests.

Keywords: CMB, GIS, AERMOD, PM2.5, fugitive, emission inventory.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 833
9001 A Consistency Protocol Multi-Layer for Replicas Management in Large Scale Systems

Authors: Ghalem Belalem, Yahya Slimani

Abstract:

Large scale systems such as computational Grid is a distributed computing infrastructure that can provide globally available network resources. The evolution of information processing systems in Data Grid is characterized by a strong decentralization of data in several fields whose objective is to ensure the availability and the reliability of the data in the reason to provide a fault tolerance and scalability, which cannot be possible only with the use of the techniques of replication. Unfortunately the use of these techniques has a height cost, because it is necessary to maintain consistency between the distributed data. Nevertheless, to agree to live with certain imperfections can improve the performance of the system by improving competition. In this paper, we propose a multi-layer protocol combining the pessimistic and optimistic approaches conceived for the data consistency maintenance in large scale systems. Our approach is based on a hierarchical representation model with tree layers, whose objective is with double vocation, because it initially makes it possible to reduce response times compared to completely pessimistic approach and it the second time to improve the quality of service compared to an optimistic approach.

Keywords: Data Grid, replication, consistency, optimistic approach, pessimistic approach.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1524
9000 A Reasoning Method of Cyber-Attack Attribution Based on Threat Intelligence

Authors: Li Qiang, Yang Ze-Ming, Liu Bao-Xu, Jiang Zheng-Wei

Abstract:

With the increasing complexity of cyberspace security, the cyber-attack attribution has become an important challenge of the security protection systems. The difficult points of cyber-attack attribution were forced on the problems of huge data handling and key data missing. According to this situation, this paper presented a reasoning method of cyber-attack attribution based on threat intelligence. The method utilizes the intrusion kill chain model and Bayesian network to build attack chain and evidence chain of cyber-attack on threat intelligence platform through data calculation, analysis and reasoning. Then, we used a number of cyber-attack events which we have observed and analyzed to test the reasoning method and demo system, the result of testing indicates that the reasoning method can provide certain help in cyber-attack attribution.

Keywords: Reasoning, Bayesian networks, cyber-attack attribution, kill chain, threat intelligence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2624
8999 Probabilistic Approach of Dealing with Uncertainties in Distributed Constraint Optimization Problems and Situation Awareness for Multi-agent Systems

Authors: Sagir M. Yusuf, Chris Baber

Abstract:

In this paper, we describe how Bayesian inferential reasoning will contributes in obtaining a well-satisfied prediction for Distributed Constraint Optimization Problems (DCOPs) with uncertainties. We also demonstrate how DCOPs could be merged to multi-agent knowledge understand and prediction (i.e. Situation Awareness). The DCOPs functions were merged with Bayesian Belief Network (BBN) in the form of situation, awareness, and utility nodes. We describe how the uncertainties can be represented to the BBN and make an effective prediction using the expectation-maximization algorithm or conjugate gradient descent algorithm. The idea of variable prediction using Bayesian inference may reduce the number of variables in agents’ sampling domain and also allow missing variables estimations. Experiment results proved that the BBN perform compelling predictions with samples containing uncertainties than the perfect samples. That is, Bayesian inference can help in handling uncertainties and dynamism of DCOPs, which is the current issue in the DCOPs community. We show how Bayesian inference could be formalized with Distributed Situation Awareness (DSA) using uncertain and missing agents’ data. The whole framework was tested on multi-UAV mission for forest fire searching. Future work focuses on augmenting existing architecture to deal with dynamic DCOPs algorithms and multi-agent information merging.

Keywords: DCOP, multi-agent reasoning, Bayesian reasoning, swarm intelligence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 939
8998 Optimized Vector Quantization for Bayer Color Filter Array

Authors: M. Lakshmi, J. Senthil Kumar

Abstract:

Digital cameras to reduce cost, use an image sensor to capture color images. Color Filter Array (CFA) in digital cameras permits only one of the three primary (red-green-blue) colors to be sensed in a pixel and interpolates the two missing components through a method named demosaicking. Captured data is interpolated into a full color image and compressed in applications. Color interpolation before compression leads to data redundancy. This paper proposes a new Vector Quantization (VQ) technique to construct a VQ codebook with Differential Evolution (DE) Algorithm. The new technique is compared to conventional Linde- Buzo-Gray (LBG) method.

Keywords: Color Filter Array (CFA), Biorthogonal Wavelet, Vector Quantization (VQ), Differential Evolution (DE).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1870
8997 Using Data Mining Techniques for Finding Cardiac Outlier Patients

Authors: Farhan Ismaeel Dakheel, Raoof Smko, K. Negrat, Abdelsalam Almarimi

Abstract:

In this paper we used data mining techniques to identify outlier patients who are using large amount of drugs over a long period of time. Any healthcare or health insurance system should deal with the quantities of drugs utilized by chronic diseases patients. In Kingdom of Bahrain, about 20% of health budget is spent on medications. For the managers of healthcare systems, there is no enough information about the ways of drug utilization by chronic diseases patients, is there any misuse or is there outliers patients. In this work, which has been done in cooperation with information department in the Bahrain Defence Force hospital; we select the data for Cardiac patients in the period starting from 1/1/2008 to December 31/12/2008 to be the data for the model in this paper. We used three techniques for finding the drug utilization for cardiac patients. First we applied a clustering technique, followed by measuring of clustering validity, and finally we applied a decision tree as classification algorithm. The clustering results is divided into three clusters according to the drug utilization, for 1603 patients, who received 15,806 prescriptions during this period can be partitioned into three groups, where 23 patients (2.59%) who received 1316 prescriptions (8.32%) are classified to be outliers. The classification algorithm shows that the use of average drug utilization and the age, and the gender of the patient can be considered to be the main predictive factors in the induced model.

Keywords: Data Mining, Clustering, Classification, Drug Utilization..

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1853
8996 Misleading Node Detection and Response Mechanism in Mobile Ad-Hoc Network

Authors: Earleen Jane Fuentes, Regeene Melarese Lim, Franklin Benjamin Tapia, Alexis Pantola

Abstract:

Mobile Ad-hoc Network (MANET) is an infrastructure-less network of mobile devices, also known as nodes. These nodes heavily rely on each other’s resources such as memory, computing power, and energy. Thus, some nodes may become selective in forwarding packets so as to conserve their resources. These nodes are called misleading nodes. Several reputation-based techniques (e.g. CORE, CONFIDANT, LARS, SORI, OCEAN) and acknowledgment-based techniques (e.g. TWOACK, S-TWOACK, EAACK) have been proposed to detect such nodes. These techniques do not appropriately punish misleading nodes. Hence, this paper addresses the limitations of these techniques using a system called MINDRA.

Keywords: Mobile ad-hoc network, selfish nodes, reputation-based techniques, acknowledgment-based techniques.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1337
8995 File System-Based Data Protection Approach

Authors: Jaechun No

Abstract:

As data to be stored in storage subsystems tremendously increases, data protection techniques have become more important than ever, to provide data availability and reliability. In this paper, we present the file system-based data protection (WOWSnap) that has been implemented using WORM (Write-Once-Read-Many) scheme. In the WOWSnap, once WORM files have been created, only the privileged read requests to them are allowed to protect data against any intentional/accidental intrusions. Furthermore, all WORM files are related to their protection cycle that is a time period during which WORM files should securely be protected. Once their protection cycle is expired, the WORM files are automatically moved to the general-purpose data section without any user interference. This prevents the WORM data section from being consumed by unnecessary files. We evaluated the performance of WOWSnap on Linux cluster.

Keywords: Data protection, Protection cycle, WORM

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1621
8994 Development and Performance Evaluation of a Gladiolus Planter in Field for Planting Corms

Authors: T. P. Singh, Vijay Gautam

Abstract:

Gladiolus is an important cash crop and is grown mainly for its elegant spikes. Traditionally the gladiolus corms are planted manually which is very tedious, time consuming and labor intensive operation. So far, there is no planter available for planting of gladiolus corms. With a view to mechanize the planting operation of this horticultural crop, a prototype of 4-row gladiolus planter was developed and its performance was evaluated in-situ condition. Cupchain type metering device was used to place each single gladiolus corm in furrow at required spacing while planting. Three levels of corm spacing viz 15, 20 and 25 cm and four levels of forward speed viz 1.0, 1.5, 2.0 and 2.5 km/h was taken as evaluation parameter for the planter. The performance indicators namely corm spacing in each row, coefficient of uniformity, missing index, multiple index, quality of feed index, number of corms per meter length, mechanical damage to the corms etc. were determined during the field test. The data was statistically analyzed using Completely Randomized Design (CRD) for testing the significance of the parameters. The result indicated that planter was able to drop the corms at required nominal spacing with minor variations. The highest deviation from the mean corm spacing was observed as 3.53 cm with maximum coefficient of variation as 13.88%. The highest missing and quality of feed indexes were observed as 6.33% and 97.45% respectively with no multiples. The performance of the planter was observed better at lower forward speed and wider corm spacing. The field capacity of the planter was found as 0.103 ha/h with an observed field efficiency of 76.57%.

Keywords: Coefficient of uniformity, corm spacing, gladiolus planter, mechanization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2109
8993 Aircraft Selection Using Multiple Criteria Decision Making Analysis Method with Different Data Normalization Techniques

Authors: C. Ardil

Abstract:

This paper presents an original application of multiple criteria decision making analysis theory to the evaluation of aircraft selection problem. The selection of an optimal, efficient and reliable fleet, network and operations planning policy is one of the most important factors in aircraft selection problem. Given that decision making in aircraft selection involves the consideration of a number of opposite criteria and possible solutions, such a selection can be considered as a multiple criteria decision making analysis problem. This study presents a new integrated approach to decision making by considering the multiple criteria utility theory and the maximal regret minimization theory methods as well as aircraft technical, economical, and environmental aspects. Multiple criteria decision making analysis method uses different normalization techniques to allow criteria to be aggregated with qualitative and quantitative data of the decision problem. Therefore, selecting a suitable normalization technique for the model is also a challenge to provide data aggregation for the aircraft selection problem. To compare the impact of different normalization techniques on the decision problem, the vector, linear (sum), linear (max), and linear (max-min) data normalization techniques were identified to evaluate aircraft selection problem. As a logical implication of the proposed approach, it enhances the decision making process through enabling the decision maker to: (i) use higher level knowledge regarding the selection of criteria weights and the proposed technique, (ii) estimate the ranking of an alternative, under different data normalization techniques and integrated criteria weights after a posteriori analysis of the final rankings of alternatives. A set of commercial passenger aircraft were considered in order to illustrate the proposed approach. The obtained results of the proposed approach were compared using Spearman's rho tests. An analysis of the final rank stability with respect to the changes in criteria weights was also performed so as to assess the sensitivity of the alternative rankings obtained by the application of different data normalization techniques and the proposed approach.

Keywords: Normalization Techniques, Aircraft Selection, Multiple Criteria Decision Making, Multiple Criteria Decision Making Analysis, MCDMA

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 494
8992 Certain Data Dimension Reduction Techniques for application with ANN based MCS for Study of High Energy Shower

Authors: Gitanjali Devi, Kandarpa Kumar Sarma, Pranayee Datta, Anjana Kakoti Mahanta

Abstract:

Cosmic showers, from their places of origin in space, after entering earth generate secondary particles called Extensive Air Shower (EAS). Detection and analysis of EAS and similar High Energy Particle Showers involve a plethora of experimental setups with certain constraints for which soft-computational tools like Artificial Neural Network (ANN)s can be adopted. The optimality of ANN classifiers can be enhanced further by the use of Multiple Classifier System (MCS) and certain data - dimension reduction techniques. This work describes the performance of certain data dimension reduction techniques like Principal Component Analysis (PCA), Independent Component Analysis (ICA) and Self Organizing Map (SOM) approximators for application with an MCS formed using Multi Layer Perceptron (MLP), Recurrent Neural Network (RNN) and Probabilistic Neural Network (PNN). The data inputs are obtained from an array of detectors placed in a circular arrangement resembling a practical detector grid which have a higher dimension and greater correlation among themselves. The PCA, ICA and SOM blocks reduce the correlation and generate a form suitable for real time practical applications for prediction of primary energy and location of EAS from density values captured using detectors in a circular grid.

Keywords: EAS, Shower, Core, ANN, Location.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1565
8991 Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques

Authors: Sannikumar Patel, Brian Nolan, Markus Hofmann, Philip Owende, Kunjan Patel

Abstract:

Sentiment analysis and opinion mining have become emerging topics of research in recent years but most of the work is focused on data in the English language. A comprehensive research and analysis are essential which considers multiple languages, machine translation techniques, and different classifiers. This paper presents, a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two parts: one using classification of text without language translation and second using the translation of testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or building language specific sentiment classification systems is a better approach. The effects of language translation techniques, features, and accuracy of various classifiers for multilingual sentiment analysis is also discussed in this study.

Keywords: Cross-language analysis, machine learning, machine translation, sentiment analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1602
8990 A Network Traffic Prediction Algorithm Based On Data Mining Technique

Authors: D. Prangchumpol

Abstract:

This paper is a description approach to predict incoming and outgoing data rate in network system by using association rule discover, which is one of the data mining techniques. Information of incoming and outgoing data in each times and network bandwidth are network performance parameters, which needed to solve in the traffic problem. Since congestion and data loss are important network problems. The result of this technique can predicted future network traffic. In addition, this research is useful for network routing selection and network performance improvement.

Keywords: Traffic prediction, association rule, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3613
8989 Data Centers’ Temperature Profile Simulation Optimized by Finite Elements and Discretization Methods

Authors: José Alberto García Fernández, Zhimin Du, Xinqiao Jin

Abstract:

Nowadays, data center industry faces strong challenges for increasing the speed and data processing capacities while at the same time is trying to keep their devices a suitable working temperature without penalizing that capacity. Consequently, the cooling systems of this kind of facilities use a large amount of energy to dissipate the heat generated inside the servers, and developing new cooling techniques or perfecting those already existing would be a great advance in this type of industry. The installation of a temperature sensor matrix distributed in the structure of each server would provide the necessary information for collecting the required data for obtaining a temperature profile instantly inside them. However, the number of temperature probes required to obtain the temperature profiles with sufficient accuracy is very high and expensive. Therefore, other less intrusive techniques are employed where each point that characterizes the server temperature profile is obtained by solving differential equations through simulation methods, simplifying data collection techniques but increasing the time to obtain results. In order to reduce these calculation times, complicated and slow computational fluid dynamics simulations are replaced by simpler and faster finite element method simulations which solve the Burgers‘ equations by backward, forward and central discretization techniques after simplifying the energy and enthalpy conservation differential equations. The discretization methods employed for solving the first and second order derivatives of the obtained Burgers‘ equation after these simplifications are the key for obtaining results with greater or lesser accuracy regardless of the characteristic truncation error.

Keywords: Burgers’ equations, CFD simulation, data center, discretization methods, FEM simulation, temperature profile.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 456
8988 Latent Topic Based Medical Data Classification

Authors: Jian-hua Yeh, Shi-yi Kuo

Abstract:

This paper discusses the classification process for medical data. In this paper, we use the data from ACM KDDCup 2008 to demonstrate our classification process based on latent topic discovery. In this data set, the target set and outliers are quite different in their nature: target set is only 0.6% size in total, while the outliers consist of 99.4% of the data set. We use this data set as an example to show how we dealt with this extremely biased data set with latent topic discovery and noise reduction techniques. Our experiment faces two major challenge: (1) extremely distributed outliers, and (2) positive samples are far smaller than negative ones. We try to propose a suitable process flow to deal with these issues and get a best AUC result of 0.98.

Keywords: classification, latent topics, outlier adjustment, feature scaling

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1608
8987 Visual-Graphical Methods for Exploring Longitudinal Data

Authors: H. W. Ker

Abstract:

Longitudinal data typically have the characteristics of changes over time, nonlinear growth patterns, between-subjects variability, and the within errors exhibiting heteroscedasticity and dependence. The data exploration is more complicated than that of cross-sectional data. The purpose of this paper is to organize/integrate of various visual-graphical techniques to explore longitudinal data. From the application of the proposed methods, investigators can answer the research questions include characterizing or describing the growth patterns at both group and individual level, identifying the time points where important changes occur and unusual subjects, selecting suitable statistical models, and suggesting possible within-error variance.

Keywords: Data exploration, exploratory analysis, HLMs/LMEs, longitudinal data, visual-graphical methods.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2045
8986 Multicast Optimization Techniques using Best Effort Genetic Algorithms

Authors: Dinesh Kumar, Y. S. Brar, V. K. Banga

Abstract:

Multicast Network Technology has pervaded our lives-a few examples of the Networking Techniques and also for the improvement of various routing devices we use. As we know the Multicast Data is a technology offers many applications to the user such as high speed voice, high speed data services, which is presently dominated by the Normal networking and the cable system and digital subscriber line (DSL) technologies. Advantages of Multi cast Broadcast such as over other routing techniques. Usually QoS (Quality of Service) Guarantees are required in most of Multicast applications. The bandwidth-delay constrained optimization and we use a multi objective model and routing approach based on genetic algorithm that optimizes multiple QoS parameters simultaneously. The proposed approach is non-dominated routes and the performance with high efficiency of GA. Its betterment and high optimization has been verified. We have also introduced and correlate the result of multicast GA with the Broadband wireless to minimize the delay in the path.

Keywords: GA (genetic Algorithms), Quality of Service, MOGA, Steiner Tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1510
8985 Optimizing Data Evaluation Metrics for Fraud Detection Using Machine Learning

Authors: Jennifer Leach, Umashanger Thayasivam

Abstract:

The use of technology has benefited society in more ways than one ever thought possible. Unfortunately, as society’s knowledge of technology has advanced, so has its knowledge of ways to use technology to manipulate others. This has led to a simultaneous advancement in the world of fraud. Machine learning techniques can offer a possible solution to help decrease these advancements. This research explores how the use of various machine learning techniques can aid in detecting fraudulent activity across two different types of fraudulent datasets, and the accuracy, precision, recall, and F1 were recorded for each method. Each machine learning model was also tested across five different training and testing splits in order to discover which split and technique would lead to the most optimal results.

Keywords: Data science, fraud detection, machine learning, supervised learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 664
8984 The Effects of Applying Linguistic Principles and Teaching Techniques in Teaching English at Secondary School in Thailand

Authors: Wannakarn Likitrattanaporn

Abstract:

The ultimate purpose of this investigation was to determine the teachers’ opinions as well as students’ opinions towards the Adapted English Lessons. The subjects of the study were 5 Thai teachers, who teach English, and 85 Grade 10 mixed-ability students at Triamudom Suksa Pattanakarn Ratchada School, Bangkok, Thailand. The research instruments included questionnaires and the informal interview. The data from the research instruments was collected and analyzed concerning linguistic principles of minimal pair and articulatory phonetics as well as teaching techniques of mimicry-memorization; vocabulary substitution drills, language pattern drills, reading comprehension exercise, practicing listening, speaking and writing skill and communicative activities; informal talk and free writing. The data was statistically compiled according to an arithmetic percentage. The results showed that the teachers and students have very highly positive opinions towards adapting linguistic principles for teaching and learning phonological accuracy. Teaching techniques provided in the Adapted English Lessons can be used efficiently in the classroom. The teachers and students have positive opinions towards them too.

Keywords: Applying linguistic principles and teaching techniques, teachers’ and students’ opinions, teaching English, the Adapted English Lessons.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1672
8983 Electric Load Forecasting Using Genetic Based Algorithm, Optimal Filter Estimator and Least Error Squares Technique: Comparative Study

Authors: Khaled M. EL-Naggar, Khaled A. AL-Rumaih

Abstract:

This paper presents performance comparison of three estimation techniques used for peak load forecasting in power systems. The three optimum estimation techniques are, genetic algorithms (GA), least error squares (LS) and, least absolute value filtering (LAVF). The problem is formulated as an estimation problem. Different forecasting models are considered. Actual recorded data is used to perform the study. The performance of the above three optimal estimation techniques is examined. Advantages of each algorithms are reported and discussed.

Keywords: Forecasting, Least error squares, Least absolute Value, Genetic algorithms

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2667
8982 Generalized Method for Estimating Best-Fit Vertical Alignments for Profile Data

Authors: Said M. Easa, Shinya Kikuchi

Abstract:

When the profile information of an existing road is missing or not up-to-date and the parameters of the vertical alignment are needed for engineering analysis, the engineer has to recreate the geometric design features of the road alignment using collected profile data. The profile data may be collected using traditional surveying methods, global positioning systems, or digital imagery. This paper develops a method that estimates the parameters of the geometric features that best characterize the existing vertical alignments in terms of tangents and the expressions of the curve, that may be symmetrical, asymmetrical, reverse, and complex vertical curves. The method is implemented using an Excel-based optimization method that minimizes the differences between the observed profile and the profiles estimated from the equations of the vertical curve. The method uses a 'wireframe' representation of the profile that makes the proposed method applicable to all types of vertical curves. A secondary contribution of this paper is to introduce the properties of the equal-arc asymmetrical curve that has been recently developed in the highway geometric design field.

Keywords: Optimization, parameters, data, reverse, spreadsheet, vertical curves

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2396