Search results for: Constraint Based Mining

11305 Analysis of Sequence Moves in Successful Chess Openings Using Data Mining with Association Rules

Authors: R. M. Rani

Abstract:

Chess is an indoor game that improves a player's confidence, concentration, planning skills and knowledge. The main objective of this paper is to help chess players improve their openings using data mining techniques. Budding chess players usually practice by analyzing existing openings, but analyzing and correlating thousands of openings is tedious and complex. The work in this paper applies data mining to the best lines of the Blackmar-Diemer Gambit (BDG), which opens with White's d4..., using a collection of winning games and association rules. The first step assigns a variable to each distinct move sequence. In the second step, sequence association rules are generated and their support and confidence are calculated, which helps identify the best subsequent moves that may lead to a winning position.

Keywords: Blackmar-Diemer Gambit (BDG), confidence, sequence association rules, support.
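
A minimal sketch of the support/confidence step, assuming each winning game is stored as a list of moves in algebraic notation and that a rule has the form "opening prefix -> next move". The function names and the four toy games are hypothetical; they are not the paper's data or its variable-assignment scheme.

```python
from collections import Counter


def prefix_counts(games, max_len):
    """Count how many games begin with each move prefix of length 1..max_len."""
    counts = Counter()
    for moves in games:
        for k in range(1, min(max_len, len(moves)) + 1):
            counts[tuple(moves[:k])] += 1
    return counts


def next_move_rules(games, max_len=6):
    """Rules prefix -> next_move as (prefix, move, support, confidence).

    support    = fraction of games that begin with prefix + move
    confidence = count(prefix + move) / count(prefix)
    """
    counts = prefix_counts(games, max_len)
    n = len(games)
    rules = []
    for seq, count in counts.items():
        if len(seq) < 2:
            continue
        prefix, move = seq[:-1], seq[-1]
        rules.append((prefix, move, count / n, count / counts[prefix]))
    rules.sort(key=lambda r: (r[3], r[2]), reverse=True)  # strongest first
    return rules


# Toy winning games (hypothetical), written in algebraic notation.
games = [
    ["d4", "d5", "e4", "dxe4", "Nc3", "Nf6", "f3"],
    ["d4", "d5", "e4", "dxe4", "Nc3", "Nf6", "f3"],
    ["d4", "d5", "e4", "dxe4", "Nc3", "e5"],
    ["d4", "Nf6", "c4"],
]
for prefix, move, sup, conf in next_move_rules(games):
    if sup >= 0.5:
        print(" ".join(prefix), "->", move, f"support={sup:.2f} confidence={conf:.2f}")
```

A rule with high confidence but low support rests on few games, so a minimum-support threshold (0.5 here, chosen arbitrarily) is usually applied before ranking by confidence.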

11304 Developing New Algorithm and Its Application on Optimal Control of Pumps in Water Distribution Network

Authors: R. Rajabpour, N. Talebbeydokhti, M. H. Ahmadi

Abstract:

In recent years, new techniques have been proposed for solving complex engineering problems. One of these techniques is the JPSO algorithm. By modifying the jump mechanism of JPSO, a graph-based variant called G-JPSO can be constructed. In this paper, the new algorithm is evaluated on the Fletcher-Powell optimization problem and on the optimal control of pumps in a water distribution network. Optimal pump control consists of an optimal operating schedule (on/off status) for each pump over the desired time interval. The maximum number of on/off switches for each pump is imposed on the objective function as an additional constraint. To determine the optimal operation of the pumps, a model-based optimization-simulation algorithm was developed based on the G-JPSO and JPSO algorithms. The results of the proposed algorithm compared well with those of ant colony optimization, a genetic algorithm and JPSO, which shows the robustness of the proposed algorithm in finding near-optimal solutions at reasonable computational cost.

Keywords: G-JPSO, operation, optimization, pumping station, water distribution networks.
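
As a rough illustration of how a switching limit can enter the objective as a penalty term, the sketch below scores a binary on/off schedule. The tariff, pump ratings, penalty weight and function names are assumptions; a metaheuristic such as G-JPSO, JPSO or a genetic algorithm would search over schedules to minimize this score, while the paper's hydraulic simulation is not modeled.

```python
def count_switches(statuses):
    """Number of on/off status changes in one pump's 0/1 schedule."""
    return sum(1 for prev, cur in zip(statuses, statuses[1:]) if prev != cur)


def penalized_cost(schedule, tariff, power_kw, max_switches, penalty=1000.0):
    """Energy cost of a pump schedule plus a penalty for excess switching.

    schedule[p][t] is 1 if pump p is on during interval t, else 0;
    tariff[t] is the (hypothetical) energy price in interval t and
    power_kw[p] the rated power of pump p. Hydraulic feasibility
    (tank levels, pressures) is ignored in this sketch.
    """
    energy = sum(
        tariff[t] * power_kw[p] * on
        for p, row in enumerate(schedule)
        for t, on in enumerate(row)
    )
    # Each switch beyond the allowed maximum is penalized linearly.
    excess = sum(max(0, count_switches(row) - max_switches) for row in schedule)
    return energy + penalty * excess


# Two pumps, four intervals (illustrative numbers only).
schedule = [[1, 0, 1, 0], [1, 1, 0, 0]]
cost = penalized_cost(schedule, tariff=[0.1, 0.2, 0.3, 0.1],
                      power_kw=[10, 20], max_switches=2)
```

Here the first pump switches three times against a limit of two, so one penalty unit is added to the energy cost.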

11303 An Educational Data Mining System for Advising Higher Education Students

Authors: Heba Mohammed Nagy, Walid Mohamed Aly, Osama Fathy Hegazy

Abstract:

Educational data mining is a data mining field applied to data originating from educational environments; it relies on different approaches to discover hidden knowledge in the available data. Among these approaches are machine learning techniques, which are used to build a system that learns from previous data. Machine learning can be applied to regression, classification, clustering and optimization problems.

In our research, we propose a “Student Advisory Framework” that uses classification and clustering to build an intelligent system. The system can advise first-year university students on which educational track they are likely to succeed in, with the aim of reducing the high rate of academic failure among these students. A real case study at the Cairo Higher Institute for Engineering, Computer Science and Management is presented, using a real dataset collected from 2000–2012. The dataset has two main components: a pre-higher-education dataset and a dataset of first-year course results. The results demonstrate the efficiency of the suggested framework.

Keywords: Classification, Clustering, Educational Data Mining (EDM), Machine Learning.

11302 Improving University Operations with Data Mining: Predicting Student Performance

Authors: Mladen Dragičević, Mirjana Pejić Bach, Vanja Šimičević

Abstract:

The purpose of this paper is to develop models that predict student success. These models could improve the allocation of students among colleges and optimize the newly introduced model of government subsidies for higher education. To collect data, an anonymous survey was conducted among final-year undergraduate students selected by random sampling. Several decision trees were built, and the two that best predicted student success were selected, based on two criteria: Grade Point Average (GPA) and the time a student needs to finish the undergraduate program (time-to-degree). Decision trees proved to be a good method for classifying student success, and they could be improved further by increasing the survey sample and developing specialized decision trees for each type of college. Methods of this kind have considerable potential for use in decision support systems.

Keywords: Data mining, knowledge discovery in databases, prediction models, student success.
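
The core step of decision-tree induction, choosing the split that best separates the classes, can be sketched as follows. This assumes the Gini impurity criterion (the abstract does not say which criterion the authors used) and a hypothetical numeric survey attribute with made-up labels.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Best split `value <= t` on one numeric attribute, by weighted Gini.

    Returns (threshold, weighted impurity); threshold is None if no split exists.
    """
    best_t, best_score = None, float("inf")
    for t in sorted(set(values))[:-1]:  # splitting at the max leaves one side empty
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical attribute (weekly study hours) vs. time-to-degree outcome.
hours = [2, 4, 5, 8, 10, 12]
outcome = ["late", "late", "late", "on_time", "on_time", "on_time"]
threshold, impurity = best_threshold(hours, outcome)
```

A full tree applies this search recursively to each side of the chosen split, over all candidate attributes, until a stopping rule is met.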

11301 Solution of Optimal Reactive Power Flow using Biogeography-Based Optimization

Authors: Aniruddha Bhattacharya, Pranab Kumar Chattopadhyay

Abstract:

Optimal reactive power flow is an optimization problem in which one or more objectives, such as active power loss, are minimized for a fixed generation schedule. The control variables are generator bus voltages, transformer tap settings and the reactive power output of compensating devices placed on different bus bars. The Biogeography-Based Optimization (BBO) technique has been applied to different kinds of optimal reactive power flow problems subject to operational constraints such as power balance and line flow and bus voltage limits. BBO searches for the global optimum mainly through two steps: migration and mutation. In the present work, BBO is applied to optimal reactive power flow problems on the standard IEEE 30-bus and IEEE 57-bus power systems to minimize active power loss. The results demonstrate the superiority of the proposed method, and given the quality of the solutions obtained, it appears promising for solving these problems.

Keywords: Active Power Loss, Biogeography-Based Optimization, Migration, Mutation, Optimal Reactive Power Flow.
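
A compact sketch of BBO's two steps, migration and mutation, assuming a linear migration model (emigration rate μ decreasing with rank, immigration rate λ = 1 - μ) and elitism. It minimizes a generic test function; the paper's power-loss objective, load-flow constraints and parameter settings are not reproduced, and all names are illustrative.

```python
import random


def bbo_minimize(f, dim, bounds, pop_size=20, generations=100,
                 mutation_prob=0.05, elites=2, seed=0):
    """Minimal Biogeography-Based Optimization (migration + mutation).

    Habitats are ranked by cost; better habitats get a high emigration
    rate mu and a low immigration rate lam (linear migration model).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=f)                        # best (lowest cost) first
        n = len(pop)
        mu = [(n - k) / n for k in range(n)]   # emigration: high for good habitats
        lam = [1.0 - m for m in mu]            # immigration: high for poor habitats
        new_pop = [h[:] for h in pop]
        for i in range(elites, n):             # elite habitats are kept unchanged
            for d in range(dim):
                if rng.random() < lam[i]:
                    # Roulette-wheel choice of a source habitat, weighted by mu.
                    j = rng.choices(range(n), weights=mu)[0]
                    new_pop[i][d] = pop[j][d]
                if rng.random() < mutation_prob:
                    new_pop[i][d] = rng.uniform(lo, hi)  # random mutation
        pop = new_pop
    best = min(pop, key=f)
    return best, f(best)


def sphere(x):
    """Simple test function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best, best_cost = bbo_minimize(sphere, dim=3, bounds=(-5.0, 5.0), seed=1)
```

Because migration only copies features between habitats, mutation is what introduces new values; elitism guarantees the best cost never gets worse from one generation to the next.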

11300 Text Mining of Twitter Data Using a Latent Dirichlet Allocation Topic Model and Sentiment Analysis

Authors: Sidi Yang, Haiyi Zhang

Abstract:

Twitter is a microblogging platform on which millions of users share their attitudes, views and opinions every day. Using a probabilistic Latent Dirichlet Allocation (LDA) topic model to discern the most popular topics in Twitter data is an effective way to analyze a large set of tweets and find a set of topics in a computationally efficient manner. Sentiment analysis provides an effective way to show the emotions and sentiments found in each tweet and an efficient way to summarize the results clearly. The primary goal of this paper is to explore text mining by extracting and analyzing useful information from unstructured text with two approaches, LDA topic modelling and sentiment analysis, applied to English plain-text Twitter data. These two methods allow people to mine data more effectively and efficiently. LDA topic modelling and sentiment analysis can also provide insights in business and scientific fields.

Keywords: Text mining, Twitter, topic model, sentiment analysis.
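
The abstract does not specify which sentiment method was used. As one simple possibility, a lexicon-based scorer sums word polarities per tweet; the tiny lexicon below is hypothetical, negation, emoji and sarcasm are ignored, and the LDA step is not shown.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> polarity weight.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}


def tweet_sentiment(text, lexicon=LEXICON):
    """Return (label, score) for one tweet by summing word polarities."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(word, 0) for word in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score


def summarize(tweets):
    """Count tweets per sentiment label, a simple corpus-level summary."""
    return Counter(tweet_sentiment(t)[0] for t in tweets)


label, score = tweet_sentiment("I love this phone, great battery!")
```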

11299 Using Textual Pre-Processing and Text Mining to Create Semantic Links

Authors: Ricardo Avila, Gabriel Lopes, Vania Vidal, Jose Macedo

Abstract:

This article offers an approach to the automatic discovery of semantic concepts and links in the domain of Oil Exploration and Production (E&P). Machine learning methods combined with textual pre-processing techniques were used to detect local patterns in texts and thus generate new concepts and new semantic links. Even with the more specialized vocabulary of the oil domain, our approach achieved satisfactory results, suggesting that it can be applied to other domains and languages with only minor adjustments.

Keywords: Semantic links, data mining, linked data, SKOS.

11298 Application of Advanced Remote Sensing Data in Mineral Exploration in the Vicinity of Heavy Dense Forest Cover Area of Jharkhand and Odisha State Mining Area

Authors: Hemant Kumar, R. N. K. Sharma, A. P. Krishna

Abstract:

The study was carried out in the Saranda area of Jharkhand and a part of Odisha state. Geospatial data from Hyperion, a hyperspectral sensor on board the EO-1 satellite, were used. The study applied a variety of image-processing techniques to enhance and extract the mining class of Fe and Mn ores. Landsat-8 OLI sensor data were also used to explore related minerals. Various processes were applied to enhance the mineralogy classes, and a comparative evaluation against the related frequencies was carried out. The Hyperion dataset was specifically verified as an effective tool for extracting mineral or rock information within the shortwave infrared band range used. The abundant spatial and spectral information in hyperspectral images makes it possible to differentiate objects in targeted applications such as exploration, detection and mining.

Keywords: Hyperion, hyperspectral, sensor, Landsat-8.

11297 Scalable Systolic Multiplier over Binary Extension Fields Based on Two-Level Karatsuba Decomposition

Authors: Chiou-Yng Lee, Wen-Yo Lee, Chieh-Tsai Wu, Cheng-Chen Yang

Abstract:

Shifted polynomial basis (SPB) is a variation of polynomial basis representation. SPB has potential for efficient bit-level and digit-level implementations of multiplication over binary extension fields with subquadratic space complexity. For efficient implementation of pairing computation over large finite fields, this paper presents a new SPB multiplication algorithm based on Karatsuba schemes and uses it to derive a novel scalable multiplier architecture. Analytical results show that the proposed multiplier provides a trade-off between space and time complexity. The proposed multiplier is modular, regular and suitable for very large scale integration (VLSI) implementation. It has lower area complexity than multipliers based on traditional decomposition methods, and is therefore more suitable for efficient hardware implementation of pairing-based cryptography and elliptic curve cryptography (ECC) in constraint-driven applications.

Keywords: Digit-serial systolic multiplier, elliptic curve cryptography (ECC), Karatsuba algorithm (KA), shifted polynomial basis (SPB), pairing computation.
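
The Karatsuba idea behind the multiplier can be sketched in software for an ordinary polynomial basis (not the paper's shifted polynomial basis or its systolic architecture): polynomials over GF(2) are stored as integers, addition is XOR, and each level replaces four half-size products with three. With n = 64 and threshold = 16 the recursion applies two levels of decomposition; the function names are illustrative.

```python
def clmul(a, b):
    """Schoolbook carry-less (GF(2)[x]) multiplication of bit-vector polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a, b, n, threshold=8):
    """Multiply two GF(2)[x] polynomials of degree < n by Karatsuba recursion.

    Over GF(2), subtraction equals addition (XOR), so the middle term is
    (a0 + a1)(b0 + b1) + a0*b0 + a1*b1.
    """
    if n <= threshold:
        return clmul(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    low = karatsuba_gf2(a0, b0, h, threshold)
    high = karatsuba_gf2(a1, b1, n - h, threshold)
    mid = karatsuba_gf2(a0 ^ a1, b0 ^ b1, n - h, threshold) ^ low ^ high
    return low ^ (mid << h) ^ (high << (2 * h))


def gf2_mod(p, modulus):
    """Reduce polynomial p modulo an irreducible polynomial (both as ints)."""
    m = modulus.bit_length() - 1
    while p.bit_length() - 1 >= m:
        p ^= modulus << (p.bit_length() - 1 - m)
    return p


# Field element product in GF(2^8) with x^8 + x^4 + x^3 + x + 1 (0x11B).
product = gf2_mod(karatsuba_gf2(0x57, 0x83, 8, threshold=2), 0x11B)
```

The check value uses the GF(2^8) field from the AES specification, where {57} times {83} equals {c1}.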

11296 Load Frequency Control of Nonlinear Interconnected Hydro-Thermal System Using Differential Evolution Technique

Authors: Banaja Mohanty, Prakash Kumar Hota

Abstract:

This paper presents a differential evolution algorithm to design robust PI and PID controllers for Load Frequency Control (LFC) of nonlinear interconnected power systems, considering boiler dynamics, Governor Dead Band (GDB) and Generation Rate Constraint (GRC). Differential evolution is employed to search for the optimal controller parameters. The proposed method copes easily with nonlinear constraints. Furthermore, the proposed controller is simple and effective and can ensure the desired overall system performance. The superiority of the proposed approach is shown by comparing its results with those of a published fuzzy logic controller for the same power systems. The comparison uses performance measures such as overshoot, settling time and standard error criteria of frequency and tie-line power deviation following a 1% step load perturbation in the hydro area. The dynamic performance of the proposed controller is found to be better than that of the fuzzy logic controller. The proposed system is also shown to be robust and unaffected by changes in the system parameters.

Keywords: Automatic Generation Control (AGC), Generation Rate Constraint (GRC), Governor Dead Band (GDB), Differential Evolution (DE).
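
A minimal DE/rand/1/bin sketch for tuning controller gains. In the paper the cost would come from simulating the nonlinear hydro-thermal model after a 1% step load; here a hypothetical quadratic stands in for that cost, and the population size, F and CR values are assumptions.

```python
import random


def differential_evolution(cost, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """DE/rand/1/bin minimizer over box bounds [(lo, hi), ...].

    The population is updated in place as soon as a trial wins
    (an asynchronous variant of the classic generational scheme).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, all different from the target vector i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < CR or d == j_rand:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    trial.append(min(max(v, lo), hi))  # clip to the search box
                else:
                    trial.append(pop[i][d])
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:  # greedy one-to-one selection
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


def surrogate_cost(gains):
    """Hypothetical stand-in for a simulated integral error criterion."""
    kp, ki = gains
    return (kp - 1.5) ** 2 + (ki - 0.4) ** 2


gains, best_cost = differential_evolution(surrogate_cost, bounds=[(0.0, 5.0), (0.0, 2.0)])
```

Swapping `surrogate_cost` for a function that runs a time-domain simulation and integrates the frequency and tie-line deviations would turn this into the kind of tuning loop the abstract describes.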

11295 Web Traffic Mining using Neural Networks

Authors: Farhad F. Yusifov

Abstract:

With the explosive growth of data available on the Internet, personalization of this information space has become a necessity. With the rapidly increasing popularity of the WWW, websites play a crucial role in conveying knowledge and information to end users. Discovering hidden and meaningful information about Web users' usage patterns is critical for determining effective marketing strategies and for optimizing Web server usage to accommodate future growth. Mining useful information becomes more challenging when the Web traffic volume is enormous and keeps growing. In this paper, we propose an intelligent model to discover and analyze useful knowledge from the available Web log data.

Keywords: Clustering, Self organizing map, Web log files, Web traffic.

11294 Developing an Advanced Algorithm Capable of Classifying News, Articles and Other Textual Documents Using Text Mining Techniques

Authors: R. B. Knudsen, O. T. Rasmussen, R. A. Alphinas

Abstract:

The aim of this research is to develop an algorithm, based on Text Mining (TM) methods, that classifies news articles from the automobile industry according to the competitive actions they entail. To determine how best to preprocess the data, pipelines tailored to each algorithm were prepared and tested. The pipelines were tested with nine classification algorithms spanning regression, support vector machines and neural networks. Preliminary testing to identify the optimal pipelines and algorithms led to the selection of two algorithms with two different pipelines: Logistic Regression (LR) and an Artificial Neural Network (ANN). These algorithms were optimized further by testing several of their parameters. The best result was achieved with the ANN. The final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78 and an F1 score of 0.76. When three noisy classes are removed, the final algorithm reaches an accuracy of 94%.

Keywords: Artificial neural network, competitive dynamics, logistic regression, text classification, text mining.
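
For reference, a sketch of accuracy with macro-averaged precision, recall and F1 (per-class scores averaged with equal weight). The abstract does not state which averaging was used; note, however, that a macro F1 need not equal the harmonic mean of macro precision and macro recall, as the hypothetical toy labels below show.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gets a false positive
            fn[truth] += 1  # the true class gets a false negative
    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    k = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


acc, prec, rec, f1 = macro_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"])
harmonic = 2 * prec * rec / (prec + rec)  # differs from the macro F1 above
```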

11293 Energy Map Construction using Adaptive Alpha Grey Prediction Model in WSNs

Authors: Surender Kumar Soni, Dhirendra Pratap Singh

Abstract:

Wireless Sensor Networks (WSNs) can be used to monitor physical phenomena in areas that humans can hardly reach. Because sensor nodes typically run on non-rechargeable batteries, limited power supply is the major constraint of WSNs, and much research is under way on reducing the energy consumption of sensor nodes. An energy map can be used with clustering, data dissemination and routing techniques to reduce the power consumption of WSNs, and also to identify which part of the network is likely to fail in the near future. In this paper, the energy map is constructed using a prediction-based approach, with the adaptive alpha GM(1,1) model as the prediction model. GM(1,1) is widely used to predict future values of a time series from a few past values because of its high computational efficiency and accuracy.

Keywords: Adaptive Alpha GM(1, 1) Model, Energy Map, Prediction Based Data Reduction, Wireless Sensor Networks
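
A sketch of GM(1,1) with a tunable background-value weight α, following the usual formulation: accumulate the series, fit the grey differential equation by least squares, then forecast. How the paper adapts α is not described in the abstract; the grid search over in-sample error below is a hypothetical stand-in, as are the function names and the sample series.

```python
import math


def gm11_fit(x0, alpha=0.5):
    """Fit GM(1,1) to a positive series x0 and return (a, b).

    The background value is z1(k) = alpha*x1(k) + (1 - alpha)*x1(k-1);
    the classic model uses alpha = 0.5.
    """
    x1, total = [], 0.0
    for v in x0:                      # accumulated generating operation (AGO)
        total += v
        x1.append(total)
    z = [alpha * x1[k] + (1 - alpha) * x1[k - 1] for k in range(1, len(x0))]
    y = x0[1:]
    # Ordinary least squares for y = c*z + b, then a = -c
    # (grey equation x0(k) + a*z1(k) = b).
    n = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    det = n * szz - sz * sz
    c = (n * szy - sz * sy) / det
    b = (szz * sy - sz * szy) / det
    return -c, b


def gm11_predict(x0, a, b, steps):
    """Fitted values for x0 followed by `steps` forecasts (inverse AGO)."""
    def x1_hat(k):  # time-response function, k = 0, 1, 2, ...
        return (x0[0] - b / a) * math.exp(-a * k) + b / a
    return [x0[0]] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, len(x0) + steps)]


def adaptive_alpha(x0, grid=None):
    """Hypothetical adaptation rule: the alpha with the lowest in-sample MAPE."""
    grid = grid or [i / 20 for i in range(1, 20)]

    def mape(alpha):
        a, b = gm11_fit(x0, alpha)
        fitted = gm11_predict(x0, a, b, 0)
        return sum(abs(f - v) / v for f, v in zip(fitted[1:], x0[1:])) / (len(x0) - 1)

    return min(grid, key=mape)


# Residual-energy readings of one node (illustrative numbers only).
energy = [10 * 0.9 ** k for k in range(6)]
alpha = adaptive_alpha(energy)
a, b = gm11_fit(energy, alpha)
forecast = gm11_predict(energy, a, b, steps=1)[-1]
```

A node would send readings only when the forecast error exceeds a tolerance, which is the usual motivation for prediction-based data reduction.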

11292 Analysis of Message Authentication in Turbo Coded Halftoned Images using Exit Charts

Authors: Andhe Dharani, P. S. Satyanarayana, Andhe Pallavi

Abstract:

Considering payload, reliability, security and operational lifetime as major constraints in image transmission, this paper puts forward a steganographic technique implemented at the physical layer. We propose transmitting halftoned images (payload constraint) in wireless sensor networks to reduce the amount of transmitted data. For low-power, interference-limited applications, Turbo codes provide suitable reliability. Ensuring security is one of the highest priorities in many sensor networks, and apart from providing forward error correction, the Turbo code structure can also be used to provide encryption. We first consider the halftoned image and then present a method of embedding a block of data (called the secret) in it during the turbo encoding process. The small modifications required at the turbo decoder to extract the embedded data are presented next. The implementation complexity and the degradation of the BER (bit error rate) in the Turbo-based stego system are analyzed. Using entropy-based cryptanalytic techniques, we show that the strength of our Turbo-based stego system approaches that of the one-time pad (OTP).

Keywords: Halftoning, Turbo codes, security, operational lifetime, Turbo based stego system.

11291 Knowledge Discovery Techniques for Talent Forecasting in Human Resource Application

Authors: Hamidah Jantan, Abdul Razak Hamdan, Zulaiha Ali Othman

Abstract:

Human Resource (HR) applications can be used to provide fair and consistent decisions and to improve the effectiveness of decision-making processes. Among the challenges for HR professionals is managing organizational talent, especially ensuring the right person for the right job at the right time. This article therefore describes the potential for implementing one talent management task, namely identifying existing talent by predicting employees' performance, as an HR application. The study suggests a potential HR system architecture for talent forecasting that uses knowledge from past experience, obtained through Knowledge Discovery in Databases (KDD), also known as data mining. The article has three main parts. The first gives an overview of HR applications, prediction techniques and their applications, data mining in general and the basic concepts of talent management in HRM. The second examines the use of data mining techniques to solve one of the talent management tasks, and the third proposes the potential HR system architecture for talent forecasting.

Keywords: HR Application, Knowledge Discovery in Databases (KDD), Talent Forecasting.

11290 WebGD: A CORBA-based Document Classification and Retrieval System on the Web

Authors: Fuyang Peng, Bo Deng, Chao Qi, Mou Zhan

Abstract:

This paper presents the design and implementation of WebGD, a CORBA-based document classification and retrieval system on the Internet. WebGD uses Web, CORBA, Java, NLP, fuzzy techniques, knowledge-based processing and database technology. Its main features include a unified classification and retrieval model, classification and retrieval with a single reasoning engine, and flexible configuration of working modes. The architecture of WebGD, the unified classification and retrieval model, the components of the WebGD server and the fuzzy inference engine are discussed in detail.

Keywords: Text Mining, document classification, knowledge processing, fuzzy logic, Web, CORBA

11289 E-Appointment Scheduling (EAS)

Authors: Noraziah Ahmad, Roslina Mohd Sidek, Mohd Affendy Omardin

Abstract:

E-Appointment Scheduling (EAS) has been developed to handle appointments for UMP students with lecturers in the Faculty of Computer Systems & Software Engineering (FCSSE) and with the Student Medical Center. The schedules are based on the timetable and university activities. Constraint Logic Programming (CLP) is used to solve the scheduling problems by recommending available slots from the lecturers' and doctors' timetables. Because appointments are generated automatically, the system avoids wasted time and cost. In addition, the system gives lecturers and doctors a way to decide whether to approve or reject appointments.

Keywords: EAS, Constraint Logic Programming, PHP, Apache.
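
The slot-recommendation step can be illustrated with a plain generate-and-test filter in Python; this is a simplification, not a CLP solver, which would propagate constraints rather than check each candidate. Slots as (day, hour) pairs, the busy sets and the lunch-break rule are hypothetical.

```python
def recommend_slots(candidates, *busy_calendars, constraints=()):
    """Return the candidate slots that are free in every busy calendar and
    satisfy every extra constraint (each constraint is a predicate on a slot)."""
    busy = set().union(*busy_calendars)
    return [slot for slot in candidates
            if slot not in busy and all(rule(slot) for rule in constraints)]


def not_lunch_break(slot):
    """Hypothetical institutional rule: no appointments at 12:00."""
    _day, hour = slot
    return hour != 12


candidates = [(day, hour) for day in ("Mon", "Tue") for hour in range(8, 13)]
lecturer_busy = {("Mon", 8), ("Mon", 9), ("Tue", 10)}  # from the timetable
student_busy = {("Mon", 10), ("Tue", 8)}
suggestions = recommend_slots(candidates, lecturer_busy, student_busy,
                              constraints=[not_lunch_break])
```

The lecturer or doctor would then approve or reject one of the suggested slots, matching the approval step described above.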

11288 Acute Coronary Syndrome Prediction Using Data Mining Techniques- An Application

Authors: Tahseen A. Jilani, Huda Yasin, Madiha Yasin, C. Ardil

Abstract:

In this paper, we use data mining techniques to investigate factors that contribute significantly to increasing the risk of acute coronary syndrome. The dependent variable is the diagnosis, a dichotomous variable indicating the presence or absence of the disease. Binary logistic regression is applied to the factors affecting the dependent variable. The data set was collected from two cardiac hospitals in Karachi, Pakistan. There are sixteen variables in total: one dependent and 15 independent. To improve the regression model's performance in predicting acute coronary syndrome, data reduction techniques such as principal component analysis are applied. Based on the results of data reduction, only 14 of the sixteen factors are considered.

Keywords: Acute coronary syndrome (ACS), binary logistic regression analyses, myocardial ischemia (MI), principal component analysis, unstable angina (U.A.).
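
A minimal sketch of binary logistic regression fitted by batch gradient descent on the log-loss. The two-feature toy data are hypothetical standardized risk factors, the learning rate and epoch count are assumptions, and the PCA reduction step is not shown.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Binary logistic regression by batch gradient descent on the log-loss.

    X is a list of feature vectors, y a list of 0/1 labels (1 = disease present).
    Returns (bias, weights).
    """
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(b, w, x):
    """Estimated probability that the outcome is 1 for feature vector x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy standardized risk factors (hypothetical) and diagnoses.
X = [[-2.0, -1.0], [-1.0, -1.5], [-1.5, -0.5], [1.0, 1.5], [2.0, 1.0], [1.5, 0.5]]
y = [0, 0, 0, 1, 1, 1]
bias, weights = fit_logistic(X, y)
```

Each fitted weight can be read as the change in log-odds of the diagnosis per unit change of the corresponding (standardized) factor.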

11287 Building an Integrated Relational Database from Swiss Nutrition National Survey and Swiss Health Datasets for Data Mining Purposes

Authors: Ilona Mewes, Helena Jenzer, Farshideh Einsele

Abstract:

Objective: The objective of the study was to integrate two large databases, from the Swiss nutrition national survey (menuCH) and the Swiss health national survey 2012, for data mining purposes. Each database contains base demographic data. The integrated Swiss database is intended for later discovery of critical food consumption patterns linked to lifestyle diseases known to be strongly tied to food consumption. Design: The Swiss nutrition national survey (menuCH), with approx. 2000 respondents from two different surveys (one by phone, the other by questionnaire), and the Swiss health national survey 2012, with 21500 respondents, were pre-processed, cleaned and finally integrated into a single relational database. Results: The result of this study is an integrated relational database built from the Swiss nutritional and health databases.

Keywords: Health informatics, data mining, nutritional and health databases, nutritional and chronical databases.

11286 Ride Control of Passenger Cars with Semi-active Suspension System Using a Linear Quadratic Regulator and Hybrid Optimization Algorithm

Authors: Ali Fellah Jahromi, Wen Fang Xie, Rama B. Bhat

Abstract:

A semi-active control strategy for the suspension systems of passenger cars employing Magnetorheological (MR) dampers is presented. The vehicle is modeled with seven degrees of freedom (DOFs): the roll, pitch and bounce of the car body and the vertical motion of the four tires. To obtain an optimal controller that respects the actuator constraints, a Linear-Quadratic Regulator (LQR) is designed. The LQR design procedure consists of selecting two weighting matrices to minimize the energy of the control system. This paper presents a hybrid optimization procedure, combining gradient-based and evolutionary algorithms, to choose the weighting matrices with regard to the actuator constraint. The optimization problem is formulated to maximize ride comfort subject to the actuator constraints. The results indicate that the present control algorithm may significantly reduce the vibration response of the passenger car, thus providing a comfortable ride.

Keywords: Full car model, Linear Quadratic Regulator, Sequential Quadratic Programming, Genetic Algorithm

11285 Attribute Selection Methods Comparison for Classification of Diffuse Large B-Cell Lymphoma

Authors: Helyane Bronoski Borges, Júlio Cesar Nievola

Abstract:

The most important subtype of non-Hodgkin's lymphoma is Diffuse Large B-Cell Lymphoma. Approximately 40% of patients respond well to therapy, whereas the remainder need more aggressive treatment to improve their chances of survival. Data mining techniques have helped identify the class of lymphoma efficiently; however, thousands of genes must be processed to obtain the results. This paper compares various attribute selection methods with the aim of reducing the number of genes to be searched and making the overall procedure more effective.

Keywords: Attribute selection, data mining.

11284 Online Forums Hotspot Detection and Analysis Using Aging Theory

Authors: K. Nirmala Devi, V. Murali Bhaskaran

Abstract:

The exponential growth of social media has drawn much attention to public opinion information. Online forums, blogs and microblogs are proving to be extremely valuable resources and hold large volumes of information. However, most social media data is unstructured or semi-structured, which makes it difficult to decipher automatically. Understanding and analyzing such data is therefore essential for making the right decisions. Online forum hotspot detection is a promising research field in web mining that can help users make the right decision at the right time. The proposed system uses a novel approach to detect hotspot forums for any given time period: it uses aging theory to find the hot terms and E-K-means to detect the hotspot forums. Experimental results demonstrate that the proposed approach outperforms k-means in detecting hotspot forums, with improved accuracy.

Keywords: Hotspot forums, Micro blog, Blog, Sentiment Analysis, Opinion Mining, Social media, Twitter, Web mining.

11283 Information Gain Ratio Based Clustering for Investigation of Environmental Parameters Effects on Human Mental Performance

Authors: H. Mehdi, Kh. S. Karimov, A. A. Kavokin

Abstract:

Clustering methods developed in data mining theory can be successfully applied to investigating different kinds of dependencies between environmental conditions and human activities. It is known that environmental parameters such as temperature, relative humidity, atmospheric pressure and illumination have significant effects on human mental performance. To investigate the effects of these parameters, a data mining clustering technique based on entropy and the Information Gain Ratio (IGR), $K(Y|X) = \frac{H(Y) - H(Y|X)}{H(Y)}$, where $H(Y) = -\sum_i P_i \ln P_i$, is used. This technique allows the cluster boundaries to be adjusted. It is shown that the IGR grows monotonically with the degree of connectivity between two variables. Compared with, for example, correlation analysis, this approach has some advantages because it is less sensitive to the shape of functional dependencies. A variant of an algorithm implementing the proposed method is also presented, together with some analysis of the environmental-effects problem above. The proposed method is shown to converge in a finite number of steps.

Keywords: Clustering, Correlation analysis, Environmental Parameters, Information Gain Ratio, Mental Performance.
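
The information gain ratio can be computed directly from paired observations once each variable has been discretized into clusters, using the standard form K(Y|X) = (H(Y) - H(Y|X)) / H(Y). The discretization step is assumed to have been done already, the data are hypothetical, and the function names are illustrative. Natural logarithms are used, matching the definition of H.

```python
import math
from collections import Counter


def entropy(values):
    """H = -sum_i P_i ln P_i over the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def conditional_entropy(y, x):
    """H(Y|X) = sum over x of P(X = x) * H(Y | X = x)."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    n = len(x)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain_ratio(y, x):
    """K(Y|X) = (H(Y) - H(Y|X)) / H(Y): 0 if independent, 1 if X determines Y."""
    hy = entropy(y)
    return (hy - conditional_entropy(y, x)) / hy if hy > 0 else 0.0


# Hypothetical cluster labels: temperature band vs. test-score band.
temperature = ["low", "low", "mid", "mid", "high", "high"]
performance = ["good", "good", "good", "fair", "poor", "poor"]
k = information_gain_ratio(performance, temperature)
```

Because K depends only on how the joint distribution is concentrated, not on a linear trend, a non-monotonic dependency can still yield a high value, which is the advantage over correlation analysis mentioned above.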

11282 A Testbed for the Experiments Performed in Missing Value Treatments

Authors: Dias de J. C. Lilian, Lobato M. F. Fábio, de Santana L. Ádamo

Abstract:

Missing values in databases are a serious problem for data mining tasks, degrading data quality and the accuracy of analyses. The field lacks standardized experiments for treating missing values, and the absence of common parameters makes it difficult to compare evaluations across studies. This paper proposes a testbed intended to facilitate the implementation of experiments and to provide unbiased parameters, using available datasets and suitable performance metrics, in order to improve the evaluation and comparison of state-of-the-art missing value treatments.

Keywords: Data imputation, data mining, missing values treatment, testbed.
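
One way a testbed can standardize such an experiment: start from a complete dataset, remove values under a stated mechanism, impute them, and score the imputations only on the removed cells. The sketch assumes values missing completely at random (MCAR), mean imputation as the baseline treatment and RMSE as the metric; none of these choices, names or data is taken from the paper.

```python
import math
import random


def mask_mcar(rows, rate, seed=0):
    """Copy `rows`, setting each cell to None with probability `rate` (MCAR).

    Returns (masked_rows, positions) where positions lists masked (row, col).
    """
    rng = random.Random(seed)
    masked = [list(r) for r in rows]
    positions = []
    for i, row in enumerate(masked):
        for j in range(len(row)):
            if rng.random() < rate:
                row[j] = None
                positions.append((i, j))
    return masked, positions


def mean_impute(rows):
    """Replace None with the mean of the observed values in the same column."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def rmse_on_masked(original, imputed, positions):
    """Root-mean-square error computed only over the cells that were masked."""
    if not positions:
        return 0.0
    se = sum((original[i][j] - imputed[i][j]) ** 2 for i, j in positions)
    return math.sqrt(se / len(positions))


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
masked, positions = mask_mcar(data, rate=0.25, seed=42)
score = rmse_on_masked(data, mean_impute(masked), positions)
```

Fixing the seed, missingness rate and mechanism is what makes two imputation methods comparable on the same masked cells.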

11281 Data Mining Techniques in Computer-Aided Diagnosis: Non-Invasive Cancer Detection

Authors: Florin Gorunescu

Abstract:

Diagnosis can be achieved by building a model of a certain organ under surveillance and comparing it with real-time physiological measurements taken from the patient. This paper presents the benefits of using data mining techniques in computer-aided diagnosis (CAD), focusing on cancer detection, to help doctors make optimal decisions quickly and accurately. Among non-invasive diagnosis techniques, endoscopic ultrasound elastography (EUSE) is a recent elasticity imaging technique that makes it possible to characterize the difference between malignant and benign tumors. Exploratory data analysis (EDA) is used to digitize the main features of the EUSE sample movies and summarize them in vector form. Neural networks are then trained on these vectors so that they can offer a very precise and objective diagnosis, discriminating between benign and malignant tumors. A concrete application of these data mining techniques illustrates the suitability and reliability of this methodology in CAD.

Keywords: Endoscopic ultrasound elastography, exploratory data analysis, neural networks, non-invasive cancer detection.

11280 Mining and Visual Management of XML-Based Image Collections

Authors: Khalil Shihab, Nida Al-Chalabi

Abstract:

This article describes Uruk, the virtual museum of Iraq that we developed for the visual exploration and retrieval of image collections. The system largely exploits the loosely structured hierarchy of XML documents, which provides a useful way to store semi-structured or unstructured data that does not easily fit into existing databases. The system lets users mine and manage the XML-based image collections through a web-based Graphical User Interface (GUI). In a typical interactive session, the user browses a visual structural summary of the XML database to select interesting elements. Using this intermediate result, queries combining structure and textual references can be composed and submitted to the system. After query evaluation, the full set of answers is presented in a visual and structured way.

Keywords: Data-centric XML, graphical user interfaces, information retrieval, case-based reasoning, fuzzy sets

PDF Downloads 1743

11279 Operational Risks Classification for Information Systems with Service-Oriented Architecture (Including Loss Calculation Example)

Authors: Irina Pyrlina

Abstract:

This article presents the results of a study conducted to identify operational risks for information systems (IS) with service-oriented architecture (SOA). An analysis of current approaches to risk and system error classification revealed that system error classes have never been used for SOA risk estimation. Additionally, system error classes are not normally supported experimentally with real enterprise error data. In the study, several categories from existing error classification systems are applied, and three new error categories with sub-categories are identified. As part of the operational risks, a new error classification scheme is proposed for SOA applications. It is based on errors of real information systems that act as service providers for applications with service-oriented architecture. The proposed classification approach has been used to classify SOA system errors for two different enterprises (one in the oil and gas industry, one in the metal and mining industry). In addition, we conducted research to identify possible losses from operational risks.

Keywords: Enterprise architecture, Error classification, Oil&Gas and Metal&Mining industries, Operational risks, Service-oriented architecture

PDF Downloads 1566

11278 An Energy Aware Dispatch Scheme WSNs

Authors: Siddhartha Chauhan, Kumar S. Pandey, Prateek Chandra

Abstract:

One of the key research issues in wireless sensor networks (WSNs) is how to deploy sensors efficiently to cover an area. In this paper, we present a Fishnet Based Dispatch Scheme (FiBDS) with energy aware mobility and an interest based sensing angle. We propose two algorithms: a centralized FiBDS algorithm and a distributed FiBDS algorithm. The centralized algorithm is designed specifically for non-time-critical applications, commonly known as non-real-time applications, while the distributed algorithm is designed specifically for time-critical applications, commonly known as real-time applications. The proposed dispatch scheme works in a phase-selection manner: in each phase, a specific constraint is handled according to its specified priority, only the nodes best suited to that phase are chosen at its end, and the scheme then moves on to the next phase. Simulation results are presented to verify the effectiveness of both algorithms.
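The phase-selection idea can be sketched generically as a priority-ordered filter. This is a minimal illustration, not FiBDS itself: it assumes each phase is a (priority, constraint, score, keep) tuple and that "best suited" means the top-`keep` nodes by score among those satisfying the constraint. The `Node` fields (`energy`, `distance`), thresholds and phase definitions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Node:
    node_id: int
    energy: float    # hypothetical: residual energy of the sensor
    distance: float  # hypothetical: distance to the target location


# A phase: (priority, constraint predicate, score used to rank candidates, how many to keep)
Phase = Tuple[int, Callable[[Node], bool], Callable[[Node], float], int]


def phase_selection(nodes: List[Node], phases: List[Phase]) -> List[Node]:
    """Apply phases in ascending priority; each phase keeps only its best-suited nodes."""
    candidates = list(nodes)
    # Sort on the priority field only, so the callables are never compared.
    for _, constraint, score, keep in sorted(phases, key=lambda p: p[0]):
        feasible = [n for n in candidates if constraint(n)]
        feasible.sort(key=score, reverse=True)  # highest score first
        candidates = feasible[:keep]
    return candidates


# Usage with hypothetical phases: filter on energy first, then prefer nearby nodes.
nodes = [Node(1, 0.9, 5.0), Node(2, 0.2, 1.0), Node(3, 0.7, 2.0), Node(4, 0.8, 9.0)]
phases = [
    (1, lambda n: n.energy >= 0.5, lambda n: n.energy, 3),
    (2, lambda n: n.distance <= 6.0, lambda n: -n.distance, 1),
]
chosen = phase_selection(nodes, phases)
```

Here the first phase keeps nodes 1, 4 and 3, and the second keeps node 3, the closest of those within range; the actual constraints and priorities in the paper may differ.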

Keywords: Dispatch Scheme, Energy Aware Mobility, Interest based Sensing, Wireless Sensor Networks (WSNs).

PDF Downloads 1577

11277 Performance Comparison of Particle Swarm Optimization with Traditional Clustering Algorithms used in Self-Organizing Map

Authors: Anurag Sharma, Christian W. Omlin

Abstract:

Self-organizing map (SOM) is a well-known data reduction technique used in data mining. It can reveal structure in data sets through data visualization that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOM, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of an adaptive heuristic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOM. The application of our method to several standard data sets demonstrates its feasibility. The PSO algorithm uses the so-called U-matrix of the SOM to determine cluster boundaries; the results of this novel automatic method compare very favorably with boundary detection by traditional algorithms, namely k-means and hierarchical approaches, which are normally used to interpret the output of SOM.
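The U-matrix mentioned above can be sketched as follows, assuming a rectangular SOM grid with a 4-neighbourhood: each unit's value is the mean Euclidean distance between its code vector and those of its immediate grid neighbours, so high values suggest cluster boundaries. This covers only the U-matrix computation, not the authors' PSO procedure; the grid layout and the function name `u_matrix` are assumptions.

```python
import math
from typing import List

Grid = List[List[List[float]]]  # rows x cols x code-vector dimension


def u_matrix(codebook: Grid) -> List[List[float]]:
    """Mean distance from each unit's code vector to its up/down/left/right neighbours."""
    rows, cols = len(codebook), len(codebook[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # math.dist (Python 3.8+) gives the Euclidean distance.
                    dists.append(math.dist(codebook[r][c], codebook[nr][nc]))
            out[r][c] = sum(dists) / len(dists) if dists else 0.0
    return out


# Usage: a 1 x 3 map whose last code vector lies far from the other two.
um = u_matrix([[[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]])
```

For this toy map the values rise from left to right, marking a likely boundary between the second and third units; a boundary-finding method such as the PSO approach described above would then operate on values like these.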

Keywords: cluster boundaries, clustering, code vectors, data mining, particle swarm optimization, self-organizing maps, U-matrix.

PDF Downloads 1871

11276 Prediction Model for Leukemia Diseases Based on Data Mining Classification Algorithms with Best Accuracy

Authors: Fahd Sabry Esmail, M. Badr Senousy, Mohamed Ragaie

Abstract:

In recent years, there has been an explosion in the use of technologies that help discover diseases. For example, DNA microarrays allow us, for the first time, to obtain a "global" view of the cell. This has great potential to provide accurate medical diagnosis and to help find the right treatment and cure for many diseases. Various classification algorithms can be applied to such microarray datasets to devise methods that can predict the occurrence of Leukemia disease. In this study, we compared the classification accuracy and response time of eleven decision tree methods and six rule classifier methods using five performance criteria. The experimental results show that Random Tree produces the best results among the tree classifiers and also takes the least time to build its model. Among the classification rule algorithms, the nearest-neighbor-like algorithm (NNge) is the best, owing to its high accuracy and the least time taken to build its model.

Keywords: Data mining, classification techniques, decision tree, classification rule, leukemia diseases, microarray data.

PDF Downloads 2493