Search results for: Aspect Mining
822 Developing an Advanced Algorithm Capable of Classifying News, Articles and Other Textual Documents Using Text Mining Techniques
Authors: R. B. Knudsen, O. T. Rasmussen, R. A. Alphinas
Abstract:
The reason for conducting this research is to develop an algorithm that is capable of classifying news articles from the automobile industry, according to the competitive actions that they entail, with the use of Text Mining (TM) methods. It is needed to test how to properly preprocess the data for this research by preparing pipelines which fits each algorithm the best. The pipelines are tested along with nine different classification algorithms in the realm of regression, support vector machines, and neural networks. Preliminary testing for identifying the optimal pipelines and algorithms resulted in the selection of two algorithms with two different pipelines. The two algorithms are Logistic Regression (LR) and Artificial Neural Network (ANN). These algorithms are optimized further, where several parameters of each algorithm are tested. The best result is achieved with the ANN. The final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78, and an F1 score of 0.76. By removing three of the classes that created noise, the final algorithm is capable of reaching an accuracy of 94%.
Keywords: Artificial neural network, competitive dynamics, logistic regression, text classification, text mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 539821 Evaluation of the End Effect Impact on the Torsion Test for Determining the Shear Modulus of a Timber Beam through a Photogrammetry Approach
Authors: Niaz Gharavi, Hexin Zhang, Yanjun Xie
Abstract:
The timber beam end effect in the torsion test is evaluated using binocular stereo vision system. It is recommended by BS EN 408:2010+A1:2012 to exclude a distance of two to three times of cross-sectional thickness (b) from ends to avoid the end effect; whereas, this study indicates that this distance is not sufficiently far enough to remove this effect in slender cross-sections. The shear modulus of six timber beams with different aspect ratios is determined at the various angles and cross-sections. The result of this experiment shows that the end affected span of each specimen varies depending on their aspect ratios. It is concluded that by increasing the aspect ratio this span will increase. However, by increasing the distance from the ends to the values greater than 6b, the shear modulus trend becomes constant and end effect will be negligible. Moreover, it is concluded that end affected span is preferred to be depth-dependent rather than thickness-dependant.Keywords: End effect, structural-size torsion test, shear properties, timber engineering, binocular stereo vision.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1367820 Probabilistic Approach as a Method Used in the Solution of Engineering Design for Biomechanics and Mining
Authors: Karel Frydrýšek
Abstract:
This paper focuses on the probabilistic numerical solution of the problems in biomechanics and mining. Applications of Simulation-Based Reliability Assessment (SBRA) Method are presented in the solution of designing of the external fixators applied in traumatology and orthopaedics (these fixators can be applied for the treatment of open and unstable fractures etc.) and in the solution of a hard rock (ore) disintegration process (i.e. the bit moves into the ore and subsequently disintegrates it, the results are compared with experiments, new design of excavation tool is proposed.Keywords: probabilistic approach, engineering design, traumatology, rock mechanics
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1483819 Knowledge Discovery Techniques for Talent Forecasting in Human Resource Application
Authors: Hamidah Jantan, Abdul Razak Hamdan, Zulaiha Ali Othman
Abstract:
Human Resource (HR) applications can be used to provide fair and consistent decisions, and to improve the effectiveness of decision making processes. Besides that, among the challenge for HR professionals is to manage organization talents, especially to ensure the right person for the right job at the right time. For that reason, in this article, we attempt to describe the potential to implement one of the talent management tasks i.e. identifying existing talent by predicting their performance as one of HR application for talent management. This study suggests the potential HR system architecture for talent forecasting by using past experience knowledge known as Knowledge Discovery in Database (KDD) or Data Mining. This article consists of three main parts; the first part deals with the overview of HR applications, the prediction techniques and application, the general view of Data mining and the basic concept of talent management in HRM. The second part is to understand the use of Data Mining technique in order to solve one of the talent management tasks, and the third part is to propose the potential HR system architecture for talent forecasting.Keywords: HR Application, Knowledge Discovery inDatabase (KDD), Talent Forecasting.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4485818 Building an Integrated Relational Database from Swiss Nutrition National Survey and Swiss Health Datasets for Data Mining Purposes
Authors: Ilona Mewes, Helena Jenzer, Farshideh Einsele
Abstract:
Objective: The objective of the study was to integrate two big databases from Swiss nutrition national survey (menuCH) and Swiss health national survey 2012 for data mining purposes. Each database has a demographic base data. An integrated Swiss database is built to later discover critical food consumption patterns linked with lifestyle diseases known to be strongly tied with food consumption. Design: Swiss nutrition national survey (menuCH) with approx. 2000 respondents from two different surveys, one by Phone and the other by questionnaire along with Swiss health national survey 2012 with 21500 respondents were pre-processed, cleaned and finally integrated to a unique relational database. Results: The result of this study is an integrated relational database from the Swiss nutritional and health databases.
Keywords: Health informatics, data mining, nutritional and health databases, nutritional and chronical databases.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1737817 Attribute Selection Methods Comparison for Classification of Diffuse Large B-Cell Lymphoma
Authors: Helyane Bronoski Borges, Júlio Cesar Nievola
Abstract:
The most important subtype of non-Hodgkin-s lymphoma is the Diffuse Large B-Cell Lymphoma. Approximately 40% of the patients suffering from it respond well to therapy, whereas the remainder needs a more aggressive treatment, in order to better their chances of survival. Data Mining techniques have helped to identify the class of the lymphoma in an efficient manner. Despite that, thousands of genes should be processed to obtain the results. This paper presents a comparison of the use of various attribute selection methods aiming to reduce the number of genes to be searched, looking for a more effective procedure as a whole.Keywords: Attribute selection, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1419816 Online Forums Hotspot Detection and Analysis Using Aging Theory
Authors: K. Nirmala Devi, V. Murali Bhaskaran
Abstract:
The exponential growth of social media arouses much attention on public opinion information. The online forums, blogs, micro blogs are proving to be extremely valuable resources and are having bulk volume of information. However, most of the social media data is unstructured and semi structured form. So that it is more difficult to decipher automatically. Therefore, it is very much essential to understand and analyze those data for making a right decision. The online forums hotspot detection is a promising research field in the web mining and it guides to motivate the user to take right decision in right time. The proposed system consist of a novel approach to detect a hotspot forum for any given time period. It uses aging theory to find the hot terms and E-K-means for detecting the hotspot forum. Experimental results demonstrate that the proposed approach outperforms k-means for detecting the hotspot forums with the improved accuracy.
Keywords: Hotspot forums, Micro blog, Blog, Sentiment Analysis, Opinion Mining, Social media, Twitter, Web mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2187815 Mixed Convection in a Vertical Heated Channel: Influence of the Aspect Ratio
Authors: Ameni Mokni , Hatem Mhiri , Georges Le Palec , Philippe Bournot
Abstract:
In mechanical and environmental engineering, mixed convection is a frequently encountered thermal fluid phenomenon which exists in atmospheric environment, urban canopy flows, ocean currents, gas turbines, heat exchangers, and computer chip cooling systems etc... . This paper deals with a numerical investigation of mixed convection in a vertical heated channel. This flow results from the mixing of the up-going fluid along walls of the channel with the one issued from a flat nozzle located in its entry section. The fluiddynamic and heat-transfer characteristics of vented vertical channels are investigated for constant heat-flux boundary conditions, a Rayleigh number equal to 2.57 1010, for two jet Reynolds number Re=3 103 and 2104 and the aspect ratio in the 8-20 range. The system of governing equations is solved with a finite volumes method and an implicit scheme. The obtained results show that the turbulence and the jet-wall interaction activate the heat transfer, as does the drive of ambient air by the jet. For low Reynolds number Re=3 103, the increase of the aspect Ratio enhances the heat transfer of about 3%, however; for Re=2 104, the heat transfer enhancement is of about 12%. The numerical velocity, pressure and temperature fields are post-processed to compute the quantities of engineering interest such as the induced mass flow rate, and average Nusselt number, in terms of Rayleigh, Reynolds numbers and dimensionless geometric parameters are presented.Keywords: Aspect Ratio, Channel, Jet, Mixed convection
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2180814 Enhance the Power of Sentiment Analysis
Authors: Yu Zhang, Pedro Desouza
Abstract:
Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers with three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduced a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modeling and testing work was done in R and Greenplum in-database analytic tools.
Keywords: Sentiment Analysis, Social Media, Twitter, Amazon, Data Mining, Machine Learning, Text Mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3521813 A Novel Approach to Improve Users Search Goal in Web Usage Mining
Authors: R. Lokeshkumar, P. Sengottuvelan
Abstract:
Web mining is to discover and extract useful Information. Different users may have different search goals when they search by giving queries and submitting it to a search engine. The inference and analysis of user search goals can be very useful for providing an experience result for a user search query. In this project, we propose a novel approach to infer user search goals by analyzing search web logs. First, we propose a novel approach to infer user search goals by analyzing search engine query logs, the feedback sessions are constructed from user click-through logs and it efficiently reflect the information needed for users. Second we propose a preprocessing technique to clean the unnecessary data’s from web log file (feedback session). Third we propose a technique to generate pseudo-documents to representation of feedback sessions for clustering. Finally we implement k-medoids clustering algorithm to discover different user search goals and to provide a more optimal result for a search query based on feedback sessions for the user.Keywords: Data Preprocessing, Session Identification, Web log mining, Web Personalization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2026812 Total and Leachable Concentration of Trace Elements in Soil towards Human Health Risk, Related with Coal Mine in Jorong, South Kalimantan, Indonesia
Authors: Arie Pujiwati, Kengo Nakamura, Noriaki Watanabe, Takeshi Komai
Abstract:
Coal mining is well known to cause considerable environmental impacts, including trace element contamination of soil. This study aimed to assess the trace element (As, Cd, Co, Cu, Ni, Pb, Sb, and Zn) contamination of soil in the vicinity of coal mining activities, using the case study of Asam-asam River basin, South Kalimantan, Indonesia, and to assess the human health risk, incorporating total and bioavailable (water-leachable and acid-leachable) concentrations. The results show the enrichment of As and Co in soil, surpassing the background soil value. Contamination was evaluated based on the index of geo-accumulation, Igeo and the pollution index, PI. Igeo values showed that the soil was generally uncontaminated (Igeo ≤ 0), except for elevated As and Co. Mean PI for Ni and Cu indicated slight contamination. Regarding the assessment of health risks, the Hazard Index, HI showed adverse risks (HI > 1) for Ni, Co, and As. Further, Ni and As were found to pose unacceptable carcinogenic risk (risk > 1.10-5). Farming, settlement, and plantation were found to present greater risk than coal mines. These results show that coal mining activity in the study area contaminates the soils by particular elements and may pose potential human health risk in its surrounding area. This study is important for setting appropriate countermeasure actions and improving basic coal mining management in Indonesia.
Keywords: Coal mine, risk, soil, trace elements.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1176811 The Power of Indigenous Peoples in Decision-Making Processes of Mining Projects: The Pilbara Region
Authors: K. N. Penna, J. P. English
Abstract:
The destruction of the Juukan Gorge rock shelters in 2020 has catalysed impetus within Australian society for a significant change in engagement with Indigenous Peoples, and the approach to Indigenous cultural heritage, both within the Pilbara region and more broadly across Australia. Culture-based and people-centred approaches are inherent to inclusive sustainable development and Free, Prior, Informed Consent, outcomes encouraged by international and local recommendations on the human rights and cultural heritage preservation of Indigenous peoples. In this paper, we present an interpretive model of an evolved process for mining project development, incorporating culture-based and people-centred approaches, based on the Theory U system change method. The evolved process advocates a change in organisational mindset and culture, and a comprehensive understanding of Indigenous Peoples’ culture and values, as the foundations for increasing their influence and achieving mutually beneficial developments.
Keywords: Indigenous Engagement, mining industry, culture-based approach, people-centred approach, Theory U.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 445810 Effect of the Cross-Sectional Geometry on Heat Transfer and Particle Motion of Circulating Fluidized Bed Riser for CO2 Capture
Authors: Seungyeong Choi, Namkyu Lee, Dong Il Shim, Young Mun Lee, Yong-Ki Park, Hyung Hee Cho
Abstract:
Effect of the cross-sectional geometry on heat transfer and particle motion of circulating fluidized bed riser for CO2 capture was investigated. Numerical simulation using Eulerian-eulerian method with kinetic theory of granular flow was adopted to analyze gas-solid flow consisting in circulating fluidized bed riser. Circular, square, and rectangular cross-sectional geometry cases of the same area were carried out. Rectangular cross-sectional geometries were analyzed having aspect ratios of 1: 2, 1: 4, 1: 8, and 1:16. The cross-sectional geometry significantly influenced the particle motion and heat transfer. The downward flow pattern of solid particles near the wall was changed. The gas-solid mixing degree of the riser with the rectangular cross section of the high aspect ratio was the lowest. There were differences in bed-to-wall heat transfer coefficient according to rectangular geometry with different aspect ratios.
Keywords: Bed geometry, computational fluid dynamics, circulating fluidized bed riser, heat transfer.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1334809 A Semantic Recommendation Procedure for Electronic Product Catalog
Authors: Hadi Khosravi Farsani, Mohammadali Nematbakhsh
Abstract:
To overcome the product overload of Internet shoppers, we introduce a semantic recommendation procedure which is more efficient when applied to Internet shopping malls. The suggested procedure recommends the semantic products to the customers and is originally based on Web usage mining, product classification, association rule mining, and frequently purchasing. We applied the procedure to the data set of MovieLens Company for performance evaluation, and some experimental results are provided. The experimental results have shown superior performance in terms of coverage and precision.Keywords: Personalization, Recommendation, OWL Ontology, Electronic Catalogs, Classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1927808 Comparative Study of Universities’ Web Structure Mining
Authors: Z. Abdullah, A. R. Hamdan
Abstract:
This paper is meant to analyze the ranking of University of Malaysia Terengganu, UMT’s website in the World Wide Web. There are only few researches have been done on comparing the ranking of universities’ websites so this research will be able to determine whether the existing UMT’s website is serving its purpose which is to introduce UMT to the world. The ranking is based on hub and authority values which are accordance to the structure of the website. These values are computed using two websearching algorithms, HITS and SALSA. Three other universities’ websites are used as the benchmarks which are UM, Harvard and Stanford. The result is clearly showing that more work has to be done on the existing UMT’s website where important pages according to the benchmarks, do not exist in UMT’s pages. The ranking of UMT’s website will act as a guideline for the web-developer to develop a more efficient website.Keywords: Algorithm, ranking, website, web structure mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1670807 A Testbed for the Experiments Performed in Missing Value Treatments
Authors: Dias de J. C. Lilian, Lobato M. F. Fábio, de Santana L. Ádamo
Abstract:
The occurrence of missing values in database is a serious problem for Data Mining tasks, responsible for degrading data quality and accuracy of analyses. In this context, the area has shown a lack of standardization for experiments to treat missing values, introducing difficulties to the evaluation process among different researches due to the absence in the use of common parameters. This paper proposes a testbed intended to facilitate the experiments implementation and provide unbiased parameters using available datasets and suited performance metrics in order to optimize the evaluation and comparison between the state of art missing values treatments.
Keywords: Data imputation, data mining, missing values treatment, testbed.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1517806 Data Mining Techniques in Computer-Aided Diagnosis: Non-Invasive Cancer Detection
Authors: Florin Gorunescu
Abstract:
Diagnosis can be achieved by building a model of a certain organ under surveillance and comparing it with the real time physiological measurements taken from the patient. This paper deals with the presentation of the benefits of using Data Mining techniques in the computer-aided diagnosis (CAD), focusing on the cancer detection, in order to help doctors to make optimal decisions quickly and accurately. In the field of the noninvasive diagnosis techniques, the endoscopic ultrasound elastography (EUSE) is a recent elasticity imaging technique, allowing characterizing the difference between malignant and benign tumors. Digitalizing and summarizing the main EUSE sample movies features in a vector form concern with the use of the exploratory data analysis (EDA). Neural networks are then trained on the corresponding EUSE sample movies vector input in such a way that these intelligent systems are able to offer a very precise and objective diagnosis, discriminating between benign and malignant tumors. A concrete application of these Data Mining techniques illustrates the suitability and the reliability of this methodology in CAD.Keywords: Endoscopic ultrasound elastography, exploratorydata analysis, neural networks, non-invasive cancer detection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1871805 Finite Element Prediction of Hip Fracture during a Sideways Fall
Authors: M. Ikhwan Z. Ridzwan, Bidyut Pal, Ulrich N. Hansen
Abstract:
Finite element method was applied to model damage development in the femoral neck during a sideways fall. The femoral failure was simulated using the maximum principal strain criterion. The evolution of damage was consistent with previous studies. It was initiated by compressive failure at the junction of the superior aspect of the femoral neck and the greater trochanter. It was followed by tensile failure that occurred at the inferior aspect of the femoral neck before a complete transcervical fracture was observed. The estimated failure line was less than 50° from the horizontal plane (Pauwels type II).Keywords: Femoral Strength, Finite Element Models, Hip Fracture, Progressive Failure, Sideways Fall.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2357804 Analysis of Statistical Data on Social Resources Dimension of Occupational Status Attainment: A Rational Choice Approach
Authors: Oleg Demchenko
Abstract:
The aim of the present study is to analyze empirical researches on the social resources dimension of occupational status attainment process and relate them to the rational choice approach. The analysis suggests that the existing data on the strength of ties aspect of social resources is insufficient and does not allow any implication concerning rational actor-s behavior. However, the results concerning work relation aspect are more encouraging.Keywords: Social resources, status attainment, rational choice, weak ties, work-related ties.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1489803 Dimensional Modeling of HIV Data Using Open Source
Authors: Charles D. Otine, Samuel B. Kucel, Lena Trojer
Abstract:
Selecting the data modeling technique for an information system is determined by the objective of the resultant data model. Dimensional modeling is the preferred modeling technique for data destined for data warehouses and data mining, presenting data models that ease analysis and queries which are in contrast with entity relationship modeling. The establishment of data warehouses as components of information system landscapes in many organizations has subsequently led to the development of dimensional modeling. This has been significantly more developed and reported for the commercial database management systems as compared to the open sources thereby making it less affordable for those in resource constrained settings. This paper presents dimensional modeling of HIV patient information using open source modeling tools. It aims to take advantage of the fact that the most affected regions by the HIV virus are also heavily resource constrained (sub-Saharan Africa) whereas having large quantities of HIV data. Two HIV data source systems were studied to identify appropriate dimensions and facts these were then modeled using two open source dimensional modeling tools. Use of open source would reduce the software costs for dimensional modeling and in turn make data warehousing and data mining more feasible even for those in resource constrained settings but with data available.Keywords: About Database, Data Mining, Data warehouse, Dimensional Modeling, Open Source.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1964802 Integrated Method for Detection of Unknown Steganographic Content
Authors: Magdalena Pejas
Abstract:
This article concerns the presentation of an integrated method for detection of steganographic content embedded by new unknown programs. The method is based on data mining and aggregated hypothesis testing. The article contains the theoretical basics used to deploy the proposed detection system and the description of improvement proposed for the basic system idea. Further main results of experiments and implementation details are collected and described. Finally example results of the tests are presented.Keywords: Steganography, steganalysis, data embedding, data mining, feature extraction, knowledge base, system learning, hypothesis testing, error estimation, black box program, file structure.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1568801 Application of Data Mining Techniques for Tourism Knowledge Discovery
Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee
Abstract:
Application of five implementations of three data mining classification techniques was experimented for extracting important insights from tourism data. The aim was to find out the best performing algorithm among the compared ones for tourism knowledge discovery. Knowledge discovery process from data was used as a process model. 10-fold cross validation method is used for testing purpose. Various data preprocessing activities were performed to get the final dataset for model building. Classification models of the selected algorithms were built with different scenarios on the preprocessed dataset. The outperformed algorithm tourism dataset was Random Forest (76%) before applying information gain based attribute selection and J48 (C4.5) (75%) after selection of top relevant attributes to the class (target) attribute. In terms of time for model building, attribute selection improves the efficiency of all algorithms. Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented, which showed intricate, non-trivial knowledge/insight that would otherwise not be discovered by simple statistical analysis with mediocre accuracy of the machine using classification algorithms.
Keywords: Classification algorithms; data mining; tourism; knowledge discovery.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2550800 Application of Computer Aided Engineering Tools in Performance Prediction and Fault Detection of Mechanical Equipment of Mining Process Line
Abstract:
Nowadays, to decrease the number of downtimes in the industries such as metal mining, petroleum and chemical industries, predictive maintenance is crucial. In order to have efficient predictive maintenance, knowing the performance of critical equipment of production line such as pumps and hydro-cyclones under variable operating parameters, selecting best indicators of this equipment health situations, best locations for instrumentation, and also measuring of these indicators are very important. In this paper, computer aided engineering (CAE) tools are implemented to study some important elements of copper process line, namely slurry pumps and cyclone to predict the performance of these components under different working conditions. These modeling and simulations can be used in predicting, for example, the damage tolerance of the main shaft of the slurry pump or wear rate and location of cyclone wall or pump case and impeller. Also, the simulations can suggest best-measuring parameters, measuring intervals, and their locations.Keywords: Computer aided engineering, predictive maintenance, fault detection, mining process line, slurry pump, hydrocyclone.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1835799 A Review and Comparative Analysis on Cluster Ensemble Methods
Authors: S. Sarumathi, P. Ranjetha, C. Saraswathy, M. Vaishnavi, S. Geetha
Abstract:
Clustering is an unsupervised learning technique for aggregating data objects into meaningful classes so that intra cluster similarity is maximized and inter cluster similarity is minimized in data mining. However, no single clustering algorithm proves to be the most effective in producing the best result. As a result, a new challenging technique known as the cluster ensemble approach has blossomed in order to determine the solution to this problem. For the cluster analysis issue, this new technique is a successful approach. The cluster ensemble's main goal is to combine similar clustering solutions in a way that achieves the precision while also improving the quality of individual data clustering. Because of the massive and rapid creation of new approaches in the field of data mining, the ongoing interest in inventing novel algorithms necessitates a thorough examination of current techniques and future innovation. This paper presents a comparative analysis of various cluster ensemble approaches, including their methodologies, formal working process, and standard accuracy and error rates. As a result, the society of clustering practitioners will benefit from this exploratory and clear research, which will aid in determining the most appropriate solution to the problem at hand.
Keywords: Clustering, cluster ensemble methods, consensus function, data mining, unsupervised learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 830798 Improving Classification Accuracy with Discretization on Datasets Including Continuous Valued Features
Authors: Mehmet Hacibeyoglu, Ahmet Arslan, Sirzat Kahramanli
Abstract:
This study analyzes the effect of discretization on classification of datasets including continuous valued features. Six datasets from UCI which containing continuous valued features are discretized with entropy-based discretization method. The performance improvement between the dataset with original features and the dataset with discretized features is compared with k-nearest neighbors, Naive Bayes, C4.5 and CN2 data mining classification algorithms. As the result the classification accuracies of the six datasets are improved averagely by 1.71% to 12.31%.Keywords: Data mining classification algorithms, entropy-baseddiscretization method
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2465797 Cirrhosis Mortality Prediction as Classification Using Frequent Subgraph Mining
Authors: Abdolghani Ebrahimi, Diego Klabjan, Chenxi Ge, Daniela Ladner, Parker Stride
Abstract:
In this work, we use machine learning and data analysis techniques to predict the one-year mortality of cirrhotic patients. Data from 2,322 patients with liver cirrhosis are collected at a single medical center. Different machine learning models are applied to predict one-year mortality. A comprehensive feature space including demographic information, comorbidity, clinical procedure and laboratory tests is being analyzed. A temporal pattern mining technic called Frequent Subgraph Mining (FSM) is being used. Model for End-stage liver disease (MELD) prediction of mortality is used as a comparator. All of our models statistically significantly outperform the MELD-score model and show an average 10% improvement of the area under the curve (AUC). The FSM technic itself does not improve the model significantly, but FSM, together with a machine learning technique called an ensemble, further improves the model performance. With the abundance of data available in healthcare through electronic health records (EHR), existing predictive models can be refined to identify and treat patients at risk for higher mortality. However, due to the sparsity of the temporal information needed by FSM, the FSM model does not yield significant improvements. Our work applies modern machine learning algorithms and data analysis methods on predicting one-year mortality of cirrhotic patients and builds a model that predicts one-year mortality significantly more accurate than the MELD score. We have also tested the potential of FSM and provided a new perspective of the importance of clinical features.
Keywords: machine learning, liver cirrhosis, subgraph mining, supervised learning
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 451796 The Development of the Multi-Agent Classification System (MACS) in Compliance with FIPA Specifications
Authors: Mohamed R. Mhereeg
Abstract:
The paper investigates the feasibility of constructing a software multi-agent based monitoring and classification system and utilizing it to provide an automated and accurate classification of end users developing applications in the spreadsheet domain. The agents function autonomously to provide continuous and periodic monitoring of excels spreadsheet workbooks. Resulting in, the development of the MultiAgent classification System (MACS) that is in compliance with the specifications of the Foundation for Intelligent Physical Agents (FIPA). However, different technologies have been brought together to build MACS. The strength of the system is the integration of the agent technology with the FIPA specifications together with other technologies that are Windows Communication Foundation (WCF) services, Service Oriented Architecture (SOA), and Oracle Data Mining (ODM). The Microsoft's .NET widows service based agents were utilized to develop the monitoring agents of MACS, the .NET WCF services together with SOA approach allowed the distribution and communication between agents over the WWW that is in order to satisfy the monitoring and classification of the multiple developer aspect. ODM was used to automate the classification phase of MACS.
Keywords: Autonomous, Classification, MACS, Multi-Agent, SOA, WCF.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1592795 A Comparative Study of GTC and PSP Algorithms for Mining Sequential Patterns Embedded in Database with Time Constraints
Authors: Safa Adi
Abstract:
This paper will consider the problem of sequential mining patterns embedded in a database by handling the time constraints as defined in the GSP algorithm (level wise algorithms). We will compare two previous approaches GTC and PSP, that resumes the general principles of GSP. Furthermore this paper will discuss PG-hybrid algorithm, that using PSP and GTC. The results show that PSP and GTC are more efficient than GSP. On the other hand, the GTC algorithm performs better than PSP. The PG-hybrid algorithm use PSP algorithm for the two first passes on the database, and GTC approach for the following scans. Experiments show that the hybrid approach is very efficient for short, frequent sequences.Keywords: Database, GTC algorithm, PSP algorithm, sequential patterns, time constraints.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 703794 Fundamental Problems in the Operation of the Automotive Parts Industry Small and Medium Businesses in Bangkok and Surrounding Provinces
Authors: P. Thepnarintra
Abstract:
The purposes of this study were to: 1) investigate operation conditions of SME automotive part industry in Bangkok and vicinity and 2) to compare operation problem levels of SME automotive part industry in Bangkok and vicinity according to the sizes of the enterprises. Samples in this study included 196 entrepreneurs of SME automotive part industry in Bangkok and vicinity derived from simple random sampling and calculation from R. V. Krejcie and D. W. Morgan’s tables. Research statistics included frequency, percentage, mean, standard deviation, and T-test. The results revealed that in general the problem levels of SME automotive part industry in Bangkok and vicinity were high. When considering in details, it was found that the problem levels were high at every aspect, i.e. personal, production, export, finance, and marketing respectively. The comparison of the problem levels according to the sizes of the enterprises revealed statistically significant differences at .05. When considering on each aspect, it was found that the aspect with the statistical difference at .05 included 5 aspects, i.e. production, marketing, finance, personal, and export. The findings also showed that small enterprises faced more severe problems than those of medium enterprises.Keywords: Automotive part industry, operation problems, SME, perimeter.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1597793 Response of the Residential Building Structureon Load Technical Seismicity due to Mining Activities
Authors: V. Salajka, Z. Kaláb, J. Kala, P. Hradil
Abstract:
In the territories where high-intensity earthquakes are frequent is paid attention to the solving of the seismic problems. In the paper are described two computational model variants based on finite element method of the construction with different subsoil simulation (rigid or elastic subsoil) is used. For simulation and calculations program system based on method final elements ANSYS was used. Seismic responses calculations of residential building structure were effected on loading characterized by accelerogram for comparing with the responses spectra method.Keywords: Accelerogram, ANSYS, mining induced seismic, residential building structure, spectra, subsoil.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1540