Search results for: subgraph mining
378 A Novel Approach to Improve Users Search Goal in Web Usage Mining
Authors: R. Lokeshkumar, P. Sengottuvelan
Abstract:
Web mining is to discover and extract useful Information. Different users may have different search goals when they search by giving queries and submitting it to a search engine. The inference and analysis of user search goals can be very useful for providing an experience result for a user search query. In this project, we propose a novel approach to infer user search goals by analyzing search web logs. First, we propose a novel approach to infer user search goals by analyzing search engine query logs, the feedback sessions are constructed from user click-through logs and it efficiently reflect the information needed for users. Second we propose a preprocessing technique to clean the unnecessary data’s from web log file (feedback session). Third we propose a technique to generate pseudo-documents to representation of feedback sessions for clustering. Finally we implement k-medoids clustering algorithm to discover different user search goals and to provide a more optimal result for a search query based on feedback sessions for the user.Keywords: Data Preprocessing, Session Identification, Web log mining, Web Personalization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2023377 Total and Leachable Concentration of Trace Elements in Soil towards Human Health Risk, Related with Coal Mine in Jorong, South Kalimantan, Indonesia
Authors: Arie Pujiwati, Kengo Nakamura, Noriaki Watanabe, Takeshi Komai
Abstract:
Coal mining is well known to cause considerable environmental impacts, including trace element contamination of soil. This study aimed to assess the trace element (As, Cd, Co, Cu, Ni, Pb, Sb, and Zn) contamination of soil in the vicinity of coal mining activities, using the case study of Asam-asam River basin, South Kalimantan, Indonesia, and to assess the human health risk, incorporating total and bioavailable (water-leachable and acid-leachable) concentrations. The results show the enrichment of As and Co in soil, surpassing the background soil value. Contamination was evaluated based on the index of geo-accumulation, Igeo and the pollution index, PI. Igeo values showed that the soil was generally uncontaminated (Igeo ≤ 0), except for elevated As and Co. Mean PI for Ni and Cu indicated slight contamination. Regarding the assessment of health risks, the Hazard Index, HI showed adverse risks (HI > 1) for Ni, Co, and As. Further, Ni and As were found to pose unacceptable carcinogenic risk (risk > 1.10-5). Farming, settlement, and plantation were found to present greater risk than coal mines. These results show that coal mining activity in the study area contaminates the soils by particular elements and may pose potential human health risk in its surrounding area. This study is important for setting appropriate countermeasure actions and improving basic coal mining management in Indonesia.
Keywords: Coal mine, risk, soil, trace elements.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1175376 The Power of Indigenous Peoples in Decision-Making Processes of Mining Projects: The Pilbara Region
Authors: K. N. Penna, J. P. English
Abstract:
The destruction of the Juukan Gorge rock shelters in 2020 has catalysed impetus within Australian society for a significant change in engagement with Indigenous Peoples, and the approach to Indigenous cultural heritage, both within the Pilbara region and more broadly across Australia. Culture-based and people-centred approaches are inherent to inclusive sustainable development and Free, Prior, Informed Consent, outcomes encouraged by international and local recommendations on the human rights and cultural heritage preservation of Indigenous peoples. In this paper, we present an interpretive model of an evolved process for mining project development, incorporating culture-based and people-centred approaches, based on the Theory U system change method. The evolved process advocates a change in organisational mindset and culture, and a comprehensive understanding of Indigenous Peoples’ culture and values, as the foundations for increasing their influence and achieving mutually beneficial developments.
Keywords: Indigenous Engagement, mining industry, culture-based approach, people-centred approach, Theory U.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 439375 A Semantic Recommendation Procedure for Electronic Product Catalog
Authors: Hadi Khosravi Farsani, Mohammadali Nematbakhsh
Abstract:
To overcome the product overload of Internet shoppers, we introduce a semantic recommendation procedure which is more efficient when applied to Internet shopping malls. The suggested procedure recommends the semantic products to the customers and is originally based on Web usage mining, product classification, association rule mining, and frequently purchasing. We applied the procedure to the data set of MovieLens Company for performance evaluation, and some experimental results are provided. The experimental results have shown superior performance in terms of coverage and precision.Keywords: Personalization, Recommendation, OWL Ontology, Electronic Catalogs, Classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1925374 Comparative Study of Universities’ Web Structure Mining
Authors: Z. Abdullah, A. R. Hamdan
Abstract:
This paper is meant to analyze the ranking of University of Malaysia Terengganu, UMT’s website in the World Wide Web. There are only few researches have been done on comparing the ranking of universities’ websites so this research will be able to determine whether the existing UMT’s website is serving its purpose which is to introduce UMT to the world. The ranking is based on hub and authority values which are accordance to the structure of the website. These values are computed using two websearching algorithms, HITS and SALSA. Three other universities’ websites are used as the benchmarks which are UM, Harvard and Stanford. The result is clearly showing that more work has to be done on the existing UMT’s website where important pages according to the benchmarks, do not exist in UMT’s pages. The ranking of UMT’s website will act as a guideline for the web-developer to develop a more efficient website.Keywords: Algorithm, ranking, website, web structure mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1667373 A Testbed for the Experiments Performed in Missing Value Treatments
Authors: Dias de J. C. Lilian, Lobato M. F. Fábio, de Santana L. Ádamo
Abstract:
The occurrence of missing values in database is a serious problem for Data Mining tasks, responsible for degrading data quality and accuracy of analyses. In this context, the area has shown a lack of standardization for experiments to treat missing values, introducing difficulties to the evaluation process among different researches due to the absence in the use of common parameters. This paper proposes a testbed intended to facilitate the experiments implementation and provide unbiased parameters using available datasets and suited performance metrics in order to optimize the evaluation and comparison between the state of art missing values treatments.
Keywords: Data imputation, data mining, missing values treatment, testbed.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1513372 Data Mining Techniques in Computer-Aided Diagnosis: Non-Invasive Cancer Detection
Authors: Florin Gorunescu
Abstract:
Diagnosis can be achieved by building a model of a certain organ under surveillance and comparing it with the real time physiological measurements taken from the patient. This paper deals with the presentation of the benefits of using Data Mining techniques in the computer-aided diagnosis (CAD), focusing on the cancer detection, in order to help doctors to make optimal decisions quickly and accurately. In the field of the noninvasive diagnosis techniques, the endoscopic ultrasound elastography (EUSE) is a recent elasticity imaging technique, allowing characterizing the difference between malignant and benign tumors. Digitalizing and summarizing the main EUSE sample movies features in a vector form concern with the use of the exploratory data analysis (EDA). Neural networks are then trained on the corresponding EUSE sample movies vector input in such a way that these intelligent systems are able to offer a very precise and objective diagnosis, discriminating between benign and malignant tumors. A concrete application of these Data Mining techniques illustrates the suitability and the reliability of this methodology in CAD.Keywords: Endoscopic ultrasound elastography, exploratorydata analysis, neural networks, non-invasive cancer detection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1867371 Dimensional Modeling of HIV Data Using Open Source
Authors: Charles D. Otine, Samuel B. Kucel, Lena Trojer
Abstract:
Selecting the data modeling technique for an information system is determined by the objective of the resultant data model. Dimensional modeling is the preferred modeling technique for data destined for data warehouses and data mining, presenting data models that ease analysis and queries which are in contrast with entity relationship modeling. The establishment of data warehouses as components of information system landscapes in many organizations has subsequently led to the development of dimensional modeling. This has been significantly more developed and reported for the commercial database management systems as compared to the open sources thereby making it less affordable for those in resource constrained settings. This paper presents dimensional modeling of HIV patient information using open source modeling tools. It aims to take advantage of the fact that the most affected regions by the HIV virus are also heavily resource constrained (sub-Saharan Africa) whereas having large quantities of HIV data. Two HIV data source systems were studied to identify appropriate dimensions and facts these were then modeled using two open source dimensional modeling tools. Use of open source would reduce the software costs for dimensional modeling and in turn make data warehousing and data mining more feasible even for those in resource constrained settings but with data available.Keywords: About Database, Data Mining, Data warehouse, Dimensional Modeling, Open Source.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1959370 Integrated Method for Detection of Unknown Steganographic Content
Authors: Magdalena Pejas
Abstract:
This article concerns the presentation of an integrated method for detection of steganographic content embedded by new unknown programs. The method is based on data mining and aggregated hypothesis testing. The article contains the theoretical basics used to deploy the proposed detection system and the description of improvement proposed for the basic system idea. Further main results of experiments and implementation details are collected and described. Finally example results of the tests are presented.Keywords: Steganography, steganalysis, data embedding, data mining, feature extraction, knowledge base, system learning, hypothesis testing, error estimation, black box program, file structure.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1564369 Application of Data Mining Techniques for Tourism Knowledge Discovery
Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee
Abstract:
Application of five implementations of three data mining classification techniques was experimented for extracting important insights from tourism data. The aim was to find out the best performing algorithm among the compared ones for tourism knowledge discovery. Knowledge discovery process from data was used as a process model. 10-fold cross validation method is used for testing purpose. Various data preprocessing activities were performed to get the final dataset for model building. Classification models of the selected algorithms were built with different scenarios on the preprocessed dataset. The outperformed algorithm tourism dataset was Random Forest (76%) before applying information gain based attribute selection and J48 (C4.5) (75%) after selection of top relevant attributes to the class (target) attribute. In terms of time for model building, attribute selection improves the efficiency of all algorithms. Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented, which showed intricate, non-trivial knowledge/insight that would otherwise not be discovered by simple statistical analysis with mediocre accuracy of the machine using classification algorithms.
Keywords: Classification algorithms; data mining; tourism; knowledge discovery.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2547368 Application of Computer Aided Engineering Tools in Performance Prediction and Fault Detection of Mechanical Equipment of Mining Process Line
Abstract:
Nowadays, to decrease the number of downtimes in the industries such as metal mining, petroleum and chemical industries, predictive maintenance is crucial. In order to have efficient predictive maintenance, knowing the performance of critical equipment of production line such as pumps and hydro-cyclones under variable operating parameters, selecting best indicators of this equipment health situations, best locations for instrumentation, and also measuring of these indicators are very important. In this paper, computer aided engineering (CAE) tools are implemented to study some important elements of copper process line, namely slurry pumps and cyclone to predict the performance of these components under different working conditions. These modeling and simulations can be used in predicting, for example, the damage tolerance of the main shaft of the slurry pump or wear rate and location of cyclone wall or pump case and impeller. Also, the simulations can suggest best-measuring parameters, measuring intervals, and their locations.Keywords: Computer aided engineering, predictive maintenance, fault detection, mining process line, slurry pump, hydrocyclone.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1834367 A Review and Comparative Analysis on Cluster Ensemble Methods
Authors: S. Sarumathi, P. Ranjetha, C. Saraswathy, M. Vaishnavi, S. Geetha
Abstract:
Clustering is an unsupervised learning technique for aggregating data objects into meaningful classes so that intra cluster similarity is maximized and inter cluster similarity is minimized in data mining. However, no single clustering algorithm proves to be the most effective in producing the best result. As a result, a new challenging technique known as the cluster ensemble approach has blossomed in order to determine the solution to this problem. For the cluster analysis issue, this new technique is a successful approach. The cluster ensemble's main goal is to combine similar clustering solutions in a way that achieves the precision while also improving the quality of individual data clustering. Because of the massive and rapid creation of new approaches in the field of data mining, the ongoing interest in inventing novel algorithms necessitates a thorough examination of current techniques and future innovation. This paper presents a comparative analysis of various cluster ensemble approaches, including their methodologies, formal working process, and standard accuracy and error rates. As a result, the society of clustering practitioners will benefit from this exploratory and clear research, which will aid in determining the most appropriate solution to the problem at hand.
Keywords: Clustering, cluster ensemble methods, consensus function, data mining, unsupervised learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 822366 Improving Classification Accuracy with Discretization on Datasets Including Continuous Valued Features
Authors: Mehmet Hacibeyoglu, Ahmet Arslan, Sirzat Kahramanli
Abstract:
This study analyzes the effect of discretization on classification of datasets including continuous valued features. Six datasets from UCI which containing continuous valued features are discretized with entropy-based discretization method. The performance improvement between the dataset with original features and the dataset with discretized features is compared with k-nearest neighbors, Naive Bayes, C4.5 and CN2 data mining classification algorithms. As the result the classification accuracies of the six datasets are improved averagely by 1.71% to 12.31%.Keywords: Data mining classification algorithms, entropy-baseddiscretization method
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2461365 A Comparative Study of GTC and PSP Algorithms for Mining Sequential Patterns Embedded in Database with Time Constraints
Authors: Safa Adi
Abstract:
This paper will consider the problem of sequential mining patterns embedded in a database by handling the time constraints as defined in the GSP algorithm (level wise algorithms). We will compare two previous approaches GTC and PSP, that resumes the general principles of GSP. Furthermore this paper will discuss PG-hybrid algorithm, that using PSP and GTC. The results show that PSP and GTC are more efficient than GSP. On the other hand, the GTC algorithm performs better than PSP. The PG-hybrid algorithm use PSP algorithm for the two first passes on the database, and GTC approach for the following scans. Experiments show that the hybrid approach is very efficient for short, frequent sequences.Keywords: Database, GTC algorithm, PSP algorithm, sequential patterns, time constraints.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 699364 Response of the Residential Building Structureon Load Technical Seismicity due to Mining Activities
Authors: V. Salajka, Z. Kaláb, J. Kala, P. Hradil
Abstract:
In the territories where high-intensity earthquakes are frequent is paid attention to the solving of the seismic problems. In the paper are described two computational model variants based on finite element method of the construction with different subsoil simulation (rigid or elastic subsoil) is used. For simulation and calculations program system based on method final elements ANSYS was used. Seismic responses calculations of residential building structure were effected on loading characterized by accelerogram for comparing with the responses spectra method.Keywords: Accelerogram, ANSYS, mining induced seismic, residential building structure, spectra, subsoil.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1538363 Application of Granular Computing Paradigm in Knowledge Induction
Authors: Iftikhar U. Sikder
Abstract:
This paper illustrates an application of granular computing approach, namely rough set theory in data mining. The paper outlines the formalism of granular computing and elucidates the mathematical underpinning of rough set theory, which has been widely used by the data mining and the machine learning community. A real-world application is illustrated, and the classification performance is compared with other contending machine learning algorithms. The predictive performance of the rough set rule induction model shows comparative success with respect to other contending algorithms.
Keywords: Concept approximation, granular computing, reducts, rough set theory, rule induction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 834362 A Decision Support System for Predicting Hospitalization of Hemodialysis Patients
Authors: Jinn-Yi Yeh, Tai-Hsi Wu
Abstract:
Hemodialysis patients might suffer from unhealthy care behaviors or long-term dialysis treatments. Ultimately they need to be hospitalized. If the hospitalization rate of a hemodialysis center is high, its quality of service would be low. Therefore, how to decrease hospitalization rate is a crucial problem for health care. In this study we combined temporal abstraction with data mining techniques for analyzing the dialysis patients' biochemical data to develop a decision support system. The mined temporal patterns are helpful for clinicians to predict hospitalization of hemodialysis patients and to suggest them some treatments immediately to avoid hospitalization.Keywords: Hemodialysis, Temporal abstract, Data mining, Healthcare quality.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1730361 A General Framework for Knowledge Discovery Using High Performance Machine Learning Algorithms
Authors: S. Nandagopalan, N. Pradeep
Abstract:
The aim of this paper is to propose a general framework for storing, analyzing, and extracting knowledge from two-dimensional echocardiographic images, color Doppler images, non-medical images, and general data sets. A number of high performance data mining algorithms have been used to carry out this task. Our framework encompasses four layers namely physical storage, object identification, knowledge discovery, user level. Techniques such as active contour model to identify the cardiac chambers, pixel classification to segment the color Doppler echo image, universal model for image retrieval, Bayesian method for classification, parallel algorithms for image segmentation, etc., were employed. Using the feature vector database that have been efficiently constructed, one can perform various data mining tasks like clustering, classification, etc. with efficient algorithms along with image mining given a query image. All these facilities are included in the framework that is supported by state-of-the-art user interface (UI). The algorithms were tested with actual patient data and Coral image database and the results show that their performance is better than the results reported already.Keywords: Active Contour, Bayesian, Echocardiographic image, Feature vector.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1714360 Field Trial of Resin-Based Composite Materials for the Treatment of Surface Collapses Associated with Former Shallow Coal Mining
Authors: Philip T. Broughton, Mark P. Bettney, Isla L. Smail
Abstract:
Effective treatment of ground instability is essential when managing the impacts associated with historic mining. A field trial was undertaken by the Coal Authority to investigate the geotechnical performance and potential use of composite materials comprising resin and fill or stone to safely treat surface collapses, such as crown-holes, associated with shallow mining. Test pits were loosely filled with various granular fill materials. The fill material was injected with commercially available silicate and polyurethane resin foam products. In situ and laboratory testing was undertaken to assess the geotechnical properties of the resultant composite materials. The test pits were subsequently excavated to assess resin permeation. Drilling and resin injection was easiest through clean limestone fill materials. Recycled building waste fill material proved difficult to inject with resin; this material is thus considered unsuitable for use in resin composites. Incomplete resin permeation in several of the test pits created irregular ‘blocks’ of composite. Injected resin foams significantly improve the stiffness and resistance (strength) of the un-compacted fill material. The stiffness of the treated fill material appears to be a function of the stone particle size, its associated compaction characteristics (under loose tipping) and the proportion of resin foam matrix. The type of fill material is more critical than the type of resin to the geotechnical properties of the composite materials. Resin composites can effectively support typical design imposed loads. Compared to other traditional treatment options, such as cement grouting, the use of resin composites is potentially less disruptive, particularly for sites with limited access, and thus likely to achieve significant reinstatement cost savings. The use of resin composites is considered a suitable option for the future treatment of shallow mining collapses.
Keywords: Composite material, ground improvement, mining legacy, resin.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1541359 An Innovation of Travel Information Gathering Framework
Authors: Pairaya J., Buddhagarn R., Sukree S., Punthumadee K.
Abstract:
Application of Information Technology (IT) has revolutionized the functioning of business all over the world. Its impact has been felt mostly among the information of dependent industries. Tourism is one of such industry. The conceptual framework in this study represents an innovation of travel information searching system on mobile devices which is used as tools to deliver travel information (such as hotels, restaurants, tourist attractions and souvenir shops) for each user by travelers segmentation based on data mining technique to segment the tourists- behavior patterns then match them with tourism products and services. This system innovation is designed to be a knowledge incremental learning. It is a marketing strategy to support business to respond traveler-s demand effectively.Keywords: Tourism, Innovation, Information Searching, Data Mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1870358 Opinion Mining Framework in the Education Domain
Authors: A. M. H. Elyasir, K. S. M. Anbananthen
Abstract:
The internet is growing larger and becoming the most popular platform for the people to share their opinion in different interests. We choose the education domain specifically comparing some Malaysian universities against each other. This comparison produces benchmark based on different criteria shared by the online users in various online resources including Twitter, Facebook and web pages. The comparison is accomplished using opinion mining framework to extract, process the unstructured text and classify the result to positive, negative or neutral (polarity). Hence, we divide our framework to three main stages; opinion collection (extraction), unstructured text processing and polarity classification. The extraction stage includes web crawling, HTML parsing, Sentence segmentation for punctuation classification, Part of Speech (POS) tagging, the second stage processes the unstructured text with stemming and stop words removal and finally prepare the raw text for classification using Named Entity Recognition (NER). Last phase is to classify the polarity and present overall result for the comparison among the Malaysian universities. The final result is useful for those who are interested to study in Malaysia, in which our final output declares clear winners based on the public opinions all over the web.
Keywords: Entity Recognition, Education Domain, Opinion Mining, Unstructured Text.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2965357 Secure Multiparty Computations for Privacy Preserving Classifiers
Authors: M. Sumana, K. S. Hareesha
Abstract:
Secure computations are essential while performing privacy preserving data mining. Distributed privacy preserving data mining involve two to more sites that cannot pool in their data to a third party due to the violation of law regarding the individual. Hence in order to model the private data without compromising privacy and information loss, secure multiparty computations are used. Secure computations of product, mean, variance, dot product, sigmoid function using the additive and multiplicative homomorphic property is discussed. The computations are performed on vertically partitioned data with a single site holding the class value.Keywords: Homomorphic property, secure product, secure mean and variance, secure dot product, vertically partitioned data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 920356 A Note on Metallurgy at Khanak: An Indus Site in Tosham Mining Area, Haryana
Authors: Ravindra N. Singh, Dheerendra P. Singh
Abstract:
Recent discoveries of Bronze Age artefacts, tin slag, furnaces and crucibles, together with new geological evidence on tin deposits in Tosham area of Bhiwani district in Haryana (India) provide the opportunity to survey the evidence for possible sources of tin and the use of bronze in the Harappan sites of north western India. Earlier, Afghanistan emerged as the most promising eastern source of tin utilized by Indus Civilization copper-smiths. Our excavations conducted at Khanak near Tosham mining area during 2014 and 2016 revealed ample evidence of metallurgical activities as attested by the occurrence of slag, ores and evidences of ashes and fragments of furnaces in addition to the bronze objects. We have conducted petrological, XRD, EDAX, TEM, SEM and metallography on the slag, ores, crucible fragments and bronze objects samples recovered from Khanak excavations. This has given positive indication of mining and metallurgy of poly-mettalic Tin at the site; however, it can only be ascertained after the detailed scientific examination of the materials which is underway. In view of the importance of site, we intend to excavate the site horizontally in future so as to obtain more samples for scientific studies.
Keywords: Archaeometallurgy, problem of tin, metallography, Indus civilization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2013355 Developing Structured Sizing Systems for Manufacturing Ready-Made Garments of Indian Females Using Decision Tree-Based Data Mining
Authors: Hina Kausher, Sangita Srivastava
Abstract:
In India, there is a lack of standard, systematic sizing approach for producing readymade garments. Garments manufacturing companies use their own created size tables by modifying international sizing charts of ready-made garments. The purpose of this study is to tabulate the anthropometric data which cover the variety of figure proportions in both height and girth. 3,000 data have been collected by an anthropometric survey undertaken over females between the ages of 16 to 80 years from the some states of India to produce the sizing system suitable for clothing manufacture and retailing. The data are used for the statistical analysis of body measurements, the formulation of sizing systems and body measurements tables. Factor analysis technique is used to filter the control body dimensions from the large number of variables. Decision tree-based data mining is used to cluster the data. The standard and structured sizing system can facilitate pattern grading and garment production. Moreover, it can exceed buying ratios and upgrade size allocations to retail segments.Keywords: Anthropometric data, data mining, decision tree, garments manufacturing, ready-made garments, sizing systems.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 963354 Using the Combined Model of PROMETHEE and Fuzzy Analytic Network Process for Determining Question Weights in Scientific Exams through Data Mining Approach
Authors: Hassan Haleh, Amin Ghaffari, Parisa Farahpour
Abstract:
Need for an appropriate system of evaluating students- educational developments is a key problem to achieve the predefined educational goals. Intensity of the related papers in the last years; that tries to proof or disproof the necessity and adequacy of the students assessment; is the corroborator of this matter. Some of these studies tried to increase the precision of determining question weights in scientific examinations. But in all of them there has been an attempt to adjust the initial question weights while the accuracy and precision of those initial question weights are still under question. Thus In order to increase the precision of the assessment process of students- educational development, the present study tries to propose a new method for determining the initial question weights by considering the factors of questions like: difficulty, importance and complexity; and implementing a combined method of PROMETHEE and fuzzy analytic network process using a data mining approach to improve the model-s inputs. The result of the implemented case study proves the development of performance and precision of the proposed model.Keywords: Assessing students, Analytic network process, Clustering, Data mining, Fuzzy sets, Multi-criteria decision making, and Preference function.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1582353 BIDENS: Iterative Density Based Biclustering Algorithm With Application to Gene Expression Analysis
Authors: Mohamed A. Mahfouz, M. A. Ismail
Abstract:
Biclustering is a very useful data mining technique for identifying patterns where different genes are co-related based on a subset of conditions in gene expression analysis. Association rules mining is an efficient approach to achieve biclustering as in BIMODULE algorithm but it is sensitive to the value given to its input parameters and the discretization procedure used in the preprocessing step, also when noise is present, classical association rules miners discover multiple small fragments of the true bicluster, but miss the true bicluster itself. This paper formally presents a generalized noise tolerant bicluster model, termed as μBicluster. An iterative algorithm termed as BIDENS based on the proposed model is introduced that can discover a set of k possibly overlapping biclusters simultaneously. Our model uses a more flexible method to partition the dimensions to preserve meaningful and significant biclusters. The proposed algorithm allows discovering biclusters that hard to be discovered by BIMODULE. Experimental study on yeast, human gene expression data and several artificial datasets shows that our algorithm offers substantial improvements over several previously proposed biclustering algorithms.Keywords: Machine learning, biclustering, bi-dimensional clustering, gene expression analysis, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1964352 Towards Clustering of Web-based Document Structures
Authors: Matthias Dehmer, Frank Emmert Streib, Jürgen Kilian, Andreas Zulauf
Abstract:
Methods for organizing web data into groups in order to analyze web-based hypertext data and facilitate data availability are very important in terms of the number of documents available online. Thereby, the task of clustering web-based document structures has many applications, e.g., improving information retrieval on the web, better understanding of user navigation behavior, improving web users requests servicing, and increasing web information accessibility. In this paper we investigate a new approach for clustering web-based hypertexts on the basis of their graph structures. The hypertexts will be represented as so called generalized trees which are more general than usual directed rooted trees, e.g., DOM-Trees. As a important preprocessing step we measure the structural similarity between the generalized trees on the basis of a similarity measure d. Then, we apply agglomerative clustering to the obtained similarity matrix in order to create clusters of hypertext graph patterns representing navigation structures. In the present paper we will run our approach on a data set of hypertext structures and obtain good results in Web Structure Mining. Furthermore we outline the application of our approach in Web Usage Mining as future work.Keywords: Clustering methods, graph-based patterns, graph similarity, hypertext structures, web structure mining
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1507351 Using Data Mining Techniques for Estimating Minimum, Maximum and Average Daily Temperature Values
Authors: S. Kotsiantis, A. Kostoulas, S. Lykoudis, A. Argiriou, K. Menagias
Abstract:
Estimates of temperature values at a specific time of day, from daytime and daily profiles, are needed for a number of environmental, ecological, agricultural and technical applications, ranging from natural hazards assessments, crop growth forecasting to design of solar energy systems. The scope of this research is to investigate the efficiency of data mining techniques in estimating minimum, maximum and mean temperature values. For this reason, a number of experiments have been conducted with well-known regression algorithms using temperature data from the city of Patras in Greece. The performance of these algorithms has been evaluated using standard statistical indicators, such as Correlation Coefficient, Root Mean Squared Error, etc.
Keywords: regression algorithms, supervised machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3418350 Mining Genes Relations in Microarray Data Combined with Ontology in Colon Cancer Automated Diagnosis System
Authors: A. Gruzdz, A. Ihnatowicz, J. Siddiqi, B. Akhgar
Abstract:
MATCH project [1] entitle the development of an automatic diagnosis system that aims to support treatment of colon cancer diseases by discovering mutations that occurs to tumour suppressor genes (TSGs) and contributes to the development of cancerous tumours. The constitution of the system is based on a) colon cancer clinical data and b) biological information that will be derived by data mining techniques from genomic and proteomic sources The core mining module will consist of the popular, well tested hybrid feature extraction methods, and new combined algorithms, designed especially for the project. Elements of rough sets, evolutionary computing, cluster analysis, self-organization maps and association rules will be used to discover the annotations between genes, and their influence on tumours [2]-[11]. The methods used to process the data have to address their high complexity, potential inconsistency and problems of dealing with the missing values. They must integrate all the useful information necessary to solve the expert's question. For this purpose, the system has to learn from data, or be able to interactively specify by a domain specialist, the part of the knowledge structure it needs to answer a given query. The program should also take into account the importance/rank of the particular parts of data it analyses, and adjusts the used algorithms accordingly.Keywords: Bioinformatics, gene expression, ontology, selforganizingmaps.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1974349 Generating Concept Trees from Dynamic Self-organizing Map
Authors: Norashikin Ahmad, Damminda Alahakoon
Abstract:
Self-organizing map (SOM) provides both clustering and visualization capabilities in mining data. Dynamic self-organizing maps such as Growing Self-organizing Map (GSOM) has been developed to overcome the problem of fixed structure in SOM to enable better representation of the discovered patterns. However, in mining large datasets or historical data the hierarchical structure of the data is also useful to view the cluster formation at different levels of abstraction. In this paper, we present a technique to generate concept trees from the GSOM. The formation of tree from different spread factor values of GSOM is also investigated and the quality of the trees analyzed. The results show that concept trees can be generated from GSOM, thus, eliminating the need for re-clustering of the data from scratch to obtain a hierarchical view of the data under study.
Keywords: dynamic self-organizing map, concept formation, clustering.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1460