Search results for: mining tailings
102 On the Efficient Implementation of a Serial and Parallel Decomposition Algorithm for Fast Support Vector Machine Training Including a Multi-Parameter Kernel
Authors: Tatjana Eitrich, Bruno Lang
Abstract:
This work deals with aspects of support vector machine learning for large-scale data mining tasks. Based on a decomposition algorithm for support vector machine training that can be run in serial as well as shared memory parallel mode we introduce a transformation of the training data that allows for the usage of an expensive generalized kernel without additional costs. We present experiments for the Gaussian kernel, but usage of other kernel functions is possible, too. In order to further speed up the decomposition algorithm we analyze the critical problem of working set selection for large training data sets. In addition, we analyze the influence of the working set sizes onto the scalability of the parallel decomposition scheme. Our tests and conclusions led to several modifications of the algorithm and the improvement of overall support vector machine learning performance. Our method allows for using extensive parameter search methods to optimize classification accuracy.
Keywords: Support Vector Machine Training, Multi-ParameterKernels, Shared Memory Parallel Computing, Large Data
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1443101 Product Features Extraction from Opinions According to Time
Authors: Kamal Amarouche, Houda Benbrahim, Ismail Kassou
Abstract:
Nowadays, e-commerce shopping websites have experienced noticeable growth. These websites have gained consumers’ trust. After purchasing a product, many consumers share comments where opinions are usually embedded about the given product. Research on the automatic management of opinions that gives suggestions to potential consumers and portrays an image of the product to manufactures has been growing recently. After launching the product in the market, the reviews generated around it do not usually contain helpful information or generic opinions about this product (e.g. telephone: great phone...); in the sense that the product is still in the launching phase in the market. Within time, the product becomes old. Therefore, consumers perceive the advantages/ disadvantages about each specific product feature. Therefore, they will generate comments that contain their sentiments about these features. In this paper, we present an unsupervised method to extract different product features hidden in the opinions which influence its purchase, and that combines Time Weighting (TW) which depends on the time opinions were expressed with Term Frequency-Inverse Document Frequency (TF-IDF). We conduct several experiments using two different datasets about cell phones and hotels. The results show the effectiveness of our automatic feature extraction, as well as its domain independent characteristic.
Keywords: Opinion mining, product feature extraction, sentiment analysis, SentiWordNet.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1300100 Cost Sensitive Feature Selection in Decision-Theoretic Rough Set Models for Customer Churn Prediction: The Case of Telecommunication Sector Customers
Authors: Emel Kızılkaya Aydogan, Mihrimah Ozmen, Yılmaz Delice
Abstract:
In recent days, there is a change and the ongoing development of the telecommunications sector in the global market. In this sector, churn analysis techniques are commonly used for analysing why some customers terminate their service subscriptions prematurely. In addition, customer churn is utmost significant in this sector since it causes to important business loss. Many companies make various researches in order to prevent losses while increasing customer loyalty. Although a large quantity of accumulated data is available in this sector, their usefulness is limited by data quality and relevance. In this paper, a cost-sensitive feature selection framework is developed aiming to obtain the feature reducts to predict customer churn. The framework is a cost based optional pre-processing stage to remove redundant features for churn management. In addition, this cost-based feature selection algorithm is applied in a telecommunication company in Turkey and the results obtained with this algorithm.
Keywords: Churn prediction, data mining, decision-theoretic rough set, feature selection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 176399 Artificial Intelligence Techniques Applications for Power Disturbances Classification
Authors: K.Manimala, Dr.K.Selvi, R.Ahila
Abstract:
Artificial Intelligence (AI) methods are increasingly being used for problem solving. This paper concerns using AI-type learning machines for power quality problem, which is a problem of general interest to power system to provide quality power to all appliances. Electrical power of good quality is essential for proper operation of electronic equipments such as computers and PLCs. Malfunction of such equipment may lead to loss of production or disruption of critical services resulting in huge financial and other losses. It is therefore necessary that critical loads be supplied with electricity of acceptable quality. Recognition of the presence of any disturbance and classifying any existing disturbance into a particular type is the first step in combating the problem. In this work two classes of AI methods for Power quality data mining are studied: Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs). We show that SVMs are superior to ANNs in two critical respects: SVMs train and run an order of magnitude faster; and SVMs give higher classification accuracy.
Keywords: back propagation network, power quality, probabilistic neural network, radial basis function support vector machine
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 155698 Phytoremediation Potential of Native Plants Growing on a Heavy Metals Contaminated Soil of Copper mine in Iran
Authors: B. Lorestani, M. Cheraghi, N. Yousefi
Abstract:
A research project dealing with the phytoremediation of a soil polluted by some heavy metals is currently running. The case study is represented by a mining area in Hamedan province in the central west part of Iran. The potential of phytoextraction and phytostabilization of plants was evaluated considering the concentration of heavy metals in the plant tissues and also the bioconcentration factor (BCF) and the translocation factor (TF). Also the several established criteria were applied to define hyperaccumulator plants in the studied area. Results showed that none of the collected plant species were suitable for phytoextraction of Cu, Zn, Fe and Mn, but among the plants, Euphorbia macroclada was the most efficient in phytostabilization of Cu and Fe, while, Ziziphora clinopodioides, Cousinia sp. and Chenopodium botrys were the most suitable for phytostabilization of Zn and Chondrila juncea and Stipa barbata had the potential for phytostabilization of Mn. Using the most common criterion, Euphorbia macroclada and Verbascum speciosum were Fe hyperaccumulator plants. Present study showed that native plant species growing on contaminated sites may have the potential for phytoremediation.Keywords: Bioconcentration factor, Heavy metals, Hyperaccumulator, Phytoremediation, Translocation factor
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 395197 A Novel In-Place Sorting Algorithm with O(n log z) Comparisons and O(n log z) Moves
Authors: Hanan Ahmed-Hosni Mahmoud, Nadia Al-Ghreimil
Abstract:
In-place sorting algorithms play an important role in many fields such as very large database systems, data warehouses, data mining, etc. Such algorithms maximize the size of data that can be processed in main memory without input/output operations. In this paper, a novel in-place sorting algorithm is presented. The algorithm comprises two phases; rearranging the input unsorted array in place, resulting segments that are ordered relative to each other but whose elements are yet to be sorted. The first phase requires linear time, while, in the second phase, elements of each segment are sorted inplace in the order of z log (z), where z is the size of the segment, and O(1) auxiliary storage. The algorithm performs, in the worst case, for an array of size n, an O(n log z) element comparisons and O(n log z) element moves. Further, no auxiliary arithmetic operations with indices are required. Besides these theoretical achievements of this algorithm, it is of practical interest, because of its simplicity. Experimental results also show that it outperforms other in-place sorting algorithms. Finally, the analysis of time and space complexity, and required number of moves are presented, along with the auxiliary storage requirements of the proposed algorithm.
Keywords: Auxiliary storage sorting, in-place sorting, sorting.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 190996 Unsupervised Outlier Detection in Streaming Data Using Weighted Clustering
Authors: Yogita, Durga Toshniwal
Abstract:
Outlier detection in streaming data is very challenging because streaming data cannot be scanned multiple times and also new concepts may keep evolving. Irrelevant attributes can be termed as noisy attributes and such attributes further magnify the challenge of working with data streams. In this paper, we propose an unsupervised outlier detection scheme for streaming data. This scheme is based on clustering as clustering is an unsupervised data mining task and it does not require labeled data, both density based and partitioning clustering are combined for outlier detection. In this scheme partitioning clustering is also used to assign weights to attributes depending upon their respective relevance and weights are adaptive. Weighted attributes are helpful to reduce or remove the effect of noisy attributes. Keeping in view the challenges of streaming data, the proposed scheme is incremental and adaptive to concept evolution. Experimental results on synthetic and real world data sets show that our proposed approach outperforms other existing approach (CORM) in terms of outlier detection rate, false alarm rate, and increasing percentages of outliers.
Keywords: Concept Evolution, Irrelevant Attributes, Streaming Data, Unsupervised Outlier Detection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 263795 The Implementation of the Multi-Agent Classification System (MACS) in Compliance with FIPA Specifications
Authors: Mohamed R. Mhereeg
Abstract:
The paper discusses the implementation of the MultiAgent classification System (MACS) and utilizing it to provide an automated and accurate classification of end users developing applications in the spreadsheet domain. However, different technologies have been brought together to build MACS. The strength of the system is the integration of the agent technology with the FIPA specifications together with other technologies, which are the .NET widows service based agents, the Windows Communication Foundation (WCF) services, the Service Oriented Architecture (SOA), and Oracle Data Mining (ODM). The Microsoft's .NET widows service based agents were utilized to develop the monitoring agents of MACS, the .NET WCF services together with SOA approach allowed the distribution and communication between agents over the WWW. The Monitoring Agents (MAs) were configured to execute automatically to monitor excel spreadsheets development activities by content. Data gathered by the Monitoring Agents from various resources over a period of time was collected and filtered by a Database Updater Agent (DUA) residing in the .NET client application of the system. This agent then transfers and stores the data in Oracle server database via Oracle stored procedures for further processing that leads to the classification of the end user developers.
Keywords: MACS, Implementation, Multi-Agent, SOA, Autonomous, WCF.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 170994 Thai Perception on Litecoin Value
Authors: Toby Gibbs, Suwaree Yordchim
Abstract:
This research analyzes factors affecting the success of Litecoin Value within Thailand and develops a guideline for selfreliance for effective business implementation. Samples in this study included 119 people through surveys. The results revealed four main factors affecting the success as follows: 1) Future Career training should be pursued in applied Litecoin development. 2) Didn't grasp the concept of a digital currency or see the benefit of a digital currency. 3) There is a great need to educate the next generation of learners on the benefits of Litecoin within the community. 4) A great majority didn't know what Litecoin was. The guideline for self-reliance planning consisted of 4 aspects: 1) Development planning: by arranging meet up groups to conduct further education on Litecoin and share solutions on adoption into every day usage. Local communities need to develop awareness of the usefulness of Litecoin and share the value of Litecoin among friends and family. 2) Computer Science and Business Management staff should develop skills to expand on the benefits of Litecoin within their departments. 3) Further research should be pursued on how Litecoin Value can improve business and tourism within Thailand. 4) Local communities should focus on developing Litecoin awareness by encouraging street vendors to accept Litecoin as another form of payment for services rendered.
Keywords: Litecoin, Mining, Confirmations.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 279393 Evaluation of Graph-based Analysis for Forest Fire Detections
Authors: Young Gi Byun, Yong Huh, Kiyun Yu, Yong Il Kim
Abstract:
Spatial outliers in remotely sensed imageries represent observed quantities showing unusual values compared to their neighbor pixel values. There have been various methods to detect the spatial outliers based on spatial autocorrelations in statistics and data mining. These methods may be applied in detecting forest fire pixels in the MODIS imageries from NASA-s AQUA satellite. This is because the forest fire detection can be referred to as finding spatial outliers using spatial variation of brightness temperature. This point is what distinguishes our approach from the traditional fire detection methods. In this paper, we propose a graph-based forest fire detection algorithm which is based on spatial outlier detection methods, and test the proposed algorithm to evaluate its applicability. For this the ordinary scatter plot and Moran-s scatter plot were used. In order to evaluate the proposed algorithm, the results were compared with the MODIS fire product provided by the NASA MODIS Science Team, which showed the possibility of the proposed algorithm in detecting the fire pixels.Keywords: Spatial Outlier Detection, MODIS, Forest Fire
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 222692 Plant Varieties Selection System
Authors: Kitti Koonsanit, Chuleerat Jaruskulchai, Poonsak Miphokasap, Apisit Eiumnoh
Abstract:
In the end of the day, meteorological data and environmental data becomes widely used such as plant varieties selection system. Variety plant selection for planted area is of almost importance for all crops, including varieties of sugarcane. Since sugarcane have many varieties. Variety plant non selection for planting may not be adapted to the climate or soil conditions for planted area. Poor growth, bloom drop, poor fruit, and low price are to be from varieties which were not recommended for those planted area. This paper presents plant varieties selection system for planted areas in Thailand from meteorological data and environmental data by the use of decision tree techniques. With this software developed as an environmental data analysis tool, it can analyze resulting easier and faster. Our software is a front end of WEKA that provides fundamental data mining functions such as classify, clustering, and analysis functions. It also supports pre-processing, analysis, and decision tree output with exporting result. After that, our software can export and display data result to Google maps API in order to display result and plot plant icons effectively.
Keywords: Plant varieties selection system, decision tree, expert recommendation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 179391 Managing the Baltic Sea Region Resilience: Prevention, Treatment Actions and Circular Economy
Authors: J. Burlakovs, Y. Jani, L. Grinberga, M. Kriipsalu, O. Anne, I. Grinfelde, W. Hogland
Abstract:
The worldwide future sustainable economies are oriented towards the sea: the maritime economy is becoming one of the strongest driving forces in many regions as population growth is the highest in coastal areas. For hundreds of years sea resources were depleted unsustainably by fishing, mining, transportation, tourism, and waste. European Sustainable Development Strategy is identifying and developing actions to enable the EU to achieve a continuous, long-term improvement of the quality of life through the creation of sustainable communities. The aim of this paper is to provide insight in Baltic Sea Region case studies on implemented actions on tourism industry waste and beach wrack management in coastal areas, hazardous contaminants and plastic flow treatment from waste, wastewaters and stormwaters. These projects mentioned in study promote successful prevention of contaminant flows to the sea environments and provide perspectives for creation of valuable new products from residuals for future circular economy are the step forward to green innovation winning streak.
Keywords: Resilience, hazardous waste, phytoremediation, water management, circular economy.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 90390 A Recognition Method for Spatio-Temporal Background in Korean Historical Novels
Authors: Seo-Hee Kim, Kee-Won Kim, Seung-Hoon Kim
Abstract:
The most important elements of a novel are the characters, events and background. The background represents the time, place and situation that character appears, and conveys event and atmosphere more realistically. If readers have the proper knowledge about background of novels, it may be helpful for understanding the atmosphere of a novel and choosing a novel that readers want to read. In this paper, we are targeting Korean historical novels because spatio-temporal background especially performs an important role in historical novels among the genre of Korean novels. To the best of our knowledge, we could not find previous study that was aimed at Korean novels. In this paper, we build a Korean historical national dictionary. Our dictionary has historical places and temple names of kings over many generations as well as currently existing spatial words or temporal words in Korean history. We also present a method for recognizing spatio-temporal background based on patterns of phrasal words in Korean sentences. Our rules utilize postposition for spatial background recognition and temple names for temporal background recognition. The knowledge of the recognized background can help readers to understand the flow of events and atmosphere, and can use to visualize the elements of novels.
Keywords: Data mining, Korean historical novels, Korean linguistic feature, spatio-temporal background.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 112389 The Application of Distributed Optical Strain Sensing to Measure Rock Bolt Deformation Subject to Bedding Shear
Authors: Thomas P. Roper, Brad Forbes, Jurij Karlovšek
Abstract:
Shear displacement along bedding defects is a well-recognised behaviour when tunnelling and mining in stratified rock. This deformation can affect the durability and integrity of installed rock bolts. In-situ monitoring of rock bolt deformation under bedding shear cannot be accurately derived from traditional strain gauge bolts as sensors are too large and spaced too far apart to accurately assess concentrated displacement along discrete defects. A possible solution to this is the use of fiber optic technologies developed for precision monitoring. Distributed Optic Sensor (DOS) embedded rock bolts were installed in a tunnel project with the aim of measuring the bolt deformation profile under significant shear displacements. This technology successfully measured the 3D strain distribution along the bolts when subjected to bedding shear and resolved the axial and lateral strain constituents in order to determine the deformational geometry of the bolts. The results are compared well with the current visual method for monitoring shear displacement using borescope holes, considering this method as suitable.
Keywords: Distributed optical strain sensing, geotechnical monitoring, rock bolt stain measurement, bedding shear displacement.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 93388 Indigenous Engagement: Towards a Culturally Sensitive Approach for Inclusive Economic Development
Authors: K. N. Penna, E. J. Hoffman, T. R. Carter
Abstract:
This paper suggests that cultural landscape management plans in an Indigenous context are more effective if designed by taking into consideration context-related social and cultural aspects, adopting people-centred and cultural-based approaches for instance. In relation to working in Indigenous and mining contexts, we draw upon and contribute to international policies on human rights that promote the development of management plans that are co-designed through genuine engagement processes. We suggest that the production of management plans that are built upon culturally relevant frameworks leads to more inclusive economic development, a greater sense of trust, and shared managerial responsibilities. In this paper, three issues related to Indigenous engagement and cultural landscape management plans will be addressed: (1) the need for effective communication channels between proponents and Traditional Owners (Australian original Aboriginal peoples who inhabited specific regions), (2) the use of a culturally sensitive approach to engage local representatives in the decision-making processes, and (3) how design of new management plans can help in establishing shared management.
Keywords: Culture-Centred Approach, Holons’ Hierarchy, Inclusive Economic Development, Indigenous Engagement.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 47887 Optimizing Spatial Trend Detection By Artificial Immune Systems
Authors: M. Derakhshanfar, B. Minaei-Bidgoli
Abstract:
Spatial trends are one of the valuable patterns in geo databases. They play an important role in data analysis and knowledge discovery from spatial data. A spatial trend is a regular change of one or more non spatial attributes when spatially moving away from a start object. Spatial trend detection is a graph search problem therefore heuristic methods can be good solution. Artificial immune system (AIS) is a special method for searching and optimizing. AIS is a novel evolutionary paradigm inspired by the biological immune system. The models based on immune system principles, such as the clonal selection theory, the immune network model or the negative selection algorithm, have been finding increasing applications in fields of science and engineering. In this paper, we develop a novel immunological algorithm based on clonal selection algorithm (CSA) for spatial trend detection. We are created neighborhood graph and neighborhood path, then select spatial trends that their affinity is high for antibody. In an evolutionary process with artificial immune algorithm, affinity of low trends is increased with mutation until stop condition is satisfied.Keywords: Spatial Data Mining, Spatial Trend Detection, Heuristic Methods, Artificial Immune System, Clonal Selection Algorithm (CSA)
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 204686 Classifying Biomedical Text Abstracts based on Hierarchical 'Concept' Structure
Authors: Rozilawati Binti Dollah, Masaki Aono
Abstract:
Classifying biomedical literature is a difficult and challenging task, especially when a large number of biomedical articles should be organized into a hierarchical structure. In this paper, we present an approach for classifying a collection of biomedical text abstracts downloaded from Medline database with the help of ontology alignment. To accomplish our goal, we construct two types of hierarchies, the OHSUMED disease hierarchy and the Medline abstract disease hierarchies from the OHSUMED dataset and the Medline abstracts, respectively. Then, we enrich the OHSUMED disease hierarchy before adapting it to ontology alignment process for finding probable concepts or categories. Subsequently, we compute the cosine similarity between the vector in probable concepts (in the “enriched" OHSUMED disease hierarchy) and the vector in Medline abstract disease hierarchies. Finally, we assign category to the new Medline abstracts based on the similarity score. The results obtained from the experiments show the performance of our proposed approach for hierarchical classification is slightly better than the performance of the multi-class flat classification.Keywords: Biomedical literature, hierarchical text classification, ontology alignment, text mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 201185 Parkinsons Disease Classification using Neural Network and Feature Selection
Authors: Anchana Khemphila, Veera Boonjing
Abstract:
In this study, the Multi-Layer Perceptron (MLP)with Back-Propagation learning algorithm are used to classify to effective diagnosis Parkinsons disease(PD).It-s a challenging problem for medical community.Typically characterized by tremor, PD occurs due to the loss of dopamine in the brains thalamic region that results in involuntary or oscillatory movement in the body. A feature selection algorithm along with biomedical test values to diagnose Parkinson disease.Clinical diagnosis is done mostly by doctor-s expertise and experience.But still cases are reported of wrong diagnosis and treatment. Patients are asked to take number of tests for diagnosis.In many cases,not all the tests contribute towards effective diagnosis of a disease.Our work is to classify the presence of Parkinson disease with reduced number of attributes.Original,22 attributes are involved in classify.We use Information Gain to determine the attributes which reduced the number of attributes which is need to be taken from patients.The Artificial neural networks is used to classify the diagnosis of patients.Twenty-Two attributes are reduced to sixteen attributes.The accuracy is in training data set is 82.051% and in the validation data set is 83.333%.
Keywords: Data mining, classification, Parkinson disease, artificial neural networks, feature selection, information gain.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 377884 A New DIDS Design Based on a Combination Feature Selection Approach
Authors: Adel Sabry Eesa, Adnan Mohsin Abdulazeez Brifcani, Zeynep Orman
Abstract:
Feature selection has been used in many fields such as classification, data mining and object recognition and proven to be effective for removing irrelevant and redundant features from the original dataset. In this paper, a new design of distributed intrusion detection system using a combination feature selection model based on bees and decision tree. Bees algorithm is used as the search strategy to find the optimal subset of features, whereas decision tree is used as a judgment for the selected features. Both the produced features and the generated rules are used by Decision Making Mobile Agent to decide whether there is an attack or not in the networks. Decision Making Mobile Agent will migrate through the networks, moving from node to another, if it found that there is an attack on one of the nodes, it then alerts the user through User Interface Agent or takes some action through Action Mobile Agent. The KDD Cup 99 dataset is used to test the effectiveness of the proposed system. The results show that even if only four features are used, the proposed system gives a better performance when it is compared with the obtained results using all 41 features.Keywords: Distributed intrusion detection system, mobile agent, feature selection, Bees Algorithm, decision tree.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 193983 Improving Spatiotemporal Change Detection: A High Level Fusion Approach for Discovering Uncertain Knowledge from Satellite Image Database
Authors: Wadii Boulila, Imed Riadh Farah, Karim Saheb Ettabaa, Basel Solaiman, Henda Ben Ghezala
Abstract:
This paper investigates the problem of tracking spa¬tiotemporal changes of a satellite image through the use of Knowledge Discovery in Database (KDD). The purpose of this study is to help a given user effectively discover interesting knowledge and then build prediction and decision models. Unfortunately, the KDD process for spatiotemporal data is always marked by several types of imperfections. In our paper, we take these imperfections into consideration in order to provide more accurate decisions. To achieve this objective, different KDD methods are used to discover knowledge in satellite image databases. Each method presents a different point of view of spatiotemporal evolution of a query model (which represents an extracted object from a satellite image). In order to combine these methods, we use the evidence fusion theory which considerably improves the spatiotemporal knowledge discovery process and increases our belief in the spatiotemporal model change. Experimental results of satellite images representing the region of Auckland in New Zealand depict the improvement in the overall change detection as compared to using classical methods.
Keywords: Knowledge discovery in satellite databases, knowledge fusion, data imperfection, data mining, spatiotemporal change detection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 154782 Protein Secondary Structure Prediction Using Parallelized Rule Induction from Coverings
Authors: Leong Lee, Cyriac Kandoth, Jennifer L. Leopold, Ronald L. Frank
Abstract:
Protein 3D structure prediction has always been an important research area in bioinformatics. In particular, the prediction of secondary structure has been a well-studied research topic. Despite the recent breakthrough of combining multiple sequence alignment information and artificial intelligence algorithms to predict protein secondary structure, the Q3 accuracy of various computational prediction algorithms rarely has exceeded 75%. In a previous paper [1], this research team presented a rule-based method called RT-RICO (Relaxed Threshold Rule Induction from Coverings) to predict protein secondary structure. The average Q3 accuracy on the sample datasets using RT-RICO was 80.3%, an improvement over comparable computational methods. Although this demonstrated that RT-RICO might be a promising approach for predicting secondary structure, the algorithm-s computational complexity and program running time limited its use. Herein a parallelized implementation of a slightly modified RT-RICO approach is presented. This new version of the algorithm facilitated the testing of a much larger dataset of 396 protein domains [2]. Parallelized RTRICO achieved a Q3 score of 74.6%, which is higher than the consensus prediction accuracy of 72.9% that was achieved for the same test dataset by a combination of four secondary structure prediction methods [2].Keywords: data mining, protein secondary structure prediction, parallelization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 159681 Designing a Framework for Network Security Protection
Authors: Eric P. Jiang
Abstract:
As the Internet continues to grow at a rapid pace as the primary medium for communications and commerce and as telecommunication networks and systems continue to expand their global reach, digital information has become the most popular and important information resource and our dependence upon the underlying cyber infrastructure has been increasing significantly. Unfortunately, as our dependency has grown, so has the threat to the cyber infrastructure from spammers, attackers and criminal enterprises. In this paper, we propose a new machine learning based network intrusion detection framework for cyber security. The detection process of the framework consists of two stages: model construction and intrusion detection. In the model construction stage, a semi-supervised machine learning algorithm is applied to a collected set of network audit data to generate a profile of normal network behavior and in the intrusion detection stage, input network events are analyzed and compared with the patterns gathered in the profile, and some of them are then flagged as anomalies should these events are sufficiently far from the expected normal behavior. The proposed framework is particularly applicable to the situations where there is only a small amount of labeled network training data available, which is very typical in real world network environments.Keywords: classification, data analysis and mining, network intrusion detection, semi-supervised learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 179580 Mixtures of Monotone Networks for Prediction
Authors: Marina Velikova, Hennie Daniels, Ad Feelders
Abstract:
In many data mining applications, it is a priori known that the target function should satisfy certain constraints imposed by, for example, economic theory or a human-decision maker. In this paper we consider partially monotone prediction problems, where the target variable depends monotonically on some of the input variables but not on all. We propose a novel method to construct prediction models, where monotone dependences with respect to some of the input variables are preserved by virtue of construction. Our method belongs to the class of mixture models. The basic idea is to convolute monotone neural networks with weight (kernel) functions to make predictions. By using simulation and real case studies, we demonstrate the application of our method. To obtain sound assessment for the performance of our approach, we use standard neural networks with weight decay and partially monotone linear models as benchmark methods for comparison. The results show that our approach outperforms partially monotone linear models in terms of accuracy. Furthermore, the incorporation of partial monotonicity constraints not only leads to models that are in accordance with the decision maker's expertise, but also reduces considerably the model variance in comparison to standard neural networks with weight decay.Keywords: mixture models, monotone neural networks, partially monotone models, partially monotone problems.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 124679 A Methodology for Investigating Public Opinion Using Multilevel Text Analysis
Authors: William Xiu Shun Wong, Myungsu Lim, Yoonjin Hyun, Chen Liu, Seongi Choi, Dasom Kim, Kee-Young Kwahk, Namgyu Kim
Abstract:
Recently, many users have begun to frequently share their opinions on diverse issues using various social media. Therefore, numerous governments have attempted to establish or improve national policies according to the public opinions captured from various social media. In this paper, we indicate several limitations of the traditional approaches to analyze public opinion on science and technology and provide an alternative methodology to overcome these limitations. First, we distinguish between the science and technology analysis phase and the social issue analysis phase to reflect the fact that public opinion can be formed only when a certain science and technology is applied to a specific social issue. Next, we successively apply a start list and a stop list to acquire clarified and interesting results. Finally, to identify the most appropriate documents that fit with a given subject, we develop a new logical filter concept that consists of not only mere keywords but also a logical relationship among the keywords. This study then analyzes the possibilities for the practical use of the proposed methodology thorough its application to discover core issues and public opinions from 1,700,886 documents comprising SNS, blogs, news, and discussions.Keywords: Big data, social network analysis, text mining, topic modeling.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 166178 M2LGP: Mining Multiple Level Gradual Patterns
Authors: Yogi Satrya Aryadinata, Anne Laurent, Michel Sala
Abstract:
Gradual patterns have been studied for many years as they contain precious information. They have been integrated in many expert systems and rule-based systems, for instance to reason on knowledge such as “the greater the number of turns, the greater the number of car crashes”. In many cases, this knowledge has been considered as a rule “the greater the number of turns → the greater the number of car crashes” Historically, works have thus been focused on the representation of such rules, studying how implication could be defined, especially fuzzy implication. These rules were defined by experts who were in charge to describe the systems they were working on in order to turn them to operate automatically. More recently, approaches have been proposed in order to mine databases for automatically discovering such knowledge. Several approaches have been studied, the main scientific topics being: how to determine what is an relevant gradual pattern, and how to discover them as efficiently as possible (in terms of both memory and CPU usage). However, in some cases, end-users are not interested in raw level knowledge, and are rather interested in trends. Moreover, it may be the case that no relevant pattern can be discovered at a low level of granularity (e.g. city), whereas some can be discovered at a higher level (e.g. county). In this paper, we thus extend gradual pattern approaches in order to consider multiple level gradual patterns. For this purpose, we consider two aggregation policies, namely horizontal and vertical.Keywords: Gradual Pattern.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 150077 The Robust Clustering with Reduction Dimension
Authors: Dyah E. Herwindiati
Abstract:
A clustering is process to identify a homogeneous groups of object called as cluster. Clustering is one interesting topic on data mining. A group or class behaves similarly characteristics. This paper discusses a robust clustering process for data images with two reduction dimension approaches; i.e. the two dimensional principal component analysis (2DPCA) and principal component analysis (PCA). A standard approach to overcome this problem is dimension reduction, which transforms a high-dimensional data into a lower-dimensional space with limited loss of information. One of the most common forms of dimensionality reduction is the principal components analysis (PCA). The 2DPCA is often called a variant of principal component (PCA), the image matrices were directly treated as 2D matrices; they do not need to be transformed into a vector so that the covariance matrix of image can be constructed directly using the original image matrices. The decomposed classical covariance matrix is very sensitive to outlying observations. The objective of paper is to compare the performance of robust minimizing vector variance (MVV) in the two dimensional projection PCA (2DPCA) and the PCA for clustering on an arbitrary data image when outliers are hiden in the data set. The simulation aspects of robustness and the illustration of clustering images are discussed in the end of paperKeywords: Breakdown point, Consistency, 2DPCA, PCA, Outlier, Vector Variance
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 169776 Sliding Joints and Soil-Structure Interaction
Authors: Radim Cajka, Pavlina Mateckova, Martina Janulikova, Marie Stara
Abstract:
Use of a sliding joint is an effective method to decrease the stress in foundation structure where there is a horizontal deformation of subsoil (areas afflicted with underground mining) or horizontal deformation of a foundation structure (pre-stressed foundations, creep, shrinkage, temperature deformation). A convenient material for a sliding joint is a bitumen asphalt belt. Experiments for different types of bitumen belts were undertaken at the Faculty of Civil Engineering - VSB Technical University of Ostrava in 2008. This year an extension of the 2008 experiments is in progress and the shear resistance of a slide joint is being tested as a function of temperature in a temperature controlled room. In this paper experimental results of temperature dependant shear resistance are presented. The result of the experiments should be the sliding joint shear resistance as a function of deformation velocity and temperature. This relationship is used for numerical analysis of stress/strain relation between foundation structure and subsoil. Using a rheological slide joint could lead to a decrease of the reinforcement amount, and contribute to higher reliability of foundation structure and thus enable design of more durable and sustainable building structures.Keywords: Pre-stressed foundations, sliding joint, soil-structure interaction, subsoil horizontal deformation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 201675 Evaluation of the Urban Regeneration Project: Land Use Transformation and SNS Big Data Analysis
Authors: Ju-Young Kim, Tae-Heon Moon, Jung-Hun Cho
Abstract:
Urban regeneration projects have been actively promoted in Korea. In particular, Jeonju Hanok Village is evaluated as one of representative cases in terms of utilizing local cultural heritage sits in the urban regeneration project. However, recently, there has been a growing concern in this area, due to the ‘gentrification’, caused by the excessive commercialization and surging tourists. This trend was changing land and building use and resulted in the loss of identity of the region. In this regard, this study analyzed the land use transformation between 2010 and 2016 to identify the commercialization trend in Jeonju Hanok Village. In addition, it conducted SNS big data analysis on Jeonju Hanok Village from February 14th, 2016 to March 31st, 2016 to identify visitors’ awareness of the village. The study results demonstrate that rapid commercialization was underway, unlikely the initial intention, so that planners and officials in city government should reconsider the project direction and rebuild deliberate management strategies. This study is meaningful in that it analyzed the land use transformation and SNS big data to identify the current situation in urban regeneration area. Furthermore, it is expected that the study results will contribute to the vitalization of regeneration area.
Keywords: Land use, SNS, text mining, urban regeneration.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 121574 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification
Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh
Abstract:
Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets presenting the genome sequences. In this paper, a hybrid method for the classification of the miRNA data is proposed. Due to the variety of cancers and high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features corresponding to the number of samples is high and the data suffer from being imbalanced. The feature selection method has been used to select features having more ability to distinguish classes and eliminating obscures features. Afterward, a Convolutional Neural Network (CNN) classifier for classification of cancer types is utilized, which employs a Genetic Algorithm to highlight optimized hyper-parameters of CNN. In order to make the process of classification by CNN faster, Graphics Processing Unit (GPU) is recommended for calculating the mathematic equation in a parallel way. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.
Keywords: Cancer classification, feature selection, deep learning, genetic algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 127173 CompPSA: A Component-Based Pairwise RNA Secondary Structure Alignment Algorithm
Authors: Ghada Badr, Arwa Alturki
Abstract:
The biological function of an RNA molecule depends on its structure. The objective of the alignment is finding the homology between two or more RNA secondary structures. Knowing the common functionalities between two RNA structures allows a better understanding and a discovery of other relationships between them. Besides, identifying non-coding RNAs -that is not translated into a protein- is a popular application in which RNA structural alignment is the first step A few methods for RNA structure-to-structure alignment have been developed. Most of these methods are partial structure-to-structure, sequence-to-structure, or structure-to-sequence alignment. Less attention is given in the literature to the use of efficient RNA structure representation and the structure-to-structure alignment methods are lacking. In this paper, we introduce an O(N2) Component-based Pairwise RNA Structure Alignment (CompPSA) algorithm, where structures are given as a component-based representation and where N is the maximum number of components in the two structures. The proposed algorithm compares the two RNA secondary structures based on their weighted component features rather than on their base-pair details. Extensive experiments are conducted illustrating the efficiency of the CompPSA algorithm when compared to other approaches and on different real and simulated datasets. The CompPSA algorithm shows an accurate similarity measure between components. The algorithm gives the flexibility for the user to align the two RNA structures based on their weighted features (position, full length, and/or stem length). Moreover, the algorithm proves scalability and efficiency in time and memory performance.Keywords: Alignment, RNA secondary structure, pairwise, component-based, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 974