Search results for: distributed data stream mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 26343

26253 A Survey on Concurrency Control Methods in Distributed Database

Authors: Seyed Mohsen Jameii

Abstract:

In recent years, the performance of distributed database systems has improved remarkably. A distributed database is composed of several sites connected to each other through a network. In such a system, poor coordination between transactions may leave the database inconsistent. Because of the complexity of the many sites and their interconnections, it is difficult to apply different models serially in a distributed database. The principal goal of concurrency control in a distributed database is to ensure that different sites do not interfere with one another when accessing shared data. Various concurrency control algorithms have been proposed for distributed database systems. This paper introduces and compares some of the available methods for concurrency control in distributed databases.

Keywords: distributed database, two phase locking protocol, transaction, concurrency

Procedia PDF Downloads 326
26252 Modeling Stream Flow with Prediction Uncertainty by Using SWAT Hydrologic and RBNN Neural Network Models for Agricultural Watershed in India

Authors: Ajai Singh

Abstract:

Simulating hydrological processes at the watershed outlet through a modelling approach is essential for proper planning and implementation of appropriate soil conservation measures in the Damodar Barakar catchment, Hazaribagh, India, where soil erosion is a dominant problem. This study quantifies the parametric uncertainty involved in simulating stream flow using the Soil and Water Assessment Tool (SWAT), a watershed-scale model, and a Radial Basis Neural Network (RBNN), an artificial neural network model. Both models were calibrated and validated against measured stream flow, and the uncertainty in the SWAT model output was assessed using the Sequential Uncertainty Fitting algorithm (SUFI-2). Although both models predicted satisfactorily, the RBNN model performed better than SWAT, with R² and NSE values of 0.92 and 0.92 during training and 0.71 and 0.70 during validation, respectively. Comparison of the two models' results also indicates a wider prediction interval for the SWAT model. The P-factor, the percentage of observed stream flow values bracketed by the 95PPU, is 91% for the RBNN model, higher than the 87% for SWAT. In other words, the RBNN model estimates stream flow more accurately and with less uncertainty. The RBNN model, based on simple inputs, could therefore be used for estimating monthly stream flow, filling in missing data, and testing the accuracy and performance of other models.
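The two goodness-of-fit figures quoted in this abstract can be illustrated with a minimal Python sketch. It assumes the usual definitions: NSE (Nash-Sutcliffe efficiency) as one minus the ratio of squared simulation error to the variance of the observations about their mean, and the P-factor as the percentage of observations falling inside a prediction band. The flow values and band limits are hypothetical and are not data from the study.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - squared error / spread of observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread


def p_factor(observed, lower, upper):
    """Percentage of observations lying inside the [lower, upper] band."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return 100.0 * inside / len(observed)


# Hypothetical monthly flows and prediction band (illustration only).
obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 18.0, 33.0, 37.0]
band_lo = [8.0, 15.0, 31.0, 35.0]
band_hi = [12.0, 25.0, 36.0, 45.0]

nse = nash_sutcliffe(obs, sim)
coverage = p_factor(obs, band_lo, band_hi)
print(f"NSE = {nse:.3f}, P-factor = {coverage:.0f}%")
```

An NSE of 1 would mean a perfect match, and a P-factor of 100% would mean every observation lies inside the band.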

Keywords: SWAT, RBNN, SUFI 2, bootstrap technique, stream flow, simulation

Procedia PDF Downloads 333
26251 Real-Time Data Stream Partitioning over a Sliding Window in Real-Time Spatial Big Data

Authors: Sana Hamdi, Emna Bouazizi, Sami Faiz

Abstract:

In recent years, real-time spatial applications, such as location-aware services and traffic monitoring, have become more and more important. Such applications result in dynamic environments where data as well as queries are continuously moving. As a result, a tremendous amount of real-time spatial data is generated every day. The growth of the data volume seems to outpace the advance of our computing infrastructure. For instance, in real-time spatial Big Data, users expect to receive the results of each query within a short time period, regardless of the load on the system. But with a huge amount of real-time spatial data generated, system performance degrades rapidly, especially in overload situations. To solve this problem, we propose the use of data partitioning as an optimization technique. Traditional horizontal and vertical partitioning can increase the performance of the system and simplify data management, but they remain insufficient for real-time spatial Big Data: they cannot deal with real-time and stream queries efficiently. Thus, in this paper, we propose a novel data partitioning approach for real-time spatial Big Data named VPA-RTSBD (Vertical Partitioning Approach for Real-Time Spatial Big Data). This contribution is an implementation of the Matching algorithm for traditional vertical partitioning. First, we find the optimal attribute sequence using the Matching algorithm. Then, we propose a new cost model for database partitioning that keeps the amount of data in each partition more balanced and guarantees parallel execution for the most frequent queries. VPA-RTSBD aims to obtain a real-time partitioning scheme and deals with stream data. It improves the performance of query execution by maximizing the degree of parallel execution. This improves QoS (Quality of Service) in real-time spatial Big Data, especially with a huge volume of stream data.
The performance of our contribution is evaluated via simulation experiments. The results show that the proposed algorithm is both efficient and scalable, and that it outperforms comparable algorithms.

Keywords: real-time spatial big data, quality of service, vertical partitioning, horizontal partitioning, matching algorithm, hamming distance, stream query

Procedia PDF Downloads 140
26250 Spatial Data Mining by Decision Trees

Authors: Sihem Oujdi, Hafida Belbachir

Abstract:

Existing data mining methods cannot be applied directly to spatial data, because such data require spatial specificities, such as spatial relationships, to be taken into account. This paper focuses on classification with decision trees, one of the data mining techniques. We propose an extension of the C4.5 algorithm for spatial data, based on two different approaches: join materialization and querying the different tables on the fly. Similar work has been done on these two main approaches: the first, join materialization, favors processing time at the expense of memory space, whereas the second, querying the different tables on the fly, favors memory space at the expense of processing time. The modified C4.5 algorithm requires three input tables: a target table, a neighbor table, and a spatial join index that contains the possible spatial relationships between the objects in the target table and those in the neighbor table. The proposed algorithms are applied to a spatial data set in the accidentology domain. A comparative study of our approach with other work on classification by spatial decision trees will be detailed.

Keywords: C4.5 algorithm, decision trees, S-CART, spatial data mining

Procedia PDF Downloads 598
26249 Speed Characteristics of Mixed Traffic Flow on Urban Arterials

Authors: Ashish Dhamaniya, Satish Chandra

Abstract:

Speed and traffic volume data were collected on different sections of four-lane and six-lane roads in three metropolitan cities in India. The speed data were analyzed to fit a statistical distribution to individual vehicle speed data and to the speed data of all vehicles combined. Speed data of individual vehicles generally follow a normal distribution, but the speed data of all vehicles combined at a section of urban road may or may not follow the normal distribution, depending on the composition of the traffic stream. This paper introduces a new term, the Speed Spread Ratio (SSR), which is the ratio of the difference between the 85th and 50th percentile speeds to the difference between the 50th and 15th percentile speeds. If SSR is unity, the speed data are truly normally distributed. On six-lane urban roads, speed data follow a normal distribution only when SSR is in the range of 0.86 to 1.11. This range of SSR is also validated on four-lane roads.
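The Speed Spread Ratio defined in this abstract is SSR = (V85 - V50) / (V50 - V15), where Vp is the p-th percentile speed. The Python sketch below computes it for two hypothetical speed samples. The linear-interpolation percentile rule is an assumption, since the abstract does not say how percentiles were estimated, and the sample values are invented for illustration.

```python
def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in 0..100) of an already sorted list."""
    rank = (len(sorted_values) - 1) * p / 100.0
    low = int(rank)
    high = min(low + 1, len(sorted_values) - 1)
    frac = rank - low
    return sorted_values[low] + frac * (sorted_values[high] - sorted_values[low])


def speed_spread_ratio(speeds):
    """SSR = (V85 - V50) / (V50 - V15)."""
    ordered = sorted(speeds)
    v15, v50, v85 = (percentile(ordered, p) for p in (15, 50, 85))
    return (v85 - v50) / (v50 - v15)


def within_reported_range(ssr, low=0.86, high=1.11):
    """Check SSR against the 0.86 to 1.11 range reported in the abstract."""
    return low <= ssr <= high


# Hypothetical spot speeds in km/h (illustration only).
symmetric = list(range(30, 71))                           # evenly spread around 50
skewed = [30, 32, 34, 36, 38, 40, 45, 50, 60, 70, 80]     # long upper tail

ssr_symmetric = speed_spread_ratio(symmetric)
ssr_skewed = speed_spread_ratio(skewed)
print(ssr_symmetric, within_reported_range(ssr_symmetric))
print(round(ssr_skewed, 2), within_reported_range(ssr_skewed))
```

A sample that is symmetric about its median gives an SSR of one, while a long upper tail pushes the ratio well above the reported range.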

Keywords: normal distribution, percentile speed, speed spread ratio, traffic volume

Procedia PDF Downloads 393
26248 Identify Users Behavior from Mobile Web Access Logs Using Automated Log Analyzer

Authors: Bharat P. Modi, Jayesh M. Patel

Abstract:

The mobile Internet is a major source of data. As the number of web pages continues to grow, the mobile web provides data miners with just the right ingredients for extracting information. To cater to this growing need, the term Mobile Web mining was coined. Mobile Web mining uses data mining techniques to decipher potentially useful information from web data. Web usage mining deals with understanding the behavior of users by means of the mobile web access logs that are generated on the server while a user is accessing the website. A web access log comprises entries such as the user's name, IP address, number of bytes transferred, and timestamp. A variety of log analyzer tools exist that help in analyzing aspects such as users' navigational patterns and the parts of the website users are most interested in. The present paper uses one such log analyzer tool, Mobile Web Log Expert, to ascertain the behavior of users who access an astrology website. It also provides a comparative study of a few available log analyzer tools.

Keywords: mobile web access logs, web usage mining, web server, log analyzer

Procedia PDF Downloads 341
26247 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning

Authors: Walid Cherif

Abstract:

Data mining has seen big advances over recent years because of the spread of the internet, which generates a tremendous volume of data every day, and because of immense advances in the technologies that facilitate the analysis of these data. In particular, classification is a subdomain of data mining that determines which group each data instance belongs to within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is based on either statistics or machine learning, and each type has its own limits. Data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach that differs from existing algorithms in its reliability computations. The proposed approach outperformed the most common classification techniques, with an F-measure exceeding 97% on the Iris dataset.

Keywords: data mining, knowledge discovery, machine learning, similarity measurement, supervised classification

Procedia PDF Downloads 446
26246 Mining Diagnostic Investigation Process

Authors: Sohail Imran, Tariq Mahmood

Abstract:

In the complex healthcare diagnostic investigation process, medical practitioners have to focus on ways to standardize their processes in order to deliver high-quality care and optimize time and costs. Process mining techniques can be applied to extract process-related knowledge from data without considering causal and dynamic dependencies in the business domain and processes. The application of process mining is effective in diagnostic investigation. It is very helpful where a treatment gives no dispositive evidence favoring it. In this paper, we apply process mining to discover the important process flow of diagnostic investigation for hepatitis patients. This approach has benefits that can enhance the quality and efficiency of diagnostic investigation processes.

Keywords: process mining, healthcare, diagnostic investigation process, process flow

Procedia PDF Downloads 498
26245 Water Quality Determination of River Systems in Antalya Basin by Biomonitoring

Authors: Hasan Kalyoncu, Füsun Kılçık, Hatice Gülboy Akyıldırım, Aynur Özen, Mehmet Acar, Nur Yoluk

Abstract:

To evaluate the water quality of the river systems in the Antalya Basin, macrozoobenthos samples were taken with a hand net from 22 selected stations and identified at family level. Water quality was determined according to the Biological Monitoring Working Party (BMWP) system, using macrozoobenthic invertebrates and physicochemical parameters. The evaluation identified Aksu Stream as the most polluted stream in the Antalya Basin and Isparta Stream as the most polluted tributary of Aksu Stream. The pollution level of Isparta Stream was determined as quality class V, making it the extremely polluted part of the stream. Pollution loads at the sources of the streams were generally low. Because some parts of the streams pass through deep canyons and originate in non-residential and non-arable regions, the majority of the streams in the Antalya Basin are of high quality. Waste water from agricultural and residential regions affects the lower basins of the streams, so the lower parts of the stream basins are exposed to pollution from anthropogenic effects. However, in Aksu Stream, which differs in being exposed to the domestic and industrial wastes of Isparta City, extreme pollution was found, particularly in the Isparta Stream part.

Keywords: Antalya basin, biomonitoring, BMWP, water quality

Procedia PDF Downloads 299
26244 Analysis of Users’ Behavior on Book Loan Log Based on Association Rule Mining

Authors: Kanyarat Bussaban, Kunyanuth Kularbphettong

Abstract:

This research aims to create a model for analyzing student behavior in using library resources, based on a data mining technique, in the case of Suan Sunandha Rajabhat University. The model was built with association rules using the Apriori algorithm. Fourteen rules were found; when they were tested with the testing data set, the ability to classify data was 79.24 percent and the MSE was 22.91. The results show that a user behavior model based on the association rule technique can be used to manage library resources.

Keywords: behavior, data mining technique, apriori algorithm, knowledge discovery

Procedia PDF Downloads 388
26243 An Efficient Data Mining Technique for Online Stores

Authors: Mohammed Al-Shalabi, Alaa Obeidat

Abstract:

In any food store, some items expire or are destroyed because demand for them is infrequent. A system is therefore needed that helps the decision maker make an offer on such items, improving demand by bundling them with another, frequently bought item and lowering the price to avoid losses. The system generates hundreds or thousands of patterns (offers) for each low-demand item, then uses association rule measures (support, confidence) to find the interesting patterns (the best offer, which achieves the lowest losses). In this paper, we propose a data mining method for determining the best offer by merging data mining techniques with e-commerce strategy. The task is to build a model that predicts the best offer. The goal is to maximize the store's profits and avoid product losses. The central idea is the use of association rules in marketing in combination with e-commerce.
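The support and confidence measures mentioned in this abstract can be sketched in a few lines of Python. The baskets, the item names, and the rule of picking the partner item with the highest confidence are assumptions made for illustration; the abstract does not give the authors' actual scoring of offers.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent): support of both / support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical baskets; "saffron" plays the low-demand item.
baskets = [
    {"bread", "milk"},
    {"bread", "saffron"},
    {"bread", "milk", "saffron"},
    {"milk"},
    {"bread"},
]

# Score each frequent item as a bundling partner for the low-demand item.
scores = {
    partner: confidence(baskets, {partner}, {"saffron"})
    for partner in ("bread", "milk")
}
best = max(scores, key=scores.get)
print(best, scores)
```

Here the rule bread → saffron has the higher confidence, so bread would be the suggested partner under this simple criterion.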

Keywords: data mining, association rules, confidence, online stores

Procedia PDF Downloads 389
26242 Spatio-Temporal Data Mining with Association Rules for Lake Van

Authors: Tolga Aydin, M. Fatih Alaeddinoğlu

Abstract:

Throughout history, people have made estimates and inferences about the future by using their past experiences. Developing information technologies and improvements in database management systems make it possible to extract useful information from the data at hand for strategic decisions, and different methods have been developed for this purpose. Data mining by association rule learning is one such method. The Apriori algorithm, one of the well-known association rule learning algorithms, is not commonly used on spatio-temporal data sets. However, it is possible to embed time and space features into the data sets and make the Apriori algorithm a suitable data mining technique for learning spatio-temporal association rules. Lake Van, the largest lake in Turkey, is a closed basin. This feature causes the volume of the lake to increase or decrease as the amount of water it holds changes. In this study, evaporation, humidity, lake altitude, amount of rainfall, and temperature parameters recorded in the Lake Van region over the years are used by the Apriori algorithm, and a spatio-temporal data mining application is developed to identify overflows and newly formed soil regions (underflows) occurring in the coastal parts of Lake Van. Identifying possible reasons for overflows and underflows may be used to alert experts to take precautions and make the necessary investments.
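The idea of embedding time and space features into the data so that Apriori can mine them can be sketched as follows. Each record carries its season and zone as ordinary items next to the measured attributes, and a compact Apriori implementation then finds the frequent itemsets. The records, item names, and support threshold are hypothetical, and the code is a generic sketch rather than the authors' application.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical records: time and space are encoded as plain items.
records = [
    {"season=spring", "zone=north", "rainfall=high", "event=overflow"},
    {"season=spring", "zone=north", "rainfall=high", "event=overflow"},
    {"season=summer", "zone=north", "rainfall=low", "event=underflow"},
    {"season=spring", "zone=south", "rainfall=high", "event=overflow"},
    {"season=summer", "zone=south", "rainfall=low", "event=none"},
]

frequent = apriori(records, min_support=0.5)
pattern = frozenset({"season=spring", "rainfall=high", "event=overflow"})
print(len(frequent), frequent.get(pattern))
```

In this toy data the itemset combining spring, high rainfall, and overflow appears in three of the five records, which is the kind of pattern from which a spatio-temporal rule could be derived.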

Keywords: apriori algorithm, association rules, data mining, spatio-temporal data

Procedia PDF Downloads 349
26241 Assessment of Prevalent Diseases Caused by Mining Activities in the Northern Part of Mindanao Island, Philippines

Authors: Odinah Cuartero-Enteria, Kyla Rita Mercado, Jason Salamanes, Aian Pecasales, Sherwin Sabado

Abstract:

The northern part of Mindanao Island, Philippines, has sizable reserves of mineral resources. Mining activities have flourished there for years, resulting in local economic gain but also in environmental concerns. This study investigates the prevalent diseases caused by mining activities in these areas. The study used secondary data gathered from the Rural Health Units (RHU) of the selected areas. It further determined the prevalent diseases in the three areas for the years 2005, 2010, and 2015, covering the periods before and during mining activities. The results show that areas far from mining activities have fewer cases of patients suffering from air-borne diseases. The most common diseases among the top ten, such as pneumonia, tuberculosis, influenza, upper respiratory tract infection (URTI), and skin diseases, were air-borne and attributed to air pollution. Hence, the presence of mining activities contributes to the prevalent diseases, and addressing the air pollution caused by mining activities is very important.

Keywords: Philippines, Mindanao Island, mining activities, pollution, prevalent diseases

Procedia PDF Downloads 451
26240 Machine Learning Application in Shovel Maintenance

Authors: Amir Taghizadeh Vahed, Adithya Thaduri

Abstract:

Shovels are the main components of the mining transportation system. The productivity of a mine depends on the availability of its shovels because of their high capital and operating costs. Unplanned failures or shutdowns of a shovel result in higher repair costs, increased downtime, and higher indirect costs (i.e., loss of production and of the company's reputation). To mitigate these failures, predictive maintenance based on failure prediction can be a useful approach. Modern mining machinery, including shovels, automatically collects huge datasets consisting of reliability and maintenance data. However, the gathered datasets are useless until the information and knowledge in the data are extracted. Machine learning and data mining, which play a major role in recent studies, have been used for this knowledge discovery process. In this study, data mining and machine learning approaches are implemented to detect not only anomalies but also patterns in a dataset, and further to detect failures.

Keywords: maintenance, machine learning, shovel, condition-based monitoring

Procedia PDF Downloads 189
26239 Reversible Information Hitting in Encrypted JPEG Bitstream by LSB Based on Inherent Algorithm

Authors: Vaibhav Barve

Abstract:

Reversible information hiding has drawn a lot of interest recently. Because the scheme is reversible, the original digital data can be restored completely. It is a scheme in which secret data is stored in digital media such as image, video, or audio to prevent unauthorized access and for security purposes. Generally, a JPEG bit stream is used to store this key data: first the JPEG bit stream is encrypted into a well-organized structure, and then the secret information or key data is embedded into the encrypted region by slightly modifying the JPEG bit stream. Usable pixels suitable for data embedding are computed, and the key details are embedded accordingly. In our proposed framework, we use the RC4 algorithm to encrypt the JPEG bit stream. The encryption key is supplied by the system user and is also used at decryption time. We implement an enhanced least significant bit (LSB) replacement steganography using a genetic algorithm. First, the number of bits to be embedded in a given coefficient is adaptive. By using proper parameters, we can obtain high capacity while ensuring high security. We use a logistic map to shuffle the bits and a GA (Genetic Algorithm) to find the right parameters for the logistic map. A data embedding key is used at the time of data embedding. With the correct image encryption key and data embedding key, the receiver can easily extract the embedded secret data and completely recover the original image as well as the original secret information. When the embedding key is absent, the original image can still be recovered approximately with sufficient quality.
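The combination of logistic-map shuffling and LSB replacement described here can be illustrated with a heavily simplified Python sketch. It works on a plain list of integers standing in for coefficients and leaves out the JPEG bit stream, the RC4 encryption, the adaptive bit count, and the genetic search. The parameters `x0` and `r` are placeholders for the values a genetic algorithm would tune, and all function names are hypothetical. Plain LSB replacement as shown recovers the message exactly but only bounds the change to the cover, so it does not reproduce the reversibility of the authors' scheme.

```python
def logistic_permutation(n, x0=0.3141, r=3.99):
    """Permutation of range(n) obtained by ranking a logistic-map orbit."""
    x, orbit = x0, []
    for _ in range(n):
        x = r * x * (1.0 - x)   # logistic map: x_{k+1} = r * x_k * (1 - x_k)
        orbit.append(x)
    return sorted(range(n), key=orbit.__getitem__)


def embed(cover, bits, x0=0.3141, r=3.99):
    """Shuffle the message bits, then write them into the LSBs of the cover."""
    perm = logistic_permutation(len(bits), x0, r)
    shuffled = [bits[i] for i in perm]
    stego = list(cover)
    for i, bit in enumerate(shuffled):
        stego[i] = (stego[i] & ~1) | bit   # replace the least significant bit
    return stego


def extract(stego, n_bits, x0=0.3141, r=3.99):
    """Read the LSBs and undo the shuffle using the same map parameters."""
    perm = logistic_permutation(n_bits, x0, r)
    shuffled = [value & 1 for value in stego[:n_bits]]
    bits = [0] * n_bits
    for position, source_index in enumerate(perm):
        bits[source_index] = shuffled[position]
    return bits


# Hypothetical cover values and message bits (illustration only).
cover = [52, 55, 61, 66, 70, 61, 64, 73, 63, 59]
message = [1, 0, 1, 1, 0, 0, 1, 0]

stego = embed(cover, message)
recovered = extract(stego, len(message))
print(recovered == message)
```

Extraction must use the same `x0` and `r` as embedding, so the pair acts as the data embedding key in this sketch.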

Keywords: data embedding, decryption, encryption, reversible data hiding, steganography

Procedia PDF Downloads 271
26238 ROOP: Translating Sequential Code Fragments to Distributed Code Fragments Using Deep Reinforcement Learning

Authors: Arun Sanjel, Greg Speegle

Abstract:

Every second, massive amounts of data are generated, and Data Intensive Scalable Computing (DISC) frameworks have evolved into effective tools for analyzing such data. Since the underlying architecture of these distributed computing platforms is often new to users, building a DISC application can be time-consuming and error-prone. The automated conversion of a sequential program to a DISC program could therefore significantly improve productivity. However, synthesizing a user's intended program from an input specification is complex, although it has several important applications, such as distributed program synthesis and code refactoring. Existing works such as Tyro and Casper rely entirely on deductive synthesis techniques or similar program synthesis approaches. Our approach is to develop a data-driven synthesis technique that identifies sequential components and translates them to equivalent distributed operations. We emphasize using reinforcement learning and unit testing as feedback mechanisms to achieve our objectives.

Keywords: program synthesis, distributed computing, reinforcement learning, unit testing, DISC

Procedia PDF Downloads 79
26237 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, a wide variety of data is being generated in many fields. These data are advantageous in that they provide real-time information to the users themselves. When the data are accumulated and analyzed, however, a wider variety of information can be extracted. In addition, the development and dissemination of boards such as Arduino and Raspberry Pi have made it easy to test various sensors, and sensor data can be collected directly by using database tools such as MySQL. These directly collected data can be used for various kinds of research and can be useful as data for data mining. However, collecting data with such boards presents many difficulties, especially when the user is not a computer programmer or is using a board for the first time. Even when data are collected, a lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis that overcomes these problems.

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 351
26236 Data Mining Meets Educational Analysis: Opportunities and Challenges for Research

Authors: Carla Silva

Abstract:

Recent developments in information and communication technology enable us to acquire, collect, and analyse data in various fields of socioeconomic and technological systems. With increasing economic globalization and the evolution of information technology, data mining has become an important approach to economic data analysis. As a result, there is a critical need for automated approaches that make effective and efficient use of the massive amount of educational data, in order to support institutions in strategic planning and investment decision-making. In this article, we address data from several different perspectives and define data as applied to the sciences. Many believe that 'big data' will transform business, government, and other aspects of the economy. We discuss how new data may impact educational policy and educational research. Large-scale administrative data sets and proprietary private-sector data can greatly improve the way we measure, track, and describe educational activity and educational impact. We also consider whether the big data predictive modeling tools that have emerged in statistics and computer science may prove useful in education and, furthermore, in economics. Finally, we highlight a number of challenges and opportunities for future research.

Keywords: data mining, research analysis, investment decision-making, educational research

Procedia PDF Downloads 333
26235 Human Immunodeficiency Virus (HIV) Test Predictive Modeling and Identify Determinants of HIV Testing for People with Age above Fourteen Years in Ethiopia Using Data Mining Techniques: EDHS 2011

Authors: S. Abera, T. Gidey, W. Terefe

Abstract:

Introduction: Testing for HIV is the key entry point to HIV prevention, treatment, and care and support services. Predictive data mining techniques can be of great benefit in analyzing and discovering new patterns in huge datasets such as the EDHS 2011 data. Objectives: The objective of this study is to build a predictive model for HIV testing and to identify determinants of HIV testing for people above fourteen years of age using data mining techniques. Methods: The Cross-Industry Standard Process for Data Mining (CRISP-DM) was used to build the model for HIV testing and to explore association rules between HIV testing and the selected attributes among adult Ethiopians. Decision tree, Naïve Bayes, logistic regression, and artificial neural network techniques were used to build the predictive models. Results: The target dataset contained 30,625 study participants, of whom 16,515 (53.9%) were women. Nearly three-fifths, 17,719 (58%), had never been tested for HIV, while the remaining 12,906 (42%) had been tested. Ethiopians with a higher wealth index, a higher educational level, an age of 20 to 29 years, no stigmatizing attitude towards HIV-positive persons, urban residence, HIV-related knowledge, exposure to family planning information in the mass media, and knowledge of a place to get tested for HIV showed increased patterns of HIV testing. Conclusion and Recommendation: Public health interventions should consider the identified determinants in order to encourage people to get tested for HIV.

Keywords: data mining, HIV, testing, Ethiopia

Procedia PDF Downloads 470
26234 Study for Establishing a Concept of Underground Mining in a Folded Deposit with Weathering

Authors: Chandan Pramanik, Bikramjit Chanda

Abstract:

Large metal mines operated with open-cast mining methods must transition to underground mining once open-cast operation concludes; however, this involves a difficult period in which production converges because of interference between the two mining methods. To address the technical issues of inadequate production security and other mining challenges during the transition phase and beyond, this work presents and establishes a transition model with collaborative mining operations, based on the case of the South Kaliapani Underground Project. By integrating the small-scale Drift and Fill method with highly productive Sub Level Open Stoping in the deep section, this hybrid mining concept tries to eliminate major bottlenecks and offers an optimized production profile with safe and sustainable operation. Considering every geo-mining aspect, this study offers a genuine and precise technical deliberation on the transition from open-pit to underground mining.

Keywords: drift and fill, geo-mining aspect, sublevel open stoping, underground mining method

Procedia PDF Downloads 80
26233 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands

Authors: Julio Albuja, David Zaldumbide

Abstract:

Data analysis is an important step before making a decision about money. The aim of this work is to analyze the factors that influence the final price of houses by means of data mining algorithms. To the best of our knowledge, previous work was reviewed only to compare results. Before the data set was used, the Z-transformation was applied to standardize the data to a common scale. The data were then classified into two groups so that they could be visualized in a readable format. A decision tree was built, and the data are displayed graphically so that the results and the influence of each factor are easy to see. The definitions of these methods are described, as well as the results. Finally, conclusions and recommendations based on the results of our research are presented, making it easier to apply these algorithms to a customized data set.
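The Z-transformation mentioned in this abstract rescales each attribute to zero mean and unit standard deviation, z = (x - mean) / standard deviation. A minimal Python sketch follows. The attribute name and values are hypothetical, and the use of the population standard deviation is an assumption, since the abstract does not specify which estimator was used.

```python
from statistics import mean, pstdev


def z_transform(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical land areas in square metres (illustration only).
areas = [100.0, 200.0, 300.0]
z_areas = z_transform(areas)
print([round(z, 4) for z in z_areas])
```

After the transformation, attributes measured in different units, such as area and distance, share a comparable scale before they are passed to a decision tree or another algorithm.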

Keywords: algorithms, data, decision tree, transformation

Procedia PDF Downloads 352
26232 The Environmental and Socio Economic Impacts of Mining on Local Livelihood in Cameroon: A Case Study in Bertoua

Authors: Fongang Robert Tichuck

Abstract:

This paper reports the findings of a study undertaken to assess the socio-economic and environmental impacts of mining in Bertoua, in the Eastern Region of Cameroon. In addition to sampling community perceptions of mining activities, the study prescribes interventions that can assist in mitigating the negative impacts of mining. Marked environmental and interrelated socio-economic improvements can be achieved within regional artisanal gold mines if the government provides technical support to local operators, regulations are improved, and illegal mining activity is reduced.

Keywords: gold mining, socio-economic, mining activities, local people

Procedia PDF Downloads 369
26231 Exploring the Role of Data Mining in Crime Classification: A Systematic Literature Review

Authors: Faisal Muhibuddin, Ani Dijah Rahajoe

Abstract:

This in-depth exploration, through a systematic literature review, scrutinizes the nuanced role of data mining in the classification of criminal activities. The research focuses on investigating various methodological aspects and recent developments in leveraging data mining techniques to enhance the effectiveness and precision of crime categorization. Commencing with an exposition of the foundational concepts of crime classification and its evolutionary dynamics, this study details the paradigm shift from conventional methods towards approaches supported by data mining, addressing the challenges and complexities inherent in the modern crime landscape. Specifically, the research delves into various data mining techniques, including K-means clustering, Naïve Bayes, K-nearest neighbour, and clustering methods. A comprehensive review of the strengths and limitations of each technique provides insights into their respective contributions to improving crime classification models. The integration of diverse data sources takes centre stage in this research. A detailed analysis explores how the amalgamation of structured data (such as criminal records) and unstructured data (such as social media) can offer a holistic understanding of crime, enriching classification models with more profound insights. Furthermore, the study explores the temporal implications in crime classification, emphasizing the significance of considering temporal factors to comprehend long-term trends and seasonality. The availability of real-time data is also elucidated as a crucial element in enhancing responsiveness and accuracy in crime classification.

Keywords: data mining, classification algorithm, naïve bayes, k-means clustering, k-nearest neighbor, crime, data analysis, systematic literature review

Procedia PDF Downloads 44
26230 Hierarchical Clustering Algorithms in Data Mining

Authors: Z. Abdullah, A. R. Hamdan

Abstract:

Clustering is the process of grouping data objects into clusters so that objects in the same cluster are similar to each other. Clustering algorithms are one of the areas of data mining, and they can be classified into partition-based, hierarchical, density-based, and grid-based. In this paper, we survey and review four major hierarchical clustering algorithms: CURE, ROCK, CHAMELEON, and BIRCH. The resulting state of the art of these algorithms will help in eliminating current problems, as well as in deriving more robust and scalable algorithms for clustering.
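To make the hierarchical idea concrete, the following is a minimal sketch of generic bottom-up (agglomerative) clustering with single linkage. It is not an implementation of CURE, ROCK, CHAMELEON, or BIRCH; the function name `single_linkage` and the sample points are assumptions made for illustration only.

```python
import math


def single_linkage(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain.

    Illustrative helper (not from the paper). `points` is a list of
    coordinate tuples; the distance between two clusters is the smallest
    Euclidean distance between any pair of their members (single linkage).
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    # Start with every point in its own cluster.
    clusters = [[p] for p in points]

    def cluster_distance(a, b):
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > k:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i and drop j.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(single_linkage(sample, 2))
```

This brute-force version recomputes every pairwise distance on each merge, so it is only suitable for small inputs; the surveyed algorithms exist largely to avoid that cost on large data sets.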

Keywords: clustering, unsupervised learning, algorithms, hierarchical

Procedia PDF Downloads 855
26229 Effects of Urbanization on Land Use/Land Cover and Stream Flow of a Sub-Tropical River Basin of India

Authors: Satyavati Shukla, Lakhan V. Rathod, Mohan V. Khire

Abstract:

Rapid urbanization changes the land use/land cover pattern of a developing region. Due to these land surface changes, the stream flow of rivers also changes. It is important to investigate the factors affecting the hydrological characteristics of a river basin for better river basin management planning. This study aims to understand the effect of Land Use/Land Cover (LU/LC) changes on the stream flow of the Upper Bhima River basin, which is highly stressed in terms of water resources. In this study, the Upper Bhima River basin is divided into two adjacent sub-watersheds: the Mula-Mutha (urbanized) sub-watershed and the Bhima (non-urbanized) sub-watershed. First, LU/LC changes were estimated over 1980, 2002, and 2009 for both the Mula-Mutha and Bhima sub-watersheds. Further, stream flow simulations were done using the Soil and Water Assessment Tool (SWAT) for the streams draining both watersheds. Results revealed that stream flow was relatively higher for the urbanized sub-watershed. Through sensitivity analysis, it was observed that, out of all the parameters used, base flow was the most sensitive parameter towards LU/LC changes.

Keywords: land use/land cover, remote sensing, stream flow, urbanization

Procedia PDF Downloads 300
26228 A Data Mining Approach for Analysing and Predicting the Bank's Asset Liability Management Based on Basel III Norms

Authors: Nidhin Dani Abraham, T. K. Sri Shilpa

Abstract:

Asset liability management is an important aspect of the banking business. Moreover, today’s banking is based on BASEL III, which strictly regulates counterparty default. This paper focuses on the prediction and analysis of counterparty default risk, a type of risk that occurs when customers fail to repay the amount owed to the lender (a bank or any other financial institution). This paper proposes an approach to reduce the counterparty risk occurring in financial institutions using an appropriate data mining technique, and thus predicts the occurrence of NPA. It also helps in asset building and restructuring quality. Liability management is very important for carrying out banking business. To know and analyze the depth of a bank's liability, a suitable technique is required. For that, a data mining technique is used to predict the dormant behaviour of various deposit bank customers. Various models are implemented, and the results for savings bank deposit customers are analyzed. All these data are cleaned using a data cleansing approach from the bank data warehouse.

Keywords: data mining, asset liability management, BASEL III, banking

Procedia PDF Downloads 530
26227 Extracting Opinions from Big Data of Indonesian Customer Reviews Using Hadoop MapReduce

Authors: Veronica S. Moertini, Vinsensius Kevin, Gede Karya

Abstract:

Customer reviews have been collected by many kinds of e-commerce websites selling products, services, hotel rooms, tickets and so on. Each website collects its own customer reviews. The reviews can be crawled from those websites and stored as big data. Text analysis techniques can be used to analyze that data to produce summarized information, such as customer opinions. These opinions can then be published by independent service provider websites and used to help customers choose the most suitable products or services. As the opinions are derived from big data of reviews originating from many websites, the results are expected to be more trusted and accurate. Indonesian customers write reviews in the Indonesian language, which comes with its own structures and uniqueness. We found that most of the reviews are expressed in “daily language”, which is informal, does not follow correct grammar, and contains many abbreviations and slang or non-formal words. Hadoop is an emerging platform aimed at storing and analyzing big data in distributed systems. A Hadoop cluster consists of master and slave nodes/computers operated in a network. Hadoop comes with a distributed file system (HDFS) and the MapReduce framework for supporting parallel computation. However, MapReduce has a weakness: it is inefficient for iterative computations, specifically because the cost of reading/writing data (I/O cost) is high. Given this fact, we conclude that MapReduce is best adapted for “one-pass” computation. In this research, we develop an efficient technique for extracting or mining opinions from big data of Indonesian reviews, which is based on MapReduce with one-pass computation. In designing the algorithm, we avoid iterative computation and instead adopt a “look up table” technique. 
The stages of the proposed technique are: (1) crawling the review data from websites; (2) cleaning and finding root words from the raw reviews; (3) computing the frequency of the meaningful opinion words; (4) analyzing customers' sentiments towards defined objects. The experiments for evaluating the performance of the technique were conducted on a Hadoop cluster with 14 slave nodes. The results show that the proposed technique (stages 2 to 4) discovers useful opinions, is capable of processing big data efficiently, and is scalable.
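The one-pass, lookup-table idea in stages (2) and (3) can be sketched as follows. This is a single-process Python illustration of the map and reduce roles, not the authors' Hadoop implementation; the table `ROOT_LOOKUP`, its entries (including the abbreviations), and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical lookup table mapping informal spellings to a root opinion word.
# The entries are illustrative only and are not taken from the paper.
ROOT_LOOKUP = {
    "bagus": "bagus",  # "good"
    "bgs": "bagus",    # assumed abbreviation
    "jelek": "jelek",  # "bad"
    "jlk": "jelek",    # assumed abbreviation
}


def map_review(review):
    """Map role: emit (root_word, 1) for every token found in the table."""
    for token in review.lower().split():
        root = ROOT_LOOKUP.get(token.strip(".,!?"))
        if root is not None:
            yield root, 1


def reduce_counts(pairs):
    """Reduce role: sum the emitted counts per root word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts


def count_opinion_words(reviews):
    """Read each review exactly once; no iteration over intermediate results."""
    pairs = (pair for review in reviews for pair in map_review(review))
    return reduce_counts(pairs)


if __name__ == "__main__":
    reviews = ["Hotel bgs, kamar bagus!", "Makanan jlk."]
    print(count_opinion_words(reviews))
```

Because normalization is a dictionary lookup rather than an iterative procedure, each record is read once, which matches the abstract's argument that MapReduce suits one-pass computation.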

Keywords: big data analysis, Hadoop MapReduce, analyzing text data, mining Indonesian reviews

Procedia PDF Downloads 184
26226 Hydrogeophysical Investigations And Mapping of Ingress Channels Along The Blesbokspruit Stream In The East Rand Basin Of The Witwatersrand, South Africa

Authors: Melvin Sethobya, Sithule Xanga, Sechaba Lenong, Lunga Nolakana, Gbenga Adesola

Abstract:

Mining has been the cornerstone of the South African economy for the last century. Most of the gold mining in South Africa was conducted within the Witwatersrand basin, which contributed to the rapid growth of the city of Johannesburg and catapulted the city into becoming the business and wealth capital of the country. But with the gradual depletion of resources, a stoppage in the extraction of underground water from mines, and other factors relating to the survival of mining operations over a lengthy period, most of the mines were abandoned and left to pollute the local waterways and groundwater with toxins and heavy metal residue, and increased acid mine drainage ensued. The Department of Mineral Resources and Energy commissioned a project whose aim is to monitor, maintain, and mitigate the adverse environmental impacts of polluted mine water flowing into local streams and affecting local ecosystems and livelihoods downstream. As part of mitigation efforts, the diagnosis and monitoring of sites with polluted groundwater or surface water has become important. Geophysical surveys, in particular resistivity and magnetics surveys, were selected as some of the most suitable techniques for investigating local ingress points along one of the major streams cutting through the Witwatersrand basin, namely the Blesbokspruit, which is found in the eastern part of the basin. The aim of the surveys was to provide information that could be used to assist in determining possible water loss/ingress from the Blesbokspruit stream. Modelling of the geophysical survey results offered an in-depth insight into the interaction and pathways of polluted water through mapping of possible ingress channels near the Blesbokspruit. The resistivity-depth profile of the surveyed site exhibits a three-layered model: a low-resistivity overburden (10 to 200 Ω.m), underlain by a moderate-resistivity weathered layer (>300 Ω.m), which sits on a more resistive crystalline bedrock (>500 Ω.m). 
Two locations of potential ingress channels were mapped across the two traverses at the site. The magnetic survey conducted at the site mapped a major NE-SW trending regional lineament with a strong magnetic signature, which was modeled to a depth beyond 100 m, with the potential to act as a conduit for dispersion of stream water away from the stream, as it shared a similar orientation with the potential ingress channels mapped using the resistivity method.

Keywords: electrical resistivity, magnetics survey, blesbokspruit, ingress

Procedia PDF Downloads 48
26225 Design and Development of Data Mining Application for Medical Centers in Remote Areas

Authors: Grace Omowunmi Soyebi

Abstract:

Data Mining is the extraction of information from a large database which helps in predicting a trend or behavior, thereby helping management make knowledge-driven decisions. One principal problem of most hospitals in rural areas is their reliance on a file management system for keeping records. A lot of time is wasted when a patient visits the hospital, possibly in an emergency, and the nurse or attendant has to search through voluminous files before the patient's file can be retrieved; this delay may lead to an unexpected outcome for the patient. This Data Mining application is to be designed using a Structured System Analysis and Design method, which will support a well-articulated analysis of the existing file management system, a feasibility study, and proper documentation of the design and implementation of a computerized medical record system. This computerized system will replace the file management system and help to easily retrieve a patient's record with increased data security, access clinical records for decision-making, and reduce the time it takes for a patient to be attended to.

Keywords: data mining, medical record system, systems programming, computing

Procedia PDF Downloads 188
26224 Data Mining Spatial: Unsupervised Classification of Geographic Data

Authors: Chahrazed Zouaoui

Abstract:

In recent years, the volume of geospatial information has been increasing due to the evolution of information and communication technologies. This information is often presented by geographic information systems (GIS) and stored in spatial databases (BDS). Classical data mining has shown a weakness in extracting knowledge from these enormous amounts of data because of the particularity of spatial entities, which are characterized by their interdependence (first law of geography). This gave rise to spatial data mining. Spatial data mining is a process of analyzing geographic data that allows the extraction of knowledge and spatial relationships from geospatial data. Among the methods of this process, we distinguish the monothematic and the thematic. Geo-clustering is one of the main tasks of spatial data mining and falls under the monothematic method. It groups similar geo-spatial entities into the same class and assigns dissimilar entities to different classes; in other words, it maximizes intra-class similarity and minimizes inter-class similarity, taking into account the particularity of geo-spatial data. Two approaches to geo-clustering exist: dynamic processing of the data, which applies algorithms designed for the direct treatment of spatial data, and the approach based on spatial data pre-processing, which applies classical clustering algorithms to pre-processed data (by integrating spatial relationships). 
This approach (based on pre-processing) is quite complex in some cases, so the search for approximate solutions involves the use of approximation algorithms. Among these, we are interested in dedicated approaches (partitioning-based and density-based clustering methods) and the bees approach (a biomimetic approach). Our study proposes a design for this problem that uses different algorithms for automatically detecting geo-spatial neighborhoods in order to implement geo-clustering by pre-processing, and applies the bees algorithm to this problem for the first time in the geo-spatial field.
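As a rough illustration of the neighborhood-detection step mentioned above, the sketch below builds, for each site, the list of other sites within a fixed radius. It assumes planar (already projected) coordinates and a simple distance threshold; the abstract does not say which neighborhood definition the author uses, and the names `neighborhoods`, `sites`, and `radius` are hypothetical.

```python
import math


def neighborhoods(sites, radius):
    """Return, for each site name, the names of sites within `radius`.

    `sites` maps a name to an (x, y) tuple in planar coordinates.
    This is an assumed, simplified neighborhood rule for illustration.
    """
    names = list(sites)
    result = {name: [] for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Neighborhood is symmetric, so record the pair in both lists.
            if math.dist(sites[a], sites[b]) <= radius:
                result[a].append(b)
                result[b].append(a)
    return result


if __name__ == "__main__":
    sample = {"A": (0, 0), "B": (0, 3), "C": (10, 10)}
    print(neighborhoods(sample, 5))
```

The resulting neighbor lists are one way to encode spatial relationships as ordinary attributes, so that a classical clustering algorithm can then be run on the pre-processed data.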

Keywords: mining, GIS, geo-clustering, neighborhood

Procedia PDF Downloads 360