Search results for: knowledge discovery and data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 30260

30140 Efficient Recommendation System for Frequent and High Utility Itemsets over Incremental Datasets

Authors: J. K. Kavitha, D. Manjula, U. Kanimozhi

Abstract:

Mining frequent and high-utility itemsets has gained much significance in recent years. When data arrive sporadically, incremental and interactive rule mining and utility mining approaches can be adopted to handle users' dynamic environmental needs and avoid redundancy by reusing previous data structures and mining results. Dependence on recommendation systems has risen sharply since the advent of search engines. This paper proposes a model for building a recommendation system that suggests frequent and high-utility itemsets over dynamic datasets, supporting a cluster-based location prediction strategy that predicts users' trajectories using the Efficient Incremental Rule Mining (EIRM) algorithm and the Fast Update Utility Pattern Tree (FUUP) algorithm. Comprehensive experimental evaluations show that this scheme delivers excellent performance.
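
The incremental principle mentioned above, updating earlier mining results with each new batch instead of rescanning all the data, can be illustrated with a minimal Python sketch. It is not the EIRM or FUUP algorithm: it simply keeps support counts for all itemsets up to a hypothetical length cap `max_size` and refreshes the frequent set after each batch. The class name and sample transactions are invented for illustration.

```python
from collections import Counter
from itertools import combinations

class IncrementalSupportCounter:
    """Keeps itemset support counts so new batches update them without rescanning old data."""

    def __init__(self, max_size=2):
        self.max_size = max_size      # hypothetical cap on itemset length, keeps the sketch small
        self.counts = Counter()       # itemset (sorted tuple) -> number of transactions containing it
        self.n_transactions = 0

    def add_batch(self, transactions):
        # Only the new transactions are scanned; previous counts are reused as they are.
        for t in transactions:
            items = sorted(set(t))
            for k in range(1, self.max_size + 1):
                for itemset in combinations(items, k):
                    self.counts[itemset] += 1
        self.n_transactions += len(transactions)

    def frequent(self, min_support):
        # min_support is a relative threshold in [0, 1].
        threshold = min_support * self.n_transactions
        return {s: c for s, c in self.counts.items() if c >= threshold}

miner = IncrementalSupportCounter(max_size=2)
miner.add_batch([["bread", "milk"], ["bread", "butter"], ["milk"]])
miner.add_batch([["bread", "milk"], ["bread", "milk", "butter"]])
print(miner.frequent(0.6))  # {('bread',): 4, ('milk',): 4, ('bread', 'milk'): 3}
```

Exhaustive enumeration up to `max_size` is only practical for short transactions; tree-based structures like the one named in the abstract are intended to keep this information more compactly.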

Keywords: data sets, recommendation system, utility item sets, frequent item sets mining

30139 Application of Artificial Neural Network Technique for Diagnosing Asthma

Authors: Azadeh Bashiri

Abstract:

Introduction: Lack of proper diagnosis and inadequate treatment of asthma lead to physical and financial complications. This study aimed to use data mining techniques to create an intelligent neural network system for the diagnosis of asthma. Methods: The study population consisted of patients who had visited one of the lung clinics in Tehran. Data were analyzed using the SPSS statistical tool, and the Pearson chi-square statistic was the basis of decision making for ranking the data. The neural network was trained using the back-propagation learning technique. Results: Based on the SPSS analysis used to select the top factors, 13 effective factors were selected. In different runs the data were mixed in various ways, so different models were built for training and testing the networks; in all modes, the network correctly predicted 100% of cases. Conclusion: Applying data mining methods before designing the system structure, in order to reduce data dimensionality and choose the optimal data, leads to a more accurate system. Therefore, given the nature of medical data, data mining approaches should be considered.
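
The ranking step described in Methods, scoring each candidate factor by its Pearson chi-square statistic against the diagnosis, might be sketched in Python as follows. The study used SPSS; this standalone version, the factor names, and the four records are hypothetical. Selecting the top 13 factors would correspond to taking the first 13 entries of the ranked list.

```python
from collections import Counter

def chi_square(feature, label):
    """Pearson chi-square statistic for the contingency table of two categorical lists."""
    n = len(feature)
    f_counts, l_counts = Counter(feature), Counter(label)
    observed = Counter(zip(feature, label))
    stat = 0.0
    for f in f_counts:
        for l in l_counts:
            expected = f_counts[f] * l_counts[l] / n   # expected count under independence
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat

def rank_features(records, label_key):
    """Rank every non-label attribute by its chi-square score against the label."""
    labels = [r[label_key] for r in records]
    names = [k for k in records[0] if k != label_key]
    scores = {k: chi_square([r[k] for r in records], labels) for k in names}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical patient records.
records = [
    {"wheezing": "yes", "smoker": "no",  "asthma": "yes"},
    {"wheezing": "yes", "smoker": "yes", "asthma": "yes"},
    {"wheezing": "no",  "smoker": "yes", "asthma": "no"},
    {"wheezing": "no",  "smoker": "no",  "asthma": "no"},
]
print(rank_features(records, "asthma"))  # [('wheezing', 4.0), ('smoker', 0.0)]
```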

Keywords: asthma, data mining, Artificial Neural Network, intelligent system

30138 Hybrid Approximate Structural-Semantic Frequent Subgraph Mining

Authors: Montaceur Zaghdoud, Mohamed Moussaoui, Jalel Akaichi

Abstract:

Frequent subgraph mining usually relies on graph matching and is widely used when analyzing big data with large graphs. Many research works have dealt with exact or inexact structural graph matching, but little attention has been paid to semantic matching when graph vertices and/or edges are attributed and typed. It therefore seems worthwhile to integrate background knowledge into the analysis, so that extracted frequent subgraphs are further pruned by a new semantic filter instead of relying only on structural similarity during graph matching. Consequently, this paper focuses on developing a new hybrid approximate structural-semantic graph matching approach to discover a set of frequent subgraphs. It simultaneously uses an approximate structural similarity function based on graph edit distance and a possibilistic vertex similarity function based on an affinity function. The structural and semantic filters together prune the extracted frequent set. The new hybrid structural-semantic frequent subgraph mining approach should be suitable for several applications, such as community detection in social networks.

Keywords: approximate graph matching, hybrid frequent subgraph mining, graph mining, possibility theory

30137 Data Mining Approach for Commercial Data Classification and Migration in Hybrid Storage Systems

Authors: Mais Haj Qasem, Maen M. Al Assaf, Ali Rodan

Abstract:

Parallel hybrid storage systems consist of a hierarchy of different storage devices that vary in data reading speed. The higher the level in the hierarchy, the faster the data reading speed. Thus, migrating an application's important data, which will be accessed in the near future, to the uppermost level reduces the application's I/O waiting time and hence its elapsed execution time. In this research, we implement a trace-driven, two-level parallel hybrid storage system prototype consisting of HDDs and SSDs. The prototype uses data mining techniques to classify the application's data in order to predict its near-future data accesses, in parallel with its on-demand requests. The important data (i.e., the data that the application will access in the near future) are continuously migrated to the uppermost level of the hierarchy. Our simulation results show that our data migration approach, integrated with data mining techniques, reduces the application's elapsed execution time by at least 22% across a variety of traces.

Keywords: hybrid storage system, data mining, recurrent neural network, support vector machine

30136 Classification Rule Discovery by Using Parallel Ant Colony Optimization

Authors: Waseem Shahzad, Ayesha Tahir Khan, Hamid Hussain Awan

Abstract:

The Ant-Miner algorithm, which belongs to the family of ant colony optimization (ACO) algorithms, is used to extract knowledge from data in the form of rules. A variant of Ant-Miner named cAnt-MinerPB generates a list of rules using the Pittsburgh approach in order to preserve the interaction among the generated rules. In this paper, we propose a parallel Ant-MinerPB in which the ACO algorithm runs in parallel. In this technique, the data set is divided vertically (i.e., by attributes) into different subsets. These subsets are created based on the correlation among attributes, measured by mutual information (MI). Rules are generated in parallel and then merged to form a final list of rules. The results show that the proposed technique achieves higher accuracy than the original cAnt-MinerPB while also reducing execution time.
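
The vertical partitioning step, grouping attributes by mutual information (MI) before a colony is run on each group, might look roughly like the Python sketch below. The greedy grouping rule and the `threshold` parameter are assumptions made for illustration; the abstract does not specify how the subsets are formed from the MI values.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """MI (in bits) between two discrete attributes given as equal-length lists."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

def partition_attributes(columns, threshold):
    """Greedy grouping: an attribute joins the first group whose seed shares MI >= threshold with it."""
    groups = []
    for name in columns:
        for group in groups:
            if mutual_information(columns[group[0]], columns[name]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])   # no sufficiently correlated group: start a new subset
    return groups

columns = {
    "outlook": ["sun", "sun", "rain", "rain"],
    "humid":   ["hi", "hi", "lo", "lo"],
    "windy":   ["y", "n", "y", "n"],
}
print(partition_attributes(columns, threshold=0.5))  # [['outlook', 'humid'], ['windy']]
```

Each resulting attribute subset could then be handed to its own colony, and the rule lists merged afterwards as the abstract describes.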

Keywords: ant colony optimization, parallel Ant-MinerPB, vertical partitioning, classification rule discovery

30135 Determination of the Risks of Heart Attack at the First Stage as Well as Their Control and Resource Planning with the Method of Data Mining

Authors: İbrahim Kara, Seher Arslankaya

Abstract:

Frequently used in engineering in particular, data mining has now begun to be used in health as well, since health-sector data have reached great dimensions. Data mining aims to reveal models from large amounts of raw data in line with a given purpose, and to search for rules and relationships that enable predictions about the future from large data sets. It helps decision-makers find relationships among the data at the decision-making stage. In this study, we aim to determine the risk of heart attack at the first stage, to control it, and to plan resources using data mining. Through early and correct diagnosis of heart attacks, we aim to reveal the factors that affect the disease, protect health and choose the right treatment methods, reduce health expenditures, and shorten patients' hospital stays. In this way, the diagnosis and treatment costs of a heart attack will be scrutinized, which will help determine the risk of the disease at the first stage, control it, and plan resources.

Keywords: data mining, decision support systems, heart attack, health sector

30134 Generation of Knowledge with Self-Learning Methods for Ophthalmic Data

Authors: Klaus Peter Scherer, Daniel Knöll, Constantin Rieder

Abstract:

Problem and Purpose: Intelligent systems are available and helpful for supporting human decision processes, especially when complex surgical eye interventions must be performed. Such a decision support system normally consists of a knowledge-based module, which provides the actual assistance through explanation and logical reasoning processes. Acquiring and generating the complex knowledge itself through interviews is a critical step, because there are various correlations between the complex parameters. Therefore, in this project, (semi-)automated self-learning methods are researched and developed to enhance the quality of such a decision support system. Methods: For ophthalmic data sets of real hospital patients, advanced data mining procedures appear very helpful. In particular, subgroup analysis methods are developed, extended, and used to find correlations and conditional dependencies in the structured patient data. After causal dependencies are found, a ranking must be performed to generate rule-based representations. For this purpose, anonymized patient data are transformed into a special machine-readable format. The imported data serve as input to conditional probability algorithms that calculate parameter distributions with respect to a given goal parameter. Results: In the field of knowledge discovery, advanced methods and applications were used to produce operation- and patient-related correlations. New knowledge was generated by finding causal relations between the operational equipment, the medical instances, and patient-specific history through a dependency ranking process. After transformation into association rules, logically based representations were available for clinical experts to evaluate the new knowledge. The structured data sets comprise about 80 parameters as characteristic features per patient. For different extended patient groups (100, 300, 500), both single-target and multi-target values were set for the subgroup analysis, so that the newly generated hypotheses could be interpreted with regard to their dependence on, or independence of, the number of patients. Conclusions: The aim and advantage of such a semi-automated self-learning process is the extension of the knowledge base by finding new parameter correlations. The discovered knowledge is transformed into association rules and serves as a rule-based representation of the knowledge in the knowledge base. Moreover, more than one goal parameter of interest can be considered by the semi-automated learning process. With ranking procedures, the strongest premises, as well as conjunctive conditions, can be found to conclude the goal parameter of interest. Thus, knowledge hidden in structured tables or lists can be extracted as a rule-based representation. This provides real assistance in communicating with clinical experts.
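
The ranking of premises by conditional probability with respect to a goal parameter can be illustrated with a small Python sketch that scores single `attribute=value` premises by P(goal | premise). The function name, the `min_count` cutoff, and the records are hypothetical; the project's actual subgroup analysis also handles conjunctive premises and multiple targets, which this sketch omits.

```python
from collections import Counter

def rank_premises(records, goal, goal_value, min_count=2):
    """Rank single attribute=value premises by P(goal == goal_value | premise), then by support."""
    premise_counts, hit_counts = Counter(), Counter()
    for r in records:
        hit = r[goal] == goal_value
        for attr, val in r.items():
            if attr == goal:
                continue
            premise_counts[(attr, val)] += 1
            hit_counts[(attr, val)] += hit      # bool adds as 0 or 1
    ranked = [(p, hit_counts[p] / c, c) for p, c in premise_counts.items() if c >= min_count]
    return sorted(ranked, key=lambda t: (t[1], t[2]), reverse=True)

# Hypothetical anonymized records.
records = [
    {"lens": "A", "age": "old",   "outcome": "good"},
    {"lens": "A", "age": "young", "outcome": "good"},
    {"lens": "B", "age": "old",   "outcome": "poor"},
    {"lens": "B", "age": "young", "outcome": "good"},
    {"lens": "A", "age": "old",   "outcome": "good"},
]
for (attr, val), conf, n in rank_premises(records, "outcome", "good"):
    print(f"IF {attr}={val} THEN outcome=good  (confidence {conf:.2f}, n={n})")
```

Each printed line reads directly as an association rule, with the conditional probability serving as its confidence.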

Keywords: expert system, knowledge-based support, ophthalmic decision support, self-learning methods

30133 The Use of Classifiers in Image Analysis of Oil Wells Profiling Process and the Automatic Identification of Events

Authors: Jaqueline Maria Ribeiro Vieira

Abstract:

Different strategies and tools are available in the oil and gas industry for detecting and analyzing tension and possible fractures in borehole walls. Most of these techniques are based on manual observation of the captured borehole images. While this strategy may be feasible and convenient with small images and little data, it can become difficult and error-prone when large image databases must be processed. Moreover, the patterns may differ across the image area, depending on many characteristics (drilling strategy, rock components, rock strength, etc.). We previously developed and proposed a novel strategy, based on segmented images, capable of detecting patterns in borehole images that may point to regions with tension and breakout characteristics. In this work, we propose including data-mining classification strategies in order to create a knowledge database of the segmented curves. After users have spent some time manually marking the parts of borehole images that correspond to tension regions and breakout areas, these classifiers allow the system to automatically indicate and suggest new candidate regions with higher accuracy. We suggest using different classifier methods in order to achieve different knowledge data set configurations.

Keywords: image segmentation, oil well visualization, classifiers, data-mining, visual computer

30132 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data, and all kinds of business data. These data reflect different aspects of people, events, and activities. Data generated by different systems differ in form, and their sources differ because they may come from different sectors. To reflect one or several facets of an event or rule, data from multiple sources need to be fused. Data from different sources, collected in different ways, raise several issues that must be resolved. Data fusion problems include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, and resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, and scenario prediction in order to achieve pattern discovery, law verification, root cause analysis, and public opinion monitoring. The result of multi-source data fusion is a uniform central database that includes people data, location data, object data, institution data, business data, and spatial data. Metadata must be available for reference whenever applications need to access, manipulate, and display the data. Uniform metadata management ensures the effectiveness and consistency of data throughout data exchange, modeling, cleansing, loading, storage, analysis, search, and delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

30131 Troubleshooting Petroleum Equipment Based on Wireless Sensors Based on Bayesian Algorithm

Authors: Vahid Bayrami Rad

Abstract:

In this research, common methods and techniques are investigated with a focus on intelligent fault-finding and monitoring systems in the oil industry. Remote and intelligent control methods are considered a necessity for carrying out various operations in the oil industry, and exploiting the knowledge extracted by data mining algorithms from the vast amounts of data generated is an unavoidable way to speed up monitoring and troubleshooting in today's large oil companies. Therefore, after comparing data mining algorithms and examining their efficiency, structure, and behavior under different conditions, the proposed (Bayesian) algorithm, which uses data clustering and analysis together with data evaluation through a colored Petri net, provides an applicable and dynamic model in terms of reliability and response time. Using this method, it is possible to achieve a dynamic and consistent model of the remote control system, prevent leakage in oil pipelines and refineries, and reduce costs as well as human and financial errors. Statistical data obtained from the evaluation process show that the proposed method offers higher reliability, availability, and speed than previous methods.

Keywords: wireless sensors, petroleum equipment troubleshooting, Bayesian algorithm, colored Petri net, RapidMiner, data mining, reliability

30130 Privacy Preserving in Association Rule Mining on Horizontally Partitioned Database

Authors: Manvar Sagar, Nikul Virpariya

Abstract:

Advances in data mining techniques play an important role in many applications. In the context of privacy and security, the problems caused by association rule mining have been investigated by many researchers. It has been shown that misuse of this technique may reveal the database owner's sensitive and private information to others. Many researchers have worked to preserve privacy in association rule mining. Of the two basic approaches to privacy-preserving data mining, namely randomization-based and cryptography-based, the latter provides a high level of privacy but incurs higher computational and communication overhead. Hence, it is necessary to explore alternative techniques that reduce these overheads. In this work, we propose an efficient, collusion-resistant, cryptography-based approach for distributed association rule mining using Shamir's secret sharing scheme. As our theoretical and practical analysis shows, our approach is provably secure and requires a trusted third party only once. We use secret sharing to share information privately, and a code-based identification scheme to add support against malicious adversaries.
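
As a minimal sketch of how Shamir's secret sharing lets sites add up local support counts without revealing them: each site splits its count into shares, each party sums the shares it receives, and any `threshold` of those partial sums reconstruct only the global total. The prime, the three-site setup, and the counts are illustrative assumptions, and the paper's collusion-resistance and code-based identification mechanisms are not shown.

```python
import secrets

PRIME = 2_147_483_647  # a Mersenne prime; all arithmetic is modulo this (illustrative choice)

def make_shares(secret, n_parties, threshold):
    """Split `secret` into n shares on a random polynomial of degree threshold - 1."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_parties + 1):
        y = 0
        for c in reversed(coeffs):          # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the shared value."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME  # modular inverse (Python 3.8+)
    return total

# Each site's local support count for one candidate itemset (hypothetical values).
local_counts = [120, 75, 210]
n, t = 3, 2
all_shares = [make_shares(c, n, t) for c in local_counts]

# Party k adds the k-th share from every site; no party sees another site's raw count.
summed = [(k + 1, sum(s[k][1] for s in all_shares) % PRIME) for k in range(n)]
print(reconstruct(summed[:t]))  # 405, the global support count
```

The sum works because shares are additive: adding the share polynomials point by point yields shares of a polynomial whose constant term is the total count.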

Keywords: privacy, Privacy Preservation in Data Mining (PPDM), horizontally partitioned database, EMHS, MFI, Shamir secret sharing

30129 Signal Strength Based Multipath Routing for Mobile Ad Hoc Networks

Authors: Chothmal

Abstract:

In this paper, we present a route discovery process that uses the signal strength on a link as a criterion for including that link in a route. The proposed signal-to-interference-plus-noise ratio (SINR)-based multipath reactive routing protocol is named SINR-MP. It has the following two features: a) SINR-MP selects routes based on the SINR of the links during route discovery, and therefore selects routes with long lifetimes and low frame error rates for data transmission; and b) its route discovery process is multipath, discovering more than one SINR-based route between a given source-destination pair. The multiple routes selected by SINR-MP are node-disjoint, which increases their robustness against link failures, since the failure of one route does not affect the other. The secondary route is very useful when the primary route breaks, because it can then be used without initiating a new route discovery process. This avoids the network overhead of a new route discovery and greatly increases network performance. The proposed SINR-MP routing protocol is implemented in the trial version of the network simulator QualNet.
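
The two SINR-MP features described above, SINR-based link admission and node-disjoint multipath discovery, can be illustrated with a centralized Python sketch over a known topology. This is not the protocol itself: SINR-MP is reactive and distributed, whereas the sketch uses plain breadth-first search, a hypothetical `min_sinr_db` threshold, and made-up link SINR values.

```python
from collections import deque

def shortest_path(adj, src, dst, banned):
    """Breadth-first search for a minimum-hop path that avoids nodes in `banned`."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, []):
            if v not in prev and v not in banned:
                prev[v] = u
                queue.append(v)
    return None

def sinr_disjoint_routes(links, src, dst, min_sinr_db, k=2):
    """Return up to k node-disjoint routes that use only links meeting the SINR threshold."""
    adj = {}
    for (a, b), sinr in links.items():
        if sinr >= min_sinr_db:               # link admission by SINR (links assumed symmetric)
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)
    routes, banned = [], set()
    for _ in range(k):
        path = shortest_path(adj, src, dst, banned)
        if path is None or path in routes:    # no further disjoint route exists
            break
        routes.append(path)
        banned.update(path[1:-1])             # forbid reuse of intermediate nodes
    return routes

# Hypothetical link SINR values in dB.
links = {("S", "A"): 18.0, ("A", "D"): 15.0, ("S", "B"): 12.0,
         ("B", "C"): 11.0, ("C", "D"): 14.0, ("S", "E"): 3.0, ("E", "D"): 20.0}
print(sinr_disjoint_routes(links, "S", "D", min_sinr_db=10.0))
# [['S', 'A', 'D'], ['S', 'B', 'C', 'D']]
```

The low-SINR link S-E is never admitted, so the strong E-D link cannot be reached; the second route is kept as the backup that avoids a fresh discovery when the first breaks.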

Keywords: ad hoc networks, quality of service, video streaming, H.264/SVC, multiple routes, video traces

30128 Curriculum Check in Industrial Design, Based on Knowledge Management in Iran Universities

Authors: Maryam Mostafaee, Hassan Sadeghi Naeini, Sara Mostowfi

Abstract:

Today, knowledge management (KM) plays an important role in organizations. Basically, knowledge management concerns making the best use of an organization's workforce to advance the organization's goals and demands. The purpose of KM is not only to manage the existing documentation, information, and data in an organization; its most important part is to control the key elements of that information and data, that is, to deliver the information employees need, at the right time and from genuine sources, so that they achieve the best performance and results, and the organization's performance is maximized in turn. Many definitions of management have been published. Management is the science that repeatedly applies accurate knowledge to shape the organization and take full advantage of it in reaching its goals and targets, for use by employees and users. Knowledge, according to the Collins dictionary, is "facts, emotions or experiences known by man or group of people"; management, according to the Merriam-Webster Dictionary, is "the act or skill of controlling and making decisions about a business, department, sports team, etc."; knowledge management, according to the Oxford Dictionary, is "efficient handling of information and resources within a commercial organization"; and industrial design, also according to the Oxford Dictionary, is "the art or process of designing manufactured products" (e.g., "the scale is a beautiful work of industrial design"). When knowledge management is implemented in universities, the discovery and creation of new knowledge are facilitated, and procedures for knowledge exchange between different units are established. College officials and employees come to understand the importance of knowledge for the university's success and make more effort to prevent errors.
This strategy explores the influencing factors and trends and how they are managed in the university. In this research, Iranian universities were analyzed over a period of time to determine how they behave with respect to the use of knowledge management and how they understand it, in terms of: 1. discovery of knowledge management in Iranian universities; 2. transfer of existing knowledge between faculties and units; 3. employee participation in acquiring, using, and transferring knowledge; 4. accessibility of valid sources; 5. research on factors and correct processes in the university. Examples of the aspects analyzed include: enabling better and faster decision-making; making it easy to find relevant information and resources; reusing ideas, documents, and expertise; and avoiding redundant effort. Consequence: The effectiveness of knowledge management in the industrial design field was found to be low. Based on checklists filled in by education officials and professors at the universities, and on the calculated coefficient of effectiveness, knowledge management has not attained its proper place.

Keywords: knowledge management, industrial design, educational curriculum, learning performance

30127 Survey Research Assessment for Renewable Energy Integration into the Mining Industry

Authors: Kateryna Zharan, Jan C. Bongaerts

Abstract:

Mining operations are energy-intensive, and the share of energy costs in total costs is often quoted at around 40%. Saving on energy costs is therefore a key concern for any mine operator. With the improving reliability and security of renewable energy (RE) sources, and requirements to reduce carbon dioxide emissions, prospects for using RE in mining operations are emerging. These factors are stimulating mining companies to search for ways to substitute RE for fossil energy. The main purpose of this paper is to present the outcomes of a survey conducted among mining and renewable energy experts to identify the key issues related to integrating RE into mining activities and to assess its feasibility in mining operations. The survey was developed as follows: first, the mining and renewable energy experts were chosen based on specific criteria; second, they were given a questionnaire to gather their knowledge and opinions on incentives for mining operators to turn to RE, expected barriers and challenges, environmental effects, appropriate business models, and the overall impact of RE on mining operations. The survey outcomes allow the identification of factors that favor and disfavor decisions on the use of RE in mining operations. The paper concludes with a set of recommendations for further study: one calls for a deeper analysis of the benefits for mining operators when using RE, and another suggests that appropriate business models considering economic and environmental issues need to be studied and developed. The results of the paper will be used to develop a hybrid optimized model that might be adopted at mines according to their operational processes as well as their economic and environmental perspectives.

Keywords: carbon dioxide emissions, mining industry, photovoltaic, renewable energy, survey research, wind generation

30126 Identify Users Behavior from Mobile Web Access Logs Using Automated Log Analyzer

Authors: Bharat P. Modi, Jayesh M. Patel

Abstract:

The mobile Internet is a major source of data. As the number of web pages continues to grow, the mobile web provides data miners with just the right ingredients for extracting information. To cater to this growing need, the term mobile web mining was coined. Mobile web mining uses data mining techniques to extract potentially useful information from web data. Web usage mining deals with understanding user behavior through the mobile web access logs that the server generates while users access the website. A web access log comprises various entries, such as the user's name, IP address, number of bytes transferred, and timestamp. A variety of log analyzer tools exist that help analyze, for example, users' navigational patterns and the parts of the website users are most interested in. This paper uses one such tool, Mobile Web Log Expert, to ascertain the behavior of users who access an astrology website. It also provides a comparative study of a few available log analyzer tools.
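
The kind of entries listed above can be pulled out of a raw access log with a few lines of Python. The sketch assumes the server writes the Apache/NCSA Combined Log Format (the abstract does not say which format the astrology site used), counts successful page requests per path and requests per client IP, and uses a crude user-agent check for mobile clients; the sample lines are invented. Dedicated log analyzer tools do this, and much more, at scale.

```python
import re
from collections import Counter

# Apache/NCSA Combined Log Format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match the format."""
    m = LOG_RE.match(line)
    if not m:
        return None
    entry = m.groupdict()
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

def summarize(lines):
    entries = [e for e in map(parse_line, lines) if e]
    pages = Counter(e["path"] for e in entries if e["status"].startswith("2"))
    visitors = Counter(e["ip"] for e in entries)
    mobile = sum("Mobile" in e["agent"] for e in entries)  # crude user-agent heuristic
    return pages, visitors, mobile

sample = [
    '10.0.0.1 - - [10/Oct/2017:13:55:36 +0530] "GET /horoscope/leo HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Linux; Android 7.0) Mobile Safari"',
    '10.0.0.2 - - [10/Oct/2017:13:56:01 +0530] "GET /horoscope/leo HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (iPhone) Mobile Safari"',
    '10.0.0.1 - - [10/Oct/2017:13:57:12 +0530] "GET /missing HTTP/1.1" 404 - "-" "Mozilla/5.0 (Windows NT 10.0)"',
]
pages, visitors, mobile = summarize(sample)
print(pages.most_common(1), visitors["10.0.0.1"], mobile)  # [('/horoscope/leo', 2)] 2 2
```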

Keywords: mobile web access logs, web usage mining, web server, log analyzer

30125 Spatial Data Mining by Decision Trees

Authors: Sihem Oujdi, Hafida Belbachir

Abstract:

Existing data mining methods cannot be applied directly to spatial data, because spatial data require consideration of spatial specifics such as spatial relationships. This paper focuses on classification with decision trees, one of the data mining techniques. We propose an extension of the C4.5 algorithm for spatial data, based on two different approaches: join materialization and on-the-fly querying of the different tables. Similar work has been done on these two main approaches: the first, join materialization, favors processing time at the expense of memory space, whereas the second, on-the-fly querying of the different tables, saves memory space at the expense of processing time. The modified C4.5 algorithm requires three input tables: a target table, a neighbor table, and a spatial join index that contains the possible spatial relationships between the objects in the target table and those in the neighbor table. The proposed algorithms are applied to spatial data from the accidentology domain. A comparative study of our approach with other work on classification by spatial decision trees is detailed.
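
The split criterion that C4.5 uses, gain ratio, is easy to sketch in Python. Under the join-materialization approach, a spatial predicate derived from the neighbor table through the spatial join index (here, a hypothetical `near_school` flag per accident) becomes one more column of the materialized target table before the gain ratio is computed. The table contents, attribute names, and the dictionary standing in for the join index are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr, target):
    """C4.5 gain ratio of splitting `rows` (list of dicts) on a categorical `attr`."""
    labels = [r[target] for r in rows]
    parts = defaultdict(list)
    for r in rows:
        parts[r[attr]].append(r[target])
    n = len(rows)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = entropy([r[attr] for r in rows])
    return 0.0 if split_info == 0 else (entropy(labels) - remainder) / split_info

# Target table (accidents) and a simplified join index: accident id -> related neighbor objects.
accidents = [
    {"id": 1, "severity": "high"}, {"id": 2, "severity": "high"},
    {"id": 3, "severity": "low"},  {"id": 4, "severity": "low"},
]
join_index = {1: ["school_7"], 2: ["school_7"], 3: [], 4: ["shop_2"]}

# Join materialization: precompute the spatial predicate as an ordinary column.
for a in accidents:
    a["near_school"] = any(obj.startswith("school") for obj in join_index[a["id"]])

print(round(gain_ratio(accidents, "near_school", "severity"), 3))  # 1.0
```

The on-the-fly alternative would instead consult the join index each time a candidate split is evaluated, trading processing time for memory as the abstract notes.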

Keywords: C4.5 algorithm, decision trees, S-CART, spatial data mining

30124 Application of Granular Computing Paradigm in Knowledge Induction

Authors: Iftikhar U. Sikder

Abstract:

This paper illustrates an application of a granular computing approach, namely rough set theory, in data mining. The paper outlines the formalism of granular computing and elucidates the mathematical underpinnings of rough set theory, which has been widely used by the data mining and machine learning communities. A real-world application is illustrated, and its classification performance is compared with that of other competing machine learning algorithms. The rough set rule induction model performs competitively with these algorithms in predictive performance.
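
The core idea of rough set theory that the paper builds on, approximating a concept by the equivalence classes (granules) of an indiscernibility relation, can be sketched in a few lines of Python. The toy decision table is invented; reducts and rule induction, which the paper also covers, are not shown.

```python
from collections import defaultdict

def granules(table, attrs):
    """Partition object ids into equivalence classes of the indiscernibility relation on `attrs`."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def approximations(table, attrs, concept):
    """Lower and upper approximations of the object set `concept`."""
    lower, upper = set(), set()
    for g in granules(table, attrs):
        if g <= concept:        # granule lies entirely inside the concept
            lower |= g
        if g & concept:         # granule overlaps the concept
            upper |= g
    return lower, upper

table = {
    "p1": {"fever": "yes", "cough": "yes", "flu": "yes"},
    "p2": {"fever": "yes", "cough": "yes", "flu": "no"},
    "p3": {"fever": "no",  "cough": "yes", "flu": "no"},
    "p4": {"fever": "yes", "cough": "no",  "flu": "yes"},
}
flu = {o for o, r in table.items() if r["flu"] == "yes"}
lower, upper = approximations(table, ["fever", "cough"], flu)
print(sorted(lower), sorted(upper))  # ['p4'] ['p1', 'p2', 'p4']
```

Objects in the upper but not the lower approximation (p1 and p2) form the boundary region: they are indiscernible on fever and cough yet carry different flu labels, so rules induced from the lower approximation are certain, while those from the upper approximation are only possible.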

Keywords: concept approximation, granular computing, reducts, rough set theory, rule induction

30123 Assessment of Prevalent Diseases Caused by Mining Activities in the Northern Part of Mindanao Island, Philippines

Authors: Odinah Cuartero-Enteria, Kyla Rita Mercado, Jason Salamanes, Aian Pecasales, Sherwin Sabado

Abstract:

The northern part of Mindanao Island, Philippines, has sizable reserves of mineral resources. Over the years, mining activities have flourished, bringing local economic gain but also environmental concerns. This study investigates the prevalent diseases caused by mining activities in these areas, using secondary data gathered from the Rural Health Units (RHUs) of the selected areas. It further determined the prevalent diseases in the three areas in 2005, 2010, and 2015, covering periods both before and during mining activities. The results show that areas far from mining activities have fewer patients suffering from airborne diseases. The top ten most common diseases, such as pneumonia, tuberculosis, influenza, upper respiratory tract infection (URTI), and skin diseases, were caused by air pollution. Hence, mining activities contribute to the prevalent diseases in the places where they are present, and addressing the air pollution caused by mining activities is very important.

Keywords: Philippines, Mindanao Island, mining activities, pollution, prevalent diseases

30122 Mining Scientific Literature to Discover Potential Research Data Sources: An Exploratory Study in the Field of Haemato-Oncology

Authors: A. Anastasiou, K. S. Tingay

Abstract:

Background: Discovering suitable datasets is an important part of health research, particularly for projects working with clinical data from patients organized in cohorts (cohort data). However, with the proliferation of national and international initiatives, it is becoming increasingly difficult for research teams to locate the real-world datasets most relevant to their project objectives. We present a method for identifying healthcare institutes in the European Union (EU) that may hold haemato-oncology (HO) data. A key enabler of this research was the bibInsight platform, a scientometric data management and analysis system developed by the authors at Swansea University. Method: A PubMed search was conducted using HO clinical terms taken from previous work. The resulting XML file was processed using the bibInsight platform, linking affiliations to the Global Research Identifier Database (GRID). GRID is an international, standardized list of institutions that records the city and country in which each institution is located, as well as a category for its main business type, e.g., Academic, Healthcare, Government, Company. Countries were limited to the 28 EU member states at the time, and institute type to 'Healthcare'. An article was considered valid if at least one author was affiliated with an EU-based healthcare institute. Results: The PubMed search produced 21,310 articles, with 9,885 distinct affiliations that had a match in GRID. Of these affiliations, 760 were in EU countries, and 390 of these were healthcare institutes. One affiliation was excluded because it was a veterinary hospital. Two EU countries had no publications in our analysis dataset. The results were analysed by country and by individual healthcare institute. Networks both within the EU and internationally show institutional collaborations, which may suggest a willingness to share data for research purposes. Geographical mapping can help ensure that data has broad population coverage. Collaborations with industry or government may exclude healthcare institutes that have embargoes or additional costs associated with data access. Conclusions: Data reuse is becoming increasingly important both for ensuring the validity of results and for making economical use of available resources. The ability to identify potential, specific data sources from over twenty thousand articles in less than an hour could help improve knowledge of, and access to, data sources. As our method does not yet establish whether these healthcare institutes hold data or merely publish on the topic, future work will involve text mining of data-specific concordant terms to identify numbers of participants, demographics, study methodologies, and sub-topics of interest.
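The article-validity rule in the Method section can be sketched in Python as follows. The record layouts (`Institute`, `Article`), the GRID IDs, and the short country list are hypothetical stand-ins, not the bibInsight schema or the full list of 28 member states; the sketch only shows the rule that an article is kept when at least one matched affiliation is an EU institute of type 'Healthcare', with an optional exclusion set for cases such as the veterinary hospital.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Institute:
    """Hypothetical subset of a GRID record."""
    grid_id: str
    name: str
    country: str
    types: tuple

@dataclass(frozen=True)
class Article:
    """Hypothetical parsed PubMed record: GRID IDs matched to author affiliations."""
    pmid: str
    affiliation_grid_ids: tuple

# Illustrative subset only; the study used all 28 EU member states of the time.
EU_COUNTRIES = {"United Kingdom", "Germany", "France", "Italy", "Spain"}

def is_eu_healthcare(inst):
    return inst.country in EU_COUNTRIES and "Healthcare" in inst.types

def valid_articles(articles, grid, excluded=frozenset()):
    """Return PMIDs with at least one author at an EU healthcare institute."""
    kept = []
    for art in articles:
        # Affiliations with no GRID match, or manually excluded ones, are skipped.
        matched = [grid[g] for g in art.affiliation_grid_ids
                   if g in grid and g not in excluded]
        if any(is_eu_healthcare(inst) for inst in matched):
            kept.append(art.pmid)
    return kept

# Made-up example data.
sample_grid = {
    "grid.1": Institute("grid.1", "Hospital A", "Germany", ("Healthcare",)),
    "grid.2": Institute("grid.2", "Ministry B", "Germany", ("Government",)),
    "grid.3": Institute("grid.3", "Clinic C", "United States", ("Healthcare",)),
}
sample_articles = [
    Article("1", ("grid.1", "grid.3")),
    Article("2", ("grid.2",)),
    Article("3", ("grid.3",)),
    Article("4", ("grid.9",)),
]
print(valid_articles(sample_articles, sample_grid))  # ['1']
```

Only article 1 survives: article 2 has an EU institute of the wrong type, article 3 a healthcare institute outside the EU, and article 4 no GRID match at all.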

Keywords: data reuse, data discovery, data linkage, journal articles, text mining

Procedia PDF Downloads 115
30121 Efficient Frequent Itemset Mining Methods over Real-Time Spatial Big Data

Authors: Hamdi Sana, Emna Bouazizi, Sami Faiz

Abstract:

In recent years, there has been a huge increase in the use of spatio-temporal applications where data and queries are continuously moving. As a result, the need to process real-time spatio-temporal data is clear, and real-time stream data management has become a hot topic. The sliding window model and frequent itemset mining over dynamic data are among the most important problems in data mining. The sliding window model for frequent itemset mining is widely used in data stream mining because of its emphasis on recent data and its bounded memory requirement. These methods use the traditional transaction-based sliding window model, where the window size is a fixed number of transactions. This model assumes that transactions arrive at a constant rate, which does not hold for real-time applications, and using it in such applications endangers their performance. Based on these observations, this paper relaxes the notion of window size and proposes a timestamp-based sliding window model. In our proposed frequent itemset mining algorithm, support conditions are used to differentiate frequent and infrequent patterns. A tree is then built to incrementally maintain the essential information. We evaluate our contribution, and the preliminary results are quite promising.
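A timestamp-based window can be sketched as below, assuming transactions arrive in non-decreasing timestamp order. The class name, its parameters, the plain `Counter` (used instead of the paper's tree), and the cap on itemset length are assumptions made to keep the example short; it shows only how evicting by time, rather than by transaction count, lets the window size follow the arrival rate.

```python
from collections import Counter, deque
from itertools import combinations

class TimeWindowMiner:
    """Frequent itemsets over a timestamp-based sliding window (sketch).

    The window holds every transaction with timestamp in (now - span, now],
    so the number of transactions in it varies with the arrival rate.
    """

    def __init__(self, span, min_support, max_len=2):
        self.span = span
        self.min_support = min_support  # relative support threshold
        self.max_len = max_len          # longest itemset tracked (simplification)
        self.window = deque()           # (timestamp, itemsets) in arrival order
        self.counts = Counter()

    def _itemsets(self, items):
        items = sorted(set(items))
        return [c for k in range(1, self.max_len + 1) for c in combinations(items, k)]

    def add(self, timestamp, items):
        # Evict transactions that have fallen out of the time window.
        while self.window and self.window[0][0] <= timestamp - self.span:
            _, old = self.window.popleft()
            self.counts.subtract(old)
        sets = self._itemsets(items)
        self.window.append((timestamp, sets))
        self.counts.update(sets)

    def frequent(self):
        threshold = self.min_support * len(self.window)
        return {s: c for s, c in self.counts.items() if c > 0 and c >= threshold}

miner = TimeWindowMiner(span=10, min_support=0.6)
miner.add(0, ["a", "b"])
miner.add(3, ["a", "c"])
miner.add(12, ["b", "c"])  # evicts the transaction stamped 0
print(miner.frequent())
```

With a fixed count-based window of three transactions, the stale transaction at time 0 would still be counted; here it drops out as soon as the clock passes it.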

Keywords: real-time spatial big data, frequent itemset, transaction-based sliding window model, timestamp-based sliding window model, weighted frequent patterns, tree, stream query

Procedia PDF Downloads 160
30120 Development of Knowledge Discovery Based Interactive Decision Support System on Web Platform for Maternal and Child Health System Strengthening

Authors: Partha Saha, Uttam Kumar Banerjee

Abstract:

Maternal and Child Healthcare (MCH) has always been regarded as one of the most important issues globally. Reducing maternal and child mortality rates and increasing healthcare service coverage were declared targets of the Millennium Development Goals (MDGs) up to 2015 and thereafter became an important component of the Sustainable Development Goals. Over the last decade, worldwide MCH indicators have improved but have not reached the expected levels. Progress on both maternal and child mortality rates has been monitored by several researchers, and each of these studies has stated that fewer than 26% of low- and middle-income countries (LMICs) were on track to achieve the targets prescribed by MDG4. As of 2011, the average worldwide annual rates of reduction of the under-five mortality rate and the maternal mortality rate were 2.2% and 1.9% respectively, whereas minimum rates of 4.4% and 5.5% per year were needed to achieve the targets. Despite proven healthcare interventions for both mothers and children, these could not be scaled up to the required volume because of fragmented health systems, especially in developing and under-developed countries. In this research, a knowledge discovery based interactive Decision Support System (DSS) has been developed on a web platform to assist healthcare policy makers in developing evidence-based policies. Achieving the desired results in MCH requires efficient resource planning, and in most LMICs resources are a major constraint. The knowledge generated through this system would help healthcare managers to develop strategic resource plans for combating issues such as large inequities and low coverage in MCH. The system would help healthcare managers to accomplish the following four tasks: a) comprehending region-wise conditions of variables related to MCH, b) identifying relationships among variables, c) segmenting regions based on variable status, and d) finding segment-wise key influential variables that have a major impact on healthcare indicators. The whole system development process was divided into three phases: i) identifying contemporary issues related to MCH services and policy making; ii) developing the system; and iii) verifying and validating the system. More than 90 variables under three categories, namely a) educational, social, and economic parameters; b) MCH interventions; and c) health system building blocks, have been included in this web-based DSS, and five separate modules have been developed under the system. The first module is designed for analysing the current healthcare scenario. The second module helps healthcare managers to understand correlations among variables. The third module reveals frequently occurring incidents along with different MCH interventions. The fourth module segments regions based on the three categories mentioned above, and the fifth module identifies segment-wise key influential interventions. India has been considered as the case study area in this research, and data for 601 districts of India have been used to examine the effectiveness of the developed modules. The system has been developed by implementing different statistical and data mining techniques on a web platform. Aided by its interactive capability, policy makers would be able to generate different scenarios from the system before drawing any inference.
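The rates quoted above can be checked with a short calculation. The abstract does not state which definition of annual rate of reduction (ARR) it uses; the sketch assumes the logarithmic definition ARR = ln(M_start / M_end) / years. Under that assumption, the MDG4 target (a two-thirds cut in under-five mortality over 1990 to 2015) and the MDG5 target (a three-quarters cut in maternal mortality over the same period) give required rates of about 4.4% and 5.5%, matching the figures cited.

```python
import math

def annual_rate_of_reduction(start, end, years):
    """ARR under the logarithmic definition: ln(start / end) / years."""
    return math.log(start / end) / years

def remaining_fraction(rate, years):
    """Fraction of the starting level left after `years` at a constant ARR."""
    return math.exp(-rate * years)

# Required ARRs for MDG-style targets over 1990-2015 (25 years).
required_u5mr = annual_rate_of_reduction(3.0, 1.0, 25)  # two-thirds reduction
required_mmr = annual_rate_of_reduction(4.0, 1.0, 25)   # three-quarters reduction
print(f"required ARR, under-five mortality: {required_u5mr:.1%}")
print(f"required ARR, maternal mortality:   {required_mmr:.1%}")
print(f"level left after 25 years at 2.2%/yr: {remaining_fraction(0.022, 25):.0%}")
```

At the 2011 average of 2.2% per year, about 58% of the starting under-five mortality level would remain after 25 years, rather than the one third the target requires.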

Keywords: maternal and child healthcare, decision support systems, data mining techniques, low and middle income countries

Procedia PDF Downloads 257
30119 Study for Establishing a Concept of Underground Mining in a Folded Deposit with Weathering

Authors: Chandan Pramanik, Bikramjit Chanda

Abstract:

Large metal mines operated with open-cast mining methods must transition to underground mining when open-pit operation concludes; however, this transition involves a difficult period in which production is constrained by interference between the two mining methods. Based on the case of the South Kaliapani Underground Project, this work presents and establishes a transition model with collaborative mining operations to address the technical issues of inadequate production security and other mining challenges during the transition phase and beyond. By integrating the small-scale Drift and Fill method with highly productive Sublevel Open Stoping in the deep section, this hybrid mining concept seeks to eliminate major bottlenecks and offers an optimized production profile with safe and sustainable operation. Considering every geo-mining aspect, this study offers a genuine and precise technical deliberation on the transition from open-pit to underground mining.

Keywords: drift and fill, geo-mining aspect, sublevel open stoping, underground mining method

Procedia PDF Downloads 98
30118 An Efficient Data Mining Technique for Online Stores

Authors: Mohammed Al-Shalabi, Alaa Obeidat

Abstract:

In any food store, some items expire or spoil because demand for them is infrequent, so decision makers need a system that helps them make offers on such items, improving demand by bundling them with a frequently purchased item at a reduced price to avoid losses. The system generates hundreds or thousands of patterns (offers) for each low-demand item and then uses association rule measures (support and confidence) to find the interesting patterns, i.e., the best offers for minimizing losses. In this paper, we propose a data mining method for determining the best offer by merging data mining techniques with e-commerce strategy. The task is to build a model that predicts the best offer, with the goal of maximizing the store's profits and avoiding product losses. The central idea of this paper is to use association rules in marketing in combination with e-commerce.
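The support and confidence measures named above can be illustrated with a small ranking of bundle partners for one slow-moving item. The basket data, the function `rank_offers`, the thresholds, and the choice to rank partners by the confidence of the rule partner ⇒ slow item are all hypothetical; the abstract does not specify how its offers are scored.

```python
from itertools import chain

def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent => consequent."""
    base = support(transactions, antecedent)
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base if base else 0.0

def rank_offers(transactions, slow_item, min_support=0.1, min_confidence=0.2):
    """Rank partner items for bundling with `slow_item` (hypothetical scoring)."""
    partners = set(chain.from_iterable(transactions)) - {slow_item}
    offers = []
    for partner in partners:
        s = support(transactions, {partner, slow_item})
        c = confidence(transactions, {partner}, {slow_item})
        if s >= min_support and c >= min_confidence:
            offers.append((partner, s, c))
    # Highest confidence first; ties broken by support, then by name.
    return sorted(offers, key=lambda o: (-o[2], -o[1], o[0]))

baskets = [
    {"bread", "milk", "yogurt"},
    {"bread", "milk"},
    {"milk", "yogurt"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
for partner, s, c in rank_offers(baskets, "yogurt"):
    print(f"bundle yogurt with {partner}: support={s:.2f}, confidence={c:.2f}")
```

Milk ranks first (half of milk baskets also contain yogurt), bread second; butter and eggs never co-occur with yogurt and are filtered out by the support threshold.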

Keywords: data mining, association rules, confidence, online stores

Procedia PDF Downloads 410
30117 The Environmental and Socio Economic Impacts of Mining on Local Livelihood in Cameroon: A Case Study in Bertoua

Authors: Fongang Robert Tichuck

Abstract:

This paper reports the findings of a study undertaken to assess the socio-economic and environmental impacts of mining in Bertoua, in the East Region of Cameroon. In addition to sampling community perceptions of mining activities, the study prescribes interventions that can assist in mitigating the negative impacts of mining. Marked environmental and interrelated socio-economic improvements can be achieved within regional artisanal gold mines if the government provides technical support to local operators, regulations are improved, and illegal mining activity is reduced.

Keywords: gold mining, socio-economic, mining activities, local people

Procedia PDF Downloads 393
30116 Defining Processes of Gender Restructuring: The Case of Displaced Tribal Communities of North East India

Authors: Bitopi Dutta

Abstract:

Development Induced Displacement (DID) of subaltern groups has been an issue of intense debate in India. This research will carry out a gender analysis of displacement induced by mining projects in tribal indigenous societies of North East India, centred on the primary research question: 'How does DID reorder gendered relationships in tribal matrilineal societies?' This paper will not focus primarily on the impacts of displacement induced by coal mining on indigenous tribal women in North East India; rather, it will study 'what' processes lead to these transformations and 'how' they operate. In doing so, the paper will locate the cracks in traditional social systems that the discourse of displacement manipulates for its own benefit. DID in this sense will be understood not only as physical displacement but also as social and cultural displacement. The study will cover one matrilineal tribe in the state of Meghalaya in North East India that has been affected by several coal mining projects over the last 30 years. In-depth unstructured interviews used to collect life narratives will be the primary mode of data collection, because the indigenous culture of the tribes in Meghalaya, including the matrilineal tribes, is based on oral history, where knowledge and experiences produced under a tradition of oral history exist in a continuum. This is unlike modern societies, which produce knowledge in a compartmentalized system. An interview guide designed around specific themes, rather than specific questions, will be used to ensure the flow of narratives from the interviewee. In addition, a number of focus groups will be held. The data collected through the life narratives will be supplemented and contextualized through documentary research using government data and local media sources of the region.

Keywords: displacement, gender-relations, matriliny, mining

Procedia PDF Downloads 194
30115 Network Word Discovery Framework Based on Sentence Semantic Vector Similarity

Authors: Ganfeng Yu, Yuefeng Ma, Shanliang Yang

Abstract:

Word discovery is a key problem in text information retrieval technology. Methods for new word discovery tend to be closely tied to the words themselves, because they generally obtain new word results by analyzing words. With the popularity of social networks, individual netizens and online self-media have generated various network texts for the convenience of online life, including network words that are far from standard Chinese expression. How to detect network words is one of the important goals in the field of text information retrieval today. In this paper, we integrate a word embedding model and clustering methods to propose a network word discovery framework based on sentence semantic similarity (S³-NWD) that detects network words effectively from a corpus. The framework constructs sentence semantic vectors through a distributed representation model, uses the similarity of sentence semantic vectors to determine the semantic relationship between sentences, and finally realizes network word discovery by means of semantic replacement between sentences. Experiments verify that the framework not only discovers network words rapidly but also recovers the standard-word meaning of the discovered network words, which reflects the effectiveness of our work.
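The sentence-similarity step can be illustrated with a toy example. The word vectors, the mean pooling used to build sentence vectors, the cosine threshold, and the rule that a candidate pair comes from two sentences differing in exactly one aligned token are all assumptions chosen for brevity; the paper's distributed representation model and clustering step are not reproduced here.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only; a real system
# would learn embeddings from a large corpus.
EMBEDDINGS = {
    "this": [0.1, 0.3, 0.2],
    "movie": [0.9, 0.1, 0.4],
    "is": [0.1, 0.2, 0.1],
    "awesome": [0.2, 0.9, 0.7],
    "lit": [0.25, 0.85, 0.65],
    "boring": [0.1, -0.8, 0.2],
}
DIM = 3

def sentence_vector(tokens):
    """Mean of the known word vectors (zero vector if none are known)."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def replacement_candidate(s1, s2, threshold=0.95):
    """Return (word1, word2) if the sentences differ in exactly one aligned token
    and their sentence vectors are at least `threshold` similar; else None."""
    if len(s1) != len(s2):
        return None
    diffs = [(a, b) for a, b in zip(s1, s2) if a != b]
    if len(diffs) != 1:
        return None
    if cosine(sentence_vector(s1), sentence_vector(s2)) >= threshold:
        return diffs[0]
    return None

print(replacement_candidate("this movie is lit".split(), "this movie is awesome".split()))
```

Here "lit" is proposed as a substitute for "awesome", while "this movie is boring" falls below the threshold and yields no pair.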

Keywords: text information retrieval, natural language processing, new word discovery, information extraction

Procedia PDF Downloads 91
30114 Reduction of Plants Biodiversity in Hyrcanian Forest by Coal Mining Activities

Authors: Mahsa Tavakoli, Seyed Mohammad Hojjati, Yahya Kooch

Abstract:

Coal mining is an important industrial activity, but it can damage the environment. To the best of the authors' knowledge, the effect of traditional coal mining activities on plant biodiversity has not been investigated in the Hyrcanian forests. Therefore, this study investigated the effect of coal mining activities on vegetation and tree diversity in the Hyrcanian forest, North Iran. After field visits to identify the mine, 16 plots (20 m × 20 m) were established on a systematic-random grid (60 m × 60 m) over an area of 4 ha (200 m × 200 m, with the mine entrance at the center). An adjacent area unaffected by mining activity was used as the control area. In each plot, tree data such as the number and type of species were recorded. Vegetation-cover biodiversity was assessed in five 1 m² sub-plots within each plot. PAST software and Ecological Methodology were used to calculate biodiversity indices. The Shannon-Wiener and Simpson diversity indices for tree cover in the control area (1.04±0.34 and 0.62±0.20) were significantly higher than in the mining area (0.78±0.27 and 0.45±0.14). The evenness index for tree cover in the mining area was significantly lower than in the control area. The Shannon-Wiener and Simpson diversity indices for vegetation cover in the control area (1.37±0.06 and 0.69±0.02) were significantly higher than in the mining area (1.02±0.13 and 0.50±0.07). The evenness index for vegetation cover in the control area was significantly higher than in the mining area. Plant communities are a good indicator of changes at a site. Studies of changes in vegetation biodiversity and plant dynamics in degraded land can provide necessary information for forest management and reforestation of these areas.
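The indices reported above can be computed from species counts as follows. The abstract does not say which variants PAST or Ecological Methodology were set to report; this sketch assumes the Shannon-Wiener index with natural logarithms, the Gini-Simpson form 1 − Σp² (consistent with the reported values lying between 0 and 1), and Pielou's J′ as the evenness index. The species tally is made up.

```python
import math
from collections import Counter

def diversity_indices(counts):
    """Return (Shannon H', Gini-Simpson 1 - sum(p^2), Pielou J') for species abundances."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    shannon = -sum(p * math.log(p) for p in props)
    simpson = 1.0 - sum(p * p for p in props)
    richness = len(props)
    # Evenness is undefined for a single species; report 0.0 in that case.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return shannon, simpson, evenness

# Hypothetical tree tally for one 20 m x 20 m plot.
plot = Counter(["species A"] * 6 + ["species B"] * 3 + ["species C"] * 1)
h, d, j = diversity_indices(plot.values())
print(f"H' = {h:.2f}, Simpson = {d:.2f}, J' = {j:.2f}")
```

A plot dominated by one species, as in this tally, scores lower on all three indices than a plot with the same species in equal numbers, which is the pattern the study reports for the mining area.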

Keywords: vegetation biodiversity, species composition, traditional coal mining, Caspian forest

Procedia PDF Downloads 182
30113 Conceptualizing the Knowledge to Manage and Utilize Data Assets in the Context of Digitization: Case Studies of Multinational Industrial Enterprises

Authors: Martin Böhmer, Agatha Dabrowski, Boris Otto

Abstract:

The trend of digitization is significantly changing the role of data for enterprises. Data are turning from an enabler into an intangible organizational asset that requires management and qualifies as a tradeable good. The idea of a networked economy has gained momentum in the data domain as collaborative approaches to data management emerge. Traditional organizational knowledge consequently needs to be extended by comprehensive knowledge about data. Knowledge about data is vital for organizations to ensure that data quality requirements are met and that data can be effectively utilized and sovereignly governed. As this specific knowledge has so far received little academic attention, the aim of the research presented in this paper is to conceptualize it by proposing a “data knowledge model”. Relevant model entities have been identified using a design science research (DSR) approach that iteratively integrates insights from various industry case studies and literature research.

Keywords: data management, digitization, industry 4.0, knowledge engineering, metamodel

Procedia PDF Downloads 355
30112 The Parallelization of Algorithm Based on Partition Principle for Association Rules Discovery

Authors: Khadidja Belbachir, Hafida Belbachir

Abstract:

Following the expansion of physical storage media and the ceaseless need to accumulate ever more data, sequential algorithms for association rule mining have proved inadequate, so new parallel versions are imperative. In this paper, we propose a parallel version of the sequential algorithm “Partition”. Partition differs fundamentally from other sequential algorithms because it scans the database only twice to generate the significant association rules. Consequently, the parallel approach does not require much communication between sites. The proposed approach was implemented for an experimental study. The results show a great reduction in execution time compared with the sequential version and the Count Distribution algorithm.
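The two-scan structure that keeps communication low can be sketched as below. Each partition stands for one site, and a thread pool stands in for the sites; the function names and the brute-force local counting (instead of Partition's tid-list intersections, and capped at `max_len` items) are simplifications for illustration, not the authors' implementation. Correctness rests on the Partition property that any globally frequent itemset must be locally frequent in at least one partition, so the union of local results is a complete candidate set.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def local_frequent(partition, min_support, max_len=3):
    """Phase 1 at one site: itemsets frequent within this partition."""
    counts = Counter()
    for t in partition:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    threshold = min_support * len(partition)
    return {s for s, c in counts.items() if c >= threshold}

def count_candidates(partition, candidates):
    """Phase 2 at one site: exact counts of the global candidate set."""
    counts = Counter()
    for t in partition:
        ts = set(t)
        for c in candidates:
            if ts.issuperset(c):
                counts[c] += 1
    return counts

def parallel_partition(partitions, min_support, max_len=3):
    n = sum(len(p) for p in partitions)
    with ThreadPoolExecutor() as pool:
        # First scan: sites mine locally; only candidate itemsets are exchanged.
        local = pool.map(lambda p: local_frequent(p, min_support, max_len), partitions)
        candidates = set().union(*local)
        # Second scan: sites count the shared candidates; counts are summed.
        partial = pool.map(lambda p: count_candidates(p, candidates), partitions)
        total = sum(partial, Counter())
    return {c: total[c] for c in candidates if total[c] >= min_support * n}

sites = [[{"a", "b"}, {"a", "c"}], [{"a", "b"}, {"b", "c"}]]
print(parallel_partition(sites, min_support=0.5))
```

Only two messages per site are needed: its local frequent itemsets after the first scan and its candidate counts after the second.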

Keywords: association rules, distributed data mining, partition, parallel algorithms

Procedia PDF Downloads 414
30111 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni

Abstract:

The problem of entity relation discovery in unstructured data, a well-covered topic in the literature, consists of searching within unstructured sources (typically, text) to find connections among entities. The entities can be a whole dictionary or a specific collection of named items. In many cases, machine learning and/or text mining techniques are used for this goal. These approaches might be infeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists of collecting the cooccurrences of any two words (entities) to create a graph of relations, a cooccurrence graph. Indeed, each cooccurrence indicates some degree of semantic correlation between the words, because related words are more likely to appear close to each other than at opposite ends of a text. Some authors have used sliding windows for this problem: they count all the cooccurrences within a sliding window running over the whole text. In this paper, we generalise this technique into a Weighted-Distance Sliding Window, where each cooccurrence of two named items within the window is counted with a weight that depends on the distance between the items: a shorter distance implies stronger evidence of a relationship. We develop an experiment to support this intuition by applying this technique to a data set consisting of the text of the Bible, split into verses.
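A minimal sketch of the weighted-distance window follows. The abstract does not give the weight function, so `1/d` is an assumption (passing `weight=lambda d: 1.0` recovers plain unweighted counting within the window); whitespace tokenization, the window size, and the example verses are likewise illustrative. Each verse is processed separately, mirroring the verse split described above.

```python
from collections import defaultdict

def weighted_cooccurrence(tokens, entities, window=5, weight=lambda d: 1.0 / d):
    """Edge weights of an entity cooccurrence graph with a weighted-distance window."""
    graph = defaultdict(float)
    positions = [(i, t) for i, t in enumerate(tokens) if t in entities]
    for a in range(len(positions)):
        i, u = positions[a]
        for b in range(a + 1, len(positions)):
            j, v = positions[b]
            d = j - i
            if d >= window:  # positions are sorted, so later ones are farther still
                break
            if u != v:
                graph[tuple(sorted((u, v)))] += weight(d)
    return dict(graph)

def graph_from_verses(verses, entities, **kwargs):
    """Sum per-verse edge weights into a single cooccurrence graph."""
    total = defaultdict(float)
    for verse in verses:
        for edge, w in weighted_cooccurrence(verse.split(), entities, **kwargs).items():
            total[edge] += w
    return dict(total)

verses = ["moses spoke to aaron and moses went", "aaron answered moses"]
print(graph_from_verses(verses, {"moses", "aaron"}))
```

In the first verse, the pair at distance 3 contributes 1/3 and the pair at distance 2 contributes 1/2, so nearer mentions count for more, as the abstract intends.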

Keywords: cooccurrence graph, entity relation graph, unstructured text, weighted distance

Procedia PDF Downloads 149