Search results for: distributed data stream mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 26151

Search results for: distributed data stream mining

25791 Merging of Results in Distributed Information Retrieval Systems

Authors: Larbi Guezouli, Imane Azzouz

Abstract:

This work is located in the domain of distributed information retrieval ‘DIR’. A simplified view of the DIR requires a multi-search in a set of collections, which forces the system to analyze results found in these collections, and merge results back before sending them to the user in a single list. Our work is to find a fusion method based on the relevance score of each result received from collections and the relevance of the local search engine of each collection.

Keywords: information retrieval, distributed IR systems, merging results, datamining

Procedia PDF Downloads 305
25790 Flexible Mixed Model Assembly Line Design: A Strategy to Respond for Demand Uncertainty at Automotive Part Manufacturer in Indonesia

Authors: T. Yuri, M. Zagloel, Inaki M. Hakim, Tegu Bintang Nugraha

Abstract:

In an era of customer centricity, automotive parts manufacturer in Indonesia must be able to keep up with the uncertainty and fluctuation of consumer demand. Flexible Manufacturing System (FMS) is a strategy to react to predicted and unpredicted changes of demand in automotive industry. This research is about flexible mixed model assembly line design through Value Stream Mapping (VSM) and Line Balancing in mixed model assembly line prior to simulation. It uses value stream mapping to identify and reduce waste while finding the best position to add or reduce manpower. Line balancing is conducted to minimize or maximize production rate while increasing assembly line productivity and efficiency. Results of this research is a recommendation of standard work combination for specifics demand scenario which can enhance assembly line efficiency and productivity.

Keywords: automotive industry, demand uncertainty, flexible assembly system, line balancing, value stream mapping

Procedia PDF Downloads 304
25789 A Recommender System Fusing Collaborative Filtering and User’s Review Mining

Authors: Seulbi Choi, Hyunchul Ahn

Abstract:

Collaborative filtering (CF) algorithm has been popularly used for recommender systems in both academic and practical applications. It basically generates recommendation results using users’ numeric ratings. However, the additional use of the information other than user ratings may lead to better accuracy of CF. Considering that a lot of people are likely to share their honest opinion on the items they purchased recently due to the advent of the Web 2.0, user's review can be regarded as the new informative source for identifying user's preference with accuracy. Under this background, this study presents a hybrid recommender system that fuses CF and user's review mining. Our system adopts conventional memory-based CF, but it is designed to use both user’s numeric ratings and his/her text reviews on the items when calculating similarities between users.

Keywords: Recommender system, Collaborative filtering, Text mining, Review mining

Procedia PDF Downloads 307
25788 Integrating of Multi-Criteria Decision Making and Spatial Data Warehouse in Geographic Information System

Authors: Zohra Mekranfar, Ahmed Saidi, Abdellah Mebrek

Abstract:

This work aims to develop multi-criteria decision making (MCDM) and spatial data warehouse (SDW) methods, which will be integrated into a GIS according to a ‘GIS dominant’ approach. The GIS operating tools will be operational to operate the SDW. The MCDM methods can provide many solutions to a set of problems with various and multiple criteria. When the problem is so complex, integrating spatial dimension, it makes sense to combine the MCDM process with other approaches like data mining, ascending analyses, we present in this paper an experiment showing a geo-decisional methodology of SWD construction, On-line analytical processing (OLAP) technology which combines both basic multidimensional analysis and the concepts of data mining provides powerful tools to highlight inductions and information not obvious by traditional tools. However, these OLAP tools become more complex in the presence of the spatial dimension. The integration of OLAP with a GIS is the future geographic and spatial information solution. GIS offers advanced functions for the acquisition, storage, analysis, and display of geographic information. However, their effectiveness for complex spatial analysis is questionable due to their determinism and their decisional rigor. A prerequisite for the implementation of any analysis or exploration of spatial data requires the construction and structuring of a spatial data warehouse (SDW). This SDW must be easily usable by the GIS and by the tools offered by an OLAP system.

Keywords: data warehouse, GIS, MCDM, SOLAP

Procedia PDF Downloads 146
25787 Federated Learning in Healthcare

Authors: Ananya Gangavarapu

Abstract:

Convolutional Neural Networks (CNN) based models are providing diagnostic capabilities on par with the medical specialists in many specialty areas. However, collecting the medical data for training purposes is very challenging because of the increased regulations around data collections and privacy concerns around personal health data. The gathering of the data becomes even more difficult if the capture devices are edge-based mobile devices (like smartphones) with feeble wireless connectivity in rural/remote areas. In this paper, I would like to highlight Federated Learning approach to mitigate data privacy and security issues.

Keywords: deep learning in healthcare, data privacy, federated learning, training in distributed environment

Procedia PDF Downloads 113
25786 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflects different aspects of people, events and activities. Data generated from various systems are different in form and data source are different because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need fusion together. Data from different sources using different ways of collection raised several issues which need to be resolved. Problem of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of Multi-source data fusion is to form a uniform central database, which includes people data, location data, object data, and institution data, business data and space data. We need to use meta data to be referred to and read when application needs to access, manipulate and display the data. A uniform meta data management ensures effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 349
25785 A Numerical Study on the Influence of CO2 Dilution on Combustion Characteristics of a Turbulent Diffusion Flame

Authors: Yasaman Tohidi, Rouzbeh Riazi, Shidvash Vakilipour, Masoud Mohammadi

Abstract:

The objective of the present study is to numerically investigate the effect of CO2 replacement of N2 in air stream on the flame characteristics of the CH4 turbulent diffusion flame. The Open source Field Operation and Manipulation (OpenFOAM) has been used as the computational tool. In this regard, laminar flamelet and modified k-ε models have been utilized as combustion and turbulence models, respectively. Results reveal that the presence of CO2 in air stream changes the flame shape and maximum flame temperature. Also, CO2 dilution causes an increment in CO mass fraction.

Keywords: CH4 diffusion flame, CO2 dilution, OpenFOAM, turbulent flame

Procedia PDF Downloads 243
25784 An Improved K-Means Algorithm for Gene Expression Data Clustering

Authors: Billel Kenidra, Mohamed Benmohammed

Abstract:

Data mining technique used in the field of clustering is a subject of active research and assists in biological pattern recognition and extraction of new knowledge from raw data. Clustering means the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initiate K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results prove how the efficiency has been significantly improved.

Keywords: microarray data mining, biological pattern recognition, partitional clustering, k-means algorithm, centroid initialization

Procedia PDF Downloads 161
25783 Detection of Atrial Fibrillation Using Wearables via Attentional Two-Stream Heterogeneous Networks

Authors: Huawei Bai, Jianguo Yao, Fellow, IEEE

Abstract:

Atrial fibrillation (AF) is the most common form of heart arrhythmia and is closely associated with mortality and morbidity in heart failure, stroke, and coronary artery disease. The development of single spot optical sensors enables widespread photoplethysmography (PPG) screening, especially for AF, since it represents a more convenient and noninvasive approach. To our knowledge, most existing studies based on public and unbalanced datasets can barely handle the multiple noises sources in the real world and, also, lack interpretability. In this paper, we construct a large- scale PPG dataset using measurements collected from PPG wrist- watch devices worn by volunteers and propose an attention-based two-stream heterogeneous neural network (TSHNN). The first stream is a hybrid neural network consisting of a three-layer one-dimensional convolutional neural network (1D-CNN) and two-layer attention- based bidirectional long short-term memory (Bi-LSTM) network to learn representations from temporally sampled signals. The second stream extracts latent representations from the PPG time-frequency spectrogram using a five-layer CNN. The outputs from both streams are fed into a fusion layer for the outcome. Visualization of the attention weights learned demonstrates the effectiveness of the attention mechanism against noise. The experimental results show that the TSHNN outperforms all the competitive baseline approaches and with 98.09% accuracy, achieves state-of-the-art performance.

Keywords: PPG wearables, atrial fibrillation, feature fusion, attention mechanism, hyber network

Procedia PDF Downloads 85
25782 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

Clustering is a well known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can either contain categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms are proposed: the k-means for clustering numeric datasets and the k-modes for categorical datasets. The main encountered problem in data mining applications is clustering categorical dataset so relevant in the datasets. One main issue to achieve the clustering process on categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead the k-modes. In this paper, it is proposed to experiment an approach based on the previous issue by transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previously method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are experimented. The obtained results show that our proposed method outperforms the binary method in all cases.

Keywords: clustering, unsupervised learning, pattern recognition, categorical datasets, knowledge discovery, k-means

Procedia PDF Downloads 232
25781 Screen Method of Distributed Cooperative Navigation Factors for Unmanned Aerial Vehicle Swarm

Authors: Can Zhang, Qun Li, Yonglin Lei, Zhi Zhu, Dong Guo

Abstract:

Aiming at the problem of factor screen in distributed collaborative navigation of dense UAV swarm, an efficient distributed collaborative navigation factor screen method is proposed. The method considered the balance between computing load and positioning accuracy. The proposed algorithm utilized the factor graph model to implement a distributed collaborative navigation algorithm. The GNSS information of the UAV itself and the ranging information between the UAVs are used as the positioning factors. In this distributed scheme, a local factor graph is established for each UAV. The positioning factors of nodes with good geometric position distribution and small variance are selected to participate in the navigation calculation. To demonstrate and verify the proposed methods, the simulation and experiments in different scenarios are performed in this research. Simulation results show that the proposed scheme achieves a good balance between the computing load and positioning accuracy in the distributed cooperative navigation calculation of UAV swarm. This proposed algorithm has important theoretical and practical value for both industry and academic areas.

Keywords: screen method, cooperative positioning system, UAV swarm, factor graph, cooperative navigation

Procedia PDF Downloads 48
25780 Discerning Divergent Nodes in Social Networks

Authors: Mehran Asadi, Afrand Agah

Abstract:

In data mining, partitioning is used as a fundamental tool for classification. With the help of partitioning, we study the structure of data, which allows us to envision decision rules, which can be applied to classification trees. In this research, we used online social network dataset and all of its attributes (e.g., Node features, labels, etc.) to determine what constitutes an above average chance of being a divergent node. We used the R statistical computing language to conduct the analyses in this report. The data were found on the UC Irvine Machine Learning Repository. This research introduces the basic concepts of classification in online social networks. In this work, we utilize overfitting and describe different approaches for evaluation and performance comparison of different classification methods. In classification, the main objective is to categorize different items and assign them into different groups based on their properties and similarities. In data mining, recursive partitioning is being utilized to probe the structure of a data set, which allow us to envision decision rules and apply them to classify data into several groups. Estimating densities is hard, especially in high dimensions, with limited data. Of course, we do not know the densities, but we could estimate them using classical techniques. First, we calculated the correlation matrix of the dataset to see if any predictors are highly correlated with one another. By calculating the correlation coefficients for the predictor variables, we see that density is strongly correlated with transitivity. We initialized a data frame to easily compare the quality of the result classification methods and utilized decision trees (with k-fold cross validation to prune the tree). The method performed on this dataset is decision trees. Decision tree is a non-parametric classification method, which uses a set of rules to predict that each observation belongs to the most commonly occurring class label of the training data. Our method aggregates many decision trees to create an optimized model that is not susceptible to overfitting. When using a decision tree, however, it is important to use cross-validation to prune the tree in order to narrow it down to the most important variables.

Keywords: online social networks, data mining, social cloud computing, interaction and collaboration

Procedia PDF Downloads 116
25779 Design of Personal Job Recommendation Framework on Smartphone Platform

Authors: Chayaporn Kaensar

Abstract:

Recently, Job Recommender Systems have gained much attention in industries since they solve the problem of information overload on the recruiting website. Therefore, we proposed Extended Personalized Job System that has the capability of providing the appropriate jobs for job seeker and recommending some suitable information for them using Data Mining Techniques and Dynamic User Profile. On the other hands, company can also interact to the system for publishing and updating job information. This system have emerged and supported various platforms such as web application and android mobile application. In this paper, User profiles, Implicit User Action, User Feedback, and Clustering Techniques in WEKA libraries have gained attention and implemented for this application. In additions, open source tools like Yii Web Application Framework, Bootstrap Front End Framework and Android Mobile Technology were also applied.

Keywords: recommendation, user profile, data mining, web and mobile technology

Procedia PDF Downloads 294
25778 Defining Processes of Gender Restructuring: The Case of Displaced Tribal Communities of North East India

Authors: Bitopi Dutta

Abstract:

Development Induced Displacement (DID) of subaltern groups has been an issue of intense debate in India. This research will do a gender analysis of displacement induced by the mining projects in tribal indigenous societies of North East India, centering on the primary research question which is 'How does DID reorder gendered relationship in tribal matrilineal societies?' This paper will not focus primarily on the impacts of the displacement induced by coal mining on indigenous tribal women in the North East India; it will rather study 'what' are the processes that lead to these transformations and 'how' do they operate. In doing so, the paper will locate the cracks in traditional social systems that the discourse of displacement manipulates for its own benefit. DID in this sense will not only be understood as only physical displacement, but also as social and cultural displacement. The study will cover one matrilineal tribe in the state of Meghalaya in the North East India affected by several coal mining projects in the last 30 years. In-depth unstructured interviews used to collect life narratives will be the primary mode of data collection because the indigenous culture of the tribes in Meghalaya, including the matrilineal tribes, is based on oral history where knowledge and experiences produced under a tradition of oral history exist in a continuum. This is unlike modern societies which produce knowledge in a compartmentalized system. An interview guide designed around specific themes will be used rather than specific questions to ensure the flow of narratives from the interviewee. In addition to this, a number of focus groups will be held. The data collected through the life narrative will be supplemented and contextualized through documentary research using government data, and local media sources of the region.

Keywords: displacement, gender-relations, matriliny, mining

Procedia PDF Downloads 165
25777 Acceptance of Big Data Technologies and Its Influence towards Employee’s Perception on Job Performance

Authors: Jia Yi Yap, Angela S. H. Lee

Abstract:

With the use of big data technologies, organization can get result that they are interested in. Big data technologies simply load all the data that is useful for the organizations and provide organizations a better way of analysing data. The purpose of this research is to get employees’ opinion from films in Malaysia to explore the use of big data technologies in their organization in order to provide how it may affect the perception of the employees on job performance. Therefore, in order to identify will accepting big data technologies in the organization affect the perception of the employee, questionnaire will be distributed to different employee from different Small and medium-sized enterprises (SME) organization listed in Malaysia. The conceptual model proposed will test with other variables in order to see the relationship between variables.

Keywords: big data technologies, employee, job performance, questionnaire

Procedia PDF Downloads 266
25776 The Significance of Picture Mining in the Fashion and Design as a New Research Method

Authors: Katsue Edo, Yu Hiroi

Abstract:

T Increasing attention has been paid to using pictures and photographs in research since the beginning of the 21th century in social sciences. Meanwhile we have been studying the usefulness of Picture mining, which is one of the new ways for a these picture using researches. Picture Mining is an explorative research analysis method that takes useful information from pictures, photographs and static or moving images. It is often compared with the methods of text mining. The Picture Mining concept includes observational research in the broad sense, because it also aims to analyze moving images (Ochihara and Edo 2013). In the recent literature, studies and reports using pictures are increasing due to the environmental changes. These are identified as technological and social changes (Edo et.al. 2013). Low price digital cameras and i-phones, high information transmission speed, low costs for information transferring and high performance and resolution of the cameras of mobile phones have changed the photographing behavior of people. Consequently, there is less resistance in taking and processing photographs for most of the people in the developing countries. In these studies, this method of collecting data from respondents is often called as ‘participant-generated photography’ or ‘respondent-generated visual imagery’, which focuses on the collection of data and its analysis (Pauwels 2011, Snyder 2012). But there are few systematical and conceptual studies that supports it significance of these methods. We have discussed in the recent years to conceptualize these picture using research methods and formalize theoretical findings (Edo et. al. 2014). We have identified the most efficient fields of Picture mining in the following areas inductively and in case studies; 1) Research in Consumer and Customer Lifestyles. 2) New Product Development. 3) Research in Fashion and Design. Though we have found that it will be useful in these fields and areas, we must verify these assumptions. In this study we will focus on the field of fashion and design, to determine whether picture mining methods are really reliable in this area. In order to do so we have conducted an empirical research of the respondents’ attitudes and behavior concerning pictures and photographs. We compared the attitudes and behavior of pictures toward fashion to meals, and found out that taking pictures of fashion is not as easy as taking meals and food. Respondents do not often take pictures of fashion and upload their pictures online, such as Facebook and Instagram, compared to meals and food because of the difficulty of taking them. We concluded that we should be more careful in analyzing pictures in the fashion area for there still might be some kind of bias existing even if the environment of pictures have drastically changed in these years.

Keywords: empirical research, fashion and design, Picture Mining, qualitative research

Procedia PDF Downloads 336
25775 Integration of Educational Data Mining Models to a Web-Based Support System for Predicting High School Student Performance

Authors: Sokkhey Phauk, Takeo Okazaki

Abstract:

The challenging task in educational institutions is to maximize the high performance of students and minimize the failure rate of poor-performing students. An effective method to leverage this task is to know student learning patterns with highly influencing factors and get an early prediction of student learning outcomes at the timely stage for setting up policies for improvement. Educational data mining (EDM) is an emerging disciplinary field of data mining, statistics, and machine learning concerned with extracting useful knowledge and information for the sake of improvement and development in the education environment. The study is of this work is to propose techniques in EDM and integrate it into a web-based system for predicting poor-performing students. A comparative study of prediction models is conducted. Subsequently, high performing models are developed to get higher performance. The hybrid random forest (Hybrid RF) produces the most successful classification. For the context of intervention and improving the learning outcomes, a feature selection method MICHI, which is the combination of mutual information (MI) and chi-square (CHI) algorithms based on the ranked feature scores, is introduced to select a dominant feature set that improves the performance of prediction and uses the obtained dominant set as information for intervention. By using the proposed techniques of EDM, an academic performance prediction system (APPS) is subsequently developed for educational stockholders to get an early prediction of student learning outcomes for timely intervention. Experimental outcomes and evaluation surveys report the effectiveness and usefulness of the developed system. The system is used to help educational stakeholders and related individuals for intervening and improving student performance.

Keywords: academic performance prediction system, educational data mining, dominant factors, feature selection method, prediction model, student performance

Procedia PDF Downloads 85
25774 A Fuzzy Kernel K-Medoids Algorithm for Clustering Uncertain Data Objects

Authors: Behnam Tavakkol

Abstract:

Uncertain data mining algorithms use different ways to consider uncertainty in data such as by representing a data object as a sample of points or a probability distribution. Fuzzy methods have long been used for clustering traditional (certain) data objects. They are used to produce non-crisp cluster labels. For uncertain data, however, besides some uncertain fuzzy k-medoids algorithms, not many other fuzzy clustering methods have been developed. In this work, we develop a fuzzy kernel k-medoids algorithm for clustering uncertain data objects. The developed fuzzy kernel k-medoids algorithm is superior to existing fuzzy k-medoids algorithms in clustering data sets with non-linearly separable clusters.

Keywords: clustering algorithm, fuzzy methods, kernel k-medoids, uncertain data

Procedia PDF Downloads 177
25773 Bankruptcy Prediction Analysis on Mining Sector Companies in Indonesia

Authors: Devina Aprilia Gunawan, Tasya Aspiranti, Inugrah Ratia Pratiwi

Abstract:

This research aims to classify the mining sector companies based on Altman’s Z-score model, and providing an analysis based on the Altman’s Z-score model’s financial ratios to provide a picture about the financial condition in mining sector companies in Indonesia and their viability in the future, and to find out the partial and simultaneous impact of each of the financial ratio variables in the Altman’s Z-score model, namely (WC/TA), (RE/TA), (EBIT/TA), (MVE/TL), and (S/TA), toward the financial condition represented by the Z-score itself. Among 38 mining sector companies listed in Indonesia Stock Exchange (IDX), 28 companies are selected as research sample according to the purposive sampling criteria.The results of this research showed that during 3 years research period at 2010-2012, the amount of the companies that was predicted to be healthy in each year was less than half of the total sample companies and not even reach up to 50%. The multiple regression analysis result showed that all of the research hypotheses are accepted, which means that (WC/TA), (RE/TA), (EBIT/TA), (MVE/TL), and (S/TA), both partially and simultaneously had an impact towards company’s financial condition.

Keywords: Altman’s Z-score model, financial condition, mining companies, Indonesia

Procedia PDF Downloads 503
25772 Dido: An Automatic Code Generation and Optimization Framework for Stencil Computations on Distributed Memory Architectures

Authors: Mariem Saied, Jens Gustedt, Gilles Muller

Abstract:

We present Dido, a source-to-source auto-generation and optimization framework for multi-dimensional stencil computations. It enables a large programmer community to easily and safely implement stencil codes on distributed-memory parallel architectures with Ordered Read-Write Locks (ORWL) as an execution and communication back-end. ORWL provides inter-task synchronization for data-oriented parallel and distributed computations. It has been proven to guarantee equity, liveness, and efficiency for a wide range of applications, particularly for iterative computations. Dido consists mainly of an implicitly parallel domain-specific language (DSL) implemented as a source-level transformer. It captures domain semantics at a high level of abstraction and generates parallel stencil code that leverages all ORWL features. The generated code is well-structured and lends itself to different possible optimizations. In this paper, we enhance Dido to handle both Jacobi and Gauss-Seidel grid traversals. We integrate temporal blocking to the Dido code generator in order to reduce the communication overhead and minimize data transfers. To increase data locality and improve intra-node data reuse, we coupled the code generation technique with the polyhedral parallelizer Pluto. The accuracy and portability of the generated code are guaranteed thanks to a parametrized solution. The combination of ORWL features, the code generation pattern and the suggested optimizations, make of Dido a powerful code generation framework for stencil computations in general, and for distributed-memory architectures in particular. We present a wide range of experiments over a number of stencil benchmarks.

Keywords: stencil computations, ordered read-write locks, domain-specific language, polyhedral model, experiments

Procedia PDF Downloads 92
25771 Feature Based Unsupervised Intrusion Detection

Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein

Abstract:

The goal of a network-based intrusion detection system is to classify activities of network traffics into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning plays an important role in many sciences; including intrusion detection system (IDS) using both supervised and unsupervised techniques. However, one of the essential steps of data mining is feature selection that helps in improving the efficiency, performance and prediction rate of proposed approach. This paper applies unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we have used the new NSL-KDD dataset, which is a modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a 2 class classifications have been implemented (Normal, Attack). Weka framework which is a java based open source software consists of a collection of machine learning algorithms for data mining tasks has been used in the testing process. The experimental results show that the proposed approach is very accurate with low false positive rate and high true positive rate and it takes less learning time in comparison with using the full features of the dataset with the same algorithm.

Keywords: information gain (IG), intrusion detection system (IDS), k-means clustering, Weka

Procedia PDF Downloads 263
25770 Clustering Ethno-Informatics of Naming Village in Java Island Using Data Mining

Authors: Atje Setiawan Abdullah, Budi Nurani Ruchjana, I. Gede Nyoman Mindra Jaya, Eddy Hermawan

Abstract:

Ethnoscience is used to see the culture with a scientific perspective, which may help to understand how people develop various forms of knowledge and belief, initially focusing on the ecology and history of the contributions that have been there. One of the areas studied in ethnoscience is etno-informatics, is the application of informatics in the culture. In this study the science of informatics used is data mining, a process to automatically extract knowledge from large databases, to obtain interesting patterns in order to obtain a knowledge. While the application of culture described by naming database village on the island of Java were obtained from Geographic Indonesia Information Agency (BIG), 2014. The purpose of this study is; first, to classify the naming of the village on the island of Java based on the structure of the word naming the village, including the prefix of the word, syllable contained, and complete word. Second to classify the meaning of naming the village based on specific categories, as well as its role in the community behavioral characteristics. Third, how to visualize the naming of the village to a map location, to see the similarity of naming villages in each province. In this research we have developed two theorems, i.e theorems area as a result of research studies have collected intersection naming villages in each province on the island of Java, and the composition of the wedge theorem sets the provinces in Java is used to view the peculiarities of a location study. The methodology in this study base on the method of Knowledge Discovery in Database (KDD) on data mining, the process includes preprocessing, data mining and post processing. The results showed that the Java community prioritizes merit in running his life, always working hard to achieve a more prosperous life, and love as well as water and environmental sustainment. Naming villages in each location adjacent province has a high degree of similarity, and influence each other. Cultural similarities in the province of Central Java, East Java and West Java-Banten have a high similarity, whereas in Jakarta-Yogyakarta has a low similarity. This research resulted in the cultural character of communities within the meaning of the naming of the village on the island of Java, this character is expected to serve as a guide in the behavior of people's daily life on the island of Java.

Keywords: ethnoscience, ethno-informatics, data mining, clustering, Java island culture

Procedia PDF Downloads 252
25769 The Need of Sustainable Mining: Communities, Government and Legal Mining in Central Andes of Peru

Authors: Melissa R. Quispe-Zuniga, Daniel Callo-Concha, Christian Borgemeister, Klaus Greve

Abstract:

The Peruvian Andes have a high potential for mining, but many of the mining areas overlay with campesino community lands, being these key actors for agriculture and livestock production. Lead by economic incentives, some communities are renting their lands to mining companies for exploration or exploitation. However, a growing number of campesino communities, usually social and economically marginalized, have developed resistance, alluding consequences, such as water pollution, land-use change, insufficient economic compensation, etc. what eventually end up in Socio-Environmental Conflicts (SEC). It is hypothesized that disclosing the information on environmental pollution and enhance the involvement of communities in the decision-making process may contribute to prevent SEC. To assess whether such complains are grounded on the environmental impact of mining activities, we measured the heavy metals concentration in 24 indicative samples from rivers that run across mining exploitations and farming community lands. Samples were taken during the 2016 dry season and analyzed by inductively-coupled-plasma-atomic-emission-spectroscopy. The results were contrasted against the standards of monitoring government institutions (i.e., OEFA). Furthermore, we investigated the water/environmental complains related to mining in the neighboring 14 communities. We explored the relationship between communities and mining companies, via open-ended interviews with community authorities and non-participatory observations of community assemblies. We found that the concentrations of cadmium (0.023 mg/L), arsenic (0.562 mg/L) and copper (0.07 mg/L), surpass the national water quality standards for Andean rivers (0.00025 mg/L of cadmium, 0.15 mg/L of arsenic and 0.01 mg/L of copper). 57% of communities have posed environmental complains, but 21% of the total number of communities were receiving an annual economic benefit from mining projects. However, 87.5% of the communities who had posed complains have high concentration of heavy metals in their water streams. The evidence shows that mining activities tend to relate to the affectation and vulnerability of campesino community water streams, what justify the environmental complains and eventually the occurrence of a SEC.

Keywords: mining companies, campesino community, water, socio-environmental conflict

Procedia PDF Downloads 171
25768 The Effective Use of the Network in the Distributed Storage

Authors: Mamouni Mohammed Dhiya Eddine

Abstract:

This work aims at studying the exploitation of high-speed networks of clusters for distributed storage. Parallel applications running on clusters require both high-performance communications between nodes and efficient access to the storage system. Many studies on network technologies led to the design of dedicated architectures for clusters with very fast communications between computing nodes. Efficient distributed storage in clusters has been essentially developed by adding parallelization mechanisms so that the server(s) may sustain an increased workload. In this work, we propose to improve the performance of distributed storage systems in clusters by efficiently using the underlying high-performance network to access distant storage systems. The main question we are addressing is: do high-speed networks of clusters fit the requirements of a transparent, efficient and high-performance access to remote storage? We show that storage requirements are very different from those of parallel computation. High-speed networks of clusters were designed to optimize communications between different nodes of a parallel application. We study their utilization in a very different context, storage in clusters, where client-server models are generally used to access remote storage (for instance NFS, PVFS or LUSTRE). Our experimental study based on the usage of the GM programming interface of MYRINET high-speed networks for distributed storage raised several interesting problems. Firstly, the specific memory utilization in the storage access system layers does not easily fit the traditional memory model of high-speed networks. Secondly, client-server models that are used for distributed storage have specific requirements on message control and event processing, which are not handled by existing interfaces. We propose different solutions to solve communication control problems at the filesystem level. We show that a modification of the network programming interface is required. Data transfer issues need an adaptation of the operating system. We detail several propositions for network programming interfaces which make their utilization easier in the context of distributed storage. The integration of a flexible processing of data transfer in the new programming interface MYRINET/MX is finally presented. Performance evaluations show that its usage in the context of both storage and other types of applications is easy and efficient.

Keywords: distributed storage, remote file access, cluster, high-speed network, MYRINET, zero-copy, memory registration, communication control, event notification, application programming interface

Procedia PDF Downloads 194
25767 Numerical Investigation of a Slightly Oblique Round Jet Flowing into a Uniform Counterflow Stream

Authors: Amani Amamou, Sabra Habli, Nejla Mahjoub Saïd, Philippe Bournot, Georges Le Palec

Abstract:

A counterflowing jet is a particular configuration of turbulent jets issuing into a moving ambient which has not carried much attention in literature compared with jet in a coflow or in a crossflow. This is due to the marked instability of the jet in a counterflow coupled with experimental and theoretical difficulties related to the flow inversion phenomenon. Nevertheless, jets in a counterflow are encountered in many engineering applications which required enhanced mixing as combustion, process and environmental engineering. In this work, we propose to investigate a round turbulent jet flowing into a uniform counterflow stream through a numerical approach. A hydrodynamic and thermal study of a slightly oblique round jets issuing into a uniform counterflow stream is carried out for different jet-to-counterflow velocity ratios ranging between 3.1 and 15. It is found that even a slight inclination of the jet in the vertical direction of the flow affects the structure and the velocity field of the counterflowing jet. In addition, the evolution of passive scalar temperature and pertinent length scales are presented at various velocity ratios, confirming that the flow is sensitive to directional perturbations.

Keywords: jet, counterflow, velocity, temperature, jet inclination

Procedia PDF Downloads 241
25766 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection

Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada

Abstract:

With the expansion of machine learning and data mining in the context of Big Data analytics, the common problem that affects data is class imbalance. It refers to an imbalanced distribution of instances belonging to each class. This problem is present in many real world applications such as fraud detection, network intrusion detection, medical diagnostics, etc. In these cases, data instances labeled negatively are significantly more numerous than the instances labeled positively. When this difference is too large, the learning system may face difficulty when tackling this problem, since it is initially designed to work in relatively balanced class distribution scenarios. Another important problem, which usually accompanies these imbalanced data, is the overlapping instances between the two classes. It is commonly referred to as noise or overlapping data. In this article, we propose an approach called: One Side Behavioral Noise Reduction (OSBNR). This approach presents a way to deal with the problem of class imbalance in the presence of a high noise level. OSBNR is based on two steps. Firstly, a cluster analysis is applied to groups similar instances from the minority class into several behavior clusters. Secondly, we select and eliminate the instances of the majority class, considered as behavioral noise, which overlap with behavior clusters of the minority class. The results of experiments carried out on a representative public dataset confirm that the proposed approach is efficient for the treatment of class imbalances in the presence of noise.

Keywords: machine learning, imbalanced data, data mining, big data

Procedia PDF Downloads 103
25765 Information Management Approach in the Prediction of Acute Appendicitis

Authors: Ahmad Shahin, Walid Moudani, Ali Bekraki

Abstract:

This research aims at presenting a predictive data mining model to handle an accurate diagnosis of acute appendicitis with patients for the purpose of maximizing the health service quality, minimizing morbidity/mortality, and reducing cost. However, acute appendicitis is the most common disease which requires timely accurate diagnosis and needs surgical intervention. Although the treatment of acute appendicitis is simple and straightforward, its diagnosis is still difficult because no single sign, symptom, laboratory or image examination accurately confirms the diagnosis of acute appendicitis in all cases. This contributes in increasing morbidity and negative appendectomy. In this study, the authors propose to generate an accurate model in prediction of patients with acute appendicitis which is based, firstly, on the segmentation technique associated to ABC algorithm to segment the patients; secondly, on applying fuzzy logic to process the massive volume of heterogeneous and noisy data (age, sex, fever, white blood cell, neutrophilia, CRP, urine, ultrasound, CT, appendectomy, etc.) in order to express knowledge and analyze the relationships among data in a comprehensive manner; and thirdly, on applying dynamic programming technique to reduce the number of data attributes. The proposed model is evaluated based on a set of benchmark techniques and even on a set of benchmark classification problems of osteoporosis, diabetes and heart obtained from the UCI data and other data sources.

Keywords: healthcare management, acute appendicitis, data mining, classification, decision tree

Procedia PDF Downloads 318
25764 Consortium Blockchain-based Model for Data Management Applications in the Healthcare Sector

Authors: Teo Hao Jing, Shane Ho Ken Wae, Lee Jin Yu, Burra Venkata Durga Kumar

Abstract:

Current distributed healthcare systems face the challenge of interoperability of health data. Storing electronic health records (EHR) in local databases causes them to be fragmented. This problem is aggravated as patients visit multiple healthcare providers in their lifetime. Existing solutions are unable to solve this issue and have caused burdens to healthcare specialists and patients alike. Blockchain technology was found to be able to increase the interoperability of health data by implementing digital access rules, enabling uniformed patient identity, and providing data aggregation. Consortium blockchain was found to have high read throughputs, is more trustworthy, more secure against external disruptions and accommodates transactions without fees. Therefore, this paper proposes a blockchain-based model for data management applications. In this model, a consortium blockchain is implemented by using a delegated proof of stake (DPoS) as its consensus mechanism. This blockchain allows collaboration between users from different organizations such as hospitals and medical bureaus. Patients serve as the owner of their information, where users from other parties require authorization from the patient to view their information. Hospitals upload the hash value of patients’ generated data to the blockchain, whereas the encrypted information is stored in a distributed cloud storage.

Keywords: blockchain technology, data management applications, healthcare, interoperability, delegated proof of stake

Procedia PDF Downloads 113
25763 The Use of Classifiers in Image Analysis of Oil Wells Profiling Process and the Automatic Identification of Events

Authors: Jaqueline Maria Ribeiro Vieira

Abstract:

Different strategies and tools are available at the oil and gas industry for detecting and analyzing tension and possible fractures in borehole walls. Most of these techniques are based on manual observation of the captured borehole images. While this strategy may be possible and convenient with small images and few data, it may become difficult and suitable to errors when big databases of images must be treated. While the patterns may differ among the image area, depending on many characteristics (drilling strategy, rock components, rock strength, etc.). Previously we developed and proposed a novel strategy capable of detecting patterns at borehole images that may point to regions that have tension and breakout characteristics, based on segmented images. In this work we propose the inclusion of data-mining classification strategies in order to create a knowledge database of the segmented curves. These classifiers allow that, after some time using and manually pointing parts of borehole images that correspond to tension regions and breakout areas, the system will indicate and suggest automatically new candidate regions, with higher accuracy. We suggest the use of different classifiers methods, in order to achieve different knowledge data set configurations.

Keywords: image segmentation, oil well visualization, classifiers, data-mining, visual computer

Procedia PDF Downloads 272
25762 Satellite Derived Snow Cover Status and Trends in the Indus Basin Reservoir

Authors: Muhammad Tayyab Afzal, Muhammad Arslan, Mirza Muhammad Waqar

Abstract:

Snow constitutes an important component of the cryosphere, characterized by high temporal and spatial variability. Because of the contribution of snow melt to water availability, snow is an important focus for research on climate change and adaptation. MODIS satellite data have been used to identify spatial-temporal trends in snow cover in the upper Indus basin. For this research MODIS satellite 8 day composite data of medium resolution (250m) have been analysed from 2001-2005.Pixel based supervised classification have been performed and extent of snow have been calculated of all the images. Results show large variation in snow cover between years while an increasing trend from west to east is observed. Temperature data for the Upper Indus Basin (UIB) have been analysed for seasonal and annual trends over the period 2001-2005 and calibrated with the results acquired by the research. From the analysis it is concluded that there are indications that regional warming is one of the factor that is affecting the hydrology of the upper Indus basin due to accelerated glacial melting during the simulation period, stream flow in the upper Indus basin can be predicted with a high degree of accuracy. This conclusion is also supported by the research of ICIMOD in which there is an observation that the average annual precipitation over a five year period is less than the observed stream flow and supported by positive temperature trends in all seasons.

Keywords: indus basin, MODIS, remote sensing, snow cover

Procedia PDF Downloads 360