Search results for: Malicious Cyber-Physical Data Injection
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7653

7353 Classifying Bio-Chip Data using an Ant Colony System Algorithm

Authors: Minsoo Lee, Yearn Jeong Kim, Yun-mi Kim, Sujeung Cheong, Sookyung Song

Abstract:

Bio-chips are used for experiments on genes and contain various kinds of information, such as genes and samples. Two-dimensional bio-chips, in which one axis represents genes and the other represents samples, are widely used today. Bio-chips are used for biological experiments instead of experiments on real genes, which cost a great deal of money and take a long time to yield results. Extracting data from bio-chips with high accuracy and finding patterns or useful information in such data is therefore very important. Bio-chip analysis systems extract data from various kinds of bio-chips and mine the data to obtain useful information. One commonly used method for mining the data is classification. The algorithm used to classify the data can vary depending on the data types, the number of characteristics, and so on. Because bio-chip data is extremely large, an algorithm that imitates an ecosystem, such as the ant algorithm, is suitable for classification. This paper focuses on finding classification rules in bio-chip data using the Ant Colony algorithm, which imitates an ecosystem. The developed system takes into consideration the accuracy of the discovered rules when it applies them to the bio-chip data to predict the classes.
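As an illustration of how ant colony rule discovery works, the following is a simplified Ant-Miner-style sketch, not the authors' system. Each ant builds an IF-THEN rule by picking (attribute, value) terms with probability proportional to pheromone, the rule is pruned, its quality is scored as sensitivity × specificity, and pheromone on the used terms is reinforced. The heuristic term of the full Ant-Miner algorithm is omitted, and the toy "g1"/"g2" expression levels are hypothetical.

```python
import random


def rule_quality(rule, data, target):
    """Quality of 'IF rule THEN target' as sensitivity * specificity."""
    tp = fp = fn = tn = 0
    for attrs, label in data:
        covered = all(attrs.get(a) == v for a, v in rule)
        if covered and label == target:
            tp += 1
        elif covered:
            fp += 1
        elif label == target:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens * spec


def construct_rule(terms, pheromone, rng, max_terms):
    """One ant picks terms (at most one per attribute) weighted by pheromone."""
    rule, used = [], set()
    while len(rule) < max_terms:
        candidates = [t for t in terms if t[0] not in used]
        if not candidates:
            break
        term = rng.choices(candidates, weights=[pheromone[t] for t in candidates])[0]
        rule.append(term)
        used.add(term[0])
    return rule


def prune(rule, data, target):
    """Drop terms while removing one does not lower the rule quality."""
    q = rule_quality(rule, data, target)
    while len(rule) > 1:
        options = [[t for t in rule if t != drop] for drop in rule]
        best = max(options, key=lambda r: rule_quality(r, data, target))
        best_q = rule_quality(best, data, target)
        if best_q < q:
            break
        rule, q = best, best_q
    return rule, q


def ant_miner(data, target, n_ants=30, max_terms=2, evaporation=0.1, seed=0):
    rng = random.Random(seed)
    terms = sorted({(a, v) for attrs, _ in data for a, v in attrs.items()})
    pheromone = {t: 1.0 for t in terms}
    best_rule, best_q = [], -1.0
    for _ in range(n_ants):
        rule, q = prune(construct_rule(terms, pheromone, rng, max_terms), data, target)
        if q > best_q:
            best_rule, best_q = rule, q
        for t in terms:  # evaporation on every term
            pheromone[t] *= 1.0 - evaporation
        for t in rule:  # reinforcement proportional to rule quality
            pheromone[t] += q
    return best_rule, best_q
```

On a toy set where "g1 = high" exactly separates tumor from normal samples, the colony converges on that single-term rule with quality 1.0.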

Keywords: Ant Colony System, DNA chip data, Classification.

Downloads: 1430
7352 Study of Biocomposites Based on Poly(Lactic Acid) and Olive Husk Flour

Authors: Samra Isadounene, Amar Boukerrou, Dalila Hammiche

Abstract:

In this work, composites of poly(lactic acid) (PLA) and olive husk flour (OHF) were prepared with different OHF contents (10, 20 and 30%) by extrusion followed by injection molding. The morphology, mechanical properties and thermal behavior of the composites were investigated. Tensile strength and elongation at break decreased with increasing fiber content, whereas Young's modulus and storage modulus increased. The addition of OHF decreased the thermal stability of the composites. The presence of OHF increased the degree of crystallinity (Xc) of the PLA matrix.

Keywords: Biopolymers, composites, mechanical properties, poly(lactic acid).

Downloads: 954
7351 Trust and Reliability for Public Sector Data

Authors: Klaus Stranacher, Vesna Krnjic, Thomas Zefferer

Abstract:

The public sector holds large amounts of data from various areas such as social affairs, the economy or tourism. Various initiatives such as Open Government Data or the EU Directive on public sector information aim to make these data available to public and private service providers. Requirements for the provision of public sector data are defined by legal and organizational frameworks. Surprisingly, the defined requirements hardly cover security aspects such as integrity or authenticity. In this paper, we discuss the importance of these missing requirements and present a concept, based on electronic signatures, for assuring the integrity and authenticity of provided data. We show that our concept is well suited to provisioning unaltered data, and that it can be extended, by incorporating redactable signatures, to data that must be anonymized before provisioning. Our proposed concept enhances the trust and reliability of provided public sector data.
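To illustrate why redactable signatures let anonymized data stay verifiable, here is a minimal sketch of one common construction, not necessarily the scheme the authors use. Each field is committed with a salted SHA-256 hash, and the signature covers only the commitments, so a field can later be replaced by its commitment without invalidating the signature. As an assumption for a stdlib-only example, an HMAC stands in for a real public-key electronic signature (e.g. RSA or ECDSA); the record fields are hypothetical.

```python
import hashlib
import hmac
import json
import os


def _commit(value: str, salt: bytes) -> str:
    # Salted hash, so a redacted field cannot be guessed from its commitment.
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


def _tag(commitments: dict, key: bytes) -> str:
    # Stand-in for a public-key signature over the sorted commitments.
    msg = json.dumps(commitments, sort_keys=True).encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def sign(record: dict, key: bytes) -> dict:
    revealed, commitments = {}, {}
    for name, value in record.items():
        salt = os.urandom(16)
        revealed[name] = (value, salt.hex())
        commitments[name] = _commit(value, salt)
    return {"revealed": revealed, "hidden": {}, "tag": _tag(commitments, key)}


def redact(package: dict, name: str) -> dict:
    # Replace one field by its commitment; the original tag stays valid.
    value, salt_hex = package["revealed"][name]
    revealed = {k: v for k, v in package["revealed"].items() if k != name}
    hidden = dict(package["hidden"], **{name: _commit(value, bytes.fromhex(salt_hex))})
    return {"revealed": revealed, "hidden": hidden, "tag": package["tag"]}


def verify(package: dict, key: bytes) -> bool:
    commitments = dict(package["hidden"])
    for name, (value, salt_hex) in package["revealed"].items():
        commitments[name] = _commit(value, bytes.fromhex(salt_hex))
    return hmac.compare_digest(_tag(commitments, key), package["tag"])
```

A publisher can sign a full record, redact a personal field such as a name, and a consumer can still verify the remaining fields, while any change to a revealed value breaks verification.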

Keywords: Trusted Public Sector Data, Integrity, Authenticity, Reliability, Redactable Signatures.

Downloads: 1728
7350 Performance Analysis of Traffic Classification with Machine Learning

Authors: Htay Htay Yi, Zin May Aye

Abstract:

Network security plays a central role in the ICT environment, because the number of malicious users in education, business and other ICT-related domains is continually growing. Network security violations are typically described and examined centrally by a security event management system. Firewalls, Intrusion Detection Systems (IDS) and Intrusion Prevention Systems are becoming essential for monitoring or preventing potential violations, attack incidents and imminent threats. In this system, firewall rules are set only where the system policies require them. The dataset used in this system is derived from a testbed environment in which DoS and PortScan traffic is generated alongside the firewall and IDS implementation. Network traffic in the testbed is classified as normal or attack using six machine learning classification methods. The dataset is based on CICIDS2017 with some added features, and 26 features from it were tested. The system aims to reduce false positive rates and improve accuracy in the implemented testbed design. It also shows good performance when important features are selected, as demonstrated by comparison with an existing dataset using machine learning classifiers.
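The false positive rate, false negative rate and accuracy discussed above all come from the same confusion matrix. The sketch below shows the arithmetic for a binary "attack" versus "normal" labeling. The label strings and the toy predictions are hypothetical; the abstract does not report per-class counts.

```python
def detection_metrics(y_true, y_pred, positive="attack"):
    """Accuracy, FPR and FNR for a binary attack/normal classifier."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1  # normal traffic flagged as attack
        else:
            if t == positive:
                fn += 1  # attack traffic missed
            else:
                tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }
```

For 3 attack and 4 normal flows where one attack is missed and one normal flow is flagged, accuracy is 5/7, the false positive rate is 1/4 and the false negative rate is 1/3.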

Keywords: False negative rate, intrusion detection system, machine learning methods, performance.

Downloads: 1020
7349 Analysis of Relation between Unlabeled and Labeled Data to Self-Taught Learning Performance

Authors: Ekachai Phaisangittisagul, Rapeepol Chongprachawat

Abstract:

Obtaining labeled data in supervised learning is often difficult and expensive, so the trained learning algorithm tends to overfit because of the small amount of training data. As a result, some researchers have focused on using unlabeled data, which need not follow the same generative distribution as the labeled data, to construct high-level features that improve performance on supervised learning tasks. In this paper, we investigate how the relationship between unlabeled and labeled data affects classification performance. Specifically, we apply different unlabeled datasets, with different degrees of relation to the labeled data, to a handwritten digit classification task based on the MNIST dataset. Our experimental results show that the higher the degree of relation between unlabeled and labeled data, the better the classification performance. Although unlabeled data drawn from a generative distribution completely different from that of the labeled data gives the lowest classification performance, it still achieves high classification performance. This expands the applicability of supervised learning algorithms that use unsupervised learning.

Keywords: Autoencoder, high-level feature, MNIST dataset, self-taught learning, supervised learning.

Downloads: 1788
7348 Studying the Effect of Nanoclays on the Mechanical Properties of Polypropylene/Polyamide Nanocomposites

Authors: Benalia Kouini, Aicha Serier

Abstract:

Nanocomposites based on polypropylene/polyamide 66 (PP/PA66) nanoblends containing organophilic montmorillonite (OMMT) and maleic anhydride grafted polypropylene (PP-g-MAH) were prepared by melt compounding followed by injection molding. Two types of nanoclay were used: DELLITE LVF, an untreated nanoclay, and DELLITE 67G, a treated one. The morphology of the nanocomposites was studied by X-ray diffraction (XRD). The results indicate that incorporating the treated nanoclay has a significant effect on the impact strength of the PP/PA66 nanocomposites. Furthermore, the XRD results revealed intercalation and exfoliation of the nanoclays in the nanocomposites.
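Intercalation is commonly judged from the clay's basal reflection through Bragg's law, nλ = 2d sin θ. A shift of the peak to a lower 2θ means a larger interlayer spacing d, which indicates intercalation, while disappearance of the peak is usually read as exfoliation. The sketch below assumes Cu Kα radiation (λ = 0.15406 nm). The abstract reports no peak positions, so the angles in the example are illustrative only.

```python
import math

CU_K_ALPHA_NM = 0.15406  # assumed X-ray wavelength (Cu K-alpha), in nm


def d_spacing(two_theta_deg, wavelength_nm=CU_K_ALPHA_NM, order=1):
    """Interlayer spacing d (nm) from Bragg's law: n * lambda = 2 * d * sin(theta)."""
    theta = math.radians(two_theta_deg / 2.0)
    return order * wavelength_nm / (2.0 * math.sin(theta))
```

For example, a basal peak at 2θ = 7.0° corresponds to d ≈ 1.26 nm, and moving that peak to 5.0° would mean the clay galleries have opened to roughly 1.77 nm.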

Keywords: Nanoclay, nanocomposites, polypropylene, polyamide, melt processing, mechanical properties.

Downloads: 1233
7347 Towards Development of Solution for Business Process-Oriented Data Analysis

Authors: M. Klimavicius

Abstract:

This paper proposes a modeling methodology for developing a data analysis solution. The author introduces an approach to data warehousing issues at the enterprise level. The methodology covers requirements elicitation and analysis as well as the initial design of the data warehouse. The paper reviews an extended business process model that satisfies the needs of data warehouse development. The author argues that business process models are necessary because they reflect both enterprise information systems and business functions, which are important for data analysis. The described approach divides development into three steps, each elaborating the models in different detail. It makes it possible to gather requirements and present them to business users in an accessible manner.

Keywords: Data warehouse, data analysis, business process management.

Downloads: 1359
7346 Implementation of a Motion Detection System

Authors: Asif Ansari, T.C.Manjunath, C. Ardil

Abstract:

In today's competitive environment, security concerns have grown tremendously. Possession is said to be nine-tenths of the law, so it is imperative to be able to safeguard one's property from harms such as theft, destruction of property and people with malicious intent. With the advance of technology, the methods used by thieves and robbers have improved rapidly, so surveillance techniques must also keep pace. Advances in mass media and various forms of communication now make it possible to monitor and control the environment to the advantage of property owners. The latest technologies used against theft and destruction are video surveillance and monitoring, which make it possible to monitor and capture every inch and every second of the area of interest. However, the technologies used so far are passive: monitoring systems only help detect a crime and do not actively participate in stopping or curbing it while it takes place. We have therefore developed a methodology to detect motion in a video stream, with the aim of ensuring that monitoring systems actively help stop a crime while it is taking place. The system detects any motion in a live streaming video and, once motion has been detected, activates a warning system and captures the live video stream.
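A common way to detect motion in a live stream is frame differencing: compare consecutive grayscale frames and raise an alarm when enough pixels change. The authors worked in Matlab, and their exact detector is not described here. The sketch below is a plain-Python illustration in which frames are lists of rows of 0 to 255 intensities; both thresholds are hypothetical tuning parameters.

```python
def motion_detected(prev_frame, curr_frame, pixel_threshold=25, min_changed_fraction=0.01):
    """Frame differencing: True if enough pixels changed between two grayscale frames."""
    changed = 0
    total = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            total += 1
            if abs(c - p) > pixel_threshold:  # this pixel changed noticeably
                changed += 1
    return total > 0 and changed / total >= min_changed_fraction
```

In a capture loop, each new frame would be compared with the previous one. A True result is the point at which the warning system would be triggered and recording started.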

Keywords: Motion, Detection, System, Video, Crime, Matlab, Surveillance.

Downloads: 4248
7345 Natural Convection of Water-Based CuO Nanofluids in a Cylindrical Enclosure

Authors: Baha Tulu Tanju, Kamil Kahveci

Abstract:

Buoyancy-driven heat transfer of nanofluids in a cylindrical enclosure, used as a control unit in subsea hydrocarbon injection wells, is investigated in this study. The governing equations, obtained with the Boussinesq approximation, are solved using the COMSOL Multiphysics finite element analysis and simulation software. Water is the base fluid and CuO is used as the nanoparticle. Solutions are obtained for a nanoparticle solid volume fraction of 8% and Rayleigh numbers from 10^5 to 10^7. The results show that using nanoparticles in the cylindrical electronic control unit has a significant effect on the flow and heat transfer.
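Two quantities frame a study like this: the effective nanofluid properties at a given solid volume fraction φ, and the Rayleigh number Ra = gβΔT L³/(να). The sketch below uses the volume-weighted mixing rule commonly applied to density and to ρc_p. It assumes typical literature values for water (ρ = 997.1 kg/m³) and CuO (ρ = 6500 kg/m³), not values taken from the paper, and the Rayleigh inputs are illustrative.

```python
def mixture(phi, fluid_value, particle_value):
    """Volume-fraction-weighted mixing rule, commonly used for density and rho*cp."""
    return (1.0 - phi) * fluid_value + phi * particle_value


def rayleigh(g, beta, delta_t, length, nu, alpha):
    """Rayleigh number Ra = g * beta * dT * L^3 / (nu * alpha)."""
    return g * beta * delta_t * length ** 3 / (nu * alpha)


# Illustrative values: 8% CuO in water, and a 0.1 m enclosure with a 10 K difference.
rho_nf = mixture(0.08, 997.1, 6500.0)  # about 1437 kg/m^3
ra = rayleigh(9.81, 2.1e-4, 10.0, 0.1, 1.0e-6, 1.4e-7)  # about 1.5e8
```

Thermal conductivity and viscosity are not linear in φ. Models such as Maxwell or Brinkman are typically used for those, and the paper's choice is not stated in the abstract.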

Keywords: CuO, enclosure, nanofluid, natural convection

Downloads: 1993
7344 Preliminary Overview of Data Mining Technology for Knowledge Management System in Institutions of Higher Learning

Authors: Muslihah Wook, Zawiyah M. Yusof, Mohd Zakree Ahmad Nazri

Abstract:

Data mining has been integrated into application systems to enhance the quality of the decision-making process. This study focuses on integrating data mining technology with a Knowledge Management System (KMS), because data mining can create useful knowledge from large volumes of data, while a KMS supports the creation and use of knowledge. The integration of data mining technology and KMS is widely used in business to enhance and sustain organizational performance. However, few studies have applied data mining technology and KMS in the education sector, particularly to students' academic performance, which could reflect the performance of Institutions of Higher Learning (IHLs). Recognizing its importance, this study seeks to integrate data mining technology and KMS to promote effective knowledge management within IHLs. Several concepts from the literature are adapted to propose a new integrative data mining and KMS framework for IHLs.

Keywords: Data mining, Institutions of Higher Learning, Knowledge Management System, Students' academic performance.

Downloads: 2103
7343 Thailand National Biodiversity Database System with webMathematica and Google Earth

Authors: W. Katsarapong, W. Srisang, K. Jaroensutasinee, M. Jaroensutasinee

Abstract:

The National Biodiversity Database System (NBIDS) has been developed to collect Thai biodiversity data. The goal of this project is to provide researchers and scientists with advanced tools for querying, analyzing, modeling and visualizing patterns of species distribution. NBIDS records two types of datasets: biodiversity data and environmental data. Biodiversity data are species presence data and species status. The attributes of biodiversity data can be further classified into two groups: universal and project-specific attributes. Universal attributes are common to all records, e.g., X/Y coordinates, year and collector name. Project-specific attributes are unique to one or a few projects, e.g., flowering stage. Environmental data include atmospheric, hydrology, soil and land cover data collected using GLOBE protocols. We have developed web-based tools for data entry. Google Earth KML and ArcGIS were used for map visualization. webMathematica was used for simple data visualization and also for advanced data analysis and visualization, e.g., spatial interpolation and statistical analysis. NBIDS will be used by park rangers at Khao Nan National Park and by researchers.
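The abstract names spatial interpolation without specifying the method. Inverse distance weighting (IDW) is one simple, widely used choice, sketched below as an assumption rather than the system's actual webMathematica implementation. The sample coordinates are hypothetical planar (projected) X/Y values, not latitude and longitude.

```python
def idw(samples, x, y, power=2.0):
    """Inverse distance weighting over (x, y, value) samples in planar coordinates."""
    num = den = 0.0
    for sx, sy, value in samples:
        d2 = (sx - x) ** 2 + (sy - y) ** 2
        if d2 == 0:
            return value  # exactly on a sample point
        w = 1.0 / d2 ** (power / 2.0)  # weight = 1 / distance^power
        num += w * value
        den += w
    return num / den
```

Midway between two equally distant samples the estimate is their average, and the power parameter controls how quickly a sample's influence falls off with distance.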

Keywords: GLOBE protocol, Biodiversity, Database System, ArcGIS, Google Earth and webMathematica.

Downloads: 1934
7342 Evaluation of Clustering Based on Preprocessing in Gene Expression Data

Authors: Seo Young Kim, Toshimitsu Hamasaki

Abstract:

Microarrays have become effective, widely used tools in biological and medical research for a wide range of problems, including the classification of disease subtypes and tumors. Many statistical methods are available for analyzing and organizing these complex data into meaningful information, and one of the main goals in analyzing gene expression data is to detect samples or genes with similar expression patterns. In this paper, we describe and compare the performance of several clustering methods under different data preprocessing strategies, including normalization and noise removal. We also evaluate each clustering method with validation measures on both simulated and real gene expression data. The results show that clustering methods commonly used in microarray data analysis are affected by normalization and by the degree of noise in the datasets.
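To see why normalization changes clustering results, consider per-gene z-score standardization, one common preprocessing strategy. The abstract does not say which normalizations were compared. Two genes with the same expression pattern but different scales are far apart in raw Euclidean distance yet coincide after standardization. The expression values below are hypothetical.

```python
import math
import statistics


def zscore(row):
    """Standardize one gene's expression profile to mean 0 and unit variance."""
    mu = statistics.mean(row)
    sd = statistics.pstdev(row)
    if sd == 0:
        return [0.0] * len(row)
    return [(v - mu) / sd for v in row]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]  # same pattern, ten times the scale
raw_distance = euclidean(gene_a, gene_b)  # about 49.3
normalized_distance = euclidean(zscore(gene_a), zscore(gene_b))  # about 0
```

A distance-based method such as k-means or hierarchical clustering would separate these genes on raw data but group them after normalization.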

Keywords: Gene expression, clustering, data preprocessing.

Downloads: 1702
7341 Addressing Data Security in the Cloud

Authors: Marinela Mircea

Abstract:

The development of information and communication technology, the increased use of the internet and the effects of the recession in recent years have led to the increased use of cloud-computing-based solutions, also called on-demand solutions. These solutions offer organizations many benefits as well as challenges and risks, mainly determined by data visualization in different geographic locations on the internet. Among the specific risks of the cloud environment, data security is still considered a major barrier to adopting cloud computing. The present study offers an approach to ensuring the security of cloud data that is oriented towards the whole data life cycle. The final part of the study focuses on assessing data security in the cloud, which forms the basis for determining potential losses and the premise for subsequent improvements and continuous learning.

Keywords: cloud computing, data life cycle, data security, security assessment.

Downloads: 2109
7340 Numerical Investigation of Two-dimensional Boundary Layer Flow Over a Moving Surface

Authors: Mahmoud Zarrini, R.N. Pralhad

Abstract:

In this paper, we study the variation of velocity in an incompressible fluid over a moving surface. The boundary layer equations are considered on a fixed or continuously moving flat plate, moving in the same or opposite direction to the free stream, with suction and injection. The boundary layer equations are transformed from partial differential equations into ordinary differential equations. Numerical solutions are obtained using the Runge-Kutta and shooting methods, giving the velocity and the skin friction coefficient.
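The Runge-Kutta/shooting approach can be illustrated on the common similarity form f''' + ½ f f'' = 0 with f(0) = f_w (suction/injection), f'(0) = λ (plate-to-free-stream velocity ratio) and f'(∞) = 1. This normalization is an assumption, since the abstract does not state it. The sketch integrates with classical RK4 and bisects on the unknown wall value f''(0). For λ = 0 and f_w = 0 this reduces to the Blasius problem, whose known value f''(0) ≈ 0.332 gives the skin friction through C_f Re_x^(1/2) = 2 f''(0).

```python
def rhs(state):
    """First-order system for f''' + 0.5 * f * f'' = 0."""
    f, fp, fpp = state
    return (fp, fpp, -0.5 * f * fpp)


def integrate(s, lam=0.0, fw=0.0, eta_max=10.0, h=0.01):
    """Classical RK4 from eta = 0 to eta_max with f''(0) = s; returns (f, f', f'')."""
    state = (fw, lam, s)
    for _ in range(int(round(eta_max / h))):
        k1 = rhs(state)
        k2 = rhs(tuple(y + 0.5 * h * k for y, k in zip(state, k1)))
        k3 = rhs(tuple(y + 0.5 * h * k for y, k in zip(state, k2)))
        k4 = rhs(tuple(y + h * k for y, k in zip(state, k3)))
        state = tuple(y + h / 6.0 * (a + 2 * b + 2 * c + d)
                      for y, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state


def shoot(lam=0.0, fw=0.0, lo=0.1, hi=1.0, tol=1e-6):
    """Bisect on f''(0) until f'(eta_max) matches the free-stream condition f' = 1."""
    def miss(s):
        return integrate(s, lam, fw)[1] - 1.0

    m_lo = miss(lo)
    if m_lo * miss(hi) > 0:
        raise ValueError("bracket [lo, hi] does not straddle f'(inf) = 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        m_mid = miss(mid)
        if m_lo * m_mid <= 0:
            hi = mid
        else:
            lo, m_lo = mid, m_mid
    return 0.5 * (lo + hi)
```

`shoot()` returns about 0.332 for the fixed plate. For other λ or f_w the bracket must be adjusted (for example, λ > 1 gives a negative f''(0)), and for a plate moving against the stream the solution may not be unique.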

Keywords: Boundary layer, continuously moving surface, shooting method, skin friction coefficient.

Downloads: 1545
7339 Anti-inflammatory and Antinociceptive Effects of Hydroalcoholic Tanacetum balsamita L. Extract

Authors: S. Nasri, G. H. Amin, A. Azimi

Abstract:

The use of herbs to treat disease is as old as human history. This research aimed to study the anti-inflammatory and antinociceptive effects of a hydroalcoholic extract of the aerial parts of Tanacetum balsamita balsamita. In the experimental studies, 144 male mice were used. In the inflammatory test, animals were divided into six groups: a control group, a positive control group (receiving dexamethasone at 15 mg/kg) and four experimental groups receiving the Tanacetum balsamita balsamita hydroalcoholic extract at doses of 25, 50, 100 and 200 mg/kg. Xylene was used to induce inflammation. Formalin was used to study the nociceptive effects; animals were divided into six groups: a control group, a positive control group (receiving morphine) and four experimental groups receiving the Tanacetum balsamita balsamita (Tb.) hydroalcoholic extract at doses of 25, 50, 100 and 200 mg/kg. Drugs or normal saline were injected intraperitoneally (i.p.) 30 minutes before the test. The data were analyzed using one-way analysis of variance and Tukey's post hoc test. The hydroalcoholic extract of the aerial parts significantly decreased inflammation at a dose of 200 mg/kg (P < 0.001) and significantly alleviated nociception in both the first and second phases at doses of 200 mg/kg (P < 0.001) and 100 mg/kg (P < 0.05). Tanacetum balsamita balsamita extract has anti-inflammatory and antinociceptive effects, which seem to be related to its flavonoids, especially quercetin.
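The one-way analysis of variance used here compares the spread between group means with the spread within groups. The sketch computes the F statistic F = MS_between / MS_within on hypothetical toy data, not the study's measurements. Converting F to a P value requires the F distribution's survival function (e.g. `scipy.stats.f.sf`), which is outside Python's standard library. Tukey's post hoc test would then compare the group pairs.

```python
def one_way_anova_f(groups):
    """F = between-group mean square / within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

For two toy groups [1, 2, 3] and [4, 5, 6], SS_between = 13.5 with 1 degree of freedom and SS_within = 4 with 4 degrees of freedom, so F = 13.5.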

Keywords: Inflammation, nociception, Hydroalcoholic extract, aerial parts of Tanacetum balsamita balsamita L.

Downloads: 2000
7338 A Network Traffic Prediction Algorithm Based On Data Mining Technique

Authors: D. Prangchumpol

Abstract:

This paper describes an approach to predicting the incoming and outgoing data rates of a network system using association rule discovery, one of the data mining techniques. The incoming and outgoing data at each point in time and the network bandwidth are network performance parameters needed to solve traffic problems, since congestion and data loss are important network problems. The technique can predict future network traffic. In addition, this research is useful for network route selection and network performance improvement.
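Association rules are judged by support and confidence. In a traffic setting, each measurement interval can be turned into a transaction of discretized items. The item names below ("hour:peak", "in:high", and so on) and the discretization are hypothetical, since the abstract does not describe how traffic was encoded.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    s = support(antecedent, transactions)
    return support(antecedent | consequent, transactions) / s if s else 0.0


intervals = [
    {"hour:peak", "in:high", "out:high"},
    {"hour:peak", "in:high", "out:low"},
    {"hour:peak", "in:high", "out:high"},
    {"hour:off", "in:low", "out:low"},
]
```

Here the rule {hour:peak} → {in:high} has confidence 1.0, which would predict high incoming traffic during peak hours. The rule {in:high} → {out:high} has confidence 2/3.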

Keywords: Traffic prediction, association rule, data mining.

Downloads: 3616
7337 Fuzzy Processing of Uncertain Data

Authors: Petr Morávek, Miloš Šeda

Abstract:

In practice, we often encounter situations where decisions must be made based on incomplete or uncertain data. In control systems, this may be due to an unknown exact mathematical model or to its excessive complexity (e.g., nonlinearity), when the model must be simplified or solved using a rule base. In databases, searching for data involves comparing the requirements of the selection with the stored data using a similarity measure, where both the select query and the data itself may contain vague terms, for example in the form of linguistic qualifiers. In this paper, we focus on processing uncertain data in databases and demonstrate it on an example of multi-criteria decision making in selecting variants specified by a large number of technical parameters.
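A vague query such as "power about 50 kW AND light weight" can be evaluated by mapping each linguistic term to a fuzzy membership function and combining the criteria with a t-norm. The sketch below uses triangular membership functions and the minimum t-norm, which are common textbook choices. The paper's own functions and aggregation may differ, and the variants and their parameters are hypothetical.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b (a <= b <= c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def about_50_kw(power):
    return triangular(power, 40, 50, 60)


def light(weight):
    return triangular(weight, 0, 0, 25)  # fully "light" at 0 kg, not at all from 25 kg


def score(variant):
    # Fuzzy AND via the minimum t-norm.
    return min(about_50_kw(variant["power_kw"]), light(variant["weight_kg"]))


variants = {"A": {"power_kw": 48, "weight_kg": 12}, "B": {"power_kw": 58, "weight_kg": 30}}
ranked = sorted(variants, key=lambda k: score(variants[k]), reverse=True)
```

Variant A matches the query to degree 0.52, limited by its weight, while B scores 0 because its weight is outside the "light" set. The result is a graded ranking rather than a crisp yes/no filter.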

Keywords: fuzzy logic, linguistic variable, multicriteria decision

Downloads: 1375
7336 The Study of Ultimate Response Guideline of Kuosheng BWR/6 Nuclear Power Plant Using TRACE and SNAP

Authors: J. R. Wang, J. H. Yang, Y. Chiang, H. C. Chen, C. Shih, S. W. Chen, S. C. Chiang, T. Y. Yu

Abstract:

In this study of the ultimate response guideline (URG), a TRACE model of the Kuosheng BWR/6 nuclear power plant (NPP) was established. Reactor depressurization, low-pressure water injection and containment venting are the main actions of the URG. This research focuses on evaluating the efficiency of the URG under Fukushima-like conditions; a sensitivity study of the URG was also performed. The TRACE analysis results show that the URG can keep the peak cladding temperature (PCT) below 1088.7 K (the failure criterion) under Fukushima-like conditions, which implies that the Kuosheng NPP remained in a safe state.

Keywords: BWR, TRACE, safety analysis, URG.

Downloads: 1136
7335 Automated Stereophotogrammetry Data Cleansing

Authors: Stuart Henry, Philip Morrow, John Winder, Bryan Scotney

Abstract:

The stereophotogrammetry modality is gaining more widespread use in the clinical setting. Registering and visualizing these data together with conventional 3D volumetric image modalities provides virtual human data with textured soft tissue and internal anatomical and structural information. In this investigation, computed tomography (CT) and stereophotogrammetry data are acquired from four anatomical phantoms and registered using the trimmed iterative closest point (TrICP) algorithm. This paper fully addresses the issue of imaging artifacts around the stereophotogrammetry surface edge, using the registered CT data as a reference. Several iterative algorithms are implemented to automatically identify and remove stereophotogrammetry surface edge outliers, improving the overall visualization of the combined stereophotogrammetry and CT data. The paper shows that outliers at the surface edge of stereophotogrammetry data can be removed automatically.
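The general idea of removing edge outliers against a registered reference surface can be sketched generically. This is not one of the paper's algorithms. Assuming the surfaces are already registered (e.g. by TrICP), the filter repeatedly drops stereophotogrammetry points whose distance to the nearest CT surface point exceeds the mean plus k standard deviations. The point clouds, k and the iteration limit are hypothetical, and the brute-force nearest-neighbour search would be replaced by a k-d tree for real data.

```python
import math
import statistics


def nearest_distance(p, reference):
    """Distance from point p to its nearest neighbour in the reference cloud (brute force)."""
    return min(math.dist(p, q) for q in reference)


def remove_edge_outliers(points, reference, k=2.0, max_iter=10):
    """Iteratively drop points farther than mean + k * std from the reference surface."""
    kept = list(points)
    for _ in range(max_iter):
        d = [nearest_distance(p, reference) for p in kept]
        cutoff = statistics.mean(d) + k * statistics.pstdev(d)
        filtered = [p for p, di in zip(kept, d) if di <= cutoff]
        if len(filtered) == len(kept):  # nothing removed: converged
            break
        kept = filtered
    return kept


ct_surface = [(x, y, 0.0) for x in range(5) for y in range(5)]
photo_surface = [(x, y, 0.01) for x in range(5) for y in range(5)] + [(2, 2, 5.0)]
cleaned = remove_edge_outliers(photo_surface, ct_surface)
```

The single point floating 5 units above the reference is removed in the first pass, and the second pass removes nothing, so the loop stops.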

Keywords: Data cleansing, stereophotogrammetry.

Downloads: 1796
7334 Malware Beaconing Detection by Mining Large-scale DNS Logs for Targeted Attack Identification

Authors: Andrii Shalaginov, Katrin Franke, Xiongwei Huang

Abstract:

One of the leading problems in cyber security today is the emergence of targeted attacks conducted by adversaries with access to sophisticated tools. These attacks usually steal the system privileges of senior-level employees in order to gain unauthorized access to confidential knowledge and valuable intellectual property. The malware used for the initial compromise of the systems is sophisticated and may target zero-day vulnerabilities. In this work, we use a common malware behavior called "beacon", in which infected hosts communicate with Command and Control (C2) servers at regular intervals with relatively small time variations. By analyzing such beacon activity through passive network monitoring, it is possible to detect potential malware infections. We therefore focus on time gaps as indicators of possible C2 activity in targeted enterprise networks. We represent DNS log files as a graph whose vertices are destination domains and whose edges are timestamps. Then, using four periodicity detection algorithms for each pair of internal-external communications, we check timestamp sequences to identify beacon activities. Finally, based on the graph structure, we infer the existence of other infected hosts and malicious domains involved in the attack activities.
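The time-gap idea can be illustrated with a very simple periodicity heuristic. The heuristic treats a host-domain pair as beacon-like when the inter-arrival times of its DNS queries have a small coefficient of variation (standard deviation divided by mean). This is an illustrative stand-in, not one of the four algorithms used in the paper, and the thresholds and timestamps are hypothetical.

```python
import statistics


def looks_like_beacon(timestamps, max_cv=0.1, min_events=5):
    """Flag a query sequence as beacon-like if its time gaps are nearly constant."""
    ts = sorted(timestamps)
    if len(ts) < min_events:
        return False
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    mean_gap = statistics.mean(gaps)
    if mean_gap <= 0:
        return False
    return statistics.pstdev(gaps) / mean_gap <= max_cv
```

Queries about every 60 s with a jitter of ±1 s give a coefficient of variation near 0.015 and are flagged, while bursty human-driven lookups are not.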

Keywords: Malware detection, network security, targeted attack.

Downloads: 5992
7333 An Improved Data Mining Method Applied to the Search of Relationship between Metabolic Syndrome and Lifestyles

Authors: Yi Chao Huang, Yu Ling Liao, Chiu Shuang Lin

Abstract:

A data cutting and sorting method (DCSM) is proposed to optimize the performance of data mining. DCSM reduces the calculation time by getting rid of redundant data during the data mining process. In addition, DCSM minimizes the computational units by splitting the database and by sorting data with support counts. In the process of searching for the relationship between metabolic syndrome and lifestyles with the health examination database of an electronics manufacturing company, DCSM demonstrates higher search efficiency than the traditional Apriori algorithm in tests with different support counts.
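For reference, the traditional Apriori baseline that DCSM is compared against works level by level. It counts the support of candidate k-itemsets, keeps the frequent ones, and joins them into (k+1)-candidates whose every k-subset is frequent. The sketch below is that classic algorithm, not DCSM itself. The lifestyle and syndrome items are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frequent itemset: count} using level-wise candidate generation."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        current = set()
        for a, b in combinations(list(level), 2):  # join step
            cand = a | b
            # prune step: every (k-1)-subset must itself be frequent
            if len(cand) == k and all(frozenset(s) in level for s in combinations(cand, k - 1)):
                current.add(cand)
    return frequent


records = [
    {"smoker", "no_exercise", "MetS"},
    {"no_exercise", "MetS"},
    {"smoker", "MetS"},
    {"exercise"},
]
result = apriori(records, min_support=0.5)
```

With a 50% support threshold, the frequent itemsets are three single items plus {smoker, MetS} and {no_exercise, MetS}. The triple is pruned without being counted because {smoker, no_exercise} is infrequent. Repeated database scans like this are the cost DCSM aims to reduce.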

Keywords: Data mining, Data cutting and sorting method, Apriori algorithm, Metabolic syndrome

Downloads: 1550
7332 Data Mining Approach for Commercial Data Classification and Migration in Hybrid Storage Systems

Authors: Mais Haj Qasem, Maen M. Al Assaf, Ali Rodan

Abstract:

Parallel hybrid storage systems consist of a hierarchy of storage devices that differ in data reading speed; the higher the level in the hierarchy, the faster the reads. Migrating an application's important data, which will be accessed in the near future, to the uppermost level therefore reduces the application's I/O waiting time and hence its execution elapsed time. In this research, we implement a trace-driven, two-level parallel hybrid storage system prototype consisting of HDDs and SSDs. The prototype uses data mining techniques to classify the application's data in order to determine its near-future data accesses in parallel with its on-demand requests. The important data (i.e., the data that the application will access in the near future) are continuously migrated to the uppermost level of the hierarchy. Our simulation results show that our data migration approach, integrated with data mining techniques, reduces the application execution elapsed time by at least 22% across a variety of traces.
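Once a classifier has predicted which data will be accessed soon, the migration step amounts to keeping the highest-scoring blocks on the SSD tier within its capacity. In the sketch below, `scores` stands in for the output of the paper's classifiers (the keywords mention recurrent neural networks and support vector machines). The block names, scores and capacity are hypothetical.

```python
def plan_migration(scores, ssd_capacity, on_ssd):
    """Return (blocks to promote to SSD, blocks to demote to HDD) for the next interval."""
    # Highest predicted access likelihood first; ties broken by block id for determinism.
    hot = sorted(scores, key=lambda b: (-scores[b], b))[:ssd_capacity]
    promote = [b for b in hot if b not in on_ssd]
    demote = sorted(b for b in on_ssd if b not in hot)
    return promote, demote


predicted = {"b1": 0.9, "b2": 0.1, "b3": 0.7, "b4": 0.4}
promote, demote = plan_migration(predicted, ssd_capacity=2, on_ssd={"b2", "b3"})
```

With room for two blocks, b1 is promoted, b2 is demoted, and b3 stays on the SSD. Only the difference between the current and desired placement is moved, which keeps migration traffic low.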

Keywords: Data mining, hybrid storage system, recurrent neural network, support vector machine.

Downloads: 1699
7331 Characterization of an Almond Shell Composite Based on PHBH

Authors: J. Ivorra-Martinez, L. Quiles-Carrillo, J. Gomez-Caturla, T. Boronat, R. Balart

Abstract:

By-products of almond crops were used to obtain poly(3-hydroxybutyrate-co-3-hydroxyhexanoate) (PHBH)-based composites by extrusion, followed by injection molding of test samples. To improve the properties of the resulting composite, the incorporation of Oligomer Lactic Acid (OLA 8) as a coupling agent and plasticizer was also considered. The composites were characterized by measuring their mechanical properties, thermal properties, surface morphology and water absorption. Using the almond residue yields PHBH-based composites of greater environmental interest and lower cost.

Keywords: Almond shell, PHBH, composite, polymer.

Downloads: 336
7330 Association Rules Mining and NOSQL Oriented Document in Big Data

Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub

Abstract:

Big Data refers to recent technologies for manipulating voluminous, unstructured data sets from multiple sources, and NoSQL emerged to handle the problem of unstructured data. Association rule mining is a popular data mining technique for extracting hidden relationships from transactional databases, and the problem of finding association dependencies is well solved with MapReduce. The goal of our work is to reduce the time needed to generate frequent itemsets by using MapReduce and a document-oriented NoSQL database. A comparative study evaluates the performance of our algorithm against the classical Apriori algorithm.
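The MapReduce formulation of frequent itemset counting splits the work into three phases. Mappers emit (itemset, 1) for the candidate itemsets in each transaction, the shuffle groups the pairs by itemset, and reducers sum the counts. The sketch simulates the three phases in plain Python to show the data flow. It does not use the Hadoop or MongoDB APIs, and the transactions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, k):
    """Mapper: emit (k-itemset, 1) for every k-item combination in one transaction."""
    return [(frozenset(c), 1) for c in combinations(sorted(transaction), k)]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: sum the counts for each itemset."""
    return {key: sum(values) for key, values in groups.items()}


def frequent_k_itemsets(transactions, k, min_count):
    pairs = [pair for t in transactions for pair in map_phase(t, k)]
    counts = reduce_phase(shuffle_phase(pairs))
    return {itemset: n for itemset, n in counts.items() if n >= min_count}
```

Because each mapper sees only its own transactions and each reducer sees only one key, both phases parallelize across nodes. That parallelism is where the speed-up over a single-machine Apriori pass comes from.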

Keywords: Apriori, Association rules mining, Big Data, data mining, Hadoop, Map Reduce, MongoDB, NoSQL.

Downloads: 629
7329 Identifying Critical Success Factors for Data Quality Management through a Delphi Study

Authors: Maria Paula Santos, Ana Lucas

Abstract:

Organizations base their operations and decision making on the data at their disposal, so the quality of these data is remarkably important and Data Quality (DQ) is currently a relevant issue; the literature is unanimous that poor DQ can result in large costs for organizations. A literature review identified and described 24 Critical Success Factors (CSF) for Data Quality Management (DQM), which were presented to a panel of experts who ranked them by degree of importance using the Delphi method with the Q-sort technique, via an online questionnaire. The study shows that the five most important CSF for DQM are: definition of appropriate policies and standards, control of inputs, definition of a strategic plan for DQ, an organizational culture focused on data quality, and obtaining top management commitment and support.
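When experts rank factors in Delphi rounds, agreement is often summarized with Kendall's coefficient of concordance, W = 12S / (m²(n³ − n)). Here m is the number of experts, n the number of factors, and S the sum of squared deviations of each factor's rank total from the mean total. The abstract does not say whether this study used W. The sketch assumes untied ranks, and the rankings are hypothetical.

```python
def kendalls_w(rankings):
    """Kendall's W for m experts each ranking the same n items 1..n (no ties).

    rankings[e][i] is the rank expert e gave to item i.
    """
    m = len(rankings)
    n = len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

W ranges from 0 (no agreement) to 1 (all experts give identical rankings). Delphi studies typically stop iterating once W is high enough or stops changing between rounds.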

Keywords: Critical success factors, data quality, data quality management, Delphi, Q-Sort.

Downloads: 1056
7328 Secure Data Aggregation Using Clusters in Sensor Networks

Authors: Prakash G L, Thejaswini M, S H Manjula, K R Venugopal, L M Patnaik

Abstract:

Wireless sensor networks can be applied in both abominable and military environments. A primary goal in designing wireless sensor networks is maximizing lifetime, which is constrained by the energy capacity of batteries. One well-known method to reduce energy consumption in such networks is data aggregation. Providing efficient data aggregation while preserving data privacy is a challenging problem in wireless sensor network research. In this paper, we present a privacy-preserving data aggregation scheme for additive aggregation functions. Cluster-based Private Data Aggregation (CPDA) leverages a clustering protocol and the algebraic properties of polynomials, and has the advantage of incurring less communication overhead. The goal of our work is to bridge the gap between collaborative data collection by wireless sensor networks and data privacy. We present simulation results for our schemes and compare their performance with TAG, a typical data aggregation scheme that provides no data privacy protection. The results show the efficacy and efficiency of our schemes.
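The polynomial idea behind CPDA-style schemes can be shown for a single cluster. Each node hides its reading d_i as the constant term of a random polynomial and sends evaluations of it at the other members' public seeds. Each node sums what it receives, and the cluster head recovers only the total Σd_i by interpolating the summed polynomial at 0. The sketch below is a simplified illustration: the per-link encryption of shares is omitted, n nodes use polynomials of degree n − 1, and the readings and seeds are hypothetical.

```python
import random
from fractions import Fraction


def cpda_cluster_sum(readings, seeds, seed=0):
    """Recover sum(readings) from polynomial shares without revealing single readings."""
    n = len(readings)
    if len(seeds) != n or len(set(seeds)) != n:
        raise ValueError("need one distinct public seed per node")
    rng = random.Random(seed)
    # Node i hides d_i in p_i(t) = d_i + r_i1*t + ... + r_i(n-1)*t^(n-1).
    coeffs = [[rng.randint(1, 10 ** 6) for _ in range(n - 1)] for _ in range(n)]

    def share(i, x):
        return readings[i] + sum(c * x ** (k + 1) for k, c in enumerate(coeffs[i]))

    # Node j adds up the shares p_i(seed_j) it received from every node i.
    assembled = [sum(share(i, seeds[j]) for i in range(n)) for j in range(n)]
    # Cluster head: assembled[j] = P(seed_j) with P = sum_i p_i, so P(0) = sum of readings.
    total = Fraction(0)
    for j, xj in enumerate(seeds):
        term = Fraction(assembled[j])
        for k, xk in enumerate(seeds):
            if k != j:
                term *= Fraction(-xk, xj - xk)  # Lagrange basis evaluated at t = 0
        total += term
    return total
```

With readings 10, 20 and 30 and seeds 1, 2 and 3, the head recovers exactly 60. Each assembled value it receives is masked by the random coefficients of all three nodes.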

Keywords: Aggregation, Clustering, Query Processing.

7327 IMDC: An Image-Mapped Data Clustering Technique for Large Datasets

Authors: Faruq A. Al-Omari, Nabeel I. Al-Fayoumi

Abstract:

In this paper, we present a new algorithm for clustering data in large datasets using image processing approaches. First, the dataset is mapped onto a binary image plane. The synthesized image is then processed with efficient image processing techniques to cluster the data in the dataset. Hence, the algorithm avoids an exhaustive search to identify clusters. The algorithm considers only a small subset of the data that contains boundary information sufficient to identify the contained clusters. Compared to available data clustering techniques, the proposed algorithm produces results of similar quality and outperforms them in execution time and storage requirements.
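The mapping-then-image-processing idea can be sketched for 2-D points as follows: quantize each point into a pixel of a binary image, then label 8-connected components of the "on" pixels, so that points sharing a component form a cluster. This is a swapped-in standard technique (connected-component labeling by breadth-first search), not the authors' boundary-based processing, and the `cell_size` resolution parameter is a hypothetical choice the abstract does not specify.

```python
from collections import deque


def image_mapped_clusters(points, cell_size):
    """Cluster 2-D points via a binary image and 8-connected components (sketch)."""
    # Map every point to the pixel (grid cell) it falls into.
    pixel_of = [(int(x // cell_size), int(y // cell_size)) for x, y in points]
    on_pixels = set(pixel_of)
    label = {}
    next_label = 0
    # Sorted iteration keeps the label numbering deterministic.
    for start in sorted(on_pixels):
        if start in label:
            continue
        label[start] = next_label
        queue = deque([start])
        while queue:  # breadth-first flood fill over neighbouring "on" pixels
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in on_pixels and nb not in label:
                        label[nb] = next_label
                        queue.append(nb)
        next_label += 1
    # Each point inherits the component label of its pixel.
    return [label[p] for p in pixel_of]


pts = [(0.1, 0.1), (0.5, 0.2), (1.2, 0.3), (10.0, 10.0), (10.5, 10.2)]
print(image_mapped_clusters(pts, cell_size=1.0))
```

The work is proportional to the number of occupied pixels rather than to pairwise point distances, which illustrates why an image-based formulation can avoid an exhaustive search.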

Keywords: Data clustering, Data mining, Image-mapping, Pattern discovery, Predictive analysis.

7326 Peakwise Smoothing of Data Models using Wavelets

Authors: D Sudheer Reddy, N Gopal Reddy, P V Radhadevi, J Saibaba, Geeta Varadan

Abstract:

Smoothing or filtering of data is the first preprocessing step for noise suppression in many applications involving data analysis. The moving average is the most popular method of smoothing data, and its generalization led to the development of the Savitzky-Golay filter. Many window smoothing methods were developed by convolving the data with different window functions for different applications; the most widely used window functions are Gaussian and Kaiser. Function approximation of the data by polynomial regression, Fourier expansion or wavelet expansion also yields smoothed data. Wavelets also smooth the data to a great extent by thresholding the wavelet coefficients. Almost all smoothing methods destroy the peaks and flatten them as the support of the window is increased. In certain applications it is desirable to retain peaks while smoothing the data as much as possible. In this paper we present a methodology called peak-wise smoothing that smooths the data to any desired level without losing the major peak features.
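The peak-flattening problem that motivates peak-wise smoothing is easy to reproduce with a centered moving average: a single spike of height h smoothed with a window of w samples drops to h/w, so widening the window steadily erodes the peak. The sketch below only illustrates that claim; it is not the authors' peak-wise method, and the truncated-window edge handling is an assumption.

```python
def moving_average(signal, window):
    """Centered moving average; near the edges the window is truncated."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        segment = signal[lo:hi]
        out.append(sum(segment) / len(segment))
    return out


# A single sharp peak of height 10 in an otherwise flat signal.
spike = [0.0] * 10 + [10.0] + [0.0] * 10
for w in (1, 3, 5, 9):
    print(w, max(moving_average(spike, w)))
```

Gaussian or Kaiser windows reduce the ringing of a plain box window but show the same trend, since any normalized window with wider support spreads the peak's area over more samples.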

Keywords: smoothing, moving average, peakwise smoothing, spatial density models, planar shape models, wavelets.

7325 A New Precautionary Method for Measurement and Improvement the Data Quality

Authors: Seyed Mohammad Hossein Moossavizadeh, Mehran Mohsenzadeh, Nasrin Arshadi

Abstract:

Data quality is a complex and unstructured concept that concerns information systems managers. The reason for this attention is the high expense of maintaining and cleaning inefficient data. Beyond the direct costs of its lack of quality, such data causes wrong statistics, analyses and decisions in organizations. Therefore, managers intend to improve the quality of their information systems' data. One of the basic subjects of quality improvement is measuring the level of quality. In this paper, we present a precautionary method whose application gives the data of information systems better quality. Our method covers different dimensions of data quality; therefore, it has the necessary integrity. The presented method has been tested on three dimensions (accuracy, value-added and believability), and the results confirm the improvement and integrity of this method.

Keywords: Data quality, precaution, information system, measurement, improvement.

7324 An Efficient Data Mining Approach on Compressed Transactions

Authors: Jia-Yu Dai, Don-Lin Yang, Jungpin Wu, Ming-Chuan Hung

Abstract:

In an era of knowledge explosion, the amount of data grows rapidly day by day. Since data storage is a limited resource, reducing the space the data occupies has become a challenging issue. Data compression provides a good solution that can lower the required space. Data mining has found many useful applications in recent years because it helps users discover interesting knowledge in large databases. However, existing compression algorithms are not appropriate for data mining. In [1, 2], two different approaches were proposed to compress databases and then perform the data mining process. However, both lack the ability to decompress the data to their original state and to improve data mining performance. In this research, a new approach called Mining Merged Transactions with the Quantification Table (M2TQT) is proposed to solve these problems. M2TQT uses the relationships among transactions to merge related transactions and builds a quantification table to prune candidate itemsets that cannot become frequent, thereby improving the performance of mining association rules. The experiments show that M2TQT performs better than existing approaches.
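A minimal sketch of the two ingredients named above, under loudly stated assumptions: it merges only identical transactions into (itemset, count) pairs, the simplest lossless case of merging, and uses a hypothetical quantification table `table[item][length]` that counts the original transactions of each length containing an item. Because a k-itemset can only occur in transactions of length at least k, an item whose count over lengths ≥ k is below the minimum support cannot belong to any frequent k-itemset and is pruned before candidates are generated. The table layout and function names are assumptions, not the authors' exact M2TQT structures, and items are assumed to be sortable.

```python
from collections import Counter, defaultdict
from itertools import combinations


def merge_transactions(transactions):
    """Losslessly merge identical transactions into {itemset: count}."""
    return Counter(frozenset(t) for t in transactions)


def quantification_table(merged):
    """Hypothetical table: table[item][length] = #transactions of that length with item."""
    table = defaultdict(Counter)
    for itemset, count in merged.items():
        for item in itemset:
            table[item][len(itemset)] += count
    return table


def frequent_itemsets_of_size(merged, k, min_support):
    table = quantification_table(merged)
    # Prune items that cannot reach min_support in transactions long enough for k items.
    viable = sorted(
        item for item, by_length in table.items()
        if sum(c for length, c in by_length.items() if length >= k) >= min_support
    )
    result = {}
    for candidate in combinations(viable, k):
        cand_set = frozenset(candidate)
        # Support is counted on merged transactions, weighted by their counts.
        support = sum(c for itemset, c in merged.items() if cand_set <= itemset)
        if support >= min_support:
            result[candidate] = support
    return result


db = [["a", "b", "c"], ["a", "b"], ["a", "b"], ["b", "c"], ["d"]]
merged = merge_transactions(db)
print(frequent_itemsets_of_size(merged, k=2, min_support=2))
```

Keeping the counts makes the merge reversible (each itemset can be expanded back into `count` copies), which addresses the decompression limitation the abstract attributes to earlier approaches.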

Keywords: Association rule, data mining, merged transaction, quantification table.
