Search results for: data security in genomics
7504 Incremental Learning of Independent Topic Analysis
Authors: Takahiro Nishigaki, Katsumi Nitta, Takashi Onoda
Abstract:
In this paper, we present a method of applying Independent Topic Analysis (ITA) to increasing the number of document data. The number of document data has been increasing since the spread of the Internet. ITA was presented as one method to analyze the document data. ITA is a method for extracting the independent topics from the document data by using the Independent Component Analysis (ICA). ICA is a technique in the signal processing; however, it is difficult to apply the ITA to increasing number of document data. Because ITA must use the all document data so temporal and spatial cost is very high. Therefore, we present Incremental ITA which extracts the independent topics from increasing number of document data. Incremental ITA is a method of updating the independent topics when the document data is added after extracted the independent topics from a just previous the data. In addition, Incremental ITA updates the independent topics when the document data is added. And we show the result applied Incremental ITA to benchmark datasets.Keywords: Text mining, topic extraction, independent, incremental, independent component analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10597503 A Framework for Data Mining Based Multi-Agent: An Application to Spatial Data
Authors: H. Baazaoui Zghal, S. Faiz, H. Ben Ghezala
Abstract:
Data mining is an extraordinarily demanding field referring to extraction of implicit knowledge and relationships, which are not explicitly stored in databases. A wide variety of methods of data mining have been introduced (classification, characterization, generalization...). Each one of these methods includes more than algorithm. A system of data mining implies different user categories,, which mean that the user-s behavior must be a component of the system. The problem at this level is to know which algorithm of which method to employ for an exploratory end, which one for a decisional end, and how can they collaborate and communicate. Agent paradigm presents a new way of conception and realizing of data mining system. The purpose is to combine different algorithms of data mining to prepare elements for decision-makers, benefiting from the possibilities offered by the multi-agent systems. In this paper the agent framework for data mining is introduced, and its overall architecture and functionality are presented. The validation is made on spatial data. Principal results will be presented.
Keywords: Databases, data mining, multi-agent, spatial datamart.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20457502 Latent Topic Based Medical Data Classification
Authors: Jian-hua Yeh, Shi-yi Kuo
Abstract:
This paper discusses the classification process for medical data. In this paper, we use the data from ACM KDDCup 2008 to demonstrate our classification process based on latent topic discovery. In this data set, the target set and outliers are quite different in their nature: target set is only 0.6% size in total, while the outliers consist of 99.4% of the data set. We use this data set as an example to show how we dealt with this extremely biased data set with latent topic discovery and noise reduction techniques. Our experiment faces two major challenge: (1) extremely distributed outliers, and (2) positive samples are far smaller than negative ones. We try to propose a suitable process flow to deal with these issues and get a best AUC result of 0.98.
Keywords: classification, latent topics, outlier adjustment, feature scaling
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16427501 Data Collection in Hospital Emergencies: A Questionnaire Survey
Authors: Nouha Mhimdi, Wahiba Ben Abdessalem Karaa, Henda Ben Ghezala
Abstract:
Many methods are used to collect data like questionnaires, surveys, focus group interviews. Or the collection of poor-quality data resulting, for example, from poorly designed questionnaires, the absence of good translators or interpreters, and the incorrect recording of data allow conclusions to be drawn that are not supported by the data or to focus only on the average effect of the program or policy. There are several solutions to avoid or minimize the most frequent errors, including obtaining expert advice on the design or adaptation of data collection instruments; or use technologies allowing better "anonymity" in the responses. In this context, and to overcome the aforementioned problems, we suggest in this paper an approach to achieve the collection of relevant data, by carrying out a large-scale questionnaire-based survey. We have been able to collect good quality, consistent and practical data on hospital emergencies to improve emergency services in hospitals, especially in the case of epidemics or pandemics.
Keywords: Data collection, survey, database, data analysis, hospital emergencies.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6677500 Data Transformation Services (DTS): Creating Data Mart by Consolidating Multi-Source Enterprise Operational Data
Authors: J. D. D. Daniel, K. N. Goh, S. M. Yusop
Abstract:
Trends in business intelligence, e-commerce and remote access make it necessary and practical to store data in different ways on multiple systems with different operating systems. As business evolve and grow, they require efficient computerized solution to perform data update and to access data from diverse enterprise business applications. The objective of this paper is to demonstrate the capability of DTS [1] as a database solution for automatic data transfer and update in solving business problem. This DTS package is developed for the sales of variety of plants and eventually expanded into commercial supply and landscaping business. Dimension data modeling is used in DTS package to extract, transform and load data from heterogeneous database systems such as MySQL, Microsoft Access and Oracle that consolidates into a Data Mart residing in SQL Server. Hence, the data transfer from various databases is scheduled to run automatically every quarter of the year to review the efficient sales analysis. Therefore, DTS is absolutely an attractive solution for automatic data transfer and update which meeting today-s business needs.Keywords: Data Transformation Services (DTS), ObjectLinking and Embedding Database (OLEDB), Data Mart, OnlineAnalytical Processing (OLAP), Online Transactional Processing(OLTP).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20387499 Extraction of Data from Web Pages: A Vision Based Approach
Authors: P. S. Hiremath, Siddu P. Algur
Abstract:
With the explosive growth of information sources available on the World Wide Web, it has become increasingly difficult to identify the relevant pieces of information, since web pages are often cluttered with irrelevant content like advertisements, navigation-panels, copyright notices etc., surrounding the main content of the web page. Hence, tools for the mining of data regions, data records and data items need to be developed in order to provide value-added services. Currently available automatic techniques to mine data regions from web pages are still unsatisfactory because of their poor performance and tag-dependence. In this paper a novel method to extract data items from the web pages automatically is proposed. It comprises of two steps: (1) Identification and Extraction of the data regions based on visual clues information. (2) Identification of data records and extraction of data items from a data region. For step1, a novel and more effective method is proposed based on visual clues, which finds the data regions formed by all types of tags using visual clues. For step2 a more effective method namely, Extraction of Data Items from web Pages (EDIP), is adopted to mine data items. The EDIP technique is a list-based approach in which the list is a linear data structure. The proposed technique is able to mine the non-contiguous data records and can correctly identify data regions, irrespective of the type of tag in which it is bound. Our experimental results show that the proposed technique performs better than the existing techniques.
Keywords: Web data records, web data regions, web mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19017498 Visual-Graphical Methods for Exploring Longitudinal Data
Authors: H. W. Ker
Abstract:
Longitudinal data typically have the characteristics of changes over time, nonlinear growth patterns, between-subjects variability, and the within errors exhibiting heteroscedasticity and dependence. The data exploration is more complicated than that of cross-sectional data. The purpose of this paper is to organize/integrate of various visual-graphical techniques to explore longitudinal data. From the application of the proposed methods, investigators can answer the research questions include characterizing or describing the growth patterns at both group and individual level, identifying the time points where important changes occur and unusual subjects, selecting suitable statistical models, and suggesting possible within-error variance.Keywords: Data exploration, exploratory analysis, HLMs/LMEs, longitudinal data, visual-graphical methods.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20957497 A Materialized Approach to the Integration of XML Documents: the OSIX System
Authors: H. Ahmad, S. Kermanshahani, A. Simonet, M. Simonet
Abstract:
The data exchanged on the Web are of different nature from those treated by the classical database management systems; these data are called semi-structured data since they do not have a regular and static structure like data found in a relational database; their schema is dynamic and may contain missing data or types. Therefore, the needs for developing further techniques and algorithms to exploit and integrate such data, and extract relevant information for the user have been raised. In this paper we present the system OSIX (Osiris based System for Integration of XML Sources). This system has a Data Warehouse model designed for the integration of semi-structured data and more precisely for the integration of XML documents. The architecture of OSIX relies on the Osiris system, a DL-based model designed for the representation and management of databases and knowledge bases. Osiris is a viewbased data model whose indexing system supports semantic query optimization. We show that the problem of query processing on a XML source is optimized by the indexing approach proposed by Osiris.Keywords: Data integration, semi-structured data, views, XML.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15907496 Design of an Ensemble Learning Behavior Anomaly Detection Framework
Authors: Abdoulaye Diop, Nahid Emad, Thierry Winter, Mohamed Hilia
Abstract:
Data assets protection is a crucial issue in the cybersecurity field. Companies use logical access control tools to vault their information assets and protect them against external threats, but they lack solutions to counter insider threats. Nowadays, insider threats are the most significant concern of security analysts. They are mainly individuals with legitimate access to companies information systems, which use their rights with malicious intents. In several fields, behavior anomaly detection is the method used by cyber specialists to counter the threats of user malicious activities effectively. In this paper, we present the step toward the construction of a user and entity behavior analysis framework by proposing a behavior anomaly detection model. This model combines machine learning classification techniques and graph-based methods, relying on linear algebra and parallel computing techniques. We show the utility of an ensemble learning approach in this context. We present some detection methods tests results on an representative access control dataset. The use of some explored classifiers gives results up to 99% of accuracy.Keywords: Cybersecurity, data protection, access control, insider threat, user behavior analysis, ensemble learning, high performance computing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11527495 Digital Cinema Watermarking State of Art and Comparison
Authors: H. Kelkoul, Y. Zaz
Abstract:
Nowadays, the vigorous popularity of video processing techniques has resulted in an explosive growth of multimedia data illegal use. So, watermarking security has received much more attention. The purpose of this paper is to explore some watermarking techniques in order to observe their specificities and select the finest methods to apply in digital cinema domain against movie piracy by creating an invisible watermark that includes the date, time and the place where the hacking was done. We have studied three principal watermarking techniques in the frequency domain: Spread spectrum, Wavelet transform domain and finally the digital cinema watermarking transform domain. In this paper, a detailed technique is presented where embedding is performed using direct sequence spread spectrum technique in DWT transform domain. Experiment results shows that the algorithm provides high robustness and good imperceptibility.Keywords: Digital cinema, watermarking, wavelet, spread spectrum, JPEG2000.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11867494 Data-Driven Decision-Making in Digital Entrepreneurship
Authors: Abeba Nigussie Turi, Xiangming Samuel Li
Abstract:
Data-driven business models are more typical for established businesses than early-stage startups that strive to penetrate a market. This paper provided an extensive discussion on the principles of data analytics for early-stage digital entrepreneurial businesses. Here, we developed data-driven decision-making (DDDM) framework that applies to startups prone to multifaceted barriers in the form of poor data access, technical and financial constraints, to state some. The startup DDDM framework proposed in this paper is novel in its form encompassing startup data analytics enablers and metrics aligning with startups' business models ranging from customer-centric product development to servitization which is the future of modern digital entrepreneurship.
Keywords: Startup data analytics, data-driven decision-making, data acquisition, data generation, digital entrepreneurship.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8277493 Classifying Bio-Chip Data using an Ant Colony System Algorithm
Authors: Minsoo Lee, Yearn Jeong Kim, Yun-mi Kim, Sujeung Cheong, Sookyung Song
Abstract:
Bio-chips are used for experiments on genes and contain various information such as genes, samples and so on. The two-dimensional bio-chips, in which one axis represent genes and the other represent samples, are widely being used these days. Instead of experimenting with real genes which cost lots of money and much time to get the results, bio-chips are being used for biological experiments. And extracting data from the bio-chips with high accuracy and finding out the patterns or useful information from such data is very important. Bio-chip analysis systems extract data from various kinds of bio-chips and mine the data in order to get useful information. One of the commonly used methods to mine the data is classification. The algorithm that is used to classify the data can be various depending on the data types or number characteristics and so on. Considering that bio-chip data is extremely large, an algorithm that imitates the ecosystem such as the ant algorithm is suitable to use as an algorithm for classification. This paper focuses on finding the classification rules from the bio-chip data using the Ant Colony algorithm which imitates the ecosystem. The developed system takes in consideration the accuracy of the discovered rules when it applies it to the bio-chip data in order to predict the classes.Keywords: Ant Colony System, DNA chip data, Classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14687492 Enhanced Disk-Based Databases Towards Improved Hybrid In-Memory Systems
Authors: Samuel Kaspi, Sitalakshmi Venkatraman
Abstract:
In-memory database systems are becoming popular due to the availability and affordability of sufficiently large RAM and processors in modern high-end servers with the capacity to manage large in-memory database transactions. While fast and reliable inmemory systems are still being developed to overcome cache misses, CPU/IO bottlenecks and distributed transaction costs, disk-based data stores still serve as the primary persistence. In addition, with the recent growth in multi-tenancy cloud applications and associated security concerns, many organisations consider the trade-offs and continue to require fast and reliable transaction processing of diskbased database systems as an available choice. For these organizations, the only way of increasing throughput is by improving the performance of disk-based concurrency control. This warrants a hybrid database system with the ability to selectively apply an enhanced disk-based data management within the context of inmemory systems that would help improve overall throughput. The general view is that in-memory systems substantially outperform disk-based systems. We question this assumption and examine how a modified variation of access invariance that we call enhanced memory access, (EMA) can be used to allow very high levels of concurrency in the pre-fetching of data in disk-based systems. We demonstrate how this prefetching in disk-based systems can yield close to in-memory performance, which paves the way for improved hybrid database systems. This paper proposes a novel EMA technique and presents a comparative study between disk-based EMA systems and in-memory systems running on hardware configurations of equivalent power in terms of the number of processors and their speeds. The results of the experiments conducted clearly substantiate that when used in conjunction with all concurrency control mechanisms, EMA can increase the throughput of disk-based systems to levels quite close to those achieved by in-memory system. The promising results of this work show that enhanced disk-based systems facilitate in improving hybrid data management within the broader context of in-memory systems.
Keywords: Concurrency control, disk-based databases, inmemory systems, enhanced memory access (EMA).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20387491 Supergrid Modeling and Operation and Control of Multi Terminal DC Grids for the Deployment of a Meshed HVDC Grid in South Asia
Authors: Farhan Beg, Raymond Moberly
Abstract:
The Indian subcontinent is facing a massive challenge with regards to energy security in its member countries; to provide reliable electricity to facilitate development across various sectors of the economy and consequently achieve the developmental targets. The instability of the current precarious situation is observable in the frequent system failures and blackouts.
The deployment of interconnected electricity ‘Supergrid’ designed to carry huge quanta of power across the Indian sub-continent is proposed in this paper. Not only enabling energy security in the subcontinent it will also provide a platform for Renewable Energy Sources (RES) integration. This paper assesses the need and conditions for a Supergrid deployment and consequently proposes a meshed topology based on Voltage Source High Voltage Direct Current (VSC- HVDC) converters for the Supergrid modeling. Various control schemes for the control of voltage and power are utilized for the regulation of the network parameters. A 3 terminal Multi Terminal Direct Current (MTDC) network is used for the simulations.
Keywords: Super grid, Wind and Solar energy, High Voltage Direct Current, Electricity management, Load Flow Analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 28117490 Analysis of Relation between Unlabeled and Labeled Data to Self-Taught Learning Performance
Authors: Ekachai Phaisangittisagul, Rapeepol Chongprachawat
Abstract:
Obtaining labeled data in supervised learning is often difficult and expensive, and thus the trained learning algorithm tends to be overfitting due to small number of training data. As a result, some researchers have focused on using unlabeled data which may not necessary to follow the same generative distribution as the labeled data to construct a high-level feature for improving performance on supervised learning tasks. In this paper, we investigate the impact of the relationship between unlabeled and labeled data for classification performance. Specifically, we will apply difference unlabeled data which have different degrees of relation to the labeled data for handwritten digit classification task based on MNIST dataset. Our experimental results show that the higher the degree of relation between unlabeled and labeled data, the better the classification performance. Although the unlabeled data that is completely from different generative distribution to the labeled data provides the lowest classification performance, we still achieve high classification performance. This leads to expanding the applicability of the supervised learning algorithms using unsupervised learning.Keywords: Autoencoder, high-level feature, MNIST dataset, selftaught learning, supervised learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18327489 Towards Development of Solution for Business Process-Oriented Data Analysis
Authors: M. Klimavicius
Abstract:
This paper proposes a modeling methodology for the development of data analysis solution. The Author introduce the approach to address data warehousing issues at the at enterprise level. The methodology covers the process of the requirements eliciting and analysis stage as well as initial design of data warehouse. The paper reviews extended business process model, which satisfy the needs of data warehouse development. The Author considers that the use of business process models is necessary, as it reflects both enterprise information systems and business functions, which are important for data analysis. The Described approach divides development into three steps with different detailed elaboration of models. The Described approach gives possibility to gather requirements and display them to business users in easy manner.Keywords: Data warehouse, data analysis, business processmanagement.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13937488 Preliminary Overview of Data Mining Technology for Knowledge Management System in Institutions of Higher Learning
Authors: Muslihah Wook, Zawiyah M. Yusof, Mohd Zakree Ahmad Nazri
Abstract:
Data mining has been integrated into application systems to enhance the quality of the decision-making process. This study aims to focus on the integration of data mining technology and Knowledge Management System (KMS), due to the ability of data mining technology to create useful knowledge from large volumes of data. Meanwhile, KMS vitally support the creation and use of knowledge. The integration of data mining technology and KMS are popularly used in business for enhancing and sustaining organizational performance. However, there is a lack of studies that applied data mining technology and KMS in the education sector; particularly students- academic performance since this could reflect the IHL performance. Realizing its importance, this study seeks to integrate data mining technology and KMS to promote an effective management of knowledge within IHLs. Several concepts from literature are adapted, for proposing the new integrative data mining technology and KMS framework to an IHL.
Keywords: Data mining, Institutions of Higher Learning, Knowledge Management System, Students' academic performance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21427487 Thailand National Biodiversity Database System with webMathematica and Google Earth
Authors: W. Katsarapong, W. Srisang, K. Jaroensutasinee, M. Jaroensutasinee
Abstract:
National Biodiversity Database System (NBIDS) has been developed for collecting Thai biodiversity data. The goal of this project is to provide advanced tools for querying, analyzing, modeling, and visualizing patterns of species distribution for researchers and scientists. NBIDS data record two types of datasets: biodiversity data and environmental data. Biodiversity data are specie presence data and species status. The attributes of biodiversity data can be further classified into two groups: universal and projectspecific attributes. Universal attributes are attributes that are common to all of the records, e.g. X/Y coordinates, year, and collector name. Project-specific attributes are attributes that are unique to one or a few projects, e.g., flowering stage. Environmental data include atmospheric data, hydrology data, soil data, and land cover data collecting by using GLOBE protocols. We have developed webbased tools for data entry. Google Earth KML and ArcGIS were used as tools for map visualization. webMathematica was used for simple data visualization and also for advanced data analysis and visualization, e.g., spatial interpolation, and statistical analysis. NBIDS will be used by park rangers at Khao Nan National Park, and researchers.Keywords: GLOBE protocol, Biodiversity, Database System, ArcGIS, Google Earth and webMathematica.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19847486 Evaluation of Clustering Based on Preprocessing in Gene Expression Data
Authors: Seo Young Kim, Toshimitsu Hamasaki
Abstract:
Microarrays have become the effective, broadly used tools in biological and medical research to address a wide range of problems, including classification of disease subtypes and tumors. Many statistical methods are available for analyzing and systematizing these complex data into meaningful information, and one of the main goals in analyzing gene expression data is the detection of samples or genes with similar expression patterns. In this paper, we express and compare the performance of several clustering methods based on data preprocessing including strategies of normalization or noise clearness. We also evaluate each of these clustering methods with validation measures for both simulated data and real gene expression data. Consequently, clustering methods which are common used in microarray data analysis are affected by normalization and degree of noise and clearness for datasets.
Keywords: Gene expression, clustering, data preprocessing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17407485 Identity Management in Virtual Worlds Based on Biometrics Watermarking
Authors: S. Bader, N. Essoukri Ben Amara
Abstract:
With the technological development and rise of virtual worlds, these spaces are becoming more and more attractive for cybercriminals, hidden behind avatars and fictitious identities. Since access to these spaces is not restricted or controlled, some impostors take advantage of gaining unauthorized access and practicing cyber criminality. This paper proposes an identity management approach for securing access to virtual worlds. The major purpose of the suggested solution is to install a strong security mechanism to protect virtual identities represented by avatars. Thus, only legitimate users, through their corresponding avatars, are allowed to access the platform resources. Access is controlled by integrating an authentication process based on biometrics. In the request process for registration, a user fingerprint is enrolled and then encrypted into a watermark utilizing a cancelable and non-invertible algorithm for its protection. After a user personalizes their representative character, the biometric mark is embedded into the avatar through a watermarking procedure. The authenticity of the avatar identity is verified when it requests authorization for access. We have evaluated the proposed approach on a dataset of avatars from various virtual worlds, and we have registered promising performance results in terms of authentication accuracy, acceptation and rejection rates.Keywords: Identity management, security, biometrics authentication and authorization, avatar, virtual world.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16577484 Transformer Diagnosis Based on Coupled Circuits Method Modelling
Authors: Labar Hocine, Rekik Badri, Bounaya Kamel, Kelaiaia Mounia Samira
Abstract:
Diagnostic goal of transformers in service is to detect the winding or the core in fault. Transformers are valuable equipment which makes a major contribution to the supply security of a power system. Consequently, it is of great importance to minimize the frequency and duration of unwanted outages of power transformers. So, Frequency Response Analysis (FRA) is found to be a useful tool for reliable detection of incipient mechanical fault in a transformer, by finding winding or core defects. The authors propose as first part of this article, the coupled circuits method, because, it gives most possible exhaustive modelling of transformers. And as second part of this work, the application of FRA in low frequency in order to improve and simplify the response reading. This study can be useful as a base data for the other transformers of the same categories intended for distribution grid.
Keywords: Diagnostic, Coupled Circuit Method, FRA, Transformer Faults
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15217483 Formal Analysis of a Public-Key Algorithm
Authors: Markus Kaiser, Johannes Buchmann
Abstract:
In this article, a formal specification and verification of the Rabin public-key scheme in a formal proof system is presented. The idea is to use the two views of cryptographic verification: the computational approach relying on the vocabulary of probability theory and complexity theory and the formal approach based on ideas and techniques from logic and programming languages. A major objective of this article is the presentation of the first computer-proved implementation of the Rabin public-key scheme in Isabelle/HOL. Moreover, we explicate a (computer-proven) formalization of correctness as well as a computer verification of security properties using a straight-forward computation model in Isabelle/HOL. The analysis uses a given database to prove formal properties of our implemented functions with computer support. The main task in designing a practical formalization of correctness as well as efficient computer proofs of security properties is to cope with the complexity of cryptographic proving. We reduce this complexity by exploring a light-weight formalization that enables both appropriate formal definitions as well as efficient formal proofs. Consequently, we get reliable proofs with a minimal error rate augmenting the used database, what provides a formal basis for more computer proof constructions in this area.
Keywords: public-key encryption, Rabin public-key scheme, formalproof system, higher-order logic, formal verification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15377482 A Network Traffic Prediction Algorithm Based On Data Mining Technique
Authors: D. Prangchumpol
Abstract:
This paper is a description approach to predict incoming and outgoing data rate in network system by using association rule discover, which is one of the data mining techniques. Information of incoming and outgoing data in each times and network bandwidth are network performance parameters, which needed to solve in the traffic problem. Since congestion and data loss are important network problems. The result of this technique can predicted future network traffic. In addition, this research is useful for network routing selection and network performance improvement.
Keywords: Traffic prediction, association rule, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 36697481 Fuzzy Processing of Uncertain Data
Authors: Petr Morávek, Miloš Šeda
Abstract:
In practice, we often come across situations where it is necessary to make decisions based on incomplete or uncertain data. In control systems it may be due to the unknown exact mathematical model, or its excessive complexity (e.g. nonlinearity) when it is necessary to simplify it, respectively, to solve it using a rule base. In the case of databases, searching data we compare a similarity measure with of the requirements of the selection with stored data, where both the select query and the data itself may contain vague terms, for example in the form of linguistic qualifiers. In this paper, we focus on the processing of uncertain data in databases and demonstrate it on the example multi-criteria decision making in the selection of variants, specified by higher number of technical parameters.Keywords: fuzzy logic, linguistic variable, multicriteria decision
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14187480 Automated Stereophotogrammetry Data Cleansing
Authors: Stuart Henry, Philip Morrow, John Winder, Bryan Scotney
Abstract:
The stereophotogrammetry modality is gaining more widespread use in the clinical setting. Registration and visualization of this data, in conjunction with conventional 3D volumetric image modalities, provides virtual human data with textured soft tissue and internal anatomical and structural information. In this investigation computed tomography (CT) and stereophotogrammetry data is acquired from 4 anatomical phantoms and registered using the trimmed iterative closest point (TrICP) algorithm. This paper fully addresses the issue of imaging artifacts around the stereophotogrammetry surface edge using the registered CT data as a reference. Several iterative algorithms are implemented to automatically identify and remove stereophotogrammetry surface edge outliers, improving the overall visualization of the combined stereophotogrammetry and CT data. This paper shows that outliers at the surface edge of stereophotogrammetry data can be successfully removed automatically.
Keywords: Data cleansing, stereophotogrammetry.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18427479 A Survey on Voice over IP over Wireless LANs
Authors: Haniyeh Kazemitabar, Sameha Ahmed, Kashif Nisar, Abas B Said, Halabi B Hasbullah
Abstract:
Voice over Internet Protocol (VoIP) is a form of voice communication that uses audio data to transmit voice signals to the end user. VoIP is one of the most important technologies in the World of communication. Around, 20 years of research on VoIP, some problems of VoIP are still remaining. During the past decade and with growing of wireless technologies, we have seen that many papers turn their concentration from Wired-LAN to Wireless-LAN. VoIP over Wireless LAN (WLAN) faces many challenges due to the loose nature of wireless network. Issues like providing Quality of Service (QoS) at a good level, dedicating capacity for calls and having secure calls is more difficult rather than wired LAN. Therefore VoIP over WLAN (VoWLAN) remains a challenging research topic. In this paper we consolidate and address major VoWLAN issues. This research is helpful for those researchers wants to do research in Voice over IP technology over WLAN network.Keywords: Capacity, QoS, Security, VoIP Issues, WLAN.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22457478 An Improved Data Mining Method Applied to the Search of Relationship between Metabolic Syndrome and Lifestyles
Authors: Yi Chao Huang, Yu Ling Liao, Chiu Shuang Lin
Abstract:
A data cutting and sorting method (DCSM) is proposed to optimize the performance of data mining. DCSM reduces the calculation time by getting rid of redundant data during the data mining process. In addition, DCSM minimizes the computational units by splitting the database and by sorting data with support counts. In the process of searching for the relationship between metabolic syndrome and lifestyles with the health examination database of an electronics manufacturing company, DCSM demonstrates higher search efficiency than the traditional Apriori algorithm in tests with different support counts.Keywords: Data mining, Data cutting and sorting method, Apriori algorithm, Metabolic syndrome
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15887477 Data Mining Approach for Commercial Data Classification and Migration in Hybrid Storage Systems
Authors: Mais Haj Qasem, Maen M. Al Assaf, Ali Rodan
Abstract:
Parallel hybrid storage systems consist of a hierarchy of different storage devices that vary in terms of data reading speed performance. As we ascend in the hierarchy, data reading speed becomes faster. Thus, migrating the application’ important data that will be accessed in the near future to the uppermost level will reduce the application I/O waiting time; hence, reducing its execution elapsed time. In this research, we implement trace-driven two-levels parallel hybrid storage system prototype that consists of HDDs and SSDs. The prototype uses data mining techniques to classify application’ data in order to determine its near future data accesses in parallel with the its on-demand request. The important data (i.e. the data that the application will access in the near future) are continuously migrated to the uppermost level of the hierarchy. Our simulation results show that our data migration approach integrated with data mining techniques reduces the application execution elapsed time when using variety of traces in at least to 22%.Keywords: Data mining, hybrid storage system, recurrent neural network, support vector machine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17367476 Association Rules Mining and NOSQL Oriented Document in Big Data
Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub
Abstract:
Big Data represents the recent technology of manipulating voluminous and unstructured data sets over multiple sources. Therefore, NOSQL appears to handle the problem of unstructured data. Association rules mining is one of the popular techniques of data mining to extract hidden relationship from transactional databases. The algorithm for finding association dependencies is well-solved with Map Reduce. The goal of our work is to reduce the time of generating of frequent itemsets by using Map Reduce and NOSQL database oriented document. A comparative study is given to evaluate the performances of our algorithm with the classical algorithm Apriori.
Keywords: Apriori, Association rules mining, Big Data, data mining, Hadoop, Map Reduce, MongoDB, NoSQL.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6947475 Identifying Critical Success Factors for Data Quality Management through a Delphi Study
Authors: Maria Paula Santos, Ana Lucas
Abstract:
Organizations support their operations and decision making on the data they have at their disposal, so the quality of these data is remarkably important and Data Quality (DQ) is currently a relevant issue, the literature being unanimous in pointing out that poor DQ can result in large costs for organizations. The literature review identified and described 24 Critical Success Factors (CSF) for Data Quality Management (DQM) that were presented to a panel of experts, who ordered them according to their degree of importance, using the Delphi method with the Q-sort technique, based on an online questionnaire. The study shows that the five most important CSF for DQM are: definition of appropriate policies and standards, control of inputs, definition of a strategic plan for DQ, organizational culture focused on quality of the data and obtaining top management commitment and support.
Keywords: Critical success factors, data quality, data quality management, Delphi, Q-Sort.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1109