Search results for: Data mining andInformation Extraction
7231 A Materialized Approach to the Integration of XML Documents: the OSIX System
Authors: H. Ahmad, S. Kermanshahani, A. Simonet, M. Simonet
Abstract:
The data exchanged on the Web are of different nature from those treated by the classical database management systems; these data are called semi-structured data since they do not have a regular and static structure like data found in a relational database; their schema is dynamic and may contain missing data or types. Therefore, the needs for developing further techniques and algorithms to exploit and integrate such data, and extract relevant information for the user have been raised. In this paper we present the system OSIX (Osiris based System for Integration of XML Sources). This system has a Data Warehouse model designed for the integration of semi-structured data and more precisely for the integration of XML documents. The architecture of OSIX relies on the Osiris system, a DL-based model designed for the representation and management of databases and knowledge bases. Osiris is a viewbased data model whose indexing system supports semantic query optimization. We show that the problem of query processing on a XML source is optimized by the indexing approach proposed by Osiris.Keywords: Data integration, semi-structured data, views, XML.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15907230 Translator Design to Model Cpp Files
Authors: Er. Satwinder Singh, Dr. K.S. Kahlon, Rakesh Kumar, Er. Gurjeet Singh
Abstract:
The most reliable and accurate description of the actual behavior of a software system is its source code. However, not all questions about the system can be answered directly by resorting to this repository of information. What the reverse engineering methodology aims at is the extraction of abstract, goal-oriented “views" of the system, able to summarize relevant properties of the computation performed by the program. While concentrating on reverse engineering we had modeled the C++ files by designing the translator.
Keywords: Translator, Modeling, UML, DYNO, ISVis, TED.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15347229 Data-Driven Decision-Making in Digital Entrepreneurship
Authors: Abeba Nigussie Turi, Xiangming Samuel Li
Abstract:
Data-driven business models are more typical for established businesses than early-stage startups that strive to penetrate a market. This paper provided an extensive discussion on the principles of data analytics for early-stage digital entrepreneurial businesses. Here, we developed data-driven decision-making (DDDM) framework that applies to startups prone to multifaceted barriers in the form of poor data access, technical and financial constraints, to state some. The startup DDDM framework proposed in this paper is novel in its form encompassing startup data analytics enablers and metrics aligning with startups' business models ranging from customer-centric product development to servitization which is the future of modern digital entrepreneurship.
Keywords: Startup data analytics, data-driven decision-making, data acquisition, data generation, digital entrepreneurship.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8277228 Classifying Bio-Chip Data using an Ant Colony System Algorithm
Authors: Minsoo Lee, Yearn Jeong Kim, Yun-mi Kim, Sujeung Cheong, Sookyung Song
Abstract:
Bio-chips are used for experiments on genes and contain various information such as genes, samples and so on. The two-dimensional bio-chips, in which one axis represent genes and the other represent samples, are widely being used these days. Instead of experimenting with real genes which cost lots of money and much time to get the results, bio-chips are being used for biological experiments. And extracting data from the bio-chips with high accuracy and finding out the patterns or useful information from such data is very important. Bio-chip analysis systems extract data from various kinds of bio-chips and mine the data in order to get useful information. One of the commonly used methods to mine the data is classification. The algorithm that is used to classify the data can be various depending on the data types or number characteristics and so on. Considering that bio-chip data is extremely large, an algorithm that imitates the ecosystem such as the ant algorithm is suitable to use as an algorithm for classification. This paper focuses on finding the classification rules from the bio-chip data using the Ant Colony algorithm which imitates the ecosystem. The developed system takes in consideration the accuracy of the discovered rules when it applies it to the bio-chip data in order to predict the classes.Keywords: Ant Colony System, DNA chip data, Classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14687227 Trust and Reliability for Public Sector Data
Authors: Klaus Stranacher, Vesna Krnjic, Thomas Zefferer
Abstract:
The public sector holds large amounts of data of various areas such as social affairs, economy, or tourism. Various initiatives such as Open Government Data or the EU Directive on public sector information aim to make these data available for public and private service providers. Requirements for the provision of public sector data are defined by legal and organizational frameworks. Surprisingly, the defined requirements hardly cover security aspects such as integrity or authenticity. In this paper we discuss the importance of these missing requirements and present a concept to assure the integrity and authenticity of provided data based on electronic signatures. We show that our concept is perfectly suitable for the provisioning of unaltered data. We also show that our concept can also be extended to data that needs to be anonymized before provisioning by incorporating redactable signatures. Our proposed concept enhances trust and reliability of provided public sector data.Keywords: Trusted Public Sector Data, Integrity, Authenticity, Reliability, Redactable Signatures.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17587226 A New DIDS Design Based on a Combination Feature Selection Approach
Authors: Adel Sabry Eesa, Adnan Mohsin Abdulazeez Brifcani, Zeynep Orman
Abstract:
Feature selection has been used in many fields such as classification, data mining and object recognition and proven to be effective for removing irrelevant and redundant features from the original dataset. In this paper, a new design of distributed intrusion detection system using a combination feature selection model based on bees and decision tree. Bees algorithm is used as the search strategy to find the optimal subset of features, whereas decision tree is used as a judgment for the selected features. Both the produced features and the generated rules are used by Decision Making Mobile Agent to decide whether there is an attack or not in the networks. Decision Making Mobile Agent will migrate through the networks, moving from node to another, if it found that there is an attack on one of the nodes, it then alerts the user through User Interface Agent or takes some action through Action Mobile Agent. The KDD Cup 99 dataset is used to test the effectiveness of the proposed system. The results show that even if only four features are used, the proposed system gives a better performance when it is compared with the obtained results using all 41 features.Keywords: Distributed intrusion detection system, mobile agent, feature selection, Bees Algorithm, decision tree.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19407225 Analysis of Relation between Unlabeled and Labeled Data to Self-Taught Learning Performance
Authors: Ekachai Phaisangittisagul, Rapeepol Chongprachawat
Abstract:
Obtaining labeled data in supervised learning is often difficult and expensive, and thus the trained learning algorithm tends to be overfitting due to small number of training data. As a result, some researchers have focused on using unlabeled data which may not necessary to follow the same generative distribution as the labeled data to construct a high-level feature for improving performance on supervised learning tasks. In this paper, we investigate the impact of the relationship between unlabeled and labeled data for classification performance. Specifically, we will apply difference unlabeled data which have different degrees of relation to the labeled data for handwritten digit classification task based on MNIST dataset. Our experimental results show that the higher the degree of relation between unlabeled and labeled data, the better the classification performance. Although the unlabeled data that is completely from different generative distribution to the labeled data provides the lowest classification performance, we still achieve high classification performance. This leads to expanding the applicability of the supervised learning algorithms using unsupervised learning.Keywords: Autoencoder, high-level feature, MNIST dataset, selftaught learning, supervised learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18327224 Towards Development of Solution for Business Process-Oriented Data Analysis
Authors: M. Klimavicius
Abstract:
This paper proposes a modeling methodology for the development of data analysis solution. The Author introduce the approach to address data warehousing issues at the at enterprise level. The methodology covers the process of the requirements eliciting and analysis stage as well as initial design of data warehouse. The paper reviews extended business process model, which satisfy the needs of data warehouse development. The Author considers that the use of business process models is necessary, as it reflects both enterprise information systems and business functions, which are important for data analysis. The Described approach divides development into three steps with different detailed elaboration of models. The Described approach gives possibility to gather requirements and display them to business users in easy manner.Keywords: Data warehouse, data analysis, business processmanagement.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13927223 Towards a Secure Storage in Cloud Computing
Authors: Mohamed Elkholy, Ahmed Elfatatry
Abstract:
Cloud computing has emerged as a flexible computing paradigm that reshaped the Information Technology map. However, cloud computing brought about a number of security challenges as a result of the physical distribution of computational resources and the limited control that users have over the physical storage. This situation raises many security challenges for data integrity and confidentiality as well as authentication and access control. This work proposes a security mechanism for data integrity that allows a data owner to be aware of any modification that takes place to his data. The data integrity mechanism is integrated with an extended Kerberos authentication that ensures authorized access control. The proposed mechanism protects data confidentiality even if data are stored on an untrusted storage. The proposed mechanism has been evaluated against different types of attacks and proved its efficiency to protect cloud data storage from different malicious attacks.Keywords: Access control, data integrity, data confidentiality, Kerberos authentication, cloud security.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17717222 Thailand National Biodiversity Database System with webMathematica and Google Earth
Authors: W. Katsarapong, W. Srisang, K. Jaroensutasinee, M. Jaroensutasinee
Abstract:
National Biodiversity Database System (NBIDS) has been developed for collecting Thai biodiversity data. The goal of this project is to provide advanced tools for querying, analyzing, modeling, and visualizing patterns of species distribution for researchers and scientists. NBIDS data record two types of datasets: biodiversity data and environmental data. Biodiversity data are specie presence data and species status. The attributes of biodiversity data can be further classified into two groups: universal and projectspecific attributes. Universal attributes are attributes that are common to all of the records, e.g. X/Y coordinates, year, and collector name. Project-specific attributes are attributes that are unique to one or a few projects, e.g., flowering stage. Environmental data include atmospheric data, hydrology data, soil data, and land cover data collecting by using GLOBE protocols. We have developed webbased tools for data entry. Google Earth KML and ArcGIS were used as tools for map visualization. webMathematica was used for simple data visualization and also for advanced data analysis and visualization, e.g., spatial interpolation, and statistical analysis. NBIDS will be used by park rangers at Khao Nan National Park, and researchers.Keywords: GLOBE protocol, Biodiversity, Database System, ArcGIS, Google Earth and webMathematica.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19847221 Javanese Character Recognition Using Hidden Markov Model
Authors: Anastasia Rita Widiarti, Phalita Nari Wastu
Abstract:
Hidden Markov Model (HMM) is a stochastic method which has been used in various signal processing and character recognition. This study proposes to use HMM to recognize Javanese characters from a number of different handwritings, whereby HMM is used to optimize the number of state and feature extraction. An 85.7 % accuracy is obtained as the best result in 16-stated vertical model using pure HMM. This initial result is satisfactory for prompting further research.Keywords: Character recognition, off-line handwritingrecognition, Hidden Markov Model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19897220 Evaluation of Clustering Based on Preprocessing in Gene Expression Data
Authors: Seo Young Kim, Toshimitsu Hamasaki
Abstract:
Microarrays have become the effective, broadly used tools in biological and medical research to address a wide range of problems, including classification of disease subtypes and tumors. Many statistical methods are available for analyzing and systematizing these complex data into meaningful information, and one of the main goals in analyzing gene expression data is the detection of samples or genes with similar expression patterns. In this paper, we express and compare the performance of several clustering methods based on data preprocessing including strategies of normalization or noise clearness. We also evaluate each of these clustering methods with validation measures for both simulated data and real gene expression data. Consequently, clustering methods which are common used in microarray data analysis are affected by normalization and degree of noise and clearness for datasets.
Keywords: Gene expression, clustering, data preprocessing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17407219 Protein Secondary Structure Prediction Using Parallelized Rule Induction from Coverings
Authors: Leong Lee, Cyriac Kandoth, Jennifer L. Leopold, Ronald L. Frank
Abstract:
Protein 3D structure prediction has always been an important research area in bioinformatics. In particular, the prediction of secondary structure has been a well-studied research topic. Despite the recent breakthrough of combining multiple sequence alignment information and artificial intelligence algorithms to predict protein secondary structure, the Q3 accuracy of various computational prediction algorithms rarely has exceeded 75%. In a previous paper [1], this research team presented a rule-based method called RT-RICO (Relaxed Threshold Rule Induction from Coverings) to predict protein secondary structure. The average Q3 accuracy on the sample datasets using RT-RICO was 80.3%, an improvement over comparable computational methods. Although this demonstrated that RT-RICO might be a promising approach for predicting secondary structure, the algorithm-s computational complexity and program running time limited its use. Herein a parallelized implementation of a slightly modified RT-RICO approach is presented. This new version of the algorithm facilitated the testing of a much larger dataset of 396 protein domains [2]. Parallelized RTRICO achieved a Q3 score of 74.6%, which is higher than the consensus prediction accuracy of 72.9% that was achieved for the same test dataset by a combination of four secondary structure prediction methods [2].Keywords: data mining, protein secondary structure prediction, parallelization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15967218 Addressing Data Security in the Cloud
Authors: Marinela Mircea
Abstract:
The development of information and communication technology, the increased use of the internet, as well as the effects of the recession within the last years, have lead to the increased use of cloud computing based solutions, also called on-demand solutions. These solutions offer a large number of benefits to organizations as well as challenges and risks, mainly determined by data visualization in different geographic locations on the internet. As far as the specific risks of cloud environment are concerned, data security is still considered a peak barrier in adopting cloud computing. The present study offers an approach upon ensuring the security of cloud data, oriented towards the whole data life cycle. The final part of the study focuses on the assessment of data security in the cloud, this representing the bases in determining the potential losses and the premise for subsequent improvements and continuous learning.Keywords: cloud computing, data life cycle, data security, security assessment.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21617217 Fuzzy Processing of Uncertain Data
Authors: Petr Morávek, Miloš Šeda
Abstract:
In practice, we often come across situations where it is necessary to make decisions based on incomplete or uncertain data. In control systems it may be due to the unknown exact mathematical model, or its excessive complexity (e.g. nonlinearity) when it is necessary to simplify it, respectively, to solve it using a rule base. In the case of databases, searching data we compare a similarity measure with of the requirements of the selection with stored data, where both the select query and the data itself may contain vague terms, for example in the form of linguistic qualifiers. In this paper, we focus on the processing of uncertain data in databases and demonstrate it on the example multi-criteria decision making in the selection of variants, specified by higher number of technical parameters.Keywords: fuzzy logic, linguistic variable, multicriteria decision
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14187216 Automated Stereophotogrammetry Data Cleansing
Authors: Stuart Henry, Philip Morrow, John Winder, Bryan Scotney
Abstract:
The stereophotogrammetry modality is gaining more widespread use in the clinical setting. Registration and visualization of this data, in conjunction with conventional 3D volumetric image modalities, provides virtual human data with textured soft tissue and internal anatomical and structural information. In this investigation computed tomography (CT) and stereophotogrammetry data is acquired from 4 anatomical phantoms and registered using the trimmed iterative closest point (TrICP) algorithm. This paper fully addresses the issue of imaging artifacts around the stereophotogrammetry surface edge using the registered CT data as a reference. Several iterative algorithms are implemented to automatically identify and remove stereophotogrammetry surface edge outliers, improving the overall visualization of the combined stereophotogrammetry and CT data. This paper shows that outliers at the surface edge of stereophotogrammetry data can be successfully removed automatically.
Keywords: Data cleansing, stereophotogrammetry.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18427215 Mixtures of Monotone Networks for Prediction
Authors: Marina Velikova, Hennie Daniels, Ad Feelders
Abstract:
In many data mining applications, it is a priori known that the target function should satisfy certain constraints imposed by, for example, economic theory or a human-decision maker. In this paper we consider partially monotone prediction problems, where the target variable depends monotonically on some of the input variables but not on all. We propose a novel method to construct prediction models, where monotone dependences with respect to some of the input variables are preserved by virtue of construction. Our method belongs to the class of mixture models. The basic idea is to convolute monotone neural networks with weight (kernel) functions to make predictions. By using simulation and real case studies, we demonstrate the application of our method. To obtain sound assessment for the performance of our approach, we use standard neural networks with weight decay and partially monotone linear models as benchmark methods for comparison. The results show that our approach outperforms partially monotone linear models in terms of accuracy. Furthermore, the incorporation of partial monotonicity constraints not only leads to models that are in accordance with the decision maker's expertise, but also reduces considerably the model variance in comparison to standard neural networks with weight decay.Keywords: mixture models, monotone neural networks, partially monotone models, partially monotone problems.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12467214 A Methodology for Investigating Public Opinion Using Multilevel Text Analysis
Authors: William Xiu Shun Wong, Myungsu Lim, Yoonjin Hyun, Chen Liu, Seongi Choi, Dasom Kim, Kee-Young Kwahk, Namgyu Kim
Abstract:
Recently, many users have begun to frequently share their opinions on diverse issues using various social media. Therefore, numerous governments have attempted to establish or improve national policies according to the public opinions captured from various social media. In this paper, we indicate several limitations of the traditional approaches to analyze public opinion on science and technology and provide an alternative methodology to overcome these limitations. First, we distinguish between the science and technology analysis phase and the social issue analysis phase to reflect the fact that public opinion can be formed only when a certain science and technology is applied to a specific social issue. Next, we successively apply a start list and a stop list to acquire clarified and interesting results. Finally, to identify the most appropriate documents that fit with a given subject, we develop a new logical filter concept that consists of not only mere keywords but also a logical relationship among the keywords. This study then analyzes the possibilities for the practical use of the proposed methodology thorough its application to discover core issues and public opinions from 1,700,886 documents comprising SNS, blogs, news, and discussions.Keywords: Big data, social network analysis, text mining, topic modeling.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16617213 Stereotypical Motor Movement Recognition Using Microsoft Kinect with Artificial Neural Network
Authors: M. Jazouli, S. Elhoufi, A. Majda, A. Zarghili, R. Aalouane
Abstract:
Autism spectrum disorder is a complex developmental disability. It is defined by a certain set of behaviors. Persons with Autism Spectrum Disorders (ASD) frequently engage in stereotyped and repetitive motor movements. The objective of this article is to propose a method to automatically detect this unusual behavior. Our study provides a clinical tool which facilitates for doctors the diagnosis of ASD. We focus on automatic identification of five repetitive gestures among autistic children in real time: body rocking, hand flapping, fingers flapping, hand on the face and hands behind back. In this paper, we present a gesture recognition system for children with autism, which consists of three modules: model-based movement tracking, feature extraction, and gesture recognition using artificial neural network (ANN). The first one uses the Microsoft Kinect sensor, the second one chooses points of interest from the 3D skeleton to characterize the gestures, and the last one proposes a neural connectionist model to perform the supervised classification of data. The experimental results show that our system can achieve above 93.3% recognition rate.
Keywords: ASD, stereotypical motor movements, repetitive gesture, kinect, artificial neural network.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19067212 Processing and Economic Analysis of Rain Tree (Samanea saman) Pods for Village Level Hydrous Bioethanol Production
Authors: Dharell B. Siano, Wendy C. Mateo, Victorino T. Taylan, Francisco D. Cuaresma
Abstract:
Biofuel is one of the renewable energy sources adapted by the Philippine government in order to lessen the dependency on foreign fuel and to reduce carbon dioxide emissions. Rain tree pods were seen to be a promising source of bioethanol since it contains significant amount of fermentable sugars. The study was conducted to establish the complete procedure in processing rain tree pods for village level hydrous bioethanol production. Production processes were done for village level hydrous bioethanol production from collection, drying, storage, shredding, dilution, extraction, fermentation, and distillation. The feedstock was sundried, and moisture content was determined at a range of 20% to 26% prior to storage. Dilution ratio was 1:1.25 (1 kg of pods = 1.25 L of water) and after extraction process yielded a sugar concentration of 22 0Bx to 24 0Bx. The dilution period was three hours. After three hours of diluting the samples, the juice was extracted using extractor with a capacity of 64.10 L/hour. 150 L of rain tree pods juice was extracted and subjected to fermentation process using a village level anaerobic bioreactor. Fermentation with yeast (Saccharomyces cerevisiae) can fasten up the process, thus producing more ethanol at a shorter period of time; however, without yeast fermentation, it also produces ethanol at lower volume with slower fermentation process. Distillation of 150 L of fermented broth was done for six hours at 85 °C to 95 °C temperature (feedstock) and 74 °C to 95 °C temperature of the column head (vapor state of ethanol). The highest volume of ethanol recovered was established at with yeast fermentation at five-day duration with a value of 14.89 L and lowest actual ethanol content was found at without yeast fermentation at three-day duration having a value of 11.63 L. In general, the results suggested that rain tree pods had a very good potential as feedstock for bioethanol production. Fermentation of rain tree pods juice can be done with yeast and without yeast.
Keywords: Fermentation, hydrous bioethanol, rain tree pods, village level.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15627211 LCA/CFD Studies of Artisanal Brick Manufacture in Mexico
Authors: H. A. Lopez-Aguilar, E. A. Huerta-Reynoso, J. A. Gomez, J. A. Duarte-Moller, A. Perez-Hernandez
Abstract:
Environmental performance of artisanal brick manufacture was studied by Lifecycle Assessment (LCA) methodology and Computational Fluid Dynamics (CFD) analysis in Mexico. The main objective of this paper is to evaluate the environmental impact during artisanal brick manufacture. LCA cradle-to-gate approach was complemented with CFD analysis to carry out an Environmental Impact Assessment (EIA). The lifecycle includes the stages of extraction, baking and transportation to the gate. The functional unit of this study was the production of a single brick in Chihuahua, Mexico and the impact categories studied were carcinogens, respiratory organics and inorganics, climate change radiation, ozone layer depletion, ecotoxicity, acidification/ eutrophication, land use, mineral use and fossil fuels. Laboratory techniques for fuel characterization, gas measurements in situ, and AP42 emission factors were employed in order to calculate gas emissions for inventory data. The results revealed that the categories with greater impacts are ecotoxicity and carcinogens. The CFD analysis is helpful in predicting the thermal diffusion and contaminants from a defined source. LCA-CFD synergy complemented the EIA and allowed us to identify the problem of thermal efficiency within the system.
Keywords: LCA, CFD, brick, artisanal.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18757210 CompPSA: A Component-Based Pairwise RNA Secondary Structure Alignment Algorithm
Authors: Ghada Badr, Arwa Alturki
Abstract:
The biological function of an RNA molecule depends on its structure. The objective of the alignment is finding the homology between two or more RNA secondary structures. Knowing the common functionalities between two RNA structures allows a better understanding and a discovery of other relationships between them. Besides, identifying non-coding RNAs -that is not translated into a protein- is a popular application in which RNA structural alignment is the first step A few methods for RNA structure-to-structure alignment have been developed. Most of these methods are partial structure-to-structure, sequence-to-structure, or structure-to-sequence alignment. Less attention is given in the literature to the use of efficient RNA structure representation and the structure-to-structure alignment methods are lacking. In this paper, we introduce an O(N2) Component-based Pairwise RNA Structure Alignment (CompPSA) algorithm, where structures are given as a component-based representation and where N is the maximum number of components in the two structures. The proposed algorithm compares the two RNA secondary structures based on their weighted component features rather than on their base-pair details. Extensive experiments are conducted illustrating the efficiency of the CompPSA algorithm when compared to other approaches and on different real and simulated datasets. The CompPSA algorithm shows an accurate similarity measure between components. The algorithm gives the flexibility for the user to align the two RNA structures based on their weighted features (position, full length, and/or stem length). Moreover, the algorithm proves scalability and efficiency in time and memory performance.Keywords: Alignment, RNA secondary structure, pairwise, component-based, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9747209 Identifying Critical Success Factors for Data Quality Management through a Delphi Study
Authors: Maria Paula Santos, Ana Lucas
Abstract:
Organizations support their operations and decision making on the data they have at their disposal, so the quality of these data is remarkably important and Data Quality (DQ) is currently a relevant issue, the literature being unanimous in pointing out that poor DQ can result in large costs for organizations. The literature review identified and described 24 Critical Success Factors (CSF) for Data Quality Management (DQM) that were presented to a panel of experts, who ordered them according to their degree of importance, using the Delphi method with the Q-sort technique, based on an online questionnaire. The study shows that the five most important CSF for DQM are: definition of appropriate policies and standards, control of inputs, definition of a strategic plan for DQ, organizational culture focused on quality of the data and obtaining top management commitment and support.
Keywords: Critical success factors, data quality, data quality management, Delphi, Q-Sort.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11097208 Secure Data Aggregation Using Clusters in Sensor Networks
Authors: Prakash G L, Thejaswini M, S H Manjula, K R Venugopal, L M Patnaik
Abstract:
Wireless sensor network can be applied to both abominable and military environments. A primary goal in the design of wireless sensor networks is lifetime maximization, constrained by the energy capacity of batteries. One well-known method to reduce energy consumption in such networks is data aggregation. Providing efcient data aggregation while preserving data privacy is a challenging problem in wireless sensor networks research. In this paper, we present privacy-preserving data aggregation scheme for additive aggregation functions. The Cluster-based Private Data Aggregation (CPDA)leverages clustering protocol and algebraic properties of polynomials. It has the advantage of incurring less communication overhead. The goal of our work is to bridge the gap between collaborative data collection by wireless sensor networks and data privacy. We present simulation results of our schemes and compare their performance to a typical data aggregation scheme TAG, where no data privacy protection is provided. Results show the efficacy and efficiency of our schemes.Keywords: Aggregation, Clustering, Query Processing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17347207 A New Protocol for Concealed Data Aggregation in Wireless Sensor Networks
Authors: M. Abbasi Dezfouli, S. Mazraeh, M. H. Yektaie
Abstract:
Wireless sensor networks (WSN) consists of many sensor nodes that are placed on unattended environments such as military sites in order to collect important information. Implementing a secure protocol that can prevent forwarding forged data and modifying content of aggregated data and has low delay and overhead of communication, computing and storage is very important. This paper presents a new protocol for concealed data aggregation (CDA). In this protocol, the network is divided to virtual cells, nodes within each cell produce a shared key to send and receive of concealed data with each other. Considering to data aggregation in each cell is locally and implementing a secure authentication mechanism, data aggregation delay is very low and producing false data in the network by malicious nodes is not possible. To evaluate the performance of our proposed protocol, we have presented computational models that show the performance and low overhead in our protocol.Keywords: Wireless Sensor Networks, Security, Concealed Data Aggregation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17357206 Thermodynamic Evaluation of Coupling APR1400 with a Thermal Desalination Plant
Authors: M. Gomaa Abdoelatef, Robert M. Field, Lee, Yong-Kwan
Abstract:
Growing human population has placed increased demands on water supplies and spurred a heightened interest in desalination infrastructure. Key elements of the economics of desalination projects are thermal and electrical inputs. With growing concerns over use of fossil fuels to (indirectly) supply these inputs, coupling of desalination with nuclear power production represents a significant opportunity. Individually, nuclear and desalination technologies have a long history and are relatively mature. For desalination, Reverse Osmosis (RO) has the lowest energy inputs. However, the economically driven output quality of the water produced using RO, which uses only electrical inputs, is lower than the output water quality from thermal desalination plants. Therefore, modern desalination projects consider that RO should be coupled with thermal desalination technologies (MSF, MED, or MED-TVC) with attendant steam inputs to permit blending to produce various qualities of water. A large nuclear facility is well positioned to dispatch large quantities of both electrical and thermal power. This paper considers the supply of thermal energy to a large desalination facility to examine heat balance impact on the nuclear steam cycle. The APR1400 nuclear plant is selected as prototypical from both a capacity and turbine cycle heat balance perspective to examine steam supply and the impact on electrical output. Extraction points and quantities of steam are considered parametrically along with various types of thermal desalination technologies to form the basis for further evaluations of economically optimal approaches to the interface of nuclear power production with desalination projects. In our study, the thermodynamic evaluation will be executed by DE-TOP, an IAEA sponsored program. DE-TOP has capabilities to analyze power generation systems coupled to desalination plants through various steam extraction positions, taking into consideration the isolation loop between the nuclear and the thermal desalination facilities (i.e., for radiological isolation).Keywords: APR1400, Cogeneration, Desalination, DE-TOP, IAEA, MED, MED-TVC, MSF, RO.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 28377205 Human Fall Detection by FMCW Radar Based on Time-Varying Range-Doppler Features
Authors: Xiang Yu, Chuntao Feng, Lu Yang, Meiyang Song, Wenhao Zhou
Abstract:
The existing two-dimensional micro-Doppler features extraction ignores the correlation information between the spatial and temporal dimension features. For the range-Doppler map, the time dimension is introduced, and a frequency modulation continuous wave (FMCW) radar human fall detection algorithm based on time-varying range-Doppler features is proposed. Firstly, the range-Doppler sequence maps are generated from the echo signals of the continuous motion of the human body collected by the radar. Then the three-dimensional data cube composed of multiple frames of range-Doppler maps is input into the three-dimensional Convolutional Neural Network (3D CNN). The spatial and temporal features of time-varying range-Doppler are extracted by the convolution layer and pool layer at the same time. Finally, the extracted spatial and temporal features are input into the fully connected layer for classification. The experimental results show that the proposed fall detection algorithm has a detection accuracy of 95.66%.
Keywords: FMCW radar, fall detection, 3D CNN, time-varying range-Doppler features.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5217204 The New Method of Concealed Data Aggregation in Wireless Sensor: A Case Study
Authors: M. Abbasi Dezfouli, S. Mazraeh, M. H. Yektaie
Abstract:
Wireless sensor networks (WSN) consists of many sensor nodes that are placed on unattended environments such as military sites in order to collect important information. Implementing a secure protocol that can prevent forwarding forged data and modifying content of aggregated data and has low delay and overhead of communication, computing and storage is very important. This paper presents a new protocol for concealed data aggregation (CDA). In this protocol, the network is divided to virtual cells, nodes within each cell produce a shared key to send and receive of concealed data with each other. Considering to data aggregation in each cell is locally and implementing a secure authentication mechanism, data aggregation delay is very low and producing false data in the network by malicious nodes is not possible. To evaluate the performance of our proposed protocol, we have presented computational models that show the performance and low overhead in our protocol.
Keywords: Wireless Sensor Networks, Security, Concealed Data Aggregation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17687203 Peakwise Smoothing of Data Models using Wavelets
Authors: D Sudheer Reddy, N Gopal Reddy, P V Radhadevi, J Saibaba, Geeta Varadan
Abstract:
Smoothing or filtering of data is first preprocessing step for noise suppression in many applications involving data analysis. Moving average is the most popular method of smoothing the data, generalization of this led to the development of Savitzky-Golay filter. Many window smoothing methods were developed by convolving the data with different window functions for different applications; most widely used window functions are Gaussian or Kaiser. Function approximation of the data by polynomial regression or Fourier expansion or wavelet expansion also gives a smoothed data. Wavelets also smooth the data to great extent by thresholding the wavelet coefficients. Almost all smoothing methods destroys the peaks and flatten them when the support of the window is increased. In certain applications it is desirable to retain peaks while smoothing the data as much as possible. In this paper we present a methodology called as peak-wise smoothing that will smooth the data to any desired level without losing the major peak features.Keywords: smoothing, moving average, peakwise smoothing, spatialdensity models, planar shape models, wavelets.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17507202 A New Precautionary Method for Measurement and Improvement the Data Quality
Authors: Seyed Mohammad Hossein Moossavizadeh, Mehran Mohsenzadeh, Nasrin Arshadi
Abstract:
the data quality is a kind of complex and unstructured concept, which is concerned by information systems managers. The reason of this attention is the high amount of Expenses for maintenance and cleaning of the inefficient data. Such a data more than its expenses of lack of quality, cause wrong statistics, analysis and decisions in organizations. Therefor the managers intend to improve the quality of their information systems' data. One of the basic subjects of quality improvement is the evaluation of the amount of it. In this paper, we present a precautionary method, which with its application the data of information systems would have a better quality. Our method would cover different dimensions of data quality; therefor it has necessary integrity. The presented method has tested on three dimensions of accuracy, value-added and believability and the results confirm the improvement and integrity of this method.
Keywords: Data quality, precaution, information system, measurement, improvement.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1468