Search results for: Biological data

7571 Use XML Format like a Model of Data Backup

Authors: Souleymane Oumtanaga, Kadjo Tanon Lambert, Koné Tiémoman, Tety Pierre, Dowa N’sreke Florent

Abstract:

Nowadays data backup format doesn-t cease to appear raising so the anxiety on their accessibility and their perpetuity. XML is one of the most promising formats to guarantee the integrity of data. This article suggests while showing one thing man can do with XML. Indeed XML will help to create a data backup model. The main task will consist in defining an application in JAVA able to convert information of a database in XML format and restore them later.

Keywords: Backup, Proprietary format, parser, syntactic tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1729

7570 REDUCER – An Architectural Design Pattern for Reducing Large and Noisy Data Sets

Authors: Apkar Salatian

Abstract:

To relieve the burden of reasoning on a point to point basis, in many domains there is a need to reduce large and noisy data sets into trends for qualitative reasoning. In this paper we propose and describe a new architectural design pattern called REDUCER for reducing large and noisy data sets that can be tailored for particular situations. REDUCER consists of 2 consecutive processes: Filter which takes the original data and removes outliers, inconsistencies or noise; and Compression which takes the filtered data and derives trends in the data. In this seminal article we also show how REDUCER has successfully been applied to 3 different case studies.

Keywords: Design Pattern, filtering, compression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1489

7569 Comparing Emotion Recognition from Voice and Facial Data Using Time Invariant Features

Authors: Vesna Kirandziska, Nevena Ackovska, Ana Madevska Bogdanova

Abstract:

The problem of emotion recognition is a challenging problem. It is still an open problem from the aspect of both intelligent systems and psychology. In this paper, both voice features and facial features are used for building an emotion recognition system. A Support Vector Machine classifiers are built by using raw data from video recordings. In this paper, the results obtained for the emotion recognition are given, and a discussion about the validity and the expressiveness of different emotions is presented. A comparison between the classifiers build from facial data only, voice data only and from the combination of both data is made here. The need for a better combination of the information from facial expression and voice data is argued.

Keywords: Emotion recognition, facial recognition, signal processing, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2017

7568 On the Mathematical Structure and Algorithmic Implementation of Biochemical Network Models

Authors: Paola Lecca

Abstract:

Modeling and simulation of biochemical reactions is of great interest in the context of system biology. The central dogma of this re-emerging area states that it is system dynamics and organizing principles of complex biological phenomena that give rise to functioning and function of cells. Cell functions, such as growth, division, differentiation and apoptosis are temporal processes, that can be understood if they are treated as dynamic systems. System biology focuses on an understanding of functional activity from a system-wide perspective and, consequently, it is defined by two hey questions: (i) how do the components within a cell interact, so as to bring about its structure and functioning? (ii) How do cells interact, so as to develop and maintain higher levels of organization and functions? In recent years, wet-lab biologists embraced mathematical modeling and simulation as two essential means toward answering the above questions. The credo of dynamics system theory is that the behavior of a biological system is given by the temporal evolution of its state. Our understanding of the time behavior of a biological system can be measured by the extent to which a simulation mimics the real behavior of that system. Deviations of a simulation indicate either limitations or errors in our knowledge. The aim of this paper is to summarize and review the main conceptual frameworks in which models of biochemical networks can be developed. In particular, we review the stochastic molecular modelling approaches, by reporting the principal conceptualizations suggested by A. A. Markov, P. Langevin, A. Fokker, M. Planck, D. T. Gillespie, N. G. van Kampfen, and recently by D. Wilkinson, O. Wolkenhauer, P. S. Jöberg and by the author.

Keywords: Mathematical structure, algorithmic implementation, biochemical network models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1556

7567 Analysis of Data Gathering Schemes for Layered Sensor Networks with Multihop Polling

Authors: Bhed Bahadur Bista, Danda B. Rawat

Abstract:

In this paper, we investigate multihop polling and data gathering schemes in layered sensor networks in order to extend the life time of the networks. A network consists of three layers. The lowest layer contains sensors. The middle layer contains so called super nodes with higher computational power, energy supply and longer transmission range than sensor nodes. The top layer contains a sink node. A node in each layer controls a number of nodes in lower layer by polling mechanism to gather data. We will present four types of data gathering schemes: intermediate nodes do not queue data packet, queue single packet, queue multiple packets and aggregate data, to see which data gathering scheme is more energy efficient for multihop polling in layered sensor networks.

Keywords: layered sensor network, polling, data gatheringschemes.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1567

7566 Incremental Algorithm to Cluster the Categorical Data with Frequency Based Similarity Measure

Authors: S.Aranganayagi, K.Thangavel

Abstract:

Clustering categorical data is more complicated than the numerical clustering because of its special properties. Scalability and memory constraint is the challenging problem in clustering large data set. This paper presents an incremental algorithm to cluster the categorical data. Frequencies of attribute values contribute much in clustering similar categorical objects. In this paper we propose new similarity measures based on the frequencies of attribute values and its cardinalities. The proposed measures and the algorithm are experimented with the data sets from UCI data repository. Results prove that the proposed method generates better clusters than the existing one.

Keywords: Clustering, Categorical, Incremental, Frequency, Domain

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1820

7565 Multimedia Data Fusion for Event Detection in Twitter by Using Dempster-Shafer Evidence Theory

Authors: Samar M. Alqhtani, Suhuai Luo, Brian Regan

Abstract:

Data fusion technology can be the best way to extract useful information from multiple sources of data. It has been widely applied in various applications. This paper presents a data fusion approach in multimedia data for event detection in twitter by using Dempster-Shafer evidence theory. The methodology applies a mining algorithm to detect the event. There are two types of data in the fusion. The first is features extracted from text by using the bag-ofwords method which is calculated using the term frequency-inverse document frequency (TF-IDF). The second is the visual features extracted by applying scale-invariant feature transform (SIFT). The Dempster - Shafer theory of evidence is applied in order to fuse the information from these two sources. Our experiments have indicated that comparing to the approaches using individual data source, the proposed data fusion approach can increase the prediction accuracy for event detection. The experimental result showed that the proposed method achieved a high accuracy of 0.97, comparing with 0.93 with texts only, and 0.86 with images only.

Keywords: Data fusion, Dempster-Shafer theory, data mining, event detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1798

7564 Model Order Reduction for Frequency Response and Effect of Order of Method for Matching Condition

Authors: Aref Ghafouri, Mohammad Javad Mollakazemi, Farhad Asadi

Abstract:

In this paper, model order reduction method is used for approximation in linear and nonlinearity aspects in some experimental data. This method can be used for obtaining offline reduced model for approximation of experimental data and can produce and follow the data and order of system and also it can match to experimental data in some frequency ratios. In this study, the method is compared in different experimental data and influence of choosing of order of the model reduction for obtaining the best and sufficient matching condition for following the data is investigated in format of imaginary and reality part of the frequency response curve and finally the effect and important parameter of number of order reduction in nonlinear experimental data is explained further.

Keywords: Frequency response, Order of model reduction, frequency matching condition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2057

7563 Building a Scalable Telemetry Based Multiclass Predictive Maintenance Model in R

Authors: Jaya Mathew

Abstract:

Many organizations are faced with the challenge of how to analyze and build Machine Learning models using their sensitive telemetry data. In this paper, we discuss how users can leverage the power of R without having to move their big data around as well as a cloud based solution for organizations willing to host their data in the cloud. By using ScaleR technology to benefit from parallelization and remote computing or R Services on premise or in the cloud, users can leverage the power of R at scale without having to move their data around.

Keywords: Predictive maintenance, machine learning, big data, cloud, on premise SQL, R.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1919

7562 Big Data Strategy for Telco: Network Transformation

Authors: F. Amin, S. Feizi

Abstract:

Big data has the potential to improve the quality of services; enable infrastructure that businesses depend on to adapt continually and efficiently; improve the performance of employees; help organizations better understand customers; and reduce liability risks. Analytics and marketing models of fixed and mobile operators are falling short in combating churn and declining revenue per user. Big Data presents new method to reverse the way and improve profitability. The benefits of Big Data and next-generation network, however, are more exorbitant than improved customer relationship management. Next generation of networks are in a prime position to monetize rich supplies of customer information—while being mindful of legal and privacy issues. As data assets are transformed into new revenue streams will become integral to high performance.

Keywords: Big Data, Next Generation Networks, Network Transformation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2515

7561 Developing Manufacturing Process for the Graphene Sensors

Authors: Abdullah Faqihi, John Hedley

Abstract:

Biosensors play a significant role in the healthcare sectors, scientific and technological progress. Developing electrodes that are easy to manufacture and deliver better electrochemical performance is advantageous for diagnostics and biosensing. They can be implemented extensively in various analytical tasks such as drug discovery, food safety, medical diagnostics, process controls, security and defence, in addition to environmental monitoring. Development of biosensors aims to create high-performance electrochemical electrodes for diagnostics and biosensing. A biosensor is a device that inspects the biological and chemical reactions generated by the biological sample. A biosensor carries out biological detection via a linked transducer and transmits the biological response into an electrical signal; stability, selectivity, and sensitivity are the dynamic and static characteristics that affect and dictate the quality and performance of biosensors. In this research, a developed experimental study for laser scribing technique for graphene oxide inside a vacuum chamber for processing of graphene oxide is presented. The processing of graphene oxide (GO) was achieved using the laser scribing technique. The effect of the laser scribing on the reduction of GO was investigated under two conditions: atmosphere and vacuum. GO solvent was coated onto a LightScribe DVD. The laser scribing technique was applied to reduce GO layers to generate rGO. The micro-details for the morphological structures of rGO and GO were visualised using scanning electron microscopy (SEM) and Raman spectroscopy so that they could be examined. The first electrode was a traditional graphene-based electrode model, made under normal atmospheric conditions, whereas the second model was a developed graphene electrode fabricated under a vacuum state using a vacuum chamber. The purpose was to control the vacuum conditions, such as the air pressure and the temperature during the fabrication process. The parameters to be assessed include the layer thickness and the continuous environment. Results presented show high accuracy and repeatability achieving low cost productivity.

Keywords: Laser scribing, LightScribe DVD, graphene oxide, scanning electron microscopy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 660

7560 Detecting Rat’s Kidney Inflammation Using Real Time Photoacoustic Tomography

Authors: M. Y. Lee, D. H. Shin, S. H. Park, W.C. Ham, S.K. Ko, C. G. Song

Abstract:

Photoacoustic Tomography (PAT) is a promising medical imaging modality that combines optical imaging contrast with the spatial resolution of ultrasound imaging. It can also distinguish the changes in biological features. But, real-time PAT system should be confirmed due to photoacoustic effect for tissue. Thus, we have developed a real-time PAT system using a custom-developed data acquisition board and ultrasound linear probe. To evaluate performance of our system, phantom test was performed. As a result of those experiments, the system showed satisfactory performance and its usefulness has been confirmed. We monitored the degradation of inflammation which induced on the rat’s kidney using real-time PAT.

Keywords: Photoacoustic tomography, inflammation detection, rat, kidney, contrast agent, ultrasound.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1367

7559 Using Perspective Schemata to Model the ETL Process

Authors: Valeria M. Pequeno, Joao Carlos G. M. Pires

Abstract:

Data Warehouses (DWs) are repositories which contain the unified history of an enterprise for decision support. The data must be Extracted from information sources, Transformed and integrated to be Loaded (ETL) into the DW, using ETL tools. These tools focus on data movement, where the models are only used as a means to this aim. Under a conceptual viewpoint, the authors want to innovate the ETL process in two ways: 1) to make clear compatibility between models in a declarative fashion, using correspondence assertions and 2) to identify the instances of different sources that represent the same entity in the real-world. This paper presents the overview of the proposed framework to model the ETL process, which is based on the use of a reference model and perspective schemata. This approach provides the designer with a better understanding of the semantic associated with the ETL process.

Keywords: conceptual data model, correspondence assertions, data warehouse, data integration, ETL process, object relational database.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1510

7558 Collaborative Education Practice in a Data Structure E-Learning Course

Authors: Gang Chen, Ruimin Shen

Abstract:

This paper presented a collaborative education model, which consists four parts: collaborative teaching, collaborative working, collaborative training and interaction. Supported by an e-learning platform, collaborative education was practiced in a data structure e-learning course. Data collected shows that most of students accept collaborative education. This paper goes one step attempting to determine which aspects appear to be most important or helpful in collaborative education.

Keywords: Collaborative work, education, data structures.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1689

7557 Generic Data Warehousing for Consumer Electronics Retail Industry

Authors: S. Habte, K. Ouazzane, P. Patel, S. Patel

Abstract:

The dynamic and highly competitive nature of the consumer electronics retail industry means that businesses in this industry are experiencing different decision making challenges in relation to pricing, inventory control, consumer satisfaction and product offerings. To overcome the challenges facing retailers and create opportunities, we propose a generic data warehousing solution which can be applied to a wide range of consumer electronics retailers with a minimum configuration. The solution includes a dimensional data model, a template SQL script, a high level architectural descriptions, ETL tool developed using C#, a set of APIs, and data access tools. It has been successfully applied by ASK Outlets Ltd UK resulting in improved productivity and enhanced sales growth.

Keywords: Consumer electronics retail, dimensional data model, data analysis, generic data warehousing, reporting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1383

7556 Microfluidic Paper-Based Electrochemical Biosensor

Authors: Ahmad Manbohi, Seyyed Hamid Ahmadi

Abstract:

A low-cost paper-based microfluidic device (PAD) for the multiplex electrochemical determination of glucose, uric acid, and dopamine in biological fluids was developed. Using wax printing, PAD containing a central zone, six channels, and six detection zones was fabricated, and the electrodes were printed on detection zones using pre-made electrodes template. For each analyte, two detection zones were used. The carbon working electrode was coated with chitosan-BSA (and enzymes for glucose and uric acid). To detect glucose and uric acid, enzymatic reactions were employed. These reactions involve enzyme-catalyzed redox reactions of the analytes and produce free electrons for electrochemical measurement. Calibration curves were linear (R^² > 0.980) in the range of 0-80 mM for glucose, 0.09–0.9 mM for dopamine, and 0–50 mM for uric acid, respectively. Blood samples were successfully analyzed by the proposed method.

Keywords: Multiplex, microfluidic paper-based electrochemical biosensors, biomarkers, biological fluids.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1605

7555 Effect of Different Methods of Soil Fertility on Grain Yield and Chickpea Quality

Authors: Mohammadi K., Ghalavand A., Aghaalikhani M

Abstract:

In order to evaluation the effects of natural, biological and chemical fertilizers on grain yield and chickpea quality, field experiments were carried out in 2007 and 2008 growing seasons. In this research the effects of different organic, chemical and biological fertilizers were investigated on grain yield and quality of chickpea. Experimental units were arranged in split-split plots based on randomized complete blocks with three replications. The highest amounts of yield and yield components were obtained in G1×N5 interaction. Significant increasing of N, P, K, Fe and Mg content in leaves and grains emphasized on superiority of mentioned treatment because each one of these nutrients has an approved role in chlorophyll synthesis and photosynthesis ability of the crop. The combined application of compost, farmyard manure and chemical phosphorus (N5) had the best grain quality due to high protein, starch and total sugar contents, low crude fiber and reduced cooking time.

Keywords: soil fertility, grain yield, chickpea, natural resources.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2597

7554 An Algebra for Protein Structure Data

Authors: Yanchao Wang, Rajshekhar Sunderraman

Abstract:

This paper presents an algebraic approach to optimize queries in domain-specific database management system for protein structure data. The approach involves the introduction of several protein structure specific algebraic operators to query the complex data stored in an object-oriented database system. The Protein Algebra provides an extensible set of high-level Genomic Data Types and Protein Data Types along with a comprehensive collection of appropriate genomic and protein functions. The paper also presents a query translator that converts high-level query specifications in algebra into low-level query specifications in Protein-QL, a query language designed to query protein structure data. The query transformation process uses a Protein Ontology that serves the purpose of a dictionary.

Keywords: Domain-Specific Data Management, Protein Algebra, Protein Ontology, Protein Structure Data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1542

7553 Atomic Force Microscopy (AFM)Topographical Surface Characterization of Multilayer-Coated and Uncoated Carbide Inserts

Authors: Samy E. Oraby, Ayman M. Alaskari

Abstract:

In recent years, scanning probe atomic force microscopy SPM AFM has gained acceptance over a wide spectrum of research and science applications. Most fields focuses on physical, chemical, biological while less attention is devoted to manufacturing and machining aspects. The purpose of the current study is to assess the possible implementation of the SPM AFM features and its NanoScope software in general machining applications with special attention to the tribological aspects of cutting tool. The surface morphology of coated and uncoated as-received carbide inserts is examined, analyzed, and characterized through the determination of the appropriate scanning setting, the suitable data type imaging techniques and the most representative data analysis parameters using the MultiMode SPM AFM in contact mode. The NanoScope operating software is used to capture realtime three data types images: “Height", “Deflection" and “Friction". Three scan sizes are independently performed: 2, 6, and 12 μm with a 2.5 μm vertical range (Z). Offline mode analysis includes the determination of three functional topographical parameters: surface “Roughness", power spectral density “PSD" and “Section". The 12 μm scan size in association with “Height" imaging is found efficient to capture every tiny features and tribological aspects of the examined surface. Also, “Friction" analysis is found to produce a comprehensive explanation about the lateral characteristics of the scanned surface. Configuration of many surface defects and drawbacks has been precisely detected and analyzed.

Keywords: SPM AFM contact mode, carbide inserts, scan size, surface defects, surface roughness, PSD.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7269

7552 A Combined Cipher Text Policy Attribute-Based Encryption and Timed-Release Encryption Method for Securing Medical Data in Cloud

Authors: G. Shruthi, Purohit Shrinivasacharya

Abstract:

The biggest problem in cloud is securing an outsourcing data. A cloud environment cannot be considered to be trusted. It becomes more challenging when outsourced data sources are managed by multiple outsourcers with different access rights. Several methods have been proposed to protect data confidentiality against the cloud service provider to support fine-grained data access control. We propose a method with combined Cipher Text Policy Attribute-based Encryption (CP-ABE) and Timed-release encryption (TRE) secure method to control medical data storage in public cloud.

Keywords: Attribute, encryption, security, trapdoor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 758

7551 Kinetics Studies on Biological Treatment of Tannery Wastewater Using Mixed Culture

Authors: G.Durai, N.Rajamohan, C.Karthikeyan, M.Rajasimman

Abstract:

In this study, aerobic digestion of tannery industry wastewater was carried out using mixed culture obtained from common effluent treatment plant treating tannery wastewater. The effect of pH, temperature, inoculum concentration, agitation speed and initial substrate concentration on the reduction of organic matters were found. The optimum conditions for COD reduction was found to be pH - 7 (60%), temperature - 30ÔùªC (61%), inoculum concentration - 2% (61%), agitation speed - 150rpm (65%) and initial substrate concentration - 1560 mg COD/L (74%). Kinetics studies were carried by using Monod model, First order, Diffusional model and Singh model. From the results it was found that the Monod model suits well for the degradation of tannery wastewater using mixed microbial consortium.

Keywords: Tannery, Wastewater, Biological treatment, Aerobic, Mixed culture, Kinetics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3166

7550 Study the Biological Activities of Tribulus Terrestris Extracts

Authors: Ahmed A. Hussain, Abbas A. Mohammed, Heba. H. Ibrahim, Amir H. Abbas

Abstract:

In this study the extracts of the Iraqi herb Tribulus terrestris (Al-Hassage or Al-Kutub) was done by using of polar and non polar solvents, then the biological activity of these extractants was studied in three fields, First, the antibacterial activity (in vitro) on gram positive bacteria (Staphylococcus aureus), and gram negative bacteria (E. coli, Proteus vulgaris, Pseudomonas aerugiuosa, and Klebsiella), all extracts showed considerable activity against all bacteria. Second, the effect of extracts on free serum testosterone level in male mice (in vivo), the alcoholic, and acetonitrilic extracts showed significant (P < 0.05) increase in free serum testosterone level, and we found that the extracts contained compounds with less genotoxic effects in mice germ cells. 3rd, was to study the effect of methanolic extract of T. terrestris in diabetes management.

Keywords: Genotoxic, germ cells, tribulus terrestris, testosterone.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4690

7549 EPR Hiding in Medical Images for Telemedicine

Authors: K. A. Navas, S. Archana Thampy, M. Sasikumar

Abstract:

Medical image data hiding has strict constrains such as high imperceptibility, high capacity and high robustness. Achieving these three requirements simultaneously is highly cumbersome. Some works have been reported in the literature on data hiding, watermarking and stegnography which are suitable for telemedicine applications. None is reliable in all aspects. Electronic Patient Report (EPR) data hiding for telemedicine demand it blind and reversible. This paper proposes a novel approach to blind reversible data hiding based on integer wavelet transform. Experimental results shows that this scheme outperforms the prior arts in terms of zero BER (Bit Error Rate), higher PSNR (Peak Signal to Noise Ratio), and large EPR data embedding capacity with WPSNR (Weighted Peak Signal to Noise Ratio) around 53 dB, compared with the existing reversible data hiding schemes.

Keywords: Biomedical imaging, Data security, Datacommunication, Teleconferencing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2752

7548 A Robust Method for Encrypted Data Hiding Technique Based on Neighborhood Pixels Information

Authors: Ali Shariq Imran, M. Younus Javed, Naveed Sarfraz Khattak

Abstract:

This paper presents a novel method for data hiding based on neighborhood pixels information to calculate the number of bits that can be used for substitution and modified Least Significant Bits technique for data embedding. The modified solution is independent of the nature of the data to be hidden and gives correct results along with un-noticeable image degradation. The technique, to find the number of bits that can be used for data hiding, uses the green component of the image as it is less sensitive to human eye and thus it is totally impossible for human eye to predict whether the image is encrypted or not. The application further encrypts the data using a custom designed algorithm before embedding bits into image for further security. The overall process consists of three main modules namely embedding, encryption and extraction cm.

Keywords: Data hiding, image processing, information security, stagonography.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2340

7547 Unsupervised Outlier Detection in Streaming Data Using Weighted Clustering

Authors: Yogita, Durga Toshniwal

Abstract:

Outlier detection in streaming data is very challenging because streaming data cannot be scanned multiple times and also new concepts may keep evolving. Irrelevant attributes can be termed as noisy attributes and such attributes further magnify the challenge of working with data streams. In this paper, we propose an unsupervised outlier detection scheme for streaming data. This scheme is based on clustering as clustering is an unsupervised data mining task and it does not require labeled data, both density based and partitioning clustering are combined for outlier detection. In this scheme partitioning clustering is also used to assign weights to attributes depending upon their respective relevance and weights are adaptive. Weighted attributes are helpful to reduce or remove the effect of noisy attributes. Keeping in view the challenges of streaming data, the proposed scheme is incremental and adaptive to concept evolution. Experimental results on synthetic and real world data sets show that our proposed approach outperforms other existing approach (CORM) in terms of outlier detection rate, false alarm rate, and increasing percentages of outliers.

Keywords: Concept Evolution, Irrelevant Attributes, Streaming Data, Unsupervised Outlier Detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2636

7546 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find out efficient way to predict the yield of corn based meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with their environment as dynamical systems. But the calibration process of the dynamic system comes up with much difficulty, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of datasets (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process. But it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods, methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-folds cross-validation process and two accuracy metrics: root mean square error of prediction(RMSEP), mean absolute error of prediction(MAEP) were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from dataset easy to access offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.

Keywords: Crop yield prediction, crop model, sensitivity analysis, paramater estimation, particle swarm optimization, random forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1173

7545 The Effect of Measurement Distribution on System Identification and Detection of Behavior of Nonlinearities of Data

Authors: Mohammad Javad Mollakazemi, Farhad Asadi, Aref Ghafouri

Abstract:

In this paper, we considered and applied parametric modeling for some experimental data of dynamical system. In this study, we investigated the different distribution of output measurement from some dynamical systems. Also, with variance processing in experimental data we obtained the region of nonlinearity in experimental data and then identification of output section is applied in different situation and data distribution. Finally, the effect of the spanning the measurement such as variance to identification and limitation of this approach is explained.

Keywords: Gaussian process, Nonlinearity distribution, Particle filter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1721

7544 Exponentially Weighted Simultaneous Estimation of Several Quantiles

Authors: Valeriy Naumov, Olli Martikainen

Abstract:

In this paper we propose new method for simultaneous generating multiple quantiles corresponding to given probability levels from data streams and massive data sets. This method provides a basis for development of single-pass low-storage quantile estimation algorithms, which differ in complexity, storage requirement and accuracy. We demonstrate that such algorithms may perform well even for heavy-tailed data.

Keywords: Quantile estimation, data stream, heavy-taileddistribution, tail index.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1531

7543 Construct Pairwise Test Suites Based on the Bak-Sneppen Model of Biological Evolution

Authors: Jianjun Yuan, Changjun Jiang

Abstract:

Pairwise testing, which requires that every combination of valid values of each pair of system factors be covered by at lease one test case, plays an important role in software testing since many faults are caused by unexpected 2-way interactions among system factors. Although meta-heuristic strategies like simulated annealing can generally discover smaller pairwise test suite, they may cost more time to perform search, compared with greedy algorithms. We propose a new method, improved Extremal Optimization (EO) based on the Bak-Sneppen (BS) model of biological evolution, for constructing pairwise test suites and define fitness function according to the requirement of improved EO. Experimental results show that improved EO gives similar size of resulting pairwise test suite and yields an 85% reduction in solution time over SA.

Keywords: Covering Arrays, Extremal Optimization, Simulated Annealing, Software Testing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1778

7542 Methodology for Quantifying the Meaning of Information in Biological Systems

Authors: Richard L. Summers

Abstract:

The advanced computational analysis of biological systems is becoming increasingly dependent upon an understanding of the information-theoretic structure of the materials, energy and interactive processes that comprise those systems. The stability and survival of these living systems is fundamentally contingent upon their ability to acquire and process the meaning of information concerning the physical state of its biological continuum (biocontinuum). The drive for adaptive system reconciliation of a divergence from steady state within this biocontinuum can be described by an information metric-based formulation of the process for actionable knowledge acquisition that incorporates the axiomatic inference of Kullback-Leibler information minimization driven by survival replicator dynamics. If the mathematical expression of this process is the Lagrangian integrand for any change within the biocontinuum then it can also be considered as an action functional for the living system. In the direct method of Lyapunov, such a summarizing mathematical formulation of global system behavior based on the driving forces of energy currents and constraints within the system can serve as a platform for the analysis of stability. As the system evolves in time in response to biocontinuum perturbations, the summarizing function then conveys information about its overall stability. This stability information portends survival and therefore has absolute existential meaning for the living system. The first derivative of the Lyapunov energy information function will have a negative trajectory toward a system steady state if the driving force is dissipating. By contrast, system instability leading to system dissolution will have a positive trajectory. The direction and magnitude of the vector for the trajectory then serves as a quantifiable signature of the meaning associated with the living system’s stability information, homeostasis and survival potential.

Keywords: Semiotic meaning, Shannon information, Lyapunov, living systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 514