Search results for: whole exome sequencing data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25334

Search results for: whole exome sequencing data

24704 Measured versus Default Interstate Traffic Data in New Mexico, USA

Authors: M. A. Hasan, M. R. Islam, R. A. Tarefder

Abstract:

This study investigates how the site specific traffic data differs from the Mechanistic Empirical Pavement Design Software default values. Two Weigh-in-Motion (WIM) stations were installed in Interstate-40 (I-40) and Interstate-25 (I-25) to developed site specific data. A computer program named WIM Data Analysis Software (WIMDAS) was developed using Microsoft C-Sharp (.Net) for quality checking and processing of raw WIM data. A complete year data from November 2013 to October 2014 was analyzed using the developed WIM Data Analysis Program. After that, the vehicle class distribution, directional distribution, lane distribution, monthly adjustment factor, hourly distribution, axle load spectra, average number of axle per vehicle, axle spacing, lateral wander distribution, and wheelbase distribution were calculated. Then a comparative study was done between measured data and AASHTOWare default values. It was found that the measured general traffic inputs for I-40 and I-25 significantly differ from the default values.

Keywords: AASHTOWare, traffic, weigh-in-motion, axle load distribution

Procedia PDF Downloads 339
24703 Design of Knowledge Management System with Geographic Information System

Authors: Angga Hidayah Ramadhan, Luciana Andrawina, M. Azani Hasibuan

Abstract:

Data will be as a core of the decision if it has a good treatment or process, which is process that data into information, and information into knowledge to make a wisdom or decision. Today, many companies have not realize it include XYZ University Admission Directorate as executor of National Admission called Seleksi Masuk Bersama (SMB) that during the time, the workers only uses their feeling to make a decision. Whereas if it done, then that company can analyze the data to make a right decision to get a pin sales from student candidate or registrant that follow SMB as many as possible. Therefore, needs Knowledge Management System (KMS) with Geographic Information System (GIS) use 5C4C that can process that company data becomes more useful and can help make decisions. This information system can process data into information based on the pin sold data with 5C (Contextualized, Categorize, Calculation, Correction, Condensed) and convert information into knowledge with 4C (Comparing, Consequence, Connection, Conversation) that has been several steps until these data can be useful to make easier to take a decision or wisdom, resolve problems, communicate, and quicker to learn to the employees have not experience and also for ease of viewing/visualization based on spatial data that equipped with GIS functionality that can be used to indicate events in each province with indicator that facilitate in this system. The system also have a function to save the tacit on the system then to be proceed into explicit in expert system based on the problems that will be found from the consequences of information. With the system each team can make a decision with same ways, structured, and the important is based on the actual event/data.

Keywords: 5C4C, data, information, knowledge

Procedia PDF Downloads 459
24702 A Policy Strategy for Building Energy Data Management in India

Authors: Shravani Itkelwar, Deepak Tewari, Bhaskar Natarajan

Abstract:

The energy consumption data plays a vital role in energy efficiency policy design, implementation, and impact assessment. Any demand-side energy management intervention's success relies on the availability of accurate, comprehensive, granular, and up-to-date data on energy consumption. The Building sector, including residential and commercial, is one of the largest consumers of energy in India after the Industrial sector. With economic growth and increasing urbanization, the building sector is projected to grow at an unprecedented rate, resulting in a 5.6 times escalation in energy consumption till 2047 compared to 2017. Therefore, energy efficiency interventions will play a vital role in decoupling the floor area growth and associated energy demand, thereby increasing the need for robust data. In India, multiple institutions are involved in the collection and dissemination of data. This paper focuses on energy consumption data management in the building sector in India for both residential and commercial segments. It evaluates the robustness of data available through administrative and survey routes to estimate the key performance indicators and identify critical data gaps for making informed decisions. The paper explores several issues in the data, such as lack of comprehensiveness, non-availability of disaggregated data, the discrepancy in different data sources, inconsistent building categorization, and others. The identified data gaps are justified with appropriate examples. Moreover, the paper prioritizes required data in order of relevance to policymaking and groups it into "available," "easy to get," and "hard to get" categories. The paper concludes with recommendations to address the data gaps by leveraging digital initiatives, strengthening institutional capacity, institutionalizing exclusive building energy surveys, and standardization of building categorization, among others, to strengthen the management of building sector energy consumption data.

Keywords: energy data, energy policy, energy efficiency, buildings

Procedia PDF Downloads 182
24701 A Survey on Data-Centric and Data-Aware Techniques for Large Scale Infrastructures

Authors: Silvina Caíno-Lores, Jesús Carretero

Abstract:

Large scale computing infrastructures have been widely developed with the core objective of providing a suitable platform for high-performance and high-throughput computing. These systems are designed to support resource-intensive and complex applications, which can be found in many scientific and industrial areas. Currently, large scale data-intensive applications are hindered by the high latencies that result from the access to vastly distributed data. Recent works have suggested that improving data locality is key to move towards exascale infrastructures efficiently, as solutions to this problem aim to reduce the bandwidth consumed in data transfers, and the overheads that arise from them. There are several techniques that attempt to move computations closer to the data. In this survey we analyse the different mechanisms that have been proposed to provide data locality for large scale high-performance and high-throughput systems. This survey intends to assist scientific computing community in understanding the various technical aspects and strategies that have been reported in recent literature regarding data locality. As a result, we present an overview of locality-oriented techniques, which are grouped in four main categories: application development, task scheduling, in-memory computing and storage platforms. Finally, the authors include a discussion on future research lines and synergies among the former techniques.

Keywords: data locality, data-centric computing, large scale infrastructures, cloud computing

Procedia PDF Downloads 257
24700 Leukocyte Transcriptome Analysis of Patients with Obesity-Related High Output Heart Failure

Authors: Samantha A. Cintron, Janet Pierce, Mihaela E. Sardiu, Diane Mahoney, Jill Peltzer, Bhanu Gupta, Qiuhua Shen

Abstract:

High output heart failure (HOHF) is characterized a high output state resulting from an underlying disease process and is commonly caused by obesity. As obesity levels increase, more individuals will be at risk for obesity-related HOHF. However, the underlying pathophysiologic mechanisms of obesity-related HOHF are not well understood and need further research. The aim of the study was to describe the differences in leukocyte transcriptomes of morbidly obese patients with HOHF and those with non-HOHF. In this cross-sectional study, the study team collected blood samples, demographics, and clinical data of six patients with morbid obesity and HOHF and six patients with morbid obesity and non-HOHF. The study team isolated the peripheral blood leukocyte RNA and applied stranded total RNA sequencing. Differential gene expression was calculated, and Ingenuity Pathway Analysis software was used to interpret the canonical pathways, functional changes, upstream regulators, and mechanistic and causal networks that were associated with the significantly different leukocyte transcriptomes. The study team identified 116 differentially expressed genes; 114 were upregulated, and 2 were downregulated in the HOHF group (Benjamini-Hochberg adjusted p-value ≤ 0.05 and log2(fold-change) of ±1). The differentially expressed genes were involved with cell proliferation, mitochondrial function, erythropoiesis, erythrocyte stability, and apoptosis. The top upregulated canonical pathways associated with differentially expressed genes were autophagy, adenosine monophosphate-activated protein kinase signaling, and senescence pathways. Upstream regulator GATA Binding Protein 1 (GATA1) and a network associated with nuclear factor kappa-light chain-enhancer of activated B cells (NF-kB) were also identified based on the different leukocyte transcriptomes of morbidly obese patients with HOHF and non-HOHF. To the author’s best knowledge, this is the first study that reported the differential gene expression in patients with obesity-related HOHF and demonstrated the unique pathophysiologic mechanisms underlying the disease. Further research is needed to determine the role of cellular function and maintenance, inflammation, and iron homeostasis in obesity-related HOHF.

Keywords: cardiac output, heart failure, obesity, transcriptomics

Procedia PDF Downloads 51
24699 Maximizing Bidirectional Green Waves for Major Road Axes

Authors: Christian Liebchen

Abstract:

Both from an environmental perspective and with respect to road traffic flow quality, planning so-called green waves along major road axes is a well-established target for traffic engineers. For one-way road axes (e.g. the Avenues in Manhattan), this is a trivial downstream task. For bidirectional arterials, the well-known necessary condition for establishing a green wave in both directions is that the driving times between two subsequent crossings must be an integer multiple of half of the cycle time of the signal programs at the nodes. In this paper, we propose an integer linear optimization model to establish fixed-time green waves in both directions that are as long and as wide as possible, even in the situation where the driving time condition is not fulfilled. In particular, we are considering an arterial along whose nodes separate left-turn signal groups are realized. In our computational results, we show that scheduling left-turn phases before or after the straight phases can reduce waiting times along the arterial. Moreover, we show that there is always a solution with green waves in both directions that are as long and as wide as possible, where absolute priority is put on just one direction. Compared to optimizing both directions together, establishing an ideal green wave into one direction can only provide suboptimal quality when considering prioritized parts of a green band (e.g., first few seconds).

Keywords: traffic light coordination, synchronization, phase sequencing, green waves, integer programming

Procedia PDF Downloads 111
24698 Wind Speed Data Analysis in Colombia in 2013 and 2015

Authors: Harold P. Villota, Alejandro Osorio B.

Abstract:

The energy meteorology is an area for study energy complementarity and the use of renewable sources in interconnected systems. Due to diversify the energy matrix in Colombia with wind sources, is necessary to know the data bases about this one. However, the time series given by 260 automatic weather stations have empty, and no apply data, so the purpose is to fill the time series selecting two years to characterize, impute and use like base to complete the data between 2005 and 2020.

Keywords: complementarity, wind speed, renewable, colombia, characteri, characterization, imputation

Procedia PDF Downloads 161
24697 Comprehensive Longitudinal Multi-omic Profiling in Weight Gain and Insulin Resistance

Authors: Christine Y. Yeh, Brian D. Piening, Sarah M. Totten, Kimberly Kukurba, Wenyu Zhou, Kevin P. F. Contrepois, Gucci J. Gu, Sharon Pitteri, Michael Snyder

Abstract:

Three million deaths worldwide are attributed to obesity. However, the biomolecular mechanisms that describe the link between adiposity and subsequent disease states are poorly understood. Insulin resistance characterizes approximately half of obese individuals and is a major cause of obesity-mediated diseases such as Type II diabetes, hypertension and other cardiovascular diseases. This study makes use of longitudinal quantitative and high-throughput multi-omics (genomics, epigenomics, transcriptomics, glycoproteomics etc.) methodologies on blood samples to develop multigenic and multi-analyte signatures associated with weight gain and insulin resistance. Participants of this study underwent a 30-day period of weight gain via excessive caloric intake followed by a 60-day period of restricted dieting and return to baseline weight. Blood samples were taken at three different time points per patient: baseline, peak-weight and post weight loss. Patients were characterized as either insulin resistant (IR) or insulin sensitive (IS) before having their samples processed via longitudinal multi-omic technologies. This comparative study revealed a wealth of biomolecular changes associated with weight gain after using methods in machine learning, clustering, network analysis etc. Pathways of interest included those involved in lipid remodeling, acute inflammatory response and glucose metabolism. Some of these biomolecules returned to baseline levels as the patient returned to normal weight whilst some remained elevated. IR patients exhibited key differences in inflammatory response regulation in comparison to IS patients at all time points. These signatures suggest differential metabolism and inflammatory pathways between IR and IS patients. Biomolecular differences associated with weight gain and insulin resistance were identified on various levels: in gene expression, epigenetic change, transcriptional regulation and glycosylation. This study was not only able to contribute to new biology that could be of use in preventing or predicting obesity-mediated diseases, but also matured novel biomedical informatics technologies to produce and process data on many comprehensive omics levels.

Keywords: insulin resistance, multi-omics, next generation sequencing, proteogenomics, type ii diabetes

Procedia PDF Downloads 425
24696 Industrial Process Mining Based on Data Pattern Modeling and Nonlinear Analysis

Authors: Hyun-Woo Cho

Abstract:

Unexpected events may occur with serious impacts on industrial process. This work utilizes a data representation technique to model and to analyze process data pattern for the purpose of diagnosis. In this work, the use of triangular representation of process data is evaluated using simulation process. Furthermore, the effect of using different pre-treatment techniques based on such as linear or nonlinear reduced spaces was compared. This work extracted the fault pattern in the reduced space, not in the original data space. The results have shown that the non-linear technique based diagnosis method produced more reliable results and outperforms linear method.

Keywords: process monitoring, data analysis, pattern modeling, fault, nonlinear techniques

Procedia PDF Downloads 385
24695 Recommender System Based on Mining Graph Databases for Data-Intensive Applications

Authors: Mostafa Gamal, Hoda K. Mohamed, Islam El-Maddah, Ali Hamdi

Abstract:

In recent years, many digital documents on the web have been created due to the rapid growth of ’social applications’ communities or ’Data-intensive applications’. The evolution of online-based multimedia data poses new challenges in storing and querying large amounts of data for online recommender systems. Graph data models have been shown to be more efficient than relational data models for processing complex data. This paper will explain the key differences between graph and relational databases, their strengths and weaknesses, and why using graph databases is the best technology for building a realtime recommendation system. Also, The paper will discuss several similarity metrics algorithms that can be used to compute a similarity score of pairs of nodes based on their neighbourhoods or their properties. Finally, the paper will discover how NLP strategies offer the premise to improve the accuracy and coverage of realtime recommendations by extracting the information from the stored unstructured knowledge, which makes up the bulk of the world’s data to enrich the graph database with this information. As the size and number of data items are increasing rapidly, the proposed system should meet current and future needs.

Keywords: graph databases, NLP, recommendation systems, similarity metrics

Procedia PDF Downloads 103
24694 Digital Revolution a Veritable Infrastructure for Technological Development

Authors: Osakwe Jude Odiakaosa

Abstract:

Today’s digital society is characterized by e-education or e-learning, e-commerce, and so on. All these have been propelled by digital revolution. Digital technology such as computer technology, Global Positioning System (GPS) and Geographic Information System (GIS) has been having a tremendous impact on the field of technology. This development has positively affected the scope, methods, speed of data acquisition, data management and the rate of delivery of the results (map and other map products) of data processing. This paper tries to address the impact of revolution brought by digital technology.

Keywords: digital revolution, internet, technology, data management

Procedia PDF Downloads 447
24693 Haplotypes of the Human Leukocyte Antigen-G Different HIV-1 Groups from the Netherlands

Authors: A. Alyami, S. Christmas, K. Neeltje, G. Pollakis, B. Paxton, Z. Al-Bayati

Abstract:

The Human leukocyte antigen-G (HLA-G) molecule plays an important role in immunomodulation. To date, 16 untranslated regions (UTR) HLA-G haplotypes have been previously defined by sequenced SNPs in the coding region. From these, UTR-1, UTR-2, UTR-3, UTR-4, UTR-5, UTR-6 and UTR-7 are the most frequent 3’UTR haplotypes at the global level. UTR-1 is associated with higher levels of soluble HLA-G and HLA-G expression, whereas UTR-5 and UTR-7 are linked with low levels of soluble HLA-G and HLA-G expression. Human immunodeficiency virus type 1 (HIV-1) infection results in the progressive loss of immune function in infected individuals. The virus escape mechanism typically includes T lymphocytes and NK cell recognition and lyses by classical HLA-A and B down-regulation, which has been associated with non-classical HLA-G molecule up-regulation, respectively. We evaluated the haplotypes of the HLA-G 3′ untranslated region frequencies observed in three HIV-1 groups from the Netherlands and their susceptibility to develop infection. The three groups are made up of mainly men who have sex with men (MSM), injection drug users (IDU) and a high-risk-seronegative (HRSN) group. DNA samples were amplified with published primers prior sequencing. According to our results, the low expresser frequencies show higher in HRSN compared to other groups. This is indicating that 3’UTR polymorphisms may be identified as potential prognostic biomarkers to determine susceptibility to HIV.

Keywords: Human leukocyte antigen-G (HLA-G) , men who have sex with men (MSM), injection drug users (IDU), high-risk-seronegative (HRSN) group, high-untranslated region (UTR)

Procedia PDF Downloads 152
24692 BigCrypt: A Probable Approach of Big Data Encryption to Protect Personal and Business Privacy

Authors: Abdullah Al Mamun, Talal Alkharobi

Abstract:

As data size is growing up, people are became more familiar to store big amount of secret information into cloud storage. Companies are always required to need transfer massive business files from one end to another. We are going to lose privacy if we transmit it as it is and continuing same scenario repeatedly without securing the communication mechanism means proper encryption. Although asymmetric key encryption solves the main problem of symmetric key encryption but it can only encrypt limited size of data which is inapplicable for large data encryption. In this paper we propose a probable approach of pretty good privacy for encrypt big data using both symmetric and asymmetric keys. Our goal is to achieve encrypt huge collection information and transmit it through a secure communication channel for committing the business and personal privacy. To justify our method an experimental dataset from three different platform is provided. We would like to show that our approach is working for massive size of various data efficiently and reliably.

Keywords: big data, cloud computing, cryptography, hadoop, public key

Procedia PDF Downloads 315
24691 Implementation of Big Data Concepts Led by the Business Pressures

Authors: Snezana Savoska, Blagoj Ristevski, Violeta Manevska, Zlatko Savoski, Ilija Jolevski

Abstract:

Big data is widely accepted by the pharmaceutical companies as a result of business demands create through legal pressure. Pharmaceutical companies have many legal demands as well as standards’ demands and have to adapt their procedures to the legislation. To manage with these demands, they have to standardize the usage of the current information technology and use the latest software tools. This paper highlights some important aspects of experience with big data projects implementation in a pharmaceutical Macedonian company. These projects made improvements of their business processes by the help of new software tools selected to comply with legal and business demands. They use IT as a strategic tool to obtain competitive advantage on the market and to reengineer the processes towards new Internet economy and quality demands. The company is required to manage vast amounts of structured as well as unstructured data. For these reasons, they implement projects for emerging and appropriate software tools which have to deal with big data concepts accepted in the company.

Keywords: big data, unstructured data, SAP ERP, documentum

Procedia PDF Downloads 268
24690 Saving Energy at a Wastewater Treatment Plant through Electrical and Production Data Analysis

Authors: Adriano Araujo Carvalho, Arturo Alatrista Corrales

Abstract:

This paper intends to show how electrical energy consumption and production data analysis were used to find opportunities to save energy at Taboada wastewater treatment plant in Callao, Peru. In order to access the data, it was used independent data networks for both electrical and process instruments, which were taken to analyze under an ISO 50001 energy audit, which considered, thus, Energy Performance Indexes for each process and a step-by-step guide presented in this text. Due to the use of aforementioned methodology and data mining techniques applied on information gathered through electronic multimeters (conveniently placed on substation switchboards connected to a cloud network), it was possible to identify thoroughly the performance of each process and thus, evidence saving opportunities which were previously hidden before. The data analysis brought both costs and energy reduction, allowing the plant to save significant resources and to be certified under ISO 50001.

Keywords: energy and production data analysis, energy management, ISO 50001, wastewater treatment plant energy analysis

Procedia PDF Downloads 191
24689 The Isolation of Enterobacter Ludwigii Strain T976 from Nicotiana Tabacum L. Yunyan 97 and Its Application Study

Authors: Gao Qin, Hu Liwei, Dong Xiangzhou, Zhu Qifa, Cheng Tingming, Zhao Limei, Yang Mengmeng, Zhai Zhen, Dai Huaxin, Liang Taibo, Zhang Shixiang, Xue Chaoqun

Abstract:

The functional strain T976 for starch degradation was isolated from Nicotiana tabacum L. Yunyan 97 tobacco leaves, the ratio of starch hydrolysis transparent circle diameter to colony diameter of the strain was 4.14, 16S rDNA sequencing identified these strains as Enterobacter ludwigii. Then Enterobacter ludwigii T976 was fermented and spaying Yunyan 97 plant in vigorous growing stage. The results of once spraying fermentation broth of Enterobacter ludwigii T976 showed that starch content of upper leaves decreased slightly, from 3.77% to 3.1%, the reducing sugar content increased from 4.39% to 5.53%, and the total sugar content increased from 5.82% to 7.39%. The chemical content was also checked after three time spraying. The starch content of middle leaves decreased from 5.63% to 3.74%, while the content of total sugar and reducing sugar decreased slightly. And the starch content of upper leaves decreased from 7.62% to 4.78%, the total sugar and reducing sugar decreased slightly, and starch content of middle leaf decreased from 6.27% to 3.62%, the total sugar and reducing sugar did not change much, and other chemical components were in a suitable range.

Keywords: nicotiana tabacum, yunyan 97, leaf, starch, degradation, enterobacter ludwigii

Procedia PDF Downloads 53
24688 Data Clustering in Wireless Sensor Network Implemented on Self-Organization Feature Map (SOFM) Neural Network

Authors: Krishan Kumar, Mohit Mittal, Pramod Kumar

Abstract:

Wireless sensor network is one of the most promising communication networks for monitoring remote environmental areas. In this network, all the sensor nodes are communicated with each other via radio signals. The sensor nodes have capability of sensing, data storage and processing. The sensor nodes collect the information through neighboring nodes to particular node. The data collection and processing is done by data aggregation techniques. For the data aggregation in sensor network, clustering technique is implemented in the sensor network by implementing self-organizing feature map (SOFM) neural network. Some of the sensor nodes are selected as cluster head nodes. The information aggregated to cluster head nodes from non-cluster head nodes and then this information is transferred to base station (or sink nodes). The aim of this paper is to manage the huge amount of data with the help of SOM neural network. Clustered data is selected to transfer to base station instead of whole information aggregated at cluster head nodes. This reduces the battery consumption over the huge data management. The network lifetime is enhanced at a greater extent.

Keywords: artificial neural network, data clustering, self organization feature map, wireless sensor network

Procedia PDF Downloads 514
24687 Review and Comparison of Associative Classification Data Mining Approaches

Authors: Suzan Wedyan

Abstract:

Data mining is one of the main phases in the Knowledge Discovery Database (KDD) which is responsible of finding hidden and useful knowledge from databases. There are many different tasks for data mining including regression, pattern recognition, clustering, classification, and association rule. In recent years a promising data mining approach called associative classification (AC) has been proposed, AC integrates classification and association rule discovery to build classification models (classifiers). This paper surveys and critically compares several AC algorithms with reference of the different procedures are used in each algorithm, such as rule learning, rule sorting, rule pruning, classifier building, and class allocation for test cases.

Keywords: associative classification, classification, data mining, learning, rule ranking, rule pruning, prediction

Procedia PDF Downloads 533
24686 Hierarchical Checkpoint Protocol in Data Grids

Authors: Rahma Souli-Jbali, Minyar Sassi Hidri, Rahma Ben Ayed

Abstract:

Grid of computing nodes has emerged as a representative means of connecting distributed computers or resources scattered all over the world for the purpose of computing and distributed storage. Since fault tolerance becomes complex due to the availability of resources in decentralized grid environment, it can be used in connection with replication in data grids. The objective of our work is to present fault tolerance in data grids with data replication-driven model based on clustering. The performance of the protocol is evaluated with Omnet++ simulator. The computational results show the efficiency of our protocol in terms of recovery time and the number of process in rollbacks.

Keywords: data grids, fault tolerance, clustering, chandy-lamport

Procedia PDF Downloads 335
24685 Production of Lignocellulosic Enzymes by Bacillus safensis LCX Using Agro-Food Wastes in Solid State Fermentation

Authors: Abeer A. Q. Ahmed, Tracey McKay

Abstract:

The increasing demand for renewable fuels and chemicals is pressuring manufacturing industry toward finding more sustainable cost-effective resources. Lignocellulose, such as agro-food wastes, is a suitable equivalent to petroleum for fine chemicals and fuels production. The complex structure of lignocellulose, however, requires a variety of enzymes in order to degrade its components into their respective building blocks that can be used further for the production of various value added products. This study aimed to isolate bacterial strain with the ability to produce a variety of lignocellulosic enzymes. One bacterial isolate was identified by 16S rRNA gene sequencing and phylogenetic analysis as Bacillus safensis LCX found to have CMCase, xylanase, manganese peroxidase, lignin peroxidase, and laccase activities. The enzymes production was induced by growing Bacillus safensis LCX in solid state fermentation using wheat straw, wheat bran, and corn stover. The activities of enzymes were determined by specific colorimetric assays. This study presents Bacillus safensis LCX as a promising source for lignocellulosic enzymes. These findings can extend the knowledge on agro-food wastes valorization strategies toward a sustainable production of fuels and chemicals.

Keywords: Bacillus safensis LCX, high valued chemicals, lignocellulosic enzymes, solid state fermentation

Procedia PDF Downloads 292
24684 An Observation of the Information Technology Research and Development Based on Article Data Mining: A Survey Study on Science Direct

Authors: Muhammet Dursun Kaya, Hasan Asil

Abstract:

One of the most important factors of research and development is the deep insight into the evolutions of scientific development. The state-of-the-art tools and instruments can considerably assist the researchers, and many of the world organizations have become aware of the advantages of data mining for the acquisition of the knowledge required for the unstructured data. This paper was an attempt to review the articles on the information technology published in the past five years with the aid of data mining. A clustering approach was used to study these articles, and the research results revealed that three topics, namely health, innovation, and information systems, have captured the special attention of the researchers.

Keywords: information technology, data mining, scientific development, clustering

Procedia PDF Downloads 275
24683 Security in Resource Constraints: Network Energy Efficient Encryption

Authors: Mona Almansoori, Ahmed Mustafa, Ahmad Elshamy

Abstract:

Wireless nodes in a sensor network gather and process critical information designed to process and communicate, information flooding through such network is critical for decision making and data processing, the integrity of such data is one of the most critical factors in wireless security without compromising the processing and transmission capability of the network. This paper presents mechanism to securely transmit data over a chain of sensor nodes without compromising the throughput of the network utilizing available battery resources available at the sensor node.

Keywords: hybrid protocol, data integrity, lightweight encryption, neighbor based key sharing, sensor node data processing, Z-MAC

Procedia PDF Downloads 142
24682 Data Mining Techniques for Anti-Money Laundering

Authors: M. Sai Veerendra

Abstract:

Today, money laundering (ML) poses a serious threat not only to financial institutions but also to the nation. This criminal activity is becoming more and more sophisticated and seems to have moved from the cliché of drug trafficking to financing terrorism and surely not forgetting personal gain. Most of the financial institutions internationally have been implementing anti-money laundering solutions (AML) to fight investment fraud activities. However, traditional investigative techniques consume numerous man-hours. Recently, data mining approaches have been developed and are considered as well-suited techniques for detecting ML activities. Within the scope of a collaboration project on developing a new data mining solution for AML Units in an international investment bank in Ireland, we survey recent data mining approaches for AML. In this paper, we present not only these approaches but also give an overview on the important factors in building data mining solutions for AML activities.

Keywords: data mining, clustering, money laundering, anti-money laundering solutions

Procedia PDF Downloads 532
24681 De Novo Assembly and Characterization of the Transcriptome from the Fluoroacetate Producing Plant, Dichapetalum Cymosum

Authors: Selisha A. Sooklal, Phelelani Mpangase, Shaun Aron, Karl Rumbold

Abstract:

Organically bound fluorine (C-F bond) is extremely rare in nature. Despite this, the first fluorinated secondary metabolite, fluoroacetate, was isolated from the plant Dichapetalum cymosum (commonly known as Gifblaar). However, the enzyme responsible for fluorination (fluorinase) in Gifblaar was never isolated and very little progress has been achieved in understanding this process in higher plants. Fluorinated compounds have vast applications in the pharmaceutical, agrochemical and fine chemicals industries. Consequently, an enzyme capable of catalysing a C-F bond has great potential as a biocatalyst in the industry considering that the field of fluorination is virtually synthetic. As with any biocatalyst, a range of these enzymes are required. Therefore, it is imperative to expand the exploration for novel fluorinases. This study aimed to gain molecular insights into secondary metabolite biosynthesis in Gifblaar using a high-throughput sequencing-based approach. Mechanical wounding studies were performed using Gifblaar leaf tissue in order to induce expression of the fluorinase. The transcriptome of the wounded and unwounded plant was then sequenced on the Illumina HiSeq platform. A total of 26.4 million short sequence reads were assembled into 77 845 transcripts using Trinity. Overall, 68.6 % of transcripts were annotated with gene identities using public databases (SwissProt, TrEMBL, GO, COG, Pfam, EC) with an E-value threshold of 1E-05. Sequences exhibited the greatest homology to the model plant, Arabidopsis thaliana (27 %). A total of 244 annotated transcripts were found to be differentially expressed between the wounded and unwounded plant. In addition, secondary metabolic pathways present in Gifblaar were successfully reconstructed using Pathway tools. Due to lack of genetic information for plant fluorinases, a transcript failed to be annotated as a fluorinating enzyme. Thus, a local database containing the 5 existing bacterial fluorinases was created. Fifteen transcripts having homology to partial regions of existing fluorinases were found. In efforts to obtain the full coding sequence of the Gifblaar fluorinase, primers were designed targeting the regions of homology and genome walking will be performed to amplify the unknown regions. This is the first genetic data available for Gifblaar. It has provided novel insights into the mechanisms of metabolite biosynthesis and will allow for the discovery of the first eukaryotic fluorinase.

Keywords: biocatalyst, fluorinase, gifblaar, transcriptome

Procedia PDF Downloads 269
24680 Decision Tree Based Scheduling for Flexible Job Shops with Multiple Process Plans

Authors: H.-H. Doh, J.-M. Yu, Y.-J. Kwon, J.-H. Shin, H.-W. Kim, S.-H. Nam, D.-H. Lee

Abstract:

This paper suggests a decision tree based approach for flexible job shop scheduling with multiple process plans, i. e. each job can be processed through alternative operations, each of which can be processed on alternative machines. The main decision variables are: (a) selecting operation/machine pair; and (b) sequencing the jobs assigned to each machine. As an extension of the priority scheduling approach that selects the best priority rule combination after many simulation runs, this study suggests a decision tree based approach in which a decision tree is used to select a priority rule combination adequate for a specific system state and hence the burdens required for developing simulation models and carrying out simulation runs can be eliminated. The decision tree based scheduling approach consists of construction and scheduling modules. In the construction module, a decision tree is constructed using a four-stage algorithm, and in the scheduling module, a priority rule combination is selected using the decision tree. To show the performance of the decision tree based approach suggested in this study, a case study was done on a flexible job shop with reconfigurable manufacturing cells and a conventional job shop, and the results are reported by comparing it with individual priority rule combinations for the objectives of minimizing total flow time and total tardiness.

Keywords: flexible job shop scheduling, decision tree, priority rules, case study

Procedia PDF Downloads 352
24679 Development of New Technology Evaluation Model by Using Patent Information and Customers' Review Data

Authors: Kisik Song, Kyuwoong Kim, Sungjoo Lee

Abstract:

Many global firms and corporations derive new technology and opportunity by identifying vacant technology from patent analysis. However, previous studies failed to focus on technologies that promised continuous growth in industrial fields. Most studies that derive new technology opportunities do not test practical effectiveness. Since previous studies depended on expert judgment, it became costly and time-consuming to evaluate new technologies based on patent analysis. Therefore, research suggests a quantitative and systematic approach to technology evaluation indicators by using patent data to and from customer communities. The first step involves collecting two types of data. The data is used to construct evaluation indicators and apply these indicators to the evaluation of new technologies. This type of data mining allows a new method of technology evaluation and better predictor of how new technologies are adopted.

Keywords: data mining, evaluating new technology, technology opportunity, patent analysis

Procedia PDF Downloads 371
24678 Rapid Detection and Differentiation of Camel Pox, Contagious Ecthyma and Papilloma Viruses in Clinical Samples of Camels Using a Multiplex PCR

Authors: A. I. Khalafalla, K. A. Al-Busada, I. M. El-Sabagh

Abstract:

Pox and pox-like diseases of camels are a group of exanthematous skin conditions that have become increasingly important economically. They may be caused by three distinct viruses: camelpox virus (CMPV), camel contagious ecthyma virus (CCEV) and camel papillomavirus (CAPV). These diseases are difficult to differentiate based on clinical presentation in disease outbreaks. Molecular methods such as PCR targeting species-specific genes have been developed and used to identify CMPV and CCEV, but not simultaneously in a single tube. Recently, multiplex PCR has gained reputation as a convenient diagnostic method with cost- and time–saving benefits. In the present communication, we describe the development, optimization and validation a multiplex PCR assays able to detect simultaneously the genome of the three viruses in one single test allowing for rapid and efficient molecular diagnosis. The assay was developed based on the evaluation and combination of published and new primer sets, and was applied to the detection of 110 tissue samples. The method showed high sensitivity, and the specificity was confirmed by PCR-product sequencing. In conclusion, this rapid, sensitive and specific assay is considered a useful method for identifying three important viruses in specimens from camels and as part of a molecular diagnostic regime.

Keywords: multiplex PCR, diagnosis, pox and pox-like diseases, camels

Procedia PDF Downloads 463
24677 MicroRNA Drivers of Resistance to Androgen Deprivation Therapy in Prostate Cancer

Authors: Philippa Saunders, Claire Fletcher

Abstract:

INTRODUCTION: Prostate cancer is the most prevalent malignancy affecting Western males. It is initially an androgen-dependent disease: androgens bind to the androgen receptor and drive the expression of genes that promote proliferation and evasion of apoptosis. Despite reduced androgen dependence in advanced prostate cancer, androgen receptor signaling remains a key driver of growth. Androgen deprivation therapy (ADT) is, therefore, a first-line treatment approach and works well initially, but resistance inevitably develops. Abiraterone and Enzalutamide are drugs widely used in ADT and are androgen synthesis and androgen receptor signaling inhibitors, respectively. The shortage of other treatment options means acquired resistance to these drugs is a major clinical problem. MicroRNAs (miRs) are important mediators of post-transcriptional gene regulation and show altered expression in cancer. Several have been linked to the development of resistance to ADT. Manipulation of such miRs may be a pathway to breakthrough treatments for advanced prostate cancer. This study aimed to validate ADT resistance-implicated miRs and their clinically relevant targets. MATERIAL AND METHOD: Small RNA-sequencing of Abiraterone- and Enzalutamide-resistant C42 prostate cancer cells identified subsets of miRs dysregulated as compared to parental cells. Real-Time Quantitative Reverse Transcription PCR (qRT-PCR) was used to validate altered expression of candidate ADT resistance-implicated miRs 195-5p, 497-5p and 29a-5p in ADT-resistant and -responsive prostate cancer cell lines, patient-derived xenografts (PDXs) and primary prostate cancer explants. RESULTS AND DISCUSSION: This study suggests a possible role for miR-497-5p in the development of ADT resistance in prostate cancer. MiR-497-5p expression was increased in ADT-resistant versus ADT-responsive prostate cancer cells. Importantly, miR-497-5p expression was also increased in Enzalutamide-treated, castrated (ADT-mimicking) PDXs versus intact PDXs. MiR-195-5p was also elevated in ADT-resistant versus -responsive prostate cancer cells, while there was a drop in miR-29a-5p expression. Candidate clinically relevant targets of miR-497-5p in prostate cancer were identified by mining AGO-PAR-CLIP-seq data sets and may include AVL9 and FZD6. CONCLUSION: In summary, this study identified microRNAs that are implicated in prostate cancer resistance to androgen deprivation therapy and could represent novel therapeutic targets for advanced disease.

Keywords: microRNA, androgen deprivation therapy, Enzalutamide, abiraterone, patient-derived xenograft

Procedia PDF Downloads 137
24676 Anomaly Detection Based on System Log Data

Authors: M. Kamel, A. Hoayek, M. Batton-Hubert

Abstract:

With the increase of network virtualization and the disparity of vendors, the continuous monitoring and detection of anomalies cannot rely on static rules. An advanced analytical methodology is needed to discriminate between ordinary events and unusual anomalies. In this paper, we focus on log data (textual data), which is a crucial source of information for network performance. Then, we introduce an algorithm used as a pipeline to help with the pretreatment of such data, group it into patterns, and dynamically label each pattern as an anomaly or not. Such tools will provide users and experts with continuous real-time logs monitoring capability to detect anomalies and failures in the underlying system that can affect performance. An application of real-world data illustrates the algorithm.

Keywords: logs, anomaly detection, ML, scoring, NLP

Procedia PDF Downloads 91
24675 EnumTree: An Enumerative Biclustering Algorithm for DNA Microarray Data

Authors: Haifa Ben Saber, Mourad Elloumi

Abstract:

In a number of domains, like in DNA microarray data analysis, we need to cluster simultaneously rows (genes) and columns (conditions) of a data matrix to identify groups of constant rows with a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. We introduce a new algorithm called, Enumerative tree (EnumTree) for biclustering of binary microarray data. is an algorithm adopting the approach of enumerating biclusters. This algorithm extracts all biclusters consistent good quality. The main idea of ​​EnumLat is the construction of a new tree structure to represent adequately different biclusters discovered during the process of enumeration. This algorithm adopts the strategy of all biclusters at a time. The performance of the proposed algorithm is assessed using both synthetic and real DNA micryarray data, our algorithm outperforms other biclustering algorithms for binary microarray data. Biclusters with different numbers of rows. Moreover, we test the biological significance using a gene annotation web tool to show that our proposed method is able to produce biologically relevent biclusters.

Keywords: DNA microarray, biclustering, gene expression data, tree, datamining.

Procedia PDF Downloads 369