Search results for: genome mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1397

1367 Algorithms used in Spatial Data Mining GIS

Authors: Vahid Bairami Rad

Abstract:

Extracting knowledge from spatial data such as GIS data is important for reducing the data and extracting information. Therefore, the development of new techniques and tools that support humans in transforming data into useful knowledge has been the focus of the relatively new and interdisciplinary research area ‘knowledge discovery in databases’. We introduce a set of database primitives, or basic operations, for spatial data mining that are sufficient to express most of the spatial data mining algorithms from the literature. This approach has several advantages. As with the relational standard language SQL, the use of standard primitives will speed up the development of new data mining algorithms and will also make them more portable. We introduce a database-oriented framework for spatial data mining based on the concepts of neighborhood graphs and paths. A small set of basic operations on these graphs and paths is defined as database primitives for spatial data mining. Furthermore, techniques to efficiently support the database primitives in a commercial DBMS are presented.

Keywords: spatial data base, knowledge discovery database, data mining, spatial relationship, predictive data mining

Procedia PDF Downloads 430
1366 In silico Comparative Analysis of Chloroplast Genome (cpDNA) and Some Individual Genes (rbcL and trnH-psbA) in Pooideae Subfamily Members

Authors: Ibrahim Ilker Ozyigit, Ertugrul Filiz, Ilhan Dogan

Abstract:

An in silico analysis of Brachypodium distachyon, Triticum aestivum, Festuca arundinacea, Lolium perenne, and Hordeum vulgare subsp. vulgare of the Pooideae was performed based on complete chloroplast genomes, and on the rbcL coding and trnH-psbA intergenic spacer regions alone, to compare phylogenetic resolving power. Neighbor-joining, Minimum Evolution, and Unweighted Pair Group Method with Arithmetic Mean methods were used to reconstruct phylogenies; the highest bootstrap support was obtained with the whole chloroplast genome sequences. The highest and lowest values from nucleotide diversity (π) analysis were found to be 0.315813 and 0.043495 in the rbcL coding region of the chloroplast genome and the complete chloroplast genome, respectively. The highest transition/transversion bias (R) value was recorded as 1.384 in complete chloroplast genomes. An F. arundinacea-L. perenne clade was uncovered in all phylogenies. Sequences of the rbcL and trnH-psbA regions were not able to resolve the Pooideae phylogenies due to lack of genetic variation.
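As a rough illustration of the nucleotide diversity (π) statistic reported above, the following Python sketch computes π as the mean proportion of differing sites over all pairs of aligned sequences, and counts raw transitions and transversions between two sequences. It is a minimal sketch under stated assumptions: the sequences are already aligned, have equal length, and contain no gaps; the function names and the toy sequences are hypothetical and not taken from the study. The raw counts are not the model-based bias R quoted in the abstract.

```python
from itertools import combinations

# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all sequence pairs (pi).

    Assumes pre-aligned, gap-free sequences of equal length.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    total = 0.0
    for a, b in pairs:
        diffs = sum(x != y for x, y in zip(a, b))
        total += diffs / length
    return total / len(pairs)


def count_ts_tv(a, b):
    """Return (transitions, transversions) between two aligned sequences."""
    ts = tv = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        if frozenset((x, y)) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts, tv


# Toy alignment (hypothetical data, for illustration only).
toy = ["ACGT", "ACGA", "ACGT"]
print(nucleotide_diversity(toy))
print(count_ts_tv("AACC", "GTTA"))
```

Real analyses would also need to handle gaps and ambiguous bases, which this sketch ignores.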

Keywords: chloroplast DNA, Pooideae, phylogenetic analysis, rbcL, trnH-psbA

Procedia PDF Downloads 353
1365 Data Mining Practices: Practical Studies on the Telecommunication Companies in Jordan

Authors: Dina Ahmad Alkhodary

Abstract:

This study aimed to investigate the practices of Data Mining in the telecommunication companies in Jordan, from the viewpoint of the respondents. In order to achieve the goal of the study and test the validity of the hypotheses, the researcher designed a questionnaire to collect data from managers and staff members of the main departments in the researched companies. The results show stages of improvement in the telecommunication companies' progress toward Data Mining.

Keywords: data, mining, development, business

Procedia PDF Downloads 471
1364 Healthcare Data Mining Innovations

Authors: Eugenia Jilinguirian

Abstract:

In the healthcare industry, data mining is essential since it transforms the field by extracting useful information from large datasets. Data mining is the process of applying advanced analytical methods to large patient records and medical histories in order to identify patterns, correlations, and trends. Healthcare professionals can improve diagnosis accuracy, uncover hidden linkages, and predict disease outcomes by carefully examining these data. Additionally, data mining supports personalized medicine by tailoring treatment to the unique attributes of each patient. This proactive strategy helps allocate resources more efficiently, enhances patient care, and streamlines operations. However, to apply data mining effectively and ensure the proper use of private healthcare information, issues like data privacy and security must be carefully considered. Data mining continues to be vital in the search for more effective, efficient, and individualized healthcare solutions as technology evolves.

Keywords: data mining, healthcare, big data, individualised healthcare, healthcare solutions, database

Procedia PDF Downloads 42
1363 Frequent Itemset Mining Using Rough-Sets

Authors: Usman Qamar, Younus Javed

Abstract:

Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data, for example, which products are often purchased together. Its applications include basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that, as the data increase, the amount of time and resources required to mine the data increases at an exponential rate. In this investigation, a new algorithm is proposed which can be used as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm which utilizes entropy and rough sets to carry out record reduction and feature (attribute) selection, respectively. FASTER for frequent itemset mining can produce a speed-up of 3.1 times compared to the original algorithm while maintaining an accuracy of 71%.
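To make the basket-analysis question concrete, the sketch below counts the support of every itemset up to a given size by plain enumeration and keeps those that meet a minimum support threshold. It is not the FASTER pre-processor described above, only a baseline illustration of frequent itemset counting; the function name, the `max_size` parameter, and the toy baskets are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset (sorted tuple): support} for itemsets meeting min_support.

    Brute-force enumeration: every subset of each transaction up to max_size
    is counted, which is why cost grows quickly with transaction width.
    """
    n = len(transactions)
    if n == 0:
        return {}
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # de-duplicate and fix item order
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {
        itemset: count / n
        for itemset, count in counts.items()
        if count / n >= min_support
    }


# Toy baskets (hypothetical data, for illustration only).
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
for itemset, support in sorted(frequent_itemsets(baskets, 0.5).items()):
    print(itemset, support)
```

The number of candidate subsets grows combinatorially with the number of distinct items per transaction, which is the bottleneck the abstract refers to and the reason a reduction step before mining can pay off.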

Keywords: rough-sets, classification, feature selection, entropy, outliers, frequent itemset mining

Procedia PDF Downloads 410
1362 A Modular Framework for Enabling Analysis for Educators with Different Levels of Data Mining Skills

Authors: Kyle De Freitas, Margaret Bernard

Abstract:

Enabling data mining analysis among a wider audience of educators is an active area of research within the educational data mining (EDM) community. The paper proposes a framework for developing an environment that caters for educators who have few technical data mining skills as well as for more advanced users with some data mining expertise. This framework architecture was developed through a review of the strengths and weaknesses of existing models in the literature. The proposed framework provides a modular architecture that allows future researchers to focus on the development of specific areas within the EDM process. Finally, the paper highlights a strategy of enabling analysis through either predefined questions or a guided data mining process, and shows how the developed questions and the analysis conducted can be reused and extended over time.

Keywords: educational data mining, learning management system, learning analytics, EDM framework

Procedia PDF Downloads 300
1361 Societal Acceptability Conditions of Genome Editing for Upland Rice in Madagascar

Authors: Anny Lucrece Nlend Nkott, Ludovic Temple

Abstract:

The appearance in 2012 of the CRISPR-Cas9 genome editing technique marks a turning point in the field of genetics. This technique would make it possible to create new varieties quickly and cheaply. Although some consider CRISPR-Cas9 to be revolutionary, others consider it a potential societal threat. To document the controversy, we explain the socioeconomic conditions under which this technique could be accepted for the creation of a rainfed rice variety in Madagascar. The methodological framework is based on 38 individual and semistructured interviews, a multistakeholder forum with 27 participants, and a survey of 148 rice producers. Results reveal that the acceptability of genome editing requires (i) strengthening the seed system through the operationalization of regulatory structures and the upgrading of stakeholders' knowledge of genetically modified organisms, (ii) assessing the effects of the edited variety on biodiversity and soil nitrogen dynamics, and (iii) strengthening the technical and human capacities of the biosafety body. Structural mechanisms for regulating the seed system are necessary to ensure safe experimentation with genome editing techniques. Organizational innovation also appears to be necessary. The study documents how collective learning between communities of scientists and nonscientists is a component of systemic processes of varietal innovation. This study was carried out with the financial support of the GENERICE project (Generation and Deployment of Genome-Edited, Nitrogen-use-Efficient Rice Varieties), funded by the Agropolis Foundation.

Keywords: CRISPR-Cas9, varietal innovation, seed system, innovation system

Procedia PDF Downloads 124
1360 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh

Abstract:

Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) data are among the important large genomic, non-coding datasets representing genome sequences. In this paper, a hybrid method for the classification of miRNA data is proposed. Due to the variety of cancers and the high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features relative to the number of samples is high, and the data are imbalanced. A feature selection method is used to select the features best able to distinguish classes and to eliminate obscure features. Afterward, a Convolutional Neural Network (CNN) classifier is used to classify cancer types, with a Genetic Algorithm employed to find optimized hyper-parameters for the CNN. To make classification by the CNN faster, a Graphics Processing Unit (GPU) is recommended for carrying out the mathematical computations in parallel. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.

Keywords: cancer classification, feature selection, deep learning, genetic algorithm

Procedia PDF Downloads 91
1359 Assessment of Prevalent Diseases Caused by Mining Activities in the Northern Part of Mindanao Island, Philippines

Authors: Odinah Cuartero-Enteria, Kyla Rita Mercado, Jason Salamanes, Aian Pecasales, Sherwin Sabado

Abstract:

The northern part of Mindanao Island, Philippines, has sizable reserves of mineral resources. Mining activities have flourished there for years, bringing local economic gains but also environmental concerns. This study investigates the prevalent diseases associated with mining activities in these areas. The study used secondary data gathered from the Rural Health Units (RHU) of the selected areas. It determined the prevalent diseases in the three areas in the years 2005, 2010 and 2015, covering the period before the mining activities and the period when mining activities were present. The results show that areas far from mining activities have fewer cases of patients suffering from air-borne diseases. The ten most common diseases, including pneumonia, tuberculosis, influenza, upper respiratory tract infection (URTI) and skin diseases, were attributed to air-borne causes linked to air pollution. Hence, mining activities in an area contribute to its prevalent diseases. Thus, addressing the air pollution caused by mining activities is very important.

Keywords: Philippines, Mindanao Island, mining activities, pollution, prevalent diseases

Procedia PDF Downloads 448
1358 Analysis of Endogenous Sirevirus in Germinating Barley (Hordeum vulgare L.)

Authors: Nermin Gozukirmizi, Buket Cakmak, Sevgi Marakli

Abstract:

Sireviruses are a genus of copia LTR retrotransposons with a unique genome structure among retrotransposons. Barley (Hordeum vulgare L.) is an economically important plant and has been studied as a model plant owing to its short annual life cycle and seven chromosome pairs. In this study, we used mature barley embryos, 10-day-old roots and 10-day-old leaves derived from the same barley plant to investigate SIRE1 retrotransposon movements by the Inter-Retrotransposon Amplified Polymorphism (IRAP) technique. We found polymorphism rates between 0-64% among embryos, roots and leaves. Polymorphism rates were 0-27% among embryos, 8-60% among roots, and 11-50% among leaves. Polymorphisms were observed not only among the parts of different individuals, but also among the parts of the same plant (23-64%). The internal domains of SIRE1 (gag, env and rt) were also analyzed in the embryos, roots and leaves. Analysis of band profiles showed no polymorphism for gag; however, different band patterns were observed among samples for rt and env. Sequencing of the SIRE1 gag, env and rt domains revealed 79% similarity for gag, 95% for env and 84% for rt to Ty1-copia retrotransposons. The SIRE1 retrotransposon was identified in the soybean genome and has been studied in other plants (maize, rice, tomato, etc.). This study is the first detailed investigation of SIRE1 in the barley genome. The findings are expected to contribute to the understanding of the SIRE1 retrotransposon and its role in the barley genome.

Keywords: barley, polymorphism, retrotransposon, SIRE1 virus

Procedia PDF Downloads 283
1357 CRISPR-DT: Designing gRNAs for the CRISPR-Cpf1 System with Improved Target Efficiency and Specificity

Authors: Houxiang Zhu, Chun Liang

Abstract:

The CRISPR-Cpf1 system has been successfully applied in genome editing. However, the target efficiency of the CRISPR-Cpf1 system varies among different gRNA sequences. Published CRISPR-Cpf1 gRNA data were reanalyzed. Many sequence and structural features of gRNAs (e.g., the position-specific nucleotide composition, position-nonspecific nucleotide composition, GC content, minimum free energy, and melting temperature) were found to correlate with target efficiency. Using machine learning, a support vector machine (SVM) model was created to predict target efficiency for any given gRNA. The first web service application, CRISPR-DT (CRISPR DNA Targeting), has been developed to help users design optimal gRNAs for the CRISPR-Cpf1 system by considering both target efficiency and specificity. CRISPR-DT will empower researchers in genome editing.
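Some of the sequence features listed above are simple to compute directly from a gRNA string. The sketch below derives GC content, position-nonspecific nucleotide composition, and a position-specific one-hot encoding that could serve as input to a classifier. It is an illustrative sketch only: the function names, the feature layout, and the example sequence are assumptions, not the feature set or code used by CRISPR-DT, and structural features such as minimum free energy or melting temperature are left out because they need dedicated tools.

```python
NUCLEOTIDES = "ACGT"


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def nucleotide_composition(seq):
    """Position-nonspecific composition: fraction of each base, in ACGT order."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(base) / len(seq) for base in NUCLEOTIDES]


def one_hot_positions(seq):
    """Position-specific composition: 4 indicator values per position (ACGT)."""
    features = []
    for base in seq.upper():
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def grna_features(seq):
    """Concatenate the three feature groups into one flat vector."""
    return [gc_content(seq)] + nucleotide_composition(seq) + one_hot_positions(seq)


# Hypothetical guide sequence, for illustration only.
example = "ATGCATGCATGCATGCATGC"
print(len(grna_features(example)), gc_content(example))
```

A vector like this, computed for every guide in a labeled dataset, is the kind of numeric input an SVM or other classifier expects.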

Keywords: CRISPR-Cpf1, genome editing, target efficiency, target specificity

Procedia PDF Downloads 238
1356 DeepOmics: Deep Learning for Understanding Genome Functioning and the Underlying Genetic Causes of Disease

Authors: Vishnu Pratap Singh Kirar, Madhuri Saxena

Abstract:

Advances in sequence data generation technologies are churning out voluminous omics data and posing a massive challenge for annotating biological functional features. With so much data available, the use of machine learning methods and tools to make novel inferences has become an obvious choice. Machine learning methods have been successfully applied in many disciplines, including computational biology and bioinformatics. Researchers in computational biology are interested in developing novel machine learning frameworks to classify the huge amounts of biological data. In this proposal, we plan to employ novel machine learning approaches to aid the understanding of how apparently innocuous mutations (in intergenic DNA and at synonymous sites) cause diseases. We are also interested in discovering novel functional sites in the genome, mutations in which can affect a phenotype of interest.

Keywords: genome wide association studies (GWAS), next generation sequencing (NGS), deep learning, omics

Procedia PDF Downloads 69
1355 Cloud Computing in Data Mining: A Technical Survey

Authors: Ghaemi Reza, Abdollahi Hamid, Dashti Elham

Abstract:

Cloud computing poses a variety of challenges for data mining operations, arising from the dynamic structure of data distribution as opposed to the typical database scenarios of conventional architectures. Because an immense number of users seek data on a daily basis, there are serious security concerns for cloud providers as well as for data providers who put their data in the cloud computing environment. Big data analytics uses compute-intensive data mining algorithms and tools (hidden Markov models, MapReduce parallel programming, the Mahout project, the Hadoop distributed file system, K-Means and K-Medoids, Apriori) that require efficient high-performance processors to produce timely results. Data mining algorithms are used to solve for or optimize the model parameters. The challenge such an operation faces is to establish successful transactions with the existing virtual machine environment and to keep the databases under control. Several factors have led from normal, centralized mining to distributed data mining. The approach is offered as SaaS and uses multi-agent systems to implement the different tasks of the system. Data mining based on cloud computing still has some open problems, including the design and selection of data mining algorithms.

Keywords: cloud computing, data mining, computing models, cloud services

Procedia PDF Downloads 453
1354 Mining Diagnostic Investigation Process

Authors: Sohail Imran, Tariq Mahmood

Abstract:

In the complex healthcare diagnostic investigation process, medical practitioners have to focus on ways to standardize their processes in order to deliver high-quality care and optimize time and costs. Process mining techniques can be applied to extract process-related knowledge from data without considering causal and dynamic dependencies in the business domain and processes. The application of process mining is effective in diagnostic investigation. It is very helpful where a treatment gives no dispositive evidence favoring it. In this paper, we applied process mining to discover the important process flow of diagnostic investigation for hepatitis patients. This approach has benefits that can enhance the quality and efficiency of diagnostic investigation processes.

Keywords: process mining, healthcare, diagnostic investigation process, process flow

Procedia PDF Downloads 496
1353 Analysis of Reliability of Mining Shovel Using Weibull Model

Authors: Anurag Savarnya

Abstract:

The reliability of the various parts of an electric mining shovel has been assessed through the application of the Weibull model. The study was initiated to find the reliability of the components of the electric mining shovel. The paper aims to optimize the reliability of the components and increase their life cycle. A multilevel decomposition of the electric mining shovel was done, maintenance records were used to evaluate the failure data, and appropriate system characterization was done to model the system in terms of a reasonable number of components. The approach develops a mathematical model to assess the reliability of the electric mining shovel components. The model can be used to predict reliability of components of the hydraulic mining shovel and system performance. Reliability is an inherent attribute of a system. When the life-cycle costs of a system are being analyzed, reliability plays an important role as a major driver of these costs and has considerable influence on system performance. It is an iterative process that begins with the specification of reliability goals consistent with cost and performance objectives. The data were collected from an Indian open cast coal mine, and the reliability of various components of the electric mining shovel has been assessed by following a Weibull model.
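For readers unfamiliar with the model, the two-parameter Weibull distribution gives the reliability (probability of surviving past time t) as R(t) = exp(-(t/η)^β), where β is the shape parameter and η the scale parameter. The sketch below evaluates reliability, hazard rate, and mean time to failure for given parameters. The parameter values are hypothetical and are not taken from the study; fitting β and η to maintenance records is a separate step not shown here.

```python
import math


def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t / scale) ** shape): probability of no failure before t."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_mttf(shape, scale):
    """Mean time to failure: scale * Gamma(1 + 1 / shape)."""
    return scale * math.gamma(1 + 1 / shape)


# Hypothetical parameters for one component (hours); not values from the paper.
beta, eta = 1.5, 1000.0
for hours in (100, 500, 1000):
    print(hours, round(weibull_reliability(hours, beta, eta), 4))
print("MTTF:", round(weibull_mttf(beta, eta), 1))
```

A shape parameter above 1 indicates a failure rate that rises with age (wear-out), below 1 a falling rate (early failures), and exactly 1 reduces the model to the exponential distribution with a constant failure rate.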

Keywords: reliability, Weibull model, electric mining shovel

Procedia PDF Downloads 479
1352 An Adaptive Distributed Incremental Association Rule Mining System

Authors: Adewale O. Ogunde, Olusegun Folorunso, Adesina S. Sodiya

Abstract:

Most existing Distributed Association Rule Mining (DARM) systems still face several challenges. One such challenge that has not received the attention of many researchers is the inability of existing systems to adapt to constantly changing databases and mining environments. In this work, an Adaptive Incremental Mining Algorithm (AIMA) is therefore proposed to address this problem. AIMA employed multiple mobile agents for the entire mining process. AIMA was designed to adapt to changes in the distributed databases by mining only the incremental database updates and using these to update the existing rules in order to improve the overall response time of the DARM system. In AIMA, global association rules were integrated incrementally from one data site to another through Results Integration Coordinating Agents. The mining agents in AIMA were made adaptive by defining mining goals with reasoning and behavioral capabilities and protocols that enabled them to either maintain or change their goals. AIMA employed the Java Agent Development Environment Extension for designing the internal agents’ architecture. Results from experiments conducted on real datasets showed that the adaptive system, AIMA, performed better than the non-adaptive systems, with lower communication costs and higher task completion rates.

Keywords: adaptivity, data mining, distributed association rule mining, incremental mining, mobile agents

Procedia PDF Downloads 370
1351 Data Stream Association Rule Mining with Cloud Computing

Authors: B. Suraj Aravind, M. H. M. Krishna Prasad

Abstract:

There are emerging applications of data streams that require association rule mining, such as network traffic monitoring, web click stream analysis, sensor data, and data from satellites. Data streams typically arrive continuously at high speed, in huge volumes, and with changing data distributions. This raises new issues that need to be considered when developing association rule mining techniques for stream data. This paper proposes an improved data stream association rule mining algorithm that eliminates the limitation of resources. For this, the concept of cloud computing is used. Its inclusion may lead to additional unknown problems which need further research.

Keywords: data stream, association rule mining, cloud computing, frequent itemsets

Procedia PDF Downloads 474
1350 From Genome to Field: Applying Genome Wide Association Study for Sustainable Ascochyta Blight Management in Faba Beans

Authors: Rabia Faridi, Rizwana Maqbool, Umara Sahar Rana, Zaheer Ahmad

Abstract:

Climate change impacts agriculture, notably in Germany, where spring faba beans predominate. However, improved winter hardiness aligns with milder winters, enabling autumn-sown varieties. Genetic resistance to Ascochyta blight is vital for crop integration. Traditional breeding faces challenges due to complex inheritance. This study assessed 224 homozygous faba bean lines for Ascochyta resistance traits. To achieve h²>70%, 12 replicates were required (realized h²=87%). Genetic variation and strong trait correlations were observed. Five lines outperformed 29H, while three were highly susceptible. A genome-wide association study (GWAS) with 188 inbred lines and 2058 markers, including 17 guide SNP markers, identified 12 markers associated with resistance traits, potentially indicating new resistance genes. One guide marker (Vf-Mt1g014230-001) on chromosome III validated a known QTL. The guided marker approach complemented GWAS, facilitating marker-assisted selection for Ascochyta resistance. The Göttingen Winter Bean Population offers promise for resistance breeding.

Keywords: genome wide association studies, marker assisted breeding, faba bean, ascochyta blight

Procedia PDF Downloads 35
1349 Data Mining As A Tool For Knowledge Management: A Review

Authors: Maram Saleh

Abstract:

Knowledge has become an essential resource in today’s economy and the most important asset for maintaining competitive advantage in organizations. The importance of knowledge has led organizations to manage their knowledge assets and resources through multiple knowledge management stages, such as knowledge creation, knowledge storage, knowledge sharing and knowledge use. Research on data mining has grown continuously over recent years in both business and educational fields. Data mining is one of the most important steps of the knowledge discovery in databases process, aiming to extract implicit, unknown but useful knowledge, and it is considered a significant subfield of knowledge management. Data mining has great potential to help organizations focus on extracting the most important information from their data warehouses. Data mining tools and techniques can predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. This review paper explores the applications of data mining techniques in supporting the knowledge management process as an effective knowledge discovery technique. In this paper, we identify the relationship between data mining and knowledge management, and then focus on introducing some applications of data mining techniques in knowledge management for some real-life domains.

Keywords: Data Mining, Knowledge management, Knowledge discovery, Knowledge creation.

Procedia PDF Downloads 182
1348 Indexing and Incremental Approach Using Map Reduce Bipartite Graph (MRBG) for Mining Evolving Big Data

Authors: Adarsh Shroff

Abstract:

Big data is a collection of datasets so large and complex that it becomes difficult to process using database management tools. Operations such as search, analysis, and visualization are performed on big data using data mining, which is the process of extracting patterns or knowledge from large data sets. As data evolve, the results of data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. This project uses i2MapReduce, an incremental processing extension to MapReduce, the most widely used framework for mining big data. i2MapReduce performs key-value pair level incremental processing rather than task level re-computation, supports not only one-step computation but also more sophisticated iterative computation, which is widely used in data mining applications, and incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. i2MapReduce is evaluated using a one-step algorithm and three iterative algorithms with diverse computation characteristics.
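The idea of refreshing results from saved state instead of recomputing from scratch can be shown with a toy word count in plain Python. The sketch keeps the per-key counts from a previous run and applies only the added and removed records to them. It does not use i2MapReduce or any Hadoop API; all function names and the sample records are hypothetical, and it covers only the simple one-step case, not iterative computation.

```python
from collections import Counter


def map_record(record):
    """Map step: emit a count of 1 per word occurrence in one record."""
    return Counter(record.split())


def full_compute(records):
    """Baseline: recompute every key-value pair from scratch."""
    state = Counter()
    for record in records:
        state.update(map_record(record))
    return state


def incremental_update(state, added=(), removed=()):
    """Refresh a saved result by touching only the keys in the delta."""
    new_state = Counter(state)  # copy so the preserved state stays intact
    for record in added:
        new_state.update(map_record(record))
    for record in removed:
        new_state.subtract(map_record(record))
    # Drop keys whose count fell to zero so the result matches a fresh run.
    return Counter({key: n for key, n in new_state.items() if n > 0})


# Hypothetical input records, for illustration only.
old_records = ["big data mining", "data mining results"]
saved = full_compute(old_records)
refreshed = incremental_update(
    saved, added=["mining evolving data"], removed=["big data mining"]
)
print(refreshed == full_compute(["data mining results", "mining evolving data"]))
```

The work done by `incremental_update` is proportional to the size of the change rather than the size of the whole dataset, which is the benefit incremental processing aims for.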

Keywords: big data, map reduce, incremental processing, iterative computation

Procedia PDF Downloads 322
1347 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad

Abstract:

Nowadays, given the rapid growth of data, methods such as data mining for extracting knowledge are unavoidable. One issue in data mining is the inherent distribution of the data: the databases that create or receive such data usually belong to corporate or non-corporate persons who do not give their information freely to others. Yet there is no guarantee that someone can mine particular data without intruding on the owner’s privacy. How data are sent and then gathered under vertical or horizontal partitioning depends on the type of preservation used and is carried out so as to improve data privacy. This study attempts a comprehensive comparison of data-preserving methods; general methods such as random data and coding are examined, along with the strong and weak points of each.

Keywords: data mining, distributed data mining, privacy protection, privacy preserving

Procedia PDF Downloads 493
1346 Review and Comparison of Associative Classification Data Mining Approaches

Authors: Suzan Wedyan

Abstract:

Data mining is one of the main phases of the Knowledge Discovery in Databases (KDD) process and is responsible for finding hidden and useful knowledge in databases. There are many different data mining tasks, including regression, pattern recognition, clustering, classification, and association rule discovery. In recent years, a promising data mining approach called associative classification (AC) has been proposed. AC integrates classification and association rule discovery to build classification models (classifiers). This paper surveys and critically compares several AC algorithms with reference to the different procedures used in each algorithm, such as rule learning, rule sorting, rule pruning, classifier building, and class allocation for test cases.

Keywords: associative classification, classification, data mining, learning, rule ranking, rule pruning, prediction

Procedia PDF Downloads 510
1345 Data Mining Techniques for Anti-Money Laundering

Authors: M. Sai Veerendra

Abstract:

Today, money laundering (ML) poses a serious threat not only to financial institutions but also to the nation. This criminal activity is becoming more and more sophisticated and seems to have moved from the cliché of drug trafficking to financing terrorism and surely not forgetting personal gain. Most of the financial institutions internationally have been implementing anti-money laundering solutions (AML) to fight investment fraud activities. However, traditional investigative techniques consume numerous man-hours. Recently, data mining approaches have been developed and are considered as well-suited techniques for detecting ML activities. Within the scope of a collaboration project on developing a new data mining solution for AML Units in an international investment bank in Ireland, we survey recent data mining approaches for AML. In this paper, we present not only these approaches but also give an overview on the important factors in building data mining solutions for AML activities.

Keywords: data mining, clustering, money laundering, anti-money laundering solutions

Procedia PDF Downloads 516
1344 Development of a Geomechanical Risk Assessment Model for Underground Openings

Authors: Ali Mortazavi

Abstract:

The main objective of this research project is to delve into a multitude of geomechanical risks associated with various mining methods employed within the underground mining industry. Controlling geotechnical design parameters and operational factors affecting the selection of suitable mining techniques for a given underground mining condition will be considered from a risk assessment point of view. Important geomechanical challenges will be investigated as appropriate and relevant to the commonly used underground mining methods. Given the complicated nature of rock mass in-situ and complicated boundary conditions and operational complexities associated with various underground mining methods, the selection of a safe and economic mining operation is of paramount significance. Rock failure at varying scales within the underground mining openings is always a threat to mining operations and causes human and capital losses worldwide. Geotechnical design is a major design component of all underground mines and basically dominates the safety of an underground mine. With regard to uncertainties that exist in rock characterization prior to mine development, there are always risks associated with inappropriate design as a function of mining conditions and the selected mining method. Uncertainty often results from the inherent variability of rock mass, which in turn is a function of both geological materials and rock mass in-situ conditions. The focus of this research is on developing a methodology which enables a geomechanical risk assessment of given underground mining conditions. The outcome of this research is a geotechnical risk analysis algorithm, which can be used as an aid in selecting the appropriate mining method as a function of mine design parameters (e.g., rock in-situ properties, design method, governing boundary conditions such as in-situ stress and groundwater, etc.).

Keywords: geomechanical risk assessment, rock mechanics, underground mining, rock engineering

Procedia PDF Downloads 120
1343 Genome Sequencing of the Yeast Saccharomyces cerevisiae Strain 202-3

Authors: Yina A. Cifuentes Triana, Andrés M. Pinzón Velásco, Marío E. Velásquez Lozano

Abstract:

This work presents the sequencing and genome characterization of a natural isolate of the yeast Saccharomyces cerevisiae (strain 202-3), identified as having potential for the production of second-generation ethanol from sugarcane bagasse hydrolysates. This strain was selected because of its capability to consume xylose during the fermentation of sugarcane bagasse hydrolysates, given that many strains of S. cerevisiae are incapable of processing this sugar. This advantage, together with other favorable aspects of the fermentation profiles evaluated in bagasse hydrolysates, made strain 202-3 a candidate for improving the production of second-generation ethanol, and studying the strain at the genomic level was proposed as a first step. The molecular characterization was carried out by paired-end genome sequencing on the Illumina HiSeq 2000 platform; the assembly was performed with different programs, and the ABySS assembler with a k-mer size of 89 was finally chosen. Gene prediction was performed with Augustus, which uses hidden Markov models. The identified genes were scored based on similarity to public nucleotide and protein databases. Records were organized by ontological function at different hierarchical levels, which identified central metabolic functions and roles of S. cerevisiae strain 202-3 and highlighted the presence of four possible new proteins, two of them probably associated with xylose consumption.

Keywords: cellulosic ethanol, Saccharomyces cerevisiae, genome sequencing, xylose consumption

Procedia PDF Downloads 300
1342 Mining in Nigeria and Development Effort of Metallurgical Technologies at National Metallurgical Development Center Jos, Plateau State-Nigeria

Authors: Linus O. Asuquo

Abstract:

This paper addresses mining in Nigeria and the effort to develop metallurgical technologies at the National Metallurgical Development Center (NMDC), Jos. It reviews the history of mining in Nigeria, the impact of mining on social and industrial development, and the contribution of the mining sector to Nigeria’s Gross Domestic Product (GDP). The paper states that Nigeria’s mining sector contributes only 0.5% to the nation’s GDP, unlike Botswana, where the mining sector contributes 38% to the nation’s GDP. The Nigeria Bureau of Statistics has it on record that Nigeria has about 44 solid minerals waiting to be exploited. The paper highlights the abundant potential for investment that exists in the mining sector. It describes the extensive efforts made at NMDC to develop metallurgical technologies in various areas of the metals sector, such as mineral processing, foundry development, nonferrous metals extraction, materials testing, lime calcination, ANO (trade name for a powder lubricant) wire drawing lubricant, refractories, and many others. The paper concludes that there is a need to develop the mining sector in Nigeria and to give sustainable support to the efforts currently made at NMDC to develop metallurgical technologies that are capable of transforming the metals sector in Nigeria and leading to industrialization. Finally, the paper makes recommendations spanning the topic.

Keywords: mining, minerals, technologies, value addition

Procedia PDF Downloads 73
1341 Association Rules Mining and NOSQL Oriented Document in Big Data

Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub

Abstract:

Big Data refers to the recent technology for manipulating voluminous and unstructured data sets drawn from multiple sources. NoSQL emerged to handle the problem of unstructured data. Association rules mining is one of the popular data mining techniques for extracting hidden relationships from transactional databases. The problem of finding association dependencies is well solved with MapReduce. The goal of our work is to reduce the time needed to generate frequent itemsets by using MapReduce and a document-oriented NoSQL database. A comparative study is given to evaluate the performance of our algorithm against the classical Apriori algorithm.
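As an illustration of the general idea only, the following is a minimal single-process sketch of Apriori-style frequent itemset generation written as a map phase and a reduce phase. It is not the authors' algorithm: the function names (`map_phase`, `reduce_phase`, `frequent_itemsets`), the in-memory transaction list, and the absolute `min_support` threshold are all assumptions made for this example, and no Hadoop or MongoDB calls are involved.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, k, candidates=None):
    """Emit (itemset, 1) for every size-k itemset found in one transaction.

    If a candidate set is given, only candidate itemsets are emitted.
    """
    items = sorted(set(transaction))
    for combo in combinations(items, k):
        if candidates is None or combo in candidates:
            yield combo, 1


def reduce_phase(pairs):
    """Sum the emitted counts per itemset (the support count)."""
    counts = Counter()
    for itemset, value in pairs:
        counts[itemset] += value
    return counts


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    result = {}
    k = 1
    candidates = None  # hypothetical convention: None means "count everything"
    while True:
        pairs = (p for t in transactions for p in map_phase(t, k, candidates))
        counts = reduce_phase(pairs)
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        if not frequent:
            break
        result.update(frequent)
        # Apriori property: a (k+1)-itemset can only be frequent if all of
        # its k-item subsets are frequent, so prune candidates accordingly.
        items = sorted({item for itemset in frequent for item in itemset})
        k += 1
        candidates = {
            combo
            for combo in combinations(items, k)
            if all(sub in frequent for sub in combinations(combo, k - 1))
        }
        if not candidates:
            break
    return result


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "butter", "milk"],
        ["butter", "milk"],
        ["bread", "butter"],
    ]
    for itemset, support in sorted(frequent_itemsets(baskets, 2).items()):
        print(itemset, support)
```

In a real MapReduce deployment the map phase would run on separate nodes over partitions of the transactions and the reduce phase would aggregate the emitted pairs by key; here both run in one process to keep the sketch self-contained.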

Keywords: Apriori, Association rules mining, Big Data, Data Mining, Hadoop, MapReduce, MongoDB, NoSQL

Procedia PDF Downloads 138
1340 Towards a Distributed Computation Platform Tailored for Educational Process Discovery and Analysis

Authors: Awatef Hicheur Cairns, Billel Gueni, Hind Hafdi, Christian Joubert, Nasser Khelifa

Abstract:

Given the ever-changing needs of the job markets, education and training centers are increasingly held accountable for student success. Therefore, education and training centers have to focus on ways to streamline their offers and educational processes in order to achieve the highest level of quality in curriculum contents and managerial decisions. Educational process mining is an emerging field in the educational data mining (EDM) discipline, concerned with developing methods to discover, analyze and provide a visual representation of complete educational processes. In this paper, we present our distributed computation platform, which allows different education centers and institutions to load their data and access advanced data mining and process mining services. To achieve this, we also present a comparative study of the different clustering techniques developed in the context of process mining to efficiently partition educational traces. Our goal is to find the best strategy for distributing heavy analysis computations across the many processing nodes of our platform.

Keywords: educational process mining, distributed process mining, clustering, distributed platform, educational data mining, ProM

Procedia PDF Downloads 429
1339 Systematic Identification of Noncoding Cancer Driver Somatic Mutations

Authors: Zohar Manber, Ran Elkon

Abstract:

Accumulation of somatic mutations (SMs) in the genome is a major driving force of cancer development. Most SMs in the tumor's genome are functionally neutral; however, some cause damage to critical processes and provide the tumor with a selective growth advantage (termed cancer driver mutations). Current research on the functional significance of SMs is mainly focused on finding alterations in protein-coding sequences. However, the exome comprises only 3% of the human genome, and thus, SMs in the noncoding genome significantly outnumber those that map to protein-coding regions. Although our understanding of noncoding driver SMs is very rudimentary, it is likely that disruption of regulatory elements in the genome is an important, yet largely underexplored, mechanism by which somatic mutations contribute to cancer development. The expression of most human genes is controlled by multiple enhancers, and therefore, it is conceivable that regulatory SMs are distributed across different enhancers of the same target gene. Yet, to date, most statistical searches for regulatory SMs have considered each regulatory element individually, which may reduce statistical power. The first challenge in considering the cumulative activity of all the enhancers of a gene as a single unit is to map enhancers to their target promoters. Such mapping defines for each gene its set of regulating enhancers (termed "set of regulatory elements" (SRE)). Considering multiple enhancers of each gene as one unit holds great promise for enhancing the identification of driver regulatory SMs. However, the success of this approach is greatly dependent on the availability of comprehensive and accurate enhancer-promoter (E-P) maps. To date, the discovery of driver regulatory SMs has been hindered by insufficient sample sizes and statistical analyses that often considered each regulatory element separately.
In this study, we analyzed more than 2,500 whole-genome sequence (WGS) samples provided by The Cancer Genome Atlas (TCGA) and The International Cancer Genome Consortium (ICGC) in order to identify such driver regulatory SMs. Our analyses took into account the combinatorial aspect of gene regulation by considering all the enhancers that control the same target gene as one unit, based on E-P maps from three genomics resources. The identification of candidate driver noncoding SMs is based on their recurrence. We searched for SREs of genes that are "hotspots" for SMs (that is, they accumulate SMs at a significantly elevated rate). To test the statistical significance of recurrence of SMs within a gene's SRE, we used both global and local background mutation rates. Using this approach, we detected numerous "hotspots" for SMs in seven different cancer types. To support the functional significance of these recurrent noncoding SMs, we further examined their association with the expression level of their target gene (using gene expression data provided by the ICGC and TCGA for samples that were also analyzed by WGS).
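The abstract does not state which statistical test was used. As one possible reading of the recurrence test, the sketch below assumes that the number of SMs falling in a gene's SRE follows a Poisson distribution whose mean is the background rate (mutations per base pair per sample) times the SRE length times the number of samples, and it reports the more conservative of the p-values obtained with the global and local rates. The function names, parameters, and example numbers are all hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), by summing the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def hotspot_p_value(observed, sre_length_bp, n_samples, global_rate, local_rate):
    """P-value for seeing at least `observed` SMs in one gene's SRE.

    Assumption: rates are mutations per base pair per sample, and the
    larger (more conservative) of the two p-values is reported.
    """
    exposure = sre_length_bp * n_samples
    p_global = poisson_upper_tail(observed, global_rate * exposure)
    p_local = poisson_upper_tail(observed, local_rate * exposure)
    return max(p_global, p_local)


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    p = hotspot_p_value(
        observed=12,
        sre_length_bp=5_000,
        n_samples=200,
        global_rate=2e-6,
        local_rate=4e-6,
    )
    print(f"p-value: {p:.3g}")
```

Subtracting the lower tail from one loses precision for very small p-values, and a genome-wide scan over many genes would also need a multiple-testing correction; both are left out to keep the sketch short.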

Keywords: cancer genomics, enhancers, noncoding genome, regulatory elements

Procedia PDF Downloads 85
1338 Revealing the Genome Based Biosynthetic Potential of a Streptomyces sp. Isolate BR123 Presenting Broad Spectrum Antimicrobial Activities

Authors: Neelma Ashraf

Abstract:

Actinomycetes, particularly the genus Streptomyces, are of great importance because of their role in the discovery of new natural products, particularly antimicrobial secondary metabolites, for medicine and the biotechnology industry. Different Streptomyces strains were isolated from Helianthus annuus plants and tested for antibacterial and antifungal activities. The five most promising strains were chosen for further investigation, and growth conditions for antibiotic synthesis were optimised. The supernatants were extracted in different solvents, and the extracted products were analyzed using liquid chromatography-mass spectrometry (LC-MS) and biological testing. From one of the potent strains, Streptomyces globusus sp. BR123, the compound lavendamycin was identified using these analytical techniques. In addition, this strain produces a strong antifungal polyene compound with a quasimolecular ion of 2072. Because of its promising antimicrobial potential, Streptomyces sp. BR123 was genome sequenced in order to identify the gene cluster responsible for lavendamycin. The genome analysis yielded candidate genes responsible for the production of this compound. A genome sequence of 8.15 Mb was obtained for Streptomyces sp. isolate BR123, with a GC content of 72.63% and 8103 protein-coding genes. Many antimicrobial, antiparasitic, and anticancer compounds were predicted from multiple biosynthetic gene clusters identified by in silico analysis. The novelty of the metabolites was inferred from the low similarity of these clusters to known biosynthetic gene clusters. The current study gives insight into the bioactive potential of Streptomyces sp. isolate BR123 with respect to the synthesis of bioactive secondary metabolites through genomic and spectrometric analysis.
Moreover, the comparative genome study revealed the connection of isolate BR123 with other Streptomyces strains, which could expand the knowledge of this genus and the mechanism involved in the discovery of new antimicrobial metabolites.
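For readers unfamiliar with the GC content figure quoted above, the following minimal sketch shows how such a percentage is conventionally computed from a nucleotide sequence. The function name and the choice to ignore ambiguous bases such as N are assumptions for this example; the abstract does not say which tool produced the reported value.

```python
def gc_content(sequence):
    """Return the percentage of G and C among the unambiguous bases (A, C, G, T)."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


if __name__ == "__main__":
    # Toy sequence for illustration; not taken from the BR123 genome.
    print(f"{gc_content('GGCCATGCGC'):.2f}%")
```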

Keywords: Streptomyces, secondary metabolites, genome, biosynthetic gene clusters, high performance liquid chromatography, mass spectrometry

Procedia PDF Downloads 47