Search results for: Data Mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24815

Search results for: Data Mining

24185 Using Closed Frequent Itemsets for Hierarchical Document Clustering

Authors: Cheng-Jhe Lee, Chiun-Chieh Hsu

Abstract:

Due to the rapid development of the Internet and the increased availability of digital documents, the excessive information on the Internet has led to information overflow problem. In order to solve these problems for effective information retrieval, document clustering in text mining becomes a popular research topic. Clustering is the unsupervised classification of data items into groups without the need of training data. Many conventional document clustering methods perform inefficiently for large document collections because they were originally designed for relational database. Therefore they are impractical in real-world document clustering and require special handling for high dimensionality and high volume. We propose the FIHC (Frequent Itemset-based Hierarchical Clustering) method, which is a hierarchical clustering method developed for document clustering, where the intuition of FIHC is that there exist some common words for each cluster. FIHC uses such words to cluster documents and builds hierarchical topic tree. In this paper, we combine FIHC algorithm with ontology to solve the semantic problem and mine the meaning behind the words in documents. Furthermore, we use the closed frequent itemsets instead of only use frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more accurate than those of well-known document clustering algorithms.

Keywords: FIHC, documents clustering, ontology, closed frequent itemset

Procedia PDF Downloads 376
24184 A Landscape of Research Data Repositories in Re3data.org Registry: A Case Study of Indian Repositories

Authors: Prashant Shrivastava

Abstract:

The purpose of this study is to explore re3dat.org registry to identify research data repositories registration workflow process. Further objective is to depict a graph for present development of research data repositories in India. Preliminarily with an approach to understand re3data.org registry framework and schema design then further proceed to explore the status of research data repositories of India in re3data.org registry. Research data repositories are getting wider relevance due to e-research concepts. Now available registry re3data.org is a good tool for users and researchers to identify appropriate research data repositories as per their research requirements. In Indian environment, a compatible National Research Data Policy is the need of the time to boost the management of research data. Registry for Research Data Repositories is a crucial tool to discover specific information in specific domain. Also, Research Data Repositories in India have not been studied. Re3data.org registry and status of Indian research data repositories both discussed in this study.

Keywords: research data, research data repositories, research data registry, re3data.org

Procedia PDF Downloads 307
24183 Human Digital Twin for Personal Conversation Automation Using Supervised Machine Learning Approaches

Authors: Aya Salama

Abstract:

Digital Twin is an emerging research topic that attracted researchers in the last decade. It is used in many fields, such as smart manufacturing and smart healthcare because it saves time and money. It is usually related to other technologies such as Data Mining, Artificial Intelligence, and Machine Learning. However, Human digital twin (HDT), in specific, is still a novel idea that still needs to prove its feasibility. HDT expands the idea of Digital Twin to human beings, which are living beings and different from the inanimate physical entities. The goal of this research was to create a Human digital twin that is responsible for real-time human replies automation by simulating human behavior. For this reason, clustering, supervised classification, topic extraction, and sentiment analysis were studied in this paper. The feasibility of the HDT for personal replies generation on social messaging applications was proved in this work. The overall accuracy of the proposed approach in this paper was 63% which is a very promising result that can open the way for researchers to expand the idea of HDT. This was achieved by using Random Forest for clustering the question data base and matching new questions. K-nearest neighbor was also applied for sentiment analysis.

Keywords: human digital twin, sentiment analysis, topic extraction, supervised machine learning, unsupervised machine learning, classification, clustering

Procedia PDF Downloads 76
24182 A Study of Cloud Computing Solution for Transportation Big Data Processing

Authors: Ilgin Gökaşar, Saman Ghaffarian

Abstract:

The need for fast processed big data of transportation ridership (eg., smartcard data) and traffic operation (e.g., traffic detectors data) which requires a lot of computational power is incontrovertible in Intelligent Transportation Systems. Nowadays cloud computing is one of the important subjects and popular information technology solution for data processing. It enables users to process enormous measure of data without having their own particular computing power. Thus, it can also be a good selection for transportation big data processing as well. This paper intends to examine how the cloud computing can enhance transportation big data process with contrasting its advantages and disadvantages, and discussing cloud computing features.

Keywords: big data, cloud computing, Intelligent Transportation Systems, ITS, traffic data processing

Procedia PDF Downloads 437
24181 Development of Prediction Models of Day-Ahead Hourly Building Electricity Consumption and Peak Power Demand Using the Machine Learning Method

Authors: Dalin Si, Azizan Aziz, Bertrand Lasternas

Abstract:

To encourage building owners to purchase electricity at the wholesale market and reduce building peak demand, this study aims to develop models that predict day-ahead hourly electricity consumption and demand using artificial neural network (ANN) and support vector machine (SVM). All prediction models are built in Python, with tool Scikit-learn and Pybrain. The input data for both consumption and demand prediction are time stamp, outdoor dry bulb temperature, relative humidity, air handling unit (AHU), supply air temperature and solar radiation. Solar radiation, which is unavailable a day-ahead, is predicted at first, and then this estimation is used as an input to predict consumption and demand. Models to predict consumption and demand are trained in both SVM and ANN, and depend on cooling or heating, weekdays or weekends. The results show that ANN is the better option for both consumption and demand prediction. It can achieve 15.50% to 20.03% coefficient of variance of root mean square error (CVRMSE) for consumption prediction and 22.89% to 32.42% CVRMSE for demand prediction, respectively. To conclude, the presented models have potential to help building owners to purchase electricity at the wholesale market, but they are not robust when used in demand response control.

Keywords: building energy prediction, data mining, demand response, electricity market

Procedia PDF Downloads 297
24180 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In our paper we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Considering the numerous obstacles e.g. forests, hills, urban areas, the data collection is realized in several ways. The collection of data uses connection via wireless communication, LAN network, GSM network and in certain areas data are collected by using vehicles. In order to ensure the connection to the server most of the probes have ability to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that will allow the probes to select suitable communication channel.

Keywords: communication, computer network, data collection, probe

Procedia PDF Downloads 341
24179 Analysis and Identification of Different Factors Affecting Students’ Performance Using a Correlation-Based Network Approach

Authors: Jeff Chak-Fu Wong, Tony Chun Yin Yip

Abstract:

The transition from secondary school to university seems exciting for many first-year students but can be more challenging than expected. Enabling instructors to know students’ learning habits and styles enhances their understanding of the students’ learning backgrounds, allows teachers to provide better support for their students, and has therefore high potential to improve teaching quality and learning, especially in any mathematics-related courses. The aim of this research is to collect students’ data using online surveys, to analyze students’ factors using learning analytics and educational data mining and to discover the characteristics of the students at risk of falling behind in their studies based on students’ previous academic backgrounds and collected data. In this paper, we use correlation-based distance methods and mutual information for measuring student factor relationships. We then develop a factor network using the Minimum Spanning Tree method and consider further study for analyzing the topological properties of these networks using social network analysis tools. Under the framework of mutual information, two graph-based feature filtering methods, i.e., unsupervised and supervised infinite feature selection algorithms, are used to analyze the results for students’ data to rank and select the appropriate subsets of features and yield effective results in identifying the factors affecting students at risk of failing. This discovered knowledge may help students as well as instructors enhance educational quality by finding out possible under-performers at the beginning of the first semester and applying more special attention to them in order to help in their learning process and improve their learning outcomes.

Keywords: students' academic performance, correlation-based distance method, social network analysis, feature selection, graph-based feature filtering method

Procedia PDF Downloads 112
24178 The Implementation of Corporate Social Responsibility to Contribute the Isolated District and the Drop behind District to Overcome the Poverty, Study Cases: PT. Kaltim Prima Coal (KPC) Sanggata, East Borneo, Indonesia

Authors: Sri Suryaningsum

Abstract:

The achievement ‘Best Practice Model’ holds by the government on behalf of the success implementation corporate social responsibility program that held on PT. Kaltim Prima Coal which had operation located in the isolated district in Sanggata, it could be the reference for the other companies to improve the social welfare in surrounding area, especially for the companies that have operated in the isolated area in Indonesia. The rule of Kaltim Prima Coal as the catalyst in the development progress to push up the independence of district especially for the district which has located in surrounding mining operation from village level to the regency level, those programs had written in the 7 field program in Corporate Social Responsibility, it was doing by stakeholders. The stakeholders are village government, sub-district government, Regency and citizen. One of the best programs that implement at PT. Kaltim Prima Coal is Regarding Resettlement that was completed based on Asian Development Bank Resettlement Best Practice and International Financial Corporation Resettlement Action Plan. This program contributed on the resettlement residences to develop the isolated and the neglected district.

Keywords: CSR, isolated, neglected, poverty, mining industry

Procedia PDF Downloads 234
24177 A Review on Big Data Movement with Different Approaches

Authors: Nay Myo Sandar

Abstract:

With the growth of technologies and applications, a large amount of data has been producing at increasing rate from various resources such as social media networks, sensor devices, and other information serving devices. This large collection of massive, complex and exponential growth of dataset is called big data. The traditional database systems cannot store and process such data due to large and complexity. Consequently, cloud computing is a potential solution for data storage and processing since it can provide a pool of resources for servers and storage. However, moving large amount of data to and from is a challenging issue since it can encounter a high latency due to large data size. With respect to big data movement problem, this paper reviews the literature of previous works, discusses about research issues, finds out approaches for dealing with big data movement problem.

Keywords: Big Data, Cloud Computing, Big Data Movement, Network Techniques

Procedia PDF Downloads 65
24176 Development of a Technology Assessment Model by Patents and Customers' Review Data

Authors: Kisik Song, Sungjoo Lee

Abstract:

Recent years have seen an increasing number of patent disputes due to excessive competition in the global market and a reduced technology life-cycle; this has increased the risk of investment in technology development. While many global companies have started developing a methodology to identify promising technologies and assess for decisions, the existing methodology still has some limitations. Post hoc assessments of the new technology are not being performed, especially to determine whether the suggested technologies turned out to be promising. For example, in existing quantitative patent analysis, a patent’s citation information has served as an important metric for quality assessment, but this analysis cannot be applied to recently registered patents because such information accumulates over time. Therefore, we propose a new technology assessment model that can replace citation information and positively affect technological development based on post hoc analysis of the patents for promising technologies. Additionally, we collect customer reviews on a target technology to extract keywords that show the customers’ needs, and we determine how many keywords are covered in the new technology. Finally, we construct a portfolio (based on a technology assessment from patent information) and a customer-based marketability assessment (based on review data), and we use them to visualize the characteristics of the new technologies.

Keywords: technology assessment, patents, citation information, opinion mining

Procedia PDF Downloads 448
24175 Optimized Approach for Secure Data Sharing in Distributed Database

Authors: Ahmed Mateen, Zhu Qingsheng, Ahmad Bilal

Abstract:

In the current age of technology, information is the most precious asset of a company. Today, companies have a large amount of data. As the data become larger, access to data for some particular information is becoming slower day by day. Faster data processing to shape it in the form of information is the biggest issue. The major problems in distributed databases are the efficiency of data distribution and response time of data distribution. The security of data distribution is also a big issue. For these problems, we proposed a strategy that can maximize the efficiency of data distribution and also increase its response time. This technique gives better results for secure data distribution from multiple heterogeneous sources. The newly proposed technique facilitates the companies for secure data sharing efficiently and quickly.

Keywords: ER-schema, electronic record, P2P framework, API, query formulation

Procedia PDF Downloads 312
24174 An Analysis System for Integrating High-Throughput Transcript Abundance Data with Metabolic Pathways in Green Algae

Authors: Han-Qin Zheng, Yi-Fan Chiang-Hsieh, Chia-Hung Chien, Wen-Chi Chang

Abstract:

As the most important non-vascular plants, algae have many research applications, including high species diversity, biofuel sources, adsorption of heavy metals and, following processing, health supplements. With the increasing availability of next-generation sequencing (NGS) data for algae genomes and transcriptomes, an integrated resource for retrieving gene expression data and metabolic pathway is essential for functional analysis and systems biology in algae. However, gene expression profiles and biological pathways are displayed separately in current resources, and making it impossible to search current databases directly to identify the cellular response mechanisms. Therefore, this work develops a novel AlgaePath database to retrieve gene expression profiles efficiently under various conditions in numerous metabolic pathways. AlgaePath, a web-based database, integrates gene information, biological pathways, and next-generation sequencing (NGS) datasets in Chlamydomonasreinhardtii and Neodesmus sp. UTEX 2219-4. Users can identify gene expression profiles and pathway information by using five query pages (i.e. Gene Search, Pathway Search, Differentially Expressed Genes (DEGs) Search, Gene Group Analysis, and Co-Expression Analysis). The gene expression data of 45 and 4 samples can be obtained directly on pathway maps in C. reinhardtii and Neodesmus sp. UTEX 2219-4, respectively. Genes that are differentially expressed between two conditions can be identified in Folds Search. Furthermore, the Gene Group Analysis of AlgaePath includes pathway enrichment analysis, and can easily compare the gene expression profiles of functionally related genes in a map. Finally, Co-Expression Analysis provides co-expressed transcripts of a target gene. The analysis results provide a valuable reference for designing further experiments and elucidating critical mechanisms from high-throughput data. More than an effective interface to clarify the transcript response mechanisms in different metabolic pathways under various conditions, AlgaePath is also a data mining system to identify critical mechanisms based on high-throughput sequencing.

Keywords: next-generation sequencing (NGS), algae, transcriptome, metabolic pathway, co-expression

Procedia PDF Downloads 392
24173 Applying Arima Data Mining Techniques to ERP to Generate Sales Demand Forecasting: A Case Study

Authors: Ghaleb Y. Abbasi, Israa Abu Rumman

Abstract:

This paper modeled sales history archived from 2012 to 2015 bulked in monthly bins for five products for a medical supply company in Jordan. The sales forecasts and extracted consistent patterns in the sales demand history from the Enterprise Resource Planning (ERP) system were used to predict future forecasting and generate sales demand forecasting using time series analysis statistical technique called Auto Regressive Integrated Moving Average (ARIMA). This was used to model and estimate realistic sales demand patterns and predict future forecasting to decide the best models for five products. Analysis revealed that the current replenishment system indicated inventory overstocking.

Keywords: ARIMA models, sales demand forecasting, time series, R code

Procedia PDF Downloads 365
24172 Classification of Forest Types Using Remote Sensing and Self-Organizing Maps

Authors: Wanderson Goncalves e Goncalves, José Alberto Silva de Sá

Abstract:

Human actions are a threat to the balance and conservation of the Amazon forest. Therefore the environmental monitoring services play an important role as the preservation and maintenance of this environment. This study classified forest types using data from a forest inventory provided by the 'Florestal e da Biodiversidade do Estado do Pará' (IDEFLOR-BIO), located between the municipalities of Santarém, Juruti and Aveiro, in the state of Pará, Brazil, covering an area approximately of 600,000 hectares, Bands 3, 4 and 5 of the TM-Landsat satellite image, and Self - Organizing Maps. The information from the satellite images was extracted using QGIS software 2.8.1 Wien and was used as a database for training the neural network. The midpoints of each sample of forest inventory have been linked to images. Later the Digital Numbers of the pixels have been extracted, composing the database that fed the training process and testing of the classifier. The neural network was trained to classify two forest types: Rain Forest of Lowland Emerging Canopy (Dbe) and Rain Forest of Lowland Emerging Canopy plus Open with palm trees (Dbe + Abp) in the Mamuru Arapiuns glebes of Pará State, and the number of examples in the training data set was 400, 200 examples for each class (Dbe and Dbe + Abp), and the size of the test data set was 100, with 50 examples for each class (Dbe and Dbe + Abp). Therefore, total mass of data consisted of 500 examples. The classifier was compiled in Orange Data Mining 2.7 Software and was evaluated in terms of the confusion matrix indicators. The results of the classifier were considered satisfactory, and being obtained values of the global accuracy equal to 89% and Kappa coefficient equal to 78% and F1 score equal to 0,88. It evaluated also the efficiency of the classifier by the ROC plot (receiver operating characteristics), obtaining results close to ideal ratings, showing it to be a very good classifier, and demonstrating the potential of this methodology to provide ecosystem services, particularly in anthropogenic areas in the Amazon.

Keywords: artificial neural network, computational intelligence, pattern recognition, unsupervised learning

Procedia PDF Downloads 347
24171 The Investigation of Enzymatic Activity in the Soils Under the Impact of Metallurgical Industrial Activity in Lori Marz, Armenia

Authors: T. H. Derdzyan, K. A. Ghazaryan, G. A. Gevorgyan

Abstract:

Beta-glucosidase, chitinase, leucine-aminopeptidase, acid phosphomonoestearse and acetate-esterase enzyme activities in the soils under the impact of metallurgical industrial activity in Lori marz (district) were investigated. The results of the study showed that the activities of the investigated enzymes in the soils decreased with increasing distance from the Shamlugh copper mine, the Chochkan tailings storage facility and the ore transportation road. Statistical analysis revealed that the activities of the enzymes were positively correlated (significant) to each other according to the observation sites which indicated that enzyme activities were affected by the same anthropogenic factor. The investigations showed that the soils were polluted with heavy metals (Cu, Pb, As, Co, Ni, Zn) due to copper mining activity in this territory. The results of Pearson correlation analysis revealed a significant negative correlation between heavy metal pollution degree (Nemerow integrated pollution index) and soil enzyme activity. All of this indicated that copper mining activity in this territory causing the heavy metal pollution of the soils resulted in the inhabitation of the activities of the enzymes which are considered as biological catalysts to decompose organic materials and facilitate the cycling of nutrients.

Keywords: Armenia, metallurgical industrial activity, heavy metal pollutionl, soil enzyme activity

Procedia PDF Downloads 274
24170 Aviation versus Aerospace: A Differential Analysis of Workforce Jobs via Text Mining

Authors: Sarah Werner, Michael J. Pritchard

Abstract:

From pilots to engineers, the skills development within the aerospace industry is exceptionally broad. Employers often struggle with finding the right mixture of qualified skills to fill their organizational demands. This effort to find qualified talent is further complicated by the industrial delineation between two key areas: aviation and aerospace. In a broad sense, the aerospace industry overlaps with the aviation industry. In turn, the aviation industry is a smaller sector segment within the context of the broader definition of the aerospace industry. Furthermore, it could be conceptually argued that -in practice- there is little distinction between these two sectors (i.e., aviation and aerospace). However, through our unstructured text analysis of over 6,000 job listings captured, our team found a clear delineation between aviation-related jobs and aerospace-related jobs. Using techniques in natural language processing, our research identifies an integrated workforce skill pattern that clearly breaks between these two sectors. While the aviation sector has largely maintained its need for pilots, mechanics, and associated support personnel, the staffing needs of the aerospace industry are being progressively driven by integrative engineering needs. Increasingly, this is leading many aerospace-based organizations towards the acquisition of 'system level' staffing requirements. This research helps to better align higher educational institutions with the current industrial staffing complexities within the broader aerospace sector.

Keywords: aerospace industry, job demand, text mining, workforce development

Procedia PDF Downloads 248
24169 Optimization of Manufacturing Process Parameters: An Empirical Study from Taiwan's Tech Companies

Authors: Chao-Ton Su, Li-Fei Chen

Abstract:

The parameter design is crucial to improving the uniformity of a product or process. In the product design stage, parameter design aims to determine the optimal settings for the parameters of each element in the system, thereby minimizing the functional deviations of the product. In the process design stage, parameter design aims to determine the operating settings of the manufacturing processes so that non-uniformity in manufacturing processes can be minimized. The parameter design, trying to minimize the influence of noise on the manufacturing system, plays an important role in the high-tech companies. Taiwan has many well-known high-tech companies, which show key roles in the global economy. Quality remains the most important factor that enables these companies to sustain their competitive advantage. In Taiwan however, many high-tech companies face various quality problems. A common challenge is related to root causes and defect patterns. In the R&D stage, root causes are often unknown, and defect patterns are difficult to classify. Additionally, data collection is not easy. Even when high-volume data can be collected, data interpretation is difficult. To overcome these challenges, high-tech companies in Taiwan use more advanced quality improvement tools. In addition to traditional statistical methods and quality tools, the new trend is the application of powerful tools, such as neural network, fuzzy theory, data mining, industrial engineering, operations research, and innovation skills. In this study, several examples of optimizing the parameter settings for the manufacturing process in Taiwan’s tech companies will be presented to illustrate proposed approach’s effectiveness. Finally, a discussion of using traditional experimental design versus the proposed approach for process optimization will be made.

Keywords: quality engineering, parameter design, neural network, genetic algorithm, experimental design

Procedia PDF Downloads 128
24168 An Improvement of Multi-Label Image Classification Method Based on Histogram of Oriented Gradient

Authors: Ziad Abdallah, Mohamad Oueidat, Ali El-Zaart

Abstract:

Image Multi-label Classification (IMC) assigns a label or a set of labels to an image. The big demand for image annotation and archiving in the web attracts the researchers to develop many algorithms for this application domain. The existing techniques for IMC have two drawbacks: The description of the elementary characteristics from the image and the correlation between labels are not taken into account. In this paper, we present an algorithm (MIML-HOGLPP), which simultaneously handles these limitations. The algorithm uses the histogram of gradients as feature descriptor. It applies the Label Priority Power-set as multi-label transformation to solve the problem of label correlation. The experiment shows that the results of MIML-HOGLPP are better in terms of some of the evaluation metrics comparing with the two existing techniques.

Keywords: data mining, information retrieval system, multi-label, problem transformation, histogram of gradients

Procedia PDF Downloads 358
24167 Recommender Systems Using Ensemble Techniques

Authors: Yeonjeong Lee, Kyoung-jae Kim, Youngtae Kim

Abstract:

This study proposes a novel recommender system that uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user’s preference. The proposed model consists of two steps. In the first step, this study uses logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. Then, this study combines the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. In the second step, this study uses the market basket analysis to extract association rules for co-purchased products. Finally, the system selects customers who have high likelihood to purchase products in each product group and recommends proper products from same or different product groups to them through above two steps. We test the usability of the proposed system by using prototype and real-world transaction and profile data. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The results also show that the proposed system may be useful in real-world online shopping store.

Keywords: product recommender system, ensemble technique, association rules, decision tree, artificial neural networks

Procedia PDF Downloads 276
24166 A Case Study of Ontology-Based Sentiment Analysis for Fan Pages

Authors: C. -L. Huang, J. -H. Ho

Abstract:

Social media has become more and more important in our life. Many enterprises promote their services and products to fans via the social media. The positive or negative sentiment of feedbacks from fans is very important for enterprises to improve their products, services, and promotion activities. The purpose of this paper is to understand the sentiment of the fan’s responses by analyzing the responses posted by fans on Facebook. The entity and aspect of fan’s responses were analyzed based on a predefined ontology. The ontology for cell phone sentiment analysis consists of aspect categories on the top level as follows: overall, shape, hardware, brand, price, and service. Each category consists of several sub-categories. All aspects for a fan’s response were found based on the ontology, and their corresponding sentimental terms were found using lexicon-based approach. The sentimental scores for aspects of fan responses were obtained by summarizing the sentimental terms in responses. The frequency of 'like' was also weighted in the sentimental score calculation. Three famous cell phone fan pages on Facebook were selected as demonstration cases to evaluate performances of the proposed methodology. Human judgment by several domain experts was also built for performance comparison. The performances of proposed approach were as good as those of human judgment on precision, recall and F1-measure.

Keywords: opinion mining, ontology, sentiment analysis, text mining

Procedia PDF Downloads 218
24165 Effect of Heavy Metals on the Life History Trait of Heterocephalobellus sp. and Cephalobus sp. (Nematode: Cephalobidae) Collected from a Small-Scale Mining Site, Davao de Oro, Philippines

Authors: Alissa Jane S. Mondejar, Florifern C. Paglinawan, Nanette Hope N. Sumaya, Joey Genevieve T. Martinez, Mylah Villacorte-Tabelin

Abstract:

Mining is associated with increased heavy metals in the environment, and heavy metal contamination disrupts the activities of soil fauna, such as nematodes, causing changes in the function of the soil ecosystem. Previous studies found that nematode community composition and diversity indices were strongly affected by heavy metals (e.g., Pb, Cu, and Zn). In this study, the influence of heavy metals on nematode survivability and reproduction were investigated. Life history analysis of the free-living nematodes, Heterocephalobellus sp. and Cephalobus sp. (Rhabditida: Cephalobidae) were assessed using the hanging drop technique, a technique often used in life history trait experiments. The nematodes were exposed to different temperatures, i.e.,20°C, 25°C, and 30°C, in different groups (control and heavy metal exposed) and fed with the same bacterial density of 1×109 Escherichia coli cells ml-1 for 30 days. Results showed that increasing temperature and exposure to heavy metals had a significant influence on the survivability and egg production of both species. Heterocephalobellus sp. and Cephalobus sp., when exposed to 20°C survived longer and produced few numbers of eggs but without subsequent hatching. Life history parameters of Heterocephalobellus sp. showed that the value of parameters was higher in the control group under net production rate (R0), fecundity (mx) which is also the same value for the total fertility rate (TFR), generation times (G0, G₁, and Gh) and Population doubling time (PDT). However, a lower rate of natural increase (rm) was observed since generation times were higher. Meanwhile, the life history parameters of Cephalobus sp. showed that the value of net production rate (R0) was higher in the exposed group. Fecundity (mx) which is also the same value for the TFR, G0, G1, Gh, and PDT, were higher in the control group. However, a lower rate of natural increase (rm) was observed since generation times were higher. In conclusion, temperature and exposure to heavy metals had a negative influence on the life history of the nematodes, however, further experiments should be considered.

Keywords: artisanal and small-scale gold mining (ASGM), hanging drop method, heavy metals, life history trait.

Procedia PDF Downloads 69
24164 Application of Blockchain Technology in Geological Field

Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu

Abstract:

Management and application of geological big data is an important part of China's national big data strategy. With the implementation of a national big data strategy, geological big data management becomes more and more critical. At present, there are still a lot of technology barriers as well as cognition chaos in many aspects of geological big data management and application, such as data sharing, intellectual property protection, and application technology. Therefore, it’s a key task to make better use of new technologies for deeper delving and wider application of geological big data. In this paper, we briefly introduce the basic principle of blockchain technology at the beginning and then make an analysis of the application dilemma of geological data. Based on the current analysis, we bring forward some feasible patterns and scenarios for the blockchain application in geological big data and put forward serval suggestions for future work in geological big data management.

Keywords: blockchain, intellectual property protection, geological data, big data management

Procedia PDF Downloads 67
24163 Mathematical modeling of the calculation of the absorbed dose in uranium production workers with the genetic effects.

Authors: P. Kazymbet, G. Abildinova, K.Makhambetov, M. Bakhtin, D. Rybalkina, K. Zhumadilov

Abstract:

Conducted cytogenetic research in workers Stepnogorsk Mining-Chemical Combine (Akmola region) with the study of 26341 chromosomal metaphase. Using a regression analysis with program DataFit, version 5.0, dependence between exposure dose and the following cytogenetic exponents has been studied: frequency of aberrant cells, frequency of chromosomal aberrations, frequency of the amounts of dicentric chromosomes, and centric rings. Experimental data on calibration curves "dose-effect" enabled the development of a mathematical model, allowing on data of the frequency of aberrant cells, chromosome aberrations, the amounts of dicentric chromosomes and centric rings calculate the absorbed dose at the time of the study. In the dose range of 0.1 Gy to 5.0 Gy dependence cytogenetic parameters on the dose had the following equation: Y = 0,0067е^0,3307х (R2 = 0,8206) – for frequency of chromosomal aberrations; Y = 0,0057е^0,3161х (R2 = 0,8832) –for frequency of cells with chromosomal aberrations; Y =5 Е-0,5е^0,6383 (R2 = 0,6321) – or frequency of the amounts of dicentric chromosomes and centric rings on cells. On the basis of cytogenetic parameters and regression equations calculated absorbed dose in workers of uranium production at the time of the study did not exceed 0.3 Gy.

Keywords: Stepnogorsk, mathematical modeling, cytogenetic, dicentric chromosomes

Procedia PDF Downloads 456
24162 Toxic Metal and Radiological Risk Assessment of Soil, Water and Vegetables around a Gold Mine Turned Residential Area in Mokuro Area of Ile-Ife, Osun State Nigeria: An Implications for Human Health

Authors: Grace O. Akinlade, Danjuma D. Maza, Oluwakemi O. Olawolu, Delight O. Babalola, John A. O. Oyekunle, Joshua O. Ojo

Abstract:

The Mokuro area of Ile-Ife, South West Nigeria, was well known for gold mining in the past (about twenty years ago). However, the place has since been reclaimed and converted to residential area without any environmental risk assessment of the impact of the mining tailings on the environment. Soil, water, and plant samples were collected from 4 different locations around the mine-turned-residential area. Soil samples were pulverized and sieved into finer particles, while the plant samples were dried and pulverized. All the samples were digested and analyzed for As, Pb, Cd, and Zn using atomic absorption spectroscopy (AAS). From the analysis results, the hazard index (HI) was then calculated for the metals. The soil and plant samples were air dried and pulverized, then weighed, after which the samples were packed into special and properly sealed containers to prevent radon gas leakage. After the sealing, the samples were kept for 28 days to attain secular equilibrium. The concentrations of 40K, 238U, and 232Th in the samples were measured using a cesium iodide (CsI) spectrometer and URSA software. The AAS analysis showed that As, Pb, Cd (Toxic metals), and Zn (essential trace metals) are in concentrations lower than permissible limits in plants and soil samples, while the water samples had concentrations higher than permissible limits. The calculated health indices (HI) show that HI for water is >1 and that of plants and soil is <1. Gamma spectrometry result shows high levels of activity concentrations above the recommended limits for all the soil and plant samples collected from the area. Only the water samples have activity concentrations below the recommended limit. Consequently, the absorbed dose, annual effective dose, and excess lifetime cancer risk are all above the recommended safe limit for all the samples except for water samples. In conclusion, all the samples collected from the area are either contaminated with toxic metals or they pose radiological hazards to the consumers. Further detailed study is therefore recommended in order to be able to advise the residents appropriately.

Keywords: toxic metals, gamma spectrometry, Ile-Ife, radiological hazards, gold mining

Procedia PDF Downloads 32
24161 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling

Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal

Abstract:

Datasets or collections are becoming important assets by themselves and now they can be accepted as a primary intellectual output of a research. The quality and usage of the datasets depend mainly on the context under which they have been collected, processed, analyzed, validated, and interpreted. This paper aims to present a collection of program educational objectives mapped to student’s outcomes collected from self-study reports prepared by 32 engineering programs accredited by ABET. The manual mapping (classification) of this data is a notoriously tedious, time consuming process. In addition, it requires experts in the area, which are mostly not available. It has been shown the operational settings under which the collection has been produced. The collection has been cleansed, preprocessed, some features have been selected and preliminary exploratory data analysis has been performed so as to illustrate the properties and usefulness of the collection. At the end, the collection has been benchmarked using nine of the most widely used supervised multiclass classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques have been compared to each other using five well-known measurements (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and Ensemble of Pruned Sets have achieved encouraging performance compared to other experimented multi-label classification methods. The Classifier Chains method has shown the worst performance. To recap, the benchmark has achieved promising results by utilizing preliminary exploratory data analysis performed on the collection, proposing new trends for research and providing a baseline for future studies.

Keywords: ABET, accreditation, benchmark collection, machine learning, program educational objectives, student outcomes, supervised multi-class classification, text mining

Procedia PDF Downloads 150
24160 The Effect of Feature Selection on Pattern Classification

Authors: Chih-Fong Tsai, Ya-Han Hu

Abstract:

The aim of feature selection (or dimensionality reduction) is to filter out unrepresentative features (or variables) making the classifier perform better than the one without feature selection. Since there are many well-known feature selection algorithms, and different classifiers based on different selection results may perform differently, very few studies consider examining the effect of performing different feature selection algorithms on the classification performances by different classifiers over different types of datasets. In this paper, two widely used algorithms, which are the genetic algorithm (GA) and information gain (IG), are used to perform feature selection. On the other hand, three well-known classifiers are constructed, which are the CART decision tree (DT), multi-layer perceptron (MLP) neural network, and support vector machine (SVM). Based on 14 different types of datasets, the experimental results show that in most cases IG is a better feature selection algorithm than GA. In addition, the combinations of IG with DT and IG with SVM perform best and second best for small and large scale datasets.

Keywords: data mining, feature selection, pattern classification, dimensionality reduction

Procedia PDF Downloads 649
24159 A Case-Based Reasoning-Decision Tree Hybrid System for Stock Selection

Authors: Yaojun Wang, Yaoqing Wang

Abstract:

Stock selection is an important decision-making problem. Many machine learning and data mining technologies are employed to build automatic stock-selection system. A profitable stock-selection system should consider the stock’s investment value and the market timing. In this paper, we present a hybrid system including both engage for stock selection. This system uses a case-based reasoning (CBR) model to execute the stock classification, uses a decision-tree model to help with market timing and stock selection. The experiments show that the performance of this hybrid system is better than that of other techniques regarding to the classification accuracy, the average return and the Sharpe ratio.

Keywords: case-based reasoning, decision tree, stock selection, machine learning

Procedia PDF Downloads 392
24158 Classification of Contexts for Mentioning Love in Interviews with Victims of the Holocaust

Authors: Marina Yurievna Aleksandrova

Abstract:

Research of the Holocaust retains value not only for history but also for sociology and psychology. One of the most important fields of study is how people were coping during and after this traumatic event. The aim of this paper is to identify the main contexts of the topic of love and to determine which contexts are more characteristic for different groups of victims of the Holocaust (gender, nationality, age). In this research, transcripts of interviews with Holocaust victims that were collected during 1946 for the "Voices of the Holocaust" project were used as data. Main contexts were analyzed with methods of network analysis and latent semantic analysis and classified by gender, age, and nationality with random forest. The results show that love is articulated and described significantly differently for male and female informants, nationality is shown results with lower values of quality metrics, as well as the age.

Keywords: Holocaust, latent semantic analysis, network analysis, text-mining, random forest

Procedia PDF Downloads 165
24157 To Handle Data-Driven Software Development Projects Effectively

Authors: Shahnewaz Khan

Abstract:

Machine learning (ML) techniques are often used in projects for creating data-driven applications. These tasks typically demand additional research and analysis. The proper technique and strategy must be chosen to ensure the success of data-driven projects. Otherwise, even exerting a lot of effort, the necessary development might not always be possible. In this post, an effort to examine the workflow of data-driven software development projects and its implementation process in order to describe how to manage a project successfully. Which will assist in minimizing the added workload.

Keywords: data, data-driven projects, data science, NLP, software project

Procedia PDF Downloads 63
24156 High-Throughput Artificial Guide RNA Sequence Design for Type I, II and III CRISPR/Cas-Mediated Genome Editing

Authors: Farahnaz Sadat Golestan Hashemi, Mohd Razi Ismail, Mohd Y. Rafii

Abstract:

A huge revolution has emerged in genome engineering by the discovery of CRISPR (clustered regularly interspaced palindromic repeats) and CRISPR-associated system genes (Cas) in bacteria. The function of type II Streptococcus pyogenes (Sp) CRISPR/Cas9 system has been confirmed in various species. Other S. thermophilus (St) CRISPR-Cas systems, CRISPR1-Cas and CRISPR3-Cas, have been also reported for preventing phage infection. The CRISPR1-Cas system interferes by cleaving foreign dsDNA entering the cell in a length-specific and orientation-dependant manner. The S. thermophilus CRISPR3-Cas system also acts by cleaving phage dsDNA genomes at the same specific position inside the targeted protospacer as observed in the CRISPR1-Cas system. It is worth mentioning, for the effective DNA cleavage activity, RNA-guided Cas9 orthologs require their own specific PAM (protospacer adjacent motif) sequences. Activity levels are based on the sequence of the protospacer and specific combinations of favorable PAM bases. Therefore, based on the specific length and sequence of PAM followed by a constant length of target site for the three orthogonals of Cas9 protein, a well-organized procedure will be required for high-throughput and accurate mining of possible target sites in a large genomic dataset. Consequently, we created a reliable procedure to explore potential gRNA sequences for type I (Streptococcus thermophiles), II (Streptococcus pyogenes), and III (Streptococcus thermophiles) CRISPR/Cas systems. To mine CRISPR target sites, four different searching modes of sgRNA binding to target DNA strand were applied. These searching modes are as follows: i) coding strand searching, ii) anti-coding strand searching, iii) both strand searching, and iv) paired-gRNA searching. The output of such procedure highlights the power of comparative genome mining for different CRISPR/Cas systems. This could yield a repertoire of Cas9 variants with expanded capabilities of gRNA design, and will pave the way for further advance genome and epigenome engineering.

Keywords: CRISPR/Cas systems, gRNA mining, Streptococcus pyogenes, Streptococcus thermophiles

Procedia PDF Downloads 242