Search results for: biological data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 27305

Search results for: biological data mining

26795 The Relationship between Fluctuation of Biological Signal: Finger Plethysmogram in Conversation and Anthropophobic Tendency

Authors: Haruo Okabayashi

Abstract:

Human biological signals (pulse wave and brain wave, etc.) have a rhythm which shows fluctuations. This study investigates the relationship between fluctuations of biological signals which are shown by a finger plethysmogram (i.e., finger pulse wave) in conversation and anthropophobic tendency, and identifies whether the fluctuation could be an index of mental health. 32 college students participated in the experiment. The finger plethysmogram of each subject was measured in the following conversation situations: Fun memory talking/listening situation and regrettable memory talking/ listening situation for three minutes each. Lyspect 3.5 was used to collect the data of the finger plethysmogram. Since Lyspect calculates the Lyapunov spectrum, it is possible to obtain the largest Lyapunov exponent (LLE). LLE is an indicator of the fluctuation and shows the degree to which a measure is going away from close proximity to the track in a dynamical system. Before the finger plethysmogram experiment, each participant took the psychological test questionnaire “Anthropophobic Scale.” The scale measures the social phobia trend close to the consciousness of social phobia. It is revealed that there is a remarkable relationship between the fluctuation of the finger plethysmography and anthropophobic tendency scale in talking about a regrettable story in conversation: The participants (N=15) who have a low anthropophobic tendency show significantly more fluctuation of finger pulse waves than the participants (N=17) who have a high anthropophobic tendency (F (1, 31) =5.66, p<0.05). That is, the participants who have a low anthropophobic tendency make conversation flexibly using large fluctuation of biological signal; on the other hand, the participants who have a high anthropophobic tendency constrain a conversation because of small fluctuation. Therefore, fluctuation is not an error but an important drive to make better relationships with others and go towards the development of interaction. In considering mental health, the fluctuation of biological signals would be an important indicator.

Keywords: anthropophobic tendency, finger plethymogram, fluctuation of biological signal, LLE

Procedia PDF Downloads 238
26794 Microarray Gene Expression Data Dimensionality Reduction Using PCA

Authors: Fuad M. Alkoot

Abstract:

Different experimental technologies such as microarray sequencing have been proposed to generate high-resolution genetic data, in order to understand the complex dynamic interactions between complex diseases and the biological system components of genes and gene products. However, the generated samples have a very large dimension reaching thousands. Therefore, hindering all attempts to design a classifier system that can identify diseases based on such data. Additionally, the high overlap in the class distributions makes the task more difficult. The data we experiment with is generated for the identification of autism. It includes 142 samples, which is small compared to the large dimension of the data. The classifier systems trained on this data yield very low classification rates that are almost equivalent to a guess. We aim at reducing the data dimension and improve it for classification. Here, we experiment with applying a multistage PCA on the genetic data to reduce its dimensionality. Results show a significant improvement in the classification rates which increases the possibility of building an automated system for autism detection.

Keywords: PCA, gene expression, dimensionality reduction, classification, autism

Procedia PDF Downloads 560
26793 The Effect of Bacteria on Mercury's Biological Removal

Authors: Nastaran Soltani

Abstract:

Heavy metals such as Mercury are toxic elements that enter the environment through different ways and endanger the environment, plants, animals, and humans’ health. Microbial activities reduce the amount of heavy metals. Therefore, an effective mechanism to eliminate heavy metals in the nature and factory slops, is using bacteria living in polluted areas. Karun River in Khuzestan Province in Iran has been always polluted by heavy metals as it is located among different industries in the region. This study was performed based on the data from sampling water and sediments of four stations across the river during the four seasons of a year. The isolation of resistant bacteria was performed through enrichment and direct cultivation in a solid medium containing mercury. Various bacteria such as Pseudomonas sp., Serratia Marcescens, and E.coli were identified as mercury-resistant bacteria. The power of these bacteria to remove mercury varied from 28% to 86%, with strongest power belonging to Pseudomonas sp. isolated in spring making a good candidate to be used for mercury biological removal from factory slops.

Keywords: bacteria, Karun River, mercury, biological removal, mercury-resistant

Procedia PDF Downloads 286
26792 Identification of Biological Pathways Causative for Breast Cancer Using Unsupervised Machine Learning

Authors: Karthik Mittal

Abstract:

This study performs an unsupervised machine learning analysis to find clusters of related SNPs which highlight biological pathways that are important for the biological mechanisms of breast cancer. Studying genetic variations in isolation is illogical because these genetic variations are known to modulate protein production and function; the downstream effects of these modifications on biological outcomes are highly interconnected. After extracting the SNPs and their effect on different types of breast cancer using the MRBase library, two unsupervised machine learning clustering algorithms were implemented on the genetic variants: a k-means clustering algorithm and a hierarchical clustering algorithm; furthermore, principal component analysis was executed to visually represent the data. These algorithms specifically used the SNP’s beta value on the three different types of breast cancer tested in this project (estrogen-receptor positive breast cancer, estrogen-receptor negative breast cancer, and breast cancer in general) to perform this clustering. Two significant genetic pathways validated the clustering produced by this project: the MAPK signaling pathway and the connection between the BRCA2 gene and the ESR1 gene. This study provides the first proof of concept showing the importance of unsupervised machine learning in interpreting GWAS summary statistics.

Keywords: breast cancer, computational biology, unsupervised machine learning, k-means, PCA

Procedia PDF Downloads 146
26791 A Graph Library Development Based on the Service-‎Oriented Architecture: Used for Representation of the ‎Biological ‎Systems in the Computer Algorithms

Authors: Mehrshad Khosraviani, Sepehr Najjarpour

Abstract:

Considering the usage of graph-based approaches in systems and synthetic biology, and the various types of ‎the graphs employed by them, a comprehensive graph library based ‎on the three-tier architecture (3TA) was previously introduced for full representation of the biological systems. Although proposing a 3TA-based graph library, three following reasons motivated us to redesign the graph ‎library based on the service-oriented architecture (SOA): (1) Maintaining the accuracy of the data related to an input graph (including its edges, its ‎vertices, its topology, etc.) without involving the end user:‎ Since, in the case of using 3TA, the library files are available to the end users, they may ‎be utilized incorrectly, and consequently, the invalid graph data will be provided to the ‎computer algorithms. However, considering the usage of the SOA, the operation of the ‎graph registration is specified as a service by encapsulation of the library files. In other words, overall control operations needed for registration of the valid data will be the ‎responsibility of the services. (2) Partitioning of the library product into some different parts: Considering 3TA, a whole library product was provided in general. While here, the product ‎can be divided into smaller ones, such as an AND/OR graph drawing service, and each ‎one can be provided individually. As a result, the end user will be able to select any ‎parts of the library product, instead of all features, to add it to a project. (3) Reduction of the complexities: While using 3TA, several other libraries must be needed to add for connecting to the ‎database, responsibility of the provision of the needed library resources in the SOA-‎based graph library is entrusted with the services by themselves. Therefore, the end user ‎who wants to use the graph library is not involved with its complexity. In the end, in order to ‎make ‎the library easier to control in the system, and to restrict the end user from accessing the files, ‎it was preferred to use the service-oriented ‎architecture ‎‎(SOA) over the three-tier architecture (3TA) and to redevelop the previously proposed graph library based on it‎.

Keywords: Bio-Design Automation, Biological System, Graph Library, Service-Oriented Architecture, Systems and Synthetic Biology

Procedia PDF Downloads 311
26790 Heart Ailment Prediction Using Machine Learning Methods

Authors: Abhigyan Hedau, Priya Shelke, Riddhi Mirajkar, Shreyash Chaple, Mrunali Gadekar, Himanshu Akula

Abstract:

The heart is the coordinating centre of the major endocrine glandular structure of the body, which produces hormones that profoundly affect the operations of the body, and diagnosing cardiovascular disease is a difficult but critical task. By extracting knowledge and information about the disease from patient data, data mining is a more practical technique to help doctors detect disorders. We use a variety of machine learning methods here, including logistic regression and support vector classifiers (SVC), K-nearest neighbours Classifiers (KNN), Decision Tree Classifiers, Random Forest classifiers and Gradient Boosting classifiers. These algorithms are applied to patient data containing 13 different factors to build a system that predicts heart disease in less time with more accuracy.

Keywords: logistic regression, support vector classifier, k-nearest neighbour, decision tree, random forest and gradient boosting

Procedia PDF Downloads 50
26789 Destination Port Detection For Vessels: An Analytic Tool For Optimizing Port Authorities Resources

Authors: Lubna Eljabu, Mohammad Etemad, Stan Matwin

Abstract:

Port authorities have many challenges in congested ports to allocate their resources to provide a safe and secure loading/ unloading procedure for cargo vessels. Selecting a destination port is the decision of a vessel master based on many factors such as weather, wavelength and changes of priorities. Having access to a tool which leverages AIS messages to monitor vessel’s movements and accurately predict their next destination port promotes an effective resource allocation process for port authorities. In this research, we propose a method, namely, Reference Route of Trajectory (RRoT) to assist port authorities in predicting inflow and outflow traffic in their local environment by monitoring Automatic Identification System (AIS) messages. Our RRoT method creates a reference route based on historical AIS messages. It utilizes some of the best trajectory similarity measure to identify the destination of a vessel using their recent movement. We evaluated five different similarity measures such as Discrete Fr´echet Distance (DFD), Dynamic Time Warping (DTW), Partial Curve Mapping (PCM), Area between two curves (Area) and Curve length (CL). Our experiments show that our method identifies the destination port with an accuracy of 98.97% and an fmeasure of 99.08% using Dynamic Time Warping (DTW) similarity measure.

Keywords: spatial temporal data mining, trajectory mining, trajectory similarity, resource optimization

Procedia PDF Downloads 121
26788 Study of the Transport of ²²⁶Ra Colloidal in Mining Context Using a Multi-Disciplinary Approach

Authors: Marine Reymond, Michael Descostes, Marie Muguet, Clemence Besancon, Martine Leermakers, Catherine Beaucaire, Sophie Billon, Patricia Patrier

Abstract:

²²⁶Ra is one of the radionuclides resulting from the disintegration of ²³⁸U. Due to its half-life (1600 y) and its high specific activity (3.7 x 1010 Bq/g), ²²⁶Ra is found at the ultra-trace level in the natural environment (usually below 1 Bq/L, i.e. 10-13 mol/L). Because of its decay in ²²²Rn, a radioactive gas with a shorter half-life (3.8 days) which is difficult to control and dangerous for humans when inhaled, ²²⁶Ra is subject to a dedicated monitoring in surface waters especially in the context of uranium mining. In natural waters, radionuclides occur in dissolved, colloidal or particular forms. Due to the size of colloids, generally ranging between 1 nm and 1 µm and their high specific surface areas, the colloidal fraction could be involved in the transport of trace elements, including radionuclides in the environment. The colloidal fraction is not always easy to determine and few existing studies focus on ²²⁶Ra. In the present study, a complete multidisciplinary approach is proposed to assess the colloidal transport of ²²⁶Ra. It includes water sampling by conventional filtration (0.2µm) and the innovative Diffusive Gradient in Thin Films technique to measure the dissolved fraction (<10nm), from which the colloidal fraction could be estimated. Suspended matter in these waters were also sampled and characterized mineralogically by X-Ray Diffraction, infrared spectroscopy and scanning electron microscopy. All of these data, which were acquired on a rehabilitated former uranium mine, allowed to build a geochemical model using the geochemical calculation code PhreeqC to describe, as accurately as possible, the colloidal transport of ²²⁶Ra. Colloidal transport of ²²⁶Ra was found, for some of the sampling points, to account for up to 95% of the total ²²⁶Ra measured in water. Mineralogical characterization and associated geochemical modelling highlight the role of barite, a barium sulfate mineral well known to trap ²²⁶Ra into its structure. Barite was shown to be responsible for the colloidal ²²⁶Ra fraction despite the presence of kaolinite and ferrihydrite, which are also known to retain ²²⁶Ra by sorption.

Keywords: colloids, mining context, radium, transport

Procedia PDF Downloads 156
26787 Convergence and Stability in Federated Learning with Adaptive Differential Privacy Preservation

Authors: Rizwan Rizwan

Abstract:

This paper provides an overview of Federated Learning (FL) and its application in enhancing data security, privacy, and efficiency. FL utilizes three distinct architectures to ensure privacy is never compromised. It involves training individual edge devices and aggregating their models on a server without sharing raw data. This approach not only provides secure models without data sharing but also offers a highly efficient privacy--preserving solution with improved security and data access. Also we discusses various frameworks used in FL and its integration with machine learning, deep learning, and data mining. In order to address the challenges of multi--party collaborative modeling scenarios, a brief review FL scheme combined with an adaptive gradient descent strategy and differential privacy mechanism. The adaptive learning rate algorithm adjusts the gradient descent process to avoid issues such as model overfitting and fluctuations, thereby enhancing modeling efficiency and performance in multi-party computation scenarios. Additionally, to cater to ultra-large-scale distributed secure computing, the research introduces a differential privacy mechanism that defends against various background knowledge attacks.

Keywords: federated learning, differential privacy, gradient descent strategy, convergence, stability, threats

Procedia PDF Downloads 30
26786 Information Needs and Information Usage of the Older Person Club’s Members in Bangkok

Authors: Siriporn Poolsuwan

Abstract:

This research aims to explore the information needs, information usages, and problems of information usage of the older people club’s members in Dusit District, Bangkok. There are 12 clubs and 746 club’s members in this district. The research results use for older person service in this district. Data is gathered from 252 club’s members by using questionnaires. The quantitative approach uses in research by percentage, means and standard deviation. The results are as follows (1) The older people need Information for entertainment, occupation and academic in the field of short story, computer work, and religion and morality. (2) The participants use Information from various sources. (3) The Problem of information usage is their language skills because of the older people’s literacy problem.

Keywords: information behavior, older person, information seeking, knowledge discovery and data mining

Procedia PDF Downloads 270
26785 A Survey on Compression Methods for Table Constraints

Authors: N. Gharbi

Abstract:

Constraint Satisfaction problems are mathematical problems that are often used to model many real-world problems for which we look if there exists a solution satisfying all its constraints. Table constraints are important for modeling parts of many problems since they list all combinations of allowed or forbidden values. However, they admit practical limitations because they are sometimes too large to be represented in a direct way. In this paper, we present a survey of the different categories of the proposed approaches to compress table constraints in order to reduce both space and time complexities.

Keywords: constraint programming, compression, data mining, table constraints

Procedia PDF Downloads 325
26784 A Method to Evaluate and Compare Web Information Extractors

Authors: Patricia Jiménez, Rafael Corchuelo, Hassan A. Sleiman

Abstract:

Web mining is gaining importance at an increasing pace. Currently, there are many complementary research topics under this umbrella. Their common theme is that they all focus on applying knowledge discovery techniques to data that is gathered from the Web. Sometimes, these data are relatively easy to gather, chiefly when it comes from server logs. Unfortunately, there are cases in which the data to be mined is the data that is displayed on a web document. In such cases, it is necessary to apply a pre-processing step to first extract the information of interest from the web documents. Such pre-processing steps are performed using so-called information extractors, which are software components that are typically configured by means of rules that are tailored to extracting the information of interest from a web page and structuring it according to a pre-defined schema. Paramount to getting good mining results is that the technique used to extract the source information is exact, which requires to evaluate and compare the different proposals in the literature from an empirical point of view. According to Google Scholar, about 4 200 papers on information extraction have been published during the last decade. Unfortunately, they were not evaluated within a homogeneous framework, which leads to difficulties to compare them empirically. In this paper, we report on an original information extraction evaluation method. Our contribution is three-fold: a) this is the first attempt to provide an evaluation method for proposals that work on semi-structured documents; the little existing work on this topic focuses on proposals that work on free text, which has little to do with extracting information from semi-structured documents. b) It provides a method that relies on statistically sound tests to support the conclusions drawn; the previous work does not provide clear guidelines or recommend statistically sound tests, but rather a survey that collects many features to take into account as well as related work; c) We provide a novel method to compute the performance measures regarding unsupervised proposals; otherwise they would require the intervention of a user to compute them by using the annotations on the evaluation sets and the information extracted. Our contributions will definitely help researchers in this area make sure that they have advanced the state of the art not only conceptually, but from an empirical point of view; it will also help practitioners make informed decisions on which proposal is the most adequate for a particular problem. This conference is a good forum to discuss on our ideas so that we can spread them to help improve the evaluation of information extraction proposals and gather valuable feedback from other researchers.

Keywords: web information extractors, information extraction evaluation method, Google scholar, web

Procedia PDF Downloads 248
26783 Intensive Biological Control in Spanish Greenhouses: Problems of the Success

Authors: Carolina Sanchez, Juan R. Gallego, Manuel Gamez, Tomas Cabello

Abstract:

Currently, biological control programs in greenhouse crops involve the use, at the same time, several natural enemies during the crop cycle. Also, large number of plant species grown in greenhouses, among them, the used cultivars are also wide. However, the cultivar effects on entomophagous species efficacy (predators and parasitoids) have been scarcely studied. A new method had been developed, using the factitious prey or host Ephestia kuehniella. It allows us to evaluate, under greenhouse or controlled conditions (semi-field), the cultivar effects on the entomophagous species effectiveness. The work was carried out in greenhouse tomato crop. It has been found the biological and ecological activities of predatory species (Nesidiocoris tenuis) and egg-parasitoid (Trichogramma achaeae) can be well represented with the use of the factitious prey or host; being better in the former than the latter. The data found in the trial are shown and discussed. The developed method could be applied to evaluate new plant materials before making available to farmers as commercial varieties, at low costs and easy use.

Keywords: cultivar effects, efficiency, predators, parasitoids

Procedia PDF Downloads 274
26782 Text Mining Analysis of the Reconstruction Plans after the Great East Japan Earthquake

Authors: Minami Ito, Akihiro Iijima

Abstract:

On March 11, 2011, the Great East Japan Earthquake occurred off the coast of Sanriku, Japan. It is important to build a sustainable society through the reconstruction process rather than simply restoring the infrastructure. To compare the goals of reconstruction plans of quake-stricken municipalities, Japanese language morphological analysis was performed by using text mining techniques. Frequently-used nouns were sorted into four main categories of “life”, “disaster prevention”, “economy”, and “harmony with environment”. Because Soma City is affected by nuclear accident, sentences tagged to “harmony with environment” tended to be frequent compared to the other municipalities. Results from cluster analysis and principle component analysis clearly indicated that the local government reinforces the efforts to reduce risks from radiation exposure as a top priority.

Keywords: eco-friendly reconstruction, harmony with environment, decontamination, nuclear disaster

Procedia PDF Downloads 220
26781 Identifying Network Subgraph-Associated Essential Genes in Molecular Networks

Authors: Efendi Zaenudin, Chien-Hung Huang, Ka-Lok Ng

Abstract:

Essential genes play an important role in the survival of an organism. It has been shown that cancer-associated essential genes are genes necessary for cancer cell proliferation, where these genes are potential therapeutic targets. Also, it was demonstrated that mutations of the cancer-associated essential genes give rise to the resistance of immunotherapy for patients with tumors. In the present study, we focus on studying the biological effects of the essential genes from a network perspective. We hypothesize that one can analyze a biological molecular network by decomposing it into both three-node and four-node digraphs (subgraphs). These network subgraphs encode the regulatory interaction information among the network’s genetic elements. In this study, the frequency of occurrence of the subgraph-associated essential genes in a molecular network was quantified by using the statistical parameter, odds ratio. Biological effects of subgraph-associated essential genes are discussed. In summary, the subgraph approach provides a systematic method for analyzing molecular networks and it can capture useful biological information for biomedical research.

Keywords: biological molecular networks, essential genes, graph theory, network subgraphs

Procedia PDF Downloads 156
26780 Hybridized Approach for Distance Estimation Using K-Means Clustering

Authors: Ritu Vashistha, Jitender Kumar

Abstract:

Clustering using the K-means algorithm is a very common way to understand and analyze the obtained output data. When a similar object is grouped, this is called the basis of Clustering. There is K number of objects and C number of cluster in to single cluster in which k is always supposed to be less than C having each cluster to be its own centroid but the major problem is how is identify the cluster is correct based on the data. Formulation of the cluster is not a regular task for every tuple of row record or entity but it is done by an iterative process. Each and every record, tuple, entity is checked and examined and similarity dissimilarity is examined. So this iterative process seems to be very lengthy and unable to give optimal output for the cluster and time taken to find the cluster. To overcome the drawback challenge, we are proposing a formula to find the clusters at the run time, so this approach can give us optimal results. The proposed approach uses the Euclidian distance formula as well melanosis to find the minimum distance between slots as technically we called clusters and the same approach we have also applied to Ant Colony Optimization(ACO) algorithm, which results in the production of two and multi-dimensional matrix.

Keywords: ant colony optimization, data clustering, centroids, data mining, k-means

Procedia PDF Downloads 128
26779 Implementation of Dozer Push Measurement under Payment Mechanism in Mining Operation

Authors: Anshar Ajatasatru

Abstract:

The decline of coal prices over past years have been significantly increasing the awareness of effective mining operation. A viable step must be undertaken in becoming more cost competitive while striving for best mining practice especially at Melak Coal Mine in East Kalimantan, Indonesia. This paper aims to show how effective dozer push measurement method can be implemented as it is controlled by contract rate on the unit basis of USD ($) per bcm. The method emerges from an idea of daily dozer push activity that continually shifts the overburden until final target design by mine planning. Volume calculation is then performed by calculating volume of each time overburden is removed within determined distance using cut and fill method from a high precision GNSS system which is applied into dozer as a guidance to ensure the optimum result of overburden removal. Accumulation of daily to weekly dozer push volume is found 95 bcm which is multiplied by average sell rate of $ 0,95, thus the amount monthly revenue is $ 90,25. Furthermore, the payment mechanism is then based on push distance and push grade. The push distance interval will determine the rates that vary from $ 0,9 - $ 2,69 per bcm and are influenced by certain push slope grade from -25% until +25%. The amount payable rates for dozer push operation shall be specifically following currency adjustment and is to be added to the monthly overburden volume claim, therefore, the sell rate of overburden volume per bcm may fluctuate depends on the real time exchange rate of Jakarta Interbank Spot Dollar Rate (JISDOR). The result indicates that dozer push measurement can be one of the surface mining alternative since it has enabled to refine method of work, operating cost and productivity improvement apart from exposing risk of low rented equipment performance. In addition, payment mechanism of contract rate by dozer push operation scheduling will ultimately deliver clients by almost 45% cost reduction in the form of low and consistent cost.

Keywords: contract rate, cut-fill method, dozer push, overburden volume

Procedia PDF Downloads 316
26778 Cluster Analysis of Students’ Learning Satisfaction

Authors: Purevdolgor Luvsantseren, Ajnai Luvsan-Ish, Oyuntsetseg Sandag, Javzmaa Tsend, Akhit Tileubai, Baasandorj Chilhaasuren, Jargalbat Puntsagdash, Galbadrakh Chuluunbaatar

Abstract:

One of the indicators of the quality of university services is student satisfaction. Aim: We aimed to study the level of satisfaction of students in the first year of premedical courses in the course of Medical Physics using the cluster method. Materials and Methods: In the framework of this goal, a questionnaire was collected from a total of 324 students who studied the medical physics course of the 1st course of the premedical course at the Mongolian National University of Medical Sciences. When determining the level of satisfaction, the answers were obtained on five levels of satisfaction: "excellent", "good", "medium", "bad" and "very bad". A total of 39 questionnaires were collected from students: 8 for course evaluation, 19 for teacher evaluation, and 12 for student evaluation. From the research, a database with 39 fields and 324 records was created. Results: In this database, cluster analysis was performed in MATLAB and R programs using the k-means method of data mining. Calculated the Hopkins statistic in the created database, the values are 0.88, 0.87, and 0.97. This shows that cluster analysis methods can be used. The course evaluation sub-fund is divided into three clusters. Among them, cluster I has 150 objects with a "good" rating of 46.2%, cluster II has 119 objects with a "medium" rating of 36.7%, and Cluster III has 54 objects with a "good" rating of 16.6%. The teacher evaluation sub-base into three clusters, there are 179 objects with a "good" rating of 55.2% in cluster II, 108 objects with an "average" rating of 33.3% in cluster III, and 36 objects with an "excellent" rating in cluster I of 11.1%. The sub-base of student evaluations is divided into two clusters: cluster II has 215 objects with an "excellent" rating of 66.3%, and cluster I has 108 objects with an "excellent" rating of 33.3%. Evaluating the resulting clusters with the Silhouette coefficient, 0.32 for the course evaluation cluster, 0.31 for the teacher evaluation cluster, and 0.30 for student evaluation show statistical significance. Conclusion: Finally, to conclude, cluster analysis in the model of the medical physics lesson “good” - 46.2%, “middle” - 36.7%, “bad” - 16.6%; 55.2% - “good”, 33.3% - “middle”, 11.1% - “bad” in the teacher evaluation model; 66.3% - “good” and 33.3% of “bad” in the student evaluation model.

Keywords: questionnaire, data mining, k-means method, silhouette coefficient

Procedia PDF Downloads 49
26777 Different Formula of Mixed Bacteria as a Bio-Treatment for Sewage Wastewater

Authors: E. Marei, A. Hammad, S. Ismail, A. El-Gindy

Abstract:

This study aims to investigate the ability of different formula of mixed bacteria as a biological treatments of wastewater after primary treatment as a bio-treatment and bio-removal and bio-adsorbent of different heavy metals in natural circumstances. The wastewater was collected from Sarpium forest site-Ismailia Governorate, Egypt. These treatments were mixture of free cells and mixture of immobilized cells of different bacteria. These different formulas of mixed bacteria were prepared under Lab. condition. The obtained data indicated that, as a result of wastewater bio-treatment, the removal rate was found to be 76.92 and 76.70% for biological oxygen demand, 79.78 and 71.07% for chemical oxygen demand, 32.45 and 36.84 % for ammonia nitrogen as well as 91.67 and 50.0% for phosphate after 24 and 28 hrs with mixed free cells and mixed immobilized cells, respectively. Moreover, the bio-removals of different heavy metals were found to reach 90.0 and 50. 0% for Cu ion, 98.0 and 98.5% for Fe ion, 97.0 and 99.3% for Mn ion, 90.0 and 90.0% Pb, 80.0% and 75.0% for Zn ion after 24 and 28 hrs with mixed free cells and mixed immobilized cells, respectively. The results indicated that 13.86 and 17.43% of removal efficiency and reduction of total dissolved solids were achieved after 24 and 28 hrs with mixed free cells and mixed immobilized cells, respectively.

Keywords: wastewater bio-treatment , bio-sorption heavy metals, biological desalination, immobilized bacteria, free cell bacteria

Procedia PDF Downloads 201
26776 Fake News Detection for Korean News Using Machine Learning Techniques

Authors: Tae-Uk Yun, Pullip Chung, Kee-Young Kwahk, Hyunchul Ahn

Abstract:

Fake news is defined as the news articles that are intentionally and verifiably false, and could mislead readers. Spread of fake news may provoke anxiety, chaos, fear, or irrational decisions of the public. Thus, detecting fake news and preventing its spread has become very important issue in our society. However, due to the huge amount of fake news produced every day, it is almost impossible to identify it by a human. Under this context, researchers have tried to develop automated fake news detection using machine learning techniques over the past years. But, there have been no prior studies proposed an automated fake news detection method for Korean news to our best knowledge. In this study, we aim to detect Korean fake news using text mining and machine learning techniques. Our proposed method consists of two steps. In the first step, the news contents to be analyzed is convert to quantified values using various text mining techniques (topic modeling, TF-IDF, and so on). After that, in step 2, classifiers are trained using the values produced in step 1. As the classifiers, machine learning techniques such as logistic regression, backpropagation network, support vector machine, and deep neural network can be applied. To validate the effectiveness of the proposed method, we collected about 200 short Korean news from Seoul National University’s FactCheck. which provides with detailed analysis reports from 20 media outlets and links to source documents for each case. Using this dataset, we will identify which text features are important as well as which classifiers are effective in detecting Korean fake news.

Keywords: fake news detection, Korean news, machine learning, text mining

Procedia PDF Downloads 275
26775 Improving Grade Control Turnaround Times with In-Pit Hyperspectral Assaying

Authors: Gary Pattemore, Michael Edgar, Andrew Job, Marina Auad, Kathryn Job

Abstract:

As critical commodities become more scarce, significant time and resources have been used to better understand complicated ore bodies and extract their full potential. These challenging ore bodies provide several pain points for geologists and engineers to overcome, poor handling of these issues flows downs stream to the processing plant affecting throughput rates and recovery. Many open cut mines utilise blast hole drilling to extract additional information to feed back into the modelling process. This method requires samples to be collected during or after blast hole drilling. Samples are then sent for assay with turnaround times varying from 1 to 12 days. This method is time consuming, costly, requires human exposure on the bench and collects elemental data only. To address this challenge, research has been undertaken to utilise hyperspectral imaging across a broad spectrum to scan samples, collars or take down hole measurements for minerals and moisture content and grade abundances. Automation of this process using unmanned vehicles and on-board processing reduces human in pit exposure to ensure ongoing safety. On-board processing allows data to be integrated into modelling workflows with immediacy. The preliminary results demonstrate numerous direct and indirect benefits from this new technology, including rapid and accurate grade estimates, moisture content and mineralogy. These benefits allow for faster geo modelling updates, better informed mine scheduling and improved downstream blending and processing practices. The paper presents recommendations for implementation of the technology in open cut mining environments.

Keywords: grade control, hyperspectral scanning, artificial intelligence, autonomous mining, machine learning

Procedia PDF Downloads 113
26774 Mining Riding Patterns in Bike-Sharing System Connecting with Public Transportation

Authors: Chong Zhang, Guoming Tang, Bin Ge, Jiuyang Tang

Abstract:

With the fast growing road traffic and increasingly severe traffic congestion, more and more citizens choose to use the public transportation for daily travelling. Meanwhile, the shared bike provides a convenient option for the first and last mile to the public transit. As of 2016, over one thousand cities around the world have deployed the bike-sharing system. The combination of these two transportations have stimulated the development of each other and made significant contribution to the reduction of carbon footprint. A lot of work has been done on mining the riding behaviors in various bike-sharing systems. Most of them, however, treated the bike-sharing system as an isolated system and thus their results provide little reference for the public transit construction and optimization. In this work, we treat the bike-sharing and public transit as a whole and investigate the customers’ bike-and-ride behaviors. Specifically, we develop a spatio-temporal traffic delivery model to study the riding patterns between the two transportation systems and explore the traffic characteristics (e.g., distributions of customer arrival/departure and traffic peak hours) from the time and space dimensions. During the model construction and evaluation, we make use of large open datasets from real-world bike-sharing systems (the CitiBike in New York, GoBike in San Francisco and BIXI in Montreal) along with corresponding public transit information. The developed two-dimension traffic model, as well as the mined bike-and-ride behaviors, can provide great help to the deployment of next-generation intelligent transportation systems.

Keywords: riding pattern mining, bike-sharing system, public transportation, bike-and-ride behavior

Procedia PDF Downloads 780
26773 Exploring Influence Range of Tainan City Using Electronic Toll Collection Big Data

Authors: Chen Chou, Feng-Tyan Lin

Abstract:

Big Data has been attracted a lot of attentions in many fields for analyzing research issues based on a large number of maternal data. Electronic Toll Collection (ETC) is one of Intelligent Transportation System (ITS) applications in Taiwan, used to record starting point, end point, distance and travel time of vehicle on the national freeway. This study, taking advantage of ETC big data, combined with urban planning theory, attempts to explore various phenomena of inter-city transportation activities. ETC, one of government's open data, is numerous, complete and quick-update. One may recall that living area has been delimited with location, population, area and subjective consciousness. However, these factors cannot appropriately reflect what people’s movement path is in daily life. In this study, the concept of "Living Area" is replaced by "Influence Range" to show dynamic and variation with time and purposes of activities. This study uses data mining with Python and Excel, and visualizes the number of trips with GIS to explore influence range of Tainan city and the purpose of trips, and discuss living area delimited in current. It dialogues between the concepts of "Central Place Theory" and "Living Area", presents the new point of view, integrates the application of big data, urban planning and transportation. The finding will be valuable for resource allocation and land apportionment of spatial planning.

Keywords: Big Data, ITS, influence range, living area, central place theory, visualization

Procedia PDF Downloads 279
26772 Gradient Boosted Trees on Spark Platform for Supervised Learning in Health Care Big Data

Authors: Gayathri Nagarajan, L. D. Dhinesh Babu

Abstract:

Health care is one of the prominent industries that generate voluminous data thereby finding the need of machine learning techniques with big data solutions for efficient processing and prediction. Missing data, incomplete data, real time streaming data, sensitive data, privacy, heterogeneity are few of the common challenges to be addressed for efficient processing and mining of health care data. In comparison with other applications, accuracy and fast processing are of higher importance for health care applications as they are related to the human life directly. Though there are many machine learning techniques and big data solutions used for efficient processing and prediction in health care data, different techniques and different frameworks are proved to be effective for different applications largely depending on the characteristics of the datasets. In this paper, we present a framework that uses ensemble machine learning technique gradient boosted trees for data classification in health care big data. The framework is built on Spark platform which is fast in comparison with other traditional frameworks. Unlike other works that focus on a single technique, our work presents a comparison of six different machine learning techniques along with gradient boosted trees on datasets of different characteristics. Five benchmark health care datasets are considered for experimentation, and the results of different machine learning techniques are discussed in comparison with gradient boosted trees. The metric chosen for comparison is misclassification error rate and the run time of the algorithms. The goal of this paper is to i) Compare the performance of gradient boosted trees with other machine learning techniques in Spark platform specifically for health care big data and ii) Discuss the results from the experiments conducted on datasets of different characteristics thereby drawing inference and conclusion. The experimental results show that the accuracy is largely dependent on the characteristics of the datasets for other machine learning techniques whereas gradient boosting trees yields reasonably stable results in terms of accuracy without largely depending on the dataset characteristics.

Keywords: big data analytics, ensemble machine learning, gradient boosted trees, Spark platform

Procedia PDF Downloads 240
26771 Semi-Automatic Method to Assist Expert for Association Rules Validation

Authors: Amdouni Hamida, Gammoudi Mohamed Mohsen

Abstract:

In order to help the expert to validate association rules extracted from data, some quality measures are proposed in the literature. We distinguish two categories: objective and subjective measures. The first one depends on a fixed threshold and on data quality from which the rules are extracted. The second one consists on providing to the expert some tools in the objective to explore and visualize rules during the evaluation step. However, the number of extracted rules to validate remains high. Thus, the manually mining rules task is very hard. To solve this problem, we propose, in this paper, a semi-automatic method to assist the expert during the association rule's validation. Our method uses rule-based classification as follow: (i) We transform association rules into classification rules (classifiers), (ii) We use the generated classifiers for data classification. (iii) We visualize association rules with their quality classification to give an idea to the expert and to assist him during validation process.

Keywords: association rules, rule-based classification, classification quality, validation

Procedia PDF Downloads 439
26770 Machine Learning Methods for Network Intrusion Detection

Authors: Mouhammad Alkasassbeh, Mohammad Almseidin

Abstract:

Network security engineers work to keep services available all the time by handling intruder attacks. Intrusion Detection System (IDS) is one of the obtainable mechanisms that is used to sense and classify any abnormal actions. Therefore, the IDS must be always up to date with the latest intruder attacks signatures to preserve confidentiality, integrity, and availability of the services. The speed of the IDS is a very important issue as well learning the new attacks. This research work illustrates how the Knowledge Discovery and Data Mining (or Knowledge Discovery in Databases) KDD dataset is very handy for testing and evaluating different Machine Learning Techniques. It mainly focuses on the KDD preprocess part in order to prepare a decent and fair experimental data set. The J48, MLP, and Bayes Network classifiers have been chosen for this study. It has been proven that the J48 classifier has achieved the highest accuracy rate for detecting and classifying all KDD dataset attacks, which are of type DOS, R2L, U2R, and PROBE.

Keywords: IDS, DDoS, MLP, KDD

Procedia PDF Downloads 234
26769 Alternative Approaches to Community Involvement in Resettlement Schemes to Prevent Potential Conflicts: Case Study in Chibuto District, Mozambique

Authors: Constâncio Augusto Machanguana

Abstract:

The world over, resettling communities, for whatever purpose (mining, dams, forestry and wildlife management, roads, or facilitating services delivery), often leads to tensions between those resettled, the investors, and the local and national governments involved in the process. Causes include unclear government legislation and regulations, confusing Corporate Social Responsibility policies and guidelines, and other social-economic policies leading to unrealistic expectations among those being resettled, causing frustrations within the community, shifting them to any imminent conflict against the investors (company). The exploitation of heavy mineral sands along Mozambique’s long coastline and hinterland has not been providing a benefit for the affected communities. A case in point is the exploration, since 2018, of heavy sands in Chibuto District in the Southern Province of Gaza. A likely contributing factor is the standard type of socio-economic surveys and community involvement processes that could smooth the relationship among the parties. This research aims to investigate alternative processes to plan, initiate and guide resettlement processes in such a way that tensions and conflicts are avoided. Based on the process already finished, compared to similar cases along with the country, mixed methods to collect primary data were adopted: three focus groups of 125 people, representing 324 resettled householders; five semi-structured interviews with relevant stakeholders such as the local government, NGO’s and local leaders to understand their role in all stages of the process. The preliminary results show that the community has limited or no understanding of the potential impacts of these large-scale explorations, and the apparent harmony between the parties (community and company) may hide the dissatisfaction of those resettled. So, rather than focusing on negative mining impacts, the research contributes to science by identifying the best resettlement approach that can be replicated in other contexts along with the country in the actual context of the new discovery of mineral resources.

Keywords: conflict mitigation, resettlement, mining, Mozambique

Procedia PDF Downloads 113
26768 Credit Card Fraud Detection with Ensemble Model: A Meta-Heuristic Approach

Authors: Gong Zhilin, Jing Yang, Jian Yin

Abstract:

The purpose of this paper is to develop a novel system for credit card fraud detection based on sequential modeling of data using hybrid deep learning models. The projected model encapsulates five major phases are pre-processing, imbalance-data handling, feature extraction, optimal feature selection, and fraud detection with an ensemble classifier. The collected raw data (input) is pre-processed to enhance the quality of the data through alleviation of the missing data, noisy data as well as null values. The pre-processed data are class imbalanced in nature, and therefore they are handled effectively with the K-means clustering-based SMOTE model. From the balanced class data, the most relevant features like improved Principal Component Analysis (PCA), statistical features (mean, median, standard deviation) and higher-order statistical features (skewness and kurtosis). Among the extracted features, the most optimal features are selected with the Self-improved Arithmetic Optimization Algorithm (SI-AOA). This SI-AOA model is the conceptual improvement of the standard Arithmetic Optimization Algorithm. The deep learning models like Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and optimized Quantum Deep Neural Network (QDNN). The LSTM and CNN are trained with the extracted optimal features. The outcomes from LSTM and CNN will enter as input to optimized QDNN that provides the final detection outcome. Since the QDNN is the ultimate detector, its weight function is fine-tuned with the Self-improved Arithmetic Optimization Algorithm (SI-AOA).

Keywords: credit card, data mining, fraud detection, money transactions

Procedia PDF Downloads 131
26767 Information Communication Technology Based Road Traffic Accidents’ Identification, and Related Smart Solution Utilizing Big Data

Authors: Ghulam Haider Haidaree, Nsenda Lukumwena

Abstract:

Today the world of research enjoys abundant data, available in virtually any field, technology, science, and business, politics, etc. This is commonly referred to as big data. This offers a great deal of precision and accuracy, supportive of an in-depth look at any decision-making process. When and if well used, Big Data affords its users with the opportunity to produce substantially well supported and good results. This paper leans extensively on big data to investigate possible smart solutions to urban mobility and related issues, namely road traffic accidents, its casualties, and fatalities based on multiple factors, including age, gender, location occurrences of accidents, etc. Multiple technologies were used in combination to produce an Information Communication Technology (ICT) based solution with embedded technology. Those technologies include principally Geographic Information System (GIS), Orange Data Mining Software, Bayesian Statistics, to name a few. The study uses the Leeds accident 2016 to illustrate the thinking process and extracts thereof a model that can be tested, evaluated, and replicated. The authors optimistically believe that the proposed model will significantly and smartly help to flatten the curve of road traffic accidents in the fast-growing population densities, which increases considerably motor-based mobility.

Keywords: accident factors, geographic information system, information communication technology, mobility

Procedia PDF Downloads 208
26766 Improved Classification Procedure for Imbalanced and Overlapped Situations

Authors: Hankyu Lee, Seoung Bum Kim

Abstract:

The issue with imbalance and overlapping in the class distribution becomes important in various applications of data mining. The imbalanced dataset is a special case in classification problems in which the number of observations of one class (i.e., major class) heavily exceeds the number of observations of the other class (i.e., minor class). Overlapped dataset is the case where many observations are shared together between the two classes. Imbalanced and overlapped data can be frequently found in many real examples including fraud and abuse patients in healthcare, quality prediction in manufacturing, text classification, oil spill detection, remote sensing, and so on. The class imbalance and overlap problem is the challenging issue because this situation degrades the performance of most of the standard classification algorithms. In this study, we propose a classification procedure that can effectively handle imbalanced and overlapped datasets by splitting data space into three parts: nonoverlapping, light overlapping, and severe overlapping and applying the classification algorithm in each part. These three parts were determined based on the Hausdorff distance and the margin of the modified support vector machine. An experiments study was conducted to examine the properties of the proposed method and compared it with other classification algorithms. The results showed that the proposed method outperformed the competitors under various imbalanced and overlapped situations. Moreover, the applicability of the proposed method was demonstrated through the experiment with real data.

Keywords: classification, imbalanced data with class overlap, split data space, support vector machine

Procedia PDF Downloads 308