Search results for: artisanal small-scale mining
422 An Intelligent Search and Retrieval System for Mining Clinical Data Repositories Based on Computational Imaging Markers and Genomic Expression Signatures for Investigative Research and Decision Support
Authors: David J. Foran, Nhan Do, Samuel Ajjarapu, Wenjin Chen, Tahsin Kurc, Joel H. Saltz
Abstract:
The large-scale data and computational requirements of investigators throughout the clinical and research communities demand an informatics infrastructure that supports both existing and new investigative and translational projects in a robust, secure environment. In some subspecialties of medicine and research, the capacity to generate data has outpaced the methods and technology used to aggregate, organize, access, and reliably retrieve this information. Leading health care centers now recognize the utility of establishing an enterprise-wide clinical data warehouse. The primary benefits that can be realized through such efforts include cost savings, efficient tracking of outcomes, advanced clinical decision support, improved prognostic accuracy, and more reliable clinical trials matching. The overarching objective of the work presented here is the development and implementation of a flexible Intelligent Retrieval and Interrogation System (IRIS) that exploits the combined use of computational imaging, genomics, and data-mining capabilities to facilitate clinical assessments and translational research in oncology. The proposed System includes a multi-modal Clinical & Research Data Warehouse (CRDW) that is tightly integrated with a suite of computational and machine-learning tools to provide insight into underlying tumor characteristics that are not apparent by human inspection alone. A key distinguishing feature of the System is a configurable Extract, Transform and Load (ETL) interface that enables it to adapt to different clinical and research data environments. This project is motivated by the growing emphasis on establishing Learning Health Systems in which cyclical hypothesis generation and evidence evaluation become integral to improving the quality of patient care.
To facilitate iterative prototyping and optimization of the algorithms and workflows for the System, the team has already implemented a fully functional Warehouse that can reliably aggregate information originating from multiple data sources, including EHRs, Clinical Trial Management Systems, Tumor Registries, Biospecimen Repositories, Radiology PACS systems, Digital Pathology archives, Unstructured Clinical Documents, and Next Generation Sequencing services. The System enables physicians to systematically mine and review the molecular, genomic, image-based, and correlated clinical information about patient tumors, individually or as part of large cohorts, to identify patterns that may influence treatment decisions and outcomes. The CRDW core system has facilitated peer-reviewed publications and funded projects, including an NIH-sponsored collaboration to enhance the cancer registries in Georgia, Kentucky, New Jersey, and New York with machine-learning based classifications and quantitative pathomics feature sets. The CRDW has also resulted in a collaboration with the Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC) at the U.S. Department of Veterans Affairs to develop algorithms and workflows to automate the analysis of lung adenocarcinoma. Those studies showed that combining computational nuclear signatures with traditional WHO criteria through the use of deep convolutional neural networks (CNNs) led to improved discrimination among tumor growth patterns. The team has also leveraged the Warehouse to support studies investigating the potential of combining genomic and computational imaging signatures to characterize prostate cancer. The results of those studies show that the combination of image biomarkers and genomic pathway scores is more strongly correlated with disease recurrence than standard clinical markers.
Keywords: clinical data warehouse, decision support, data-mining, intelligent databases, machine-learning.
Procedia PDF Downloads 126
421 Modeling Activity Pattern Using XGBoost for Mining Smart Card Data
Authors: Eui-Jin Kim, Hasik Lee, Su-Jin Park, Dong-Kyu Kim
Abstract:
Smart-card data are expected to provide information on activity patterns as an alternative to conventional person trip surveys. This study proposes a method that trains on person trip survey data to supplement smart-card data, which do not contain the purpose of each trip. We selected only features available from smart-card data, such as spatiotemporal information on the trip, together with geographic information system (GIS) data near the stations, to train on the survey data. XGBoost, a state-of-the-art tree-based ensemble classifier, was used to train on data from multiple sources. This classifier uses a more regularized model formalization to control over-fitting and offers very fast execution with good performance. The validation results showed that the proposed method efficiently estimated the trip purpose. GIS data of the station and duration of stay at the destination were significant features in modeling trip purpose.
Keywords: activity pattern, data fusion, smart-card, XGBoost
Procedia PDF Downloads 246
420 Coal Mining Safety Monitoring Using WSN
Authors: Somdatta Saha
Abstract:
The main purpose was to provide an implementable design scenario for underground coal mines using wireless sensor networks (WSNs). The main reason is that, given the intricacies in the physical structure of a coal mine, only low-power WSN nodes can produce accurate surveillance and accident detection data. The work mainly concentrated on designing and simulating various alternative scenarios for a typical mine and comparing them based on the obtained results to arrive at a final design. In the era of embedded technology, the Zigbee protocols are used in more and more applications. The rapid development of sensors, microcontrollers, and network technology has provided a reliable technological basis for automatic real-time monitoring of coal mines. The underground system collects temperature, humidity, and methane values of the coal mine through sensor nodes in the mine; it also counts the personnel inside the mine with the help of an IR sensor, and then transmits the data to an ARM-based information processing terminal.
Keywords: ARM, embedded board, wireless sensor network (Zigbee)
Procedia PDF Downloads 340
419 Syndromic Surveillance Framework Using Tweets Data Analytics
Authors: David Ming Liu, Benjamin Hirsch, Bashir Aden
Abstract:
Syndromic surveillance aims to detect or predict disease outbreaks through the analysis of medical sources of data. Using social media data such as tweets for syndromic surveillance is becoming more and more popular, aided by open platforms for collecting data and by the advantages of microblogging text and mobile geographic location features. In this paper, a Syndromic Surveillance Framework with a machine learning kernel is presented using tweets data analytics. Influenza and the three cities of Abu Dhabi, Al Ain, and Dubai in the United Arab Emirates are used as the test disease and trial areas. Hospital case data provided by the Health Authority of Abu Dhabi (HAAD) are used for correlation purposes. In our model, a Latent Dirichlet allocation (LDA) engine is adapted to perform supervised learning classification, and N-fold cross-validation confusion matrices are given as the simulation results, with an overall system recall of 85.595% achieved.
Keywords: Syndromic surveillance, Tweets, Machine Learning, data mining, Latent Dirichlet allocation (LDA), Influenza
Procedia PDF Downloads 116
418 Unsupervised Domain Adaptive Text Retrieval with Query Generation
Authors: Rui Yin, Haojie Wang, Xun Li
Abstract:
Recently, mainstream dense retrieval methods have obtained state-of-the-art results on some datasets and tasks. However, they require large amounts of training data, which is not available in most domains. The severe performance degradation of dense retrievers on new data domains has limited the use of dense retrieval methods to only a few domains with large training datasets. In this paper, we propose an unsupervised domain-adaptive approach based on query generation. First, a generative model is used to generate relevant queries for each passage in the target corpus, and then the generated queries are used for mining negative passages. Finally, the query-passage pairs are labeled with a cross-encoder and used to train a domain-adapted dense retriever. Experiments show that our approach is more robust than previous methods in target domains that require less unlabeled data.
Keywords: dense retrieval, query generation, unsupervised training, text retrieval
Procedia PDF Downloads 73
417 Design and Development of a Computerized Medical Record System for Hospitals in Remote Areas
Authors: Grace Omowunmi Soyebi
Abstract:
A computerized medical record system is a collection of medical information about a person that is stored on a computer. One principal problem of most hospitals in rural areas is their use of a file management system for keeping records. A lot of time is wasted when a patient visits the hospital, possibly in an emergency, and the nurse or attendant has to search through voluminous files before the patient's file can be retrieved; this delay may lead to an unexpected outcome for the patient. This data mining application is to be designed using a structured system analysis and design method, which will support a well-articulated analysis of the existing file management system, a feasibility study, and proper documentation of the design and implementation of a computerized medical record system. This computerized system will replace the file management system and help to quickly retrieve a patient's record with increased data security, provide access to clinical records for decision-making, and reduce the time a patient waits to be attended to.
Keywords: programming, data, software development, innovation
Procedia PDF Downloads 87
416 FTIR Characterization of EPS Ligands from Mercury Resistant Bacterial Isolate, Paenibacillus jamilae PKR1
Authors: Debajit Kalita, Macmillan Nongkhlaw, S. R. Joshi
Abstract:
Mercury (Hg) is a highly toxic heavy metal released both from naturally occurring volcanoes and from anthropogenic activities such as the alkali and mining industries, as well as biomedical wastes. Exposure to mercury is known to affect the nervous, gastrointestinal, and renal systems. In the present study, a bacterial isolate identified using the 16S rRNA marker as Paenibacillus jamilae PKR1, isolated from India’s largest sandstone-type uranium deposits, containing an average of 0.1% U3O8, was found to be resistant to Hg contamination under culture conditions. It showed strong hydrophobicity, as revealed by SAT, MATH, PAT, and SAA adherence assays. The Fourier Transform Infrared (FTIR) spectra showed the presence of hydroxyl, amino, and carboxylic functional groups on the cell surface EPS, which are known to contribute to the binding of metals. It is proposed that the characterized isolate, which tolerates up to 4.0 mM of mercury, provides scope for application in bioremediation of mercury from contaminated sites.
Keywords: mercury, Domiasiat, uranium, paenibacillus jamilae, hydrophobicity, FTIR
Procedia PDF Downloads 409
415 Mining News Deserts: Impact of Local Newspaper's Closure on Political Participation and Engagement in Rural Australian Town of Lightning Ridge
Authors: Marco Magasic
Abstract:
This article examines how a local newspaper’s closure impacts the way everyday people in a rural Australian town are informed about and engage with political affairs. It draws on a two-month focused ethnographic study in the outback town of Lightning Ridge, New South Wales, and explores people’s media-related practices following the closure of the town’s only newspaper, The Ridge News, in 2015. While social media is considered to have partly filled the news void, there is an increasingly fragmented and less vibrant local public sphere that has led to growing complacency among individuals about political affairs. Local residents highlight a dearth of reliable, credible information and lament the loss of the newspaper and its role in community advocacy and in fostering people’s engagement with political institutions, especially local government.
Keywords: public sphere, political participation, local news, democratic deficit
Procedia PDF Downloads 154
414 Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques
Authors: Sannikumar Patel, Brian Nolan, Markus Hofmann, Philip Owende, Kunjan Patel
Abstract:
Sentiment analysis and opinion mining have become emerging topics of research in recent years, but most of the work is focused on data in the English language. Comprehensive research and analysis that consider multiple languages, machine translation techniques, and different classifiers are essential. This paper presents a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two parts: one classifies text without language translation, and the second translates the testing data into a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or whether building language-specific sentiment classification systems is a better approach. The effects of language translation techniques, features, and the accuracy of various classifiers for multilingual sentiment analysis are also discussed in this study.
Keywords: cross-language analysis, machine learning, machine translation, sentiment analysis
Procedia PDF Downloads 713
413 Harmonization of State Law and Local Laws in Coastal and Marine Areas Management
Authors: N. S. B. Ambarini, Tito Sofyan, Edra Satmaidi
Abstract:
Coastal and marine areas hold potential natural resources that are among the pillars of the national economy. The Indonesian archipelago has extensive marine and coastal areas. Various important natural resources, such as fisheries and mining, are located in coastal areas and the sea, making this a unique region with a variety of interests competing to exploit it. Preserving it sustainably therefore requires good and comprehensive management. Legal regulations relating to the management of coastal and marine areas have been issued at the national and local levels. In practice, however, they have not been able to function optimally. In substance, they have not addressed the problems of the region, especially concerning the interests of local communities. This study is non-doctrinal legal research with a socio-legal approach. Research in some coastal and marine areas in Bengkulu province, Indonesia, found that the implementation of the system of customary law and local wisdom has begun to weaken. Harmonization is therefore needed so that applicable laws and regulations are implemented in line with the indigenous values and local knowledge that exist in the community.
Keywords: coastal and marine, harmonization, law, local
Procedia PDF Downloads 346
412 Performance Analysis of Proprietary and Non-Proprietary Tools for Regression Testing Using Genetic Algorithm
Authors: K. Hema Shankari, R. Thirumalaiselvi, N. V. Balasubramanian
Abstract:
The present paper addresses research in the area of regression testing, with emphasis on automated tools as well as prioritization of test cases. The uniqueness of regression testing and its cyclic nature are pointed out. The difference in approach between industry, with the business model as its basis, and academia, with its focus on data mining, is highlighted. Test metrics are discussed as a prelude to our formula for prioritization; a case study is further discussed to illustrate this methodology. An industrial case study is also described in the paper, where the number of test cases is so large that they have to be grouped as Test Suites. In such situations, a genetic algorithm proposed by us can be used to reconfigure these Test Suites in each cycle of regression testing. A comparison is made between a proprietary tool and an open source tool using the above-mentioned metrics. Our approach is clarified through several tables.
Keywords: APFD metric, genetic algorithm, regression testing, RFT tool, test case prioritization, selenium tool
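The keywords of this entry name the APFD (Average Percentage of Faults Detected) metric for test case prioritization. As a rough illustration only, the sketch below computes the standard APFD formula, APFD = 1 - (TF1 + ... + TFm)/(n·m) + 1/(2n), where n is the number of test cases, m the number of faults, and TFi the position of the first test that reveals fault i. The test names, fault names, and the `apfd` helper are hypothetical and are not taken from the paper, whose own prioritization formula is not given in the abstract.

```python
def apfd(order, faults_detected_by):
    """Average Percentage of Faults Detected for one test ordering.

    order: list of test names in execution order (assumed to include
           every test that detects a fault).
    faults_detected_by: dict mapping test name -> set of fault names.
    """
    first_pos = {}
    # Record the 1-based position of the first test revealing each fault.
    for pos, test in enumerate(order, start=1):
        for fault in faults_detected_by.get(test, ()):
            first_pos.setdefault(fault, pos)
    n, m = len(order), len(first_pos)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one detected fault")
    return 1 - sum(first_pos.values()) / (n * m) + 1 / (2 * n)


# Hypothetical fault matrix for four tests and three faults.
detects = {"t1": {"f1"}, "t2": set(), "t3": {"f1", "f2"}, "t4": {"f3"}}
good_order = apfd(["t3", "t4", "t1", "t2"], detects)
poor_order = apfd(["t2", "t1", "t4", "t3"], detects)
print(round(good_order, 4), round(poor_order, 4))
```

An ordering that runs fault-revealing tests earlier scores higher, which is the property a prioritization scheme (genetic or otherwise) would try to maximize.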
Procedia PDF Downloads 435
411 Message Framework for Disaster Management: An Application Model for Mines
Authors: A. Baloglu, A. Çınar
Abstract:
Different tools and technologies have been implemented for Crisis Response and Management (CRM), and they generally use the available network infrastructure for information exchange. Depending on the type of disaster or crisis, the network infrastructure may be affected and may be unable to provide reliable connectivity. Any tool or technology that depends on connectivity may therefore be unable to fulfill its functions. As a solution, a new message exchange framework has been developed. The framework provides an offline/online information exchange platform for CRM Information Systems (CRMIS); it uses XML compression and packet prioritization algorithms and is based on open source web technologies. By introducing offline capabilities to the web technologies, the framework is able to perform message exchange on unreliable networks. Experiments in the simulation environment provide promising results on low-bandwidth networks (56 kbps and 28.8 kbps) with up to 50% packet loss: the solution successfully transferred all the information on these low-quality networks, where traditional 2- and 3-tier applications failed.
Keywords: crisis response and management, XML messaging, web services, XML compression, mining
Procedia PDF Downloads 339
410 A Multi-Model Approach to Assess Atlantic Bonito (Sarda Sarda, Bloch 1793) in the Eastern Atlantic Ocean: A Case Study of the Senegalese Exclusive Economic Zone
Authors: Ousmane Sarr
Abstract:
The Senegalese coast has high fishery productivity due to the frequent, intense upwelling that occurs along it, caused by the maritime trade winds, which makes its waters nutrient-rich. Fishing plays a primordial role in Senegal's socioeconomic plans and food security. However, a global diagnosis of the Senegalese maritime fishing sector has highlighted the challenges this sector encounters. Among these concerns, some significant stocks, a priority target for artisanal fishing, need further assessment. If no efforts are made in this direction, most stocks will be overexploited or even in decline. It is in this context that this research was initiated. This investigation aimed to apply a multi-model approach (LBB; the catch-only CMSY model and its most recent version, CMSY++; JABBA; and JABBA-Select) to assess the stock of Atlantic bonito, Sarda sarda (Bloch, 1793), in the Senegalese Exclusive Economic Zone (SEEZ). Available catch, effort, and size data for Atlantic bonito over 15 years (2004-2018) were used to calculate the nominal and standardized CPUE, the size-frequency distribution, and the lengths at retention (50% and 95% selectivity) of the species. These results were employed as input parameters for the stock assessment models mentioned above to define the stock status of this species in this region of the Atlantic Ocean. The LBB model indicated a healthy Atlantic bonito stock status, with B/BMSY values ranging from 1.3 to 1.6 and B/B0 values varying from 0.47 to 0.61 in the main scenarios performed (BON_AFG_CL, BON_GN_Length, and BON_PS_Length). The results estimated by LBB are consistent with those obtained by CMSY. The CMSY model results demonstrate that the SEEZ Atlantic bonito stock is in a sound condition in the final year of the main scenarios analyzed (BON, BON-bt, BON-GN-bt, and BON-PS-bt), with sustainable relative stock biomass (B2018/BMSY = 1.13 to 1.3) and fishing pressure levels (F2018/FMSY = 0.52 to 1.43).
The B/BMSY and F/FMSY results for the JABBA model ranged from 2.01 to 2.14 and from 0.47 to 0.33, respectively. In contrast, the estimated B/BMSY and F/FMSY for JABBA-Select ranged from 1.91 to 1.92 and from 0.52 to 0.54. The Kobe plot results of the base case scenarios ranged from 75% to 89% probability in the green area, indicating sustainable fishing pressure and a healthy Atlantic bonito stock size capable of producing high yields close to the MSY. Based on the stock assessment results, this study offers scientific advice for temporary management measures. Drawing on the results of the length-based models, it suggests improving the selectivity parameters of longlines and purse seines and temporarily prohibiting the use of sleeping nets in the fishery for the Atlantic bonito stock in the SEEZ. Although these actions are temporary, they can be essential to reduce or avoid intense pressure on the Atlantic bonito stock in the SEEZ. However, it is necessary to establish harvest control rules to provide coherent and solid scientific information that leads to appropriate decision-making for rational and sustainable exploitation of Atlantic bonito in the SEEZ and the Eastern Atlantic Ocean.
Keywords: multi-model approach, stock assessment, atlantic bonito, healthy stock, sustainable, SEEZ, temporary management measures
Procedia PDF Downloads 58
409 A Study of the Performance Parameter for Recommendation Algorithm Evaluation
Authors: C. Rana, S. K. Jain
Abstract:
In the past few years, the enormous amount of Web data has made its efficient use a challenge. A range of techniques has been applied to tackle this problem; prominent among them are personalization and recommender systems. These are the tools that assist users in finding relevant information on the web. Most e-commerce websites apply such tools in one way or another. In the past decade, a large number of recommendation algorithms have been proposed to tackle such problems. However, there has not been much research on the evaluation criteria for these algorithms. As such, the traditional accuracy and classification metrics, which provide a static view, are still used for evaluation. This paper studies how the evolution of user preference over a period of time can be mapped in a recommender system using a new evaluation methodology that explicitly uses the time dimension. We also present different types of experimental setups that are generally used for recommender system evaluation. Furthermore, an overview of major accuracy metrics, and of metrics that go beyond the scope of accuracy as researched in the past few years, is discussed in detail.
Keywords: collaborative filtering, data mining, evolutionary, clustering, algorithm, recommender systems
Procedia PDF Downloads 413
408 Learning about the Strengths and Weaknesses of Urban Climate Action Plans
Authors: Prince Dacosta Aboagye, Ayyoob Sharifi
Abstract:
Cities respond to climate concerns mainly through their climate action plans (CAPs). A comprehensive content analysis of the dynamics in existing urban CAPs is not well represented in the literature. This literature void presents a difficulty in appreciating the strengths and weaknesses of urban CAPs. Here, we perform a qualitative content analysis (QCA) on CAPs from 278 cities worldwide and use text-mining tools to map and visualize the relevant data. Our analysis showed a decline in the number of CAPs developed and published following the global COVID-19 lockdown period. Evidently, megacities are leading the deep decarbonisation agenda. We also observed a transition from developing mainly mitigation-focused CAPs pre-COP21 to both mitigation and adaptation CAPs. A lack of inclusiveness in local climate planning was common among European and North American cities. The evidence is a catalyst for understanding the trends in existing urban CAPs to shape future urban climate planning.
Keywords: urban, climate action plans, strengths, weaknesses
Procedia PDF Downloads 96
407 A Ratio-Weighted Decision Tree Algorithm for Imbalance Dataset Classification
Authors: Doyin Afolabi, Phillip Adewole, Oladipupo Sennaike
Abstract:
Most well-known classifiers, including the decision tree algorithm, can make predictions on balanced datasets efficiently. However, the decision tree algorithm tends to be biased on imbalanced datasets because of the skewed class distribution of such datasets. To overcome this problem, this study proposes a weighted decision tree algorithm that aims to remove the bias toward the majority class and avoids reducing the majority observations when classifying imbalanced datasets. The proposed weighted decision tree algorithm was tested on three imbalanced datasets: a cancer dataset, the German credit dataset, and a banknote dataset. The specificity, sensitivity, and accuracy metrics were used to evaluate the performance of the proposed decision tree algorithm on the datasets. The evaluation results show that, for some of the weights of our proposed decision tree, the specificity, sensitivity, and accuracy metrics gave better results than those of the ID3 decision tree and the decision tree induced with minority entropy for all three datasets.
Keywords: data mining, decision tree, classification, imbalance dataset
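The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea of class-weighted entropy in a decision tree split criterion. It assumes inverse-frequency class weights, which is a common choice and not necessarily the ratio weighting the authors propose; the function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting: n / (num_classes * class_count) per class."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def weighted_entropy(labels, class_weights):
    """Shannon entropy (bits) of the class distribution after each
    class count is multiplied by its weight (default weight 1.0)."""
    counts = Counter(labels)
    weighted = {c: k * class_weights.get(c, 1.0) for c, k in counts.items()}
    total = sum(weighted.values())
    return -sum((w / total) * math.log2(w / total)
                for w in weighted.values() if w > 0)


# Toy node with a 9:1 class imbalance.
labels = ["neg"] * 9 + ["pos"]
plain = weighted_entropy(labels, {})
balanced = weighted_entropy(labels, inverse_frequency_weights(labels))
print(round(plain, 3), round(balanced, 3))
```

With unit weights the node looks nearly pure, so the minority class has little influence on split selection; after reweighting, the same node is treated as maximally mixed, which is the kind of bias correction the abstract describes.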
Procedia PDF Downloads 136
406 Research on the Risks of Railroad Receiving and Dispatching Trains Operators: Natural Language Processing Risk Text Mining
Authors: Yangze Lan, Ruihua Xv, Feng Zhou, Yijia Shan, Longhao Zhang, Qinghui Xv
Abstract:
Receiving and dispatching trains is an important part of railroad organization, yet the risk evaluation of operating personnel is still reflected only by scores, without deeper analysis of wrong answers and operating accidents. Using natural language processing (NLP) technology, this study extracts the keywords and key phrases of 40 relevant risk events in receiving and dispatching trains and reclassifies the risk events into 8 categories, such as train approach and signal risks, dispatching command risks, and so on. Based on the historical risk data of personnel, the K-Means clustering method is used to classify the risk level of personnel. The result indicates that high-risk operating personnel need strengthened training in train receiving and dispatching operations for essential trains and abnormal situations.
Keywords: receiving and dispatching trains, natural language processing, risk evaluation, K-means clustering
Procedia PDF Downloads 91
405 A Study of Growth Factors on Sustainable Manufacturing in Small and Medium-Sized Enterprises: Case Study of Japan Manufacturing
Authors: Tadayuki Kyoutani, Shigeyuki Haruyama, Ken Kaminishi, Zefry Darmawan
Abstract:
Japan’s semiconductor industries have developed greatly in recent years. Many started as Small and Medium-sized Enterprises (SMEs) that found themselves in favorable circumstances and have now become prosperous industries worldwide. Sustainable growth factors that support the creation of spirit value inside the Japanese company were strongly embedded through performance. Those factors were not clearly defined in each company. A series of literature studies was conducted using quantitative text mining to explore the definition of sustainable growth factors. Sustainability criteria were developed from previous research to verify the definition of the factors. A typical framework was proposed as a systematic approach to developing sustainable growth factors in a specific company. A review of the approach over a certain period shows that the factors influencing sustainable growth are important for the company to achieve its goals.
Keywords: SME, manufacture, sustainable, growth factor
Procedia PDF Downloads 250
404 Saving Energy at a Wastewater Treatment Plant through Electrical and Production Data Analysis
Authors: Adriano Araujo Carvalho, Arturo Alatrista Corrales
Abstract:
This paper shows how analysis of electrical energy consumption and production data was used to find opportunities to save energy at the Taboada wastewater treatment plant in Callao, Peru. To access the data, independent data networks were used for both electrical and process instruments, and the data were analyzed under an ISO 50001 energy audit, which considered Energy Performance Indexes for each process and a step-by-step guide presented in this text. Using this methodology and data mining techniques applied to information gathered through electronic multimeters (conveniently placed on substation switchboards connected to a cloud network), it was possible to identify thoroughly the performance of each process and thus reveal saving opportunities that were previously hidden. The data analysis brought both cost and energy reductions, allowing the plant to save significant resources and to be certified under ISO 50001.
Keywords: energy and production data analysis, energy management, ISO 50001, wastewater treatment plant energy analysis
Procedia PDF Downloads 193
403 Credit Risk Assessment Using Rule Based Classifiers: A Comparative Study
Authors: Salima Smiti, Ines Gasmi, Makram Soui
Abstract:
Credit risk is the most important issue for financial institutions. Its assessment has become an important task used to predict defaulting customers and classify customers as good or bad payers. To this end, numerous techniques have been applied to credit risk assessment. However, to our knowledge, several evaluation techniques, such as neural networks and SVM, are black-box models. They generate applicants’ classes without any explanation. In this paper, we propose to assess credit risk using a rule-based classification method. Our output is a set of rules that describe and explain the decision. We compare seven classification algorithms (JRip, Decision Table, OneR, ZeroR, Fuzzy Rule, PART, and Genetic programming (GP)), where the goal is to find the best rules satisfying several criteria: accuracy, sensitivity, and specificity. The obtained results confirm the efficiency of the GP algorithm on the German and Australian datasets compared to other rule-based techniques for predicting credit risk.
Keywords: credit risk assessment, classification algorithms, data mining, rule extraction
Procedia PDF Downloads 181
402 Heart Ailment Prediction Using Machine Learning Methods
Authors: Abhigyan Hedau, Priya Shelke, Riddhi Mirajkar, Shreyash Chaple, Mrunali Gadekar, Himanshu Akula
Abstract:
The heart is the coordinating centre of the major endocrine glandular structure of the body, which produces hormones that profoundly affect the operations of the body, and diagnosing cardiovascular disease is a difficult but critical task. By extracting knowledge and information about the disease from patient data, data mining offers a practical technique to help doctors detect disorders. We use a variety of machine learning methods here, including logistic regression, support vector classifiers (SVC), K-nearest neighbours (KNN) classifiers, decision tree classifiers, random forest classifiers, and gradient boosting classifiers. These algorithms are applied to patient data containing 13 different factors to build a system that predicts heart disease in less time and with more accuracy.
Keywords: logistic regression, support vector classifier, k-nearest neighbour, decision tree, random forest and gradient boosting
Procedia PDF Downloads 49
401 Customer Data Analysis Model Using Business Intelligence Tools in Telecommunication Companies
Authors: Monica Lia
Abstract:
This article presents a customer data analysis model using business intelligence tools for data modelling, data transformation, data visualization, and dynamic report building. The analysis of an organization's customers is based on information from the transactional systems of the organization. The paper presents how to develop the data model starting from the data that companies hold in their own operational systems. This data can be transformed into useful information about customers using business intelligence tools. In a mature market, knowing the information inside the data and making forecasts for strategic decisions become more important. Business intelligence tools are used in business organizations as support for decision-making.
Keywords: customer analysis, business intelligence, data warehouse, data mining, decisions, self-service reports, interactive visual analysis, and dynamic dashboards, use cases diagram, process modelling, logical data model, data mart, ETL, star schema, OLAP, data universes
Procedia PDF Downloads 430
400 Time Series Regression with Meta-Clusters
Authors: Monika Chuchro
Abstract:
This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. Clustering was used to obtain subgroups of normally distributed time series data from inflow data for a wastewater treatment plant, which is composed of several groups differing by mean value. Two simple algorithms, K-means and EM, were chosen as clustering methods. The Rand index was used to measure similarity. After simple meta-clustering, a regression model was fitted for each subgroup. The final model was the sum of the subgroup models. The quality of the obtained model was compared with a regression model built using the same explanatory variables but with no clustering of data. Results were compared using the coefficient of determination (R2), the mean absolute percentage error (MAPE) as a measure of prediction accuracy, and a comparison on a line chart. Preliminary results indicate the potential of the presented technique.
Keywords: clustering, data analysis, data mining, predictive models
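The comparison described above can be sketched as follows: fit one regression per cluster, fit one regression on all data, and compare both with R2 and MAPE. This is only an illustration under stated assumptions. The toy data, the cluster labels (taken as given rather than produced by K-means or EM), and all function names are hypothetical. Combining the subgroup models by predicting each point with its own cluster's model is one possible reading of "sum of the subgroup models".

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical toy data: (explanatory variable, inflow, cluster label).
# The two clusters differ by mean value, as in the abstract.
data = [(1, 12, 0), (2, 14, 0), (3, 16, 0), (4, 18, 0),
        (1, 52, 1), (2, 54, 1), (3, 56, 1), (4, 58, 1)]

def predict_global(points):
    """One regression fitted on all points, ignoring cluster labels."""
    intercept, slope = fit_line([x for x, _, _ in points], [y for _, y, _ in points])
    return [intercept + slope * x for x, _, _ in points]

def predict_per_cluster(points):
    """One regression per cluster; each point is predicted by its own cluster's model."""
    models = {}
    for label in {c for _, _, c in points}:
        members = [(x, y) for x, y, c in points if c == label]
        models[label] = fit_line([x for x, _ in members], [y for _, y in members])
    return [models[c][0] + models[c][1] * x for x, _, c in points]

actual = [y for _, y, _ in data]
for name, predicted in (("no clustering", predict_global(data)),
                        ("per cluster", predict_per_cluster(data))):
    print(name, "R2 =", round(r_squared(actual, predicted), 3),
          "MAPE =", round(mape(actual, predicted), 1))
```

On data like this, where the groups differ mainly by mean value, the single global line fits poorly while the per-cluster lines fit closely, which is the effect the abstract sets out to measure.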
Procedia PDF Downloads 466
399 Drugs, Silk Road, Bitcoins
Authors: Lali Khurtsia, Vano Tsertsvadze
Abstract:
Georgian drug policy is directed at reducing the supply of drugs. Retrospective analysis has shown that law enforcement activities have been followed by the expulsion of particular injecting drugs from the market. Demand remains unchanged, however, and the expelled drugs are replaced by even more dangerous homemade drugs entering the market. To identify expected new trends on the Georgian drug market, a qualitative study was conducted with Georgian drug users to determine drug supply routes. It turned out that drug suppliers and consumers use Skype to make deals, for safety reasons and to protect their anonymity. Worldwide, the use of IT in the illegal drug trade is even more sophisticated. Trading with Bitcoins on the Darknet ensures high confidentiality of money transactions and the safe circulation of drugs. In 2014, the largest Bitcoin mining enterprise in the world was built in Georgia. We argue that Georgian drug consumers and suppliers will have an incentive to use Bitcoins and the Darknet in response to the government's policy of restricting supply, in order to satisfy market demand for drugs.
Keywords: bitcoin, darknet, drugs, policy
Procedia PDF Downloads 439
398 Impact of Trade Cooperation of BRICS Countries on Economic Growth
Authors: Svetlana Gusarova
Abstract:
Developing countries, notably the BRICS countries (Brazil, Russia, India, China, South Africa), have played an essential role in the recent development of the world economy. Over the next 50 years, the BRICS countries are expected to be the engines of global trade and economic growth. Trade cooperation among the BRICS countries can enhance their economic development. The BRICS countries were among the top 10 world exporters of office and telecom equipment, textiles, clothing, iron and steel, chemicals, agricultural products, automotive products, and fuel and mining products. China was one of the main trading partners of all the other BRICS countries, maintaining close trade relationships with each of them. The author analyzed the trade complementarity of the BRICS countries and found a high level of complementarity in their trade flows, owing to their specialization in different types of goods. Correlation and regression analysis of the relationship between intra-BRICS merchandise turnover and GDP (PPP) revealed a very strong impact on the development of their economies.
Keywords: BRICS countries, trade cooperation, complementarity, regression analysis
Procedia PDF Downloads 281
397 Cotton Crops Vegetative Indices Based Assessment Using Multispectral Images
Authors: Muhammad Shahzad Shifa, Amna Shifa, Muhammad Omar, Aamir Shahzad, Rahmat Ali Khan
Abstract:
Many applications of remote sensing to vegetation and crop response depend on the spectral properties of individual leaves and plants. Vegetation indices are usually determined to estimate crop biophysical parameters, such as crop canopies and crop leaf area indices, with the help of remote sensing. Cotton crop assessment is performed with the help of vegetative indices. Remotely sensed images from an optical multispectral radiometer, the MSR5, are used in this study. The interpretation is based on the fact that different materials reflect and absorb light differently at different wavelengths. Non-normalized and normalized forms of these datasets are analyzed using two complementary data mining algorithms: K-means and K-nearest neighbor (KNN). Our analysis shows that the use of normalized reflectance data and vegetative indices is suitable for automated assessment and decision making.
Keywords: cotton, condition assessment, KNN algorithm, clustering, MSR5, vegetation indices
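To make the idea of classifying crop condition from a vegetation index concrete, here is a minimal sketch that computes the normalized difference vegetation index, NDVI = (NIR - Red) / (NIR + Red), and applies a K-nearest-neighbor vote to it. The abstract does not say which indices were used, so NDVI is an assumption, and the reflectance values, the "healthy" and "stressed" labels, and the function names are hypothetical.

```python
from collections import Counter

def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)

def knn_predict(train, query, k=3):
    """Majority vote among the k training samples closest to the query value.

    train is a list of (index_value, label) pairs.
    """
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (NIR, red) reflectance readings with made-up condition labels.
samples = [
    ((0.50, 0.08), "healthy"),
    ((0.45, 0.10), "healthy"),
    ((0.55, 0.07), "healthy"),
    ((0.30, 0.20), "stressed"),
    ((0.25, 0.18), "stressed"),
    ((0.28, 0.22), "stressed"),
]

# Reduce each reading to a single NDVI feature before classification.
train = [(ndvi(nir, red), label) for (nir, red), label in samples]

for nir, red in [(0.48, 0.09), (0.27, 0.21)]:
    print((nir, red), "->", knn_predict(train, ndvi(nir, red)))
```

Because NDVI is a ratio, it is less sensitive to overall brightness than raw band values, which is one reason a normalized feature can suit distance-based methods such as KNN and K-means.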
Procedia PDF Downloads 333
396 Identification of Conserved Domains and Motifs for GRF Gene Family
Authors: Jafar Ahmadi, Nafiseh Noormohammadi, Sedegeh Fabriki Ourang
Abstract:
Growth-regulating factor (GRF) genes encode a novel class of plant-specific transcription factors. The GRF proteins play a role in regulating cell numbers in young and growing tissues and may act as transcription activators in the growth and development of plants. Identifying GRF genes and their expression is important for understanding the growth and development of various plant organs. In this study, to better understand the structural and functional differences within the GRF family, 45 GRF protein sequences from A. thaliana, Z. mays, O. sativa, B. napus, B. rapa, H. vulgare, and S. bicolor were collected and analyzed through bioinformatics data mining. In the secondary structure of the GRFs, alpha helices outnumbered beta sheets, and in all of them the QLQ domain lay entirely within the largest alpha helix. In all GRFs, the QLQ and WRC domains were completely conserved except in AtGRF9. These proteins have no transmembrane domain and, because they carry nuclear localization signals, they act in the nucleus; they are classified as unstable proteins in vitro.
Keywords: domain, gene family, GRF, motif
Procedia PDF Downloads 457
395 Road Accidents Bigdata Mining and Visualization Using Support Vector Machines
Authors: Usha Lokala, Srinivas Nowduri, Prabhakar K. Sharma
Abstract:
Useful information has been extracted from road accident data in the United Kingdom (UK) using data analytics methods, with the aim of avoiding possible accidents in rural and urban areas. This analysis makes use of several methodologies such as data integration, support vector machines (SVM), correlation machines and multinomial goodness. The entire datasets have been imported from the traffic department of the UK with due permission. The information extracted from these huge datasets forms a basis for several predictions, which in turn avoid unnecessary memory lapses. Since data is expected to grow continuously over time, this work primarily proposes a new framework model that can be trained, adapt itself to new data, and make accurate predictions. This work also throws some light on the use of SVM methodology for text classifiers built from the obtained traffic data. Finally, it emphasizes the uniqueness and adaptability of SVM methodology for this kind of research work.
Keywords: support vector mechanism (SVM), machine learning (ML), support vector machines (SVM), department of transportation (DFT)
Procedia PDF Downloads 274
394 Technical and Economic Environment in the Polish Power System as the Basis for Distributed Generation and Renewable Energy Sources Development
Authors: Pawel Sowa, Joachim Bargiel, Bogdan Mol, Katarzyna Luszcz
Abstract:
The article raises the issue of the development of local renewable energy sources and distributed energy production in the context of improving the reliability of the Polish Power System and the beneficial impact on local and national energy security. The paper refers to the current problems of local governments in the process of investing in distributed energy projects, and discusses the future role of, and cooperation between, local power plants and distributed energy. Attention is paid to the opportunity for local communities to develop their own resources and manage energy fuels (biomass, wind, gas mining) and to improve the local energy balance. The material presented takes up the issue of developing the energy potential of municipalities and their future cooperation with the professional energy sector. As an example, practical solutions used in one of the communes in Silesia are presented.
Keywords: distributed generation, mini centers energy, renewable energy sources, reliability of supply of rural commune
Procedia PDF Downloads 600
393 Methods for Distinction of Cattle Using Supervised Learning
Authors: Radoslav Židek, Veronika Šidlová, Radovan Kasarda, Birgit Fuerst-Waltl
Abstract:
Machine learning represents a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction, based on models derived from existing data. The data can present identification patterns which are used to classify into groups. The result of the analysis is a pattern which can be used to identify a data set without the need to obtain the input data used to create this pattern. An important requirement in this process is careful data preparation, validation of the model used, and its suitable interpretation. For breeders, it is important to know the origin of animals from the point of view of genetic diversity. Where pedigree information is missing, other methods can be used to trace an animal's origin. Genetic diversity recorded in genetic data holds relatively useful information for identifying animals originating from individual countries. We can conclude that the application of data mining to molecular genetic data using supervised learning is an appropriate tool for hypothesis testing and identifying an individual.
Keywords: genetic data, Pinzgau cattle, supervised learning, machine learning
Procedia PDF Downloads 550