Search results for: data mining applications and discovery
30534 On Exploring Search Heuristics for improving the efficiency in Web Information Extraction
Authors: Patricia Jiménez, Rafael Corchuelo
Abstract:
Nowadays, the World Wide Web is the most popular source of information, comprising billions of on-line documents. Web mining is used to crawl through these documents, collect the information of interest, and process it with data mining tools so that the gathered information can be used in the best interest of a business, which enables companies to promote their own. Unfortunately, it is not easy to extract the information a web site provides automatically when it lacks an API that transforms the user-friendly data provided in web documents into a structured, machine-readable format. Rule-based information extractors are tools intended to extract the information of interest automatically and offer it in a structured format that allows mining tools to process it. However, the performance of an information extractor strongly depends on the search heuristic employed, since bad choices regarding how to learn a rule may easily result in a loss of effectiveness and/or efficiency. Improving the efficiency of search heuristics is of utmost importance in the field of Web Information Extraction since typical datasets are very large. In this paper, we employ an information extractor based on a classical top-down algorithm that uses the so-called Information Gain heuristic introduced by Quinlan and Cameron-Jones. Unfortunately, Information Gain suffers from some well-known problems, so we analyse an intuitive alternative, Termini, that is clearly more efficient; we also analyse other proposals in the literature and conclude that none of them outperforms this alternative.
Keywords: information extraction, search heuristics, semi-structured documents, web mining.
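To make the Information Gain heuristic mentioned in the abstract concrete, here is a minimal sketch of the FOIL-style gain used to score a candidate condition when a rule is specialised top-down. It is a propositional simplification, not the authors' extractor: the function names and the candidate conditions are hypothetical, and Termini is not shown because the abstract does not define it.

```python
import math


def foil_gain(p0: int, n0: int, p1: int, n1: int) -> float:
    """FOIL-style information gain for adding one condition to a rule.

    p0, n0: positive / negative examples covered before adding the condition
            (p0 must be greater than zero).
    p1, n1: positive / negative examples covered after adding it.
    In this propositional sketch the positives still covered equal p1.
    """
    if p1 == 0:
        return 0.0  # the specialised rule covers no positives, so nothing is gained
    info_before = -math.log2(p0 / (p0 + n0))
    info_after = -math.log2(p1 / (p1 + n1))
    return p1 * (info_before - info_after)


def best_condition(p0, n0, candidates):
    """Return the candidate (name, p1, n1) with the highest gain."""
    return max(candidates, key=lambda c: foil_gain(p0, n0, c[1], c[2]))


# Hypothetical candidate conditions with the examples each would still cover.
candidates = [("has_price_tag", 8, 2), ("is_bold", 9, 9), ("in_table", 4, 0)]
print(best_condition(10, 10, candidates))
```

The heuristic has to be evaluated for every candidate at every specialisation step, which is why its cost matters on large datasets.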
Procedia PDF Downloads 338
30533 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis
Authors: C. B. Le, V. N. Pham
Abstract:
In modern data analysis, multi-source data appears more and more often in real applications. Multi-source data clustering has emerged as an important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. However, in practice multi-source data is often heterogeneous, uncertain, and large, which is considered a major challenge of multi-source data. An ensemble is a versatile machine learning model in which learning techniques can work in parallel on big data. Clustering ensembles have been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most traditional clustering ensemble approaches are based on a single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis, the fuzzy optimized multi-objective clustering ensemble method, called FOMOCE. First, a clustering ensemble mathematical model based on the structure of the multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on standard sample data sets. The experimental results demonstrate the superior performance of the FOMOCE method compared to existing clustering ensemble methods and multi-source clustering methods.
Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering
Procedia PDF Downloads 191
30532 Alternative Approaches to Community Involvement in Resettlement Schemes to Prevent Potential Conflicts: Case Study in Chibuto District, Mozambique
Authors: Constâncio Augusto Machanguana
Abstract:
The world over, resettling communities, for whatever purpose (mining, dams, forestry and wildlife management, roads, or facilitating service delivery), often leads to tensions between those resettled, the investors, and the local and national governments involved in the process. Causes include unclear government legislation and regulations, confusing Corporate Social Responsibility policies and guidelines, and other socio-economic policies that lead to unrealistic expectations among those being resettled, causing frustration within the community that can escalate into conflict with the investors (the company). The exploitation of heavy mineral sands along Mozambique’s long coastline and hinterland has not been providing a benefit for the affected communities. A case in point is the exploration, since 2018, of heavy sands in Chibuto District in the southern province of Gaza. A likely contributing factor is the standard type of socio-economic surveys and community involvement processes that could smooth the relationship among the parties. This research aims to investigate alternative processes to plan, initiate, and guide resettlement in such a way that tensions and conflicts are avoided. Based on the process already finished, and compared with similar cases across the country, mixed methods were adopted to collect primary data: three focus groups of 125 people, representing 324 resettled householders, and five semi-structured interviews with relevant stakeholders such as the local government, NGOs, and local leaders to understand their role in all stages of the process. The preliminary results show that the community has limited or no understanding of the potential impacts of these large-scale explorations, and the apparent harmony between the parties (community and company) may hide the dissatisfaction of those resettled. So, rather than focusing on negative mining impacts, the research contributes to science by identifying the best resettlement approach that can be replicated in other contexts across the country, given the new discoveries of mineral resources.
Keywords: conflict mitigation, resettlement, mining, Mozambique
Procedia PDF Downloads 116
30531 Abandoned Mine Methane Mitigation in the United States
Authors: Jerome Blackman, Pamela Franklin, Volha Roshchanka
Abstract:
The US coal mining sector accounts for 6% of total US methane emissions (2021). 60% of US coal mining methane emissions come from active underground mine ventilation systems. Abandoned mines contribute about 13% of methane emissions from coal mining. While there are thousands of abandoned underground coal mines in the US, the Environmental Protection Agency (EPA) estimates that fewer than 100 have sufficient methane resources for viable methane recovery and use projects. Many abandoned mines are in remote areas far from potential energy customers and may be flooded, further complicating methane recovery. Because these mines are no longer active, recovery projects can be simpler to implement.
Keywords: abandoned mines, coal mine methane, coal mining, methane emissions, methane mitigation, recovery and use
Procedia PDF Downloads 78
30530 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-
Authors: Nieto Bernal Wilson, Carmona Suarez Edgar
Abstract:
Organizations hold structured and unstructured information in different formats, sources, and systems. Part of it comes from ERP systems under OLTP processing that support the information system; at the OLAP processing level, however, these organizations show some deficiencies. Part of the problem lies in a lack of interest in extracting knowledge from their data sources, as well as in the absence of operational capabilities to tackle this kind of project. Data warehouses and their applications are considered non-proprietary tools that are of great interest to business intelligence, since they are the base repositories for creating models or patterns (behavior of customers, suppliers, products, social networks, and genomics) and they facilitate corporate decision making and research. This paper presents a simple, structured methodology inspired by agile development models such as Scrum, XP, and AUP, as well as by object-relational models, spatial data models, and the baseline of data modeling under UML and big data. In this way, it seeks to deliver an agile methodology for developing data warehouses that is simple and easy to apply. The methodology naturally takes into account the application of processes for the respective information analysis, visualization, and data mining, particularly for generating patterns and models derived from the structured fact objects.
Keywords: data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse
Procedia PDF Downloads 412
30529 An Approach to Building a Recommendation Engine for Travel Applications Using Genetic Algorithms and Neural Networks
Authors: Adrian Ionita, Ana-Maria Ghimes
Abstract:
The lack of features and design, and the failure to promote an integrated booking application, are some of the reasons why most online travel platforms only offer automation of old booking processes and are limited to integrating a small number of services without addressing the user experience. This paper presents a practical study on how to improve travel applications by creating user profiles through data mining based on neural networks and genetic algorithms. Choices made by users and their ‘friends’ in the ‘social’ network context can be considered input data for a recommendation engine. The purpose of using these algorithms and this design is to improve user experience and to deliver more features to the users. The paper aims to highlight a broader range of improvements that could be applied to travel applications in terms of design and service integration, while the main scientific approach remains the technical implementation of the neural network solution. The motivation for the technologies used is also related to the initiative of some online booking providers that have made public the fact that they use some ‘neural network’ related designs. These companies use similar Big Data technologies to provide recommendations for hotels, restaurants, and cinemas with a neural network based recommendation engine for building a user ‘DNA profile’. This implementation of the ‘profile’, a collection of neural networks trained from previous user choices, can improve the usability and design of any type of application.
Keywords: artificial intelligence, big data, cloud computing, DNA profile, genetic algorithms, machine learning, neural networks, optimization, recommendation system, user profiling
Procedia PDF Downloads 164
30528 Enhancing Students’ Achievement, Interest and Retention in Chemistry through an Integrated Teaching/Learning Approach
Authors: K. V. F. Fatokun, P. A. Eniayeju
Abstract:
This study concerns the effects of a concept mapping-guided discovery integrated teaching approach on the learning style and achievement of chemistry students. The sample comprised 162 senior secondary school (SS 2) students drawn from two science schools in Nasarawa State, which had equivalent pre-test mean scores of 9.68 and 9.49. Five instruments were developed and validated, while a sixth was adopted by the investigator for the study. Four null hypotheses were tested at the α = 0.05 level of significance. Chi-square analysis showed a significant shift in students’ learning style from accommodating and diverging to converging and assimilating when exposed to the concept mapping-guided discovery approach. Also, the t-test and ANOVA showed that those in the experimental group achieved and retained the content learnt better. Results of Scheffe’s test for multiple comparisons showed that boys in the experimental group performed better than girls. It is therefore concluded that the concept mapping-guided discovery integrated approach should be used in secondary schools to successfully teach electrochemistry. It is strongly recommended that chemistry teachers be encouraged to adopt this method for teaching difficult concepts.
Keywords: integrated teaching approach, concept mapping-guided discovery, achievement, retention, learning styles and interest
Procedia PDF Downloads 329
30527 Green Crypto Mining: A Quantitative Analysis of the Profitability of Bitcoin Mining Using Excess Wind Energy
Authors: John Dorrell, Matthew Ambrosia, Abilash
Abstract:
This paper employs econometric analysis to quantify the potential profit wind farms can receive by allocating excess wind energy to power bitcoin mining machines. Cryptocurrency mining consumes a substantial amount of electricity worldwide, and wind farms produce a significant amount of energy that is lost because of the intermittent nature of the resource: supply does not always match consumer demand. By combining the weaknesses of these two technologies, we can improve efficiency and create a sustainable path to mine cryptocurrencies. This paper uses historical wind energy data from the ERCOT network in Texas and cryptocurrency data from 2000-2021 to create 4-year return on investment projections. Our research model incorporates the price of bitcoin, the price of the miner, the hash rate of the miner relative to the network hash rate, the block reward, the bitcoin transaction fees awarded to the miners, the mining pool fees, the cost of the electricity, and the percentage of time the miner will be running to demonstrate that wind farms generate enough excess energy to mine bitcoin profitably. Excess wind energy can be used as a financial battery, which utilizes wasted electricity by changing it into economic energy. The findings of our research indicate that wind energy producers can earn a profit while taking little, if any, electricity away from the grid. According to our results, Bitcoin mining could give as much as 1347% and 805% return on investment with starting dates of November 1, 2021, and November 1, 2022, respectively, using wind farm curtailment. This paper is helpful to policymakers and investors in determining efficient and sustainable ways to power our economic future. It proposes a practical solution to the problem of crypto mining energy consumption and a more sustainable energy future for Bitcoin.
Keywords: bitcoin, mining, economics, energy
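The inputs listed in the abstract can be combined into a simple return-on-investment calculation. The sketch below is illustrative only and is not the authors' econometric model: it holds the bitcoin price and network hash rate constant, the miner's power draw is an added assumption needed to price electricity, and every figure in the example scenario is hypothetical.

```python
from dataclasses import dataclass

BLOCKS_PER_DAY = 144  # Bitcoin targets roughly one block every 10 minutes


@dataclass
class MinerScenario:
    btc_price_usd: float
    miner_price_usd: float
    miner_hashrate_ths: float
    network_hashrate_ths: float
    block_reward_btc: float
    fees_per_block_btc: float
    pool_fee: float                 # fraction of revenue, e.g. 0.02
    electricity_usd_per_kwh: float  # 0.0 models otherwise-curtailed energy
    miner_power_kw: float           # assumption: not listed in the abstract
    uptime: float                   # fraction of time the miner is running


def daily_profit_usd(s: MinerScenario) -> float:
    """Expected daily profit from the miner's share of the network hash rate."""
    share = s.miner_hashrate_ths / s.network_hashrate_ths
    btc_per_day = share * BLOCKS_PER_DAY * (s.block_reward_btc + s.fees_per_block_btc)
    revenue = btc_per_day * (1 - s.pool_fee) * s.uptime * s.btc_price_usd
    power_cost = s.miner_power_kw * 24 * s.uptime * s.electricity_usd_per_kwh
    return revenue - power_cost


def roi_percent(s: MinerScenario, days: int) -> float:
    """Return on the miner purchase price after `days`, in percent."""
    return 100 * (daily_profit_usd(s) * days - s.miner_price_usd) / s.miner_price_usd


# Hypothetical scenario: a 100 TH/s miner running half the time on free excess energy.
scenario = MinerScenario(50_000, 5_000, 100, 200_000_000, 6.25, 0.25, 0.02, 0.0, 3.0, 0.5)
print(round(daily_profit_usd(scenario), 3), round(roi_percent(scenario, 4 * 365), 1))
```

A projection closer to the paper's would replace the constant price and hash rate with time series and account for block reward halvings over the 4-year horizon.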
Procedia PDF Downloads 37
30526 Clustering of Association Rules of ISIS & Al-Qaeda Based on Similarity Measures
Authors: Tamanna Goyal, Divya Bansal, Sanjeev Sofat
Abstract:
In world-threatening terrorist attacks, early detection, distinction, and prediction are effective diagnosis techniques, and many data mining and statistical approaches exist to ensure a functionally accurate and precise analysis of terrorism data. The computational extraction of derived patterns is a non-trivial task that comprises specific domain discovery by means of sophisticated algorithm design and analysis. This paper proposes an approach for similarity extraction that obtains the useful attributes from the available datasets of terrorist attacks, applies a feature selection technique based on statistical impurity measures, and then applies clustering techniques on the basis of similarity measures. The associative dependencies between the attacks are analyzed on the basis of the degree of participation of attributes in the rules. To compute the similarity among the discovered rules, we applied a weighted similarity measure. Finally, the rules are grouped by applying hierarchical clustering. We have applied the approach to an open source dataset to determine the usability and efficiency of our technique, and a literature search was also carried out to support the efficiency and accuracy of our results.
Keywords: association rules, clustering, similarity measure, statistical approaches
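The last two steps of the pipeline, a weighted similarity between rules followed by hierarchical clustering, can be sketched as follows. The abstract does not specify the similarity measure or the linkage, so this example assumes a weighted Jaccard similarity over rule items and average linkage; the rule items and weights are hypothetical placeholders.

```python
def weighted_similarity(rule_a: set, rule_b: set, weights: dict) -> float:
    """Weighted Jaccard similarity between two rules given as sets of items.

    Items missing from `weights` count with weight 1.0.
    """
    union = rule_a | rule_b
    if not union:
        return 1.0
    shared = sum(weights.get(item, 1.0) for item in rule_a & rule_b)
    total = sum(weights.get(item, 1.0) for item in union)
    return shared / total


def cluster_rules(rules: list, weights: dict, threshold: float) -> list:
    """Average-linkage agglomerative clustering of rules.

    Merging stops once no pair of clusters is at least `threshold` similar.
    Returns clusters as lists of rule indices.
    """
    clusters = [[i] for i in range(len(rules))]

    def linkage(c1, c2):
        sims = [weighted_similarity(rules[i], rules[j], weights) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        sim, x, y = max(
            (linkage(a, b), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if sim < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical rules; each item is an attribute=value pair taking part in a rule.
rules = [
    {"method=A", "target=X", "region=R1"},
    {"method=A", "target=X"},
    {"method=B", "target=Y"},
    {"method=B", "target=Y", "region=R2"},
]
weights = {"method=A": 2.0, "method=B": 2.0}  # items that participate more weigh more
print(cluster_rules(rules, weights, threshold=0.5))
```

Weighting items by how often they participate in rules is one way to reflect the "degree of participation of attributes" the abstract mentions.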
Procedia PDF Downloads 321
30525 Environmental Impact Assessment in Mining Regions with Remote Sensing
Authors: Carla Palencia-Aguilar
Abstract:
Calculations of Net Carbon Balance can be obtained by means of Net Biome Productivity (NBP), Net Ecosystem Productivity (NEP), and Net Primary Production (NPP). The latter is an important component of the biosphere carbon cycle and is easily obtained from MODIS MOD17A3HGF data; however, the results are only available yearly. To overcome this limitation in data availability, bands 33 to 36 from MODIS MYD021KM (obtained on a daily basis) were analyzed and compared with NPP data from the years 2000 to 2021 at 7 sites where surface mining takes place in the Colombian territory. Coal, gold, iron, and limestone were the minerals of interest. Scales and units, as well as thermal anomalies, were considered for the net carbon balance per location. The NPP time series from the satellite images were filtered by using two Matlab filters: First Order and Discrete Transfer. After filtering the NPP time series, comparing the graph results with the satellite image values, and running a linear regression, the results showed R² values from 0.72 to 0.85. To establish comparable units between NPP and bands 33 to 36, the Greenhouse Gas Equivalencies Calculator by EPA was used. The comparison was established in two ways: one by the sum of all the data per point per year, and the other by the average of 46 weeks and finding the percentage that the value represented with respect to NPP. The former underestimated the total CO2 emissions. The results also showed that coal and gold mining in the last 22 years had lower CO2 emissions than limestone, with an average per year of 143 kton CO2 eq for gold, 152 kton CO2 eq for coal, and 287 kton CO2 eq for iron. Limestone emissions varied from 206 to 441 kton CO2 eq. The maximum emission values from unfiltered data correspond to 165 kton CO2 eq for gold, 188 kton CO2 eq for coal, and 310 kton CO2 eq for iron, with limestone varying from 231 to 490 kton CO2 eq. If the most polluting limestone site improves its production technology, limestone could reach a maximum of 318 kton CO2 eq emissions per year, a value very similar to that of iron. The importance of gathering data is to establish benchmarks in order to attain the 2050 zero emissions goal.
Keywords: carbon dioxide, NPP, MODIS, MINING
Procedia PDF Downloads 105
30524 Dietary Risk Assessment of Green Leafy Vegetables (GLV) Due to Heavy Metals from Selected Mining Areas
Authors: Simon Mensah Ofosu
Abstract:
Illicit surface mining activities pollute agricultural lands and water bodies and result in the accumulation of heavy metals in vegetables cultivated in such areas. Heavy metal (HM) accumulation in vegetables is a serious food safety issue due to the adverse effects of metal toxicities, hence the need to investigate the levels of these metals in vegetables cultivated in the Eastern Region. Cocoyam leaves, cabbage, and cucumber were sampled from selected farms in mining areas (Atiwa District) and non-mining areas (Yilo Krobo and East Akim District) of the region for the study. Levels of cadmium, lead, mercury, and arsenic in the vegetables were investigated with an Atomic Absorption Spectrometer, and the results were statistically analyzed with a Microsoft Office Excel (2013) spreadsheet and ANOVA. Cadmium (Cd) and arsenic (As) were the most and least concentrated HMs in the vegetables sampled, respectively. The mean concentrations of Cd and Pb, respectively, in cabbage (0.564 mg/kg, 0.470 mg/kg), cucumber (0.389 mg/kg, 0.190 mg/kg), and cocoyam leaves (0.410 mg/kg, 0.256 mg/kg) from the mining areas exceeded the permissible limits set by the Joint FAO/WHO. The mean concentrations of the metals in vegetables from the mining and non-mining areas varied significantly (P<0.05). The Target Hazard Quotient (THQ) was used to assess the health risk posed to the human population via vegetable consumption. The THQ values of cadmium, mercury, and lead in adults and children through vegetable consumption in the mining areas were greater than 1 (THQ>1). This indicates a potential health risk for the children and adults in these areas. The THQ values of adults and children in the non-mining areas were less than the safe limit of 1 (THQ<1), hence no significant health risk is posed to the population of such areas.
Keywords: food safety, risk assessment, illicit mining, public health, contaminated vegetables
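The Target Hazard Quotient used in the abstract is commonly computed with the USEPA-style formula THQ = (EF × ED × FIR × C × 10⁻³) / (RfD × BW × AT). The sketch below follows that common form; the abstract does not state which parameter values the author used, so the ingestion rate, body weight, exposure duration, and oral reference dose in the example are assumptions, and only the cadmium concentration in cabbage (0.564 mg/kg) comes from the text.

```python
def target_hazard_quotient(
    conc_mg_per_kg: float,          # C: metal concentration in the vegetable
    intake_g_per_day: float,        # FIR: vegetable ingestion rate
    body_weight_kg: float,          # BW
    rfd_mg_per_kg_day: float,       # RfD: oral reference dose for the metal
    exposure_days_per_year: float = 365,  # EF
    exposure_years: float = 30,           # ED (assumed)
) -> float:
    """Non-carcinogenic THQ; values above 1 indicate a potential health risk."""
    averaging_days = 365 * exposure_years  # AT for non-carcinogens
    daily_dose = (
        exposure_days_per_year * exposure_years
        * intake_g_per_day * conc_mg_per_kg * 1e-3  # g -> kg of vegetable
    ) / (body_weight_kg * averaging_days)
    return daily_dose / rfd_mg_per_kg_day


# Cd in cabbage from the mining areas (0.564 mg/kg, from the abstract).
# Assumed: 200 g/day intake, 60 kg adult, RfD of 0.001 mg/kg/day for cadmium.
thq_cd = target_hazard_quotient(0.564, 200, 60, 0.001)
print(round(thq_cd, 2), "potential risk" if thq_cd > 1 else "below safe limit")
```

Children are usually assessed with a lower body weight and intake, which is why the same concentration can give a different THQ for each group.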
Procedia PDF Downloads 95
30523 Presenting a Model for Predicting the State of Being Accident-Prone of Passages According to Neural Network and Spatial Data Analysis
Authors: Hamd Rezaeifar, Hamid Reza Sahriari
Abstract:
Accidents are considered to be one of the challenges of modern life. Because the number of victims and the volume of domestic transportation are increasing day by day in Iran, studying the factors that affect accidents and identifying suitable models and parameters are absolutely essential. The main purpose of this research was to study the factors and spatial data affecting accidents in Mashhad during 2007-2008. In this paper, spatial layers were overlaid on one another and matched with the accident locations. In the first step, the existing accident databases were completed by adding accident landmarks and special fields recording the presence or absence of phenomena that affect accidents. In the next step, data mining tools and neural network analysis were used to evaluate the relationships among these data and to design a logical model for predicting accident-prone spots with minimum error. The model presented in this article predicts low-accident spots very accurately, yet it makes more errors in accident-prone regions due to a lack of primary data.
Keywords: accident, data mining, neural network, GIS
Procedia PDF Downloads 48
30522 A Method to Evaluate and Compare Web Information Extractors
Authors: Patricia Jiménez, Rafael Corchuelo, Hassan A. Sleiman
Abstract:
Web mining is gaining importance at an increasing pace. Currently, there are many complementary research topics under this umbrella. Their common theme is that they all focus on applying knowledge discovery techniques to data that is gathered from the Web. Sometimes these data are relatively easy to gather, chiefly when they come from server logs. Unfortunately, there are cases in which the data to be mined is the data displayed on a web document. In such cases, it is necessary to apply a pre-processing step to first extract the information of interest from the web documents. Such pre-processing steps are performed using so-called information extractors, which are software components that are typically configured by means of rules tailored to extracting the information of interest from a web page and structuring it according to a pre-defined schema. Paramount to getting good mining results is that the technique used to extract the source information is exact, which requires evaluating and comparing the different proposals in the literature from an empirical point of view. According to Google Scholar, about 4,200 papers on information extraction have been published during the last decade. Unfortunately, they were not evaluated within a homogeneous framework, which makes it difficult to compare them empirically. In this paper, we report on an original information extraction evaluation method. Our contribution is three-fold: a) this is the first attempt to provide an evaluation method for proposals that work on semi-structured documents; the little existing work on this topic focuses on proposals that work on free text, which has little to do with extracting information from semi-structured documents; b) it provides a method that relies on statistically sound tests to support the conclusions drawn; the previous work does not provide clear guidelines or recommend statistically sound tests, but rather a survey that collects many features to take into account as well as related work; c) we provide a novel method to compute the performance measures for unsupervised proposals; otherwise, they would require the intervention of a user to compute them by using the annotations on the evaluation sets and the information extracted. Our contributions will definitely help researchers in this area make sure that they have advanced the state of the art not only conceptually but also from an empirical point of view; they will also help practitioners make informed decisions on which proposal is the most adequate for a particular problem. This conference is a good forum to discuss our ideas so that we can spread them to help improve the evaluation of information extraction proposals and gather valuable feedback from other researchers.
Keywords: web information extractors, information extraction evaluation method, Google Scholar, web
Procedia PDF Downloads 248
30521 Q-Map: Clinical Concept Mining from Clinical Documents
Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala
Abstract:
Over the past decade, there has been a steep rise in data-driven analysis in major areas of medicine, such as clinical decision support systems, survival analysis, patient similarity analysis, and image analytics. Most of the data in the field are well-structured and available in numerical or categorical formats that can be used for experiments directly. At the opposite end of the spectrum, however, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature. Such data can be found in discharge summaries, clinical notes, and procedural notes, which are written in human narrative form and have neither a relational model nor any standard grammatical structure. An important step in using these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map, a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique based on a string matching algorithm that is indexed on curated knowledge sources and is both fast and configurable. The authors also briefly compare its performance with that of MetaMap, one of the most reputed tools for medical concept retrieval, and present the advantages the former displays over the latter.
Keywords: information retrieval, unified medical language system, syntax based analysis, natural language processing, medical informatics
Procedia PDF Downloads 135
30520 Concept Drifts Detection and Localisation in Process Mining
Authors: M. V. Manoj Kumar, Likewin Thomas, Annappa
Abstract:
Process mining provides methods and techniques for analyzing event logs recorded in modern information systems that support real-world operations. When analyzing an event log, state-of-the-art process mining techniques assume that the operational process is a static (stationary) entity. This is often not the case, owing to a phenomenon called concept drift. During its period of execution, a process can experience concept drift and evolve with respect to any of its associated perspectives, exhibiting various patterns of change at different paces. The work presented in this paper discusses the main aspects to consider when addressing the concept drift phenomenon and proposes a method for detecting and localizing sudden concept drifts in the control-flow perspective of the process by using features extracted from the traces in the process log. Our experimental results are promising with regard to efficiently detecting and localizing concept drift in the context of the process mining research discipline.
Keywords: abrupt drift, concept drift, sudden drift, control-flow perspective, detection and localization, process mining
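One simple way to picture detection and localisation of a sudden control-flow drift is to extract a feature from each trace and compare adjacent sliding windows over the log. The sketch below does this with a single "directly follows" feature and a fixed threshold. It is an illustration under those assumptions, not the authors' method, whose features and statistical tests are not given in the abstract; the activity names are hypothetical.

```python
def directly_follows(trace: list, a: str, b: str) -> bool:
    """True if activity `b` occurs immediately after activity `a` in the trace."""
    return any(x == a and y == b for x, y in zip(trace, trace[1:]))


def detect_sudden_drift(traces, a, b, window=50, threshold=0.5):
    """Return the trace index where the feature frequency changes most.

    Compares the share of traces containing `a -> b` in the `window` traces
    before and after each candidate position. Returns None when the largest
    change stays below `threshold`.
    """
    feature = [1.0 if directly_follows(t, a, b) else 0.0 for t in traces]
    best_gap, best_pos = 0.0, None
    for i in range(window, len(feature) - window + 1):
        left = sum(feature[i - window:i]) / window
        right = sum(feature[i:i + window]) / window
        gap = abs(right - left)
        if gap > best_gap:
            best_gap, best_pos = gap, i
    return best_pos if best_gap >= threshold else None


# Synthetic log: the order of "check" and "pay" is swapped after 200 traces.
log = [["register", "check", "pay"]] * 200 + [["register", "pay", "check"]] * 200
print(detect_sudden_drift(log, "check", "pay"))
```

A practical detector would track many features at once and replace the fixed threshold with a statistical hypothesis test over the two windows.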
Procedia PDF Downloads 348
30519 Reclamation of Mining Using Vegetation - A Comparative Study of Open Pit Mining
Authors: G. Surendra Babu
Abstract:
We all know the importance of mineral wealth, which has been buried inside the layers of the earth for decades. Minerals are natural resources used in our day-to-day life for fuel, electricity, construction, and so on, but the process of extraction causes damage to nature that cannot be undone, and the sites left over after the completion of mining remain barren, unused, degraded land for decades. Most of these sites are covered with vegetation before mining starts; mining damages the native vegetation, disturbs the watershed boundaries, and disturbs the biodiversity of the region. The main aim of the study is to understand the various issues that arise; to understand the reclamation methods that are suitable for revegetation and are practiced in different case studies; to review the government guidelines and procedures for lease licenses, including environmental clearances; and to study the vegetation pattern according to the major issues identified. Finally, new guidelines are suggested with respect to the old ones to help revegetate mine sites and establish self-sustaining ecosystems in the future.
Keywords: reclamation, open-pit mining, revegetation, reclamation methods
Procedia PDF Downloads 193
30518 Development of Knowledge Discovery Based Interactive Decision Support System on Web Platform for Maternal and Child Health System Strengthening
Authors: Partha Saha, Uttam Kumar Banerjee
Abstract:
Maternal and Child Healthcare (MCH) has always been regarded as one of the important issues globally. Reducing maternal and child mortality rates and increasing healthcare service coverage were declared targets in the Millennium Development Goals until 2015 and thereafter an important component of the Sustainable Development Goals. Over the last decade, worldwide MCH indicators have improved but could not match the expected levels. Progress in both maternal and child mortality rates has been monitored by several researchers. Each of these studies stated that fewer than 26% of low-income and middle-income countries (LMICs) were on track to achieve the targets prescribed by MDG4. As of 2011, the average worldwide annual rates of reduction of the under-five mortality rate and the maternal mortality rate were 2.2% and 1.9%, respectively, whereas rates of at least 4.4% and 5.5% per year are needed to achieve the targets. Although proven healthcare interventions exist for both mothers and children, they could not be scaled up to the required volume due to fragmented health systems, especially in developing and under-developed countries. In this research, a knowledge discovery based interactive Decision Support System (DSS) has been developed on a web platform to assist healthcare policy makers in developing evidence-based policies. To achieve desirable results in MCH, efficient resource planning is essential. In most LMICs, resources are a major constraint. Knowledge generated through this system would help healthcare managers develop strategic resource plans for combating issues such as high inequity and low coverage in MCH. The system would help healthcare managers accomplish the following four tasks: a) comprehending region-wise conditions of variables related to MCH, b) identifying relationships among variables, c) segmenting regions based on the status of variables, and d) finding segment-wise key influential variables that have a major impact on healthcare indicators. The whole system development process was divided into three phases: i) identifying contemporary issues related to MCH services and policy making; ii) development of the system; and iii) verification and validation of the system. More than 90 variables under three categories, namely a) educational, social, and economic parameters; b) MCH interventions; and c) health system building blocks, have been included in this web-based DSS, and five separate modules have been developed under the system. The first module is designed for analysing the current healthcare scenario. The second module helps healthcare managers understand correlations among variables. The third module reveals frequently occurring incidents along with different MCH interventions. The fourth module segments regions based on the three categories mentioned above, and in the fifth module, segment-wise key influential interventions are identified. India has been considered as the case study area in this research. Data from 601 districts of India have been used to inspect the effectiveness of the developed modules. The system has been developed by incorporating different statistical and data mining techniques on a web platform. Aided by its interactive capability, policy makers would be able to generate different scenarios from the system before drawing any inference.
Keywords: maternal and child healthcare, decision support systems, data mining techniques, low and middle income countries
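The annual rates of reduction quoted in the abstract can be related to mortality targets with the usual compound-rate definition, where a rate falling from r0 to r1 over t years has ARR = 1 - (r1 / r0)^(1/t). The sketch below assumes that definition, which the abstract does not spell out; only the 2.2% and 4.4% under-five rates come from the text.

```python
import math


def annual_rate_of_reduction(start_rate: float, end_rate: float, years: float) -> float:
    """Compound annual rate of reduction, as a fraction (0.044 means 4.4%)."""
    return 1 - (end_rate / start_rate) ** (1 / years)


def years_to_target(current_rate: float, target_rate: float, arr: float) -> float:
    """Years needed to fall from `current_rate` to `target_rate` at a constant ARR."""
    return math.log(target_rate / current_rate) / math.log(1 - arr)


# MDG4 called for a two-thirds reduction in under-five mortality, so compare
# how long that takes at the observed 2.2% and the required 4.4% per year.
for arr in (0.022, 0.044):
    print(arr, round(years_to_target(1.0, 1 / 3, arr), 1))
```

Running the comparison shows why the gap matters: halving the annual rate roughly doubles the time needed to reach the same target.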
Procedia PDF Downloads 259
30517 Multi-scale Spatial and Unified Temporal Feature-fusion Network for Multivariate Time Series Anomaly Detection
Authors: Hang Yang, Jichao Li, Kewei Yang, Tianyang Lei
Abstract:
Multivariate time series anomaly detection is a significant research topic in the field of data mining, encompassing a wide range of applications across various industrial sectors such as traffic roads, financial logistics, and corporate production. The inherent spatial dependencies and temporal characteristics present in multivariate time series introduce challenges to the anomaly detection task. Previous studies have typically been based on the assumption that all variables belong to the same spatial hierarchy, neglecting the multi-level spatial relationships. To address this challenge, this paper proposes a multi-scale spatial and unified temporal feature fusion network, denoted as MSUT-Net, for multivariate time series anomaly detection. The proposed model employs a multi-level modeling approach, incorporating both temporal and spatial modules. The spatial module is designed to capture the spatial characteristics of multivariate time series data, utilizing an adaptive graph structure learning model to identify the multi-level spatial relationships between data variables and their attributes. The temporal module consists of a unified temporal processing module, which is tasked with capturing the temporal features of multivariate time series. This module is capable of simultaneously identifying temporal dependencies among different variables. Extensive testing on multiple publicly available datasets confirms that MSUT-Net achieves superior performance on the majority of datasets. Our method is able to model and accurately detect systems data with multi-level spatial relationships from a spatial-temporal perspective, providing a novel perspective for anomaly detection analysis.
Keywords: data mining, industrial system, multivariate time series, anomaly detection
Procedia PDF Downloads 17
30516 Determination of Safe Ore Extraction Methodology beneath Permanent Extraction in a Lead Zinc Mine with the Help of FLAC3D Numerical Model
Authors: Ayan Giri, Lukaranjan Phukan, Shantanu Karmakar
Abstract:
Structure and tectonics play a vital role in ore genesis and deposition. The existence of a swelling structure below the current level of a mine led to the discovery of ore below some permanent developments of the mine. Discovering and extracting the ore body is critical to sustaining the business requirements of the mine. The challenge was to extract the ore without hampering the global stability of the mine. To do so, different mining options were considered and analysed by numerical modelling in FLAC3D software. The constitutive model used for this simulation is the improved unified constitutive model (IUCM), which can predict the stress-strain relationships in a continuum model more accurately. The IUCM employs the Hoek-Brown criterion to determine the instantaneous Mohr-Coulomb parameters cohesion (c) and friction (ɸ) at each level of confining stress. The extra swelled part measures 50 m in north-south strike width and 50 m in east-west strike width. On the north side, a stope (P1) with a north-south width of 25 m has already been excavated. The options considered were: (a) open stoping of the southern part (P0) of 50 m to the full extent; (b) extraction of 25 m of the southern part, then filling of both primaries and extraction of the 25 m secondary (S0) in between; and (c) complete extraction of the southern part (P0), preceded by backfilling, with a modified design of the secondary (S0) for the overall stability of the permanent excavation above the stoping.
Keywords: extraction, IUCM, FLAC 3D, stoping, tectonics
Procedia PDF Downloads 214
30515 Dissimilarity Measure for General Histogram Data and Its Application to Hierarchical Clustering
Authors: K. Umbleja, M. Ichino
Abstract:
Symbolic data mining has been developed to analyze very large datasets. It is also useful when entry-specific details should remain hidden. Symbolic data mining is quickly gaining popularity as the datasets to be analyzed grow ever larger. One type of symbolic data is the histogram, which makes it possible to store large amounts of information in a single variable with a high level of granularity. Other types of symbolic data can also be described as histograms, which makes the histogram a very important and general symbolic data type: a method developed for histograms can also be applied to other types of symbolic data. Because of their complex structure, histograms are complicated to analyze. This paper proposes a method for comparing two histogram-valued variables and thereby finding a dissimilarity between two histograms. The proposed method uses the Ichino-Yaguchi dissimilarity measure for mixed feature-type data analysis as a base and develops a dissimilarity measure specifically for histogram data, which makes it possible to compare histograms with different numbers of bins and bin widths (so-called general histograms). The proposed dissimilarity measure is then used as a measure for clustering. Furthermore, a linkage method based on weighted averages is proposed, together with the concept of cluster compactness to measure the quality of clustering. The method is then validated by application to real datasets. As a result, the proposed dissimilarity measure is found to produce adequate and comparable results with general histograms without loss of detail or the need to transform the data.
Keywords: dissimilarity measure, hierarchical clustering, histograms, symbolic data analysis
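To make the idea of comparing histograms with different bins concrete, here is a minimal sketch. It is not the Ichino-Yaguchi-based measure from the abstract, whose details are not given here. It swaps in a much simpler stand-in: both histograms are re-binned onto the union of their bin edges, assuming values are spread uniformly inside each bin, and the L1 distance between the resulting bin masses is returned. The function names `rebin` and `l1_dissimilarity` are hypothetical.

```python
def rebin(edges, probs, new_edges):
    """Redistribute bin masses onto new_edges, assuming uniform density per bin."""
    masses = []
    for lo, hi in zip(new_edges, new_edges[1:]):
        mass = 0.0
        for a, b, p in zip(edges, edges[1:], probs):
            # Length of the overlap between the old bin [a, b] and the new bin [lo, hi].
            overlap = max(0.0, min(hi, b) - max(lo, a))
            mass += p * overlap / (b - a)
        masses.append(mass)
    return masses


def l1_dissimilarity(edges1, probs1, edges2, probs2):
    """L1 distance between two histograms after re-binning onto shared edges.

    This is an illustrative stand-in, not the measure proposed in the paper.
    """
    common = sorted(set(edges1) | set(edges2))
    r1 = rebin(edges1, probs1, common)
    r2 = rebin(edges2, probs2, common)
    return sum(abs(x - y) for x, y in zip(r1, r2))


# One coarse bin versus two finer bins (illustrative values only).
print(l1_dissimilarity([0, 2], [1.0], [0, 1, 2], [0.5, 0.5]))
```

Re-binning onto shared edges is only one way to handle differing bin counts and widths; the uniform-within-bin assumption is the usual convention for histogram-valued data but is stated here as an assumption.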
Procedia PDF Downloads 162
30514 Application of Data Mining for Aquifer Environmental Assessment
Authors: Saman Javadi, Mehdi Hashemy, Mohahammad Mahmoodi
Abstract:
Vulnerability maps are an important tool for managing the entry of pollution into aquifers. The common way to produce a vulnerability map is the DRASTIC method. However, the method is not easy to apply to every aquifer because appropriate constant values of weights and ranks must be chosen. In this study, a new approach using k-means clustering is applied to produce vulnerability maps. Four features, namely depth to groundwater, hydraulic conductivity, recharge value and vadose zone, were considered simultaneously as clustering features. Five regions recognized in the case study represent zones with different levels of vulnerability. The results show that clustering provides a realistic vulnerability map: the Pearson correlation coefficient between nitrate concentrations and clustering-based vulnerability is 61%.
Keywords: clustering, data mining, groundwater, vulnerability assessment
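The two computational steps named in this abstract, k-means clustering on the four features and a Pearson correlation check, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the min-max scaling, the deterministic farthest-point initialization, the function names, and the sample values are all assumptions made for the example.

```python
import math


def min_max_scale(rows):
    """Rescale every column to [0, 1] so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]


def farthest_point_init(points, k):
    """Deterministic start (an assumption): first point, then repeatedly the farthest point."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iteration: assign each point to its nearest centroid, then recompute means."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic grid cells (illustrative values only): depth to groundwater [m],
# hydraulic conductivity [m/day], recharge [mm/yr], vadose-zone rating.
cells = [(5.0, 12.0, 180.0, 8.0), (6.0, 10.0, 170.0, 7.0),
         (40.0, 1.0, 40.0, 3.0), (45.0, 0.8, 35.0, 2.0)]
labels, _ = kmeans(min_max_scale(cells), k=2)
print(labels)
```

To reproduce the kind of check reported above, each cluster would be given an ordinal vulnerability level, and `pearson` would then be called on those per-cell levels and the measured nitrate concentrations. How the levels are assigned is not stated in the abstract.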
Procedia PDF Downloads 604
30513 Attributes That Influence Respondents When Choosing a Mate in Internet Dating Sites: An Innovative Matching Algorithm
Authors: Moti Zwilling, Srečko Natek
Abstract:
This paper presents an innovative predictive analytics approach for finding the best match between two consumers who are looking for a partner on internet sites. The methodology is based on an analysis of consumer preferences and involves data mining and machine learning search techniques. The study is composed of two parts. The first part uses descriptive statistics to examine the correlations between a set of parameters collected from men and women who intend to meet each other through social media, usually the internet. In this part, several hypotheses were examined and statistical analyses were carried out. The results show a strong correlation between the affiliated attributes of men and women with regard to how they present themselves on social media such as "Facebook". One interesting finding is the strong desire of most respondents to develop a serious relationship. In the second part, the authors used common data mining algorithms to search for and classify the most important and effective attributes that affect the response rate of the other side. The results show that personal presentation and educational background are the attributes most effective in eliciting a positive attitude toward one's profile from the potential mate.
Keywords: dating sites, social networks, machine learning, decision trees, data mining
Procedia PDF Downloads 295
30512 Library on the Cloud: Universalizing Libraries Based on Virtual Space
Authors: S. Vanaja, P. Panneerselvam, S. Santhanakarthikeyan
Abstract:
Cloud computing is a recent trend in libraries. By adopting cloud services, librarians can keep up with present-day information handling and satisfy the needs of the knowledge society. Libraries are now working to make all their information universally available to users, and they are turning to clouds, which give the easiest access to data and applications. Cloud computing is a highly scalable platform promising quick access to hardware and software over the internet, in addition to easy management and access by non-expert users. In this paper, we discuss the cloud’s features and its potential applications in libraries and information centers, illustrate how cloud computing actually works, and describe how it can be implemented. The paper discusses the reasons for moving to the cloud and the process of migration. In addition, it assesses the practical problems that arise during migration in libraries, the advantages of the migration process, and the measures that libraries should follow when migrating to the cloud. The paper also highlights the benefits of cloud computing and some concerns regarding data ownership and data security.
Keywords: cloud computing, cloud-service, cloud based-ILS, cloud-providers, discovery service, IaaS, PaaS, SaaS, virtualization, Web scale access
Procedia PDF Downloads 663
30511 Design of a Service-Enabled Dependable Integration Environment
Authors: Fuyang Peng, Donghong Li
Abstract:
The aim of information systems integration is to integrate all data sources, applications and business flows into the new environment so that unwanted redundancies are reduced and bottlenecks and mismatches are eliminated. Two issues have to be dealt with to meet such requirements: the software architecture that supports resource integration, and the adaptor development tool that helps with the integration and migration of legacy applications. In this paper, a service-enabled dependable integration environment (SDIE) is presented, which has two key components: a dependable service integration platform and a legacy application integration tool. For the dependable service integration platform, the service integration bus, the service management framework, the dependable engine for service composition, and the service registry and discovery components are described. For the legacy application integration tool, its basic organization, functionalities and the dependability measures taken are presented. Owing to its service-oriented integration model, its light-weight extensible container, its service component combination-oriented p-lattice structure, and other features, SDIE has advantages in openness, flexibility, performance-price ratio and feature support over commercial products, and it is better than most open source integration software in functionality, performance and dependability support.
Keywords: application integration, dependability, legacy, SOA
Procedia PDF Downloads 361
30510 Static and Dynamic Tailings Dam Monitoring with Accelerometers
Authors: Cristiana Ortigão, Antonio Couto, Thiago Gabriel
Abstract:
In the wake of Samarco Fundão’s failure in 2015, followed by Vale’s Brumadinho disaster in 2019, the Brazilian National Mining Agency started a comprehensive dam safety programme to rank dam safety risks and establish monitoring and analysis procedures. This paper focuses on the use of accelerometers for static and dynamic applications. Static applications may employ tiltmeters, as shown by an example in the paper. Dynamic monitoring of a structure with accelerometers yields its dynamic signature; this technique has been used successfully in Brazil, and the paper gives an example for a tailings dam.
Keywords: instrumentation, dynamic, monitoring, tailings, dams, tiltmeters, automation
Procedia PDF Downloads 150
30509 High-Throughput, Purification-Free, Multiplexed Profiling of Circulating miRNA for Discovery, Validation, and Diagnostics
Authors: J. Hidalgo de Quintana, I. Stoner, M. Tackett, G. Doran, C. Rafferty, A. Windemuth, J. Tytell, D. Pregibon
Abstract:
We have developed the Multiplexed Circulating microRNA assay, which allows the detection of up to 68 microRNA targets per sample. The assay combines particle-based multiplexing, using patented Firefly hydrogel particles, with single-step RT-PCR signal. Thus, the Circulating microRNA assay leverages PCR sensitivity while eliminating the need for separate reverse transcription reactions and mitigating amplification biases introduced by target-specific qPCR. Furthermore, the ability to multiplex targets in each well eliminates the need to split valuable samples into multiple reactions. Results from the Circulating microRNA assay are interpreted using Firefly Analysis Workbench, which allows visualization, normalization, and export of experimental data. To aid discovery and validation of biomarkers, we have generated fixed panels for Oncology, Cardiology, Neurology, Immunology, and Liver Toxicology. Here we present the data from several studies investigating circulating and tumor microRNA, showcasing the ability of the technology to sensitively and specifically detect microRNA biomarker signatures from fluid specimens.
Keywords: biomarkers, biofluids, miRNA, photolithography, flow cytometry
Procedia PDF Downloads 370
30508 Zika Virus NS5 Protein Potential Inhibitors: An Enhanced in silico Approach in Drug Discovery
Authors: Pritika Ramharack, Mahmoud E. S. Soliman
Abstract:
The re-emerging Zika virus is an arthropod-borne virus that has been described as having explosive potential as a worldwide pandemic. The initial transmission of the virus was through a mosquito vector; however, evolving modes of transmission have allowed the disease to spread across continents. The virus has already been linked to irreversible chronic central nervous system (CNS) conditions. The scientific and clinical community is concerned about the consequences of Zika viral mutations, which suggests an urgent need for viral inhibitors. There have been large strides in vaccine development against the virus, but there are still no FDA-approved drugs available. Rapid rational drug design and discovery research is fundamental to the production of potent inhibitors that will not just mask the virus but destroy it completely. In silico drug design allows for this prompt screening of potential leads, thus decreasing the consumption of precious time and resources. This study demonstrates an optimized and proven screening technique in the discovery of two potential small-molecule inhibitors of Zika virus methyltransferase and RNA-dependent RNA polymerase. This in silico “per-residue energy decomposition pharmacophore” virtual screening approach will be critical in aiding scientists in the discovery of not only effective inhibitors of Zika viral targets, but also a wide range of anti-viral agents.
Keywords: NS5 protein inhibitors, per-residue decomposition, pharmacophore model, virtual screening, Zika virus
Procedia PDF Downloads 229
30507 Modeling Optimal Lipophilicity and Drug Performance in Ligand-Receptor Interactions: A Machine Learning Approach to Drug Discovery
Authors: Jay Ananth
Abstract:
The drug discovery process currently requires many years of clinical testing and substantial funding for a single drug to earn FDA approval. Even for drugs that make it this far in the process, there is a very slim chance of receiving FDA approval, which creates serious hurdles to drug accessibility. To minimize these inefficiencies, numerous studies have implemented computational methods, although few computational investigations have focused on a crucial feature of drugs: lipophilicity. Lipophilicity is a physical attribute of a compound that measures its solubility in lipids and is a determinant of drug efficacy. This project leverages Artificial Intelligence to predict the impact of a drug’s lipophilicity on its performance by accounting for factors such as binding affinity and toxicity. The model predicted lipophilicity and binding affinity in the validation set with R² scores of 0.921 and 0.788, respectively, while also being applicable to a variety of target receptors. The results showed a strong positive correlation between lipophilicity and both binding affinity and toxicity. The model helps in both drug development and discovery, providing pharmaceutical companies with recommended lipophilicity levels for drug candidates as well as a rapid assessment of early-stage drugs prior to any testing, saving significant amounts of the time and resources that currently restrict drug accessibility.
Keywords: drug discovery, lipophilicity, ligand-receptor interactions, machine learning, drug development
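For readers unfamiliar with the R² scores quoted above, the following minimal sketch shows how a coefficient of determination is typically computed on a validation set, as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r2_score` and the sample values are assumptions for illustration; the abstract does not state how the authors computed their scores.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only: measured versus predicted lipophilicity for four compounds.
measured = [1.2, 2.5, 3.1, 4.0]
predicted = [1.0, 2.7, 3.0, 4.2]
print(r2_score(measured, predicted))
```

A score of 1 means the predictions match the measurements exactly, and a score of 0 means the model does no better than always predicting the mean of the measured values.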
Procedia PDF Downloads 111
30506 Applications of Hyperspectral Remote Sensing: A Commercial Perspective
Authors: Tuba Zahra, Aakash Parekh
Abstract:
Hyperspectral remote sensing refers to the imaging of objects or materials in narrow, contiguous spectral bands. Hyperspectral images (HSI) enable the extraction of spectral signatures for the objects or materials observed. These images contain information about the reflectance of each pixel across the electromagnetic spectrum. Hyperspectral imaging acquires data simultaneously in hundreds of spectral bands with narrow bandwidths and can provide detailed contiguous spectral curves that traditional multispectral sensors cannot offer. The contiguous, narrow bandwidth of hyperspectral data facilitates detailed surveying of Earth's surface features, which would not be possible with the relatively coarse bandwidths acquired by other types of imaging sensors. Hyperspectral imaging provides significantly higher spectral and spatial resolution. Several use cases represent the commercial applications of hyperspectral remote sensing. Each use case represents just one of the ways in which hyperspectral satellite imagery can support operational efficiency in the respective vertical. Some use cases are specific to VNIR bands, while others are specific to SWIR bands. This paper discusses the commercially viable use cases that are significant for HSI application areas such as agriculture, mining, oil and gas, defense, environment, and climate, to name a few. Theoretically, there are n use cases for each application area, but an attempt has been made to narrow them down according to economic feasibility and commercial viability and to present a review of the literature from this perspective. Some of the specific use cases for agriculture are crop species (sub-variety) detection, soil health mapping, pre-symptomatic crop disease detection, invasive species detection, crop condition optimization, yield estimation, and supply chain monitoring at scale. Similarly, each of the industry verticals has specific commercially viable use cases, which are discussed in the paper in detail.
Keywords: agriculture, mining, oil and gas, defense, environment and climate, hyperspectral, VNIR, SWIR
Procedia PDF Downloads 79
30505 A Novel Heuristic for Analysis of Large Datasets by Selecting Wrapper-Based Features
Authors: Bushra Zafar, Usman Qamar
Abstract:
Large sample sizes and high dimensionality reduce the effectiveness of conventional data mining methodologies. Data mining techniques are important tools for collecting useful information from a variety of databases; they provide supervised learning in the form of classification to design models that describe vital data classes, where the structure of the classifier is based on the class attribute. Classification efficiency and accuracy are often influenced to a great extent by noisy and undesirable features in real application datasets. The inherent nature of a dataset greatly masks its quality analysis and leaves us with few practical approaches to use. To our knowledge, we present for the first time a new approach for investigating the structure and quality of datasets by providing a targeted analysis that localizes their noisy and irrelevant features. Machine learning relies heavily on feature selection as a pre-processing step, which selects a small subset of the available features by reducing the feature space according to a certain evaluation criterion. The primary objective of this study is to trim down the scope of the given data sample by searching for a small set of important features that may result in good classification performance. For this purpose, a heuristic for wrapper-based feature selection using a genetic algorithm is employed, with an external classifier used for discriminative feature selection. A feature is selected based on its number of occurrences in the chosen chromosomes. A sample dataset has been used to demonstrate the proposed idea. The proposed method achieves an improved average accuracy of about 95% across different datasets. Experimental results illustrate that the proposed algorithm increases the accuracy of prediction of different diseases.
Keywords: data mining, genetic algorithm, KNN algorithms, wrapper based feature selection
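The occurrence-based selection step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each chosen chromosome is a 0/1 list with one gene per feature, and the `min_count` threshold and the function name `select_by_occurrence` are hypothetical. The genetic algorithm and the external classifier that produce the chromosomes are not shown.

```python
def select_by_occurrence(chromosomes, min_count):
    """Count how often each feature is switched on and keep the frequent ones.

    chromosomes: list of equal-length 0/1 lists, one gene per feature (assumed encoding).
    min_count:   hypothetical threshold on the number of occurrences.
    """
    n_features = len(chromosomes[0])
    counts = [sum(ch[i] for ch in chromosomes) for i in range(n_features)]
    selected = [i for i, count in enumerate(counts) if count >= min_count]
    return counts, selected


# Three chosen chromosomes over four features (illustrative values only).
chosen = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]
counts, selected = select_by_occurrence(chosen, min_count=2)
print(counts, selected)
```

In a full wrapper setup, the chromosomes passed in would be the fittest individuals from the genetic search, with fitness measured by the classifier's accuracy on the corresponding feature subset.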
Procedia PDF Downloads 318