Search results for: educational data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 26266

26056 Leveraging Power BI for Advanced Geotechnical Data Analysis and Visualization in Mining Projects

Authors: Elaheh Talebi, Fariba Yavari, Lucy Philip, Lesley Town

Abstract:

The mining industry generates vast amounts of data, necessitating robust data management systems and advanced analytics tools to support better decision-making in the development of mine production and in maintaining safety. This paper highlights the advantages of Power BI, a powerful business intelligence tool, over traditional Excel-based approaches for effectively managing and harnessing mining data. Power BI enables professionals to connect and integrate multiple data sources, ensuring real-time access to up-to-date information. Its interactive visualizations and dashboards offer an intuitive interface for exploring and analyzing geotechnical data. Advanced analytics is a collection of data analysis techniques for improving decision-making. Drawing on some of the most complex techniques in data science, advanced analytics is used for everything from detecting data errors and ensuring data accuracy to directing the development of future project phases. However, while Power BI is a robust tool, its native visuals may be limited in producing some of the specific visualizations required by geotechnical engineers. This paper studies the use of Python or R programming within a Power BI dashboard to enable advanced analytics, additional functionality, and customized visualizations. The dashboard provides comprehensive tools for analyzing and visualizing key geotechnical data metrics, including spatial representation on maps, field and lab test results, and subsurface rock and soil characteristics. Advanced visualizations such as borehole logs and stereonets were implemented using Python within the Power BI dashboard, enhancing the understanding and communication of geotechnical information. Moreover, the dashboard's flexibility allows additional data and visualizations to be incorporated based on the project scope and available data, such as pit design, rockfall analyses, rock mass characterization, and drone data.
This further enhances the dashboard's usefulness in future projects, including the operation, development, closure, and rehabilitation phases. It also reduces the need to use multiple software programs in a project. This geotechnical dashboard in Power BI serves as a user-friendly solution for analyzing, visualizing, and communicating both new and historical geotechnical data, aiding informed decision-making and efficient project management throughout the various project stages. Its ability to generate dynamic reports and share them collaboratively with clients further enhances decision-making and facilitates effective communication within geotechnical projects in the mining industry.
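
The abstract mentions stereonets drawn with Python inside Power BI but does not show the projection. As a minimal sketch, assuming discontinuities are recorded as dip direction and dip in degrees and plotted as poles on a lower-hemisphere equal-area (Schmidt) net of unit radius, the coordinates can be computed as below. The function names and the column names in the comment are hypothetical; this is not the authors' code.

```python
import math


def pole_to_equal_area(dip_direction_deg, dip_deg):
    """Project the pole of a plane onto a lower-hemisphere equal-area
    (Schmidt) net of unit radius. Returns (x, y) with north = +y, east = +x."""
    # The pole (normal) of a plane points opposite the dip direction
    # and plunges at 90 degrees minus the dip.
    trend = math.radians((dip_direction_deg + 180.0) % 360.0)
    plunge = math.radians(90.0 - dip_deg)
    # Lambert equal-area projection: the radius grows with the angle from vertical,
    # reaching 1 for a horizontal pole (a vertical plane).
    r = math.sqrt(2.0) * math.sin((math.pi / 2.0 - plunge) / 2.0)
    return r * math.sin(trend), r * math.cos(trend)


def poles_for_dataset(rows):
    """rows: iterable of (dip_direction, dip) pairs, e.g. taken from
    hypothetical 'DipDirection' and 'Dip' columns of a Power BI dataset."""
    return [pole_to_equal_area(dd, d) for dd, d in rows]


if __name__ == "__main__":
    for dd, d in [(0, 0), (90, 90), (135, 45)]:
        x, y = pole_to_equal_area(dd, d)
        print(f"dip direction {dd:3d}, dip {d:2d} -> x={x:+.3f}, y={y:+.3f}")
```

In a Power BI Python visual, the selected fields arrive as a pandas DataFrame named `dataset`, so the rows passed in would come from whichever columns the report actually uses; the resulting points can then be scattered inside a unit circle with a plotting library.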

Keywords: geotechnical data analysis, power BI, visualization, decision-making, mining industry

26055 Road Traffic Accidents Analysis in Mexico City through Crowdsourcing Data and Data Mining Techniques

Authors: Gabriela V. Angeles Perez, Jose Castillejos Lopez, Araceli L. Reyes Cabello, Emilio Bravo Grajales, Adriana Perez Espinosa, Jose L. Quiroz Fabian

Abstract:

Road traffic accidents are among the principal causes of traffic congestion, and they cause human losses, damage to health and the environment, economic losses, and material damage. Traditional studies of road traffic accidents in urban zones require a very high investment of time and money, and their results are not current. However, in many countries crowdsourced GPS-based traffic and navigation apps have now emerged as an important low-cost source of information for studying road traffic accidents and the urban congestion they cause. In this article, we identified the zones, roads, and specific times in Mexico City (CDMX) in which the largest numbers of road traffic accidents were concentrated during 2016. We built a database compiling information obtained from the social network known as Waze. The methodology employed was Knowledge Discovery in Databases (KDD) for discovering patterns in the accident reports, using data mining techniques with the help of Weka. The Expectation-Maximization (EM) algorithm was selected to obtain the ideal number of clusters for the data, and k-means was used as the grouping method. Finally, the results were visualized with the QGIS Geographic Information System.
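
The grouping step can be illustrated with a plain Lloyd's k-means over accident coordinates. This is a minimal sketch, not the Weka implementation: it assumes the number of clusters `k` has already been chosen (in the study, by EM) and uses a deterministic farthest-first initialization chosen here only for reproducibility.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's k-means on 2-D points (e.g. accident longitude/latitude).
    Initialization: first point, then repeatedly the point farthest
    from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```

Each resulting cluster corresponds to a candidate accident hotspot whose centroid could then be mapped in a GIS such as QGIS.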

Keywords: data mining, k-means, road traffic accidents, Waze, Weka

26054 Formal Innovations vs. Informal Innovations: The Case of the Mining Sector in Nigeria

Authors: Jegede Oluseye Oladayo

Abstract:

The study mapped innovation activities in the formal and informal mining sectors in Nigeria. Data were collected from primary and secondary sources. Primary data were collected through guided questionnaire administration, guided interviews, and personal observation. A purposive sampling method was adopted to select micro, small, and medium enterprises. The study covered 100 purposively selected companies in south-western Nigeria (50 in the formal sector and 50 in the informal sector). Secondary data were collected from various published sources. Data were analysed using descriptive and inferential statistics. Of the four types of technological innovation sampled, organisational innovation was the most prevalent in both the formal (100%) and informal (100%) sectors, followed by process innovation (60% in the formal sector and 28% in the informal sector). Marketing innovation and diffusion-based innovation were implemented by 64% and 4% of firms, respectively, in the formal sector. There were no R&D activities (intramural or extramural) in either sector; however, innovation activities occurred at moderate levels in the formal sector, characterised by the acquisition of machinery, equipment, and hardware (100%), software (56%), training (82%), and acquisition of external knowledge (60%). In the informal sector, innovation activities were characterised by acquisition of external knowledge (100%), training/learning by experience (100%), and acquisition of tools (68%). In the formal sector, the impact of innovation on firm performance was expressed mainly as increased production capacity (100%), reduced production cost per unit of labour (88%), compliance with governmental regulatory requirements (72%), and entry into new markets (60%). In the informal sector, the impact of innovation was mainly expressed as improved flexibility of production (70%) and machinery/energy efficiency (70%).
The most important technological driver of process innovation in the mining sector was the acquisition of machinery, with a prevalence of 100% in both the formal and informal sectors. Next was the training and re-training of technical staff (74% in both sectors). Other factors influencing organisational innovation include workforce skill, with a prevalence of 80% in both sectors. Important drivers in the formal sector also include the educational background of the manager or head of the technical department (54% for organisational innovation and 50% for process innovation). The study concluded that the firms' innovation competence consisted mostly of organisational changes.

Keywords: innovation prevalence, innovation activities, innovation performance, innovation drivers

26053 Cirrhosis Mortality Prediction as Classification using Frequent Subgraph Mining

Authors: Abdolghani Ebrahimi, Diego Klabjan, Chenxi Ge, Daniela Ladner, Parker Stride

Abstract:

In this work, we use machine learning and novel data analysis techniques to predict the one-year mortality of cirrhotic patients. Data from 2,322 patients with liver cirrhosis were collected at a single medical center. Different machine learning models are applied to predict one-year mortality. A comprehensive feature space including demographic information, comorbidities, clinical procedures, and laboratory tests is analyzed. A temporal pattern mining technique called Frequent Subgraph Mining (FSM) is used. The Model for End-Stage Liver Disease (MELD) prediction of mortality is used as a comparator. All of our models statistically significantly outperform the MELD-score model and show an average 10% improvement in the area under the curve (AUC). The FSM technique by itself does not improve the model significantly, but FSM together with an ensemble machine learning technique further improves model performance. With the abundance of data available in healthcare through electronic health records (EHR), existing predictive models can be refined to identify and treat patients at risk of higher mortality. However, due to the sparsity of the temporal information needed by FSM, the FSM model does not yield significant improvements. To the best of our knowledge, this is the first work to apply modern machine learning algorithms and data analysis methods to predicting the one-year mortality of cirrhotic patients, and it builds a model that predicts one-year mortality significantly more accurately than the MELD score. We have also tested the potential of FSM and provided a new perspective on the importance of clinical features.
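
The comparison with MELD is reported as AUC. As an illustrative sketch (not the authors' evaluation code), AUC can be computed directly from its probabilistic definition: the chance that a randomly chosen patient who died within a year receives a higher risk score than one who survived, with ties counted as one half. The pairwise loop is quadratic, which is still manageable for a cohort of a few thousand patients.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive case (label 1, e.g.
    died within one year) is scored above a random negative case (label 0).
    Ties contribute one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The same function can score any model's risk outputs, including a MELD score used as a baseline, so both are compared on one scale.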

Keywords: machine learning, liver cirrhosis, subgraph mining, supervised learning

26052 Most Important Educational Planning Issues in the Developing Countries

Authors: Naeem Khan

Abstract:

In 1971, Williams, in his essay titled "What Educational Planning is About in Higher Education" in the "World Year Book of Education", defined educational planning as "planning in education, as in anything else, consists essentially of deciding, in advance, what you want to do and how you are going to do it". Anderson and Bowman, in their 1976 joint article titled "Theoretical Considerations in Educational Planning", defined it as "the process of preparing a set of decisions for future action pertaining to education". Many other definitions of educational planning exist, all of which stress its importance. However, developing countries face many problems related to educational planning, and this paper discusses a few of them.

Keywords: educational planning, problems, developing countries, education system

26051 Assessing Carbon Stock and Sequestration of Reforestation Species on Old Mining Sites in Morocco Using the DNDC Model

Authors: Nabil Elkhatri, Mohamed Louay Metougui, Ngonidzashe Chirinda

Abstract:

Mining activities have left a legacy of degraded landscapes, prompting urgent efforts for ecological restoration. Reforestation holds promise as a potent tool to rehabilitate these old mining sites, with the potential to sequester carbon and contribute to climate change mitigation. This study focuses on evaluating the carbon stock and sequestration potential of reforestation species in the context of Morocco's mining areas, employing the DeNitrification-DeComposition (DNDC) model. The research is grounded in recognizing the need to connect theoretical models with practical implementation, ensuring that reforestation efforts are informed by accurate and context-specific data. Field data collection encompasses growth patterns, biomass accumulation, and carbon sequestration rates, establishing an empirical foundation for the study's analyses. By integrating the collected data with the DNDC model, the study aims to provide a comprehensive understanding of carbon dynamics within reforested ecosystems on old mining sites. The major findings reveal varying sequestration rates among different reforestation species, indicating the potential for species-specific optimization of reforestation strategies to enhance carbon capture. This research's significance lies in its potential to contribute to sustainable land management practices and climate change mitigation strategies. By quantifying the carbon stock and sequestration potential of reforestation species, the study serves as a valuable resource for policymakers, land managers, and practitioners involved in ecological restoration and carbon management. Ultimately, the study aligns with global objectives to rejuvenate degraded landscapes while addressing pressing climate challenges.

Keywords: carbon stock, carbon sequestration, DNDC model, ecological restoration, mining sites, Morocco, reforestation, sustainable land management

26050 AniMoveMineR: Animal Behavior Exploratory Analysis Using Association Rules Mining

Authors: Suelane Garcia Fontes, Silvio Luiz Stanzani, Pedro L. Pizzigatti Corrêa, Ronaldo G. Morato

Abstract:

Environmental changes and major natural disasters are more prevalent worldwide due to the damage that humanity has caused to nature, and this damage directly affects the lives of animals. Thus, the study of animal behavior and of animals' interactions with the environment can provide knowledge that guides researchers and public agencies in preservation and conservation actions. Exploratory analysis of animal movement can reveal patterns of animal behavior, and with technological advances, the ability to track animals, and consequently the scope of behavioral studies, has expanded. There is a great deal of research on animal movement and behavior, but we note that a proposal is missing that combines these resources, allows exploratory analysis of animal movement, and provides statistical measures of individual animal behavior and its interaction with the environment. The contribution of this paper is the AniMoveMineR framework, a unified solution that combines trajectory analysis and data mining techniques to explore animal movement data and provides a first step toward answering questions about individual animal behavior and animals' interactions with other animals over time and space. We evaluated the framework using monitoring data from jaguars in Miranda, Pantanal, Brazil, to verify whether AniMoveMineR can identify the level of interaction between these jaguars. The results were positive and provided indications about the individual behavior of the jaguars and about which jaguars have the highest or lowest correlation.
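
Association rules mining rests on two measures, support and confidence. The sketch below computes them for all single-item rules A → B. It assumes, hypothetically, that each transaction is the set of animals recorded in the same grid cell and time window; how AniMoveMineR actually builds its transactions is not described here.

```python
from fractions import Fraction
from itertools import combinations


def pair_rules(transactions, min_support=Fraction(0)):
    """Support and confidence for every rule A -> B between single items.
    transactions: list of sets of items observed together."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support_count(itemset):
        # Number of transactions containing every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    rules = {}
    for a, b in combinations(items, 2):
        both = support_count({a, b})
        support = Fraction(both, n)
        if both == 0 or support < min_support:
            continue
        # confidence(A -> B) = support(A and B) / support(A)
        rules[(a, b)] = (support, Fraction(both, support_count({a})))
        rules[(b, a)] = (support, Fraction(both, support_count({b})))
    return rules
```

Exact fractions keep the measures easy to check by hand; a high-confidence rule such as jaguar C → jaguar B would suggest that C is rarely observed without B.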

Keywords: data mining, data science, trajectory, animal behavior

26049 Focus-Latent Dirichlet Allocation for Aspect-Level Opinion Mining

Authors: Mohsen Farhadloo, Majid Farhadloo

Abstract:

Aspect-level opinion mining, which aims to discover aspects (aspect identification) and their corresponding ratings (sentiment identification) from customer reviews, has increasingly attracted the attention of researchers and practitioners because it provides valuable insights into products and services from customers' point of view. Instead of addressing aspect identification and sentiment identification in two separate steps, it is possible to identify aspects and sentiments simultaneously. In recent years, many graphical models based on Latent Dirichlet Allocation (LDA) have been proposed to solve both aspect and sentiment identification in a single step. Although LDA models have been effective tools for the statistical analysis of document collections, they also have shortcomings in addressing some unique characteristics of opinion mining. Our goal in this paper is to address one limitation of topic models to date: they fail to directly model the associations among topics. Indeed, in many text corpora it is natural to expect that some subsets of the latent topics have higher probabilities of occurring together. We propose a probabilistic graphical model called focus-LDA to better capture the associations among topics when applied to aspect-level opinion mining. Our experiments on real-life datasets demonstrate the improved effectiveness of the focus-LDA model in terms of the accuracy of the predictive distributions over held-out documents. Furthermore, we demonstrate qualitatively that the focus-LDA topic model provides a natural way of visualizing and exploring unstructured collections of textual data.
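
Predictive accuracy on held-out documents is commonly reported as perplexity; the abstract does not name its exact metric, so treating it as perplexity is an assumption. The sketch assumes the per-document topic proportions `theta` for the held-out documents have already been estimated (for example by folding them in), which is the harder part in practice.

```python
import math


def heldout_perplexity(docs, theta, phi):
    """Perplexity of held-out documents under a fitted topic model
    (lower is better).
    docs: list of documents, each a list of word ids.
    theta[d][k]: topic proportions of document d.
    phi[k][w]: probability of word w under topic k."""
    log_lik, n_tokens = 0.0, 0
    for d, words in enumerate(docs):
        for w in words:
            # Mixture probability of the word: sum over topics of theta * phi.
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)
```

A model that assigns uniform probability over a vocabulary of V words has perplexity V, which gives a quick sanity check.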

Keywords: aspect-level opinion mining, document modeling, Latent Dirichlet Allocation, LDA, sentiment analysis

26048 Application of Granular Computing Paradigm in Knowledge Induction

Authors: Iftikhar U. Sikder

Abstract:

This paper illustrates an application of a granular computing approach, namely rough set theory, to data mining. The paper outlines the formalism of granular computing and elucidates the mathematical underpinnings of rough set theory, which has been widely used by the data mining and machine learning communities. A real-world application is illustrated, and its classification performance is compared with that of other competing machine learning algorithms. The predictive performance of the rough set rule induction model compares well with that of the other algorithms.
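
The core rough set construction behind this kind of rule induction is the pair of lower and upper approximations of a concept built from indiscernibility classes (the information granules). The following is a minimal sketch on a toy decision table; the table and attribute names are hypothetical.

```python
from collections import defaultdict


def approximations(table, attributes, target):
    """Rough-set lower and upper approximations of a target set of objects.
    table: dict object -> dict attribute -> value.
    attributes: condition attributes that define indiscernibility classes."""
    # Objects with identical values on the chosen attributes are indiscernible.
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    lower, upper = set(), set()
    for granule in classes.values():
        if granule <= target:
            lower |= granule  # granule certainly belongs to the concept
        if granule & target:
            upper |= granule  # granule possibly belongs to the concept
    return lower, upper
```

Objects in the upper but not the lower approximation form the boundary region. Certain rules are induced from the lower approximation and possible rules from the upper one.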

Keywords: concept approximation, granular computing, reducts, rough set theory, rule induction

26047 Presenting a Model for Predicting the State of Being Accident-Prone of Passages According to Neural Network and Spatial Data Analysis

Authors: Hamd Rezaeifar, Hamid Reza Sahriari

Abstract:

Accidents are considered one of the challenges of modern life. Because the number of accident victims and the volume of domestic transportation in Iran are increasing day by day, studying the factors that affect accidents and identifying suitable models and parameters for this issue are essential. The main purpose of this research was to study the factors and spatial data affecting accidents in Mashhad during 2007-2008. First, spatial layers were overlaid on each other and matched with accident locations, and accident landmarks and special fields recording the presence or absence of phenomena affecting accidents were added to complete the existing accident databases. Next, data mining tools and neural network analysis were used to evaluate the relationships among these data and to design a logical model for predicting accident-prone spots with minimum error. The model predicts low-accident spots very accurately, but it has larger errors in accident-prone regions due to a lack of primary data.
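
The modelling step pairs presence/absence indicators for each location with an accident-prone label. As a deliberately small sketch, a single sigmoid neuron (equivalent to logistic regression) trained by batch gradient descent shows how such indicators can be turned into a classifier. The network architecture used in the study is not given, and the features here are hypothetical.

```python
import math


def train_neuron(X, y, lr=0.5, epochs=5000):
    """Train one sigmoid neuron by batch gradient descent on cross-entropy loss.
    X: rows of 0/1 indicators (presence of a phenomenon near the site).
    y: 1 = accident-prone, 0 = not."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi  # gradient of the loss with respect to z
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Classify a site as accident-prone when the neuron's output exceeds 0.5."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z > 0 else 0
```

A real model would add hidden layers and hold out data for validation. The error pattern reported in the abstract, with more errors on the rarer accident-prone class, is typical when one class has little training data.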

Keywords: accident, data mining, neural network, GIS

26046 Dissimilarity Measure for General Histogram Data and Its Application to Hierarchical Clustering

Authors: K. Umbleja, M. Ichino

Abstract:

Symbolic data mining has been developed to analyze data in very large datasets. It is also useful in cases where entry-specific details should remain hidden. Symbolic data mining is quickly gaining popularity as the datasets that need to be analyzed become ever larger. One type of symbolic data is the histogram, which can store large amounts of information in a single variable at a high level of granularity. Because other types of symbolic data can also be described as histograms, the histogram is a very important and general symbolic data type: a method developed for histograms can also be applied to other types of symbolic data. Owing to their complex structure, histograms are complicated to analyze. This paper proposes a method that allows two histogram-valued variables to be compared and thus a dissimilarity between two histograms to be found. The proposed method uses the Ichino-Yaguchi dissimilarity measure for mixed feature-type data analysis as a base and develops a dissimilarity measure specifically for histogram data, which allows histograms with different numbers of bins and bin widths (so-called general histograms) to be compared. The proposed dissimilarity measure is then used as a measure for clustering. Furthermore, a linkage method based on weighted averages is proposed, together with the concept of cluster compactness to measure the quality of clustering. The method is then validated by application to real datasets. As a result, the proposed dissimilarity measure is found to produce adequate and comparable results with general histograms without loss of detail or the need to transform the data.
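
The key difficulty named here is comparing histograms whose bins differ in number and width. The proposed Ichino-Yaguchi-based measure is not reproduced here. As a swapped-in illustration of the same problem, the sketch below evaluates both histograms' cumulative distribution functions on the union of their bin edges and integrates the absolute difference. This is the one-dimensional Wasserstein (earth mover's) distance, not the paper's measure, and it assumes values are spread uniformly within each bin.

```python
from bisect import bisect_right


def hist_cdf(edges, probs, x):
    """CDF of a histogram (edges ascending, probs summing to 1),
    assuming values are spread uniformly within each bin."""
    if x <= edges[0]:
        return 0.0
    if x >= edges[-1]:
        return 1.0
    i = bisect_right(edges, x) - 1  # index of the bin containing x
    return sum(probs[:i]) + probs[i] * (x - edges[i]) / (edges[i + 1] - edges[i])


def abs_linear_area(a, b, width):
    """Integral of |f| over an interval where f varies linearly from a to b."""
    if a * b >= 0:
        return width * (abs(a) + abs(b)) / 2.0
    t = width * abs(a) / (abs(a) + abs(b))  # point where f crosses zero
    return (abs(a) * t + abs(b) * (width - t)) / 2.0


def general_histogram_distance(h1, h2):
    """L1 distance between the CDFs of two (edges, probs) histograms whose
    bins may differ in number and width."""
    grid = sorted(set(h1[0]) | set(h2[0]))
    total = 0.0
    for lo, hi in zip(grid, grid[1:]):
        # Between consecutive grid points both CDFs are linear.
        d_lo = hist_cdf(*h1, lo) - hist_cdf(*h2, lo)
        d_hi = hist_cdf(*h1, hi) - hist_cdf(*h2, hi)
        total += abs_linear_area(d_lo, d_hi, hi - lo)
    return total
```

Because no re-binning to a common grid is needed, detail is not lost. A matrix of pairwise distances produced this way could feed any hierarchical linkage procedure.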

Keywords: dissimilarity measure, hierarchical clustering, histograms, symbolic data analysis

26045 A General Strategy for Noise Assessment in Open Mining Industries

Authors: Diego Mauricio Murillo Gomez, Enney Leon Gonzalez Ramirez, Hugo Piedrahita, Jairo Yate

Abstract:

This paper proposes a methodology for managing noise in open mining industries based on an integral concept that considers occupational and environmental noise as a whole. The approach relies on the characterization of sources, the combination of several measurement techniques, and the use of acoustic prediction software. The difference between frequently used acoustic indicators such as Leq and LAV is discussed, aiming to establish common ground for homologation. The results show that correctly integrating these data allows not only a more robust technical analysis but also a more strategic route of intervention, as several departments of the company work together. Noise control measures can then be designed to provide a healthy acoustic environment that benefits not only exposed workers but also the outdoor community.
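
The gap between the two indicators comes from the exchange rate used when averaging levels over time. The energy-based Leq corresponds to an exchange rate of about 3 dB. The sketch below assumes, because the abstract does not define it, that LAV denotes an average level with a 5 dB exchange rate as used in OSHA-style dosimetry. The level and duration values in the usage note are illustrative only.

```python
import math


def leq(levels_db, durations):
    """Equivalent continuous level: energy average of the interval levels."""
    total = sum(durations)
    energy = sum(t * 10 ** (L / 10.0) for L, t in zip(levels_db, durations))
    return 10.0 * math.log10(energy / total)


def lavg(levels_db, durations, exchange_rate=5.0):
    """Average level with exchange rate q: every q dB increase doubles the
    weight of an interval. q = 5 dB is the assumption made here for LAV."""
    total = sum(durations)
    weighted = sum(t * 2 ** (L / exchange_rate) for L, t in zip(levels_db, durations))
    return exchange_rate * math.log2(weighted / total)
```

For a constant level both indicators agree. For four hours at 80 dB plus four hours at 90 dB, however, Leq is about 87.4 dB while the 5 dB average is about 86.6 dB. This is the kind of discrepancy that has to be reconciled before occupational and environmental results can be compared.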

Keywords: environmental noise, noise control, occupational noise, open mining

26044 A Study of Soil Heavy Metal Pollution in the Manganese Mining in Drama, Greece

Authors: A. Argiri, A. Molla, Tzouvalekas, E. Skoufogianni, N. Danalatos

Abstract:

The release of heavy metals into the environment has increased in recent years. In this study, 25 soil samples (0-15 cm) were collected from fields near the mining area in the Drama region. The samples were analyzed in the laboratory for their physicochemical properties and for the "pseudo-total" content of seven heavy metals, namely Pb, Zn, Cd, Cr, Cu, Ni, and Mn. The total metal concentrations (Pb, Zn, Cd, Cr, Cu, Ni, and Mn) in the digests were determined by atomic absorption spectrophotometry. According to the results, the mean concentrations of these heavy metals in the 25 soil samples were Cd 1.1 mg/kg, Cr 15 mg/kg, Cu 21.7 mg/kg, Ni 30.1 mg/kg, Pb 50.8 mg/kg, Zn 99.5 mg/kg, and Mn 815.3 mg/kg. The results show that the heavy metals remain in the soil even though mining ceased many years ago.

Keywords: Greece, heavy metals, mining, pollution

26043 Environmental Impact Assessment in Mining Regions with Remote Sensing

Authors: Carla Palencia-Aguilar

Abstract:

Calculations of net carbon balance can be obtained by means of Net Biome Productivity (NBP), Net Ecosystem Productivity (NEP), and Net Primary Production (NPP). NPP is an important component of the biosphere carbon cycle and is easily obtained from MODIS MOD17A3HGF data; however, these results are only available yearly. To overcome this limitation in data availability, bands 33 to 36 from MODIS MYD021KM (available daily) were analyzed and compared with NPP data from 2000 to 2021 at 7 sites in Colombia where surface mining takes place. Coal, gold, iron, and limestone were the minerals of interest. Scales and units, as well as thermal anomalies, were considered for the net carbon balance at each location. The NPP time series from the satellite images were filtered using two MATLAB filters: First Order and Discrete Transfer. After filtering the NPP time series, comparing the graphed results with the satellite image values, and running a linear regression, the results showed R² values from 0.72 to 0.85. To establish comparable units between NPP and bands 33 to 36, the EPA Greenhouse Gas Equivalencies Calculator was used. The comparison was made in two ways: by summing all the data per point per year, and by averaging over 46 weeks and computing the percentage that this value represented with respect to NPP. The former underestimated total CO2 emissions. The results also showed that over the last 22 years, coal and gold mining had lower CO2 emissions than limestone, with annual averages of 143 kton CO2 eq for gold, 152 kton CO2 eq for coal, and 287 kton CO2 eq for iron. Limestone emissions varied from 206 to 441 kton CO2 eq. The maximum emission values from unfiltered data were 165 kton CO2 eq for gold, 188 kton CO2 eq for coal, and 310 kton CO2 eq for iron, with limestone varying from 231 to 490 kton CO2 eq.
If the most polluting limestone site improved its production technology, limestone could have a maximum of 318 kton CO2 eq of emissions per year, a value very similar to that of iron. Gathering such data is important for establishing benchmarks toward the goal of zero emissions by 2050.
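
The processing chain (filter the time series, then regress one series on the other and report R²) can be sketched as follows. The MATLAB First Order and Discrete Transfer filters are not reproduced. As a swapped-in stand-in, a first-order low-pass filter in exponential smoothing form is used, with a hypothetical smoothing factor `alpha`. R² is computed for an ordinary least-squares line, where it equals the squared correlation coefficient.

```python
def first_order_filter(series, alpha=0.3):
    """First-order low-pass filter: y[n] = alpha*x[n] + (1 - alpha)*y[n-1]."""
    out = [series[0]]
    for x in series[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out


def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b*x,
    equal to the squared Pearson correlation for simple regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)
```

Smaller values of `alpha` smooth more strongly but lag further behind real changes, so the choice trades noise suppression against responsiveness.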

Keywords: carbon dioxide, NPP, MODIS, mining

26042 Q-Map: Clinical Concept Mining from Clinical Documents

Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala

Abstract:

Over the past decade, there has been a steep rise in data-driven analysis in major areas of medicine, such as clinical decision support systems, survival analysis, patient similarity analysis, and image analytics. Most of the data in the field are well structured and available in numerical or categorical formats that can be used directly for experiments. At the opposite end of the spectrum, however, there is a wide expanse of data that is intractable for direct analysis owing to its unstructured nature, such as discharge summaries, clinical notes, and procedural notes, which are written in human narrative format and have neither a relational model nor a standard grammatical structure. An important step in using these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map, a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique based on a string matching algorithm, indexed on curated knowledge sources, that is both fast and configurable. The authors also briefly examine its performance compared with MetaMap, one of the most reputed tools for medical concept retrieval, and present the advantages that Q-Map displays over MetaMap.
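
To show what dictionary-indexed string matching over clinical text can look like, here is a minimal sketch of a greedy longest-match scanner. It is not Q-Map's algorithm, whose details are not given here. The curated dictionary and its concept labels are hypothetical and are not real UMLS identifiers.

```python
import re


def tokenize(text):
    """Lowercase word tokens; digits are kept (e.g. 'type 2 diabetes')."""
    return re.findall(r"\w+", text.lower())


def build_index(dictionary):
    """Index curated concept strings by their token tuples."""
    index = {tuple(tokenize(term)): concept for term, concept in dictionary.items()}
    return index, max(len(k) for k in index)


def extract_concepts(text, index, max_len):
    """Greedy longest-match scan: at each position, try the longest n-gram first."""
    tokens = tokenize(text)
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            concept = index.get(tuple(tokens[i:i + n]))
            if concept is not None:
                found.append((" ".join(tokens[i:i + n]), concept))
                i += n
                break
        else:
            i += 1  # no dictionary term starts here
    return found
```

Preferring the longest match keeps "type 2 diabetes" from being reported as the less specific "diabetes". Because lookups are hash-table hits on token tuples, the scan stays fast even with a large curated dictionary.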

Keywords: information retrieval, unified medical language system, syntax based analysis, natural language processing, medical informatics

26041 Multivariate Assessment of Mathematics Test Scores of Students in Qatar

Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski

Abstract:

Data on various aspects of education are regularly collected at the institutional and government levels. In Australia, for example, students at various levels of schooling take numeracy and literacy examinations as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another source of educational data collected internationally is the PISA study, which collects data from several countries when students are approximately 15 years of age and enables comparisons of performance in science, mathematics, and reading between countries, as well as rankings of countries based on performance in these standardised tests. In addition to student and school outcomes based on the PISA tests, a wealth of other data is collected in the study, including parental demographic data and data on the teaching strategies used by educators. Overall, an abundance of educational data is available that could be used to help improve educational attainment and the teaching of content, and thereby learning outcomes. A multivariate assessment of such data allows multiple variables to be considered simultaneously; the present study uses such an assessment to help develop profiles of students based on mathematics performance using data obtained from the PISA study.

Keywords: cluster analysis, education, mathematics, profiles

26040 Quantification of GHGs Emissions from Electricity and Diesel Fuel Consumption in Basalt Mining Industry in Thailand

Authors: S. Kittipongvises, A. Dubsok

Abstract:

The mineral and mining industry is necessary for countries to have an adequate and reliable supply of materials to meet their socio-economic development needs. Despite its importance, the environmental impacts of mineral exploration are highly significant. This study aimed to investigate and quantify the GHG emissions from both electricity and diesel vehicle fuel consumption in basalt mining in Thailand. Plant A, located in the northeastern region of Thailand, was selected as a case study. Results indicated that total GHG emissions from basalt mining and operation (Plant A) were approximately 2,501,086 kgCO2e and 1,997,412 kgCO2e in 2014 and 2015, respectively. The estimated carbon intensity ranged from 1.824 to 2.284 kgCO2e per ton of rock product. Scope 1 (direct emissions) was the dominant driver of total GHG emissions compared with scope 2 (indirect emissions): transport-related combustion of diesel fuel generated the largest share of GHG emissions (65%), compared with purchased electricity (35%). Some potential implications for mining entities are also presented.
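
The quantities reported here follow the usual activity-data-times-emission-factor approach. Below is a minimal sketch of that arithmetic, splitting scope 1 (diesel combustion) from scope 2 (purchased electricity) and deriving carbon intensity per tonne of product. The emission factors are parameters supplied by the user, and the numbers in the usage note are hypothetical, not the study's data or factors.

```python
def ghg_inventory(diesel_litres, electricity_kwh, tonnes_product, ef_diesel, ef_grid):
    """Scope 1 and scope 2 emissions from activity data and emission factors.
    ef_diesel: kgCO2e per litre of diesel; ef_grid: kgCO2e per kWh."""
    scope1 = diesel_litres * ef_diesel  # direct: fuel burned on site
    scope2 = electricity_kwh * ef_grid  # indirect: purchased electricity
    total = scope1 + scope2
    return {
        "scope1_kgCO2e": scope1,
        "scope2_kgCO2e": scope2,
        "total_kgCO2e": total,
        "scope1_share": scope1 / total,
        "intensity_kgCO2e_per_t": total / tonnes_product,
    }
```

For example, with hypothetical inputs of 300 L of diesel at 2.0 kgCO2e/L, 800 kWh at 0.5 kgCO2e/kWh, and 250 t of product, the total is 1,000 kgCO2e, scope 1 accounts for 60%, and the intensity is 4.0 kgCO2e/t.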

Keywords: basalt mining, diesel fuel, electricity, GHGs emissions, Thailand

26039 On Exploring Search Heuristics for improving the efficiency in Web Information Extraction

Authors: Patricia Jiménez, Rafael Corchuelo

Abstract:

Nowadays, the World Wide Web, with its billions of online documents, is the most popular source of information. Web mining is used to crawl through these documents, collect the information of interest, and process it with data mining tools so that the gathered information can be used in the best interest of a business, helping companies promote their own interests. Unfortunately, it is not easy to extract the information a website provides automatically when the site lacks an API that transforms the user-friendly data in its web documents into a structured, machine-readable format. Rule-based information extractors are tools intended to extract the information of interest automatically and offer it in a structured format that mining tools can process. However, the performance of an information extractor strongly depends on the search heuristic employed, since bad choices about how to learn a rule may easily result in a loss of effectiveness and/or efficiency. Improving the efficiency of search heuristics is of utmost importance in Web Information Extraction, since typical datasets are very large. In this paper, we employ an information extractor based on a classical top-down algorithm that uses the so-called Information Gain heuristic introduced by Quinlan and Cameron-Jones. Unfortunately, Information Gain suffers from some well-known problems, so we analyse an intuitive alternative, Termini, that is clearly more efficient; we also analyse other proposals in the literature and conclude that none of them outperforms Termini.
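
For reference, the FOIL-style Information Gain scores a candidate condition added to a rule during top-down search. The sketch below uses the simplified form in which the weighting factor t is the number of positives still covered, which holds when no new variables are introduced. Termini itself is not described here, so it is not sketched.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """FOIL information gain for specialising a rule with one more condition.
    p0, n0: positive/negative examples covered before the condition is added.
    p1, n1: positive/negative examples covered after it is added.
    Assumes t = p1, i.e. positives remaining covered."""
    if p1 == 0:
        return 0.0  # a condition that covers no positives is useless
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)
```

Every candidate condition requires recounting covered examples and evaluating logarithms, and this cost grows with dataset size. Cheaper heuristics are attractive for large web datasets for exactly this reason.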

Keywords: information extraction, search heuristics, semi-structured documents, web mining.
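The top-down rule learner mentioned above scores each candidate literal with an information-gain heuristic. The sketch below shows a FOIL-style gain, which we assume is the heuristic the abstract attributes to Quinlan and Cameron-Jones. Termini is not sketched because the abstract does not define it. The candidate literal names and coverage counts are hypothetical.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """FOIL-style information gain of adding a literal to a rule.

    p0, n0: positive/negative examples covered before adding the literal.
    p1, n1: positive/negative examples covered after adding it.
    In this propositional sketch the positives that stay covered (t) equal p1.
    """
    if p0 == 0 or p1 == 0:
        return 0.0  # a literal that keeps no positive examples is treated as no gain
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)


def best_literal(p0, n0, candidates):
    """Pick the name of the candidate (name, p1, n1) with the highest FOIL gain."""
    return max(candidates, key=lambda c: foil_gain(p0, n0, c[1], c[2]))[0]


if __name__ == "__main__":
    # Hypothetical coverage counts for three candidate literals.
    candidates = [("has_price_tag", 8, 2), ("inside_table", 10, 9), ("bold_text", 3, 0)]
    print(best_literal(10, 10, candidates))
```

The gain rewards literals that raise the proportion of positives among the covered examples. It weights that improvement by how many positives survive, which is why a very precise literal covering few positives can still lose.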

26038 Bridging Educational Research and Policymaking: The Development of Educational Think Tanks in China

Authors: Yumei Han, Ling Li, Naiqing Song, Xiaoping Yang, Yuping Han

Abstract:

Educational think tanks are widely regarded as a significant part of a nation’s soft power, promoting scientific and democratic educational policymaking, and they play a critical role in bridging educational research in higher education institutions and educational policymaking. This study explores the concept, functions, and significance of educational think tanks in China and conceptualizes a three-dimensional framework for analyzing how research-based higher education institutions can be transformed into effective educational think tanks that serve nationwide educational policymaking. Since 2014, the Ministry of Education of the P.R. China has promoted a strategy of developing a new type of educational think tank in higher education institutions, and this strategy was placed on the agenda of the 13th Five-Year Plan for National Education Development, released in 2017. In this context, a growing number of scholars have studied strategies for promoting the development and transformation of new educational think tanks to serve the educational policymaking process. Based on literature synthesis, policy text analysis, and analysis of theories about the policymaking process and the relationship between educational research and policymaking, this study constructed a three-dimensional conceptual framework to address the following questions: (a) what new features distinguish educational think tanks in the new era from traditional think tanks; (b) what the functional objectives of the new educational think tanks are; (c) what organizational patterns and mechanisms the new educational think tanks have; and (d) how traditional research-based higher education institutions can be developed or transformed into think tanks that effectively serve the educational policymaking process.
The authors adopted a case study approach with five influential education policy study centers affiliated with top higher education institutions in China. They applied the three-dimensional conceptual framework to analyze the centers' functional objectives, organizational patterns, and the academic pathways researchers use to help develop think tanks that serve the education policymaking process. Data were mainly collected through interviews with center administrators, leading researchers, and academic leaders in the institutions. Findings show that: (a) higher-institution-based think tanks mainly serve multi-level objectives, providing evidence, theoretical foundations, strategies, or evaluation feedback for critical problem solving or policymaking at the national, provincial, and city/county levels; (b) higher-institution-based think tanks organize various types of research programs over different time spans to serve different phases of policy planning, decision making, and policy implementation; (c) to transform research-based higher education institutions into educational think tanks, the institutions must promote a paradigm shift toward issue-oriented field studies, large-scale data mining and analysis, empirical studies, and trans-disciplinary research collaboration; and (d) the five cases showed distinctive features in the way they constructed think tanks, yet they also exposed obstacles and challenges such as the independence of the think tanks, the discourse shift from academic papers to consultancy reports for policymakers, weaknesses in empirical research methods, and lack of experience in trans-disciplinary collaboration. The authors finally put forth implications for think tank construction in China and abroad.

Keywords: education policy-making, educational research, educational think tank, higher institution

26037 Trace Logo: A Notation for Representing Control-Flow of Operational Process

Authors: M. V. Manoj Kumar, Likewin Thomas, Annappa

Abstract:

The process mining research discipline bridges the gap between data mining and business process modeling and analysis; it offers process-centric, end-to-end methods and techniques for analyzing real-world processes recorded in operational event logs. In this paper, we propose a notation called trace logo for graphically representing the control-flow perspective (the order of execution of activities) of a process. A trace logo consists of a stack of activity names at each position: the size of each activity name indicates its frequency in the traces, and the total height of the stack depicts the information content of the position. A trace logo is created from a set of aligned traces generated using the Multiple Trace Alignment technique.

Keywords: consensus trace, process mining, multiple trace alignment, trace logo
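A trace logo's stack height is described as the information content of a position. One way to compute it is sketched below. It assumes the logo follows the sequence-logo convention from bioinformatics: log2 of the alphabet size minus the Shannon entropy of the column. The gap symbol `-` and the example activity names are assumptions, not details from the paper. So are the choices to ignore gaps and to skip small-sample correction.

```python
import math
from collections import Counter

GAP = "-"  # symbol the alignment uses for a gap (assumption)


def trace_logo(aligned_traces):
    """Per-position activity heights for a logo built from aligned traces.

    Information content of a position = log2(alphabet size) minus the Shannon
    entropy of the activities observed there (gaps ignored, no small-sample
    correction). Each activity's height = its relative frequency x that content.
    """
    length = len(aligned_traces[0])
    if any(len(t) != length for t in aligned_traces):
        raise ValueError("traces must be aligned to the same length")
    alphabet = {a for t in aligned_traces for a in t if a != GAP}
    max_bits = math.log2(len(alphabet)) if len(alphabet) > 1 else 0.0
    columns = []
    for i in range(length):
        counts = Counter(t[i] for t in aligned_traces if t[i] != GAP)
        total = sum(counts.values())
        if total == 0:  # an all-gap column carries no information
            columns.append({"info": 0.0, "heights": {}})
            continue
        freqs = {a: c / total for a, c in counts.items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = max_bits - entropy
        # Ascending order, so the most frequent activity ends up on top of the stack.
        heights = {a: f * info for a, f in sorted(freqs.items(), key=lambda kv: kv[1])}
        columns.append({"info": info, "heights": heights})
    return columns


if __name__ == "__main__":
    traces = [  # hypothetical aligned traces
        ["register", "check", "pay"],
        ["register", "check", "refund"],
        ["register", "-", "pay"],
        ["register", "check", "pay"],
    ]
    for pos, col in enumerate(trace_logo(traces)):
        print(pos, round(col["info"], 3), col["heights"])
```

A fully conserved position (every trace has the same activity) reaches the maximum height. A position split between several activities is shorter, and its stack is divided in proportion to their frequencies.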

26036 A Novel Heuristic for Analysis of Large Datasets by Selecting Wrapper-Based Features

Authors: Bushra Zafar, Usman Qamar

Abstract:

Large sample sizes and high dimensionality reduce the effectiveness of conventional data mining methodologies. Data mining techniques are important tools for extracting knowledge from a variety of databases; they provide supervised learning in the form of classification, building models that describe important data classes, where the structure of the classifier is based on the class attribute. Classification efficiency and accuracy are often strongly influenced by noisy and undesirable features in real-world data sets. The inherent nature of a data set largely masks its quality, leaving few practical approaches for analyzing it. To our knowledge for the first time, we present a new approach for investigating the structure and quality of data sets through a targeted analysis that localizes their noisy and irrelevant features. Machine learning relies heavily on feature selection as a pre-processing step, which selects a small subset of the available features by reducing the feature space according to an evaluation criterion. The primary objective of this study is to reduce the given data sample by searching for a small set of important features that yields good classification performance. For this purpose, a genetic algorithm is used as the heuristic for wrapper-based feature selection, and an external classifier is used for discriminative feature selection. Each feature is selected based on its number of occurrences in the chosen chromosomes. Sample datasets are used to demonstrate the proposed idea. The proposed method improved classification, reaching an average accuracy of about 95% across the different datasets. Experimental results show that the proposed algorithm increases the prediction accuracy for different diseases.

Keywords: data mining, genetic algorithm, KNN algorithms, wrapper based feature selection
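The wrapper approach described above can be sketched as a genetic algorithm over bit-mask chromosomes, followed by counting how often each feature occurs in the chosen chromosomes. In this minimal Python sketch, `toy_fitness` is a hypothetical stand-in for the external classifier's accuracy. A real wrapper would train and score a classifier, such as KNN, on each feature subset. Two further details are assumptions: that the top five final chromosomes are the "chosen" ones, and the 50% occurrence threshold.

```python
import random
from collections import Counter


def toy_fitness(mask, relevant):
    """Hypothetical stand-in for a wrapper classifier score (e.g. KNN accuracy).

    Rewards selected features that are in `relevant` and penalises the rest.
    """
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in relevant)
    return hits - 0.5 * (sum(mask) - hits)


def run_ga(n_features, fitness, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Evolve bit-mask chromosomes; return the final population, best first."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]               # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)           # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return sorted(population, key=fitness, reverse=True)


def select_by_occurrence(chromosomes, min_fraction=0.5):
    """Keep features switched on in at least `min_fraction` of the chosen chromosomes."""
    counts = Counter(i for c in chromosomes for i, gene in enumerate(c) if gene)
    threshold = min_fraction * len(chromosomes)
    return sorted(i for i, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    relevant = {0, 3, 5}  # hypothetical ground truth for the toy fitness
    final = run_ga(8, lambda m: toy_fitness(m, relevant))
    print(select_by_occurrence(final[:5]))  # "chosen" = top 5 chromosomes (assumption)
```

Counting occurrences across several good chromosomes, instead of trusting only the single best one, makes the final subset less sensitive to one lucky fitness evaluation.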

26035 Identification of Environmental Damage Due to Mining in the Bangka Islands, Indonesia

Authors: Aroma Elmina Martha

Abstract:

The environment affects the continuity of life and the well-being of humans and other living beings, and environmental quality is closely related to quality of life. Environmental sustainability must be protected from damage caused by the use of natural resources, such as tin mining on Bangka Island. This descriptive study identifies the environmental damage caused by land and sea mining in Bangka district. The approach used is juridical, social, and economic. The study uses primary, secondary, and tertiary legal materials, supplemented by field research, and applies qualitative analysis. The impacts of land mining include, among others, physical and chemical damage; erosion and the deepening of rivers; pools that affect the micro-climate; changes in quality and feasibility; impacts on vegetation, wildlife, and biodiversity; changes in land values; and social and economic impacts. Land mining damages the soil structure and leaves pools in former pits that were not backfilled. The impacts of sea mining include changes in currents and surges, erosion and abrasion of the coastal seabed, shoreline change, changes in seawater quality, and changes in marine communities. The findings show that tin mining at sea can also significantly affect reef life and populations of marine organisms. Land mining, for its part, must take its damage into account so that the damage can be minimized, and recovery should be pursued by exploiting the remaining piles of tin. Mining activities should also take into account the distance from the beach, sediment size, wave height, wavelength, wave period, and gravitational acceleration. Tin washing should be carried out in a reasonably safe area to avoid damaging coral reefs, which would eventually reduce marine populations.

Keywords: abrasion, environmental damage, mining, shoreline

26034 Application of Data Mining for Aquifer Environmental Assessment

Authors: Saman Javadi, Mehdi Hashemy, Mohahammad Mahmoodi

Abstract:

Vulnerability maps are an important tool for managing the entry of pollution into aquifers. The most common method for producing a vulnerability map is DRASTIC. However, the method is not easy to apply to every aquifer because appropriate constant values must be chosen for its weights and ranks. In this study, a new approach using k-means clustering is applied to produce vulnerability maps. Four features, namely depth to groundwater, hydraulic conductivity, recharge, and vadose zone, were used simultaneously as clustering features. Five regions, representing zones with different levels of vulnerability, were identified in the case study area. The results show that clustering provides a realistic vulnerability map: the Pearson correlation coefficient between nitrate concentrations and clustering-based vulnerability was 61%.

Keywords: clustering, data mining, groundwater, vulnerability assessment
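The clustering step can be sketched with plain Lloyd's k-means on the four features. The sketch also includes the Pearson coefficient used to compare cluster-based vulnerability with nitrate. This is a minimal sketch under stated assumptions. The abstract does not say whether features were standardized or how clusters were initialized, so `standardize` and the random initialization are assumptions.

```python
import math
import random


def standardize(rows):
    """Z-score each column so features with different units weigh equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
```

In the study's setting, each row would hold the four feature values for one location, and `kmeans(standardize(rows), 5)` would yield five zones. Ordering those zones into vulnerability classes (for example, by centroid values) before correlating them with nitrate is a further modelling choice that the abstract does not describe.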

26033 Attributes That Influence Respondents When Choosing a Mate on Internet Dating Sites: An Innovative Matching Algorithm

Authors: Moti Zwilling, Srečko Natek

Abstract:

This paper presents an innovative predictive analytics approach for finding the best match between two consumers seeking a partner on internet sites. The methodology is based on an analysis of consumer preferences and involves data mining and machine learning search techniques. The study has two parts. The first part uses descriptive statistics to examine correlations among a set of parameters collected from men and women who intend to meet through social media, usually on the internet; several hypotheses were tested through statistical analysis. Results show a strong correlation between the corresponding attributes of men and women with respect to how they present themselves on social media such as Facebook. One interesting finding is that most respondents strongly desire a serious relationship. In the second part, the authors used common data mining algorithms to search for and classify the most important and effective attributes affecting the other side's response rate. Results show that personal presentation and educational background are the most effective attributes for eliciting a positive attitude toward one's profile from a potential mate.

Keywords: dating sites, social networks, machine learning, decision trees, data mining
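The abstract does not name the data mining algorithms used to rank attributes, although decision trees appear in the keywords. As an illustration only, the entropy-based information gain that ID3/C4.5-style trees use to choose splitting attributes can rank profile attributes by how well they separate positive from negative responses. The attribute names and records below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction in `target` obtained by splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def rank_attributes(rows, attributes, target):
    """Attributes sorted from most to least informative about `target`."""
    return sorted(attributes, key=lambda a: information_gain(rows, a, target), reverse=True)


if __name__ == "__main__":
    profiles = [  # hypothetical records, not data from the study
        {"presentation": "detailed", "education": "degree", "response": "yes"},
        {"presentation": "detailed", "education": "none", "response": "yes"},
        {"presentation": "sparse", "education": "degree", "response": "no"},
        {"presentation": "sparse", "education": "none", "response": "no"},
    ]
    print(rank_attributes(profiles, ["education", "presentation"], "response"))
```

An attribute that splits the records into pure groups earns the full entropy of the target as its gain. An attribute whose groups look like the overall mix earns none.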

26032 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and can be used for data filtering in artificial intelligence. Because classification can be applied to all kinds of data, it is used in nearly every field of modern life. Classification groups items according to the features judged interesting and useful. In this paper, we compare two classification methods, Naïve Bayes and ADTree, for detecting spam e-mail. This choice is motivated by the fact that the Naïve Bayes algorithm is based on probability calculus, while the ADTree algorithm is based on decision trees. The parameters of both classifiers are set to maximize the true positive rate and minimize the false positive rate. The experimental results present classification accuracy and a cost analysis with a view to choosing the optimal classifier for spam detection. They also identify the number of attributes that gives a good tradeoff between the number of attributes and classification accuracy.

Keywords: classification, data mining, spam filtering, naive bayes, decision tree
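The Naïve Bayes side of the comparison can be sketched as follows, together with the true and false positive rates used to tune both classifiers. This minimal Python version is a multinomial Naïve Bayes over whitespace tokens with Laplace smoothing (`alpha`). That variant is an assumption, since the abstract does not state the variant or the features. ADTree is not sketched.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over whitespace tokens with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(doc.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        """Unnormalised log P(c | doc) = log P(c) + sum of log P(word | c)."""
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.totals[c] + self.alpha * len(self.vocab)
        for w in doc.lower().split():
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][w] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


def tpr_fpr(y_true, y_pred, positive="spam"):
    """True positive rate and false positive rate, with `positive` as the spam class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr
```

After training on a handful of labelled messages with `fit`, `predict` returns `"spam"` or `"ham"` for a new message, and `tpr_fpr` summarises a batch of predictions. Working in log space avoids underflow when many word probabilities are multiplied.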

26031 Influence of Emotional Intelligence on Educational Supervision and Leadership Style in Saudi Arabia

Authors: Jawaher Bakheet Almudarra

Abstract:

An Educational Supervisor helps teachers develop their competence and teaching skills, solve educational problems, and improve their teaching methods to suit the educational process. Supervisors evaluate their teachers and write reports based on their assessments. In 1957, the Saudi Ministry of Education instituted Educational Supervision to facilitate the effective management of schools; however, there have been concerns that Educational Supervision has not been effective in executing its mandate. Studies have indicated that Educational Supervision has been ineffective because it has been marred by poor and autocratic leadership practices such as stringent inspection, commanding, and judging. There is therefore a need to consider ways in which school outcomes can be enhanced by improving Educational Supervision practices. Emotional intelligence is a relatively new concept that could be integrated into the Saudi education system but has yet to be examined in depth and embraced, particularly in the realm of educational leadership. Its recognition and adoption may improve leadership practices among Educational Supervisors. This study employed a qualitative interpretive approach focused on decoding, describing, and interpreting the connection between emotional intelligence and leadership. The study also took into account social constructions, including consciousness, language, and shared meanings. Data collection took place in the Office of Educational Supervisors in Riyadh and involved 4 Educational Supervisors and 20 teachers of both genders. The data collection process comprised three methods: qualitative emotional intelligence self-assessment questionnaires, reflective semi-structured interviews, and open workshops.
The questionnaires explore whether the Educational Supervisors understand the meaning of emotional intelligence and its significance for enhancing the quality of the education system in Saudi Arabia. Subsequently, reflective semi-structured interviews were carried out with the Educational Supervisors to explore the connection between their leadership styles and the way they conceptualise their emotionality. The open workshops include discussions of the emotional aspects of Educational Supervisors’ practices and of how Educational Supervisors use the emotional intelligence discourse in their leadership and supervisory relationships.

Keywords: directors of educational supervision, emotional intelligence, educational leadership, education management

26030 A GIS Based Composite Land Degradation Assessment and Mapping of Tarkwa Mining Area

Authors: Bernard Kumi-Boateng, Kofi Bonsu

Abstract:

The clearing of vegetation in the Tarkwa Mining Area (TMA) for mining, lumbering, and the development of settlements for the growing population has caused large-scale denudation of the forest cover and erosion of the topsoil, thereby degrading agricultural land. It is therefore essential to know the current status of land degradation in TMA in order to support land conservation policymaking. The types, extents, and degrees of degradation were combined to develop a composite land degradation index for assessing the current status of land degradation in TMA using GIS-based techniques. The assessment revealed that the most significant types of degradation in TMA were open pit and quarry mining; urbanisation and other construction projects; and surface scraping during land clearing. It was found that 21.62% of the total area of TMA (353.07 km²) had a high degradation index rating. It is recommended that decision makers use this assessment as a reference point for future initiatives to develop land conservation policy.

Keywords: degradation, GIS, land, mining

26029 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling

Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal

Abstract:

Datasets, or collections, are becoming important assets in their own right and can now be accepted as a primary intellectual output of research. The quality and usage of datasets depend mainly on the context in which they were collected, processed, analyzed, validated, and interpreted. This paper presents a collection of program educational objectives mapped to student outcomes, collected from self-study reports prepared by 32 engineering programs accredited by ABET. The manual mapping (classification) of these data is a notoriously tedious, time-consuming process; in addition, it requires experts in the area, who are mostly unavailable. The paper describes the operational settings under which the collection was produced. The collection was cleansed and preprocessed, features were selected, and a preliminary exploratory data analysis was performed to illustrate the properties and usefulness of the collection. Finally, the collection was benchmarked using nine of the most widely used supervised multi-label classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors, and Back-Propagation Multi-Label Learning). The techniques were compared using five well-known measures (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and Ensemble of Pruned Sets achieved encouraging performance compared with the other multi-label classification methods tested, while the Classifier Chains method showed the worst performance. In summary, the benchmark achieved promising results by building on the preliminary exploratory data analysis of the collection; it suggests new trends for research and provides a baseline for future studies.

Keywords: ABET, accreditation, benchmark collection, machine learning, program educational objectives, student outcomes, supervised multi-class classification, text mining
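Four of the evaluation measures named above have standard multi-label definitions that can be computed directly on label sets. In this minimal Python sketch, "Accuracy" is assumed to mean the example-based (Jaccard) accuracy, and an undefined per-label F1 is assumed to be 0. The abstract does not specify either convention.

```python
def multilabel_metrics(y_true, y_pred, labels):
    """Example-based accuracy, Hamming loss, micro-F1 and macro-F1 for label sets."""
    n, n_labels = len(y_true), len(labels)
    # Hamming loss: fraction of (instance, label) pairs predicted wrongly.
    hamming = sum(len(t ^ p) for t, p in zip(y_true, y_pred)) / (n * n_labels)
    # Example-based accuracy: mean Jaccard overlap (1.0 when both sets are empty).
    accuracy = sum(len(t & p) / len(t | p) if (t | p) else 1.0
                   for t, p in zip(y_true, y_pred)) / n
    tp = {lab: 0 for lab in labels}
    fp = dict(tp)
    fn = dict(tp)
    for t, p in zip(y_true, y_pred):
        for lab in labels:
            if lab in t and lab in p:
                tp[lab] += 1
            elif lab in p:
                fp[lab] += 1
            elif lab in t:
                fn[lab] += 1

    def f1(a, b, c):
        denom = 2 * a + b + c
        return 2 * a / denom if denom else 0.0  # undefined F1 treated as 0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))   # pool counts first
    macro = sum(f1(tp[lab], fp[lab], fn[lab]) for lab in labels) / n_labels  # average per label
    return {"accuracy": accuracy, "hamming_loss": hamming, "micro_f1": micro, "macro_f1": macro}
```

Micro-F pools counts over all labels, so frequent labels dominate it. Macro-F averages per-label scores, so rare labels count equally. Reporting both shows whether a method only does well on common outcomes.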

26028 Annual Effective Dose Associated with Radon in Groundwater Samples from Mining Communities Within the Ife-Ilesha Schist Belt, Southwestern Nigeria

Authors: Paulinah Oyindamola Fasanmi, Matthew Omoniyi Isinkaye

Abstract:

In this study, the activity concentration of ²²²Rn in groundwater samples collected from gold and kaolin mining communities within the Ife-Ilesha schist belt, southwestern Nigeria, and the corresponding annual effective doses were determined using a Durridge RAD-7 radon-in-water detector. The mean concentration of ²²²Rn in all the groundwater samples was 13.83 Bq L⁻¹. ²²²Rn had a mean value of 20.68 Bq L⁻¹ in borehole water and 11.67 Bq L⁻¹ in well water samples. The mean activity concentration of radon in the gold mining communities ranged from 1.6 Bq L⁻¹ in Igun town to 4.8 Bq L⁻¹ in Ilesha town. A higher mean value of 41.8 Bq L⁻¹ was, however, obtained from Ijero, the kaolin mining community. The mean annual effective doses due to ingestion and inhalation of radon from the groundwater samples were 35.35 μSv yr⁻¹ and 34.86 nSv yr⁻¹, respectively. The mean annual ingestion dose was 29.90 μSv yr⁻¹ for well water samples and 52.85 μSv yr⁻¹ for borehole water samples. The mean annual inhalation dose was 29.49 nSv yr⁻¹ for well water and 52.13 nSv yr⁻¹ for borehole water. The mean annual effective dose due to ingestion of radon in groundwater from the gold mining communities ranged from 4.10 μSv yr⁻¹ in Igun to 13.1 μSv yr⁻¹ in Ilesha, while a mean value of 106.7 μSv yr⁻¹ was obtained for the Ijero kaolin mining community. For inhalation, the mean value varied from 4.0 nSv yr⁻¹ in Igun to 12.9 nSv yr⁻¹ in Ilesha, while 105.2 nSv yr⁻¹ was obtained for the kaolin mining community. The mean annual effective doses due to ingestion and inhalation are lower than the reference level of 100 μSv yr⁻¹ recommended by the World Health Organization, except for the values obtained from the Ijero kaolin mining community, which exceeded the reference level.
It was concluded that, as far as radon-related health risks are concerned, groundwater from the gold mining communities is generally safe, whereas groundwater from the kaolin mining communities needs mitigation and monitoring. Kaolin mining was also found to introduce more ²²²Rn into groundwater than gold mining, and radon levels in borehole water exceeded those in well water.

Keywords: ²²²Rn, groundwater, radioactivity, annual effective dose, mining
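The ingestion dose reported above is usually computed as concentration × annual water intake × ingestion dose coefficient. The sketch below assumes commonly used values: 730 L yr⁻¹ (about 2 L per day) and an adult dose coefficient of 3.5 nSv Bq⁻¹, often cited from UNSCEAR. The abstract does not state which parameters the authors used, and the inhalation pathway is not sketched.

```python
WHO_REFERENCE_USV = 100.0  # μSv per year, the reference level quoted in the abstract


def ingestion_dose_usv(conc_bq_per_l, litres_per_year=730.0, dose_coeff_nsv_per_bq=3.5):
    """Annual ingestion dose in μSv: concentration x yearly intake x dose coefficient.

    The default intake and dose coefficient are assumed, commonly used values.
    """
    nsv = conc_bq_per_l * litres_per_year * dose_coeff_nsv_per_bq
    return nsv / 1000.0  # nSv -> μSv


def exceeds_reference(conc_bq_per_l, reference_usv=WHO_REFERENCE_USV):
    """True if the estimated ingestion dose is above the reference level."""
    return ingestion_dose_usv(conc_bq_per_l) > reference_usv


if __name__ == "__main__":
    for site, conc in [("all samples", 13.83), ("Ijero", 41.8)]:
        print(site, round(ingestion_dose_usv(conc), 1), exceeds_reference(conc))
```

With these assumed parameters, 13.83 Bq L⁻¹ gives about 35.3 μSv yr⁻¹, and Ijero's 41.8 Bq L⁻¹ gives about 106.8 μSv yr⁻¹. Both are close to the reported 35.35 and 106.7 μSv yr⁻¹, which suggests, but does not confirm, that similar parameters were used.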

26027 Constructing a Semi-Supervised Model for Network Intrusion Detection

Authors: Tigabu Dagne Akal

Abstract:

While advances in computer and communications technology have made networks ubiquitous, they have also rendered networked systems vulnerable to malicious attacks launched from a distance. These attacks, or intrusions, start with attackers infiltrating a network through a vulnerable host and then launching further attacks on the local network or intranet. System administrators and network professionals can attempt to prevent such attacks by developing intrusion detection tools and systems using data mining technology. In this study, the experiments followed the Knowledge Discovery in Databases (KDD) process model, which starts with the selection of datasets. The dataset used in this study was taken from the Massachusetts Institute of Technology Lincoln Laboratory. The data were then pre-processed. The major pre-processing activities were filling in missing values, removing outliers, resolving inconsistencies, integrating data containing both labelled and unlabelled records, dimensionality reduction, size reduction, and data transformation such as discretization. A total of 21,533 intrusion records are used to train the models, and a separate set of 3,397 records is used as a test set to validate the performance of the selected model. To build a predictive model for intrusion detection, the J48 decision tree and Naïve Bayes algorithms were tested as classification approaches, both with and without feature selection. The model built with the J48 decision tree algorithm using default parameter values and 10-fold cross-validation showed the best classification accuracy. The model has a prediction accuracy of 96.11% on the training dataset and 93.2% on the test dataset in classifying new instances into the normal, DOS, U2R, R2L, and probe classes.
The findings of this study show that data mining methods generate interesting rules that are crucial for intrusion detection and prevention in the networking industry. Future research directions are proposed toward developing an applicable system in this area.

Keywords: intrusion detection, data mining, computer science
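The evaluation protocol above, 10-fold cross-validation, can be sketched independently of the classifier. In this minimal Python sketch, `train_and_predict` is a hypothetical callback standing in for J48 or Naïve Bayes, and the folds are plain shuffled splits. Tools such as Weka usually stratify folds by class, which this sketch does not do.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k shuffled, non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield sorted(train), sorted(test)


def cross_validate(X, y, train_and_predict, k=10, seed=0):
    """Overall accuracy of a classifier callback under k-fold cross-validation."""
    correct = 0
    for train, test in k_fold_indices(len(y), k, seed):
        preds = train_and_predict([X[i] for i in train], [y[i] for i in train],
                                  [X[i] for i in test])
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(y)


if __name__ == "__main__":
    def majority(X_train, y_train, X_test):  # trivial baseline classifier
        return [max(set(y_train), key=y_train.count)] * len(X_test)

    X = [[0]] * 20
    y = ["normal"] * 15 + ["probe"] * 5
    print(cross_validate(X, y, majority))
```

Every record is used for testing exactly once, so the pooled accuracy is an estimate of how a model trained on the full dataset would perform on unseen records. A separate held-out test set, like the 3,397 records above, then checks that estimate.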
