Search results for: Web Mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1084

Search results for: Web Mining

484 The Effect of Feature Selection on Pattern Classification

Authors: Chih-Fong Tsai, Ya-Han Hu

Abstract:

The aim of feature selection (or dimensionality reduction) is to filter out unrepresentative features (or variables) making the classifier perform better than the one without feature selection. Since there are many well-known feature selection algorithms, and different classifiers based on different selection results may perform differently, very few studies consider examining the effect of performing different feature selection algorithms on the classification performances by different classifiers over different types of datasets. In this paper, two widely used algorithms, which are the genetic algorithm (GA) and information gain (IG), are used to perform feature selection. On the other hand, three well-known classifiers are constructed, which are the CART decision tree (DT), multi-layer perceptron (MLP) neural network, and support vector machine (SVM). Based on 14 different types of datasets, the experimental results show that in most cases IG is a better feature selection algorithm than GA. In addition, the combinations of IG with DT and IG with SVM perform best and second best for small and large scale datasets.

Keywords: data mining, feature selection, pattern classification, dimensionality reduction

Procedia PDF Downloads 669
483 Industrial Process Mining Based on Data Pattern Modeling and Nonlinear Analysis

Authors: Hyun-Woo Cho

Abstract:

Unexpected events may occur with serious impacts on industrial process. This work utilizes a data representation technique to model and to analyze process data pattern for the purpose of diagnosis. In this work, the use of triangular representation of process data is evaluated using simulation process. Furthermore, the effect of using different pre-treatment techniques based on such as linear or nonlinear reduced spaces was compared. This work extracted the fault pattern in the reduced space, not in the original data space. The results have shown that the non-linear technique based diagnosis method produced more reliable results and outperforms linear method.

Keywords: process monitoring, data analysis, pattern modeling, fault, nonlinear techniques

Procedia PDF Downloads 387
482 Chelator-assisted Phytoextraction of Nickel from Nickeliferous Lateritic Soil by Phyllanthus sp. nov.

Authors: Grecco M. Ante, Princess Rochelle O. Gan

Abstract:

Plants that can absorb greater than 10,000 µg Ni/g dry mass in their stems and leaves are termed as ‘hypernickelophores’. Chelators are chemicals that make the metals in the soil more soluble, making them a potential enhancer for phytoextraction. This study aims to observe the effect of different concentrations of the chelating agent ethylene diamine tetraacetate (EDTA) on the metal uptake (or rate of phytoextraction) of Nickel by Phyllanthus sp. nov. The plant is found to be a hyperickelophore in normal conditions. The addition of EDTA increased the metal uptake of the plant. The increasing amount of the chelating agent causes a decrease in the phytoextraction of the plant but moves the onset of its peak of maximum nickel content in its tissue to an earlier time. The chelator-assisted phytoextraction of nickel by Phyllanthus sp. nov. is proven to be an efficient auxiliary mining operation for nickel laterite mines.

Keywords: phytomining, Phyllanthus sp. nov., EDTA, nickel, laterite

Procedia PDF Downloads 465
481 A Case-Based Reasoning-Decision Tree Hybrid System for Stock Selection

Authors: Yaojun Wang, Yaoqing Wang

Abstract:

Stock selection is an important decision-making problem. Many machine learning and data mining technologies are employed to build automatic stock-selection system. A profitable stock-selection system should consider the stock’s investment value and the market timing. In this paper, we present a hybrid system including both engage for stock selection. This system uses a case-based reasoning (CBR) model to execute the stock classification, uses a decision-tree model to help with market timing and stock selection. The experiments show that the performance of this hybrid system is better than that of other techniques regarding to the classification accuracy, the average return and the Sharpe ratio.

Keywords: case-based reasoning, decision tree, stock selection, machine learning

Procedia PDF Downloads 419
480 Classification of Contexts for Mentioning Love in Interviews with Victims of the Holocaust

Authors: Marina Yurievna Aleksandrova

Abstract:

Research of the Holocaust retains value not only for history but also for sociology and psychology. One of the most important fields of study is how people were coping during and after this traumatic event. The aim of this paper is to identify the main contexts of the topic of love and to determine which contexts are more characteristic for different groups of victims of the Holocaust (gender, nationality, age). In this research, transcripts of interviews with Holocaust victims that were collected during 1946 for the "Voices of the Holocaust" project were used as data. Main contexts were analyzed with methods of network analysis and latent semantic analysis and classified by gender, age, and nationality with random forest. The results show that love is articulated and described significantly differently for male and female informants, nationality is shown results with lower values of quality metrics, as well as the age.

Keywords: Holocaust, latent semantic analysis, network analysis, text-mining, random forest

Procedia PDF Downloads 180
479 Case-Based Reasoning: A Hybrid Classification Model Improved with an Expert's Knowledge for High-Dimensional Problems

Authors: Bruno Trstenjak, Dzenana Donko

Abstract:

Data mining and classification of objects is the process of data analysis, using various machine learning techniques, which is used today in various fields of research. This paper presents a concept of hybrid classification model improved with the expert knowledge. The hybrid model in its algorithm has integrated several machine learning techniques (Information Gain, K-means, and Case-Based Reasoning) and the expert’s knowledge into one. The knowledge of experts is used to determine the importance of features. The paper presents the model algorithm and the results of the case study in which the emphasis was put on achieving the maximum classification accuracy without reducing the number of features.

Keywords: case based reasoning, classification, expert's knowledge, hybrid model

Procedia PDF Downloads 367
478 Optimum Drilling States in Down-the-Hole Percussive Drilling: An Experimental Investigation

Authors: Joao Victor Borges Dos Santos, Thomas Richard, Yevhen Kovalyshen

Abstract:

Down-the-hole (DTH) percussive drilling is an excavation method that is widely used in the mining industry due to its high efficiency in fragmenting hard rock formations. A DTH hammer system consists of a fluid driven (air or water) piston and a drill bit; the reciprocating movement of the piston transmits its kinetic energy to the drill bit by means of stress waves that propagate through the drill bit towards the rock formation. In the literature of percussive drilling, the existence of an optimum drilling state (Sweet Spot) is reported in some laboratory and field experimental studies. An optimum rate of penetration is achieved for a specific range of axial thrust (or weight-on-bit) beyond which the rate of penetration decreases. Several authors advance different explanations as possible root causes to the occurrence of the Sweet Spot, but a universal explanation or consensus does not exist yet. The experimental investigation in this work was initiated with drilling experiments conducted at a mining site. A full-scale drilling rig (equipped with a DTH hammer system) was instrumented with high precision sensors sampled at a very high sampling rate (kHz). Data was collected while two boreholes were being excavated, an in depth analysis of the recorded data confirmed that an optimum performance can be achieved for specific ranges of input thrust (weight-on-bit). The high sampling rate allowed to identify the bit penetration at each single impact (of the piston on the drill bit) as well as the impact frequency. These measurements provide a direct method to identify when the hammer does not fire, and drilling occurs without percussion, and the bit propagate the borehole by shearing the rock. The second stage of the experimental investigation was conducted in a laboratory environment with a custom-built equipment dubbed Woody. Woody allows the drilling of shallow holes few centimetres deep by successive discrete impacts from a piston. After each individual impact, the bit angular position is incremented by a fixed amount, the piston is moved back to its initial position at the top of the barrel, and the air pressure and thrust are set back to their pre-set values. The goal is to explore whether the observed optimum drilling state stems from the interaction between the drill bit and the rock (during impact) or governed by the overall system dynamics (between impacts). The experiments were conducted on samples of Calca Red, with a drill bit of 74 millimetres (outside diameter) and with weight-on-bit ranging from 0.3 kN to 3.7 kN. Results show that under the same piston impact energy and constant angular displacement of 15 degrees between impact, the average drill bit rate of penetration is independent of the weight-on-bit, which suggests that the sweet spot is not caused by intrinsic properties of the bit-rock interface.

Keywords: optimum drilling state, experimental investigation, field experiments, laboratory experiments, down-the-hole percussive drilling

Procedia PDF Downloads 89
477 Upgrade of Value Chains and the Effect on Resilience of Russia’s Coal Industry and Receiving Regions on the Path of Energy Transition

Authors: Sergey Nikitenko, Vladimir Klishin, Yury Malakhov, Elena Goosen

Abstract:

Transition to renewable energy sources (solar, wind, bioenergy, etc.) and launching of alternative energy generation has weakened the role of coal as a source of energy. The Paris Agreement and assumption of obligations by many nations to orderly reduce CO₂ emissions by means of technological modernization and climate change adaptation has abridged coal demand yet more. This paper aims to assess current resilience of the coal industry to stress and to define prospects for coal production optimization using high technologies pursuant to global challenges and requirements of energy transition. Our research is based on the resilience concept adapted to the coal industry. It is proposed to divide the coal sector into segments depending on the prevailing value chains (VC). Four representative models of VC are identified in the coal sector. The most promising lines of upgrading VC in the coal industry include: •Elongation of VC owing to introduction of clean technologies of coal conversion and utilization; •Creation of parallel VC by means of waste management; •Branching of VC (conversion of a company’s VC into a production network). The upgrade effectiveness is governed in many ways by applicability of advanced coal processing technologies, usability of waste, expandability of production, entrance to non-rival markets and localization of new segments of VC in receiving regions. It is also important that upgrade of VC by means of formation of agile high-tech inter-industry production networks within the framework of operating surface and underground mines can reduce social, economic and ecological risks associated with closure of coal mines. Such promising route of VC upgrade is application of methanotrophic bacteria to produce protein to be used as feed-stuff in fish, poultry and cattle breeding, or in production of ferments, lipoids, sterols, antioxidants, pigments and polysaccharides. Closed mines can use recovered methane as a clean energy source. There exist methods of methane utilization from uncontrollable sources, including preliminary treatment and recovery of methane from air-and-methane mixture, or decomposition of methane to hydrogen and acetylene. Separated hydrogen is used in hydrogen fuel cells to generate power to feed the process of methane utilization and to supply external consumers. Despite the recent paradigm of carbon-free energy generation, it is possible to preserve the coal mining industry using the differentiated approach to upgrade of value chains based on flexible technologies with regard to specificity of mining companies.

Keywords: resilience, resilience concept, resilience indicator, resilience in the Russian coal industry, value chains

Procedia PDF Downloads 107
476 Using Different Methods of Nanofabrication as a New Way to Activate Cement Replacement Materials in Concrete Industry

Authors: Azadeh Askarinejad, Parham Hayati, Reza Parchami, Parisa Hayati

Abstract:

One of the most important industries and building operations causing carbon dioxide emission is the cement and concrete related industries so that cement production (including direct fuel for mining and transporting raw material) consumes approximately 6 million Btus per metric-ton, and releases about 1 metric-ton of CO2. Reducing the consumption of cement with simultaneous utilizing waste materials as cement replacement is preferred for reasons of environmental protection. Blended cements consist of different supplementary cementitious materials (SCM), such as fly ash, silica fume, Ground Granulated Blast Furnace Slag (GGBFS), limestone, natural pozzolans, etc. these materials should be chemically activated to show effective cementitious properties. The present review article reports three different methods of nanofabrication that were used for activation of two types of SCMs.

Keywords: nanofabrication, cement replacement materials, activation, concrete

Procedia PDF Downloads 613
475 Slope Stability Assessment in Metasedimentary Deposit of an Opencast Mine: The Case of the Dikuluwe-Mashamba (DIMA) Mine in the DR Congo

Authors: Dina Kon Mushid, Sage Ngoie, Tshimbalanga Madiba, Kabutakapua Kakanda

Abstract:

Slope stability assessment is still the biggest challenge in mining activities and civil engineering structures. The slope in an opencast mine frequently reaches multiple weak layers that lead to the instability of the pit. Faults and soft layers throughout the rock would increase weathering and erosion rates. Therefore, it is essential to investigate the stability of the complex strata to figure out how stable they are. In the Dikuluwe-Mashamba (DIMA) area, the lithology of the stratum is a set of metamorphic rocks whose parent rocks are sedimentary rocks with a low degree of metamorphism. Thus, due to the composition and metamorphism of the parent rock, the rock formation is different in hardness and softness, which means that when the content of dolomitic and siliceous is high, the rock is hard. It is softer when the content of argillaceous and sandy is high. Therefore, from the vertical direction, it appears as a weak and hard layer, and from the horizontal direction, it seems like a smooth and hard layer in the same rock layer. From the structural point of view, the main structures in the mining area are the Dikuluwe dipping syncline and the Mashamba dipping anticline, and the occurrence of rock formations varies greatly. During the folding process of the rock formation, the stress will concentrate on the soft layer, causing the weak layer to be broken. At the same time, the phenomenon of interlayer dislocation occurs. This article aimed to evaluate the stability of metasedimentary rocks of the Dikuluwe-Mashamba (DIMA) open-pit mine using limit equilibrium and stereographic methods Based on the presence of statistical structural planes, the stereographic projection was used to study the slope's stability and examine the discontinuity orientation data to identify failure zones along the mine. The results revealed that the slope angle is too steep, and it is easy to induce landslides. The numerical method's sensitivity analysis showed that the slope angle and groundwater significantly impact the slope safety factor. The increase in the groundwater level substantially reduces the stability of the slope. Among the factors affecting the variation in the rate of the safety factor, the bulk density of soil is greater than that of rock mass, the cohesion of soil mass is smaller than that of rock mass, and the friction angle in the rock mass is much larger than that in the soil mass. The analysis showed that the rock mass structure types are mostly scattered and fragmented; the stratum changes considerably, and the variation of rock and soil mechanics parameters is significant.

Keywords: slope stability, weak layer, safety factor, limit equilibrium method, stereography method

Procedia PDF Downloads 262
474 Models of State Organization and Influence over Collective Identity and Nationalism in Spain

Authors: Muñoz-Sanchez, Victor Manuel, Perez-Flores, Antonio Manuel

Abstract:

The main objective of this paper is to establish the relationship between models of state organization and the various types of collective identity expressed by the Spanish. The question of nationalism and identity ascription in Spain has always been a topic of special importance due to the presence in that country of territories where the population emits very different opinions of nationalist sentiment than the rest of Spain. The current situation of sovereignty challenge of Catalonia to the central government exemplifies the importance of the subject matter. In order to analyze this process of interrelation, we use a secondary data mining by applying the multiple correspondence analysis technique (MCA). As a main result a typology of four types of expression of collective identity based on models of State organization are shown, which are connected with the party position on this issue.

Keywords: models of organization of the state, nationalism, collective identity, Spain, political parties

Procedia PDF Downloads 443
473 Ibrutinib and the Potential Risk of Cardiac Failure: A Review of Pharmacovigilance Data

Authors: Abdulaziz Alakeel, Roaa Alamri, Abdulrahman Alomair, Mohammed Fouda

Abstract:

Introduction: Ibrutinib is a selective, potent, and irreversible small-molecule inhibitor of Bruton's tyrosine kinase (BTK). It forms a covalent bond with a cysteine residue (CYS-481) at the active site of Btk, leading to inhibition of Btk enzymatic activity. The drug is indicated to treat certain type of cancers such as mantle cell lymphoma (MCL), chronic lymphocytic leukaemia and Waldenström's macroglobulinaemia (WM). Cardiac failure is a condition referred to inability of heart muscle to pump adequate blood to human body organs. There are multiple types of cardiac failure including left and right-sided heart failure, systolic and diastolic heart failures. The aim of this review is to evaluate the risk of cardiac failure associated with the use of ibrutinib and to suggest regulatory recommendations if required. Methodology: Signal Detection team at the National Pharmacovigilance Center (NPC) of Saudi Food and Drug Authority (SFDA) performed a comprehensive signal review using its national database as well as the World Health Organization (WHO) database (VigiBase), to retrieve related information for assessing the causality between cardiac failure and ibrutinib. We used the WHO- Uppsala Monitoring Centre (UMC) criteria as standard for assessing the causality of the reported cases. Results: Case Review: The number of resulted cases for the combined drug/adverse drug reaction are 212 global ICSRs as of July 2020. The reviewers have selected and assessed the causality for the well-documented ICSRs with completeness scores of 0.9 and above (35 ICSRs); the value 1.0 presents the highest score for best-written ICSRs. Among the reviewed cases, more than half of them provides supportive association (four probable and 15 possible cases). Data Mining: The disproportionality of the observed and the expected reporting rate for drug/adverse drug reaction pair is estimated using information component (IC), a tool developed by WHO-UMC to measure the reporting ratio. Positive IC reflects higher statistical association while negative values indicates less statistical association, considering the null value equal to zero. The results of (IC=1.5) revealed a positive statistical association for the drug/ADR combination, which means “Ibrutinib” with “Cardiac Failure” have been observed more than expected when compared to other medications available in WHO database. Conclusion: Health regulators and health care professionals must be aware for the potential risk of cardiac failure associated with ibrutinib and the monitoring of any signs or symptoms in treated patients is essential. The weighted cumulative evidences identified from causality assessment of the reported cases and data mining are sufficient to support a causal association between ibrutinib and cardiac failure.

Keywords: cardiac failure, drug safety, ibrutinib, pharmacovigilance, signal detection

Procedia PDF Downloads 129
472 Twitter's Impact on Print Media with Respect to Real World Events

Authors: Basit Shahzad, Abdullatif M. Abdullatif

Abstract:

Recent advancements in Information and Communication Technologies (ICT) and easy access to Internet have made social media the first choice for information sharing related to any important events or news. On Twitter, trend is a common feature that quantifies the level of popularity of a certain news or event. In this work, we examine the impact of Twitter trends on real world events by hypothesizing that Twitter trends have an influence on print media in Pakistan. For this, Twitter is used as a platform and Twitter trends as a base line. We first collect data from two sources (Twitter trends and print media) in the period May to August 2016. Obtained data from two sources is analyzed and it is observed that social media is significantly influencing the print media and majority of the news printed in newspaper are posted on Twitter earlier.

Keywords: twitter trends, text mining, effectiveness of trends, print media

Procedia PDF Downloads 258
471 A Study on the Nostalgia Contents Analysis of Hometown Alumni in the Online Community

Authors: Heejin Yun, Juanjuan Zang

Abstract:

This study aims to analyze the text terms posted on an online community of people from the same hometown and to understand the topic and trend of nostalgia composed online. For this purpose, this study collected 144 writings which the natives of Yeongjong Island, Incheon, South-Korea have posted on an online community. And it analyzed association relations. As a result, online community texts means that just defining nostalgia as ‘a mind longing for hometown’ is not an enough explanation. Second, texts composed online have abstractness rather than persons’ individual stories. This study figured out the relationship that had the most critical and closest mutual association among the terms that constituted nostalgia through literature research and association rule concerning nostalgia. The result of this study has a characteristic that it summed up the core terms and emotions related to nostalgia.

Keywords: nostalgia, cultural memory, data mining, association rule

Procedia PDF Downloads 229
470 Fractional, Component and Morphological Composition of Ambient Air Dust in the Areas of Mining Industry

Authors: S.V. Kleyn, S.Yu. Zagorodnov, А.А. Kokoulina

Abstract:

Technogenic emissions of the mining and processing complex are characterized by a high content of chemical components and solid dust particles. However, each industrial enterprise and the surrounding area have features that require refinement and parameterization. Numerous studies have shown the negative impact of fine dust PM10 and PM2.5 on the health, as well as the possibility of toxic components absorption, including heavy metals by dust particles. The target of the study was the quantitative assessment of the fractional and particle size composition of ambient air dust in the area of impact by primary magnesium production complex. Also, we tried to describe the morphology features of dust particles. Study methods. To identify the dust emission sources, the analysis of the production process has been carried out. The particulate composition of the emissions was measured using laser particle analyzer Microtrac S3500 (covered range of particle size is 20 nm to 2000 km). Particle morphology and the component composition were established by electron microscopy by scanning microscope of high resolution (magnification rate - 5 to 300 000 times) with X-ray fluorescence device S3400N ‘HITACHI’. The chemical composition was identified by X-ray analysis of the samples using an X-ray diffractometer XRD-700 ‘Shimadzu’. Determination of the dust pollution level was carried out using model calculations of emissions in the atmosphere dispersion. The calculations were verified by instrumental studies. Results of the study. The results demonstrated that the dust emissions of different technical processes are heterogeneous and fractional structure is complicated. The percentage of particle sizes up to 2.5 micrometres inclusive was ranged from 0.00 to 56.70%; particle sizes less than 10 microns inclusive – 0.00 - 85.60%; particle sizes greater than 10 microns - 14.40% -100.00%. During microscopy, the presence of nanoscale size particles has been detected. Studied dust particles are round, irregular, cubic and integral shapes. The composition of the dust includes magnesium, sodium, potassium, calcium, iron, chlorine. On the base of obtained results, it was performed the model calculations of dust emissions dispersion and establishment of the areas of fine dust РМ 10 and РМ 2.5 distribution. It was found that the dust emissions of fine powder fractions PM10 and PM2.5 are dispersed over large distances and beyond the border of the industrial site of the enterprise. The population living near the enterprise is exposed to the risk of diseases associated with dust exposure. Data are transferred to the economic entity to make decisions on the measures to minimize the risks. Exposure and risks indicators on the health are used to provide named patient health and preventive care to the citizens living in the area of negative impact of the facility.

Keywords: dust emissions, еxposure assessment, PM 10, PM 2.5

Procedia PDF Downloads 261
469 Analysis and Forecasting of Bitcoin Price Using Exogenous Data

Authors: J-C. Leneveu, A. Chereau, L. Mansart, T. Mesbah, M. Wyka

Abstract:

Extracting and interpreting information from Big Data represent a stake for years to come in several sectors such as finance. Currently, numerous methods are used (such as Technical Analysis) to try to understand and to anticipate market behavior, with mixed results because it still seems impossible to exactly predict a financial trend. The increase of available data on Internet and their diversity represent a great opportunity for the financial world. Indeed, it is possible, along with these standard financial data, to focus on exogenous data to take into account more macroeconomic factors. Coupling the interpretation of these data with standard methods could allow obtaining more precise trend predictions. In this paper, in order to observe the influence of exogenous data price independent of other usual effects occurring in classical markets, behaviors of Bitcoin users are introduced in a model reconstituting Bitcoin value, which is elaborated and tested for prediction purposes.

Keywords: big data, bitcoin, data mining, social network, financial trends, exogenous data, global economy, behavioral finance

Procedia PDF Downloads 355
468 Uplift Modeling Approach to Optimizing Content Quality in Social Q/A Platforms

Authors: Igor A. Podgorny

Abstract:

TurboTax AnswerXchange is a social Q/A system supporting users working on federal and state tax returns. Content quality and popularity in the AnswerXchange can be predicted with propensity models using attributes of the question and answer. Using uplift modeling, we identify features of questions and answers that can be modified during the question-asking and question-answering experience in order to optimize the AnswerXchange content quality. We demonstrate that adding details to the questions always results in increased question popularity that can be used to promote good quality content. Responding to close-ended questions assertively improve content quality in the AnswerXchange in 90% of cases. Answering knowledge questions with web links increases the likelihood of receiving a negative vote from 60% of the askers. Our findings provide a rationale for employing the uplift modeling approach for AnswerXchange operations.

Keywords: customer relationship management, human-machine interaction, text mining, uplift modeling

Procedia PDF Downloads 244
467 Study of the Landslide and Stability of Open Pit Quarry: Case of Open Pite Quarry of Chouf Amar M'sila, Algeria

Authors: Saadoun Abd Errazak, Hafssaoui Abdallah, Fredj Mohamed

Abstract:

Mining operations open induce risks of instability that can cause landslides and collapse at the bleachers slope. These risks may occur both during and after the operation phase. The magnitude of these risks depends on the mechanical and physical characteristics of the rock mass, the geometrical dimensions of ore bodies, their spatial arrangement, and the state of the operated area. If security and technology measures are not taken into account for this purpose, the environment will be affected. The main objective of this work is to assess these risks by analytical and numerical methods. The study is based on the geological, hydrogeological and geotechnical rock mass of the open pit quarry of Chouf Amar M'sila. The results obtained have allowed us to obtain an acceptable factor of safety and stability study of the open pit.

Keywords: stability, land sliding, numerical modeling, safety factor, open-pit quarry

Procedia PDF Downloads 374
466 The Problem of the Use of Learning Analytics in Distance Higher Education: An Analytical Study of the Open and Distance University System in Mexico

Authors: Ismene Ithai Bras-Ruiz

Abstract:

Learning Analytics (LA) is employed by universities not only as a tool but as a specialized ground to enhance students and professors. However, not all the academic programs apply LA with the same goal and use the same tools. In fact, LA is formed by five main fields of study (academic analytics, action research, educational data mining, recommender systems, and personalized systems). These fields can help not just to inform academic authorities about the situation of the program, but also can detect risk students, professors with needs, or general problems. The highest level applies Artificial Intelligence techniques to support learning practices. LA has adopted different techniques: statistics, ethnography, data visualization, machine learning, natural language process, and data mining. Is expected that any academic program decided what field wants to utilize on the basis of his academic interest but also his capacities related to professors, administrators, systems, logistics, data analyst, and the academic goals. The Open and Distance University System (SUAYED in Spanish) of the University National Autonomous of Mexico (UNAM), has been working for forty years as an alternative to traditional programs; one of their main supports has been the employ of new information and communications technologies (ICT). Today, UNAM has one of the largest network higher education programs, twenty-six academic programs in different faculties. This situation means that every faculty works with heterogeneous populations and academic problems. In this sense, every program has developed its own Learning Analytic techniques to improve academic issues. In this context, an investigation was carried out to know the situation of the application of LA in all the academic programs in the different faculties. The premise of the study it was that not all the faculties have utilized advanced LA techniques and it is probable that they do not know what field of study is closer to their program goals. In consequence, not all the programs know about LA but, this does not mean they do not work with LA in a veiled or, less clear sense. It is very important to know the grade of knowledge about LA for two reasons: 1) This allows to appreciate the work of the administration to improve the quality of the teaching and, 2) if it is possible to improve others LA techniques. For this purpose, it was designed three instruments to determinate the experience and knowledge in LA. These were applied to ten faculty coordinators and his personnel; thirty members were consulted (academic secretary, systems manager, or data analyst, and coordinator of the program). The final report allowed to understand that almost all the programs work with basic statistics tools and techniques, this helps the administration only to know what is happening inside de academic program, but they are not ready to move up to the next level, this means applying Artificial Intelligence or Recommender Systems to reach a personalized learning system. This situation is not related to the knowledge of LA, but the clarity of the long-term goals.

Keywords: academic improvements, analytical techniques, learning analytics, personnel expertise

Procedia PDF Downloads 128
465 Hybrid Hierarchical Clustering Approach for Community Detection in Social Network

Authors: Radhia Toujani, Jalel Akaichi

Abstract:

Social Networks generally present a hierarchy of communities. To determine these communities and the relationship between them, detection algorithms should be applied. Most of the existing algorithms, proposed for hierarchical communities identification, are based on either agglomerative clustering or divisive clustering. In this paper, we present a hybrid hierarchical clustering approach for community detection based on both bottom-up and bottom-down clustering. Obviously, our approach provides more relevant community structure than hierarchical method which considers only divisive or agglomerative clustering to identify communities. Moreover, we performed some comparative experiments to enhance the quality of the clustering results and to show the effectiveness of our algorithm.

Keywords: agglomerative hierarchical clustering, community structure, divisive hierarchical clustering, hybrid hierarchical clustering, opinion mining, social network, social network analysis

Procedia PDF Downloads 365
464 Treatment of Acid Mine Drainage with Metallurgical Slag

Authors: Sukla Saha, Alok Sinha

Abstract:

Acid mine drainage (AMD) refers to the production of acidified water from abandoned mines and active mines as well. The reason behind the generation of this kind of acidified water is the oxidation of pyrites present in the rocks in and around mining areas. Thiobacillus ferrooxidans, which is a sulfur oxidizing bacteria, helps in the oxidation process. AMD is extremely acidic in nature, (pH 2-3) with high concentration of several trace and heavy metals such as Fe, Al, Zn, Mn, Cu and Co and anions such as chloride and sulfate. AMD has several detrimental effect on aquatic organism and environment. It can directly or indirectly contaminate the ground water and surface water as well. The present study considered the treatment of AMD with metallurgical slag, which is a waste material. Slag helped to enhance the pH of AMD to 8.62 from 1.5 with 99% removal of trace metals such as Fe, Al, Mn, Cu and Co. Metallurgical slag was proven as efficient neutralizing material for the treatment of AMD.

Keywords: acid mine drainage, Heavy metals, metallurgical slag, Neutralization

Procedia PDF Downloads 187
463 Spectroscopic Autoradiography of Alpha Particles on Geologic Samples at the Thin Section Scale Using a Parallel Ionization Multiplier Gaseous Detector

Authors: Hugo Lefeuvre, Jerôme Donnard, Michael Descostes, Sophie Billon, Samuel Duval, Tugdual Oger, Herve Toubon, Paul Sardini

Abstract:

Spectroscopic autoradiography is a method of interest for geological sample analysis. Indeed, researchers may face different issues such as radioelement identification and quantification in the field of environmental studies. Imaging gaseous ionization detectors find their place in geosciences for conducting specific measurements of radioactivity to improve the monitoring of natural processes using naturally-occurring radioactive tracers, but also for the nuclear industry linked to the mining sector. In geological samples, the location and identification of the radioactive-bearing minerals at the thin-section scale remains a major challenge as the detection limit of the usual elementary microprobe techniques is far higher than the concentration of most of the natural radioactive decay products. The spatial distribution of each decay product in the case of uranium in a geomaterial is interesting for relating radionuclides concentration to the mineralogy. The present study aims to provide spectroscopic autoradiography analysis method for measuring the initial energy of alpha particles with a parallel ionization multiplier gaseous detector. The analysis method has been developed thanks to Geant4 modelling of the detector. The track of alpha particles recorded in the gas detector allow the simultaneous measurement of the initial point of emission and the reconstruction of the initial particle energy by a selection based on the linear energy distribution. This spectroscopic autoradiography method was successfully used to reproduce the alpha spectra from a 238U decay chain on a geological sample at the thin-section scale. The characteristics of this measurement are an energy spectrum resolution of 17.2% (FWHM) at 4647 keV and a spatial resolution of at least 50 µm. Even if the efficiency of energy spectrum reconstruction is low (4.4%) compared to the efficiency of a simple autoradiograph (50%), this novel measurement approach offers the opportunity to select areas on an autoradiograph to perform an energy spectrum analysis within that area. This opens up possibilities for the detailed analysis of heterogeneous geological samples containing natural alpha emitters such as uranium-238 and radium-226. This measurement will allow the study of the spatial distribution of uranium and its descendants in geo-materials by coupling scanning electron microscope characterizations. The direct application of this dual modality (energy-position) of analysis will be the subject of future developments. The measurement of the radioactive equilibrium state of heterogeneous geological structures, and the quantitative mapping of 226Ra radioactivity are now being actively studied.

Keywords: alpha spectroscopy, digital autoradiography, mining activities, natural decay products

Procedia PDF Downloads 151
462 Mean Monthly Rainfall Prediction at Benina Station Using Artificial Neural Networks

Authors: Hasan G. Elmazoghi, Aisha I. Alzayani, Lubna S. Bentaher

Abstract:

Rainfall is a highly non-linear phenomena, which requires application of powerful supervised data mining techniques for its accurate prediction. In this study the Artificial Neural Network (ANN) technique is used to predict the mean monthly historical rainfall data collected from BENINA station in Benghazi for 31 years, the period of “1977-2006” and the results are compared against the observed values. The specific objective to achieve this goal was to determine the best combination of weather variables to be used as inputs for the ANN model. Several statistical parameters were calculated and an uncertainty analysis for the results is also presented. The best ANN model is then applied to the data of one year (2007) as a case study in order to evaluate the performance of the model. Simulation results reveal that application of ANN technique is promising and can provide reliable estimates of rainfall.

Keywords: neural networks, rainfall, prediction, climatic variables

Procedia PDF Downloads 488
461 A New Approach for Improving Accuracy of Multi Label Stream Data

Authors: Kunal Shah, Swati Patel

Abstract:

Many real world problems involve data which can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as the learners must be able to adapt to change using limited time and memory. Classification is used to predict class of unseen instance as accurate as possible. Multi label classification is a variant of single label classification where set of labels associated with single instance. Multi label classification is used by modern applications, such as text classification, functional genomics, image classification, music categorization etc. This paper introduces the task of multi-label classification, methods for multi-label classification and evolution measure for multi-label classification. Also, comparative analysis of multi label classification methods on the basis of theoretical study, and then on the basis of simulation was done on various data sets.

Keywords: binary relevance, concept drift, data stream mining, MLSC, multiple window with buffer

Procedia PDF Downloads 584
460 Learning Grammars for Detection of Disaster-Related Micro Events

Authors: Josef Steinberger, Vanni Zavarella, Hristo Tanev

Abstract:

Natural disasters cause tens of thousands of victims and massive material damages. We refer to all those events caused by natural disasters, such as damage on people, infrastructure, vehicles, services and resource supply, as micro events. This paper addresses the problem of micro - event detection in online media sources. We present a natural language grammar learning algorithm and apply it to online news. The algorithm in question is based on distributional clustering and detection of word collocations. We also explore the extraction of micro-events from social media and describe a Twitter mining robot, who uses combinations of keywords to detect tweets which talk about effects of disasters.

Keywords: online news, natural language processing, machine learning, event extraction, crisis computing, disaster effects, Twitter

Procedia PDF Downloads 478
459 Evaluation of Modern Natural Language Processing Techniques via Measuring a Company's Public Perception

Authors: Burak Oksuzoglu, Savas Yildirim, Ferhat Kutlu

Abstract:

Opinion mining (OM) is one of the natural language processing (NLP) problems to determine the polarity of opinions, mostly represented on a positive-neutral-negative axis. The data for OM is usually collected from various social media platforms. In an era where social media has considerable control over companies’ futures, it’s worth understanding social media and taking actions accordingly. OM comes to the fore here as the scale of the discussion about companies increases, and it becomes unfeasible to gauge opinion on individual levels. Thus, the companies opt to automize this process by applying machine learning (ML) approaches to their data. For the last two decades, OM or sentiment analysis (SA) has been mainly performed by applying ML classification algorithms such as support vector machines (SVM) and Naïve Bayes to a bag of n-gram representations of textual data. With the advent of deep learning and its apparent success in NLP, traditional methods have become obsolete. Transfer learning paradigm that has been commonly used in computer vision (CV) problems started to shape NLP approaches and language models (LM) lately. This gave a sudden rise to the usage of the pretrained language model (PTM), which contains language representations that are obtained by training it on the large datasets using self-supervised learning objectives. The PTMs are further fine-tuned by a specialized downstream task dataset to produce efficient models for various NLP tasks such as OM, NER (Named-Entity Recognition), Question Answering (QA), and so forth. In this study, the traditional and modern NLP approaches have been evaluated for OM by using a sizable corpus belonging to a large private company containing about 76,000 comments in Turkish: SVM with a bag of n-grams, and two chosen pre-trained models, multilingual universal sentence encoder (MUSE) and bidirectional encoder representations from transformers (BERT). The MUSE model is a multilingual model that supports 16 languages, including Turkish, and it is based on convolutional neural networks. The BERT is a monolingual model in our case and transformers-based neural networks. It uses a masked language model and next sentence prediction tasks that allow the bidirectional training of the transformers. During the training phase of the architecture, pre-processing operations such as morphological parsing, stemming, and spelling correction was not used since the experiments showed that their contribution to the model performance was found insignificant even though Turkish is a highly agglutinative and inflective language. The results show that usage of deep learning methods with pre-trained models and fine-tuning achieve about 11% improvement over SVM for OM. The BERT model achieved around 94% prediction accuracy while the MUSE model achieved around 88% and SVM did around 83%. The MUSE multilingual model shows better results than SVM, but it still performs worse than the monolingual BERT model.

Keywords: BERT, MUSE, opinion mining, pretrained language model, SVM, Turkish

Procedia PDF Downloads 146
458 Cardiovascular Disease Prediction Using Machine Learning Approaches

Authors: P. Halder, A. Zaman

Abstract:

It is estimated that heart disease accounts for one in ten deaths worldwide. United States deaths due to heart disease are among the leading causes of death according to the World Health Organization. Cardiovascular diseases (CVDs) account for one in four U.S. deaths, according to the Centers for Disease Control and Prevention (CDC). According to statistics, women are more likely than men to die from heart disease as a result of strokes. A 50% increase in men's mortality was reported by the World Health Organization in 2009. The consequences of cardiovascular disease are severe. The causes of heart disease include diabetes, high blood pressure, high cholesterol, abnormal pulse rates, etc. Machine learning (ML) can be used to make predictions and decisions in the healthcare industry. Thus, scientists have turned to modern technologies like Machine Learning and Data Mining to predict diseases. The disease prediction is based on four algorithms. Compared to other boosts, the Ada boost is much more accurate.

Keywords: heart disease, cardiovascular disease, coronary artery disease, feature selection, random forest, AdaBoost, SVM, decision tree

Procedia PDF Downloads 153
457 Value Analysis of Islamic Banking and Conventional Banking to Measure Value Co-Creation

Authors: Amna Javed, Hisashi Masuda, Youji Kohda

Abstract:

This study examines the value analysis in Islamic and conventional banking services in Pakistan. Many scholars have focused on co-creation of values in services but mainly economic values not non-economic. As Islamic banking is based on Islamic principles that are more concerned with non-economic values (well-being, partnership, fairness, trust worthy, and justice) than economic values as money in terms of interest. This study is important to know the providers point of view about the co-created values, because, it may be more sustainable and appropriate for today’s unpredictable socioeconomic environment. Data were collected from 4 banks (2 Islamic and 2 conventional banks). Text mining technique is applied for data analysis, and values with 100% occurrences in Islamic banking are chosen. The results reflect that Islamic banking is more centric towards non-economic values than economic values and it promotes team work and partnership concept by applying Islamic spirit and trust worthiness concept.

Keywords: economic values, Islamic banking, non-economic values, value system

Procedia PDF Downloads 462
456 Event Extraction, Analysis, and Event Linking

Authors: Anam Alam, Rahim Jamaluddin Kanji

Abstract:

With the rapid growth of event in everywhere, event extraction has now become an important matter to retrieve the information from the unstructured data. One of the challenging problems is to extract the event from it. An event is an observable occurrence of interaction among entities. The paper investigates the effectiveness of event extraction capabilities of three software tools that are Wandora, Nitro and SPSS. We performed standard text mining techniques of these tools on the data sets of (i) Afghan War Diaries (AWD collection), (ii) MUC4 and (iii) WebKB. Information retrieval measures such as precision and recall which are computed under extensive set of experiments for Event Extraction. The experimental study analyzes the difference between events extracted by the software and human. This approach helps to construct an algorithm that will be applied for different machine learning methods.

Keywords: event extraction, Wandora, nitro, SPSS, event analysis, extraction method, AFG, Afghan War Diaries, MUC4, 4 universities, dataset, algorithm, precision, recall, evaluation

Procedia PDF Downloads 595
455 Using Multi-Arm Bandits to Optimize Game Play Metrics and Effective Game Design

Authors: Kenny Raharjo, Ramon Lawrence

Abstract:

Game designers have the challenging task of building games that engage players to spend their time and money on the game. There are an infinite number of game variations and design choices, and it is hard to systematically determine game design choices that will have positive experiences for players. In this work, we demonstrate how multi-arm bandits can be used to automatically explore game design variations to achieve improved player metrics. The advantage of multi-arm bandits is that they allow for continuous experimentation and variation, intrinsically converge to the best solution, and require no special infrastructure to use beyond allowing minor game variations to be deployed to users for evaluation. A user study confirms that applying multi-arm bandits was successful in determining the preferred game variation with highest play time metrics and can be a useful technique in a game designer's toolkit.

Keywords: game design, multi-arm bandit, design exploration and data mining, player metric optimization and analytics

Procedia PDF Downloads 509