Search results for: spatial temporal data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 27191

Search results for: spatial temporal data mining

26951 Impact of Stack Caches: Locality Awareness and Cost Effectiveness

Authors: Abdulrahman K. Alshegaifi, Chun-Hsi Huang

Abstract:

Treating data based on its location in memory has received much attention in recent years due to its different properties, which offer important aspects for cache utilization. Stack data and non-stack data may interfere with each other’s locality in the data cache. One of the important aspects of stack data is that it has high spatial and temporal locality. In this work, we simulate non-unified cache design that split data cache into stack and non-stack caches in order to maintain stack data and non-stack data separate in different caches. We observe that the overall hit rate of non-unified cache design is sensitive to the size of non-stack cache. Then, we investigate the appropriate size and associativity for stack cache to achieve high hit ratio especially when over 99% of accesses are directed to stack cache. The result shows that on average more than 99% of stack cache accuracy is achieved by using 2KB of capacity and 1-way associativity. Further, we analyze the improvement in hit rate when adding small, fixed, size of stack cache at level1 to unified cache architecture. The result shows that the overall hit rate of unified cache design with adding 1KB of stack cache is improved by approximately, on average, 3.9% for Rijndael benchmark. The stack cache is simulated by using SimpleScalar toolset.

Keywords: hit rate, locality of program, stack cache, stack data

Procedia PDF Downloads 302
26950 Sequential Pattern Mining from Data of Medical Record with Sequential Pattern Discovery Using Equivalent Classes (SPADE) Algorithm (A Case Study : Bolo Primary Health Care, Bima)

Authors: Rezky Rifaini, Raden Bagus Fajriya Hakim

Abstract:

This research was conducted at the Bolo primary health Care in Bima Regency. The purpose of the research is to find out the association pattern that is formed of medical record database from Bolo Primary health care’s patient. The data used is secondary data from medical records database PHC. Sequential pattern mining technique is the method that used to analysis. Transaction data generated from Patient_ID, Check_Date and diagnosis. Sequential Pattern Discovery Algorithms Using Equivalent Classes (SPADE) is one of the algorithm in sequential pattern mining, this algorithm find frequent sequences of data transaction, using vertical database and sequence join process. Results of the SPADE algorithm is frequent sequences that then used to form a rule. It technique is used to find the association pattern between items combination. Based on association rules sequential analysis with SPADE algorithm for minimum support 0,03 and minimum confidence 0,75 is gotten 3 association sequential pattern based on the sequence of patient_ID, check_Date and diagnosis data in the Bolo PHC.

Keywords: diagnosis, primary health care, medical record, data mining, sequential pattern mining, SPADE algorithm

Procedia PDF Downloads 401
26949 Spatiotemporal Analysis of Land Surface Temperature and Urban Heat Island Evaluation of Four Metropolitan Areas of Texas, USA

Authors: Chunhong Zhao

Abstract:

Remotely sensed land surface temperature (LST) is vital to understand the land-atmosphere energy balance, hydrological cycle, and thus is widely used to describe the urban heat island (UHI) phenomenon. However, due to technical constraints, satellite thermal sensors are unable to provide LST measurement with both high spatial and high temporal resolution. Despite different downscaling techniques and algorithms to generate high spatiotemporal resolution LST. Four major metropolitan areas in Texas, USA: Dallas-Fort Worth, Houston, San Antonio, and Austin all demonstrate UHI effects. Different cities are expected to have varying SUHI effect during the urban development trajectory. With the help of the Landsat, ASTER, and MODIS archives, this study focuses on the spatial patterns of UHIs and the seasonal and annual variation of these metropolitan areas. With Gaussian model, and Local Indicators of Spatial Autocorrelations (LISA), as well as data fusion methods, this study identifies the hotspots and the trajectory of the UHI phenomenon of the four cities. By making comparison analysis, the result can help to alleviate the advent effect of UHI and formulate rational urban planning in the long run.

Keywords: spatiotemporal analysis, land surface temperature, urban heat island evaluation, metropolitan areas of Texas, USA

Procedia PDF Downloads 416
26948 Analysis of Temporal Factors Influencing Minimum Dwell Time Distributions

Authors: T. Pedersen, A. Lindfeldt

Abstract:

The minimum dwell time is an important part of railway timetable planning. Due to its stochastic behaviour, the minimum dwell time should be considered to create resilient timetables. While there has been significant focus on how to determine and estimate dwell times, to our knowledge, little research has been carried out regarding temporal and running direction variations of these. In this paper, we examine how the minimum dwell time varies depending on temporal factors such as the time of day, day of the week and time of the year. We also examine how it is affected by running direction and station type. The minimum dwell time is estimated by means of track occupation data. A method is proposed to ensure that only minimum dwell times and not planned dwell times are acquired from the track occupation data. The results show that on an aggregated level, the average minimum dwell times in both running directions at a station are similar. However, when temporal factors are considered, there are significant variations. The minimum dwell time varies throughout the day with peak hours having the longest dwell times. It is also found that the minimum dwell times are influenced by weekday, and in particular, weekends are found to have lower minimum dwell times than most other days. The findings show that there is a potential to significantly improve timetable planning by taking minimum dwell time variations into account.

Keywords: minimum dwell time, operations quality, timetable planning, track occupation data

Procedia PDF Downloads 197
26947 Exploring the Correlation between Population Distribution and Urban Heat Island under Urban Data: Taking Shenzhen Urban Heat Island as an Example

Authors: Wang Yang

Abstract:

Shenzhen is a modern city of China's reform and opening-up policy, the development of urban morphology has been established on the administration of the Chinese government. This city`s planning paradigm is primarily affected by the spatial structure and human behavior. The subjective urban agglomeration center is divided into several groups and centers. In comparisons of this effect, the city development law has better to be neglected. With the continuous development of the internet, extensive data technology has been introduced in China. Data mining and data analysis has become important tools in municipal research. Data mining has been utilized to improve data cleaning such as receiving business data, traffic data and population data. Prior to data mining, government data were collected by traditional means, then were analyzed using city-relationship research, delaying the timeliness of urban development, especially for the contemporary city. Data update speed is very fast and based on the Internet. The city's point of interest (POI) in the excavation serves as data source affecting the city design, while satellite remote sensing is used as a reference object, city analysis is conducted in both directions, the administrative paradigm of government is broken and urban research is restored. Therefore, the use of data mining in urban analysis is very important. The satellite remote sensing data of the Shenzhen city in July 2018 were measured by the satellite Modis sensor and can be utilized to perform land surface temperature inversion, and analyze city heat island distribution of Shenzhen. This article acquired and classified the data from Shenzhen by using Data crawler technology. Data of Shenzhen heat island and interest points were simulated and analyzed in the GIS platform to discover the main features of functional equivalent distribution influence. Shenzhen is located in the east-west area of China. The city’s main streets are also determined according to the direction of city development. Therefore, it is determined that the functional area of the city is also distributed in the east-west direction. The urban heat island can express the heat map according to the functional urban area. Regional POI has correspondence. The research result clearly explains that the distribution of the urban heat island and the distribution of urban POIs are one-to-one correspondence. Urban heat island is primarily influenced by the properties of the underlying surface, avoiding the impact of urban climate. Using urban POIs as analysis object, the distribution of municipal POIs and population aggregation are closely connected, so that the distribution of the population corresponded with the distribution of the urban heat island.

Keywords: POI, satellite remote sensing, the population distribution, urban heat island thermal map

Procedia PDF Downloads 103
26946 Topic Prominence and Temporal Encoding in Mandarin Chinese

Authors: Tzu-I Chiang

Abstract:

A central question for finite-nonfinite distinction in Mandarin Chinese is how does Mandarin encode temporal information without the grammatical contrast between past and present tense. Moreover, how do L2 learners of Mandarin whose native language is English and whose L1 system has tense morphology, acquire the temporal encoding system in L2 Mandarin? The current study reports preliminary findings on the relationship between topic prominence and the temporal encoding in L1 and L2 Chinese. Oral narratives data from 30 natives and learners of Mandarin Chinese were collected via a film-retell task. In terms of coding, predicates collected from the narratives were transcribed and then coded based on four major verb types: n-degree Statives (quality-STA), point-scale Statives (status-STA), n-atom EVENT (ACT), and point EVENT (resultative-ACT). How native speakers and non-native speakers started retelling the story was calculated. Results of the study show that native speakers of Chinese tend to express Topic Time (TT) syntactically at the topic position; whereas L2 learners of Chinese across levels rely mainly on the default time encoded in the event types. Moreover, as the proficiency level of the learner increases, learners’ appropriate use of the event predicates increased, which supports the argument that L2 development of temporal encoding is affected by lexical aspect.

Keywords: topic prominence, temporal encoding, lexical aspect, L2 acquisition

Procedia PDF Downloads 201
26945 Secure Multiparty Computations for Privacy Preserving Classifiers

Authors: M. Sumana, K. S. Hareesha

Abstract:

Secure computations are essential while performing privacy preserving data mining. Distributed privacy preserving data mining involve two to more sites that cannot pool in their data to a third party due to the violation of law regarding the individual. Hence in order to model the private data without compromising privacy and information loss, secure multiparty computations are used. Secure computations of product, mean, variance, dot product, sigmoid function using the additive and multiplicative homomorphic property is discussed. The computations are performed on vertically partitioned data with a single site holding the class value.

Keywords: homomorphic property, secure product, secure mean and variance, secure dot product, vertically partitioned data

Procedia PDF Downloads 410
26944 Wireless Sensor Network to Help Low Incomes Farmers to Face Drought Impacts

Authors: Fantazi Walid, Ezzedine Tahar, Bargaoui Zoubeida

Abstract:

This research presents the main ideas to implement an intelligent system composed by communicating wireless sensors measuring environmental data linked to drought indicators (such as air temperature, soil moisture , etc...). On the other hand, the setting up of a spatio temporal database communicating with a Web mapping application for a monitoring in real time in activity 24:00 /day, 7 days/week is proposed to allow the screening of the drought parameters time evolution and their extraction. Thus this system helps detecting surfaces touched by the phenomenon of drought. Spatio-temporal conceptual models seek to answer the users who need to manage soil water content for irrigating or fertilizing or other activities pursuing crop yield augmentation. Effectively, spatio-temporal conceptual models enable users to obtain a diagram of readable and easy data to apprehend. Based on socio-economic information, it helps identifying people impacted by the phenomena with the corresponding severity especially that this information is accessible by farmers and stakeholders themselves. The study will be applied in Siliana watershed Northern Tunisia.

Keywords: WSN, database spatio-temporal, GIS, web mapping, indicator of drought

Procedia PDF Downloads 493
26943 A Temporal QoS Ontology For ERTMS/ETCS

Authors: Marc Sango, Olimpia Hoinaru, Christophe Gransart, Laurence Duchien

Abstract:

Ontologies offer a means for representing and sharing information in many domains, particularly in complex domains. For example, it can be used for representing and sharing information of System Requirement Specification (SRS) of complex systems like the SRS of ERTMS/ETCS written in natural language. Since this system is a real-time and critical system, generic ontologies, such as OWL and generic ERTMS ontologies provide minimal support for modeling temporal information omnipresent in these SRS documents. To support the modeling of temporal information, one of the challenges is to enable representation of dynamic features evolving in time within a generic ontology with a minimal redesign of it. The separation of temporal information from other information can help to predict system runtime operation and to properly design and implement them. In addition, it is helpful to provide a reasoning and querying techniques to reason and query temporal information represented in the ontology in order to detect potential temporal inconsistencies. Indeed, a user operation, such as adding a new constraint on existing planning constraints can cause temporal inconsistencies, which can lead to system failures. To address this challenge, we propose a lightweight 3-layer temporal Quality of Service (QoS) ontology for representing, reasoning and querying over temporal and non-temporal information in a complex domain ontology. Representing QoS entities in separated layers can clarify the distinction between the non QoS entities and the QoS entities in an ontology. The upper generic layer of the proposed ontology provides an intuitive knowledge of domain components, specially ERTMS/ETCS components. The separation of the intermediate QoS layer from the lower QoS layer allows us to focus on specific QoS Characteristics, such as temporal or integrity characteristics. In this paper, we focus on temporal information that can be used to predict system runtime operation. To evaluate our approach, an example of the proposed domain ontology for handover operation, as well as a reasoning rule over temporal relations in this domain-specific ontology, are given.

Keywords: system requirement specification, ERTMS/ETCS, temporal ontologies, domain ontologies

Procedia PDF Downloads 420
26942 Performance Evaluation of Production Schedules Based on Process Mining

Authors: Kwan Hee Han

Abstract:

External environment of enterprise is rapidly changing majorly by global competition, cost reduction pressures, and new technology. In these situations, production scheduling function plays a critical role to meet customer requirements and to attain the goal of operational efficiency. It deals with short-term decision making in the production process of the whole supply chain. The major task of production scheduling is to seek a balance between customer orders and limited resources. In manufacturing companies, this task is so difficult because it should efficiently utilize resource capacity under the careful consideration of many interacting constraints. At present, many computerized software solutions have been utilized in many enterprises to generate a realistic production schedule to overcome the complexity of schedule generation. However, most production scheduling systems do not provide sufficient information about the validity of the generated schedule except limited statistics. Process mining only recently emerged as a sub-discipline of both data mining and business process management. Process mining techniques enable the useful analysis of a wide variety of processes such as process discovery, conformance checking, and bottleneck analysis. In this study, the performance of generated production schedule is evaluated by mining event log data of production scheduling software system by using the process mining techniques since every software system generates event logs for the further use such as security investigation, auditing and error bugging. An application of process mining approach is proposed for the validation of the goodness of production schedule generated by scheduling software systems in this study. By using process mining techniques, major evaluation criteria such as utilization of workstation, existence of bottleneck workstations, critical process route patterns, and work load balance of each machine over time are measured, and finally, the goodness of production schedule is evaluated. By using the proposed process mining approach for evaluating the performance of generated production schedule, the quality of production schedule of manufacturing enterprises can be improved.

Keywords: data mining, event log, process mining, production scheduling

Procedia PDF Downloads 279
26941 Decision Support System in Air Pollution Using Data Mining

Authors: E. Fathallahi Aghdam, V. Hosseini

Abstract:

Environmental pollution is not limited to a specific region or country; that is why sustainable development, as a necessary process for improvement, pays attention to issues such as destruction of natural resources, degradation of biological system, global pollution, and climate change in the world, especially in the developing countries. According to the World Health Organization, as a developing city, Tehran (capital of Iran) is one of the most polluted cities in the world in terms of air pollution. In this study, three pollutants including particulate matter less than 10 microns, nitrogen oxides, and sulfur dioxide were evaluated in Tehran using data mining techniques and through Crisp approach. The data from 21 air pollution measuring stations in different areas of Tehran were collected from 1999 to 2013. Commercial softwares Clementine was selected for this study. Tehran was divided into distinct clusters in terms of the mentioned pollutants using the software. As a data mining technique, clustering is usually used as a prologue for other analyses, therefore, the similarity of clusters was evaluated in this study through analyzing local conditions, traffic behavior, and industrial activities. In fact, the results of this research can support decision-making system, help managers improve the performance and decision making, and assist in urban studies.

Keywords: data mining, clustering, air pollution, crisp approach

Procedia PDF Downloads 426
26940 Enhance the Power of Sentiment Analysis

Authors: Yu Zhang, Pedro Desouza

Abstract:

Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers with three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduced a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modelling and testing work was done in R and Greenplum in-database analytic tools.

Keywords: sentiment analysis, social media, Twitter, Amazon, data mining, machine learning, text mining

Procedia PDF Downloads 350
26939 Surveillance Video Summarization Based on Histogram Differencing and Sum Conditional Variance

Authors: Nada Jasim Habeeb, Rana Saad Mohammed, Muntaha Khudair Abbass

Abstract:

For more efficient and fast video summarization, this paper presents a surveillance video summarization method. The presented method works to improve video summarization technique. This method depends on temporal differencing to extract most important data from large video stream. This method uses histogram differencing and Sum Conditional Variance which is robust against to illumination variations in order to extract motion objects. The experimental results showed that the presented method gives better output compared with temporal differencing based summarization techniques.

Keywords: temporal differencing, video summarization, histogram differencing, sum conditional variance

Procedia PDF Downloads 346
26938 Optimizing Communications Overhead in Heterogeneous Distributed Data Streams

Authors: Rashi Bhalla, Russel Pears, M. Asif Naeem

Abstract:

In this 'Information Explosion Era' analyzing data 'a critical commodity' and mining knowledge from vertically distributed data stream incurs huge communication cost. However, an effort to decrease the communication in the distributed environment has an adverse influence on the classification accuracy; therefore, a research challenge lies in maintaining a balance between transmission cost and accuracy. This paper proposes a method based on Bayesian inference to reduce the communication volume in a heterogeneous distributed environment while retaining prediction accuracy. Our experimental evaluation reveals that a significant reduction in communication can be achieved across a diverse range of dataset types.

Keywords: big data, bayesian inference, distributed data stream mining, heterogeneous-distributed data

Procedia PDF Downloads 159
26937 Dynamic Background Updating for Lightweight Moving Object Detection

Authors: Kelemewerk Destalem, Joongjae Cho, Jaeseong Lee, Ju H. Park, Joonhyuk Yoo

Abstract:

Background subtraction and temporal difference are often used for moving object detection in video. Both approaches are computationally simple and easy to be deployed in real-time image processing. However, while the background subtraction is highly sensitive to dynamic background and illumination changes, the temporal difference approach is poor at extracting relevant pixels of the moving object and at detecting the stopped or slowly moving objects in the scene. In this paper, we propose a moving object detection scheme based on adaptive background subtraction and temporal difference exploiting dynamic background updates. The proposed technique consists of a histogram equalization, a linear combination of background and temporal difference, followed by the novel frame-based and pixel-based background updating techniques. Finally, morphological operations are applied to the output images. Experimental results show that the proposed algorithm can solve the drawbacks of both background subtraction and temporal difference methods and can provide better performance than that of each method.

Keywords: background subtraction, background updating, real time, light weight algorithm, temporal difference

Procedia PDF Downloads 340
26936 Assessment of Tidal Influence in Spatial and Temporal Variations of Water Quality in Masan Bay, Korea

Authors: S. J. Kim, Y. J. Yoo

Abstract:

Slack-tide sampling was carried out at seven stations at high and low tides for a tidal cycle, in summer (7, 8, 9) and fall (10), 2016 to determine the differences of water quality according to tides in Masan Bay. The data were analyzed by Pearson correlation and factor analysis. The mixing state of all the water quality components investigated is well explained by the correlation with salinity (SAL). Turbidity (TURB), dissolved silica (DSi), nitrite and nitrate nitrogen (NNN) and total nitrogen (TN), which find their way into the bay from the streams and have no internal source and sink reaction, showed a strong negative correlation with SAL at low tide, indicating the property of conservative mixing. On the contrary, in summer and fall, dissolved oxygen (DO), hydrogen sulfide (H2S) and chemical oxygen demand with KMnO4 (CODMn) of the surface and bottom water, which were sensitive to an internal source and sink reaction, showed no significant correlation with SAL at high and low tides. The remaining water quality parameters showed a conservative or a non-conservative mixing pattern depending on the mixing characteristics at high and low tides, determined by the functional relationship between the changes of the flushing time and the changes of the characteristics of water quality components of the end-members in the bay. Factor analysis performed on the concentration difference data sets between high and low tides helped in identifying the principal latent variables for them. The concentration differences varied spatially and temporally. Principal factors (PFs) scores plots for each monitoring situation showed high associations of the variations to the monitoring sites. At sampling station 1 (ST1), temperature (TEMP), SAL, DSi, TURB, NNN and TN of the surface water in summer, TEMP, SAL, DSi, DO, TURB, NNN, TN, reactive soluble phosphorus (RSP) and total phosphorus (TP) of the bottom water in summer, TEMP, pH, SAL, DSi, DO, TURB, CODMn, particulate organic carbon (POC), ammonia nitrogen (AMN), NNN, TN and fecal coliform (FC) of the surface water in fall, TEMP, pH, SAL, DSi, H2S, TURB, CODMn, AMN, NNN and TN of the bottom water in fall commonly showed up as the most significant parameters and the large concentration differences between high and low tides. At other stations, the significant parameters showed differently according to the spatial and temporal variations of mixing pattern in the bay. In fact, there is no estuary that always maintains steady-state flow conditions. The mixing regime of an estuary might be changed at any time from linear to non-linear, due to the change of flushing time according to the combination of hydrogeometric properties, inflow of freshwater and tidal action, And furthermore the change of end-member conditions due to the internal sinks and sources makes the occurrence of concentration difference inevitable. Therefore, when investigating the water quality of the estuary, it is necessary to take a sampling method considering the tide to obtain average water quality data.

Keywords: conservative mixing, end-member, factor analysis, flushing time, high and low tide, latent variables, non-conservative mixing, slack-tide sampling, spatial and temporal variations, surface and bottom water

Procedia PDF Downloads 129
26935 A Spatial Point Pattern Analysis to Recognize Fail Bit Patterns in Semiconductor Manufacturing

Authors: Youngji Yoo, Seung Hwan Park, Daewoong An, Sung-Shick Kim, Jun-Geol Baek

Abstract:

The yield management system is very important to produce high-quality semiconductor chips in the semiconductor manufacturing process. In order to improve quality of semiconductors, various tests are conducted in the post fabrication (FAB) process. During the test process, large amount of data are collected and the data includes a lot of information about defect. In general, the defect on the wafer is the main causes of yield loss. Therefore, analyzing the defect data is necessary to improve performance of yield prediction. The wafer bin map (WBM) is one of the data collected in the test process and includes defect information such as the fail bit patterns. The fail bit has characteristics of spatial point patterns. Therefore, this paper proposes the feature extraction method using the spatial point pattern analysis. Actual data obtained from the semiconductor process is used for experiments and the experimental result shows that the proposed method is more accurately recognize the fail bit patterns.

Keywords: semiconductor, wafer bin map, feature extraction, spatial point patterns, contour map

Procedia PDF Downloads 380
26934 Evaluation of Classification Algorithms for Diagnosis of Asthma in Iranian Patients

Authors: Taha SamadSoltani, Peyman Rezaei Hachesu, Marjan GhaziSaeedi, Maryam Zolnoori

Abstract:

Introduction: Data mining defined as a process to find patterns and relationships along data in the database to build predictive models. Application of data mining extended in vast sectors such as the healthcare services. Medical data mining aims to solve real-world problems in the diagnosis and treatment of diseases. This method applies various techniques and algorithms which have different accuracy and precision. The purpose of this study was to apply knowledge discovery and data mining techniques for the diagnosis of asthma based on patient symptoms and history. Method: Data mining includes several steps and decisions should be made by the user which starts by creation of an understanding of the scope and application of previous knowledge in this area and identifying KD process from the point of view of the stakeholders and finished by acting on discovered knowledge using knowledge conducting, integrating knowledge with other systems and knowledge documenting and reporting.in this study a stepwise methodology followed to achieve a logical outcome. Results: Sensitivity, Specifity and Accuracy of KNN, SVM, Naïve bayes, NN, Classification tree and CN2 algorithms and related similar studies was evaluated and ROC curves were plotted to show the performance of the system. Conclusion: The results show that we can accurately diagnose asthma, approximately ninety percent, based on the demographical and clinical data. The study also showed that the methods based on pattern discovery and data mining have a higher sensitivity compared to expert and knowledge-based systems. On the other hand, medical guidelines and evidence-based medicine should be base of diagnostics methods, therefore recommended to machine learning algorithms used in combination with knowledge-based algorithms.

Keywords: asthma, datamining, classification, machine learning

Procedia PDF Downloads 446
26933 Critical Review of Web Content Mining Extraction Mechanisms

Authors: Rabia Bashir, Sajjad Akbar

Abstract:

There is an inevitable demand of web mining due to rapid increase of huge information on the Internet, but the striking variety of web structures has made required content retrieval a difficult task. To counter this issue, Web Content Mining (WCM) emerges as a potential candidate which extracts and integrates suitable resources of data to users. In past few years, research has been done on several extraction techniques for WCM i.e. agent-based, template-based, assumption-based, statistic-based, wrapper-based and machine learning. However, it is still unclear that either these approaches are efficiently tackling the significant challenges of WCM or not. To answer this question, this paper identifies these challenges such as language independency, structure flexibility, performance, automation, dynamicity, redundancy handling, intelligence, relevant content retrieval, and privacy. Further, mapping of these challenges is done with existing extraction mechanisms which helps to adopt the most suitable WCM approach, given some conditions and characteristics at hand.

Keywords: content mining challenges, web content mining, web content extraction approaches, web information retrieval

Procedia PDF Downloads 545
26932 The Use of Geographically Weighted Regression for Deforestation Analysis: Case Study in Brazilian Cerrado

Authors: Ana Paula Camelo, Keila Sanches

Abstract:

The Geographically Weighted Regression (GWR) was proposed in geography literature to allow relationship in a regression model to vary over space. In Brazil, the agricultural exploitation of the Cerrado Biome is the main cause of deforestation. In this study, we propose a methodology using geostatistical methods to characterize the spatial dependence of deforestation in the Cerrado based on agricultural production indicators. Therefore, it was used the set of exploratory spatial data analysis tools (ESDA) and confirmatory analysis using GWR. It was made the calibration a non-spatial model, evaluation the nature of the regression curve, election of the variables by stepwise process and multicollinearity analysis. After the evaluation of the non-spatial model was processed the spatial-regression model, statistic evaluation of the intercept and verification of its effect on calibration. In an analysis of Spearman’s correlation the results between deforestation and livestock was +0.783 and with soybeans +0.405. The model presented R²=0.936 and showed a strong spatial dependence of agricultural activity of soybeans associated to maize and cotton crops. The GWR is a very effective tool presenting results closer to the reality of deforestation in the Cerrado when compared with other analysis.

Keywords: deforestation, geographically weighted regression, land use, spatial analysis

Procedia PDF Downloads 361
26931 Exploring Legal Liabilities of Mining Companies for Human Rights Abuses: Case Study of Mongolian Mine

Authors: Azzaya Enkhjargal

Abstract:

Context: The mining industry has a long history of human rights abuses, including forced labor, environmental pollution, and displacement of communities. In recent years, there has been growing international pressure to hold mining companies accountable for these abuses. Research Aim: This study explores the legal liabilities of mining companies for human rights abuses. The study specifically examines the case of Erdenet Mining Corporation (EMC), a large mining company in Mongolia that has been accused of human rights abuses. Methodology: The study used a mixed-methods approach, which included a review of legal literature, interviews with community members and NGOs, and a case study of EMC. Findings: The study found that mining companies can be held liable for human rights abuses under a variety of regulatory frameworks, including soft law and self-regulatory instruments in the mining industry, international law, national law, and corporate law. The study also found that there are a number of challenges to holding mining companies accountable for human rights abuses, including the lack of effective enforcement mechanisms and the difficulty of proving causation. Theoretical Importance: The study contributes to the growing body of literature on the legal liabilities of mining companies for human rights abuses. The study also provides insights into the challenges of holding mining companies accountable for human rights abuses. Data Collection: The data for the study was collected through a variety of methods, including a review of legal literature, interviews with community members and NGOs, and a case study of EMC. Analysis Procedures: The data was analyzed using a variety of methods, including content analysis, thematic analysis, and case study analysis. Conclusion: The study concludes that mining companies can be held liable for human rights abuses under a variety of legal and regulatory frameworks. There are positive developments in ensuring greater accountability and protection of affected communities and the environment in countries with a strong economy. Regrettably, access to avenues of redress is reasonably low in less developed countries, where the governments have not implemented a robust mechanism to enforce liability requirements in the mining industry. The study recommends that governments and mining companies take more ambitious steps to enhance corporate accountability.

Keywords: human rights, human rights abuses, ESG, litigation, Erdenet Mining Corporation, corporate social responsibility, soft law, self-regulation, mining industry, parent company liability, sustainability, environment, UN

Procedia PDF Downloads 80
26930 Data Management System for Environmental Remediation

Authors: Elizaveta Petelina, Anton Sizo

Abstract:

Environmental remediation projects deal with a wide spectrum of data, including data collected during site assessment, execution of remediation activities, and environmental monitoring. Therefore, an appropriate data management is required as a key factor for well-grounded decision making. The Environmental Data Management System (EDMS) was developed to address all necessary data management aspects, including efficient data handling and data interoperability, access to historical and current data, spatial and temporal analysis, 2D and 3D data visualization, mapping, and data sharing. The system focuses on support of well-grounded decision making in relation to required mitigation measures and assessment of remediation success. The EDMS is a combination of enterprise and desktop level data management and Geographic Information System (GIS) tools assembled to assist to environmental remediation, project planning, and evaluation, and environmental monitoring of mine sites. EDMS consists of seven main components: a Geodatabase that contains spatial database to store and query spatially distributed data; a GIS and Web GIS component that combines desktop and server-based GIS solutions; a Field Data Collection component that contains tools for field work; a Quality Assurance (QA)/Quality Control (QC) component that combines operational procedures for QA and measures for QC; Data Import and Export component that includes tools and templates to support project data flow; a Lab Data component that provides connection between EDMS and laboratory information management systems; and a Reporting component that includes server-based services for real-time report generation. The EDMS has been successfully implemented for the Project CLEANS (Clean-up of Abandoned Northern Mines). Project CLEANS is a multi-year, multimillion-dollar project aimed at assessing and reclaiming 37 uranium mine sites in northern Saskatchewan, Canada. The EDMS has effectively facilitated integrated decision-making for CLEANS project managers and transparency amongst stakeholders.

Keywords: data management, environmental remediation, geographic information system, GIS, decision making

Procedia PDF Downloads 161
26929 Dynamic Environmental Impact Study during the Construction of the French Nuclear Power Plants

Authors: A. Er-Raki, D. Hartmann, J. P. Belaud, S. Negny

Abstract:

This paper has a double purpose: firstly, a literature review of the life cycle analysis (LCA) and secondly a comparison between conventional (static) LCA and multi-level dynamic LCA on the following items: (i) inventories evolution with time (ii) temporal evolution of the databases. The first part of the paper summarizes the state of the art of the static LCA approach. The different static LCA limits have been identified and especially the non-consideration of the spatial and temporal evolution in the inventory, for the characterization factors (FCs) and into the databases. Then a description of the different levels of integration of the notion of temporality in life cycle analysis studies was made. In the second part, the dynamic inventory has been evaluated firstly for a single nuclear plant and secondly for the entire French nuclear power fleet by taking into account the construction durations of all the plants. In addition, the databases have been adapted by integrating the temporal variability of the French energy mix. Several iterations were used to converge towards the real environmental impact of the energy mix. Another adaptation of the databases to take into account the temporal evolution of the market data of the raw material was made. An identification of the energy mix of the time studied was based on an extrapolation of the production reference values of each means of production. An application to the construction of the French nuclear power plants from 1971 to 2000 has been performed, in which a dynamic inventory of raw material has been evaluated. Then the impacts were characterized by the ILCD 2011 characterization method. In order to compare with a purely static approach, a static impact assessment was made with the V 3.4 Ecoinvent data sheets without adaptation and a static inventory considering that all the power stations would have been built at the same time. Finally, a comparison between static and dynamic LCA approaches was set up to determine the gap between them for each of the two levels of integration. The results were analyzed to identify the contribution of the evolving nuclear power fleet construction to the total environmental impacts of the French energy mix during the same period. An equivalent strategy using a dynamic approach will further be applied to identify the environmental impacts that different scenarios of the energy transition could bring, allowing to choose the best energy mix from an environmental viewpoint.

Keywords: LCA, static, dynamic, inventory, construction, nuclear energy, energy mix, energy transition

Procedia PDF Downloads 105
26928 The Principle Probabilities of Space-Distance Resolution for a Monostatic Radar and Realization in Cylindrical Array

Authors: Anatoly D. Pluzhnikov, Elena N. Pribludova, Alexander G. Ryndyk

Abstract:

In conjunction with the problem of the target selection on a clutter background, the analysis of the scanning rate influence on the spatial-temporal signal structure, the generalized multivariate correlation function and the quality of the resolution with the increase pulse repetition frequency is made. The possibility of the object space-distance resolution, which is conditioned by the range-to-angle conversion with an increased scanning rate, is substantiated. The calculations for the real cylindrical array at high scanning rate are presented. The high scanning rate let to get the signal to noise improvement of the order of 10 dB for the space-time signal processing.

Keywords: antenna pattern, array, signal processing, spatial resolution

Procedia PDF Downloads 177
26927 Develop a Conceptual Data Model of Geotechnical Risk Assessment in Underground Coal Mining Using a Cloud-Based Machine Learning Platform

Authors: Reza Mohammadzadeh

Abstract:

The major challenges in geotechnical engineering in underground spaces arise from uncertainties and different probabilities. The collection, collation, and collaboration of existing data to incorporate them in analysis and design for given prospect evaluation would be a reliable, practical problem solving method under uncertainty. Machine learning (ML) is a subfield of artificial intelligence in statistical science which applies different techniques (e.g., Regression, neural networks, support vector machines, decision trees, random forests, genetic programming, etc.) on data to automatically learn and improve from them without being explicitly programmed and make decisions and predictions. In this paper, a conceptual database schema of geotechnical risks in underground coal mining based on a cloud system architecture has been designed. A new approach of risk assessment using a three-dimensional risk matrix supported by the level of knowledge (LoK) has been proposed in this model. Subsequently, the model workflow methodology stages have been described. In order to train data and LoK models deployment, an ML platform has been implemented. IBM Watson Studio, as a leading data science tool and data-driven cloud integration ML platform, is employed in this study. As a Use case, a data set of geotechnical hazards and risk assessment in underground coal mining were prepared to demonstrate the performance of the model, and accordingly, the results have been outlined.

Keywords: data model, geotechnical risks, machine learning, underground coal mining

Procedia PDF Downloads 274
26926 A Concept of Data Mining with XML Document

Authors: Akshay Agrawal, Anand K. Srivastava

Abstract:

The increasing amount of XML datasets available to casual users increases the necessity of investigating techniques to extract knowledge from these data. Data mining is widely applied in the database research area in order to extract frequent correlations of values from both structured and semi-structured datasets. The increasing availability of heterogeneous XML sources has raised a number of issues concerning how to represent and manage these semi structured data. In recent years due to the importance of managing these resources and extracting knowledge from them, lots of methods have been proposed in order to represent and cluster them in different ways.

Keywords: XML, similarity measure, clustering, cluster quality, semantic clustering

Procedia PDF Downloads 375
26925 Data Mining Approach: Classification Model Evaluation

Authors: Lubabatu Sada Sodangi

Abstract:

The rapid growth in exchange and accessibility of information via the internet makes many organisations acquire data on their own operation. The aim of data mining is to analyse the different behaviour of a dataset using observation. Although, the subset of the dataset being analysed may not display all the behaviours and relationships of the entire data and, therefore, may not represent other parts that exist in the dataset. There is a range of techniques used in data mining to determine the hidden or unknown information in datasets. In this paper, the performance of two algorithms Chi-Square Automatic Interaction Detection (CHAID) and multilayer perceptron (MLP) would be matched using an Adult dataset to find out the percentage of an/the adults that earn > 50k and those that earn <= 50k per year. The two algorithms were studied and compared using IBM SPSS statistics software. The result for CHAID shows that the most important predictors are relationship and education. The algorithm shows that those are married (husband) and have qualification: Bachelor, Masters, Doctorate or Prof-school whose their age is > 41<57 earn > 50k. Also, multilayer perceptron displays marital status and capital gain as the most important predictors of the income. It also shows that individuals that their capital gain is less than 6,849 and are single, separated or widow, earn <= 50K, whereas individuals with their capital gain is > 6,849, work > 35 hrs/wk, and > 27yrs their income will be > 50k. By comparing the two algorithms, it is observed that both algorithms are reliable but there is strong reliability in CHAID which clearly shows that relation and education contribute to the prediction as displayed in the data visualisation.

Keywords: data mining, CHAID, multi-layer perceptron, SPSS, Adult dataset

Procedia PDF Downloads 376
26924 A Ground Observation Based Climatology of Winter Fog: Study over the Indo-Gangetic Plains, India

Authors: Sanjay Kumar Srivastava, Anu Rani Sharma, Kamna Sachdeva

Abstract:

Every year, fog formation over the Indo-Gangetic Plains (IGPs) of Indian region during the winter months of December and January is believed to create numerous hazards, inconvenience, and economic loss to the inhabitants of this densely populated region of Indian subcontinent. The aim of the paper is to analyze the spatial and temporal variability of winter fog over IGPs. Long term ground observations of visibility and other meteorological parameters (1971-2010) have been analyzed to understand the formation of fog phenomena and its relevance during the peak winter months of January and December over IGP of India. In order to examine the temporal variability, time series and trend analysis were carried out by using the Mann-Kendall Statistical test. Trend analysis performed by using the Mann-Kendall test, accepts the alternate hypothesis with 95% confidence level indicating that there exists a trend. Kendall tau’s statistics showed that there exists a positive correlation between time series and fog frequency. Further, the Theil and Sen’s median slope estimate showed that the magnitude of trend is positive. Magnitude is higher during January compared to December for the entire IGP except in December when it is high over the western IGP. Decade wise time series analysis revealed that there has been continuous increase in fog days. The net overall increase of 99 % was observed over IGP in last four decades. Diurnal variability and average daily persistence were computed by using descriptive statistical techniques. Geo-statistical analysis of fog was carried out to understand the spatial variability of fog. Geo-statistical analysis of fog revealed that IGP is a high fog prone zone with fog occurrence frequency of more than 66% days during the study period. Diurnal variability indicates the peak occurrence of fog is between 06:00 and 10:00 local time and average daily fog persistence extends to 5 to 7 hours during the peak winter season. The results would offer a new perspective to take proactive measures in reducing the irreparable damage that could be caused due to changing trends of fog.

Keywords: fog, climatology, Mann-Kendall test, trend analysis, spatial variability, temporal variability, visibility

Procedia PDF Downloads 238
26923 The Spatial Equity Assessment of Community-Based Elderly Care Facilities in Old Neighborhood of Chongqing

Authors: Jiayue Zhao, Hongjuan Wu, Guiwen Liu

Abstract:

Old neighborhoods with a large elderly population depend on community-based elderly care facilities (community-based ECFs) for aging-in-place. Yet, due to scarce and scattered land, the facilities face inequitable distribution. This research uses spatial equity theory to measure the spatial equity of community-based ECFs in old neighborhoods. Field surveys gather granular data and methods, including coverage rate, Gini coefficient, Lorenz curve, and G2SFCA. The findings showed that coverage is substantial but does not indicate supply is a match to demand, nor does it imply superior accessibility. The key contributions are that structuring spatial equity framework considering elderly residents’ travel behavior. This study is dedicated to the international literature on spatial equity from the perspective of travel behavior and could provide valuable suggestions for the urban planning of old neighborhoods.

Keywords: community-based ECFs, elderly residents’ travel behavior, old neighborhoods, spatial equity

Procedia PDF Downloads 57
26922 Association of Social Data as a Tool to Support Government Decision Making

Authors: Diego Rodrigues, Marcelo Lisboa, Elismar Batista, Marcos Dias

Abstract:

Based on data on child labor, this work arises questions about how to understand and locate the factors that make up the child labor rates, and which properties are important to analyze these cases. Using data mining techniques to discover valid patterns on Brazilian social databases were evaluated data of child labor in the State of Tocantins (located north of Brazil with a territory of 277000 km2 and comprises 139 counties). This work aims to detect factors that are deterministic for the practice of child labor and their relationships with financial indicators, educational, regional and social, generating information that is not explicit in the government database, thus enabling better monitoring and updating policies for this purpose.

Keywords: social data, government decision making, association of social data, data mining

Procedia PDF Downloads 368