Search results for: geospatial data analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 41538

Search results for: geospatial data analysis

41298 Preliminary Design of Maritime Energy Management System: Naval Architectural Approach to Resolve Recent Limitations

Authors: Seyong Jeong, Jinmo Park, Jinhyoun Park, Boram Kim, Kyoungsoo Ahn

Abstract:

Energy management in the maritime industry is being required by economics and in conformity with new legislative actions taken by the International Maritime Organization (IMO) and the European Union (EU). In response, the various performance monitoring methodologies and data collection practices have been examined by different stakeholders. While many assorted advancements in operation and technology are applicable, their adoption in the shipping industry stays small. This slow uptake can be considered due to many different barriers such as data analysis problems, misreported data, and feedback problems, etc. This study presents a conceptual design of an energy management system (EMS) and proposes the methodology to resolve the limitations (e.g., data normalization using naval architectural evaluation, management of misrepresented data, and feedback from shore to ship through management of performance analysis history). We expect this system to make even short-term charterers assess the ship performance properly and implement sustainable fleet control.

Keywords: data normalization, energy management system, naval architectural evaluation, ship performance analysis

Procedia PDF Downloads 444
41297 Measured versus Default Interstate Traffic Data in New Mexico, USA

Authors: M. A. Hasan, M. R. Islam, R. A. Tarefder

Abstract:

This study investigates how the site specific traffic data differs from the Mechanistic Empirical Pavement Design Software default values. Two Weigh-in-Motion (WIM) stations were installed in Interstate-40 (I-40) and Interstate-25 (I-25) to developed site specific data. A computer program named WIM Data Analysis Software (WIMDAS) was developed using Microsoft C-Sharp (.Net) for quality checking and processing of raw WIM data. A complete year data from November 2013 to October 2014 was analyzed using the developed WIM Data Analysis Program. After that, the vehicle class distribution, directional distribution, lane distribution, monthly adjustment factor, hourly distribution, axle load spectra, average number of axle per vehicle, axle spacing, lateral wander distribution, and wheelbase distribution were calculated. Then a comparative study was done between measured data and AASHTOWare default values. It was found that the measured general traffic inputs for I-40 and I-25 significantly differ from the default values.

Keywords: AASHTOWare, traffic, weigh-in-motion, axle load distribution

Procedia PDF Downloads 337
41296 Big Brain: A Single Database System for a Federated Data Warehouse Architecture

Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf

Abstract:

Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build both some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and why a custom federated approach was the notable recommendation to go further with the implementation. Although the data warehouses are logically federated, the implementation uses a single database system which presented many advantages like: cost reduction and improved data access to global users allowing consumers of the data to have a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.

Keywords: data integration, data warehousing, federated architecture, Online Analytical Processing (OLAP)

Procedia PDF Downloads 231
41295 Additive Weibull Model Using Warranty Claim and Finite Element Analysis Fatigue Analysis

Authors: Kanchan Mondal, Dasharath Koulage, Dattatray Manerikar, Asmita Ghate

Abstract:

This paper presents an additive reliability model using warranty data and Finite Element Analysis (FEA) data. Warranty data for any product gives insight to its underlying issues. This is often used by Reliability Engineers to build prediction model to forecast failure rate of parts. But there is one major limitation in using warranty data for prediction. Warranty periods constitute only a small fraction of total lifetime of a product, most of the time it covers only the infant mortality and useful life zone of a bathtub curve. Predicting with warranty data alone in these cases is not generally provide results with desired accuracy. Failure rate of a mechanical part is driven by random issues initially and wear-out or usage related issues at later stages of the lifetime. For better predictability of failure rate, one need to explore the failure rate behavior at wear out zone of a bathtub curve. Due to cost and time constraints, it is not always possible to test samples till failure, but FEA-Fatigue analysis can provide the failure rate behavior of a part much beyond warranty period in a quicker time and at lesser cost. In this work, the authors proposed an Additive Weibull Model, which make use of both warranty and FEA fatigue analysis data for predicting failure rates. It involves modeling of two data sets of a part, one with existing warranty claims and other with fatigue life data. Hazard rate base Weibull estimation has been used for the modeling the warranty data whereas S-N curved based Weibull parameter estimation is used for FEA data. Two separate Weibull models’ parameters are estimated and combined to form the proposed Additive Weibull Model for prediction.

Keywords: bathtub curve, fatigue, FEA, reliability, warranty, Weibull

Procedia PDF Downloads 66
41294 Analysis of Users’ Behavior on Book Loan Log Based on Association Rule Mining

Authors: Kanyarat Bussaban, Kunyanuth Kularbphettong

Abstract:

This research aims to create a model for analysis of student behavior using Library resources based on data mining technique in case of Suan Sunandha Rajabhat University. The model was created under association rules, apriori algorithm. The results were found 14 rules and the rules were tested with testing data set and it showed that the ability of classify data was 79.24 percent and the MSE was 22.91. The results showed that the user’s behavior model by using association rule technique can use to manage the library resources.

Keywords: behavior, data mining technique, a priori algorithm, knowledge discovery

Procedia PDF Downloads 400
41293 Analyzing Medical Workflows Using Market Basket Analysis

Authors: Mohit Kumar, Mayur Betharia

Abstract:

Healthcare domain, with the emergence of Electronic Medical Record (EMR), collects a lot of data which have been attracting Data Mining expert’s interest. In the past, doctors have relied on their intuition while making critical clinical decisions. This paper presents the means to analyze the Medical workflows to get business insights out of huge dumped medical databases. Market Basket Analysis (MBA) which is a special data mining technique, has been widely used in marketing and e-commerce field to discover the association between products bought together by customers. It helps businesses in increasing their sales by analyzing the purchasing behavior of customers and pitching the right customer with the right product. This paper is an attempt to demonstrate Market Basket Analysis applications in healthcare. In particular, it discusses the Market Basket Analysis Algorithm ‘Apriori’ applications within healthcare in major areas such as analyzing the workflow of diagnostic procedures, Up-selling and Cross-selling of Healthcare Systems, designing healthcare systems more user-friendly. In the paper, we have demonstrated the MBA applications using Angiography Systems, but can be extrapolated to other modalities as well.

Keywords: data mining, market basket analysis, healthcare applications, knowledge discovery in healthcare databases, customer relationship management, healthcare systems

Procedia PDF Downloads 168
41292 Improving the Analytical Power of Dynamic DEA Models, by the Consideration of the Shape of the Distribution of Inputs/Outputs Data: A Linear Piecewise Decomposition Approach

Authors: Elias K. Maragos, Petros E. Maravelakis

Abstract:

In Dynamic Data Envelopment Analysis (DDEA), which is a subfield of Data Envelopment Analysis (DEA), the productivity of Decision Making Units (DMUs) is considered in relation to time. In this case, as it is accepted by the most of the researchers, there are outputs, which are produced by a DMU to be used as inputs in a future time. Those outputs are known as intermediates. The common models, in DDEA, do not take into account the shape of the distribution of those inputs, outputs or intermediates data, assuming that the distribution of the virtual value of them does not deviate from linearity. This weakness causes the limitation of the accuracy of the analytical power of the traditional DDEA models. In this paper, the authors, using the concept of piecewise linear inputs and outputs, propose an extended DDEA model. The proposed model increases the flexibility of the traditional DDEA models and improves the measurement of the dynamic performance of DMUs.

Keywords: Dynamic Data Envelopment Analysis, DDEA, piecewise linear inputs, piecewise linear outputs

Procedia PDF Downloads 156
41291 A Method for Identifying Unusual Transactions in E-commerce Through Extended Data Flow Conformance Checking

Authors: Handie Pramana Putra, Ani Dijah Rahajoe

Abstract:

The proliferation of smart devices and advancements in mobile communication technologies have permeated various facets of life with the widespread influence of e-commerce. Detecting abnormal transactions holds paramount significance in this realm due to the potential for substantial financial losses. Moreover, the fusion of data flow and control flow assumes a critical role in the exploration of process modeling and data analysis, contributing significantly to the accuracy and security of business processes. This paper introduces an alternative approach to identify abnormal transactions through a model that integrates both data and control flows. Referred to as the Extended Data Petri net (DPNE), our model encapsulates the entire process, encompassing user login to the e-commerce platform and concluding with the payment stage, including the mobile transaction process. We scrutinize the model's structure, formulate an algorithm for detecting anomalies in pertinent data, and elucidate the rationale and efficacy of the comprehensive system model. A case study validates the responsive performance of each system component, demonstrating the system's adeptness in evaluating every activity within mobile transactions. Ultimately, the results of anomaly detection are derived through a thorough and comprehensive analysis.

Keywords: database, data analysis, DPNE, extended data flow, e-commerce

Procedia PDF Downloads 50
41290 An Analysis of Sequential Pattern Mining on Databases Using Approximate Sequential Patterns

Authors: J. Suneetha, Vijayalaxmi

Abstract:

Sequential Pattern Mining involves applying data mining methods to large data repositories to extract usage patterns. Sequential pattern mining methodologies used to analyze the data and identify patterns. The patterns have been used to implement efficient systems can recommend on previously observed patterns, in making predictions, improve usability of systems, detecting events, and in general help in making strategic product decisions. In this paper, identified performance of approximate sequential pattern mining defines as identifying patterns approximately shared with many sequences. Approximate sequential patterns can effectively summarize and represent the databases by identifying the underlying trends in the data. Conducting an extensive and systematic performance over synthetic and real data. The results demonstrate that ApproxMAP effective and scalable in mining large sequences databases with long patterns.

Keywords: multiple data, performance analysis, sequential pattern, sequence database scalability

Procedia PDF Downloads 335
41289 An Exhaustive All-Subsets Examination of Trade Theory on WTO Data

Authors: Masoud Charkhabi

Abstract:

We examine trade theory with this motivation. The full set of World Trade Organization data are organized into country-year pairs, each treated as a different entity. Topological Data Analysis reveals that among the 16 region and 240 region-year pairs there exists in fact a distinguishable group of region-period pairs. The generally accepted periods of shifts from dissimilar-dissimilar to similar-similar trade in goods among regions are examined from this new perspective. The period breaks are treated as cumulative and are flexible. This type of all-subsets analysis is motivated from computer science and is made possible with Lossy Compression and Graph Theory. The results question many patterns in similar-similar to dissimilar-dissimilar trade. They also show indications of economic shifts that only later become evident in other economic metrics.

Keywords: econometrics, globalization, network science, topological data, analysis, trade theory, visualization, world trade

Procedia PDF Downloads 367
41288 Mapping and Database on Mass Movements along the Eastern Edge of the East African Rift in Burundi

Authors: L. Nahimana

Abstract:

The eastern edge of the East African Rift in Burundi shows many mass movement phenomena corresponding to landslides, mudflow, debris flow, spectacular erosion (mega-gully), flash floods and alluvial deposits. These phenomena usually occur during the rainy season. Their extent and consecutive damages vary widely. To manage these phenomena, it is necessary to adopt a methodological approach of their mapping with a structured database. The elements for this database are: three-dimensional extent of the phenomenon, natural causes and conditions (geological lithology, slope, weathering depth and products, rainfall patterns, natural environment) and the anthropogenic factors corresponding to the various human activities. The extent of the area provides information about the possibilities and opportunities for mitigation technique. The lithological nature allows understanding the influence of the nature of the rock and its structure on the intensity of the weathering of rocks, as well as the geotechnical properties of the weathering products. The slope influences the land stability. The intensity of annual, monthly and daily rainfall helps to understand the conditions of water saturation of the terrains. Certain natural circumstances such as the presence of streams and rivers promote foot slope erosion and thus the occurrence and activity of mass movements. The construction of some infrastructures such as new roads and agglomerations deeply modify the flow of surface and underground water followed by mass movements. Using geospatial data selected on the East African Rift in Burundi, it is presented case of mass movements illustrating the nature, importance, various factors and the extent of the damages. An analysis of these elements for each hazard can guide the options for mitigation of the phenomenon and its consequences.

Keywords: mass movement, landslide, mudflow, debris flow, spectacular erosion, mega-gully, flash flood, alluvial deposit, East African rift, Burundi

Procedia PDF Downloads 302
41287 Circular Tool and Dynamic Approach to Grow the Entrepreneurship of Macroeconomic Metabolism

Authors: Maria Areias, Diogo Simões, Ana Figueiredo, Anishur Rahman, Filipa Figueiredo, João Nunes

Abstract:

It is expected that close to 7 billion people will live in urban areas by 2050. In order to improve the sustainability of the territories and its transition towards circular economy, it’s necessary to understand its metabolism and promote and guide the entrepreneurship answer. The study of a macroeconomic metabolism involves the quantification of the inputs, outputs and storage of energy, water, materials and wastes for an urban region. This quantification and analysis representing one opportunity for the promotion of green entrepreneurship. There are several methods to assess the environmental impacts of an urban territory, such as human and environmental risk assessment (HERA), life cycle assessment (LCA), ecological footprint assessment (EF), material flow analysis (MFA), physical input-output table (PIOT), ecological network analysis (ENA), multicriteria decision analysis (MCDA) among others. However, no consensus exists about which of those assessment methods are best to analyze the sustainability of these complex systems. Taking into account the weaknesses and needs identified, the CiiM - Circular Innovation Inter-Municipality project aims to define an uniform and globally accepted methodology through the integration of various methodologies and dynamic approaches to increase the efficiency of macroeconomic metabolisms and promoting entrepreneurship in a circular economy. The pilot territory considered in CiiM project has a total area of 969,428 ha, comprising a total of 897,256 inhabitants (about 41% of the population of the Center Region). The main economic activities in the pilot territory, which contribute to a gross domestic product of 14.4 billion euros, are: social support activities for the elderly; construction of buildings; road transport of goods, retailing in supermarkets and hypermarkets; mass production of other garments; inpatient health facilities; and the manufacture of other components and accessories for motor vehicles. The region's business network is mostly constituted of micro and small companies (similar to the Central Region of Portugal), with a total of 53,708 companies identified in the CIM Region of Coimbra (39 large companies), 28,146 in the CIM Viseu Dão Lafões (22 large companies) and 24,953 in CIM Beiras and Serra da Estrela (13 large companies). For the construction of the database was taking into account data available at the National Institute of Statistics (INE), General Directorate of Energy and Geology (DGEG), Eurostat, Pordata, Strategy and Planning Office (GEP), Portuguese Environment Agency (APA), Commission for Coordination and Regional Development (CCDR) and Inter-municipal Community (CIM), as well as dedicated databases. In addition to the collection of statistical data, it was necessary to identify and characterize the different stakeholder groups in the pilot territory that are relevant to the different metabolism components under analysis. The CIIM project also adds the potential of a Geographic Information System (GIS) so that it is be possible to obtain geospatial results of the territorial metabolisms (rural and urban) of the pilot region. This platform will be a powerful visualization tool of flows of products/services that occur within the region and will support the stakeholders, improving their circular performance and identifying new business ideas and symbiotic partnerships.

Keywords: circular economy tools, life cycle assessment macroeconomic metabolism, multicriteria decision analysis, decision support tools, circular entrepreneurship, industrial and regional symbiosis

Procedia PDF Downloads 95
41286 Analysing Techniques for Fusing Multimodal Data in Predictive Scenarios Using Convolutional Neural Networks

Authors: Philipp Ruf, Massiwa Chabbi, Christoph Reich, Djaffar Ould-Abdeslam

Abstract:

In recent years, convolutional neural networks (CNN) have demonstrated high performance in image analysis, but oftentimes, there is only structured data available regarding a specific problem. By interpreting structured data as images, CNNs can effectively learn and extract valuable insights from tabular data, leading to improved predictive accuracy and uncovering hidden patterns that may not be apparent in traditional structured data analysis. In applying a single neural network for analyzing multimodal data, e.g., both structured and unstructured information, significant advantages in terms of time complexity and energy efficiency can be achieved. Converting structured data into images and merging them with existing visual material offers a promising solution for applying CNN in multimodal datasets, as they often occur in a medical context. By employing suitable preprocessing techniques, structured data is transformed into image representations, where the respective features are expressed as different formations of colors and shapes. In an additional step, these representations are fused with existing images to incorporate both types of information. This final image is finally analyzed using a CNN.

Keywords: CNN, image processing, tabular data, mixed dataset, data transformation, multimodal fusion

Procedia PDF Downloads 115
41285 Series Network-Structured Inverse Models of Data Envelopment Analysis: Pitfalls and Solutions

Authors: Zohreh Moghaddas, Morteza Yazdani, Farhad Hosseinzadeh

Abstract:

Nowadays, data envelopment analysis (DEA) models featuring network structures have gained widespread usage for evaluating the performance of production systems and activities (Decision-Making Units (DMUs)) across diverse fields. By examining the relationships between the internal stages of the network, these models offer valuable insights to managers and decision-makers regarding the performance of each stage and its impact on the overall network. To further empower system decision-makers, the inverse data envelopment analysis (IDEA) model has been introduced. This model allows the estimation of crucial information for estimating parameters while keeping the efficiency score unchanged or improved, enabling analysis of the sensitivity of system inputs or outputs according to managers' preferences. This empowers managers to apply their preferences and policies on resources, such as inputs and outputs, and analyze various aspects like production, resource allocation processes, and resource efficiency enhancement within the system. The results obtained can be instrumental in making informed decisions in the future. The top result of this study is an analysis of infeasibility and incorrect estimation that may arise in the theory and application of the inverse model of data envelopment analysis with network structures. By addressing these pitfalls, novel protocols are proposed to circumvent these shortcomings effectively. Subsequently, several theoretical and applied problems are examined and resolved through insightful case studies.

Keywords: inverse models of data envelopment analysis, series network, estimation of inputs and outputs, efficiency, resource allocation, sensitivity analysis, infeasibility

Procedia PDF Downloads 42
41284 A Study on Big Data Analytics, Applications and Challenges

Authors: Chhavi Rana

Abstract:

The aim of the paper is to highlight the existing development in the field of big data analytics. Applications like bioinformatics, smart infrastructure projects, Healthcare, and business intelligence contain voluminous and incremental data, which is hard to organise and analyse and can be dealt with using the framework and model in this field of study. An organization's decision-making strategy can be enhanced using big data analytics and applying different machine learning techniques and statistical tools on such complex data sets that will consequently make better things for society. This paper reviews the current state of the art in this field of study as well as different application domains of big data analytics. It also elaborates on various frameworks in the process of Analysis using different machine-learning techniques. Finally, the paper concludes by stating different challenges and issues raised in existing research.

Keywords: big data, big data analytics, machine learning, review

Procedia PDF Downloads 78
41283 A Study on Big Data Analytics, Applications, and Challenges

Authors: Chhavi Rana

Abstract:

The aim of the paper is to highlight the existing development in the field of big data analytics. Applications like bioinformatics, smart infrastructure projects, healthcare, and business intelligence contain voluminous and incremental data which is hard to organise and analyse and can be dealt with using the framework and model in this field of study. An organisation decision-making strategy can be enhanced by using big data analytics and applying different machine learning techniques and statistical tools to such complex data sets that will consequently make better things for society. This paper reviews the current state of the art in this field of study as well as different application domains of big data analytics. It also elaborates various frameworks in the process of analysis using different machine learning techniques. Finally, the paper concludes by stating different challenges and issues raised in existing research.

Keywords: big data, big data analytics, machine learning, review

Procedia PDF Downloads 89
41282 Analysis and Prediction of Netflix Viewing History Using Netflixlatte as an Enriched Real Data Pool

Authors: Amir Mabhout, Toktam Ghafarian, Amirhossein Farzin, Zahra Makki, Sajjad Alizadeh, Amirhossein Ghavi

Abstract:

The high number of Netflix subscribers makes it attractive for data scientists to extract valuable knowledge from the viewers' behavioural analyses. This paper presents a set of statistical insights into viewers' viewing history. After that, a deep learning model is used to predict the future watching behaviour of the users based on previous watching history within the Netflixlatte data pool. Netflixlatte in an aggregated and anonymized data pool of 320 Netflix viewers with a length 250 000 data points recorded between 2008-2022. We observe insightful correlations between the distribution of viewing time and the COVID-19 pandemic outbreak. The presented deep learning model predicts future movie and TV series viewing habits with an average loss of 0.175.

Keywords: data analysis, deep learning, LSTM neural network, netflix

Procedia PDF Downloads 237
41281 EnumTree: An Enumerative Biclustering Algorithm for DNA Microarray Data

Authors: Haifa Ben Saber, Mourad Elloumi

Abstract:

In a number of domains, like in DNA microarray data analysis, we need to cluster simultaneously rows (genes) and columns (conditions) of a data matrix to identify groups of constant rows with a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. We introduce a new algorithm called, Enumerative tree (EnumTree) for biclustering of binary microarray data. is an algorithm adopting the approach of enumerating biclusters. This algorithm extracts all biclusters consistent good quality. The main idea of ​​EnumLat is the construction of a new tree structure to represent adequately different biclusters discovered during the process of enumeration. This algorithm adopts the strategy of all biclusters at a time. The performance of the proposed algorithm is assessed using both synthetic and real DNA micryarray data, our algorithm outperforms other biclustering algorithms for binary microarray data. Biclusters with different numbers of rows. Moreover, we test the biological significance using a gene annotation web tool to show that our proposed method is able to produce biologically relevent biclusters.

Keywords: DNA microarray, biclustering, gene expression data, tree, datamining.

Procedia PDF Downloads 369
41280 Modelling the Education Supply Chain with Network Data Envelopment Analysis

Authors: Sourour Ramzi, Claudia Sarrico

Abstract:

Little has been done on network DEA in education, and nobody has attempted to model the whole education supply chain using network DEA. As such the contribution of the present paper is to propose a model for measuring the efficiency of education supply chains using network DEA. First, we use a general survey of data envelopment analysis (DEA) to establish the emergent themes for research in DEA, and focus on the theme of Network DEA. Second, we use a survey on two-stage DEA models, and Network DEA to write a state of the art on Network DEA, particularly applied to supply chain management. Third, we use a survey on DEA applications to establish the most influential papers on DEA education applications, in order to establish the state of the art on applications of DEA in education, in general, and applications of DEA to education using network DEA, in particular. Finally, we propose a model for measuring the performance of education supply chains of different education systems (countries or states within a country, for instance). We then use this model on some empirical data.

Keywords: supply chain, education, data envelopment analysis, network DEA

Procedia PDF Downloads 364
41279 Locating the Best Place for Earthquake Refugee Camps by OpenSource Software: A Case Study for Tehran, Iran

Authors: Reyhaneh Saeedi

Abstract:

Iran is one of the regions which are most prone for earthquakes annually having a large number of financial and mortality and financial losses. Every year around the world, a large number of people lose their home and life due to natural disasters such as earthquakes. It is necessary to provide and specify some suitable places for settling the homeless people before the occurrence of the earthquake, one of the most important factors in crisis planning and management. Some of the natural disasters can be Modeling and shown by Geospatial Information System (GIS). By using GIS, it would be possible to manage the spatial data and reach several goals by making use of the analyses existing in it. GIS has a determining role in disaster management because it can determine the best places for temporary resettling after such a disaster. In this research QuantumGIS software is used that It is an OpenSource software so that easy to access codes and It is also free. In this system, AHP method is used as decision model and to locate the best places for temporary resettling, is done based on the related organizations criteria with their weights and buffers. Also in this research are made the buffer layers of criteria and change them to the raster layers. Later on, the raster layers are multiplied on desired weights then, the results are added together. Eventually, there are suitable places for resettling of victims by desired criteria by different colors with their optimum rate in QuantumGIS platform.

Keywords: disaster management, temporary resettlement, earthquake, QuantumGIS

Procedia PDF Downloads 394
41278 Strategic Citizen Participation in Applied Planning Investigations: How Planners Use Etic and Emic Community Input Perspectives to Fill-in the Gaps in Their Analysis

Authors: John Gaber

Abstract:

Planners regularly use citizen input as empirical data to help them better understand community issues they know very little about. This type of community data is based on the lived experiences of local residents and is known as "emic" data. What is becoming more common practice for planners is their use of data from local experts and stakeholders (known as "etic" data or the outsider perspective) to help them fill in the gaps in their analysis of applied planning research projects. Utilizing international Health Impact Assessment (HIA) data, I look at who planners invite to their citizen input investigations. Research presented in this paper shows that planners access a wide range of emic and etic community perspectives in their search for the “community’s view.” The paper concludes with how planners can chart out a new empirical path in their execution of emic/etic citizen participation strategies in their applied planning research projects.

Keywords: citizen participation, emic data, etic data, Health Impact Assessment (HIA)

Procedia PDF Downloads 482
41277 Data Science-Based Key Factor Analysis and Risk Prediction of Diabetic

Authors: Fei Gao, Rodolfo C. Raga Jr.

Abstract:

This research proposal will ascertain the major risk factors for diabetes and to design a predictive model for risk assessment. The project aims to improve diabetes early detection and management by utilizing data science techniques, which may improve patient outcomes and healthcare efficiency. The phase relation values of each attribute were used to analyze and choose the attributes that might influence the examiner's survival probability using Diabetes Health Indicators Dataset from Kaggle’s data as the research data. We compare and evaluate eight machine learning algorithms. Our investigation begins with comprehensive data preprocessing, including feature engineering and dimensionality reduction, aimed at enhancing data quality. The dataset, comprising health indicators and medical data, serves as a foundation for training and testing these algorithms. A rigorous cross-validation process is applied, and we assess their performance using five key metrics like accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). After analyzing the data characteristics, investigate their impact on the likelihood of diabetes and develop corresponding risk indicators.

Keywords: diabetes, risk factors, predictive model, risk assessment, data science techniques, early detection, data analysis, Kaggle

Procedia PDF Downloads 67
41276 Monitoring Peri-Urban Growth and Land Use Dynamics with GIS and Remote Sensing Techniques: A Case Study of Burdwan City, India

Authors: Mohammad Arif, Soumen Chatterjee, Krishnendu Gupta

Abstract:

The peri-urban interface is an area of transition where the urban and rural areas meet and interact. So the peri-urban areas, which is characterized by strong urban influence, easy access to markets, services and other inputs, are ready supplies of labour but distant from the land paucity and pollution related to urban growth. Hence, the present study is primarily aimed at quantifying the spatio-temporal pattern of land use/land cover change during the last three decades (i.e., 1987 to 2016) in the peri-urban area of Burdwan city. In the recent past, the morphology of the study region has rapid change due to high growth of population and establishment of industries. The change has predominantly taken place along the State and National Highway 2 (NH-2) and around the Burdwan Municipality for meeting both residential and commercial purposes. To ascertain the degree of change in land use and land cover, over the specified time, satellite imageries and topographical sheets are employed. The data is processed through appropriate software packages to arrive at a deduction that most of the land use changes have occurred by obliterating agricultural land & water bodies and substituting them by built area and industrial spaces. Geospatial analysis of study area showed that this area has experienced a steep increase (30%) of built-up areas and excessive decrease (15%) in croplands between 1987 and 2016. Increase in built-up areas is attributed to the increase of out-migration during this period from the core city. This study also examined social, economic and institutional factors that lead to this rapid land use change in peri-urban areas of the Burdwan city by carrying out a field survey of 250 households in peri-urban areas. The research concludes with an urgency for regulating land subdivisions in peri-urban areas to prevent haphazard land use development. It is expected that the findings of the study would go a long way in facilitating better policy making.

Keywords: growth, land use land cover, morphology, peri-urban, policy making

Procedia PDF Downloads 169
41275 Dataset Quality Index:Development of Composite Indicator Based on Standard Data Quality Indicators

Authors: Sakda Loetpiparwanich, Preecha Vichitthamaros

Abstract:

Nowadays, poor data quality is considered one of the majority costs for a data project. The data project with data quality awareness almost as much time to data quality processes while data project without data quality awareness negatively impacts financial resources, efficiency, productivity, and credibility. One of the processes that take a long time is defining the expectations and measurements of data quality because the expectation is different up to the purpose of each data project. Especially, big data project that maybe involves with many datasets and stakeholders, that take a long time to discuss and define quality expectations and measurements. Therefore, this study aimed at developing meaningful indicators to describe overall data quality for each dataset to quick comparison and priority. The objectives of this study were to: (1) Develop a practical data quality indicators and measurements, (2) Develop data quality dimensions based on statistical characteristics and (3) Develop Composite Indicator that can describe overall data quality for each dataset. The sample consisted of more than 500 datasets from public sources obtained by random sampling. After datasets were collected, there are five steps to develop the Dataset Quality Index (SDQI). First, we define standard data quality expectations. Second, we find any indicators that can measure directly to data within datasets. Thirdly, each indicator aggregates to dimension using factor analysis. Next, the indicators and dimensions were weighted by an effort for data preparing process and usability. Finally, the dimensions aggregate to Composite Indicator. The results of these analyses showed that: (1) The developed useful indicators and measurements contained ten indicators. (2) the developed data quality dimension based on statistical characteristics, we found that ten indicators can be reduced to 4 dimensions. (3) The developed Composite Indicator, we found that the SDQI can describe overall datasets quality of each dataset and can separate into 3 Level as Good Quality, Acceptable Quality, and Poor Quality. The conclusion, the SDQI provide an overall description of data quality within datasets and meaningful composition. We can use SQDI to assess for all data in the data project, effort estimation, and priority. The SDQI also work well with Agile Method by using SDQI to assessment in the first sprint. After passing the initial evaluation, we can add more specific data quality indicators into the next sprint.

Keywords: data quality, dataset quality, data quality management, composite indicator, factor analysis, principal component analysis

Procedia PDF Downloads 131
41274 Generative AI: A Comparison of Conditional Tabular Generative Adversarial Networks and Conditional Tabular Generative Adversarial Networks with Gaussian Copula in Generating Synthetic Data with Synthetic Data Vault

Authors: Lakshmi Prayaga, Chandra Prayaga. Aaron Wade, Gopi Shankar Mallu, Harsha Satya Pola

Abstract:

Synthetic data generated by Generative Adversarial Networks and Autoencoders is becoming more common to combat the problem of insufficient data for research purposes. However, generating synthetic data is a tedious task requiring extensive mathematical and programming background. Open-source platforms such as the Synthetic Data Vault (SDV) and Mostly AI have offered a platform that is user-friendly and accessible to non-technical professionals to generate synthetic data to augment existing data for further analysis. The SDV also provides for additions to the generic GAN, such as the Gaussian copula. We present the results from two synthetic data sets (CTGAN data and CTGAN with Gaussian Copula) generated by the SDV and report the findings. The results indicate that the ROC and AUC curves for the data generated by adding the layer of Gaussian copula are much higher than the data generated by the CTGAN.

Keywords: synthetic data generation, generative adversarial networks, conditional tabular GAN, Gaussian copula

Procedia PDF Downloads 73
41273 An Automated Approach to Consolidate Galileo System Availability

Authors: Marie Bieber, Fabrice Cosson, Olivier Schmitt

Abstract:

Europe's Global Navigation Satellite System, Galileo, provides worldwide positioning and navigation services. The satellites in space are only one part of the Galileo system. An extensive ground infrastructure is essential to oversee the satellites and ensure accurate navigation signals. High reliability and availability of the entire Galileo system are crucial to continuously provide positioning information of high quality to users. Outages are tracked, and operational availability is regularly assessed. A highly flexible and adaptive tool has been developed to automate the Galileo system availability analysis. Not only does it enable a quick availability consolidation, but it also provides first steps towards improving the data quality of maintenance tickets used for the analysis. This includes data import and data preparation, with a focus on processing strings used for classification and identifying faulty data. Furthermore, the tool allows to handle a low amount of data, which is a major constraint when the aim is to provide accurate statistics.

Keywords: availability, data quality, system performance, Galileo, aerospace

Procedia PDF Downloads 158
41272 Improved K-Means Clustering Algorithm Using RHadoop with Combiner

Authors: Ji Eun Shin, Dong Hoon Lim

Abstract:

Data clustering is a common technique used in data analysis and is used in many applications, such as artificial intelligence, pattern recognition, economics, ecology, psychiatry and marketing. K-means clustering is a well-known clustering algorithm aiming to cluster a set of data points to a predefined number of clusters. In this paper, we implement K-means algorithm based on MapReduce framework with RHadoop to make the clustering method applicable to large scale data. RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. The main idea is to introduce a combiner as a function of our map output to decrease the amount of data needed to be processed by reducers. The experimental results demonstrated that K-means algorithm using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also showed that our K-means algorithm using RHadoop with combiner was faster than regular algorithm without combiner as the size of data set increases.

Keywords: big data, combiner, K-means clustering, RHadoop

Procedia PDF Downloads 429
41271 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi

Abstract:

Big data is one of the emerging technology, which collects the data from various sensors and those data will be used in many fields. Data retrieval is one of the major issue where there is a need to extract the exact data as per the need. In this paper, large amount of data set is processed by using the feature selection. Feature selection helps to choose the data which are actually needed to process and execute the task. The key value is the one which helps to point out exact data available in the storage space. Here the available data is streamed and R-Center is proposed to achieve this task.

Keywords: big data, key value, feature selection, retrieval, performance

Procedia PDF Downloads 335
41270 Joint Probability Distribution of Extreme Water Level with Rainfall and Temperature: Trend Analysis of Potential Impacts of Climate Change

Authors: Ali Razmi, Saeed Golian

Abstract:

Climate change is known to have the potential to impact adversely hydrologic patterns for variables such as rainfall, maximum and minimum temperature and sea level rise. Long-term average of these climate variables could possibly change over time due to climate change impacts. In this study, trend analysis was performed on rainfall, maximum and minimum temperature and water level data of a coastal area in Manhattan, New York City, Central Park and Battery Park stations to investigate if there is a significant change in the data mean. Partial Man-Kendall test was used for trend analysis. Frequency analysis was then performed on data using common probability distribution functions such as Generalized Extreme Value (GEV), normal, log-normal and log-Pearson. Goodness of fit tests such as Kolmogorov-Smirnov are used to determine the most appropriate distributions. In flood frequency analysis, rainfall and water level data are often separately investigated. However, in determining flood zones, simultaneous consideration of rainfall and water level in frequency analysis could have considerable effect on floodplain delineation (flood extent and depth). The present study aims to perform flood frequency analysis considering joint probability distribution for rainfall and storm surge. First, correlation between the considered variables was investigated. Joint probability distribution of extreme water level and temperature was also investigated to examine how global warming could affect sea level flooding impacts. Copula functions were fitted to data and joint probability of water level with rainfall and temperature for different recurrence intervals of 2, 5, 25, 50, 100, 200, 500, 600 and 1000 was determined and compared with the severity of individual events. Results for trend analysis showed increase in long-term average of data that could be attributed to climate change impacts. GEV distribution was found as the most appropriate function to be fitted to the extreme climate variables. The results for joint probability distribution analysis confirmed the necessity for incorporation of both rainfall and water level data in flood frequency analysis.

Keywords: climate change, climate variables, copula, joint probability

Procedia PDF Downloads 354
41269 Model of Optimal Centroids Approach for Multivariate Data Classification

Authors: Pham Van Nha, Le Cam Binh

Abstract:

Particle swarm optimization (PSO) is a population-based stochastic optimization algorithm. PSO was inspired by the natural behavior of birds and fish in migration and foraging for food. PSO is considered as a multidisciplinary optimization model that can be applied in various optimization problems. PSO’s ideas are simple and easy to understand but PSO is only applied in simple model problems. We think that in order to expand the applicability of PSO in complex problems, PSO should be described more explicitly in the form of a mathematical model. In this paper, we represent PSO in a mathematical model and apply in the multivariate data classification. First, PSOs general mathematical model (MPSO) is analyzed as a universal optimization model. Then, Model of Optimal Centroids (MOC) is proposed for the multivariate data classification. Experiments were conducted on some benchmark data sets to prove the effectiveness of MOC compared with several proposed schemes.

Keywords: analysis of optimization, artificial intelligence based optimization, optimization for learning and data analysis, global optimization

Procedia PDF Downloads 203