Search results for: geospatial data analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 41538

41358 Internal Displacement in Iraq due to ISIS Occupation and Its Effects on Human Security and Coexistence

Authors: Feisal Khudher Mahmood, Abdul Samad Rahman Sultan

Abstract:

Iraq had long been a diverse society in which races, cultures and religions coexisted peacefully. Internal displacement began after April 2003 because of political instability and the deterioration of the political and security situation that followed the United States of America's occupation. The largest internal displacement has occurred (and continues) since 10 June 2014, following the rise of the Islamic State of Iraq and Syria (ISIS) and its occupation of one third of the country's territory. This crisis directly affected 3,275,000 people, damaged the social fabric of the Iraqi community, and led to waves of sectarian violence that swept the country. Internally displaced communities are vulnerable, especially under a weak and non-functional government, which has led to the loss of essential human rights and dignity. Using Geographic Information System (GIS) and geospatial techniques, two types of internal displacement were found: voluntary and forced. Both types are highly influenced by location, race and religion. The main challenge for the Iraqi government and NGOs will come after defeating ISIS: helping the displaced to resettle within their communities and to re-establish coexistence. Through spatial-statistical analysis, hot spots of future conflict among displaced communities have been highlighted. This will help the government to tackle future conflicts before they occur, and it will form the basis for a social conflict early warning system.

Keywords: internal displacement, Iraq, ISIS, human security, human rights, GIS, spatial-statistical analysis

Procedia PDF Downloads 519
41357 Heritage and Tourism in the Era of Big Data: Analysis of Chinese Cultural Tourism in Catalonia

Authors: Xinge Liao, Francesc Xavier Roige Ventura, Dolores Sanchez Aguilera

Abstract:

With the development of the Internet, the study of tourism behavior has rapidly expanded from the traditional physical market to the online market. Data on the Internet change dynamically, and new data appear all the time. In recent years, forums, blogs, and other sources have generated large volumes of data that have expanded over time and space; together they constitute large-scale Internet data, known as Big Data. These data of technological origin, derived from the use of devices and the activity of multiple users, are becoming a source of great importance for the study of geography and the behavior of tourists. This study focuses on cultural heritage tourist practices in the context of Big Data, exploring the characteristics and behavior of Chinese tourists in relation to the cultural heritage of Catalonia. Geographical information, target image, and perceptions in user-generated content will be studied through analysis of data from Weibo, the largest blog-based social network in China. By analyzing the behavior of heritage tourists in the Big Data environment, the study aims to understand the practices (activities, motivations, perceptions) of cultural tourists, and in turn their needs and preferences, in order to better guide the sustainable development of tourism at heritage sites.

Keywords: Barcelona, Big Data, Catalonia, cultural heritage, Chinese tourism market, tourists’ behavior

Procedia PDF Downloads 133
41356 Habitat Preference of Lepidoptera (Butterflies), Using Geospatial Analysis in Diyasaru Wetland Park, Western Province, Sri Lanka

Authors: Hiripurage Mallika Sandamali Dissanayaka

Abstract:

Butterflies are found everywhere on Earth, helping flowering plants reproduce through pollination. Wetlands perform many valuable functions, such as providing wildlife habitat. Diyasaru Wetland Park, located in a highly urbanized area of Sri Jayawardenepura Kotte, Sri Lanka, was chosen as the study site. A distribution map was prepared to increase butterfly habitat in the urbanized area, and research was conducted to determine the most suitable sections for using it. As this wetland has footpaths for walking, line transect surveys were used to mark species within the sampling area, and directly observed species were recorded. All data collection was done from 0900 to 1200 hours and from 1300 to 1600 hours, and fieldwork ran from 11 February 2020 to 20 January 2021. ED binoculars (10.5x45), DSLR cameras (Canon EOS/EFS5 mm 3.5-5.6), and a Garmin GPS (Etrex 10) were used to observe butterfly species, identify locations, and take photographs as evidence. Habitats were analyzed in GIS (ArcGIS Pro) to identify the distribution of butterflies within the park premises: the distribution density of the known population size was calculated for each point by kernel density; local similarity values were calculated for each pair of corresponding features through hotspot analysis; and cell values were determined by inverse distance weighting (IDW), using a linearly weighted combination of a set of sample points. According to the maps prepared to predict the distribution of butterflies in this park, the areas of highest distribution, or most favorable areas, were near flower gardens and meadows, but some individual species prefer habitats that are more suitable for their life activities, so they live in other areas. Sixty-six (66) species belonging to six (6) families have been recorded in the premises: sixty (60) species of least concern (LC), two (2) near threatened (NT), and four (4) vulnerable (VU). Several new species, such as Plum Judy (Abisara echerius), were reported. The outcome of the study will form the basis for decision-making by the Sri Lanka Land Development (SLLD) Corporation for the future development and maintenance of the park.
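
The inverse distance weighting step mentioned in this abstract can be illustrated with a minimal sketch. It is not the ArcGIS Pro tool the author used; the function name `idw`, the power parameter of 2, and the two sample points are assumptions made for illustration only.

```python
import math

def idw(samples, x, y, power=2.0):
    """Estimate a value at (x, y) as a distance-weighted mean of samples.

    samples: list of (xi, yi, value) tuples (hypothetical survey points).
    power:   distance exponent; 2.0 is a common default, assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, value in samples:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            # The query point coincides with a sample: return it exactly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Two hypothetical butterfly counts recorded 2 units apart.
counts = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_estimate = idw(counts, 1.0, 0.0)
```

At the midpoint both samples are equally distant, so the estimate is their plain average; closer to either sample, that sample dominates.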

Keywords: wetland, Lepidoptera, habitat, urban, west

Procedia PDF Downloads 46
41355 Big Data: Concepts, Technologies and Applications in the Public Sector

Authors: A. Alexandru, C. A. Alexandru, D. Coardos, E. Tudora

Abstract:

Big Data (BD) is associated with a new generation of technologies and architectures which can harness the value of extremely large volumes of very varied data through real time processing and analysis. It involves changes in (1) data types, (2) accumulation speed, and (3) data volume. This paper presents the main concepts related to the BD paradigm, and introduces architectures and technologies for BD and BD sets. The integration of BD with the Hadoop Framework is also underlined. BD has attracted a lot of attention in the public sector due to the newly emerging technologies that allow the availability of network access. The volume of different types of data has exponentially increased. Some applications of BD in the public sector in Romania are briefly presented.

Keywords: big data, big data analytics, Hadoop, cloud

Procedia PDF Downloads 302
41354 Detecting Potential Geothermal Sites by Using Well Logging, Geophysical and Remote Sensing Data at Siwa Oasis, Western Desert, Egypt

Authors: Amr S. Fahil, Eman Ghoneim

Abstract:

Egypt has made significant efforts during the past few years to discover renewable energy sources. Regions in Egypt that have been identified for geothermal potential investigation include the Gulf of Suez and the Western Desert. One of the most promising sites for the development of Egypt's northern Western Desert is Siwa Oasis. The geological setting of the oasis, a tectonically generated depression situated in the northernmost region of the Western Desert, supports the potential for substantial geothermal resources. Field data obtained from 27 deep oil wells along the Western Desert, including bottom-hole temperature (BHT), depth-to-basement measurements, and geological maps, were utilized in this study. The major lithological units, elevation, surface gradient, lineament density, and remote sensing multispectral and topographic data were mapped together to generate the related physiographic variables. Eleven thematic layers were integrated in a geographic information system (GIS) to create geothermal maps to aid in the detection of significant potential geothermal spots along the Siwa Oasis and its vicinity. Total magnetic intensity data with reduction to the pole (RTP) are applied in this work for the first investigation of the geothermal potential in Siwa Oasis. The integration of geospatial data with magnetic field measurements showed a clear correlation between areas of high heat flow and magnetic anomalies. Such anomalies can be interpreted as related to the existence of high geothermal energy and dense rock, which also has high magnetic susceptibility. The outcomes indicated that the study area has a geothermal gradient ranging from 18 to 42 °C/km, a heat flow ranging from 24.7 to 111.3 mW·m−2, a thermal conductivity of 1.3–2.65 W·m−1·K−1 and a measured amplitude temperature maximum of 100.7 °C. The southeastern part of the Siwa Oasis and some sporadic locations in the eastern section of the oasis were found to have significant geothermal potential; consequently, these locations are suitable for future geothermal investigation. The adopted method might be applied to identify significant prospective geothermal energy locations in other regions of Egypt and East Africa.
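
The quantities reported in this abstract are linked by Fourier's law of heat conduction: heat flow equals thermal conductivity times geothermal gradient. The sketch below shows that arithmetic only; it is not the authors' workflow. The function names, the surface temperature of 26.7 °C, and the 2 km well depth are hypothetical values chosen for illustration.

```python
def geothermal_gradient(bht_c, surface_temp_c, depth_km):
    """Gradient in °C/km from a bottom-hole temperature (BHT) reading."""
    return (bht_c - surface_temp_c) / depth_km

def heat_flow(gradient_c_per_km, conductivity_w_per_m_k):
    """Conductive heat flow in mW/m² (Fourier's law, q = k * dT/dz).

    1 °C/km equals 1e-3 K/m, so (°C/km) * (W/m/K) yields mW/m² directly.
    """
    return gradient_c_per_km * conductivity_w_per_m_k

# Hypothetical well: 100.7 °C at 2 km depth, assumed 26.7 °C at the surface.
gradient = geothermal_gradient(100.7, 26.7, 2.0)

# Upper-bound gradient and conductivity quoted in the abstract.
upper_heat_flow = heat_flow(42.0, 2.65)
```

The product of the upper-bound gradient (42 °C/km) and the upper-bound conductivity (2.65 W/m/K) is 111.3, which matches the maximum heat flow quoted in the abstract.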

Keywords: magnetic data, SRTM, depth to basement, remote sensing, GIS, geothermal gradient, heat flow, thermal conductivity

Procedia PDF Downloads 101
41353 Data Stream Association Rule Mining with Cloud Computing

Authors: B. Suraj Aravind, M. H. M. Krishna Prasad

Abstract:

There exist emerging applications of data streams that require association rule mining, such as network traffic monitoring, web click stream analysis, sensor data, and data from satellites. Data streams typically arrive continuously, at high speed, in huge volumes, and with a changing data distribution. This raises new issues that need to be considered when developing association rule mining techniques for stream data. This paper proposes an improved data stream association rule mining algorithm that eliminates the limitation of resources by using the concept of cloud computing. Including cloud computing may lead to additional unknown problems, which need further research.

Keywords: data stream, association rule mining, cloud computing, frequent itemsets

Procedia PDF Downloads 497
41352 Using SNAP and RADTRAD to Establish the Analysis Model for Maanshan PWR Plant

Authors: J. R. Wang, H. C. Chen, C. Shih, S. W. Chen, J. H. Yang, Y. Chiang

Abstract:

In this study, we focus on the establishment of the analysis model for Maanshan PWR nuclear power plant (NPP) by using RADTRAD and SNAP codes with the FSAR, manuals, and other data. In order to evaluate the cumulative dose at the Exclusion Area Boundary (EAB) and Low Population Zone (LPZ) outer boundary, Maanshan NPP RADTRAD/SNAP model was used to perform the analysis of the DBA LOCA case. The analysis results of RADTRAD were similar to FSAR data. These analysis results were lower than the failure criteria of 10 CFR 100.11 (a total radiation dose to the whole body, 250 mSv; a total radiation dose to the thyroid from iodine exposure, 3000 mSv).

Keywords: RADionuclide, transport, removal, and dose estimation (RADTRAD), symbolic nuclear analysis package (SNAP), dose, PWR

Procedia PDF Downloads 454
41351 Reducing Flood Risk through Value Capture and Risk Communication: A Case Study in Cocody-Abidjan

Authors: Dedjo Yao Simon, Takahiro Saito, Norikazu Inuzuka, Ikuo Sugiyama

Abstract:

Abidjan city (Republic of Ivory Coast) is an emerging megacity and an urban coastal area where the number of reported floods is increasing rapidly due to climate change and unplanned urbanization. However, comprehensive disaster mitigation plans, policies, and financial resources are still lacking, and the population is unaware of the extent and location of the flood zones, leaving residents unprepared to mitigate the damage. Considering the existing condition, this paper discusses an approach for flood risk reduction in Cocody Commune through a value capture strategy and flood risk communication. Using geospatial techniques and hydrological simulation, we start by delineating flood zones and depths under several return periods in the study area. Then, a questionnaire-based field survey is conducted to validate the flood maps, to estimate the flood risk, and to sample residents' opinions on how flood risk information disclosure could affect the values of property located inside and outside the flood zones. The results indicate that the study area is highly vulnerable to floods with return periods of 5 years and more, which can cause serious harm to human lives and property, as demonstrated by the extent of the 5-year flood of 2014. It is also revealed that there is a high probability that the values of property located within flood zones could decline, and the values of surrounding property in the safe area could increase, when risk information disclosure commences. However, in order to raise public awareness of flood disasters and to prevent future housing promotion in prospective high-risk areas, flood risk information should be disseminated through the establishment of an early warning system. In order to reduce the effect of risk information disclosure and to protect the values of property within the high-risk zone, we propose that property tax increments in flood-free zones be captured and used for infrastructure development and to maintain the early warning system that will benefit people living in flood-prone areas. This case study shows that a combination of value capture strategy and risk communication could be an effective tool to educate citizens and to invest in flood risk reduction in emerging countries.

Keywords: Cocody-Abidjan, flood, geospatial techniques, risk communication, value capture

Procedia PDF Downloads 268
41350 Efficiency of the Slovak Commercial Banks Applying the DEA Window Analysis

Authors: Iveta Řepková

Abstract:

The aim of this paper is to estimate the efficiency of the Slovak commercial banks using the Data Envelopment Analysis (DEA) window analysis approach during the period 2003-2012. The research is based on unbalanced panel data of the Slovak commercial banks. Undesirable output was included in the analysis of banking efficiency. The most efficient banks were Postovabanka, UniCredit Bank and Istrobanka in the CCR model, and Slovenskasporitelna, Istrobanka and UniCredit Bank in the BCC model. In contrast, the least efficient banks were Privatbanka and CitiBank. The largest banks in the Slovak banking market were less efficient than medium-size and small banks. The results show that average efficiency increased during the period 2003-2008 and then decreased during the period 2010-2011 as a result of the financial crisis.

Keywords: data envelopment analysis, efficiency, Slovak banking sector, window analysis

Procedia PDF Downloads 354
41349 Spatial Variability of Brahmaputra River Flow Characteristics

Authors: Hemant Kumar

Abstract:

In Hindu mythology, the Brahmaputra River is known as the son of Lord Brahma. True to its name, the river causes mass destruction during the monsoon season in Assam, a state in the north-eastern part of India. Assam is one of the seven states of north-eastern India, and almost the entire flow of the Brahmaputra is carried through it; the other states carry its tributaries. In the present case study, spatial analysis was performed on a number of acquired MODIS data sets. Using change detection, the aerosol content was determined during heavy rainfall and in the flooded monsoon season. Applied to the Brahmaputra outflow, this analysis identifies the flooded season. The charged particles associated with the aerosol content verify the heavy water content below the ground surface, which is validated by trend analysis of rainfall spectrum data and confirmed by in-situ sampled data from different positions along the Brahmaputra River. Further, Hyperion hyperspectral data at 30 m resolution were used to scan the sediment deposits, which is also confirmed by in-situ sampled data from different positions.

Keywords: aerosol, change detection, spatial analysis, trend analysis

Procedia PDF Downloads 144
41348 Attribute Analysis of Quick Response Code Payment Users Using Discriminant Non-negative Matrix Factorization

Authors: Hironori Karachi, Haruka Yamashita

Abstract:

Recently, quick response (QR) code payment systems have become popular. Many companies have introduced new QR code payment services, and these services compete with each other to increase their number of users. To increase the number of users, a service needs to grasp how the demographic information, usage information, and value of its users differ from those of other services. In this study, we analyze real-world data provided by Nomura Research Institute, including the demographic data of users and information on their usage of two services: LINE Pay and PayPay. Non-negative Matrix Factorization (NMF) is widely used to analyze such data and interpret their features; however, the target data contain missing values. EM-algorithm NMF (EMNMF) completes the unknown values so that the features of data represented as a matrix can be understood. Moreover, for comparing the results of NMF analysis of two matrices, Discriminant NMF (DNMF) shows the difference in user features between the two matrices. In this study, we combine EMNMF and DNMF and analyze the target data. As the interpretation, we show the difference in user features between LINE Pay and PayPay.

Keywords: data science, non-negative matrix factorization, missing data, quality of services

Procedia PDF Downloads 126
41347 Frequent Itemset Mining Using Rough-Sets

Authors: Usman Qamar, Younus Javed

Abstract:

Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data, for example: what products were often purchased together? Its applications include basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that, as the data increase, the amount of time and resources required to mine the data increases at an exponential rate. In this investigation, a new algorithm is proposed which can be used as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm which utilizes entropy and rough sets to carry out record reduction and feature (attribute) selection, respectively. FASTER for frequent itemset mining can produce a speed-up of 3.1 times when compared to the original algorithm while maintaining an accuracy of 71%.
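
To make the "purchased together" question concrete, here is a minimal Apriori-style sketch of plain frequent itemset mining. It is a baseline illustration, not the FASTER pre-processor described in the abstract; the function name `frequent_itemsets`, the support threshold, and the four baskets are assumptions for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support.

    Works level by level: frequent k-itemsets are joined into (k+1)-item
    candidates, and a candidate is kept only if all its k-subsets are frequent.
    """
    baskets = [frozenset(t) for t in transactions]
    total = len(baskets)
    items = sorted({item for basket in baskets for item in basket})
    candidates = [frozenset([item]) for item in items]
    result = {}
    size = 1
    while candidates:
        # Count how many baskets contain each candidate itemset.
        frequent = {}
        for candidate in candidates:
            support = sum(1 for b in baskets if candidate <= b) / total
            if support >= min_support:
                frequent[candidate] = support
        result.update(frequent)
        # Join step: build candidates one item larger.
        size += 1
        joined = {a | b for a, b in combinations(list(frequent), 2)
                  if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in frequent
                             for s in combinations(c, size - 1))]
    return result

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
found = frequent_itemsets(baskets, min_support=0.5)
```

The number of candidates can grow exponentially with the number of items, which is the bottleneck the abstract refers to and the reason a record and feature reduction step before mining can pay off.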

Keywords: rough-sets, classification, feature selection, entropy, outliers, frequent itemset mining

Procedia PDF Downloads 432
41346 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course

Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu

Abstract:

This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. Data having 140 rows pertaining to the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data was used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the ML model with the best fit, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all the ten classes of the target variable (Grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, but the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each Grade value separately. The LR model with Gaussian data showed consistently better AUC scores compared to the ADA model with CTGAN data, except in two cases of the Grade value, C- and A-.
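
The per-grade ROC-AUC analysis described above can be sketched as a one-vs-rest computation. This is an illustrative stand-in written in plain Python, not the PyCaret pipeline the authors used; the function names, the three-grade example, and the probability rows are assumptions.

```python
def binary_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def mean_one_vs_rest_auc(prob_rows, true_labels, classes):
    """Compute one AUC per class (that class vs all others) and their mean.

    Classes that never occur, or that are the only class, are skipped.
    """
    per_class = {}
    for column, cls in enumerate(classes):
        binary = [1 if y == cls else 0 for y in true_labels]
        if 0 < sum(binary) < len(binary):
            scores = [row[column] for row in prob_rows]
            per_class[cls] = binary_auc(scores, binary)
    return sum(per_class.values()) / len(per_class), per_class

# Hypothetical predicted probabilities for grades A, B, C on four students.
grades = ["A", "B", "C"]
probabilities = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
]
actual = ["A", "B", "C", "B"]
mean_auc, auc_by_grade = mean_one_vs_rest_auc(probabilities, actual, grades)
```

An AUC of 0.5 corresponds to random ranking, which gives context for the reported means of 0.4377 and 0.6149.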

Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN

Procedia PDF Downloads 39
41345 Landfill Site Selection Using Multi-Criteria Decision Analysis: A Case Study for Gulshan-e-Iqbal Town, Karachi

Authors: Javeria Arain, Saad Malik

Abstract:

The management of solid waste is a crucial aspect of urban environmental management, especially in a city with an ever-increasing population such as Karachi. Gulshan-e-Iqbal Town generates on average 444.48 tons of municipal solid waste per day, and landfill sites are a widely accepted solution for the final disposal of this waste. However, an improperly selected site can have immense environmental, economic and ecological impacts. To select an appropriate landfill site, a number of factors should be taken into consideration to minimize the potential hazards of solid waste. The purpose of this research is to analyse the study area for the construction of an appropriate landfill site for disposal of municipal solid waste generated in Gulshan-e-Iqbal Town by using geospatial techniques, considering hydrological, geological, social and geomorphological factors. This was achieved using the analytical hierarchy process (AHP) and fuzzy analysis as decision support tools, integrated with geographic information sciences techniques. The eight most critical parameters relevant to the study area were selected. After generation of thematic layers for each parameter, overlay analysis was performed in ArcGIS 10.0 software. The results produced by both methods were then compared. The final suitability map using AHP shows that 19% of the total area is Least Suitable, 6% is Suitable but avoided, 46% is Moderately Suitable, 26% is Suitable, 2% is Most Suitable and 1% is Restricted. In comparison, the output map of fuzzy set theory does not follow crisp logic; rather, it provides values in the range 0-1, where 0 indicates the least suitable and 1 the most suitable site. From the results it is deduced that the northern part of the city is appropriate for constructing the landfill site, though a final decision on an optimal site could be made only after a field survey and consideration of economic and political factors.
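
The AHP weighting and overlay described in this abstract can be sketched as follows. The example uses the row geometric mean approximation of AHP weights, which is one common way to derive them and is not necessarily the authors' procedure; the three-criterion comparison matrix and the cell scores are hypothetical (the study itself used eight parameters).

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights by the row geometric mean method.

    pairwise[i][j] states how much more important criterion i is than j,
    with pairwise[j][i] == 1 / pairwise[i][j].
    """
    size = len(pairwise)
    geometric_means = [math.prod(row) ** (1.0 / size) for row in pairwise]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]

def suitability(cell_scores, weights):
    """Weighted overlay: combine per-criterion scores for one map cell."""
    return sum(w * s for w, s in zip(weights, cell_scores))

# Hypothetical judgments for three criteria, e.g. hydrology, geology, social.
comparison = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(comparison)

# Hypothetical normalized scores (0 to 1) of one cell on each criterion.
cell_score = suitability([1.0, 0.5, 0.0], weights)
```

Because this comparison matrix is perfectly consistent, the weights come out as 4/7, 2/7 and 1/7. In practice judgments are rarely this consistent, and a consistency ratio check would normally accompany the weights.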

Keywords: Analytical Hierarchy Process (AHP), fuzzy set theory, Geographic Information Sciences (GIS), Multi-Criteria Decision Analysis (MCDA)

Procedia PDF Downloads 500
41344 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi

Abstract:

The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data for data mining exploration takes up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data have been collected over three years from an actual harmonic monitoring system in a distribution system in Australia. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes that support a better understanding of the data are presented. Underlying classes in the data have then been identified using a clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

Procedia PDF Downloads 239
41343 Simulation Data Summarization Based on Spatial Histograms

Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura

Abstract:

In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.
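
For readers unfamiliar with V-Optimal histograms, the sketch below shows the classic one-dimensional version: choose bucket boundaries that minimize the total within-bucket sum of squared errors, using dynamic programming. It is a deliberate simplification and not the spatial, hierarchical-partitioning algorithm the paper proposes; the function name and the sample values are assumptions.

```python
def v_optimal_histogram(values, buckets):
    """Split values into contiguous buckets minimizing total squared error.

    Returns (total_error, [(start, end), ...]) with half-open index ranges.
    Assumes 1 <= buckets <= len(values).
    """
    n = len(values)
    # Prefix sums of values and squared values give O(1) bucket error.
    prefix = [0.0]
    prefix_sq = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def bucket_error(i, j):
        """Sum of squared deviations from the mean for values[i:j]."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[b][j]: best error covering values[:j] with exactly b buckets.
    cost = [[inf] * (n + 1) for _ in range(buckets + 1)]
    cut = [[0] * (n + 1) for _ in range(buckets + 1)]
    cost[0][0] = 0.0
    for b in range(1, buckets + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                candidate = cost[b - 1][i] + bucket_error(i, j)
                if candidate < cost[b][j]:
                    cost[b][j] = candidate
                    cut[b][j] = i

    # Walk the stored cut points backwards to recover the boundaries.
    boundaries = []
    end = n
    for b in range(buckets, 0, -1):
        start = cut[b][end]
        boundaries.append((start, end))
        end = start
    boundaries.reverse()
    return cost[buckets][n], boundaries

# Hypothetical per-cell densities along one axis of a simulation grid.
densities = [1, 1, 1, 10, 10, 10, 5, 5]
error, ranges = v_optimal_histogram(densities, 3)
```

This exact dynamic program costs on the order of n² × buckets operations, which hints at why the paper also proposes a heuristic alongside its optimal algorithm for massive simulation data.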

Keywords: simulation data, data summarization, spatial histograms, exploration, visualization

Procedia PDF Downloads 172
41342 Structural Equation Modeling Semiparametric Truncated Spline Using Simulation Data

Authors: Adji Achmad Rinaldo Fernandes

Abstract:

SEM analysis is a complex multivariate analysis because it involves a number of exogenous and endogenous variables that are interconnected to form a model. The measurement model is divided into two, namely, the reflective model (reflecting) and the formative model (forming). Before carrying out further tests on SEM, there are assumptions that must be met, namely the linearity assumption, to determine the form of the relationship. There are three modeling approaches to path analysis, including parametric, nonparametric and semiparametric approaches. The aim of this research is to develop semiparametric SEM and obtain the best model. The data used in the research is secondary data as the basis for the process of obtaining simulation data. Simulation data was generated with various sample sizes of 100, 300, and 500. In the semiparametric SEM analysis, the form of the relationship studied was determined, namely linear and quadratic and determined one and two knot points with various levels of error variance (EV=0.5; 1; 5). There are three levels of closeness of relationship for the analysis process in the measurement model consisting of low (0.1-0.3), medium (0.4-0.6) and high (0.7-0.9) levels of closeness. The best model lies in the form of the relationship X1Y1 linear, and. In the measurement model, a characteristic of the reflective model is obtained, namely that the higher the closeness of the relationship, the better the model obtained. The originality of this research is the development of semiparametric SEM, which has not been widely studied by researchers.

Keywords: semiparametric SEM, measurement model, structural model, reflective model, formative model

Procedia PDF Downloads 33
41341 Prediction of Marine Ecosystem Changes Based on the Integrated Analysis of Multivariate Data Sets

Authors: Prozorkevitch D., Mishurov A., Sokolov K., Karsakov L., Pestrikova L.

Abstract:

The current body of knowledge about the marine environment and the dynamics of marine ecosystems includes a huge amount of heterogeneous data collected over decades. It generally includes a wide range of hydrological, biological and fishery data. Marine researchers collect these data and analyze how and why the ecosystem changes from past to present. Based on these historical records and linkages between the processes it is possible to predict future changes. Multivariate analysis of trends and their interconnection in the marine ecosystem may be used as an instrument for predicting further ecosystem evolution. A wide range of information about the components of the marine ecosystem for more than 50 years needs to be used to investigate how these arrays can help to predict the future.

Keywords: barents sea ecosystem, abiotic, biotic, data sets, trends, prediction

Procedia PDF Downloads 112
41340 Analysis of Lead Time Delays in Supply Chain: A Case Study

Authors: Abdel-Aziz M. Mohamed, Nermeen Coutry

Abstract:

Lead time is an important measure of supply chain performance. It impacts both customer satisfaction and the total cost of inventory. This paper presents the results of a study analyzing the customer order lead time for a multinational company. In the study, the lead time was divided into three stages: order entry, order fulfillment, and order delivery. A sample of 2,425 order lines from the company records was considered. The sample data include information regarding customer orders from the time of order entry until order delivery. Data regarding the lead time of each stage for different orders were also provided. Summary statistics on the lead time data reveal that about 30% of the orders were delivered after the scheduled due date. Multiple linear regression analysis revealed that component type, logistics parameter, order size and customer type have a significant impact on lead time. Data analysis on the stages of lead time indicates that stage 2 consumes over 50% of the lead time. Pareto analysis was performed to study the reasons for customer order delays in each of the three stages. Recommendations were given to resolve the problem.

Keywords: lead time reduction, customer satisfaction, service quality, statistical analysis

Procedia PDF Downloads 722
41339 Combining Diffusion Maps and Diffusion Models for Enhanced Data Analysis

Authors: Meng Su

Abstract:

High-dimensional data analysis often presents challenges in capturing the complex, nonlinear relationships and manifold structures inherent to the data. This article presents a novel approach that leverages the strengths of two powerful techniques, Diffusion Maps and Diffusion Probabilistic Models (DPMs), to address these challenges. By integrating the dimensionality reduction capability of Diffusion Maps with the data modeling ability of DPMs, the proposed method aims to provide a comprehensive solution for analyzing and generating high-dimensional data. The Diffusion Map technique preserves the nonlinear relationships and manifold structure of the data by mapping it to a lower-dimensional space using the eigenvectors of the graph Laplacian matrix. Meanwhile, DPMs capture the dependencies within the data, enabling effective modeling and generation of new data points in the low-dimensional space. The generated data points can then be mapped back to the original high-dimensional space, ensuring consistency with the underlying manifold structure. Through a detailed example implementation, the article demonstrates the potential of the proposed hybrid approach to achieve more accurate and effective modeling and generation of complex, high-dimensional data. Furthermore, it discusses possible applications in various domains, such as image synthesis, time-series forecasting, and anomaly detection, and outlines future research directions for enhancing the scalability, performance, and integration with other machine learning techniques. By combining the strengths of Diffusion Maps and DPMs, this work paves the way for more advanced and robust data analysis methods.

Keywords: diffusion maps, diffusion probabilistic models (DPMs), manifold learning, high-dimensional data analysis

Procedia PDF Downloads 96
41338 Terrestrial Laser Scans to Assess Aerial LiDAR Data

Authors: J. F. Reinoso-Gordo, F. J. Ariza-López, A. Mozas-Calvache, J. L. García-Balboa, S. Eddargani

Abstract:

The quality of DEMs may depend on several factors, such as the data source, the capture method, the type of processing used to derive them, or the cell size of the DEM. The two most important capture methods for producing regional-sized DEMs are photogrammetry and LiDAR; DEMs covering entire countries have been obtained with these methods. The quality of these DEMs has traditionally been evaluated by the national cartographic agencies through point-based sampling focused on the vertical component. For this type of evaluation there are standards such as NMAS and the ASPRS Positional Accuracy Standards for Digital Geospatial Data. However, it seems more appropriate to carry out this evaluation with a method that takes into account the surface nature of the DEM, so that the sampling is surface-based rather than point-based. This work is part of the research project "Functional Quality of Digital Elevation Models in Engineering", in which it is necessary to control the quality of a DEM whose data source is an experimental LiDAR flight with a density of 14 points per square meter, which we call the Point Cloud Product (PCpro). The present work describes the data capture on the ground and the postprocessing tasks needed to obtain the point cloud that will be used as a reference (PCref) to evaluate the quality of the PCpro. Each PCref consists of a 50 × 50 m patch obtained by registering 4 different scan stations. The area studied was the Spanish region of Navarra, which covers an area of 10,391 km²; 30 homogeneously distributed patches were necessary to sample the entire surface. The patches were captured using a Leica BLK360 terrestrial laser scanner mounted on a pole that reached heights of up to 7 meters; the scanner was mounted inverted to avoid the characteristic shadow circle that appears when the scanner is in its direct position. To ensure that the accuracy of the PCref is greater than that of the PCpro, the PCref was georeferenced with real-time GNSS, and its positioning accuracy was better than 4 cm; this accuracy is much better than the altimetric mean square error estimated for the PCpro (<15 cm). The DEM of interest is the one corresponding to the bare earth, so it was necessary to apply a filter to eliminate vegetation and auxiliary elements such as poles, tripods, etc. After the postprocessing tasks, the PCref is ready to be compared with the PCpro using different techniques: cloud to cloud or, after a resampling process, DEM to DEM.

Keywords: data quality, DEM, LiDAR, terrestrial laser scanner, accuracy

Procedia PDF Downloads 96
41337 Differentiation between Different Rangeland Sites Using Principal Component Analysis in Semi-Arid Areas of Sudan

Authors: Nancy Ibrahim Abdalla, Abdelaziz Karamalla Gaiballa

Abstract:

Rangelands in semi-arid areas provide a good source of feed for huge numbers of animals and are of environmental, economic and social importance; therefore, these areas are considered economically very important for the pastoral sector in Sudan. This paper investigates ways of differentiating between rangeland sites according to soil type using principal component analysis, to assist in monitoring and assessment. Three rangeland sites were identified in the study area: a flat sandy site, a sand dune site, and a hard clay site. Principal component analysis (PCA) was used to reduce the number of factors needed to distinguish between rangeland sites and to produce a new data set containing the most useful spectral information for satellite image processing. It was performed using selected types of data (two vegetation indices, topographic data and vegetation surface reflectance within the three bands of MODIS data). The PCA indicated a relatively high correspondence between vegetation and soil in the total variance of the data set. The results showed that PCA with the selected variables revealed a high difference between sites, reflected in the variance and eigenvalues, and that it can be used to differentiate between range sites.
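As a rough illustration of how PCA reduces a set of correlated variables to a few components, the following standard-library sketch computes principal components from a small table of observations. It is not the authors' workflow: the function names, the power-iteration solver with deflation, and the toy input are assumptions made for the example, and the actual vegetation indices, topographic data and MODIS bands are not reproduced here.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of observations given as a list of rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix (power iteration)."""
    d = len(matrix)
    vec = [float(i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm < 1e-12:
            # No variance left in this direction.
            return 0.0, vec
        vec = [x / norm for x in product]
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d))
    return value, vec


def pca(rows, n_components):
    """Return a list of (eigenvalue, loading vector, share of total variance)."""
    cov = covariance_matrix(rows)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    results = []
    for _ in range(n_components):
        value, vec = top_eigenpair(cov)
        results.append((value, vec, value / total_variance))
        # Deflation: remove the component just found before looking for the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    return results


if __name__ == "__main__":
    # Made-up observations in which the second variable is twice the first.
    toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
    for value, loading, share in pca(toy, 2):
        print(value, loading, share)
```

The eigenvalues and variance shares returned by `pca` correspond to the quantities the abstract refers to when comparing sites.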

Keywords: principal component analysis, PCA, rangeland sites, semi-arid areas, soil types

Procedia PDF Downloads 177
41336 Industrial Process Mining Based on Data Pattern Modeling and Nonlinear Analysis

Authors: Hyun-Woo Cho

Abstract:

Unexpected events may occur with serious impacts on industrial processes. This work uses a data representation technique to model and analyze process data patterns for the purpose of diagnosis. The use of a triangular representation of process data is evaluated using a simulated process. Furthermore, the effects of different pre-treatment techniques, based on linear or nonlinear reduced spaces, were compared. This work extracted the fault pattern in the reduced space, not in the original data space. The results show that the diagnosis method based on the nonlinear technique produced more reliable results and outperformed the linear method.

Keywords: process monitoring, data analysis, pattern modeling, fault, nonlinear techniques

Procedia PDF Downloads 383
41335 Interpretation and Clustering Framework for Analyzing ECG Survey Data

Authors: Irum Matloob, Shoab Ahmad Khan, Fahim Arif

Abstract:

The Indo-Pak region has suffered from heart disease for many decades. Many surveys have shown that the percentage of cardiac patients in Pakistan is increasing day by day, and special attention needs to be paid to this issue. A framework is proposed for performing detailed analysis of ECG survey data collected to measure the prevalence of heart disease in Pakistan. The ECG survey data are filtered using automated Minnesota codes, and only those ECGs that fulfill the standardized conditions specified in the Minnesota codes are used for further analysis. Feature selection is then performed by applying the proposed algorithm, based on a discernibility matrix, to select relevant features from the database. Clustering is performed to expose natural clusters in the ECG survey data by applying a spectral clustering algorithm using the fuzzy c-means algorithm. The hidden patterns and interesting relationships exposed by this analysis are useful for further detailed analysis and for many other purposes.

Keywords: arrhythmias, centroids, ECG, clustering, discernibility matrix

Procedia PDF Downloads 464
41334 A Study on Sentiment Analysis Using Various ML/NLP Models on Historical Data of Indian Leaders

Authors: Sarthak Deshpande, Akshay Patil, Pradip Pandhare, Nikhil Wankhede, Rushali Deshmukh

Abstract:

Sentiment analysis is among the most significant tasks for any language and a key area of NLP that has recently made impressive strides. Several models and datasets are available for this task in popular and commonly used languages such as English, Russian, and Spanish. Although sentiment analysis is researched extensively, it lags behind for low-resource regional languages such as Hindi and Marathi. Marathi is one of the languages included in the Indian Constitution’s 8th schedule; it is the third most widely spoken language in the country and is primarily spoken in the Deccan region, which encompasses Maharashtra and Goa. There are not enough studies on sentiment analysis methods for Marathi text because of the lack of available resources and information. Therefore, this project proposes the use of different ML/NLP models to analyze Marathi data from comments on YouTube content, tweets, or Instagram posts. We aim to achieve a short and precise analysis and summary of the related data using our dataset (dates, names, root words) and lexicons to locate exact information.

Keywords: multilingual sentiment analysis, Marathi, natural language processing, text summarization, lexicon-based approaches

Procedia PDF Downloads 67
41333 Longitudinal Analysis of Internet Speed Data in the Gulf Cooperation Council Region

Authors: Musab Isah

Abstract:

This paper presents a longitudinal analysis of Internet speed data in the Gulf Cooperation Council (GCC) region, focusing on the most populous city of each of the six countries: Riyadh, Saudi Arabia; Dubai, UAE; Kuwait City, Kuwait; Doha, Qatar; Manama, Bahrain; and Muscat, Oman. The study utilizes data collected from the Measurement Lab (M-Lab) infrastructure over a five-year period from January 1, 2019, to December 31, 2023. The analysis includes downstream and upstream throughput data for the cities, covering significant events such as the launch of 5G networks in 2019, COVID-19-induced lockdowns in 2020 and 2021, and the subsequent recovery period and return to normalcy. The results show substantial increases in Internet speeds across the cities, highlighting improvements in both download and upload throughput over the years. All the GCC countries have achieved above-average Internet speeds that can conveniently support various online activities and applications with excellent user experience.

Keywords: internet data science, internet performance measurement, throughput analysis, internet speed, measurement lab, network diagnostic tool

Procedia PDF Downloads 54
41332 Extreme Temperature Forecast in Mbonge, Cameroon Through Return Level Analysis of the Generalized Extreme Value (GEV) Distribution

Authors: Nkongho Ayuketang Arreyndip, Ebobenow Joseph

Abstract:

In this paper, temperature extremes are forecast by applying the block maxima method of the generalized extreme value (GEV) distribution to temperature data from the Cameroon Development Corporation (CDC). Considering two sets of data (raw data and simulated data) and two models of the GEV distribution (stationary and non-stationary), a return level analysis is carried out. In the stationary model, the return values are constant over time for the raw data, while for the simulated data they show an increasing trend with an upper bound. In the non-stationary model, the return levels of both the raw data and the simulated data show an increasing trend with an upper bound. This clearly shows that although temperatures in the tropics show signs of increasing in the future, there is a maximum temperature that is not exceeded. The results of this paper are vital for agricultural and environmental research.
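The return levels discussed above follow from inverting the GEV distribution function: for a return period of T blocks, the level exceeded with probability p = 1/T per block is z = mu - (sigma/xi) * (1 - (-log(1 - p))^(-xi)), with the Gumbel limit z = mu - sigma * log(-log(1 - p)) when xi = 0. The sketch below evaluates this formula for a stationary model; the parameter values are hypothetical and are not estimates from the paper. It also shows why a negative shape parameter gives return levels that increase with T but stay below the bound mu - sigma/xi.

```python
import math


def gev_return_level(mu, sigma, xi, return_period):
    """Return level of a stationary GEV(mu, sigma, xi) for a given return period in blocks."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if return_period <= 1:
        raise ValueError("return_period must be greater than 1")
    p = 1.0 / return_period
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV family.
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


def gev_upper_bound(mu, sigma, xi):
    """Finite upper end point of the distribution when xi < 0, otherwise infinity."""
    if xi >= 0:
        return math.inf
    return mu - sigma / xi


if __name__ == "__main__":
    # Hypothetical parameters (degrees Celsius), chosen only for illustration.
    mu, sigma, xi = 30.0, 2.0, -0.2
    for period in (2, 10, 100, 1000):
        print(period, gev_return_level(mu, sigma, xi, period))
    print("upper bound:", gev_upper_bound(mu, sigma, xi))
```

In a non-stationary model the location or scale parameter would itself be a function of time, so the same formula would be evaluated with time-dependent parameters.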

Keywords: forecasting, generalized extreme value (GEV), meteorology, return level

Procedia PDF Downloads 472
41331 Analysis of Spatial and Temporal Data Using Remote Sensing Technology

Authors: Kapil Pandey, Vishnu Goyal

Abstract:

Spatial and temporal data analysis is very well known in the field of satellite image processing. When spatial data are combined with time series analysis, they give significant results in change detection studies. In this paper, GIS and remote sensing techniques have been used to detect change using time-series satellite imagery of Uttarakhand state for the years 1990-2010. Natural vegetation, urban area, forest cover, etc. were chosen as the main landuse classes to study. Landuse/landcover classes for several years were prepared using satellite images. The maximum likelihood supervised classification technique was adopted in this work; finally, a landuse change index was generated, and graphical models were used to present the changes.

Keywords: GIS, landuse/landcover, spatial and temporal data, remote sensing

Procedia PDF Downloads 427
41330 Exploring the Spatial Relationship between Built Environment and Ride-hailing Demand: Applying Street-Level Images

Authors: Jingjue Bao, Ye Li, Yujie Qi

Abstract:

The explosive growth of ride-hailing has reshaped residents' travel behavior and plays a crucial role in urban mobility within the built environment. Contributing to research on the spatial variation of ride-hailing demand and its relationship to the built environment and socioeconomic factors, this study utilizes multi-source data from Haikou, China, to construct a Multi-scale Geographically Weighted Regression (MGWR) model that considers spatial scale heterogeneity. The regression results showed that the MGWR model demonstrated superior interpretability and reliability, with an improvement of 3.4% in R² and a reduction in AIC from 4853 to 4787, compared with the Geographically Weighted Regression (GWR) model. Furthermore, to precisely identify the surrounding environment of each sampling point, the DeepLabv3+ model is employed to segment street-level images. Features extracted from these images are incorporated as variables in the regression model, further enhancing its rationality and accuracy, with a 7.78% improvement in R² compared with the MGWR model that considered only region-level variables. By integrating multi-scale geospatial data and utilizing advanced computer vision techniques, this study provides a comprehensive understanding of the spatial dynamics between ride-hailing demand and the urban built environment. The insights gained from this research are expected to contribute significantly to urban transportation planning and policy making, as well as to ride-hailing platforms, facilitating the development of more efficient and effective mobility solutions in modern cities.

Keywords: travel behavior, ride-hailing, spatial relationship, built environment, street-level image

Procedia PDF Downloads 71
41329 On the Estimation of Crime Rate in the Southwest of Nigeria: Principal Component Analysis Approach

Authors: Kayode Balogun, Femi Ayoola

Abstract:

Crime is at an alarming rate in this part of the world, and many factors contribute to this antisocial behaviour among both the young and the old. In this work, principal component analysis (PCA) was used as a tool to reduce the dimensionality of the data, while retaining as much of the information as possible, and to identify the variables that were crime prone in the study region. Data were collected on twenty-eight crime variables from the National Bureau of Statistics (NBS) databank for a period of fifteen years. We use PCA in this study to determine the number of major variables and contributors to crime in Southwest Nigeria. The results of our analysis revealed that eight principal components were retained using the scree plot and loading plot, which implies that an eight-equation solution is appropriate for the data. The eight components explained 93.81% of the total variation in the data set. We also found that the most commonly committed crimes in Southwestern Nigeria were: assault, grievous harm and wounding, theft/stealing, burglary, house breaking, false pretence, unlawful arms possession and breach of public peace.
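The share of total variation explained by the retained components is the sum of their eigenvalues divided by the sum of all eigenvalues. A minimal sketch of that calculation follows; the function name and the eigenvalues in the example are hypothetical and are not taken from the paper, which selected its components using the scree plot and loading plot.

```python
def components_needed(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative variance share reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count, cumulative / total
    # Threshold above 1.0: every component is needed.
    return len(ordered), 1.0


if __name__ == "__main__":
    # Made-up eigenvalues, for illustration only.
    print(components_needed([4.0, 3.0, 2.0, 1.0], 0.8))
```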

Keywords: crime rates, data, Southwest Nigeria, principal component analysis, variables

Procedia PDF Downloads 437