Search results for: ArcGIS data analysis
41945 Regionalization of IDF Curves, by Interpolating Intensity and Adjustment Parameters - Application to Boyacá, Colombia
Authors: Pedro Mauricio Acosta, Carlos Andrés Caro
Abstract:
This research presents the regionalization of IDF curves for the department of Boyacá, Colombia, which comprises 16 towns, including the provincial capital, Tunja. For regionalization adjustment parameters (U and alpha) of the IDF curves stations referred to in the studied area were used. Similar regionalization is used by the interpolation of intensities. In the case of regionalization by parameters found by the construction of the curves intensity, duration and frequency estimation methods using ordinary moments and maximum likelihood. Regionalization and interpolation of data were performed with the assistance of Arcgis software. Within the development of the project the best choice to provide a level of reliability such as to determine which of the options and ways to regionalize is best sought. The resulting isolines maps were made in the case of regionalization intensities, each map is associated with a different return period and duration in order to build IDF curves in the studied area. In the case of the regionalization maps parameters associated with each parameter were performed last.Keywords: intensity duration, frequency curves, regionalization, hydrology
Procedia PDF Downloads 32341944 Data and Spatial Analysis for Economy and Education of 28 E.U. Member-States for 2014
Authors: Alexiou Dimitra, Fragkaki Maria
Abstract:
The objective of the paper is the study of geographic, economic and educational variables and their contribution to determine the position of each member-state among the EU-28 countries based on the values of seven variables as given by Eurostat. The Data Analysis methods of Multiple Factorial Correspondence Analysis (MFCA) Principal Component Analysis and Factor Analysis have been used. The cross tabulation tables of data consist of the values of seven variables for the 28 countries for 2014. The data are manipulated using the CHIC Analysis V 1.1 software package. The results of this program using MFCA and Ascending Hierarchical Classification are given in arithmetic and graphical form. For comparison reasons with the same data the Factor procedure of Statistical package IBM SPSS 20 has been used. The numerical and graphical results presented with tables and graphs, demonstrate the agreement between the two methods. The most important result is the study of the relation between the 28 countries and the position of each country in groups or clouds, which are formed according to the values of the corresponding variables.Keywords: Multiple Factorial Correspondence Analysis, Principal Component Analysis, Factor Analysis, E.U.-28 countries, Statistical package IBM SPSS 20, CHIC Analysis V 1.1 Software, Eurostat.eu Statistics
Procedia PDF Downloads 50941943 Survey on Arabic Sentiment Analysis in Twitter
Authors: Sarah O. Alhumoud, Mawaheb I. Altuwaijri, Tarfa M. Albuhairi, Wejdan M. Alohaideb
Abstract:
Large-scale data stream analysis has become one of the important business and research priorities lately. Social networks like Twitter and other micro-blogging platforms hold an enormous amount of data that is large in volume, velocity and variety. Extracting valuable information and trends out of these data would aid in a better understanding and decision-making. Multiple analysis techniques are deployed for English content. Moreover, one of the languages that produce a large amount of data over social networks and is least analyzed is the Arabic language. The proposed paper is a survey on the research efforts to analyze the Arabic content in Twitter focusing on the tools and methods used to extract the sentiments for the Arabic content on Twitter.Keywords: big data, social networks, sentiment analysis, twitter
Procedia PDF Downloads 57541942 Analysis of Big Data
Authors: Sandeep Sharma, Sarabjit Singh
Abstract:
As per the user demand and growth trends of large free data the storage solutions are now becoming more challenge-able to protect, store and to retrieve data. The days are not so far when the storage companies and organizations are start saying 'no' to store our valuable data or they will start charging a huge amount for its storage and protection. On the other hand as per the environmental conditions it becomes challenge-able to maintain and establish new data warehouses and data centers to protect global warming threats. A challenge of small data is over now, the challenges are big that how to manage the exponential growth of data. In this paper we have analyzed the growth trend of big data and its future implications. We have also focused on the impact of the unstructured data on various concerns and we have also suggested some possible remedies to streamline big data.Keywords: big data, unstructured data, volume, variety, velocity
Procedia PDF Downloads 54541941 Data Mining Meets Educational Analysis: Opportunities and Challenges for Research
Authors: Carla Silva
Abstract:
Recent development of information and communication technology enables us to acquire, collect, analyse data in various fields of socioeconomic – technological systems. Along with the increase of economic globalization and the evolution of information technology, data mining has become an important approach for economic data analysis. As a result, there has been a critical need for automated approaches to effective and efficient usage of massive amount of educational data, in order to support institutions to a strategic planning and investment decision-making. In this article, we will address data from several different perspectives and define the applied data to sciences. Many believe that 'big data' will transform business, government, and other aspects of the economy. We discuss how new data may impact educational policy and educational research. Large scale administrative data sets and proprietary private sector data can greatly improve the way we measure, track, and describe educational activity and educational impact. We also consider whether the big data predictive modeling tools that have emerged in statistics and computer science may prove useful in educational and furthermore in economics. Finally, we highlight a number of challenges and opportunities for future research.Keywords: data mining, research analysis, investment decision-making, educational research
Procedia PDF Downloads 35641940 A Geographical Spatial Analysis on the Benefits of Using Wind Energy in Kuwait
Authors: Obaid AlOtaibi, Salman Hussain
Abstract:
Wind energy is associated with many geographical factors including wind speed, climate change, surface topography, environmental impacts, and several economic factors, most notably the advancement of wind technology and energy prices. It is the fastest-growing and least economically expensive method for generating electricity. Wind energy generation is directly related to the characteristics of spatial wind. Therefore, the feasibility study for the wind energy conversion system is based on the value of the energy obtained relative to the initial investment and the cost of operation and maintenance. In Kuwait, wind energy is an appropriate choice as a source of energy generation. It can be used in groundwater extraction in agricultural areas such as Al-Abdali in the north and Al-Wafra in the south, or in fresh and brackish groundwater fields or remote and isolated locations such as border areas and projects away from conventional power electricity services, to take advantage of alternative energy, reduce pollutants, and reduce energy production costs. The study covers the State of Kuwait with an exception of metropolitan area. Climatic data were attained through the readings of eight distributed monitoring stations affiliated with Kuwait Institute for Scientific Research (KISR). The data were used to assess the daily, monthly, quarterly, and annual available wind energy accessible for utilization. The researchers applied the Suitability Model to analyze the study by using the ArcGIS program. It is a model of spatial analysis that compares more than one location based on grading weights to choose the most suitable one. The study criteria are: the average annual wind speed, land use, topography of land, distance from the main road networks, urban areas. According to the previous criteria, the four proposed locations to establish wind farm projects are selected based on the weights of the degree of suitability (excellent, good, average, and poor). The percentage of areas that represents the most suitable locations with an excellent rank (4) is 8% of Kuwait’s area. It is relatively distributed as follows: Al-Shqaya, Al-Dabdeba, Al-Salmi (5.22%), Al-Abdali (1.22%), Umm al-Hayman (0.70%), North Wafra and Al-Shaqeeq (0.86%). The study recommends to decision-makers to consider the proposed location (No.1), (Al-Shqaya, Al-Dabdaba, and Al-Salmi) as the most suitable location for future development of wind farms in Kuwait, this location is economically feasible.Keywords: Kuwait, renewable energy, spatial analysis, wind energy
Procedia PDF Downloads 14641939 Pattern Recognition Using Feature Based Die-Map Clustering in the Semiconductor Manufacturing Process
Authors: Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek
Abstract:
Depending on the big data analysis becomes important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and analysis of the causes of the failure are closely related. The purpose of this study is to analyze pattern affects the final test results using a die map based clustering. Many researches have been conducted using die data from the semiconductor test process. However, analysis has limitation as the test data is less directly related to the final test results. Therefore, this study proposes a framework for analysis through clustering using more detailed data than existing die data. This study consists of three phases. In the first phase, die map is created through fail bit data in each sub-area of die. In the second phase, clustering using map data is performed. And the third stage is to find patterns that affect final test result. Finally, the proposed three steps are applied to actual industrial data and experimental results showed the potential field application.Keywords: die-map clustering, feature extraction, pattern recognition, semiconductor manufacturing process
Procedia PDF Downloads 40141938 Comparative Morphometric Analysis of Yelganga-Shivbhadra and Kohilla River Sub-Basins in Aurangabad District Maharashtra India
Authors: Chandrakant Gurav, Md Babar, Ajaykumar Asode
Abstract:
Morphometric analysis is the first stage of any basin analysis. By using these morphometric parameters we give indirect information about the nature and relations of stream with other streams, Geology of the area, groundwater condition and tectonic history of the basin. In the present study, Yelganga, Shivbhadra and Kohilla rivers, tributaries of the Godavari River in Aurangabad district, Maharashtra, India are considered to compare and study their morphometric characters. The linear, areal and relief morphometric aspects of the sub-basins have been assessed and evaluated in GIS environment. For this study, ArcGIS 10.1 software has been used for delineating, digitizing and generating different thematic maps. The Survey of India (SOI) toposheets maps and Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) on resolution 30 m downloaded from United States Geological Survey (USGS) have been used for preparation of map and data generation. Geologically, the study area is covered by Central Deccan Volcanic Province (CDVP). It mainly consists of ‘aa’ type of basaltic lava flows of Late (upper) Cretaceous to Early (lower) Eocene age. The total geographical area of Yelganga, Shivbhadra and Kohilla river sub-basins are 185.5 sq. km., 142.6 sq. km and 122.3 sq. km. respectively The stream ordering method as suggested by the Strahler has been employed for present study and found that all the sub-basins are of 5th order streams. The average bifurcation ratio value of the sub-basins is below 5, indicates that there appears to be no strong structural control on drainage development, homogeneous nature of lithology and drainage network is in well-developed stage of erosion. The drainage density of Yelganga, Shivbhadra and Kohilla Sub-basins is 1.79 km/km2, 1.48 km/km2 and 1.89 km/km2 respectively and stream frequency is 1.94 streams/km2, 1.19 streams/km2 and 1.68 streams/km2 respectively, indicating semi-permeable sub-surface. Based on textural ratio values it indicates that the sub-basins have coarse texture. Shape parameters such as form factor ratio, circularity ratio and elongation ratio values shows that all three sub- basins are elongated in shape.Keywords: GIS, Kohilla, morphometry, Shivbhadra, Yelganga
Procedia PDF Downloads 15341937 The Economic Limitations of Defining Data Ownership Rights
Authors: Kacper Tomasz Kröber-Mulawa
Abstract:
This paper will address the topic of data ownership from an economic perspective, and examples of economic limitations of data property rights will be provided, which have been identified using methods and approaches of economic analysis of law. To properly build a background for the economic focus, in the beginning a short perspective of data and data ownership in the EU’s legal system will be provided. It will include a short introduction to its political and social importance and highlight relevant viewpoints. This will stress the importance of a Single Market for data but also far-reaching regulations of data governance and privacy (including the distinction of personal and non-personal data, data held by public bodies and private businesses). The main discussion of this paper will build upon the briefly referred to legal basis as well as methods and approaches of economic analysis of law.Keywords: antitrust, data, data ownership, digital economy, property rights
Procedia PDF Downloads 8041936 Suspended Sediment Concentration and Water Quality Monitoring Along Aswan High Dam Reservoir Using Remote Sensing
Authors: M. Aboalazayem, Essam A. Gouda, Ahmed M. Moussa, Amr E. Flifl
Abstract:
Field data collecting is considered one of the most difficult work due to the difficulty of accessing large zones such as large lakes. Also, it is well known that the cost of obtaining field data is very expensive. Remotely monitoring of lake water quality (WQ) provides an economically feasible approach comparing to field data collection. Researchers have shown that lake WQ can be properly monitored via Remote sensing (RS) analyses. Using satellite images as a method of WQ detection provides a realistic technique to measure quality parameters across huge areas. Landsat (LS) data provides full free access to often occurring and repeating satellite photos. This enables researchers to undertake large-scale temporal comparisons of parameters related to lake WQ. Satellite measurements have been extensively utilized to develop algorithms for predicting critical water quality parameters (WQPs). The goal of this paper is to use RS to derive WQ indicators in Aswan High Dam Reservoir (AHDR), which is considered Egypt's primary and strategic reservoir of freshwater. This study focuses on using Landsat8 (L-8) band surface reflectance (SR) observations to predict water-quality characteristics which are limited to Turbidity (TUR), total suspended solids (TSS), and chlorophyll-a (Chl-a). ArcGIS pro is used to retrieve L-8 SR data for the study region. Multiple linear regression analysis was used to derive new correlations between observed optical water-quality indicators in April and L-8 SR which were atmospherically corrected by values of various bands, band ratios, and or combinations. Field measurements taken in the month of May were used to validate WQP obtained from SR data of L-8 Operational Land Imager (OLI) satellite. The findings demonstrate a strong correlation between indicators of WQ and L-8 .For TUR, the best validation correlation with OLI SR bands blue, green, and red, were derived with high values of Coefficient of correlation (R2) and Root Mean Square Error (RMSE) equal 0.96 and 3.1 NTU, respectively. For TSS, Two equations were strongly correlated and verified with band ratios and combinations. A logarithm of the ratio of blue and green SR was determined to be the best performing model with values of R2 and RMSE equal to 0.9861 and 1.84 mg/l, respectively. For Chl-a, eight methods were presented for calculating its value within the study area. A mix of blue, red, shortwave infrared 1(SWR1) and panchromatic SR yielded the greatest validation results with values of R2 and RMSE equal 0.98 and 1.4 mg/l, respectively.Keywords: remote sensing, landsat 8, nasser lake, water quality
Procedia PDF Downloads 9141935 Horizontal Development of Built-up Area and Its Impacts on the Agricultural Land of Peshawar City District (1991-2014)
Authors: Pukhtoon Yar
Abstract:
Peshawar City is experiencing a rapid spatial urban growth primarily as a result of high rate of urbanization along with economic development. This paper was designed to understand the impacts of urbanization on agriculture land use change by particularly focusing on land use change trajectories from the past (1991-2014). We used Landsat imageries (30 meters) for1991along with Spot images (2.5 meters) for year 2014. . The ground truthing of the satellite data was performed by collecting information from Peshawar Development Authority, revenue department, real estate agents and interviews with the officials of city administration. The temporal satellite images were processed by applying supervised maximum likelihood classification technique in ArcGIS 9.3. The procedure resulted into five main classes of land use i.e. built-up area, farmland, barren land, cultivable-wasteland and water bodies. The analysis revealed that, in Peshawar City the built-up environment has been doubled from 8.1 percent in 1991 to over 18.2 percent in 2014 by predominantly encroaching land producing food. Furthermore, the CA-Markov Model predicted that the area under impervious surfaces would continue to flourish during the next three decades. This rapid increase in built-up area is accredited to the lack of proper land use planning and management, which has caused chaotic urban sprawl with detrimental social and environmental consequences.Keywords: Urban Expansion, Land use, GIS, Remote Sensing, Markov Model, Peshawar City
Procedia PDF Downloads 18541934 Iot Device Cost Effective Storage Architecture and Real-Time Data Analysis/Data Privacy Framework
Authors: Femi Elegbeleye, Omobayo Esan, Muienge Mbodila, Patrick Bowe
Abstract:
This paper focused on cost effective storage architecture using fog and cloud data storage gateway and presented the design of the framework for the data privacy model and data analytics framework on a real-time analysis when using machine learning method. The paper began with the system analysis, system architecture and its component design, as well as the overall system operations. The several results obtained from this study on data privacy model shows that when two or more data privacy model is combined we tend to have a more stronger privacy to our data, and when fog storage gateway have several advantages over using the traditional cloud storage, from our result shows fog has reduced latency/delay, low bandwidth consumption, and energy usage when been compare with cloud storage, therefore, fog storage will help to lessen excessive cost. This paper dwelt more on the system descriptions, the researchers focused on the research design and framework design for the data privacy model, data storage, and real-time analytics. This paper also shows the major system components and their framework specification. And lastly, the overall research system architecture was shown, its structure, and its interrelationships.Keywords: IoT, fog, cloud, data analysis, data privacy
Procedia PDF Downloads 9541933 Cloud Design for Storing Large Amount of Data
Authors: M. Strémy, P. Závacký, P. Cuninka, M. Juhás
Abstract:
Main goal of this paper is to introduce our design of private cloud for storing large amount of data, especially pictures, and to provide good technological backend for data analysis based on parallel processing and business intelligence. We have tested hypervisors, cloud management tools, storage for storing all data and Hadoop to provide data analysis on unstructured data. Providing high availability, virtual network management, logical separation of projects and also rapid deployment of physical servers to our environment was also needed.Keywords: cloud, glusterfs, hadoop, juju, kvm, maas, openstack, virtualization
Procedia PDF Downloads 35141932 Data Integration with Geographic Information System Tools for Rural Environmental Monitoring
Authors: Tamas Jancso, Andrea Podor, Eva Nagyne Hajnal, Peter Udvardy, Gabor Nagy, Attila Varga, Meng Qingyan
Abstract:
The paper deals with the conditions and circumstances of integration of remotely sensed data for rural environmental monitoring purposes. The main task is to make decisions during the integration process when we have data sources with different resolution, location, spectral channels, and dimension. In order to have exact knowledge about the integration and data fusion possibilities, it is necessary to know the properties (metadata) that characterize the data. The paper explains the joining of these data sources using their attribute data through a sample project. The resulted product will be used for rural environmental analysis.Keywords: remote sensing, GIS, metadata, integration, environmental analysis
Procedia PDF Downloads 11841931 Modeling Floodplain Vegetation Response to Groundwater Variability Using ArcSWAT Hydrological Model, Moderate Resolution Imaging Spectroradiometer - Normalised Difference Vegetation Index Data, and Machine Learning
Authors: Newton Muhury, Armando A. Apan, Tek Maraseni
Abstract:
This study modelled the relationships between vegetation response and available water below the soil surface using the Terra’s Moderate Resolution Imaging Spectroradiometer (MODIS) generated Normalised Difference Vegetation Index (NDVI) and soil water content (SWC) data. The Soil & Water Assessment Tool (SWAT) interface known as ArcSWAT was used in ArcGIS for the groundwater analysis. The SWAT model was calibrated and validated in SWAT-CUP software using 10 years (2001-2010) of monthly streamflow data. The average Nash-Sutcliffe Efficiency during the calibration and validation was 0.54 and 0.51, respectively, indicating that the model performances were good. Twenty years (2001-2020) of monthly MODIS NDVI data for three different types of vegetation (forest, shrub, and grass) and soil water content for 43 sub-basins were analysed using the WEKA, machine learning tool with a selection of two supervised machine learning algorithms, i.e., support vector machine (SVM) and random forest (RF). The modelling results show that different types of vegetation response and soil water content vary in the dry and wet season. For example, the model generated high positive relationships (r=0.76, 0.73, and 0.81) between the measured and predicted NDVI values of all vegetation in the study area against the groundwater flow (GW), soil water content (SWC), and the combination of these two variables, respectively, during the dry season. However, these relationships were reduced by 36.8% (r=0.48) and 13.6% (r=0.63) against GW and SWC, respectively, in the wet season. On the other hand, the model predicted a moderate positive relationship (r=0.63) between shrub vegetation type and soil water content during the dry season, which was reduced by 31.7% (r=0.43) during the wet season. Our models also predicted that vegetation in the top location (upper part) of the sub-basin is highly responsive to GW and SWC (r=0.78, and 0.70) during the dry season. The results of this study indicate the study region is suitable for seasonal crop production in dry season. Moreover, the results predicted that the growth of vegetation in the top-point location is highly dependent on groundwater flow in both dry and wet seasons, and any instability or long-term drought can negatively affect these floodplain vegetation communities. This study has enriched our knowledge of vegetation responses to groundwater in each season, which will facilitate better floodplain vegetation management.Keywords: ArcSWAT, machine learning, floodplain vegetation, MODIS NDVI, groundwater
Procedia PDF Downloads 11641930 Spatial Distribution of Land Use in the North Canal of Beijing Subsidiary Center and Its Impact on the Water Quality
Authors: Alisa Salimova, Jiane Zuo, Christopher Homer
Abstract:
The objective of this study is to analyse the North Canal riparian zone land use with the help of remote sensing analysis in ArcGis using 30 cloudless Landsat8 open-source satellite images from May to August of 2013 and 2017. Land cover, urban construction, heat island effect, vegetation cover, and water system change were chosen as the main parameters and further analysed to evaluate its impact on the North Canal water quality. The methodology involved the following steps: firstly, 30 cloudless satellite images were collected from the Landsat TM image open-source database. The visual interpretation method was used to determine different land types in a catchment area. After primary and secondary classification, 28 land cover types in total were classified. Visual interpretation method was used with the help ArcGIS for the grassland monitoring, US Landsat TM remote sensing image processing with a resolution of 30 meters was used to analyse the vegetation cover. The water system was analysed using the visual interpretation method on the GIS software platform to decode the target area, water use and coverage. Monthly measurements of water temperature, pH, BOD, COD, ammonia nitrogen, total nitrogen and total phosphorus in 2013 and 2017 were taken from three locations of the North Canal in Tongzhou district. These parameters were used for water quality index calculation and compared to land-use changes. The results of this research were promising. The vegetation coverage of North Canal riparian zone in 2017 was higher than the vegetation coverage in 2013. The surface brightness temperature value was positively correlated with the vegetation coverage density and the distance from the surface of the water bodies. This indicates that the vegetation coverage and water system have a great effect on temperature regulation and urban heat island effect. Surface temperature in 2017 was higher than in 2013, indicating a global warming effect. The water volume in the river area has been partially reduced, indicating the potential water scarcity risk in North Canal watershed. Between 2013 and 2017, urban residential, industrial and mining storage land areas significantly increased compared to other land use types; however, water quality has significantly improved in 2017 compared to 2013. This observation indicates that the Tongzhou Water Restoration Plan showed positive results and water management of Tongzhou district had been improved.Keywords: North Canal, land use, riparian vegetation, river ecology, remote sensing
Procedia PDF Downloads 10941929 Analysis of Genomics Big Data in Cloud Computing Using Fuzzy Logic
Authors: Mohammad Vahed, Ana Sadeghitohidi, Majid Vahed, Hiroki Takahashi
Abstract:
In the genomics field, the huge amounts of data have produced by the next-generation sequencers (NGS). Data volumes are very rapidly growing, as it is postulated that more than one billion bases will be produced per year in 2020. The growth rate of produced data is much faster than Moore's law in computer technology. This makes it more difficult to deal with genomics data, such as storing data, searching information, and finding the hidden information. It is required to develop the analysis platform for genomics big data. Cloud computing newly developed enables us to deal with big data more efficiently. Hadoop is one of the frameworks distributed computing and relies upon the core of a Big Data as a Service (BDaaS). Although many services have adopted this technology, e.g. amazon, there are a few applications in the biology field. Here, we propose a new algorithm to more efficiently deal with the genomics big data, e.g. sequencing data. Our algorithm consists of two parts: First is that BDaaS is applied for handling the data more efficiently. Second is that the hybrid method of MapReduce and Fuzzy logic is applied for data processing. This step can be parallelized in implementation. Our algorithm has great potential in computational analysis of genomics big data, e.g. de novo genome assembly and sequence similarity search. We will discuss our algorithm and its feasibility.Keywords: big data, fuzzy logic, MapReduce, Hadoop, cloud computing
Procedia PDF Downloads 29941928 Analysis of Different Classification Techniques Using WEKA for Diabetic Disease
Authors: Usama Ahmed
Abstract:
Data mining is the process of analyze data which are used to predict helpful information. It is the field of research which solve various type of problem. In data mining, classification is an important technique to classify different kind of data. Diabetes is most common disease. This paper implements different classification technique using Waikato Environment for Knowledge Analysis (WEKA) on diabetes dataset and find which algorithm is suitable for working. The best classification algorithm based on diabetic data is Naïve Bayes. The accuracy of Naïve Bayes is 76.31% and take 0.06 seconds to build the model.Keywords: data mining, classification, diabetes, WEKA
Procedia PDF Downloads 14541927 Estimation of Missing Values in Aggregate Level Spatial Data
Authors: Amitha Puranik, V. S. Binu, Seena Biju
Abstract:
Missing data is a common problem in spatial analysis especially at the aggregate level. Missing can either occur in covariate or in response variable or in both in a given location. Many missing data techniques are available to estimate the missing data values but not all of these methods can be applied on spatial data since the data are autocorrelated. Hence there is a need to develop a method that estimates the missing values in both response variable and covariates in spatial data by taking account of the spatial autocorrelation. The present study aims to develop a model to estimate the missing data points at the aggregate level in spatial data by accounting for (a) Spatial autocorrelation of the response variable (b) Spatial autocorrelation of covariates and (c) Correlation between covariates and the response variable. Estimating the missing values of spatial data requires a model that explicitly account for the spatial autocorrelation. The proposed model not only accounts for spatial autocorrelation but also utilizes the correlation that exists between covariates, within covariates and between a response variable and covariates. The precise estimation of the missing data points in spatial data will result in an increased precision of the estimated effects of independent variables on the response variable in spatial regression analysis.Keywords: spatial regression, missing data estimation, spatial autocorrelation, simulation analysis
Procedia PDF Downloads 38041926 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis
Authors: C. B. Le, V. N. Pham
Abstract:
In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as a important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. However, in practice multi-source data is often heterogeneous, uncertain, and large. This issue is considered a major challenge from multi-source data. Ensemble is a versatile machine learning model in which learning techniques can work in parallel, with big data. Clustering ensemble has been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most of the traditional clustering ensemble approaches are based on single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis. The fuzzy optimized multi-objective clustering ensemble method is called FOMOCE. Firstly, a clustering ensemble mathematical model based on the structure of multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on the standard sample data set. The experimental results demonstrate the superior performance of the FOMOCE method compared to the existing clustering ensemble methods and multi-source clustering methods.Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering
Procedia PDF Downloads 18841925 Analysis and Forecasting of Bitcoin Price Using Exogenous Data
Authors: J-C. Leneveu, A. Chereau, L. Mansart, T. Mesbah, M. Wyka
Abstract:
Extracting and interpreting information from Big Data represent a stake for years to come in several sectors such as finance. Currently, numerous methods are used (such as Technical Analysis) to try to understand and to anticipate market behavior, with mixed results because it still seems impossible to exactly predict a financial trend. The increase of available data on Internet and their diversity represent a great opportunity for the financial world. Indeed, it is possible, along with these standard financial data, to focus on exogenous data to take into account more macroeconomic factors. Coupling the interpretation of these data with standard methods could allow obtaining more precise trend predictions. In this paper, in order to observe the influence of exogenous data price independent of other usual effects occurring in classical markets, behaviors of Bitcoin users are introduced in a model reconstituting Bitcoin value, which is elaborated and tested for prediction purposes.Keywords: big data, bitcoin, data mining, social network, financial trends, exogenous data, global economy, behavioral finance
Procedia PDF Downloads 35441924 Analysis of Cyber Activities of Potential Business Customers Using Neo4j Graph Databases
Authors: Suglo Tohari Luri
Abstract:
Data analysis is an important aspect of business performance. With the application of artificial intelligence within databases, selecting a suitable database engine for an application design is also very crucial for business data analysis. The application of business intelligence (BI) software into some relational databases such as Neo4j has proved highly effective in terms of customer data analysis. Yet what remains of great concern is the fact that not all business organizations have the neo4j business intelligence software applications to implement for customer data analysis. Further, those with the BI software lack personnel with the requisite expertise to use it effectively with the neo4j database. The purpose of this research is to demonstrate how the Neo4j program code alone can be applied for the analysis of e-commerce website customer visits. As the neo4j database engine is optimized for handling and managing data relationships with the capability of building high performance and scalable systems to handle connected data nodes, it will ensure that business owners who advertise their products at websites using neo4j as a database are able to determine the number of visitors so as to know which products are visited at routine intervals for the necessary decision making. It will also help in knowing the best customer segments in relation to specific goods so as to place more emphasis on their advertisement on the said websites.Keywords: data, engine, intelligence, customer, neo4j, database
Procedia PDF Downloads 19341923 Estimating the Life-Distribution Parameters of Weibull-Life PV Systems Utilizing Non-Parametric Analysis
Authors: Saleem Z. Ramadan
Abstract:
In this paper, a model is proposed to determine the life distribution parameters of the useful life region for the PV system utilizing a combination of non-parametric and linear regression analysis for the failure data of these systems. Results showed that this method is dependable for analyzing failure time data for such reliable systems when the data is scarce.Keywords: masking, bathtub model, reliability, non-parametric analysis, useful life
Procedia PDF Downloads 56041922 Study on Changes of Land Use impacting the Process of Urbanization, by Using Landsat Data in African Regions: A Case Study in Kigali, Rwanda
Authors: Delphine Mukaneza, Lin Qiao, Wang Pengxin, Li Yan, Chen Yingyi
Abstract:
Human activities on land use make the land-cover gradually change or transit. In this study, we examined the use of Landsat TM data to detect the land use change of Kigali between 1987 and 2009 using remote sensing techniques and analysis of data using ENVI and ArcGIS, a GIS software. Six different categories of land use were distinguished: bare soil, built up land, wetland, water, vegetation, and others. With remote sensing techniques, we analyzed land use data in 1987, 1999 and 2009, changed areas were found and a dynamic situation of land use in Kigali city was found during the 22 years studied. According to relevant Landsat data, the research focused on land use change in accordance with the role of remote sensing in the process of urbanization. The result of the work has shown the rapid increase of built up land between 1987 and 1999 and a big decrease of vegetation caused by the rebuild of the city after the 1994 genocide, while in the period of 1999 to 2009 there was a reduction in built up land and vegetation, after the authority of Kigali city established, a Master Plan where all constructions which were not in the range of the master Plan were destroyed. Rwanda's capital, Kigali City, through the expansion of the urban area, it is increasing the internal employment rate and attracts business investors and the service sector to improve their economy, which will increase the population growth and provide a better life. The overall planning of the city of Kigali considers the environment, land use, infrastructure, cultural and socio-economic factors, the economic development and population forecast, urban development, and constraints specification. To achieve the above purpose, the Government has set for the overall planning of city Kigali, different stages of the detailed description of the design, strategy and action plan that would guide Kigali planners and members of the public in the future to have more detailed regional plans and practical measures. Thus, land use change is significantly the performance of Kigali active human area, which plays an important role for the country to take certain decisions. Another area to take into account is the natural situation of Kigali city. Agriculture in the region does not occupy a dominant position, and with the population growth and socio-economic development, the construction area will gradually rise and speed up the process of urbanization. Thus, as a developing country, Rwanda's population continues to grow and there is low rate of utilization of land, where urbanization remains low. As mentioned earlier, the 1994 genocide massacres, population growth and urbanization processes, have been the factors driving the dramatic changes in land use. The focus on further research would be on analysis of Rwanda’s natural resources, social and economic factors that could be, the driving force of land use change.Keywords: land use change, urbanization, Kigali City, Landsat
Procedia PDF Downloads 30541921 The Extent of Big Data Analysis by the External Auditors
Authors: Iyad Ismail, Fathilatul Abdul Hamid
Abstract:
This research was mainly investigated to recognize the extent of big data analysis by external auditors. This paper adopts grounded theory as a framework for conducting a series of semi-structured interviews with eighteen external auditors. The research findings comprised the availability extent of big data and big data analysis usage by the external auditors in Palestine, Gaza Strip. Considering the study's outcomes leads to a series of auditing procedures in order to improve the external auditing techniques, which leads to high-quality audit process. Also, this research is crucial for auditing firms by giving an insight into the mechanisms of auditing firms to identify the most important strategies that help in achieving competitive audit quality. These results are aims to instruct the auditing academic and professional institutions in developing techniques for external auditors in order to the big data analysis. This paper provides appropriate information for the decision-making process and a source of future information which affects technological auditing.Keywords: big data analysis, external auditors, audit reliance, internal audit function
Procedia PDF Downloads 6841920 Enhance the Power of Sentiment Analysis
Authors: Yu Zhang, Pedro Desouza
Abstract:
Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers with three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduced a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modelling and testing work was done in R and Greenplum in-database analytic tools.Keywords: sentiment analysis, social media, Twitter, Amazon, data mining, machine learning, text mining
Procedia PDF Downloads 34941919 Spatio-Temporal Risk Analysis of Cancer to Assessed Environmental Exposures in Coimbatore, India
Authors: Janani Selvaraj, M. Prashanthi Devi, P. B. Harathi
Abstract:
Epidemiologic studies conducted over several decades have provided evidence to suggest that long-term exposure to elevated ambient levels of particulate air pollution is associated with increased mortality. Air quality risk management is significant in developing countries and it highlights the need to understand the role of ecologic covariates in the association between air pollution and mortality. Several new methods show promise in exploring the geographical distribution of disease and the identification of high risk areas using epidemiological maps. However, the addition of the temporal attribute would further give us an in depth idea of the disease burden with respect to forecasting measures. In recent years, new methods developed in the reanalysis were useful for exploring the spatial structure of the data and the impact of spatial autocorrelation on estimates of risk associated with exposure to air pollution. Based on this, our present study aims to explore the spatial and temporal distribution of the lung cancer cases in the Coimbatore district of Tamil Nadu in relation to air pollution risk areas. A spatio temporal moving average method was computed using the CrimeStat software and visualized in ArcGIS 10.1 to document the spatio temporal movement of the disease in the study region. The random walk analysis performed showed the progress of the peak cancer incidences in the intersection regions of the Coimbatore North and South taluks that include major commercial and residential regions like Gandhipuram, Peelamedu, Ganapathy, etc. Our study shows evidence that daily exposure to high air pollutant concentration zones may lead to the risk of lung cancer. The observations from the present study will be useful in delineating high risk zones of environmental exposure that contribute to the increase of cancer among daily commuters. Through our study we suggest that spatially resolved exposure models in relevant time frames will produce higher risks zones rather than solely on statistical theory about the impact of measurement error and the empirical findings.Keywords: air pollution, cancer, spatio-temporal analysis, India
Procedia PDF Downloads 51241918 Analysis of Cooperative Learning Behavior Based on the Data of Students' Movement
Authors: Wang Lin, Li Zhiqiang
Abstract:
The purpose of this paper is to analyze the cooperative learning behavior pattern based on the data of students' movement. The study firstly reviewed the cooperative learning theory and its research status, and briefly introduced the k-means clustering algorithm. Then, it used clustering algorithm and mathematical statistics theory to analyze the activity rhythm of individual student and groups in different functional areas, according to the movement data provided by 10 first-year graduate students. It also focused on the analysis of students' behavior in the learning area and explored the law of cooperative learning behavior. The research result showed that the cooperative learning behavior analysis method based on movement data proposed in this paper is feasible. From the results of data analysis, the characteristics of behavior of students and their cooperative learning behavior patterns could be found.Keywords: behavior pattern, cooperative learning, data analyze, k-means clustering algorithm
Procedia PDF Downloads 18641917 Big Data Analysis with Rhipe
Authors: Byung Ho Jung, Ji Eun Shin, Dong Hoon Lim
Abstract:
Rhipe that integrates R and Hadoop environment made it possible to process and analyze massive amounts of data using a distributed processing environment. In this paper, we implemented multiple regression analysis using Rhipe with various data sizes of actual data. Experimental results for comparing the performance of our Rhipe with stats and biglm packages available on bigmemory, showed that our Rhipe was more fast than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases. We also compared the computing speeds of pseudo-distributed and fully-distributed modes for configuring Hadoop cluster. The results showed that fully-distributed mode was faster than pseudo-distributed mode, and computing speeds of fully-distributed mode were faster as the number of data nodes increases.Keywords: big data, Hadoop, Parallel regression analysis, R, Rhipe
Procedia PDF Downloads 49541916 What the Future Holds for Social Media Data Analysis
Authors: P. Wlodarczak, J. Soar, M. Ally
Abstract:
The dramatic rise in the use of Social Media (SM) platforms such as Facebook and Twitter provide access to an unprecedented amount of user data. Users may post reviews on products and services they bought, write about their interests, share ideas or give their opinions and views on political issues. There is a growing interest in the analysis of SM data from organisations for detecting new trends, obtaining user opinions on their products and services or finding out about their online reputations. A recent research trend in SM analysis is making predictions based on sentiment analysis of SM. Often indicators of historic SM data are represented as time series and correlated with a variety of real world phenomena like the outcome of elections, the development of financial indicators, box office revenue and disease outbreaks. This paper examines the current state of research in the area of SM mining and predictive analysis and gives an overview of the analysis methods using opinion mining and machine learning techniques.Keywords: social media, text mining, knowledge discovery, predictive analysis, machine learning
Procedia PDF Downloads 422