Search results for: VIIRS data
25253 Prediction of Embankment Fires at Railway Infrastructure Using Machine Learning, Geospatial Data and VIIRS Remote Sensing Imagery
Authors: Jan-Peter Mund, Christian Kind
Abstract:
In view of the ongoing climate change and global warming, fires along railways in Germany are occurring more frequently, with sometimes massive consequences for railway operations and affected railroad infrastructure. In the absence of systematic studies within the infrastructure network of German Rail, little is known about the causes of such embankment fires. Since a further increase in these hazards is to be expected in the near future, there is a need for a sound knowledge of triggers and drivers for embankment fires as well as methodical knowledge of prediction tools. Two predictable future trends speak for the increasing relevance of the topic: through the intensification of the use of rail for passenger and freight transport (e.g..: doubling of annual passenger numbers by 2030, compared to 2019), there will be more rail traffic and also more maintenance and construction work on the railways. This research project approach uses satellite data to identify historical embankment fires along rail network infrastructure. The team links data from these fires with infrastructure and weather data and trains a machine-learning model with the aim of predicting fire hazards on sections of the track. Companies reflect on the results and use them on a pilot basis in precautionary measures.Keywords: embankment fires, railway maintenance, machine learning, remote sensing, VIIRS data
Procedia PDF Downloads 8925252 Urbanization and Income Inequality in Thailand
Authors: Acumsiri Tantikarnpanit
Abstract:
This paper aims to examine the relationship between urbanization and income inequality in Thailand during the period 2002–2020. Using a panel of data for 76 provinces collected from Thailand’s National Statistical Office (Labor Force Survey: LFS), as well as geospatial data from the U.S. Air Force Defense Meteorological Satellite Program (DMSP) and the Visible Infrared Imaging Radiometer Suite Day/Night band (VIIRS-DNB) satellite for nineteen selected years. This paper employs two different definitions to identify urban areas: 1) Urban areas defined by Thailand's National Statistical Office (Labor Force Survey: LFS), and 2) Urban areas estimated using nighttime light data from the DMSP and VIIRS-DNB satellite. The second method includes two sub-categories: 2.1) Determining urban areas by calculating nighttime light density with a population density of 300 people per square kilometer, and 2.2) Calculating urban areas based on nighttime light density corresponding to a population density of 1,500 people per square kilometer. The empirical analysis based on Ordinary Least Squares (OLS), fixed effects, and random effects models reveals a consistent U-shaped relationship between income inequality and urbanization. The findings from the econometric analysis demonstrate that urbanization or population density has a significant and negative impact on income inequality. Moreover, the square of urbanization shows a statistically significant positive impact on income inequality. Additionally, there is a negative association between logarithmically transformed income and income inequality. This paper also proposes the inclusion of satellite imagery, geospatial data, and spatial econometric techniques in future studies to conduct quantitative analysis of spatial relationships.Keywords: income inequality, nighttime light, population density, Thailand, urbanization
Procedia PDF Downloads 7725251 Artificial Neural Network Approach for Vessel Detection Using Visible Infrared Imaging Radiometer Suite Day/Night Band
Authors: Takashi Yamaguchi, Ichio Asanuma, Jong G. Park, Kenneth J. Mackin, John Mittleman
Abstract:
In this paper, vessel detection using the artificial neural network is proposed in order to automatically construct the vessel detection model from the satellite imagery of day/night band (DNB) in visible infrared in the products of Imaging Radiometer Suite (VIIRS) on Suomi National Polar-orbiting Partnership (Suomi-NPP).The goal of our research is the establishment of vessel detection method using the satellite imagery of DNB in order to monitor the change of vessel activity over the wide region. The temporal vessel monitoring is very important to detect the events and understand the circumstances within the maritime environment. For the vessel locating and detection techniques, Automatic Identification System (AIS) and remote sensing using Synthetic aperture radar (SAR) imagery have been researched. However, each data has some lack of information due to uncertain operation or limitation of continuous observation. Therefore, the fusion of effective data and methods is important to monitor the maritime environment for the future. DNB is one of the effective data to detect the small vessels such as fishery ships that is difficult to observe in AIS. DNB is the satellite sensor data of VIIRS on Suomi-NPP. In contrast to SAR images, DNB images are moderate resolution and gave influence to the cloud but can observe the same regions in each day. DNB sensor can observe the lights produced from various artifact such as vehicles and buildings in the night and can detect the small vessels from the fishing light on the open water. However, the modeling of vessel detection using DNB is very difficult since complex atmosphere and lunar condition should be considered due to the strong influence of lunar reflection from cloud on DNB. Therefore, artificial neural network was applied to learn the vessel detection model. For the feature of vessel detection, Brightness Temperature at the 3.7 μm (BT3.7) was additionally used because BT3.7 can be used for the parameter of atmospheric conditions.Keywords: artificial neural network, day/night band, remote sensing, Suomi National Polar-orbiting Partnership, vessel detection, Visible Infrared Imaging Radiometer Suite
Procedia PDF Downloads 23625250 Research Analysis of Urban Area Expansion Based on Remote Sensing
Authors: Sheheryar Khan, Weidong Li, Fanqian Meng
Abstract:
The Urban Heat Island (UHI) effect is one of the foremost problems out of other ecological and socioeconomic issues in urbanization. Due to this phenomenon that human-made urban areas have replaced the rural landscape with the surface that increases thermal conductivity and urban warmth; as a result, the temperature in the city is higher than in the surrounding rural areas. To affect the evidence of this phenomenon in the Zhengzhou city area, an observation of the temperature variations in the urban area is done through a scientific method that has been followed. Landsat 8 satellite images were taken from 2013 to 2015 to calculate the effect of Urban Heat Island (UHI) along with the NPP-VRRIS night-time remote sensing data to analyze the result for a better understanding of the center of the built-up area. To further support the evidence, the correlation between land surface temperatures and the normalized difference vegetation index (NDVI) was calculated using the Red band 4 and Near-infrared band 5 of the Landsat 8 data. Mono-window algorithm was applied to retrieve the land surface temperature (LST) distribution from the Landsat 8 data using Band 10 and 11 accordingly to convert the top-of-atmosphere radiance (TOA) and to convert the satellite brightness temperature. Along with Landsat 8 data, NPP-VIIRS night-light data is preprocessed to get the research area data. The analysis between Landsat 8 data and NPP night-light data was taken to compare the output center of the Built-up area of Zhengzhou city.Keywords: built-up area, land surface temperature, mono-window algorithm, NDVI, remote sensing, threshold method, Zhengzhou
Procedia PDF Downloads 13925249 Multi-Temporal Mapping of Built-up Areas Using Daytime and Nighttime Satellite Images Based on Google Earth Engine Platform
Authors: S. Hutasavi, D. Chen
Abstract:
The built-up area is a significant proxy to measure regional economic growth and reflects the Gross Provincial Product (GPP). However, an up-to-date and reliable database of built-up areas is not always available, especially in developing countries. The cloud-based geospatial analysis platform such as Google Earth Engine (GEE) provides an opportunity with accessibility and computational power for those countries to generate the built-up data. Therefore, this study aims to extract the built-up areas in Eastern Economic Corridor (EEC), Thailand using day and nighttime satellite imagery based on GEE facilities. The normalized indices were generated from Landsat 8 surface reflectance dataset, including Normalized Difference Built-up Index (NDBI), Built-up Index (BUI), and Modified Built-up Index (MBUI). These indices were applied to identify built-up areas in EEC. The result shows that MBUI performs better than BUI and NDBI, with the highest accuracy of 0.85 and Kappa of 0.82. Moreover, the overall accuracy of classification was improved from 79% to 90%, and error of total built-up area was decreased from 29% to 0.7%, after night-time light data from the Visible and Infrared Imaging Suite (VIIRS) Day Night Band (DNB). The results suggest that MBUI with night-time light imagery is appropriate for built-up area extraction and be utilize for further study of socioeconomic impacts of regional development policy over the EEC region.Keywords: built-up area extraction, google earth engine, adaptive thresholding method, rapid mapping
Procedia PDF Downloads 12625248 Detection of Temporal Change of Fishery and Island Activities by DNB and SAR on the South China Sea
Authors: I. Asanuma, T. Yamaguchi, J. Park, K. J. Mackin
Abstract:
Fishery lights on the surface could be detected by the Day and Night Band (DNB) of the Visible Infrared Imaging Radiometer Suite (VIIRS) on the Suomi National Polar-orbiting Partnership (Suomi-NPP). The DNB covers the spectral range of 500 to 900 nm and realized a higher sensitivity. The DNB has a difficulty of identification of fishing lights from lunar lights reflected by clouds, which affects observations for the half of the month. Fishery lights and lights of the surface are identified from lunar lights reflected by clouds by a method using the DNB and the infrared band, where the detection limits are defined as a function of the brightness temperature with a difference from the maximum temperature for each level of DNB radiance and with the contrast of DNB radiance against the background radiance. Fishery boats or structures on islands could be detected by the Synthetic Aperture Radar (SAR) on the polar orbit satellites using the reflected microwave by the surface reflecting targets. The SAR has a difficulty of tradeoff between spatial resolution and coverage while detecting the small targets like fishery boats. A distribution of fishery boats and island activities were detected by the scan-SAR narrow mode of Radarsat-2, which covers 300 km by 300 km with various combinations of polarizations. The fishing boats were detected as a single pixel of highly scattering targets with the scan-SAR narrow mode of which spatial resolution is 30 m. As the look angle dependent scattering signals exhibits the significant differences, the standard deviations of scattered signals for each look angles were taken into account as a threshold to identify the signal from fishing boats and structures on the island from background noise. It was difficult to validate the detected targets by DNB with SAR data because of time lag of observations for 6 hours between midnight by DNB and morning or evening by SAR. The temporal changes of island activities were detected as a change of mean intensity of DNB for circular area for a certain scale of activities. The increase of DNB mean intensity was corresponding to the beginning of dredging and the change of intensity indicated the ending of reclamation and following constructions of facilities.Keywords: day night band, SAR, fishery, South China Sea
Procedia PDF Downloads 23525247 Data Transformations in Data Envelopment Analysis
Authors: Mansour Mohammadpour
Abstract:
Data transformation refers to the modification of any point in a data set by a mathematical function. When applying transformations, the measurement scale of the data is modified. Data transformations are commonly employed to turn data into the appropriate form, which can serve various functions in the quantitative analysis of the data. This study addresses the investigation of the use of data transformations in Data Envelopment Analysis (DEA). Although data transformations are important options for analysis, they do fundamentally alter the nature of the variable, making the interpretation of the results somewhat more complex.Keywords: data transformation, data envelopment analysis, undesirable data, negative data
Procedia PDF Downloads 2425246 Processing Big Data: An Approach Using Feature Selection
Authors: Nikat Parveen, M. Ananthi
Abstract:
Big data is one of the emerging technology, which collects the data from various sensors and those data will be used in many fields. Data retrieval is one of the major issue where there is a need to extract the exact data as per the need. In this paper, large amount of data set is processed by using the feature selection. Feature selection helps to choose the data which are actually needed to process and execute the task. The key value is the one which helps to point out exact data available in the storage space. Here the available data is streamed and R-Center is proposed to achieve this task.Keywords: big data, key value, feature selection, retrieval, performance
Procedia PDF Downloads 34225245 Applications of Big Data in Education
Authors: Faisal Kalota
Abstract:
Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitation before jumping on the Big Data bandwagon.Keywords: big data, learning analytics, analytics, big data in education, Hadoop
Procedia PDF Downloads 42725244 Analysis of Big Data
Authors: Sandeep Sharma, Sarabjit Singh
Abstract:
As per the user demand and growth trends of large free data the storage solutions are now becoming more challenge-able to protect, store and to retrieve data. The days are not so far when the storage companies and organizations are start saying 'no' to store our valuable data or they will start charging a huge amount for its storage and protection. On the other hand as per the environmental conditions it becomes challenge-able to maintain and establish new data warehouses and data centers to protect global warming threats. A challenge of small data is over now, the challenges are big that how to manage the exponential growth of data. In this paper we have analyzed the growth trend of big data and its future implications. We have also focused on the impact of the unstructured data on various concerns and we have also suggested some possible remedies to streamline big data.Keywords: big data, unstructured data, volume, variety, velocity
Procedia PDF Downloads 54925243 Research of Data Cleaning Methods Based on Dependency Rules
Authors: Yang Bao, Shi Wei Deng, WangQun Lin
Abstract:
This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSQL), and gives 6 data cleaning methods based on these algorithms.Keywords: data cleaning, dependency rules, violation data discovery, data repair
Procedia PDF Downloads 56425242 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity
Authors: Hoda A. Abdel Hafez
Abstract:
Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.Keywords: mining big data, big data, machine learning, telecommunication
Procedia PDF Downloads 41025241 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications a Software Testing Approach
Authors: Theertha Chandroth
Abstract:
This paper presents a comparative study on how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its unique syntax and structure. The objective is to explore various techniques and methodologies for validating comparison and integration between JSON data to XML and vice versa. By understanding the process of checking JSON data against XML data, testers, developers and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.Keywords: XML, JSON, data comparison, integration testing, Python, SQL
Procedia PDF Downloads 14025240 Using Machine Learning Techniques to Extract Useful Information from Dark Data
Authors: Nigar Hussain
Abstract:
It is a subset of big data. Dark data means those data in which we fail to use for future decisions. There are many issues in existing work, but some need powerful tools for utilizing dark data. It needs sufficient techniques to deal with dark data. That enables users to exploit their excellence, adaptability, speed, less time utilization, execution, and accessibility. Another issue is the way to utilize dark data to extract helpful information to settle on better choices. In this paper, we proposed upgrade strategies to remove the dark side from dark data. Using a supervised model and machine learning techniques, we utilized dark data and achieved an F1 score of 89.48%.Keywords: big data, dark data, machine learning, heatmap, random forest
Procedia PDF Downloads 3125239 Multi-Source Data Fusion for Urban Comprehensive Management
Authors: Bolin Hua
Abstract:
In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflects different aspects of people, events and activities. Data generated from various systems are different in form and data source are different because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need fusion together. Data from different sources using different ways of collection raised several issues which need to be resolved. Problem of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of Multi-source data fusion is to form a uniform central database, which includes people data, location data, object data, and institution data, business data and space data. We need to use meta data to be referred to and read when application needs to access, manipulate and display the data. A uniform meta data management ensures effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data
Procedia PDF Downloads 39425238 Reviewing Privacy Preserving Distributed Data Mining
Authors: Sajjad Baghernezhad, Saeideh Baghernezhad
Abstract:
Nowadays considering human involved in increasing data development some methods such as data mining to extract science are unavoidable. One of the discussions of data mining is inherent distribution of the data usually the bases creating or receiving such data belong to corporate or non-corporate persons and do not give their information freely to others. Yet there is no guarantee to enable someone to mine special data without entering in the owner’s privacy. Sending data and then gathering them by each vertical or horizontal software depends on the type of their preserving type and also executed to improve data privacy. In this study it was attempted to compare comprehensively preserving data methods; also general methods such as random data, coding and strong and weak points of each one are examined.Keywords: data mining, distributed data mining, privacy protection, privacy preserving
Procedia PDF Downloads 52625237 The Right to Data Portability and Its Influence on the Development of Digital Services
Authors: Roman Bieda
Abstract:
The General Data Protection Regulation (GDPR) will come into force on 25 May 2018 which will create a new legal framework for the protection of personal data in the European Union. Article 20 of GDPR introduces a right to data portability. This right allows for data subjects to receive the personal data which they have provided to a data controller, in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating transferring personal data between IT environments (e.g.: applications), will also facilitate changing the provider of services (e.g. changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.Keywords: data portability, digital market, GDPR, personal data
Procedia PDF Downloads 47625236 Recent Advances in Data Warehouse
Authors: Fahad Hanash Alzahrani
Abstract:
This paper describes some recent advances in a quickly developing area of data storing and processing based on Data Warehouses and Data Mining techniques, which are associated with software, hardware, data mining algorithms and visualisation techniques having common features for any specific problems and tasks of their implementation.Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing
Procedia PDF Downloads 40425235 How to Use Big Data in Logistics Issues
Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy
Abstract:
Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does Data. Utilizing massive data sets enable companies to get competitive advantages over their adversaries. Out of many area of Big Data usage, logistics has significance role in both commercial sector and military. This paper lays out what big data is and how it is used in both military and commercial logistics.Keywords: big data, logistics, operational efficiency, risk management
Procedia PDF Downloads 64225234 Implementation of an IoT Sensor Data Collection and Analysis Library
Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee
Abstract:
Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, more various information can be extracted. In addition, development and dissemination of boards such as Arduino and Raspberry Pie have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, there are many difficulties in using the board to collect data, and there are many difficulties in using it when the user is not a computer programmer, or when using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data
Procedia PDF Downloads 37825233 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles
Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis
Abstract:
Organizations, including governments, generate (big) data that are high in volume, velocity, veracity, and come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, provision of insights to users, and for good governance. Government (Big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services like data storage, hosting services to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationship in the government (big) data ecosystem. We also discuss our research findings. We did not find too much published research articles about the government (big) data ecosystem, including its definition and classification of actors and their roles. Therefore, we lent ideas for the government (big) data ecosystem from numerous areas that include scientific research data, humanitarian data, open government data, industry data, in the literature.Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review
Procedia PDF Downloads 16325232 Government Big Data Ecosystem: A Systematic Literature Review
Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis
Abstract:
Data that is high in volume, velocity, veracity and comes from a variety of sources is usually generated in all sectors including the government sector. Globally public administrations are pursuing (big) data as new technology and trying to adopt a data-centric architecture for hosting and sharing data. Properly executed, big data and data analytics in the government (big) data ecosystem can be led to data-driven government and have a direct impact on the way policymakers work and citizens interact with governments. In this research paper, we conduct a systematic literature review. The main aims of this paper are to highlight essential aspects of the government (big) data ecosystem and to explore the most critical socio-technical factors that contribute to the successful implementation of government (big) data ecosystem. The essential aspects of government (big) data ecosystem include definition, data types, data lifecycle models, and actors and their roles. We also discuss the potential impact of (big) data in public administration and gaps in the government data ecosystems literature. As this is a new topic, we did not find specific articles on government (big) data ecosystem and therefore focused our research on various relevant areas like humanitarian data, open government data, scientific research data, industry data, etc.Keywords: applications of big data, big data, big data types. big data ecosystem, critical success factors, data-driven government, egovernment, gaps in data ecosystems, government (big) data, literature review, public administration, systematic review
Procedia PDF Downloads 23125231 A Machine Learning Decision Support Framework for Industrial Engineering Purposes
Authors: Anli Du Preez, James Bekker
Abstract:
Data is currently one of the most critical and influential emerging technologies. However, the true potential of data is yet to be exploited since, currently, about 1% of generated data are ever actually analyzed for value creation. There is a data gap where data is not explored due to the lack of data analytics infrastructure and the required data analytics skills. This study developed a decision support framework for data analytics by following Jabareen’s framework development methodology. The study focused on machine learning algorithms, which is a subset of data analytics. The developed framework is designed to assist data analysts with little experience, in choosing the appropriate machine learning algorithm given the purpose of their application.Keywords: Data analytics, Industrial engineering, Machine learning, Value creation
Procedia PDF Downloads 16825230 Providing Security to Private Cloud Using Advanced Encryption Standard Algorithm
Authors: Annapureddy Srikant Reddy, Atthanti Mahendra, Samala Chinni Krishna, N. Neelima
Abstract:
In our present world, we are generating a lot of data and we, need a specific device to store all these data. Generally, we store data in pen drives, hard drives, etc. Sometimes we may loss the data due to the corruption of devices. To overcome all these issues, we implemented a cloud space for storing the data, and it provides more security to the data. We can access the data with just using the internet from anywhere in the world. We implemented all these with the java using Net beans IDE. Once user uploads the data, he does not have any rights to change the data. Users uploaded files are stored in the cloud with the file name as system time and the directory will be created with some random words. Cloud accepts the data only if the size of the file is less than 2MB.Keywords: cloud space, AES, FTP, NetBeans IDE
Procedia PDF Downloads 20625229 Business Intelligence for Profiling of Telecommunication Customer
Authors: Rokhmatul Insani, Hira Laksmiwati Soemitro
Abstract:
Business Intelligence is a methodology that exploits the data to produce information and knowledge systematically, business intelligence can support the decision-making process. Some methods in business intelligence are data warehouse and data mining. A data warehouse can store historical data from transactional data. For data modelling in data warehouse, we apply dimensional modelling by Kimball. While data mining is used to extracting patterns from the data and get insight from the data. Data mining has many techniques, one of which is segmentation. For profiling of telecommunication customer, we use customer segmentation according to customer’s usage of services, customer invoice and customer payment. Customers can be grouped according to their characteristics and can be identified the profitable customers. We apply K-Means Clustering Algorithm for segmentation. The input variable for that algorithm we use RFM (Recency, Frequency and Monetary) model. All process in data mining, we use tools IBM SPSS modeller.Keywords: business intelligence, customer segmentation, data warehouse, data mining
Procedia PDF Downloads 48525228 Imputation Technique for Feature Selection in Microarray Data Set
Authors: Younies Saeed Hassan Mahmoud, Mai Mabrouk, Elsayed Sallam
Abstract:
Analysing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.Keywords: DNA microarray, feature selection, missing data, bioinformatics
Procedia PDF Downloads 57425227 PDDA: Priority-Based, Dynamic Data Aggregation Approach for Sensor-Based Big Data Framework
Authors: Lutful Karim, Mohammed S. Al-kahtani
Abstract:
Sensors are being used in various applications such as agriculture, health monitoring, air and water pollution monitoring, traffic monitoring and control and hence, play the vital role in the growth of big data. However, sensors collect redundant data. Thus, aggregating and filtering sensors data are significantly important to design an efficient big data framework. Current researches do not focus on aggregating and filtering data at multiple layers of sensor-based big data framework. Thus, this paper introduces (i) three layers data aggregation and framework for big data and (ii) a priority-based, dynamic data aggregation scheme (PDDA) for the lowest layer at sensors. Simulation results show that the PDDA outperforms existing tree and cluster-based data aggregation scheme in terms of overall network energy consumptions and end-to-end data transmission delay.Keywords: big data, clustering, tree topology, data aggregation, sensor networks
Procedia PDF Downloads 34725226 Control the Flow of Big Data
Authors: Shizra Waris, Saleem Akhtar
Abstract:
Big data is a research area receiving attention from academia and IT communities. In the digital world, the amounts of data produced and stored have within a short period of time. Consequently this fast increasing rate of data has created many challenges. In this paper, we use functionalism and structuralism paradigms to analyze the genesis of big data applications and its current trends. This paper presents a complete discussion on state-of-the-art big data technologies based on group and stream data processing. Moreover, strengths and weaknesses of these technologies are analyzed. This study also covers big data analytics techniques, processing methods, some reported case studies from different vendor, several open research challenges and the chances brought about by big data. The similarities and differences of these techniques and technologies based on important limitations are also investigated. Emerging technologies are suggested as a solution for big data problems.Keywords: computer, it community, industry, big data
Procedia PDF Downloads 19425225 High Performance Computing and Big Data Analytics
Authors: Branci Sarra, Branci Saadia
Abstract:
Because of the multiplied data growth, many computer science tools have been developed to process and analyze these Big Data. High-performance computing architectures have been designed to meet the treatment needs of Big Data (view transaction processing standpoint, strategic, and tactical analytics). The purpose of this article is to provide a historical and global perspective on the recent trend of high-performance computing architectures especially what has a relation with Analytics and Data Mining.Keywords: high performance computing, HPC, big data, data analysis
Procedia PDF Downloads 52025224 Predicting OpenStreetMap Coverage by Means of Remote Sensing: The Case of Haiti
Authors: Ran Goldblatt, Nicholas Jones, Jennifer Mannix, Brad Bottoms
Abstract:
Accurate, complete, and up-to-date geospatial information is the foundation of successful disaster management. When the 2010 Haiti Earthquake struck, accurate and timely information on the distribution of critical infrastructure was essential for the disaster response community for effective search and rescue operations. Existing geospatial datasets such as Google Maps did not have comprehensive coverage of these features. In the days following the earthquake, many organizations released high-resolution satellite imagery, catalyzing a worldwide effort to map Haiti and support the recovery operations. Of these organizations, OpenStreetMap (OSM), a collaborative project to create a free editable map of the world, used the imagery to support volunteers to digitize roads, buildings, and other features, creating the most detailed map of Haiti in existence in just a few weeks. However, large portions of the island are still not fully covered by OSM. There is an increasing need for a tool to automatically identify which areas in Haiti, as well as in other countries vulnerable to disasters, that are not fully mapped. The objective of this project is to leverage different types of remote sensing measurements, together with machine learning approaches, in order to identify geographical areas where OSM coverage of building footprints is incomplete. Several remote sensing measures and derived products were assessed as potential predictors of OSM building footprints coverage, including: intensity of light emitted at night (based on VIIRS measurements), spectral indices derived from Sentinel-2 satellite (normalized difference vegetation index (NDVI), normalized difference built-up index (NDBI), soil-adjusted vegetation index (SAVI), urban index (UI)), surface texture (based on Sentinel-1 SAR measurements)), elevation and slope. Additional remote sensing derived products, such as Hansen Global Forest Change, DLR`s Global Urban Footprint (GUF), and World Settlement Footprint (WSF), were also evaluated as predictors, as well as OSM street and road network (including junctions). Using a supervised classification with a random forest classifier resulted in the prediction of 89% of the variation of OSM building footprint area in a given cell. These predictions allowed for the identification of cells that are predicted to be covered but are actually not mapped yet. With these results, this methodology could be adapted to any location to assist with preparing for future disastrous events and assure that essential geospatial information is available to support the response and recovery efforts during and following major disasters.Keywords: disaster management, Haiti, machine learning, OpenStreetMap, remote sensing
Procedia PDF Downloads 125