Search results for: Census data

7354 Comparison of Automated Zone Design Census Output Areas with Existing Output Areas in South Africa

Authors: T. Mokhele, O. Mutanga, F. Ahmed

Abstract:

South Africa is one of the few countries that have stopped using the same Enumeration Areas (EAs) for census enumeration and dissemination. The advantage of this change is that confidentiality issues can be addressed for census dissemination, since the geographic unit for collection is designed mainly to ensure that it is covered by one enumerator. The objective of this paper was to evaluate the performance of automated zone design output areas against non-zone design developed geographies using the 2001 census data, and 2011 census to some extent, as the main input. The Automated Zone-design Tool (AZTool) census output areas were compared with the Small Area Layers (SALs) and SubPlaces based on confidentiality limit, population distribution, and degree of homogeneity, as well as shape compactness. Further, SPSS was employed for validation of the AZTool output results. The results showed that AZTool-developed output areas outperform the existing official SALs and SubPlaces with regard to minimum population threshold, population distribution and, to some extent, homogeneity. Therefore, it was concluded that the AZTool program provides a new alternative for the creation of optimised census output areas for dissemination of population census data in South Africa.

Keywords: AZTool, enumeration areas, small area layers, South Africa.

Downloads 699

7353 Identifying Neighborhoods at Potential Risk of Food Insecurity in Rural British Columbia

Authors: Amirmohsen Behjat, Aleck Ostry, Christina Miewald, Bernie Pauly

Abstract:

Substantial research has indicated that socioeconomic and demographic characteristics of neighborhoods are strong determinants of food security. The aim of this study was to develop a Food Insecurity Neighborhood Index (FINI) based on the associated socioeconomic and demographic variables to identify the areas at potential risk of food insecurity in rural British Columbia (BC). The Principal Component Analysis (PCA) technique was used to calculate the FINI for each rural Dissemination Area (DA) using the food security determinant variables from Canadian Census data. Using ArcGIS, the neighborhoods with the top quartile FINI values were classified as food insecure. The results of this study indicated that the most food insecure neighborhood, with the highest FINI value of 99.1, was in the Bulkley-Nechako (central BC) area, whereas the lowest FINI, with a value of 2.97, was for a rural neighborhood in the Cowichan Valley area. In total, 98,049 (19%) of the rural population of British Columbia reside in highly food insecure areas. Moreover, the distribution of food insecure neighborhoods was found to be strongly dependent on the degree of rurality in BC. In conclusion, the cluster of food insecure neighborhoods was more pronounced in the Central Coast, Mount Waddington, Peace River, Kootenay Boundary, and Alberni-Clayoquot Regional Districts.
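As an illustration of the kind of PCA-based index and top-quartile classification described in this abstract, here is a minimal sketch. It is not the authors' FINI formula: the function names, the choice to score on the first principal component only, the 0-100 rescaling, the sign convention, and the toy input values are all assumptions made for the example.

```python
import numpy as np

def pca_index(X):
    """Score each row of X (areas x variables) on the first principal
    component, rescaled to 0-100. Hypothetical helper, not the FINI itself."""
    X = np.asarray(X, dtype=float)
    # Standardise each variable so that no single unit dominates.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Covariance of the standardised variables (proportional to the
    # correlation matrix), then its eigen-decomposition.
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    first = eigvecs[:, np.argmax(eigvals)]
    # Assumed sign convention: higher input values give higher scores.
    if first.sum() < 0:
        first = -first
    scores = Z @ first
    return 100 * (scores - scores.min()) / (scores.max() - scores.min())

def top_quartile(index):
    """Flag areas whose index is at or above the 75th percentile."""
    return index >= np.percentile(index, 75)

# Made-up toy data: four areas, two determinant variables.
X = [[1, 2], [2, 4], [3, 6], [4, 8]]
index = pca_index(X)
flags = top_quartile(index)
```

In a real analysis the columns of `X` would be the census-derived determinant variables for each Dissemination Area, and the flagged rows would be the areas mapped as food insecure.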

Keywords: Neighbourhood food insecurity index, socioeconomic and demographic determinants, principal component analysis, Canada Census, ArcGIS.

Downloads 813

7352 Assessment of Agricultural Land Use Land Cover, Land Surface Temperature and Population Changes Using Remote Sensing and GIS: Southwest Part of Marmara Sea, Turkey

Authors: Melis Inalpulat, Levent Genc

Abstract:

Land Use Land Cover (LULC) changes due to human activities and natural causes have become a major environmental concern. Assessment of temporal remote sensing data provides information about LULC impacts on the environment. Land Surface Temperature (LST) is one of the important components for modeling environmental changes in climatological, hydrological, and agricultural studies. In this study, LULC changes (September 7, 1984 and July 8, 2014), especially in agricultural lands, together with population changes (1985-2014) and LST status were investigated using remotely sensed and census data in the South Marmara Watershed, Turkey. LULC changes were determined using Landsat TM and Landsat OLI data acquired in the summers of 1984 and 2014. Six-band TM and OLI images were classified using a supervised classification method to prepare an LULC map with five classes: Forest (F), Grazing Land (G), Agricultural Land (A), Water Surface (W), and Residential Area-Bare Soil (R-B). The LST image was also derived from thermal bands of the same dates. LULC classification results showed that forest areas, agricultural lands, water surfaces and residential area-bare soils increased by 65751 ha, 20163 ha, 1924 ha and 20462 ha, respectively. In comparison, a dramatic decrease occurred in grazing land (107985 ha) within three decades. The population increased by 29% between 1984 and 2014 in the whole study area. Along with natural causes, migration also contributed to this increase, since the study area has important employment potential. LULC was transformed among the classes due to the expansion of residential, commercial and industrial areas as well as political decisions. In the study, results showed that agricultural lands around the settlement areas were transformed into residential areas in 30 years. The LST images showed that mean temperatures ranged between 26-32°C in 1984 and 27-33°C in 2014. The minimum temperature of agricultural lands increased by 3°C, reaching 23°C. In contrast, the maximum temperature of the A class decreased to 41°C from 44°C. Considering temperatures of the 2014 R-B class and the 1984 status of the same areas, it was seen that mean, minimum and maximum temperatures increased by 2°C. As a result, the dynamism of population, LULC and LST resulted in increasing mean and maximum surface temperatures, living spaces/industrial areas and agricultural lands.

Keywords: Census data, Landsat, land surface temperature (LST), land use land cover (LULC).

Downloads 2074

7351 Creative Mapping Landuse and Human Activities: From the Inventories of Factories to the History of the City and Citizens

Authors: R. Tamborrino, F. Rinaudo

Abstract:

Digital technologies offer possibilities to effectively convert historical archives into instruments of knowledge able to provide a guide for the interpretation of historical phenomena. Digital conversion and management of those documents allow the possibility to add other sources in a unique and coherent model that permits the intersection of different data able to open new interpretations and understandings. Urban history uses, among other sources, the inventories that register human activities in a specific space (e.g. cadastres, censuses, etc.). The geographic localisation of that information inside cartographic supports allows for the comprehension and visualisation of specific relationships between different historical realities registering both the urban space and the peoples living there. These links that merge the different nature of data and documentation through a new organisation of the information can suggest a new interpretation of other related events. In all these kinds of analysis, the use of GIS platforms today represents the most appropriate answer. The design of the related databases is the key to realise the ad-hoc instrument to facilitate the analysis and the intersection of data of different origins. Moreover, GIS has become the digital platform where it is possible to add other kinds of data visualisation. This research deals with the industrial development of Turin at the beginning of the 20th century. A census of factories realized just prior to WWI provides the opportunity to test the potentialities of GIS platforms for the analysis of urban landscape modifications during the first industrial development of the town. The inventory includes data about location, activities, and people. GIS is shaped in a creative way linking different sources and digital systems aiming to create a new type of platform conceived as an interface integrating different kinds of data visualisation. 
The data processing allows linking this information to an urban space, and also visualising the growth of the city at that time. The sources related to the urban landscape development in that period are of a different nature. The emerging necessity to build, enlarge, modify and join different buildings to boost the industrial activities, according to their fast development, is recorded by different official permissions delivered by the municipality and now stored in the Historical Archive of the Municipality of Turin. Those documents, which are reports and drawings, contain numerous data on the buildings themselves, including the block where the plot is located, the district, and the people involved such as the owner, the investor, and the engineer or architect designing the industrial building. All these collected data offer the possibility, first, to rebuild the process of change of the urban landscape by using GIS and 3D modelling technologies, thanks to the access to the drawings (2D plans, sections and elevations) that show the previous and the planned situation. Furthermore, they give access to information for different queries of the linked dataset that could be useful for different research and targets, such as economic, biographical, architectural, or demographic ones. By superimposing a layer of the present city, the past meets the present: industrial heritage and people meet urban history.

Keywords: Digital urban history, census, digitalisation, GIS, modelling, digital humanities.

Downloads 1185

7350 Study on Guangzhou's Employment Subcentres and Polycentricity

Authors: L. Jiang

Abstract:

Since the late 1980s, the new phenomenon of 'employment subcentres' or 'polycentricity' has appeared in the metropolises of North America and Western Europe, and it has been an interesting topic for academics and researchers. This paper uses one case study, Guangzhou, to explore the development and the mechanism of employment subcentres and polycentricity in Chinese metropolises by a spatial analysis method on the basis of the first economic census data. In conclusion, the paper finds that employment subcentres and polycentricity exist in Chinese metropolises, and that their mechanism stems mainly from the secondary industry rather than from the tertiary industry, as in North America and Western Europe.

Keywords: Employment Subcentre, Polycentricity, Guangzhou.

Downloads 1764

7349 Transportation and Physical Development around Kumasi, Ghana

Authors: Justice K. Owusu-Ansah, Kevin O'Connor

Abstract:

This research explores the links between physical development and transportation infrastructure around Kumasi, Ghana. It utilizes census data as well as fieldwork and interviews carried out during July and December 2005. The results suggest that there is a weak association between transportation investments and physical development, and that recent housing has generally occurred in poorly accessible locations. Road investments have generally followed physical expansion rather than the reverse. Hence, policies designed to manage the fast growth now occurring around Ghanaian cities should not focus exclusively on improving transportation infrastructure but also on strengthening the underlying traditional land management structures and the official land administrative institutions that operate within those structures.

Keywords: Housing, Kumasi, population, physical development, transportation, villages.

Downloads 2105

7348 Socio-Economic Characteristics of Tribal Areas in KwaZulu-Natal, South Africa

Authors: Carilette Fourie, Chris Cloete

Abstract:

The occurrence of traditional authorities and tribal land within South Africa results in unique developmental trends and challenges. Tribal communities, typically located in rural environments, are perceived to be severely affected by poverty and poor living conditions relative to their urban counterparts. The exact extent of the socio-economic disparity between tribal and non-tribal communities is addressed in this paper. After adjustment of available census data to correspond with the delineation of tribal and non-tribal land in the KwaZulu-Natal province, seven selected socio-economic indicators were compared. The investigation revealed that although tribal areas are characterised by low employment rates and educational levels, a young population, fairly large household sizes, lower access to basic services and lower income households that are highly dependent on social grants, tribal area populations do have moderate levels of education, access to formal housing and relatively good access to services.

Keywords: KwaZulu-Natal, tribal areas, traditional authority, socio-economic, well-being.

Downloads 322

7347 Mapping Crime against Women in India: Spatio-Temporal Analysis, 2001-2012

Authors: Ritvik Chauhan, Vijay Kumar Baraik

Abstract:

Women are most vulnerable to crime despite occupying a central position in shaping a society as the first teachers of children. In India too, despite equal rights and constitutional safeguards, the incidence of crime against them is large and grave. In this context, crime against women, especially rape, has been increasing over time. This paper explores the spatial and temporal aspects of crime against women in India with special reference to rape. It also examines crime against women with its spatial, socio-economic and demographic associates using related data obtained from the National Crime Records Bureau India, the Indian Census and other sources of the Government of India. Simple statistical, choropleth mapping and other cartographic representation methods have been used to see the crime rates, spatio-temporal patterns of crime, and association of crime with its correlates. The major findings are visible spatial variations across the country and rising trends in incidence and rates over the reference period. The study also indicates that geographical associations are observed to some degree. However, selected indicators of socio-economic factors seem to have no significant bearing on crime against women at this level.

Keywords: Crime against women, crime mapping, trend analysis.

Downloads 1641

7346 Big Data: Big Challenges to Privacy and Data Protection

Authors: Abu Bakar Munir, Siti Hajar Mohd Yasin, Firdaus Muhammad-Sukki

Abstract:

This paper seeks to analyse the benefits of big data and, more importantly, the challenges it poses to the subject of privacy and data protection. First, the nature of big data will be briefly deliberated before presenting the potential of big data in the present day. Afterwards, the issue of privacy and data protection is highlighted before discussing the challenges of implementing this issue in big data. In conclusion, the paper will put forward the debate on the adequacy of the existing legal framework in protecting personal data in the era of big data.

Keywords: Big data, data protection, information, privacy.

Downloads 3857

7345 Strategic Investment in Infrastructure Development to Facilitate Economic Growth in the United States

Authors: Arkaprabha Bhattacharyya, Makarand Hastak

Abstract:

The COVID-19 pandemic is unprecedented in terms of its global reach and economic impacts. Historically, investment in infrastructure development projects has been touted to boost the economic growth of a nation. The State and Local governments responsible for delivering infrastructure assets work under tight budgets. Therefore, it is important to understand which infrastructure projects have the highest potential of boosting economic growth in the post-pandemic era. This paper presents relationships between infrastructure projects and economic growth. Statistical relationships between investment in different types of infrastructure projects (transit, water and wastewater, highways, power, manufacturing, etc.) and indicators of economic growth are presented using historic data between 2002 and 2020 from the U.S. Census Bureau and U.S. Bureau of Economic Analysis (BEA). The outcome of the paper is the comparison of statistical correlations between investment in different types of infrastructure projects and indicators of economic growth. The comparison of the statistical correlations is useful in ranking the types of infrastructure projects based on their ability to influence economic prosperity. Therefore, investment in the infrastructure types with the higher rank will have a better chance of boosting economic growth. Once the ranks are derived, they can be used by decision-makers in the infrastructure investment decision-making process.
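The ranking step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `rank_by_correlation`, the use of the Pearson coefficient, and the placeholder series `type_a`, `type_b`, `type_c` are hypothetical and contain made-up numbers, not Census Bureau or BEA data.

```python
import numpy as np

def rank_by_correlation(investment, growth):
    """Rank investment categories by Pearson correlation with a growth series.

    investment: dict mapping a category name to a yearly series
    growth:     yearly series of a growth indicator, same length
    Returns (name, correlation) pairs, highest correlation first.
    """
    corr = {
        name: float(np.corrcoef(series, growth)[0, 1])
        for name, series in investment.items()
    }
    return sorted(corr.items(), key=lambda kv: kv[1], reverse=True)

# Placeholder values for illustration only.
growth = [1.0, 2.0, 3.0, 4.0]
investment = {
    "type_a": [1.0, 2.0, 3.0, 4.0],
    "type_b": [4.0, 3.0, 2.0, 1.0],
    "type_c": [1.0, 3.0, 2.0, 4.0],
}
ranking = rank_by_correlation(investment, growth)
```

A correlation-based ranking of this kind says nothing about causation or lag effects, so it is best read as a screening step rather than a forecast.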

Keywords: Economic growth, infrastructure development, infrastructure projects, strategic investment.

Downloads 588

7344 Data Preprocessing for Supervised Learning

Authors: S. B. Kotsiantis, D. Kanellopoulos, P. E. Pintelas

Abstract:

Many factors affect the success of Machine Learning (ML) on a given task. The representation and quality of the instance data is first and foremost. If there is much irrelevant and redundant information present, or noisy and unreliable data, then knowledge discovery during the training phase is more difficult. It is well known that data preparation and filtering steps take a considerable amount of processing time in ML problems. Data pre-processing includes data cleaning, normalization, transformation, feature extraction and selection, etc. The product of data pre-processing is the final training set. It would be nice if a single sequence of data pre-processing algorithms had the best performance for each data set, but this is not the case. Thus, we present the best-known algorithms for each step of data pre-processing so that one achieves the best performance for their data set.

Keywords: Data mining, feature selection, data cleaning.

Downloads 5926

7343 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitations before jumping on the Big Data bandwagon.

Keywords: Analytics, Big Data in Education, Hadoop, Learning Analytics.

Downloads 4815

7342 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, Wang Qun Lin

Abstract:

This paper introduces the concept and principles of data cleaning, analyzes the types and causes of dirty data, proposes several key steps of a typical cleaning process, and puts forward a scalable and versatile data cleaning framework. For data with attribute dependency relations, it designs several violation data discovery algorithms by formal formula, which can find inconsistent data across all target columns with conditional attribute dependencies, whether the data is structured (SQL) or unstructured (NoSQL), and gives six data cleaning methods based on these algorithms.
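To make the idea of violation discovery under a dependency rule concrete, here is a minimal sketch for the simplest case, a functional dependency of the form lhs -> rhs. It is not one of the paper's algorithms: the function name `find_violations`, the dictionary-based record format, and the `zip`/`city` example columns are assumptions chosen for illustration.

```python
from collections import defaultdict

def find_violations(rows, lhs, rhs):
    """Return groups of rows that agree on the `lhs` columns but
    disagree on the `rhs` columns, i.e. violate lhs -> rhs."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in lhs)
        groups[key].append(row)
    violations = {}
    for key, members in groups.items():
        # More than one distinct rhs value for the same lhs key is a violation.
        distinct_rhs = {tuple(r[c] for c in rhs) for r in members}
        if len(distinct_rhs) > 1:
            violations[key] = members
    return violations

# Hypothetical records: the rule "zip -> city" is broken for zip 10001.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Newark"},
    {"zip": "94105", "city": "San Francisco"},
]
result = find_violations(rows, ["zip"], ["city"])
```

Because the records are plain dictionaries, the same check applies to rows fetched from a SQL table or to documents from a NoSQL store, provided every record carries the columns named in the rule.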

Keywords: Data cleaning, dependency rules, violation data discovery, data repair.

Downloads 2561

7341 Coalescing Data Marts

Authors: N. Parimala, P. Pahwa

Abstract:

OLAP uses multidimensional structures to provide access to data for analysis. Traditionally, OLAP operations are more focused on retrieving data from a single data mart. An exception is the drill across operator. This, however, is restricted to retrieving facts on common dimensions of the multiple data marts. Our concern is to define further operations while retrieving data from multiple data marts. To this end, we have defined six operations which coalesce data marts. While doing so, we consider the common as well as the non-common dimensions of the data marts.

Keywords: Data warehouse, Dimension, OLAP, Star Schema.

Downloads 1515

7340 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In the telecommunication industry, mining big data is like mining for gold; it represents a big opportunity for maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication, and the benefits and opportunities gained from them.

Keywords: Mining Big Data, Big Data, Machine learning, Data Streams, Telecommunication.

Downloads 2418

7339 Comparative Analysis of Diverse Collection of Big Data Analytics Tools

Authors: S. Vidhya, S. Sarumathi, N. Shanthi

Abstract:

Over the past era, a lot of effort and many studies have gone into developing proficient tools for performing various tasks in big data. Recently, big data has received a lot of publicity, for good reasons. Because of the large and complex collections of datasets, it is difficult to process them with traditional data processing applications. This concern makes it all the more necessary to produce various big data tools. Moreover, the main aim of big data analytics is to apply advanced analytic techniques to very large, diverse datasets that range in size from terabytes to zettabytes and in type from structured to unstructured and from batch to streaming. Big data is useful for data sets whose size or type is beyond the capability of traditional relational databases to capture, manage and process the data with low latency. The resulting challenges have led to the emergence of powerful big data tools. In this survey, a varied collection of big data tools is illustrated and compared by their salient features.

Keywords: Big data, Big data analytics, Business analytics, Data analysis, Data visualization, Data discovery.

Downloads 3727

7338 Multi-labeled Data Expressed by a Set of Labels

Authors: Tetsuya Furukawa, Masahiro Kuzunishi

Abstract:

Collected data must be organized to be utilized efficiently, and hierarchical classification of data is an efficient approach to organizing data. When data is classified into multiple categories or annotated with a set of labels, users request multi-labeled data by giving a set of labels. There are several interpretations of the data expressed by a set of labels. This paper discusses which data is expressed by a set of labels by introducing orders for sets of labels, and shows that there are four types of orders, which are characterized by whether the labels of the expressed data include every label of the given set of labels within the range of the set. Two desirable properties of the orders are also discussed: that data is also expressed by the higher set of labels, and that different sets of labels express different data.

Keywords: Classification Hierarchies, Multi-labeled Data, Multiple Classification, Orders of Sets of Labels

Downloads 1256

7337 The Comparison of Data Replication in Distributed Systems

Authors: Iman Zangeneh, Mostafa Moradi, Ali Mokhtarbaf

Abstract:

The necessity of the ever-increasing use of distributed data in computer networks is obvious to all. One technique that is applied to distributed data to increase efficiency and reliability is data replication. In this paper, after introducing this technique and its advantages, we will examine some dynamic data replication strategies. We will examine their characteristics for some overus scenario, and then we will propose some suggestions for their improvement.

Keywords: data replication, data hiding, consistency, dynamic data replication strategy

Downloads 1594

7336 The Proposal of a Shared Mobility City Index to Support Investment Decision Making for Carsharing

Authors: S. Murr, S. Phillips

Abstract:

One of the biggest challenges when entering a market with a carsharing or any other shared mobility (SM) service is sound investment decision-making. To support this process, the authors think that a city index evaluating different criteria is necessary. The goal of such an index is to benchmark cities along a set of external measures to answer the two main challenges: financial viability and the understanding of a city's specific requirements. The authors have consulted several shared mobility projects and industry experts to create such a Shared Mobility City Index (SMCI). The current proposal of the SMCI consists of 11 individual index measures: general data (demographics, geography, climate and city culture), shared mobility landscape (current SM providers, public transit options, commuting patterns and driving culture) and political vision and goals (vision of the Mayor, sustainability plan, bylaws/tenders supporting SM). To evaluate the suitability of the index, 16 cities on the East Coast of North America were selected and secondary research was conducted. The main sources of this study were census data, organisational records, independent press releases and informational websites. Only non-academic sources were used because the relevant data for the chosen cities is not published in academia. Applying the index measures to the selected cities resulted in three major findings. Firstly, density (number of inhabitants divided by city area) is not an indicator for the number of SM services offered: the city with the lowest density has five bike and carsharing options. Secondly, there is a direct correlation between commuting patterns and how many shared mobility services are offered. New York, Toronto and Washington DC have the highest public transit ridership and the most shared mobility providers. Lastly, except for one, all surveyed cities support shared mobility with their sustainability plan. The current version of the shared mobility index is proving to be a practical tool to evaluate cities, and to understand functional, political, social and environmental considerations. More cities will have to be evaluated to refine the criteria further. However, the current version of the index can be used to assess cities on their suitability for shared mobility services and will assist investors in deciding which city is a financially viable market.

Keywords: Carsharing, transportation, urban planning, shared mobility city index.

Downloads 2265

7335 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, more varied information can be extracted. In addition, development and dissemination of boards such as Arduino and Raspberry Pi have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, there are many difficulties in using the board to collect data, particularly when the user is not a computer programmer or is using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: Clustering, data mining, DBSCAN, k-means, k-medoids, sensor data.

Downloads 1958

7334 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, veracity, and come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, provision of insights to users, and for good governance. Government (Big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services like data storage and hosting services to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationship in the government (big) data ecosystem. We also discuss our research findings. We did not find many published research articles about the government (big) data ecosystem, including its definition and classification of actors and their roles. Therefore, we borrowed ideas for the government (big) data ecosystem from numerous areas in the literature, including scientific research data, humanitarian data, open government data, and industry data.

Keywords: Big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review.

Downloads 1987

7333 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analyzing DNA microarray data sets is a great challenge for bioinformaticians because of the complexity of applying statistical and machine learning techniques. The challenge is doubled if the microarray data sets contain missing data, which happens regularly, because these techniques cannot deal with missing data. One of the most important data analysis processes on a microarray data set is feature selection. This process finds the most important genes that affect a certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics.

Downloads 2725

7332 Automatic Real-Patient Medical Data De-Identification for Research Purposes

Authors: Petr Vcelak, Jana Kleckova

Abstract:

Our medicine-oriented research is based on a medical data set of real patients. Sharing private patient data with people other than clinicians or hospital staff is a security problem. We have to remove personal identification information from medical data. After a de-identification process, the medical data without private data are available for any research purpose. In this paper, we introduce a universal, automatic, rule-based de-identification application that performs this process on heterogeneous medical data. A patient's private identification is replaced by a unique identification number, even in burned-in annotations in pixel data. The identical identification number is used for all of a patient's medical data, so relationships within the data are preserved. The hospital can benefit from research feedback based on the results.
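The idea of replacing a patient identifier with one stable number across all records can be sketched as follows. This is a minimal illustration rather than the authors' application: the identifier pattern (six digits, a slash, four digits), the `ANON-` prefix, and the class and function names are all assumptions made for the example, and it covers free text only, not DICOM headers or burned-in pixel annotations.

```python
import itertools
import re

# Hypothetical rule: a patient identifier written as six digits, a slash, four digits.
ID_PATTERN = re.compile(r"\b\d{6}/\d{4}\b")


class Pseudonymizer:
    """Hands out one stable pseudonym per original identifier."""

    def __init__(self):
        self._ids = {}
        self._counter = itertools.count(1)

    def pseudonym(self, patient_key):
        # The same key always maps to the same pseudonym, so records of one
        # patient stay linked after de-identification.
        if patient_key not in self._ids:
            self._ids[patient_key] = f"ANON-{next(self._counter):06d}"
        return self._ids[patient_key]


def deidentify(text, pseudonymizer):
    """Replace every identifier matching ID_PATTERN with its pseudonym."""
    return ID_PATTERN.sub(lambda m: pseudonymizer.pseudonym(m.group(0)), text)


p = Pseudonymizer()
report = deidentify("Patient 850101/1234 admitted.", p)
scan_note = deidentify("CT scan for 850101/1234", p)
other = deidentify("Patient 900202/5678", p)
```

Here `report` and `scan_note` end up carrying the same pseudonym, while `other` gets a different one. In practice the mapping table itself is sensitive and would need to be stored securely or discarded.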

Keywords: DASTA, De-identification, DICOM, Health Level Seven, Medical data, OCR, Personal data

Downloads 1589

7331 Rail Corridors between Minimal Use of Train and Unsystematic Tightening of Population: A Methodological Essay

Authors: A. Benaiche

Abstract:

In the current situation, the automobile has become the main means of locomotion. It allows traveling long distances, encouraging urban sprawl. To counteract this trend, the train is often proposed as an alternative to the car. At the same time, favoring urban development around public transport nodes such as railway stations is one of the main issues in the coordination between urban planning and transportation and the keystone of implementing sustainable urban development. In this context, this paper focuses on the dynamics of spatial structuring around the railway. Specifically, it studies the demographic dynamics in the rail corridors of Nantes, Angers and Le Mans (Western France) based on the radiation of railway stations. Consequently, the methodology concentrates on the demographic weight and gains of these corridors, the index of urban intensity, and mobility behaviors (workers' travels, scholars' travels, modal practices of travel). The perimeter considered to define the rail corridors includes the communes of the urban area that have a railway station and the communes whose access time to the railway station is less than fifteen minutes by car (the time specified by the Regional Transport Scheme of Travelers). The main tools used are the statistical data from the population census, the detailed tables, and databases on mobility flows. The study reveals that the population is not tightened along rail corridors and that train use is minimal despite the presence of a nearby railway station. These results lead us to propose guidelines for making the train a real vector of mobility across the rail corridors.

Keywords: Coordination between urban planning and transportation, Rail corridors, Railway stations, Travels.

Downloads 1089

7330 Analyzing Multi-Labeled Data Based on the Roll of a Concept against a Semantic Range

Authors: Masahiro Kuzunishi, Tetsuya Furukawa, Ke Lu

Abstract:

Classifying data hierarchically is an efficient approach to analyzing data. Data are usually classified into multiple categories, or annotated with a set of labels. To analyze multi-labeled data, such data must be specified by giving a set of labels as a semantic range. Data are analyzed for certain purposes. This paper shows which multi-labeled data should be the target of analysis for those purposes, and discusses the role of a label against a set of labels by investigating the change when a label is added to the set of labels. These discussions give methods for the advanced analysis of multi-labeled data, which are based on the role of a label against a semantic range.

Keywords: Classification Hierarchies, Data Analysis, Multilabeled Data, Orders of Sets of Labels

Downloads 1166

7329 Steganalysis of Data Hiding via Halftoning and Coordinate Projection

Authors: Woong Hee Kim, Ilhwan Park

Abstract:

Steganography is the art of hiding and transmitting data through apparently innocuous carriers in an effort to conceal the existence of the data. Many steganography algorithms have been proposed recently, and many of them use digital image data as a carrier. In the data hiding scheme of halftoning and coordinate projection, still image data are used as a carrier, and the data of the carrier image are modified for data embedding. In this paper, we present three features for the analysis of data hiding via halftoning and coordinate projection. We also present a classifier using the three proposed features.

Keywords: Steganography, steganalysis, digital halftoning, data hiding.

Downloads 1551

7328 Biological Data Integration using SOA

Authors: Noura Meshaan Al-Otaibi, Amin Yousef Noaman

Abstract:

Nowadays scientific data is inevitably digital and stored in a wide variety of formats in heterogeneous systems. Scientists need to access an integrated view of remote or local heterogeneous data sources with advanced data access, analysis, and visualization tools. This research suggests the use of Service Oriented Architecture (SOA) to integrate biological data from different data sources. This work shows that SOA addresses the problems facing the integration process and examines whether biologists can access the biological data more easily. There are several methods to implement SOA, but web services are the most popular. The Microsoft .NET Framework was used to implement the proposed architecture.

Keywords: Bioinformatics, Biological data, Data Integration, SOA and Web Services.

Downloads 2416

7327 STATISTICA Software: A State of the Art Review

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, P. Ranjetha

Abstract:

Data mining is growing rapidly in recognition and popularity. The foremost aim of data mining is to extract data from a huge data set into forms that can be understood for further use. Data mining is a technology with rich potential that can support industries and businesses that collect the necessary information from their data to discover their customers' performance. Several methods are available for extracting data, such as classification, clustering, association, discovery, and visualization, each with its own diverse algorithms that attempt to fit an appropriate model to the data. STATISTICA mostly deals with very large groups of data, which impose rigorous computational constraints. These challenges led to the emergence of powerful STATISTICA Data Mining technologies. This survey gives an overview of the STATISTICA software along with its significant features.

Keywords: Data Mining, STATISTICA Data Miner, Text Miner, Enterprise Server, Classification, Association, Clustering, Regression.

Downloads 2557

7326 Female Work Force Participation and Women Empowerment in Haryana

Authors: Dinabandhu Mahata, Amit Kumar, Ambarish Kumar Rai

Abstract:

India is known as a country of diversity in its social, cultural, and wide geographical variations. In the north and north-west of the country, strong patriarchal norms and a social structure based on male dominance are important constructs. The patriarchal social setup adversely affects women's social and economic wellbeing, and in that social structure women are considered second-class citizens. The work participation rate of women is directly linked to the development of society or the household. Haryana is one of the developed states of India; although it is ahead in economic prosperity, it lags far behind in gender equality, with male dominance in all dimensions of life. The position of women in Haryana is no better than in the other states of India. Haryana has a large imbalance in its male-female sex ratio, which is a serious concern for social science research as a demographic problem for the state. Women now require holistic empowerment, an enabling process that must lead to their economic as well as social transformation. Hence, the objective of the paper is to address the role of sex ratio, women's literacy, and women's work participation in the process of their empowerment, with special attention to the gender perspective. The study used data from the Census of India from 1991 to 2011. This paper examines the regional disparity in sex ratio, literacy rate, and female work participation and the improvement in the empowerment of women in the state of Haryana. It suggests focusing more intensively on the issues of women's empowerment through the enhancement of women's education, workforce participation, and social participation, with people's participation and a holistic approach.

Keywords: Sex ratio, literacy rate, workforce participation rate, women empowerment, Haryana.

Downloads 2568

7325 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In our paper we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Considering the numerous obstacles, e.g., forests, hills, and urban areas, the data collection is realized in several ways. The collection of data uses wireless communication, LAN, and GSM networks, and in certain areas data are collected using vehicles. To ensure the connection to the server, most of the probes are able to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that will allow the probes to select a suitable communication channel.
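One simple way a probe could pick a channel is to try the available ones in order of preference. The sketch below is a hypothetical illustration, not the algorithm proposed in the paper: the `Channel` type, the numeric `cost` ranking, and the availability checks are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Channel:
    name: str
    cost: int  # Assumed preference ranking: lower cost is tried first.
    is_available: Callable[[], bool]  # Probe-side check, e.g. link or signal test.


def select_channel(channels: Sequence[Channel]) -> Optional[Channel]:
    """Return the cheapest currently available channel, or None if none is usable."""
    usable = [c for c in channels if c.is_available()]
    return min(usable, key=lambda c: c.cost) if usable else None


# Example probe: LAN is down, wireless and GSM are reachable.
probe_channels = [
    Channel("LAN", cost=1, is_available=lambda: False),
    Channel("wireless", cost=2, is_available=lambda: True),
    Channel("GSM", cost=3, is_available=lambda: True),
]
chosen = select_channel(probe_channels)

# A probe with no reachable channel would have to buffer its data,
# for instance until a vehicle collects it.
offline = select_channel([Channel("GSM", cost=3, is_available=lambda: False)])
```

A real selection algorithm would likely also weigh bandwidth, energy use, and security properties of each channel, which the abstract does not detail.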

Keywords: Communication, computer network, data collection, probe.

Downloads 1743