Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 44

Search results for: big data

44 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm

Authors: Ameur Abdelkader, Abed Bouarfa Hafida

Abstract:

Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis in case of a large amount of data. In fact, because of their volumes, their nature (semi or unstructured) and their variety, it is impossible to analyze efficiently big data via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm, Classification And Regression Trees (CART), in order to adapt it for big data analysis. The major changes of this algorithm are presented and then a version of the extended algorithm is defined in order to make it applicable for a huge quantity of data.

Keywords: Predictive analysis, big data, predictive analysis algorithms. CART algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 82
43 Microclimate Variations in Rio de Janeiro Related to Massive Public Transportation

Authors: Marco E. O. Jardim, Frederico A. M. Souza, Valeria M. Bastos, Myrian C. A. Costa, Nelson F. F. Ebecken

Abstract:

Urban public transportation in Rio de Janeiro is based on bus lines, powered by diesel, and four limited metro lines that support only some neighborhoods. This work presents an infrastructure built to better understand microclimate variations related to massive urban transportation in some specific areas of the city. The use of sensor nodes with small analytics capacity provides environmental information to population or public services. The analyses of data collected from a few small sensors positioned near some heavy traffic streets show the harmful impact due to poor bus route plan.

Keywords: Big data, IoT, public transportation, public health system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 166
42 Unstructured-Data Content Search Based on Optimized EEG Signal Processing and Multi-Objective Feature Extraction

Authors: Qais M. Yousef, Yasmeen A. Alshaer

Abstract:

Over the last few years, the amount of data available on the globe has been increased rapidly. This came up with the emergence of recent concepts, such as the big data and the Internet of Things, which have furnished a suitable solution for the availability of data all over the world. However, managing this massive amount of data remains a challenge due to their large verity of types and distribution. Therefore, locating the required file particularly from the first trial turned to be a not easy task, due to the large similarities of names for different files distributed on the web. Consequently, the accuracy and speed of search have been negatively affected. This work presents a method using Electroencephalography signals to locate the files based on their contents. Giving the concept of natural mind waves processing, this work analyses the mind wave signals of different people, analyzing them and extracting their most appropriate features using multi-objective metaheuristic algorithm, and then classifying them using artificial neural network to distinguish among files with similar names. The aim of this work is to provide the ability to find the files based on their contents using human thoughts only. Implementing this approach and testing it on real people proved its ability to find the desired files accurately within noticeably shorter time and retrieve them as a first choice for the user.

Keywords: Artificial intelligence, data contents search, human active memory, mind wave, multi-objective optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 326
41 Real-Time Data Stream Partitioning over a Sliding Window in Real-Time Spatial Big Data

Authors: Sana Hamdi, Emna Bouazizi, Sami Faiz

Abstract:

In recent years, real-time spatial applications, like location-aware services and traffic monitoring, have become more and more important. Such applications result dynamic environments where data as well as queries are continuously moving. As a result, there is a tremendous amount of real-time spatial data generated every day. The growth of the data volume seems to outspeed the advance of our computing infrastructure. For instance, in real-time spatial Big Data, users expect to receive the results of each query within a short time period without holding in account the load of the system. But with a huge amount of real-time spatial data generated, the system performance degrades rapidly especially in overload situations. To solve this problem, we propose the use of data partitioning as an optimization technique. Traditional horizontal and vertical partitioning can increase the performance of the system and simplify data management. But they remain insufficient for real-time spatial Big data; they can’t deal with real-time and stream queries efficiently. Thus, in this paper, we propose a novel data partitioning approach for real-time spatial Big data named VPA-RTSBD (Vertical Partitioning Approach for Real-Time Spatial Big data). This contribution is an implementation of the Matching algorithm for traditional vertical partitioning. We find, firstly, the optimal attribute sequence by the use of Matching algorithm. Then, we propose a new cost model used for database partitioning, for keeping the data amount of each partition more balanced limit and for providing a parallel execution guarantees for the most frequent queries. VPA-RTSBD aims to obtain a real-time partitioning scheme and deals with stream data. It improves the performance of query execution by maximizing the degree of parallel execution. This affects QoS (Quality Of Service) improvement in real-time spatial Big Data especially with a huge volume of stream data. The performance of our contribution is evaluated via simulation experiments. The results show that the proposed algorithm is both efficient and scalable, and that it outperforms comparable algorithms.

Keywords: Real-Time Spatial Big Data, Quality Of Service, Vertical partitioning, Horizontal partitioning, Matching algorithm, Hamming distance, Stream query.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 408
40 Knowledge Representation and Inconsistency Reasoning of Class Diagram Maintenance in Big Data

Authors: Chi-Lun Liu

Abstract:

Requirements modeling and analysis are important in successful information systems' maintenance. Unified Modeling Language (UML) class diagrams are useful standards for modeling information systems. To our best knowledge, there is a lack of a systems development methodology described by the organism metaphor. The core concept of this metaphor is adaptation. Using the knowledge representation and reasoning approach and ontologies to adopt new requirements are emergent in recent years. This paper proposes an organic methodology which is based on constructivism theory. This methodology is a knowledge representation and reasoning approach to analyze new requirements in the class diagrams maintenance. The process and rules in the proposed methodology automatically analyze inconsistencies in the class diagram. In the big data era, developing an automatic tool based on the proposed methodology to analyze large amounts of class diagram data is an important research topic in the future.

Keywords: Knowledge representation, reasoning, ontology, class diagram, software engineering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 373
39 A Web and Cloud-Based Measurement System Analysis Tool for the Automotive Industry

Authors: C. A. Barros, Ana P. Barroso

Abstract:

Any industrial company needs to determine the amount of variation that exists within its measurement process and guarantee the reliability of their data, studying the performance of their measurement system, in terms of linearity, bias, repeatability and reproducibility and stability. This issue is critical for automotive industry suppliers, who are required to be certified by the 16949:2016 standard (replaces the ISO/TS 16949) of International Automotive Task Force, defining the requirements of a quality management system for companies in the automotive industry. Measurement System Analysis (MSA) is one of the mandatory tools. Frequently, the measurement system in companies is not connected to the equipment and do not incorporate the methods proposed by the Automotive Industry Action Group (AIAG). To address these constraints, an R&D project is in progress, whose objective is to develop a web and cloud-based MSA tool. This MSA tool incorporates Industry 4.0 concepts, such as, Internet of Things (IoT) protocols to assure the connection with the measuring equipment, cloud computing, artificial intelligence, statistical tools, and advanced mathematical algorithms. This paper presents the preliminary findings of the project. The web and cloud-based MSA tool is innovative because it implements all statistical tests proposed in the MSA-4 reference manual from AIAG as well as other emerging methods and techniques. As it is integrated with the measuring devices, it reduces the manual input of data and therefore the errors. The tool ensures traceability of all performed tests and can be used in quality laboratories and in the production lines. Besides, it monitors MSAs over time, allowing both the analysis of deviations from the variation of the measurements performed and the management of measurement equipment and calibrations. To develop the MSA tool a ten-step approach was implemented. Firstly, it was performed a benchmarking analysis of the current competitors and commercial solutions linked to MSA, concerning Industry 4.0 paradigm. Next, an analysis of the size of the target market for the MSA tool was done. Afterwards, data flow and traceability requirements were analysed in order to implement an IoT data network that interconnects with the equipment, preferably via wireless. The MSA web solution was designed under UI/UX principles and an API in python language was developed to perform the algorithms and the statistical analysis. Continuous validation of the tool by companies is being performed to assure real time management of the ‘big data’. The main results of this R&D project are: MSA Tool, web and cloud-based; Python API; New Algorithms to the market; and Style Guide of UI/UX of the tool. The MSA tool proposed adds value to the state of the art as it ensures an effective response to the new challenges of measurement systems, which are increasingly critical in production processes. Although the automotive industry has triggered the development of this innovative MSA tool, other industries would also benefit from it. Currently, companies from molds and plastics, chemical and food industry are already validating it.

Keywords: Automotive industry, Industry 4.0, internet of things, IATF 16949:2016, measurement system analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 400
38 A Discrete Event Simulation Model to Manage Bed Usage for Non-Elective Admissions in a Geriatric Medicine Speciality

Authors: Muhammed Ordu, Eren Demir, Chris Tofallis

Abstract:

Over the past decade, the non-elective admissions in the UK have increased significantly. Taking into account limited resources (i.e. beds), the related service managers are obliged to manage their resources effectively due to the non-elective admissions which are mostly admitted to inpatient specialities via A&E departments. Geriatric medicine is one of specialities that have long length of stay for the non-elective admissions. This study aims to develop a discrete event simulation model to understand how possible increases on non-elective demand over the next 12 months affect the bed occupancy rate and to determine required number of beds in a geriatric medicine speciality in a UK hospital. In our validated simulation model, we take into account observed frequency distributions which are derived from a big data covering the period April, 2009 to January, 2013, for the non-elective admission and the length of stay. An experimental analysis, which consists of 16 experiments, is carried out to better understand possible effects of case studies and scenarios related to increase on demand and number of bed. As a result, the speciality does not achieve the target level in the base model although the bed occupancy rate decreases from 125.94% to 96.41% by increasing the number of beds by 30%. In addition, the number of required beds is more than the number of beds considered in the scenario analysis in order to meet the bed requirement. This paper sheds light on bed management for service managers in geriatric medicine specialities.

Keywords: Bed management, bed occupancy rate, discrete event simulation, geriatric medicine, non-elective admission.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 445
37 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Authors: Rajvir Kaur, Jeewani Anupama Ginige

Abstract:

With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.

Keywords: Artificial neural networks, breast cancer, cancer dataset, classifiers, cervical cancer, F-score, logistic regression, machine learning, precision, recall, support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 805
36 A Study of the Adaptive Reuse for School Land Use Strategy: An Application of the Analytic Network Process and Big Data

Authors: Wann-Ming Wey

Abstract:

In today's popularity and progress of information technology, the big data set and its analysis are no longer a major conundrum. Now, we could not only use the relevant big data to analysis and emulate the possible status of urban development in the near future, but also provide more comprehensive and reasonable policy implementation basis for government units or decision-makers via the analysis and emulation results as mentioned above. In this research, we set Taipei City as the research scope, and use the relevant big data variables (e.g., population, facility utilization and related social policy ratings) and Analytic Network Process (ANP) approach to implement in-depth research and discussion for the possible reduction of land use in primary and secondary schools of Taipei City. In addition to enhance the prosperous urban activities for the urban public facility utilization, the final results of this research could help improve the efficiency of urban land use in the future. Furthermore, the assessment model and research framework established in this research also provide a good reference for schools or other public facilities land use and adaptive reuse strategies in the future.

Keywords: Adaptive reuse, analytic network process, big data, land use strategy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 350
35 Ontology-Based Backpropagation Neural Network Classification and Reasoning Strategy for NoSQL and SQL Databases

Authors: Hao-Hsiang Ku, Ching-Ho Chi

Abstract:

Big data applications have become an imperative for many fields. Many researchers have been devoted into increasing correct rates and reducing time complexities. Hence, the study designs and proposes an Ontology-based backpropagation neural network classification and reasoning strategy for NoSQL big data applications, which is called ON4NoSQL. ON4NoSQL is responsible for enhancing the performances of classifications in NoSQL and SQL databases to build up mass behavior models. Mass behavior models are made by MapReduce techniques and Hadoop distributed file system based on Hadoop service platform. The reference engine of ON4NoSQL is the ontology-based backpropagation neural network classification and reasoning strategy. Simulation results indicate that ON4NoSQL can efficiently achieve to construct a high performance environment for data storing, searching, and retrieving.

Keywords: Hadoop, NoSQL, ontology, backpropagation neural network, and high distributed file system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 588
34 Integration of Big Data to Predict Transportation for Smart Cities

Authors: Sun-Young Jang, Sung-Ah Kim, Dongyoun Shin

Abstract:

The Intelligent transportation system is essential to build smarter cities. Machine learning based transportation prediction could be highly promising approach by delivering invisible aspect visible. In this context, this research aims to make a prototype model that predicts transportation network by using big data and machine learning technology. In detail, among urban transportation systems this research chooses bus system.  The research problem that existing headway model cannot response dynamic transportation conditions. Thus, bus delay problem is often occurred. To overcome this problem, a prediction model is presented to fine patterns of bus delay by using a machine learning implementing the following data sets; traffics, weathers, and bus statues. This research presents a flexible headway model to predict bus delay and analyze the result. The prototyping model is composed by real-time data of buses. The data are gathered through public data portals and real time Application Program Interface (API) by the government. These data are fundamental resources to organize interval pattern models of bus operations as traffic environment factors (road speeds, station conditions, weathers, and bus information of operating in real-time). The prototyping model is designed by the machine learning tool (RapidMiner Studio) and conducted tests for bus delays prediction. This research presents experiments to increase prediction accuracy for bus headway by analyzing the urban big data. The big data analysis is important to predict the future and to find correlations by processing huge amount of data. Therefore, based on the analysis method, this research represents an effective use of the machine learning and urban big data to understand urban dynamics.

Keywords: Big data, bus headway prediction, machine learning, public transportation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 826
33 Exploring the Activity Fabric of an Intelligent Environment with Hierarchical Hidden Markov Theory

Authors: Chiung-Hui Chen

Abstract:

The Internet of Things (IoT) was designed for widespread convenience. With the smart tag and the sensing network, a large quantity of dynamic information is immediately presented in the IoT. Through the internal communication and interaction, meaningful objects provide real-time services for users. Therefore, the service with appropriate decision-making has become an essential issue. Based on the science of human behavior, this study employed the environment model to record the time sequences and locations of different behaviors and adopted the probability module of the hierarchical Hidden Markov Model for the inference. The statistical analysis was conducted to achieve the following objectives: First, define user behaviors and predict the user behavior routes with the environment model to analyze user purposes. Second, construct the hierarchical Hidden Markov Model according to the logic framework, and establish the sequential intensity among behaviors to get acquainted with the use and activity fabric of the intelligent environment. Third, establish the intensity of the relation between the probability of objects’ being used and the objects. The indicator can describe the possible limitations of the mechanism. As the process is recorded in the information of the system created in this study, these data can be reused to adjust the procedure of intelligent design services.

Keywords: Behavior, big data, hierarchical Hidden Markov Model, intelligent object.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 381
32 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Authors: Yunus Doğan, Ahmet Durap

Abstract:

Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.

Keywords: Clustering algorithms, coastal engineering, data mining, data summarization, statistical methods.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 610
31 Regression Approach for Optimal Purchase of Hosts Cluster in Fixed Fund for Hadoop Big Data Platform

Authors: Haitao Yang, Jianming Lv, Fei Xu, Xintong Wang, Yilin Huang, Lanting Xia, Xuewu Zhu

Abstract:

Given a fixed fund, purchasing fewer hosts of higher capability or inversely more of lower capability is a must-be-made trade-off in practices for building a Hadoop big data platform. An exploratory study is presented for a Housing Big Data Platform project (HBDP), where typical big data computing is with SQL queries of aggregate, join, and space-time condition selections executed upon massive data from more than 10 million housing units. In HBDP, an empirical formula was introduced to predict the performance of host clusters potential for the intended typical big data computing, and it was shaped via a regression approach. With this empirical formula, it is easy to suggest an optimal cluster configuration. The investigation was based on a typical Hadoop computing ecosystem HDFS+Hive+Spark. A proper metric was raised to measure the performance of Hadoop clusters in HBDP, which was tested and compared with its predicted counterpart, on executing three kinds of typical SQL query tasks. Tests were conducted with respect to factors of CPU benchmark, memory size, virtual host division, and the number of element physical host in cluster. The research has been applied to practical cluster procurement for housing big data computing.

Keywords: Hadoop platform planning, optimal cluster scheme at fixed-fund, performance empirical formula, typical SQL query tasks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 406
30 Urban Big Data: An Experimental Approach to Building-Value Estimation Using Web-Based Data

Authors: Sun-Young Jang, Sung-Ah Kim, Dongyoun Shin

Abstract:

Current real-estate value estimation, difficult for laymen, usually is performed by specialists. This paper presents an automated estimation process based on big data and machine-learning technology that calculates influences of building conditions on real-estate price measurement. The present study analyzed actual building sales sample data for Nonhyeon-dong, Gangnam-gu, Seoul, Korea, measuring the major influencing factors among the various building conditions. Further to that analysis, a prediction model was established and applied using RapidMiner Studio, a graphical user interface (GUI)-based tool for derivation of machine-learning prototypes. The prediction model is formulated by reference to previous examples. When new examples are applied, it analyses and predicts accordingly. The analysis process discerns the crucial factors effecting price increases by calculation of weighted values. The model was verified, and its accuracy determined, by comparing its predicted values with actual price increases.

Keywords: Big data, building-value analysis, machine learning, price prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 738
29 Forthcoming Big Data on Smart Buildings and Cities: An Experimental Study on Correlations among Urban Data

Authors: Yu-Mi Song, Sung-Ah Kim, Dongyoun Shin

Abstract:

Cities are complex systems of diverse and inter-tangled activities. These activities and their complex interrelationships create diverse urban phenomena. And such urban phenomena have considerable influences on the lives of citizens. This research aimed to develop a method to reveal the causes and effects among diverse urban elements in order to enable better understanding of urban activities and, therefrom, to make better urban planning strategies. Specifically, this study was conducted to solve a data-recommendation problem found on a Korean public data homepage. First, a correlation analysis was conducted to find the correlations among random urban data. Then, based on the results of that correlation analysis, the weighted data network of each urban data was provided to people. It is expected that the weights of urban data thereby obtained will provide us with insights into cities and show us how diverse urban activities influence each other and induce feedback.

Keywords: Big data, correlation analysis, data recommendation system, urban data network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 674
28 Building a Scalable Telemetry Based Multiclass Predictive Maintenance Model in R

Authors: Jaya Mathew

Abstract:

Many organizations are faced with the challenge of how to analyze and build Machine Learning models using their sensitive telemetry data. In this paper, we discuss how users can leverage the power of R without having to move their big data around as well as a cloud based solution for organizations willing to host their data in the cloud. By using ScaleR technology to benefit from parallelization and remote computing or R Services on premise or in the cloud, users can leverage the power of R at scale without having to move their data around.

Keywords: Predictive maintenance, machine learning, big data, cloud, on premise SQL, R.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1265
27 Exploring Influence Range of Tainan City Using Electronic Toll Collection Big Data

Authors: Chen Chou, Feng-Tyan Lin

Abstract:

Big Data has been attracted a lot of attentions in many fields for analyzing research issues based on a large number of maternal data. Electronic Toll Collection (ETC) is one of Intelligent Transportation System (ITS) applications in Taiwan, used to record starting point, end point, distance and travel time of vehicle on the national freeway. This study, taking advantage of ETC big data, combined with urban planning theory, attempts to explore various phenomena of inter-city transportation activities. ETC, one of government's open data, is numerous, complete and quick-update. One may recall that living area has been delimited with location, population, area and subjective consciousness. However, these factors cannot appropriately reflect what people’s movement path is in daily life. In this study, the concept of "Living Area" is replaced by "Influence Range" to show dynamic and variation with time and purposes of activities. This study uses data mining with Python and Excel, and visualizes the number of trips with GIS to explore influence range of Tainan city and the purpose of trips, and discuss living area delimited in current. It dialogues between the concepts of "Central Place Theory" and "Living Area", presents the new point of view, integrates the application of big data, urban planning and transportation. The finding will be valuable for resource allocation and land apportionment of spatial planning.

Keywords: Big Data, ITS, influence range, living area, central place theory, visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 575
26 Evaluation of the Urban Regeneration Project: Land Use Transformation and SNS Big Data Analysis

Authors: Ju-Young Kim, Tae-Heon Moon, Jung-Hun Cho

Abstract:

Urban regeneration projects have been actively promoted in Korea. In particular, Jeonju Hanok Village is evaluated as one of representative cases in terms of utilizing local cultural heritage sits in the urban regeneration project. However, recently, there has been a growing concern in this area, due to the ‘gentrification’, caused by the excessive commercialization and surging tourists. This trend was changing land and building use and resulted in the loss of identity of the region. In this regard, this study analyzed the land use transformation between 2010 and 2016 to identify the commercialization trend in Jeonju Hanok Village. In addition, it conducted SNS big data analysis on Jeonju Hanok Village from February 14th, 2016 to March 31st, 2016 to identify visitors’ awareness of the village. The study results demonstrate that rapid commercialization was underway, unlikely the initial intention, so that planners and officials in city government should reconsider the project direction and rebuild deliberate management strategies. This study is meaningful in that it analyzed the land use transformation and SNS big data to identify the current situation in urban regeneration area. Furthermore, it is expected that the study results will contribute to the vitalization of regeneration area.

Keywords: Land use, SNS, text mining, urban regeneration.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 785
25 A Survey of Semantic Integration Approaches in Bioinformatics

Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir

Abstract:

Technological advances of computer science and data analysis are helping to provide continuously huge volumes of biological data, which are available on the web. Such advances involve and require powerful techniques for data integration to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information-integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, we focus on those developed in the ontology community.

Keywords: Semantic data integration, biological ontology, linked data, semantic web, OWL, RDF.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1085
24 A Proposal for U-City (Smart City) Service Method Using Real-Time Digital Map

Authors: SangWon Han, MuWook Pyeon, Sujung Moon, DaeKyo Seo

Abstract:

Recently, technologies based on three-dimensional (3D) space information are being developed and quality of life is improving as a result. Research on real-time digital map (RDM) is being conducted now to provide 3D space information. RDM is a service that creates and supplies 3D space information in real time based on location/shape detection. Research subjects on RDM include the construction of 3D space information with matching image data, complementing the weaknesses of image acquisition using multi-source data, and data collection methods using big data. Using RDM will be effective for space analysis using 3D space information in a U-City and for other space information utilization technologies.

Keywords: RDM, multi-source data, big data, U-City.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 411
23 Big Data: Concepts, Technologies and Applications in the Public Sector

Authors: A. Alexandru, C. A. Alexandru, D. Coardos, E. Tudora

Abstract:

Big Data (BD) is associated with a new generation of technologies and architectures which can harness the value of extremely large volumes of very varied data through real time processing and analysis. It involves changes in (1) data types, (2) accumulation speed, and (3) data volume. This paper presents the main concepts related to the BD paradigm, and introduces architectures and technologies for BD and BD sets. The integration of BD with the Hadoop Framework is also underlined. BD has attracted a lot of attention in the public sector due to the newly emerging technologies that allow the availability of network access. The volume of different types of data has exponentially increased. Some applications of BD in the public sector in Romania are briefly presented.

Keywords: Big data, big data Analytics, Hadoop framework, cloud computing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1430
22 Opening up Government Datasets for Big Data Analysis to Support Policy Decisions

Authors: K. Hardy, A. Maurushat

Abstract:

Policy makers are increasingly looking to make evidence-based decisions. Evidence-based decisions have historically used rigorous methodologies of empirical studies by research institutes, as well as less reliable immediate survey/polls often with limited sample sizes. As we move into the era of Big Data analytics, policy makers are looking to different methodologies to deliver reliable empirics in real-time. The question is not why did these people do this for the last 10 years, but why are these people doing this now, and if the this is undesirable, and how can we have an impact to promote change immediately. Big data analytics rely heavily on government data that has been released in to the public domain. The open data movement promises greater productivity and more efficient delivery of services; however, Australian government agencies remain reluctant to release their data to the general public. This paper considers the barriers to releasing government data as open data, and how these barriers might be overcome.

Keywords: Big data, open data, productivity, transparency.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1204
21 Comparison of Different k-NN Models for Speed Prediction in an Urban Traffic Network

Authors: Seyoung Kim, Jeongmin Kim, Kwang Ryel Ryu

Abstract:

A database that records average traffic speeds measured at five-minute intervals for all the links in the traffic network of a metropolitan city. While learning from this data the models that can predict future traffic speed would be beneficial for the applications such as the car navigation system, building predictive models for every link becomes a nontrivial job if the number of links in a given network is huge. An advantage of adopting k-nearest neighbor (k-NN) as predictive models is that it does not require any explicit model building. Instead, k-NN takes a long time to make a prediction because it needs to search for the k-nearest neighbors in the database at prediction time. In this paper, we investigate how much we can speed up k-NN in making traffic speed predictions by reducing the amount of data to be searched for without a significant sacrifice of prediction accuracy. The rationale behind this is that we had a better look at only the recent data because the traffic patterns not only repeat daily or weekly but also change over time. In our experiments, we build several different k-NN models employing different sets of features which are the current and past traffic speeds of the target link and the neighbor links in its up/down-stream. The performances of these models are compared by measuring the average prediction accuracy and the average time taken to make a prediction using various amounts of data.

Keywords: Big data, k-NN, machine learning, traffic speed prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 528
20 Visual Text Analytics Technologies for Real-Time Big Data: Chronological Evolution and Issues

Authors: Siti Azrina B. A. Aziz, Siti Hafizah A. Hamid

Abstract:

New approaches to analyze and visualize data stream in real-time basis is important in making a prompt decision by the decision maker. Financial market trading and surveillance, large-scale emergency response and crowd control are some example scenarios that require real-time analytic and data visualization. This situation has led to the development of techniques and tools that support humans in analyzing the source data. With the emergence of Big Data and social media, new techniques and tools are required in order to process the streaming data. Today, ranges of tools which implement some of these functionalities are available. In this paper, we present chronological evolution evaluation of technologies for supporting of real-time analytic and visualization of the data stream. Based on the past research papers published from 2002 to 2014, we gathered the general information, main techniques, challenges and open issues. The techniques for streaming text visualization are identified based on Text Visualization Browser in chronological order. This paper aims to review the evolution of streaming text visualization techniques and tools, as well as to discuss the problems and challenges for each of identified tools.

Keywords: Information visualization, visual analytics, text mining, visual text analytics tools, big data visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 495
19 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.

Keywords: Mining Big Data, Big Data, Machine learning, Data Streams, Telecommunication.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1467
18 A Methodology for Investigating Public Opinion Using Multilevel Text Analysis

Authors: William Xiu Shun Wong, Myungsu Lim, Yoonjin Hyun, Chen Liu, Seongi Choi, Dasom Kim, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, many users have begun to frequently share their opinions on diverse issues using various social media. Therefore, numerous governments have attempted to establish or improve national policies according to the public opinions captured from various social media. In this paper, we indicate several limitations of the traditional approaches to analyze public opinion on science and technology and provide an alternative methodology to overcome these limitations. First, we distinguish between the science and technology analysis phase and the social issue analysis phase to reflect the fact that public opinion can be formed only when a certain science and technology is applied to a specific social issue. Next, we successively apply a start list and a stop list to acquire clarified and interesting results. Finally, to identify the most appropriate documents that fit with a given subject, we develop a new logical filter concept that consists of not only mere keywords but also a logical relationship among the keywords. This study then analyzes the possibilities for the practical use of the proposed methodology thorough its application to discover core issues and public opinions from 1,700,886 documents comprising SNS, blogs, news, and discussions.

Keywords: Big data, social network analysis, text mining, topic modeling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1271
17 A Methodology for Automatic Diversification of Document Categories

Authors: Dasom Kim, Chen Liu, Myungsu Lim, Soo-Hyeon Jeon, Byeoung Kug Jeon, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, numerous documents including large volumes of unstructured data and text have been created because of the rapid increase in the use of social media and the Internet. Usually, these documents are categorized for the convenience of users. Because the accuracy of manual categorization is not guaranteed, and such categorization requires a large amount of time and incurs huge costs. Many studies on automatic categorization have been conducted to help mitigate the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorize complex documents with multiple topics because they work on the assumption that individual documents can be categorized into single categories only. Therefore, to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, the learning process employed in these studies involves training using a multi-categorized document set. These methods therefore cannot be applied to the multi-categorization of most documents unless multi-categorized training sets using traditional multi-categorization algorithms are provided. To overcome this limitation, in this study, we review our novel methodology for extending the category of a single-categorized document to multiple categorizes, and then introduce a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.

Keywords: Big Data Analysis, Document Classification, Text Mining, Topic Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1190
16 A Simple User Administration View of Computing Clusters

Authors: Valeria M. Bastos, Myrian A. Costa, Matheus Ambrozio, Nelson F. F. Ebecken

Abstract:

In this paper a very simple and effective user administration view of computing clusters systems is implemented in order of friendly provide the configuration and monitoring of distributed application executions. The user view, the administrator view, and an internal control module create an illusionary management environment for better system usability. The architecture, properties, performance, and the comparison with others software for cluster management are briefly commented.

Keywords: Big data, computing clusters, administration view, user view.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1100
15 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-

Authors: Nieto Bernal Wilson, Carmona Suarez Edgar

Abstract:

The organizations have structured and unstructured information in different formats, sources, and systems. Part of these come from ERP under OLTP processing that support the information system, however these organizations in OLAP processing level, presented some deficiencies, part of this problematic lies in that does not exist interesting into extract knowledge from their data sources, as also the absence of operational capabilities to tackle with these kind of projects.  Data Warehouse and its applications are considered as non-proprietary tools, which are of great interest to business intelligence, since they are repositories basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. The following paper present a structured methodology, simple, inspired from the agile development models as Scrum, XP and AUP. Also the models object relational, spatial data models, and the base line of data modeling under UML and Big data, from this way sought to deliver an agile methodology for the developing of data warehouses, simple and of easy application. The methodology naturally take into account the application of process for the respectively information analysis, visualization and data mining, particularly for patterns generation and derived models from the objects facts structured.

Keywords: Data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1025