Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 95

Big Data Related Abstracts

95 Big Data Strategy for Telco: Network Transformation

Authors: F. Amin, S. Feizi

Abstract:

Big data has the potential to improve the quality of services; enable infrastructure that businesses depend on to adapt continually and efficiently; improve the performance of employees; help organizations better understand customers; and reduce liability risks. Analytics and marketing models of fixed and mobile operators are falling short in combating churn and declining revenue per user. Big Data presents new method to reverse the way and improve profitability. The benefits of Big Data and next-generation network, however, are more exorbitant than improved customer relationship management. Next generation of networks are in a prime position to monetize rich supplies of customer information—while being mindful of legal and privacy issues. As data assets are transformed into new revenue streams will become integral to high performance.

Keywords: Strategy, Big Data, next generation networks, network transformation

Procedia PDF Downloads 204
94 Analysis of Big Data

Authors: Sandeep Sharma, Sarabjit Singh

Abstract:

As per the user demand and growth trends of large free data the storage solutions are now becoming more challenge-able to protect, store and to retrieve data. The days are not so far when the storage companies and organizations are start saying 'no' to store our valuable data or they will start charging a huge amount for its storage and protection. On the other hand as per the environmental conditions it becomes challenge-able to maintain and establish new data warehouses and data centers to protect global warming threats. A challenge of small data is over now, the challenges are big that how to manage the exponential growth of data. In this paper we have analyzed the growth trend of big data and its future implications. We have also focused on the impact of the unstructured data on various concerns and we have also suggested some possible remedies to streamline big data.

Keywords: Big Data, Unstructured data, Variety, volume, velocity

Procedia PDF Downloads 359
93 On Cloud Computing: A Review of the Features

Authors: Assem Abdel Hamed Mousa

Abstract:

The Internet of Things probably already influences your life. And if it doesn’t, it soon will, say computer scientists; Ubiquitous computing names the third wave in computing, just now beginning. First were mainframes, each shared by lots of people. Now we are in the personal computing era, person and machine staring uneasily at each other across the desktop. Next comes ubiquitous computing, or the age of calm technology, when technology recedes into the background of our lives. Alan Kay of Apple calls this "Third Paradigm" computing. Ubiquitous computing is essentially the term for human interaction with computers in virtually everything. Ubiquitous computing is roughly the opposite of virtual reality. Where virtual reality puts people inside a computer-generated world, ubiquitous computing forces the computer to live out here in the world with people. Virtual reality is primarily a horse power problem; ubiquitous computing is a very difficult integration of human factors, computer science, engineering, and social sciences. The approach: Activate the world. Provide hundreds of wireless computing devices per person per office, of all scales (from 1" displays to wall sized). This has required new work in operating systems, user interfaces, networks, wireless, displays, and many other areas. We call our work "ubiquitous computing". This is different from PDA's, dynabooks, or information at your fingertips. It is invisible; everywhere computing that does not live on a personal device of any sort, but is in the woodwork everywhere. The initial incarnation of ubiquitous computing was in the form of "tabs", "pads", and "boards" built at Xerox PARC, 1988-1994. Several papers describe this work, and there are web pages for the Tabs and for the Boards (which are a commercial product now): Ubiquitous computing will drastically reduce the cost of digital devices and tasks for the average consumer. With labor intensive components such as processors and hard drives stored in the remote data centers powering the cloud , and with pooled resources giving individual consumers the benefits of economies of scale, monthly fees similar to a cable bill for services that feed into a consumer’s phone.

Keywords: Ubiquitous Computing, Cloud Computing, Internet, Big Data

Procedia PDF Downloads 255
92 High Performance Computing and Big Data Analytics

Authors: Branci Sarra, Branci Saadia

Abstract:

Because of the multiplied data growth, many computer science tools have been developed to process and analyze these Big Data. High-performance computing architectures have been designed to meet the treatment needs of Big Data (view transaction processing standpoint, strategic, and tactical analytics). The purpose of this article is to provide a historical and global perspective on the recent trend of high-performance computing architectures especially what has a relation with Analytics and Data Mining.

Keywords: Data Analysis, High Performance Computing, Big Data, HPC

Procedia PDF Downloads 346
91 Big Data-Driven Smart Policing: Big Data-Based Patrol Car Dispatching in Abu Dhabi, UAE

Authors: Oualid Walid Ben Ali

Abstract:

Big Data has become one of the buzzwords today. The recent explosion of digital data has led the organization, either private or public, to a new era towards a more efficient decision making. At some point, business decided to use that concept in order to learn what make their clients tick with phrases like ‘sales funnel’ analysis, ‘actionable insights’, and ‘positive business impact’. So, it stands to reason that Big Data was viewed through green (read: money) colored lenses. Somewhere along the line, however someone realized that collecting and processing data doesn’t have to be for business purpose only, but also could be used for other purposes to assist law enforcement or to improve policing or in road safety. This paper presents briefly, how Big Data have been used in the fields of policing order to improve the decision making process in the daily operation of the police. As example, we present a big-data driven system which is sued to accurately dispatch the patrol cars in a geographic environment. The system is also used to allocate, in real-time, the nearest patrol car to the location of an incident. This system has been implemented and applied in the Emirate of Abu Dhabi in the UAE.

Keywords: Intelligent, Police, Big Data, GIS, Big data analytics, Abu Dhabi, UAE, patrol car allocation, dispatching

Procedia PDF Downloads 342
90 Survey on Arabic Sentiment Analysis in Twitter

Authors: Sarah O. Alhumoud, Mawaheb I. Altuwaijri, Tarfa M. Albuhairi, Wejdan M. Alohaideb

Abstract:

Large-scale data stream analysis has become one of the important business and research priorities lately. Social networks like Twitter and other micro-blogging platforms hold an enormous amount of data that is large in volume, velocity and variety. Extracting valuable information and trends out of these data would aid in a better understanding and decision-making. Multiple analysis techniques are deployed for English content. Moreover, one of the languages that produce a large amount of data over social networks and is least analyzed is the Arabic language. The proposed paper is a survey on the research efforts to analyze the Arabic content in Twitter focusing on the tools and methods used to extract the sentiments for the Arabic content on Twitter.

Keywords: Social Networks, Big Data, sentiment analysis, Twitter

Procedia PDF Downloads 339
89 Bioinformatics High Performance Computation and Big Data

Authors: Javed Mohammed

Abstract:

Right now, bio-medical infrastructure lags well behind the curve. Our healthcare system is dispersed and disjointed; medical records are a bit of a mess; and we do not yet have the capacity to store and process the crazy amounts of data coming our way from widespread whole-genome sequencing. And then there are privacy issues. Despite these infrastructure challenges, some researchers are plunging into bio medical Big Data now, in hopes of extracting new and actionable knowledge. They are doing delving into molecular-level data to discover bio markers that help classify patients based on their response to existing treatments; and pushing their results out to physicians in novel and creative ways. Computer scientists and bio medical researchers are able to transform data into models and simulations that will enable scientists for the first time to gain a profound under-standing of the deepest biological functions. Solving biological problems may require High-Performance Computing HPC due either to the massive parallel computation required to solve a particular problem or to algorithmic complexity that may range from difficult to intractable. Many problems involve seemingly well-behaved polynomial time algorithms (such as all-to-all comparisons) but have massive computational requirements due to the large data sets that must be analyzed. High-throughput techniques for DNA sequencing and analysis of gene expression have led to exponential growth in the amount of publicly available genomic data. With the increased availability of genomic data traditional database approaches are no longer sufficient for rapidly performing life science queries involving the fusion of data types. Computing systems are now so powerful it is possible for researchers to consider modeling the folding of a protein or even the simulation of an entire human body. This research paper emphasizes the computational biology's growing need for high-performance computing and Big Data. It illustrates this article’s indispensability in meeting the scientific and engineering challenges of the twenty-first century, and how Protein Folding (the structure and function of proteins) and Phylogeny Reconstruction (evolutionary history of a group of genes) can use HPC that provides sufficient capability for evaluating or solving more limited but meaningful instances. This article also indicates solutions to optimization problems, and benefits Big Data and Computational Biology. The article illustrates the Current State-of-the-Art and Future-Generation Biology of HPC Computing with Big Data.

Keywords: Computational Biology, High Performance, Big Data, Parallel Computation, molecular data

Procedia PDF Downloads 251
88 Big Data’s Mechanistic View of Human Behavior May Displace Traditional Library Missions That Empower Users

Authors: Gabriel Gomez

Abstract:

The very concept of information seeking behavior, and the means by which librarians teach users to gain information, that is information literacy, are at the heart of how libraries deliver information, but big data will forever change human interaction with information and the way such behavior is both studied and taught. Just as importantly, big data will orient the study of behavior towards commercial ends because of a tendency towards instrumentalist views of human behavior, something one might also call a trend towards behaviorism. This oral presentation seeks to explore how the impact of big data on understandings of human behavior might impact a library information science (LIS) view of human behavior and information literacy, and what this might mean for social justice aims and concomitant community action normally at the center of librarianship. The methodology employed here is a non-empirical examination of current understandings of LIS in regards to social justice alongside an examination of the benefits and dangers foreseen with the growth of big data analysis. The rise of big data within the ever-changing information environment encapsulates a shift to a more mechanistic view of human behavior, one that can easily encompass information seeking behavior and information use. As commercial aims displace the important political and ethical aims that are often central to the missions espoused by libraries and the social sciences, the very altruism and power relations found in LIS are at risk. In this oral presentation, an examination of the social justice impulses of librarians regarding power and information demonstrates how such impulses can be challenged by big data, particularly as librarians understand user behavior and promote information literacy. The creeping behaviorist impulse inherent in the emphasis big data places on specific solutions, that is answers to question that ask how, as opposed to larger questions that hint at an understanding of why people learn or use information threaten library information science ideals. Together with the commercial nature of most big data, this existential threat can harm the social justice nature of librarianship.

Keywords: Big Data, behaviorism, Librarianship, library information science

Procedia PDF Downloads 277
87 How to Use Big Data in Logistics Issues

Authors: Mehmet Şimşek, Mehmet Akif Aslan, Eyup Sensoy

Abstract:

Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does Data. Utilizing massive data sets enable companies to get competitive advantages over their adversaries. Out of many area of Big Data usage, logistics has significance role in both commercial sector and military. This paper lays out what big data is and how it is used in both military and commercial logistics.

Keywords: Logistics, Risk management, Big Data, operational efficiency

Procedia PDF Downloads 345
86 Natural Language News Generation from Big Data

Authors: Bastian Haarmann, Likas Sikorski

Abstract:

In this paper, we introduce an NLG application for the automatic creation of ready-to-publish texts from big data. The fully automatic generated stories have a high resemblance to the style in which the human writer would draw up a news story. Topics may include soccer games, stock exchange market reports, weather forecasts and many more. The generation of the texts runs according to the human language production. Each generated text is unique. Ready-to-publish stories written by a computer application can help humans to quickly grasp the outcomes of big data analyses, save time-consuming pre-formulations for journalists and cater to rather small audiences by offering stories that would otherwise not exist.

Keywords: Big Data, Publishing, Natural Language Generation, robotic journalism

Procedia PDF Downloads 328
85 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitation before jumping on the Big Data bandwagon.

Keywords: Analytics, Big Data, Learning Analytics, Hadoop, big data in education

Procedia PDF Downloads 206
84 Big Data: Appearance and Disappearance

Authors: James Moir

Abstract:

The mainstay of Big Data is prediction in that it allows practitioners, researchers, and policy analysts to predict trends based upon the analysis of large and varied sources of data. These can range from changing social and political opinions, patterns in crimes, and consumer behaviour. Big Data has therefore shifted the criterion of success in science from causal explanations to predictive modelling and simulation. The 19th-century science sought to capture phenomena and seek to show the appearance of it through causal mechanisms while 20th-century science attempted to save the appearance and relinquish causal explanations. Now 21st-century science in the form of Big Data is concerned with the prediction of appearances and nothing more. However, this pulls social science back in the direction of a more rule- or law-governed reality model of science and away from a consideration of the internal nature of rules in relation to various practices. In effect Big Data offers us no more than a world of surface appearance and in doing so it makes disappear any context-specific conceptual sensitivity.

Keywords: Surface, Big Data, Epistemology, appearance, disappearance

Procedia PDF Downloads 232
83 Analysis and Forecasting of Bitcoin Price Using Exogenous Data

Authors: T. Mesbah, J-C. Leneveu, A. Chereau, L. Mansart, M. Wyka

Abstract:

Extracting and interpreting information from Big Data represent a stake for years to come in several sectors such as finance. Currently, numerous methods are used (such as Technical Analysis) to try to understand and to anticipate market behavior, with mixed results because it still seems impossible to exactly predict a financial trend. The increase of available data on Internet and their diversity represent a great opportunity for the financial world. Indeed, it is possible, along with these standard financial data, to focus on exogenous data to take into account more macroeconomic factors. Coupling the interpretation of these data with standard methods could allow obtaining more precise trend predictions. In this paper, in order to observe the influence of exogenous data price independent of other usual effects occurring in classical markets, behaviors of Bitcoin users are introduced in a model reconstituting Bitcoin value, which is elaborated and tested for prediction purposes.

Keywords: Data Mining, Social Network, Big Data, Behavioral finance, global economy, bitcoin, financial trends, exogenous data

Procedia PDF Downloads 219
82 Wireless Transmission of Big Data Using Novel Secure Algorithm

Authors: K. Thiagarajan, K. Saranya, A. Veeraiah, B. Sudha

Abstract:

This paper presents a novel algorithm for secure, reliable and flexible transmission of big data in two hop wireless networks using cooperative jamming scheme. Two hop wireless networks consist of source, relay and destination nodes. Big data has to transmit from source to relay and from relay to destination by deploying security in physical layer. Cooperative jamming scheme determines transmission of big data in more secure manner by protecting it from eavesdroppers and malicious nodes of unknown location. The novel algorithm that ensures secure and energy balance transmission of big data, includes selection of data transmitting region, segmenting the selected region, determining probability ratio for each node (capture node, non-capture and eavesdropper node) in every segment, evaluating the probability using binary based evaluation. If it is secure transmission resume with the two- hop transmission of big data, otherwise prevent the attackers by cooperative jamming scheme and transmit the data in two-hop transmission.

Keywords: Big Data, Energy Balance, two-hop transmission, physical layer wireless security, cooperative jamming

Procedia PDF Downloads 348
81 BigCrypt: A Probable Approach of Big Data Encryption to Protect Personal and Business Privacy

Authors: Abdullah Al Mamun, Talal Alkharobi

Abstract:

As data size is growing up, people are became more familiar to store big amount of secret information into cloud storage. Companies are always required to need transfer massive business files from one end to another. We are going to lose privacy if we transmit it as it is and continuing same scenario repeatedly without securing the communication mechanism means proper encryption. Although asymmetric key encryption solves the main problem of symmetric key encryption but it can only encrypt limited size of data which is inapplicable for large data encryption. In this paper we propose a probable approach of pretty good privacy for encrypt big data using both symmetric and asymmetric keys. Our goal is to achieve encrypt huge collection information and transmit it through a secure communication channel for committing the business and personal privacy. To justify our method an experimental dataset from three different platform is provided. We would like to show that our approach is working for massive size of various data efficiently and reliably.

Keywords: Cryptography, Cloud Computing, Big Data, Hadoop, public key

Procedia PDF Downloads 186
80 Survey on Big Data Stream Classification by Decision Tree

Authors: Samira Kalantary, Mansoureh Ghiasabadi Farahani, Sara Taghi-Pour, Mahboubeh Shamsi

Abstract:

Nowadays, the development of computers technology and its recent applications provide access to new types of data, which have not been considered by the traditional data analysts. Two particularly interesting characteristics of such data sets include their huge size and streaming nature .Incremental learning techniques have been used extensively to address the data stream classification problem. This paper presents a concise survey on the obstacles and the requirements issues classifying data streams with using decision tree. The most important issue is to maintain a balance between accuracy and efficiency, the algorithm should provide good classification performance with a reasonable time response.

Keywords: Big Data, classification, data streams, Decision Tree

Procedia PDF Downloads 275
79 A Study of Cloud Computing Solution for Transportation Big Data Processing

Authors: Ilgin Gökaşar, Saman Ghaffarian

Abstract:

The need for fast processed big data of transportation ridership (eg., smartcard data) and traffic operation (e.g., traffic detectors data) which requires a lot of computational power is incontrovertible in Intelligent Transportation Systems. Nowadays cloud computing is one of the important subjects and popular information technology solution for data processing. It enables users to process enormous measure of data without having their own particular computing power. Thus, it can also be a good selection for transportation big data processing as well. This paper intends to examine how the cloud computing can enhance transportation big data process with contrasting its advantages and disadvantages, and discussing cloud computing features.

Keywords: Cloud Computing, Big Data, Intelligent Transportation Systems, ITS, traffic data processing

Procedia PDF Downloads 256
78 Sensor Data Analysis for a Large Mining Major

Authors: Sudipto Shanker Dasgupta

Abstract:

One of the largest mining companies wanted to look at health analytics for their driverless trucks. These trucks were the key to their supply chain logistics. The automated trucks had multi-level sub-assemblies which would send out sensor information. The use case that was worked on was to capture the sensor signal from the truck subcomponents and analyze the health of the trucks from repair and replacement purview. Open source software was used to stream the data into a clustered Hadoop setup in Amazon Web Services cloud and Apache Spark SQL was used to analyze the data. All of this was achieved through a 10 node amazon 32 core, 64 GB RAM setup real-time analytics was achieved on ‘300 million records’. To check the scalability of the system, the cluster was increased to 100 node setup. This talk will highlight how Open Source software was used to achieve the above use case and the insights on the high data throughput on a cloud set up.

Keywords: Big Data, Data Science, sensor data, Hadoop, streaming analytics, high throughput

Procedia PDF Downloads 245
77 Big Data Analysis with RHadoop

Authors: Dong Hoon Lim, Ji Eun Shin, Byung Ho Jung

Abstract:

It is almost impossible to store or analyze big data increasing exponentially with traditional technologies. Hadoop is a new technology to make that possible. R programming language is by far the most popular statistical tool for big data analysis based on distributed processing with Hadoop technology. With RHadoop that integrates R and Hadoop environment, we implemented parallel multiple regression analysis with different sizes of actual data. Experimental results showed our RHadoop system was much faster as the number of data nodes increases. We also compared the performance of our RHadoop with lm function and big lm packages available on big memory. The results showed that our RHadoop was faster than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases.

Keywords: Big Data, Hadoop, parallel regression analysis, RHadoop

Procedia PDF Downloads 294
76 Big Data Analysis with Rhipe

Authors: Dong Hoon Lim, Ji Eun Shin, Byung Ho Jung

Abstract:

Rhipe that integrates R and Hadoop environment made it possible to process and analyze massive amounts of data using a distributed processing environment. In this paper, we implemented multiple regression analysis using Rhipe with various data sizes of actual data. Experimental results for comparing the performance of our Rhipe with stats and biglm packages available on bigmemory, showed that our Rhipe was more fast than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases. We also compared the computing speeds of pseudo-distributed and fully-distributed modes for configuring Hadoop cluster. The results showed that fully-distributed mode was faster than pseudo-distributed mode, and computing speeds of fully-distributed mode were faster as the number of data nodes increases.

Keywords: Big Data, Hadoop, parallel regression analysis, Rhipe

Procedia PDF Downloads 382
75 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-

Authors: Nieto Bernal Wilson, Carmona Suarez Edgar

Abstract:

The organizations have structured and unstructured information in different formats, sources, and systems. Part of these come from ERP under OLTP processing that support the information system, however these organizations in OLAP processing level, presented some deficiencies, part of this problematic lies in that does not exist interesting into extract knowledge from their data sources, as also the absence of operational capabilities to tackle with these kind of projects.  Data Warehouse and its applications are considered as non-proprietary tools, which are of great interest to business intelligence, since they are repositories basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. The following paper present a structured methodology, simple, inspired from the agile development models as Scrum, XP and AUP. Also the models object relational, spatial data models, and the base line of data modeling under UML and Big data, from this way sought to deliver an agile methodology for the developing of data warehouses, simple and of easy application. The methodology naturally take into account the application of process for the respectively information analysis, visualization and data mining, particularly for patterns generation and derived models from the objects facts structured.

Keywords: Big Data, data warehouse, model data, object fact, object relational fact, process developed data warehouse

Procedia PDF Downloads 271
74 Grid and Market Integration of Large Scale Wind Farms using Advanced Predictive Data Mining Techniques

Authors: Umit Cali

Abstract:

The integration of intermittent energy sources like wind farms into the electricity grid has become an important challenge for the utilization and control of electric power systems, because of the fluctuating behaviour of wind power generation. Wind power predictions improve the economic and technical integration of large amounts of wind energy into the existing electricity grid. Trading, balancing, grid operation, controllability and safety issues increase the importance of predicting power output from wind power operators. Therefore, wind power forecasting systems have to be integrated into the monitoring and control systems of the transmission system operator (TSO) and wind farm operators/traders. The wind forecasts are relatively precise for the time period of only a few hours, and, therefore, relevant with regard to Spot and Intraday markets. In this work predictive data mining techniques are applied to identify a statistical and neural network model or set of models that can be used to predict wind power output of large onshore and offshore wind farms. These advanced data analytic methods helps us to amalgamate the information in very large meteorological, oceanographic and SCADA data sets into useful information and manageable systems. Accurate wind power forecasts are beneficial for wind plant operators, utility operators, and utility customers. An accurate forecast allows grid operators to schedule economically efficient generation to meet the demand of electrical customers. This study is also dedicated to an in-depth consideration of issues such as the comparison of day ahead and the short-term wind power forecasting results, determination of the accuracy of the wind power prediction and the evaluation of the energy economic and technical benefits of wind power forecasting.

Keywords: Artificial Intelligence, Data Mining, Forecasting, wind power, Big Data, Renewable Energy Sources, Energy Economics, Power Grids, power trading

Procedia PDF Downloads 391
73 A Methodology for Investigating Public Opinion Using Multilevel Text Analysis

Authors: Chen Liu, Kee-Young Kwahk, Yoonjin Hyun, Seongi Choi, Namgyu Kim, Dasom Kim, Myungsu Lim, William Xiu Shun Wong

Abstract:

Recently, many users have begun to frequently share their opinions on diverse issues using various social media. Therefore, numerous governments have attempted to establish or improve national policies according to the public opinions captured from various social media. In this paper, we indicate several limitations of the traditional approaches to analyze public opinion on science and technology and provide an alternative methodology to overcome these limitations. First, we distinguish between the science and technology analysis phase and the social issue analysis phase to reflect the fact that public opinion can be formed only when a certain science and technology is applied to a specific social issue. Next, we successively apply a start list and a stop list to acquire clarified and interesting results. Finally, to identify the most appropriate documents that fit with a given subject, we develop a new logical filter concept that consists of not only mere keywords but also a logical relationship among the keywords. This study then analyzes the possibilities for the practical use of the proposed methodology thorough its application to discover core issues and public opinions from 1,700,886 documents comprising SNS, blogs, news, and discussions.

Keywords: Text Mining, Big Data, Social Network Analysis, topic modeling

Procedia PDF Downloads 163
72 Social Data Aggregator and Locator of Knowledge (STALK)

Authors: Rashmi Raghunandan, Sanjana Shankar, Rakshitha K. Bhat

Abstract:

Social media contributes a vast amount of data and information about individuals to the internet. This project will greatly reduce the need for unnecessary manual analysis of large and diverse social media profiles by filtering out and combining the useful information from various social media profiles, eliminating irrelevant data. It differs from the existing social media aggregators in that it does not provide a consolidated view of various profiles. Instead, it provides consolidated INFORMATION derived from the subject’s posts and other activities. It also allows analysis over multiple profiles and analytics based on several profiles. We strive to provide a query system to provide a natural language answer to questions when a user does not wish to go through the entire profile. The information provided can be filtered according to the different use cases it is used for.

Keywords: Analysis, Social Network, Big Data, Facebook, LinkedIn, git

Procedia PDF Downloads 268
71 A Simple User Administration View of Computing Clusters

Authors: Valeria M. Bastos, Myrian A. Costa, Matheus Ambrozio, Nelson F. F. Ebecken

Abstract:

In this paper a very simple and effective user administration view of computing clusters systems is implemented in order of friendly provide the configuration and monitoring of distributed application executions. The user view, the administrator view, and an internal control module create an illusionary management environment for better system usability. The architecture, properties, performance, and the comparison with others software for cluster management are briefly commented.

Keywords: Big Data, computing clusters, administration view, user view

Procedia PDF Downloads 190
70 Opening up Government Datasets for Big Data Analysis to Support Policy Decisions

Authors: K. Hardy, A. Maurushat

Abstract:

Policy makers are increasingly looking to make evidence-based decisions. Evidence-based decisions have historically used rigorous methodologies of empirical studies by research institutes, as well as less reliable immediate survey/polls often with limited sample sizes. As we move into the era of Big Data analytics, policy makers are looking to different methodologies to deliver reliable empirics in real-time. The question is not why did these people do this for the last 10 years, but why are these people doing this now, and if the this is undesirable, and how can we have an impact to promote change immediately. Big data analytics rely heavily on government data that has been released in to the public domain. The open data movement promises greater productivity and more efficient delivery of services; however, Australian government agencies remain reluctant to release their data to the general public. This paper considers the barriers to releasing government data as open data, and how these barriers might be overcome.

Keywords: Productivity, open data, Big Data, Data Governance

Procedia PDF Downloads 232
69 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.

Keywords: Machine Learning, Telecommunication, Big Data, mining big data

Procedia PDF Downloads 225
68 Adaptive Swarm Balancing Algorithms for Rare-Event Prediction in Imbalanced Healthcare Data

Authors: Jinyan Li, Simon Fong, Raymond Wong, Mohammed Sabah, Fiaidhi Jinan

Abstract:

Clinical data analysis and forecasting have make great contributions to disease control, prevention and detection. However, such data usually suffer from highly unbalanced samples in class distributions. In this paper, we target at the binary imbalanced dataset, where the positive samples take up only the minority. We investigate two different meta-heuristic algorithms, particle swarm optimization and bat-inspired algorithm, and combine both of them with the synthetic minority over-sampling technique (SMOTE) for processing the datasets. One approach is to process the full dataset as a whole. The other is to split up the dataset and adaptively process it one segment at a time. The experimental results reveal that while the performance improvements obtained by the former methods are not scalable to larger data scales, the later one, which we call Adaptive Swarm Balancing Algorithms, leads to significant efficiency and effectiveness improvements on large datasets. We also find it more consistent with the practice of the typical large imbalanced medical datasets. We further use the meta-heuristic algorithms to optimize two key parameters of SMOTE. Leading to more credible performances of the classifier, and shortening the running time compared with the brute-force method.

Keywords: Big Data, Imbalanced dataset, meta-heuristic algorithm, SMOTE

Procedia PDF Downloads 222
67 MapReduce Logistic Regression Algorithms with RHadoop

Authors: Dong Hoon Lim, Byung Ho Jung

Abstract:

Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. Logistic regression is used extensively in numerous disciplines, including the medical and social science fields. In this paper, we address the problem of estimating parameters in the logistic regression based on MapReduce framework with RHadoop that integrates R and Hadoop environment applicable to large scale data. There exist three learning algorithms for logistic regression, namely Gradient descent method, Cost minimization method and Newton-Rhapson's method. The Newton-Rhapson's method does not require a learning rate, while gradient descent and cost minimization methods need to manually pick a learning rate. The experimental results demonstrated that our learning algorithms using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also compared the performance of our Newton-Rhapson's method with gradient descent and cost minimization methods. The results showed that our newton's method appeared to be the most robust to all data tested.

Keywords: Big Data, Logistic Regression, MapReduce, RHadoop

Procedia PDF Downloads 160
66 Improved K-Means Clustering Algorithm Using RHadoop with Combiner

Authors: Dong Hoon Lim, Ji Eun Shin

Abstract:

Data clustering is a common technique used in data analysis and is used in many applications, such as artificial intelligence, pattern recognition, economics, ecology, psychiatry and marketing. K-means clustering is a well-known clustering algorithm aiming to cluster a set of data points to a predefined number of clusters. In this paper, we implement K-means algorithm based on MapReduce framework with RHadoop to make the clustering method applicable to large scale data. RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. The main idea is to introduce a combiner as a function of our map output to decrease the amount of data needed to be processed by reducers. The experimental results demonstrated that K-means algorithm using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also showed that our K-means algorithm using RHadoop with combiner was faster than regular algorithm without combiner as the size of data set increases.

Keywords: Big Data, K-Means Clustering, RHadoop, combiner

Procedia PDF Downloads 239