Search results for: crowdsourced data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25118

Search results for: crowdsourced data

25118 The Changes in Motivations and the Use of Translation Strategies in Crowdsourced Translation: A Case Study on Global Voices’ Chinese Translation Project

Authors: Ya-Mei Chen

Abstract:

Online crowdsourced translation, an innovative translation practice brought by Web 2.0 technologies and the democratization of information, has become increasingly popular in the Internet era. Carried out by grass-root internet users, crowdsourced translation contains fundamentally different features from its off-line traditional counterpart, such as voluntary participation and parallel collaboration. To better understand such a participatory and collaborative nature, this paper will use the online Chinese translation project of Global Voices as a case study to investigate the following issues: (1) the changes in volunteer translators’ and reviewers’ motivations for participation, (2) translators’ and reviewers’ use of translation strategies and (3) the correlations of translators’ and reviewers’ motivations and strategies with the organizational mission, the translation style guide, the translator-reviewer interaction, the mediation of the translation platform and various types of capital within the translation field. With an aim to systematically explore the above three issues, this paper will collect both quantitative and qualitative data and then draw upon Engestrom’s activity theory and Bourdieu’s field theory as a theoretical framework to analyze the data in question. An online anonymous questionnaire will be conducted to obtain the quantitative data. The questionnaire will contain questions related to volunteer translators’ and reviewers’ backgrounds, participation motivations, translation strategies and mutual relations as well as the operation of the translation platform. Concerning the qualitative data, they will come from (1) a comparative study between some English news texts published on Global Voices and their Chinese translations, (2) an analysis of the online discussion forum associated with Global Voices’ Chinese translation project and (3) the information about the project’s translation mission and guidelines. It is hoped that this research, through a detailed sociological analysis of a cause-driven crowdsourced translation project, can enable translation researchers and practitioners to adequately meet the translation challenges appearing in the digital age.

Keywords: crowdsourced translation, global voices, motivation, translation strategies

Procedia PDF Downloads 370
25117 Open Innovation for Crowdsourced Product Development: The Case Study of Quirky.com

Authors: Ana Bilandzic, Marcus Foth, Greg Hearn

Abstract:

In a narrow sense, innovation is the invention and commercialisation of a new product or service in the marketplace. The literature suggests places that support knowledge exchange and social interaction, e.g. coffee shops, to nurture innovative ideas. With the widespread success of Internet, interpersonal communication and interaction changed. Online platforms complement physical places for idea exchange and innovation – the rise of hybrid, ‘net localities.’ Further, since its introduction in 2003 by Chesbrough, the concept of open innovation received increased attention as a topic in academic research as well as an innovation strategy applied by companies. Open innovation allows companies to seek and release intellectual property and new ideas from outside of their own company. As a consequence, the innovation process is no longer only managed within the company, but it is pursued in a co-creation process with customers, suppliers, and other stakeholders. Quirky.com (Quirky), a company founded by Ben Kaufman in 2009, recognised the opportunity given by the Internet for knowledge exchange and open innovation. Quirky developed an online platform that makes innovation available to everyone. This paper reports on a study that analysed Quirky’s business process in an extended event-driven process chain (eEPC). The aim was to determine how the platform enabled crowdsourced innovation for physical products on the Internet. The analysis reveals that key elements of the business model are based on open innovation. Quirky is an example of how open innovation can support crowdsourced and crowdfunded product ideation, development and selling. The company opened up various stages in the innovation process to its members to contribute in the product development, e.g. product ideation, design, and market research. Throughout the process, members earn influence through participating in the product development. Based on the influence they receive, shares on the product’s turnover. The outcomes of the study’s analysis highlighted certain benefits of open innovation for product development. The paper concludes with recommendations for future research to look into opportunities of open innovation approaches to be adopted by tertiary institutions as a novel way to commercialise research intellectual property.

Keywords: business process, crowdsourced innovation, open innovation, Quirky

Procedia PDF Downloads 227
25116 Crowdsourced Economic Valuation of the Recreational Benefits of Constructed Wetlands

Authors: Andrea Ghermandi

Abstract:

Constructed wetlands have long been recognized as sources of ancillary benefits such as support for recreational activities. To date, there is a lack of quantitative understanding of the extent and welfare impact of such benefits. Here, it is shown how geotagged, passively crowdsourced data from online social networks (e.g., Flickr and Panoramio) and Geographic Information Systems (GIS) techniques can: (1) be used to infer annual recreational visits to 273 engineered wetlands worldwide; and (2) be integrated with non-market economic valuation techniques (e.g., travel cost method) to infer the monetary value of recreation in these systems. Counts of social media photo-user-days are highly correlated with the number of observed visits in 62 engineered wetlands worldwide (Pearson’s r = 0.811; p-value < 0.001). The estimated, mean willingness to pay for access to 115 wetlands ranges between $5.3 and $374. In 50% of the investigated wetlands providing polishing treatment to advanced municipal wastewater, the present value of such benefits exceeds that of the capital, operation and maintenance costs (lifetime = 45 years; discount rate = 6%), indicating that such systems are sources of net societal benefits even before factoring in benefits derived from water quality improvement and storage. Based on the above results, it is argued that recreational benefits should be taken into account in the design and management of constructed wetlands, as well as when such green infrastructure systems are compared with conventional wastewater treatment solutions.

Keywords: constructed wetlands, cultural ecosystem services, ecological engineering, social media

Procedia PDF Downloads 128
25115 Using Crowd-Sourced Data to Assess Safety in Developing Countries: The Case Study of Eastern Cairo, Egypt

Authors: Mahmoud Ahmed Farrag, Ali Zain Elabdeen Heikal, Mohamed Shawky Ahmed, Ahmed Osama Amer

Abstract:

Crowd-sourced data refers to data that is collected and shared by a large number of individuals or organizations, often through the use of digital technologies such as mobile devices and social media. The shortage in crash data collection in developing countries makes it difficult to fully understand and address road safety issues in these regions. In developing countries, crowd-sourced data can be a valuable tool for improving road safety, particularly in urban areas where the majority of road crashes occur. This study is -to our best knowledge- the first to develop safety performance functions using crowd-sourced data by adopting a negative binomial structure model and the Full Bayes model to investigate traffic safety for urban road networks and provide insights into the impact of roadway characteristics. Furthermore, as a part of the safety management process, network screening has been undergone through applying two different methods to rank the most hazardous road segments: PCR method (adopted in the Highway Capacity Manual HCM) as well as a graphical method using GIS tools to compare and validate. Lastly, recommendations were suggested for policymakers to ensure safer roads.

Keywords: crowdsourced data, road crashes, safety performance functions, Full Bayes models, network screening

Procedia PDF Downloads 52
25114 Road Traffic Accidents Analysis in Mexico City through Crowdsourcing Data and Data Mining Techniques

Authors: Gabriela V. Angeles Perez, Jose Castillejos Lopez, Araceli L. Reyes Cabello, Emilio Bravo Grajales, Adriana Perez Espinosa, Jose L. Quiroz Fabian

Abstract:

Road traffic accidents are among the principal causes of traffic congestion, causing human losses, damages to health and the environment, economic losses and material damages. Studies about traditional road traffic accidents in urban zones represents very high inversion of time and money, additionally, the result are not current. However, nowadays in many countries, the crowdsourced GPS based traffic and navigation apps have emerged as an important source of information to low cost to studies of road traffic accidents and urban congestion caused by them. In this article we identified the zones, roads and specific time in the CDMX in which the largest number of road traffic accidents are concentrated during 2016. We built a database compiling information obtained from the social network known as Waze. The methodology employed was Discovery of knowledge in the database (KDD) for the discovery of patterns in the accidents reports. Furthermore, using data mining techniques with the help of Weka. The selected algorithms was the Maximization of Expectations (EM) to obtain the number ideal of clusters for the data and k-means as a grouping method. Finally, the results were visualized with the Geographic Information System QGIS.

Keywords: data mining, k-means, road traffic accidents, Waze, Weka

Procedia PDF Downloads 416
25113 Valuing Cultural Ecosystem Services of Natural Treatment Systems Using Crowdsourced Data

Authors: Andrea Ghermandi

Abstract:

Natural treatment systems such as constructed wetlands and waste stabilization ponds are increasingly used to treat water and wastewater from a variety of sources, including stormwater and polluted surface water. The provision of ancillary benefits in the form of cultural ecosystem services makes these systems unique among water and wastewater treatment technologies and greatly contributes to determine their potential role in promoting sustainable water management practices. A quantitative analysis of these benefits, however, has been lacking in the literature. Here, a critical assessment of the recreational and educational benefits in natural treatment systems is provided, which combines observed public use from a survey of managers and operators with estimated public use as obtained using geotagged photos from social media as a proxy for visitation rates. Geographic Information Systems (GIS) are used to characterize the spatial boundaries of 273 natural treatment systems worldwide. Such boundaries are used as input for the Application Program Interfaces (APIs) of two popular photo-sharing websites (Flickr and Panoramio) in order to derive the number of photo-user-days, i.e., the number of yearly visits by individual photo users in each site. The adequateness and predictive power of four univariate calibration models using the crowdsourced data as a proxy for visitation are evaluated. A high correlation is found between photo-user-days and observed annual visitors (Pearson's r = 0.811; p-value < 0.001; N = 62). Standardized Major Axis (SMA) regression is found to outperform Ordinary Least Squares regression and count data models in terms of predictive power insofar as standard verification statistics – such as the root mean square error of prediction (RMSEP), the mean absolute error of prediction (MAEP), the reduction of error (RE), and the coefficient of efficiency (CE) – are concerned. The SMA regression model is used to estimate the intensity of public use in all 273 natural treatment systems. System type, influent water quality, and area are found to statistically affect public use, consistently with a priori expectations. Publicly available information regarding the home location of the sampled visitors is derived from their social media profiles and used to infer the distance they are willing to travel to visit the natural treatment systems in the database. Such information is analyzed using the travel cost method to derive monetary estimates of the recreational benefits of the investigated natural treatment systems. Overall, the findings confirm the opportunities arising from an integrated design and management of natural treatment systems, which combines the objectives of water quality enhancement and provision of cultural ecosystem services through public use in a multi-functional approach and compatibly with the need to protect public health.

Keywords: constructed wetlands, cultural ecosystem services, ecological engineering, waste stabilization ponds

Procedia PDF Downloads 180
25112 Recreation and Environmental Quality of Tropical Wetlands: A Social Media Based Spatial Analysis

Authors: Michael Sinclair, Andrea Ghermandi, Sheela A. Moses, Joseph Sabu

Abstract:

Passively crowdsourced data, such as geotagged photographs from social media, represent an opportunistic source of location-based and time-specific behavioral data for ecosystem services analysis. Such data have innovative applications for environmental management and protection, which are replicable at wide spatial scales and in the context of both developed and developing countries. Here we test one such innovation, based on the analysis of the metadata of online geotagged photographs, to investigate the provision of recreational services by the entire network of wetland ecosystems in the state of Kerala, India. We estimate visitation to individual wetlands state-wide and extend, for the first time to a developing region, the emerging application of cultural ecosystem services modelling using data from social media. The impacts of restoration of wetland areal extension and water quality improvement are explored as a means to inform more sustainable management strategies. Findings show that improving water quality to a level suitable for the preservation of wildlife and fisheries could increase annual visits by 350,000, an increase of 13% in wetland visits state-wide, while restoring previously encroached wetland area could result in a 7% increase in annual visits, corresponding to 49,000 visitors, in the Ashtamudi and Vembanad lakes alone, two large coastal Ramsar wetlands in Kerala. We discuss how passive crowdsourcing of social media data has the potential to improve current ecosystem service analyses and environmental management practices also in the context of developing countries.

Keywords: coastal wetlands, cultural ecosystem services, India, passive crowdsourcing, social media, wetland restoration

Procedia PDF Downloads 154
25111 A Crowdsourced Homeless Data Collection System and Its Econometric Analysis: Strengthening Inclusive Public Administration Policies

Authors: Praniil Nagaraj

Abstract:

This paper proposes a method to collect homeless data using crowdsourcing and presents an approach to analyze the data, demonstrating its potential to strengthen existing and future policies aimed at promoting socio-economic equilibrium. This paper's contributions can be categorized into three main areas. Firstly, a unique method for collecting homeless data is introduced, utilizing a user-friendly smartphone app (currently available for Android). The app enables the general public to quickly record information about homeless individuals, including the number of people and details about their living conditions. The collected data, including date, time, and location, is anonymized and securely transmitted to the cloud. It is anticipated that an increasing number of users motivated to contribute to society will adopt the app, thus expanding the data collection efforts. Duplicate data is addressed through simple classification methods, and historical data is utilized to fill in missing information. The second contribution of this paper is the description of data analysis techniques applied to the collected data. By combining this new data with existing information, statistical regression analysis is employed to gain insights into various aspects, such as distinguishing between unsheltered and sheltered homeless populations, as well as examining their correlation with factors like unemployment rates, housing affordability, and labor demand. Initial data is collected in San Francisco, while pre-existing information is drawn from three cities: San Francisco, New York City, and Washington D.C., facilitating the conduction of simulations. The third contribution focuses on demonstrating the practical implications of the data processing results. The challenges faced by key stakeholders, including charitable organizations and local city governments, are taken into consideration. Two case studies are presented as examples. The first case study explores improving the efficiency of food and necessities distribution, as well as medical assistance, driven by charitable organizations. The second case study examines the correlation between micro-geographic budget expenditure by local city governments and homeless information to justify budget allocation and expenditures. The ultimate objective of this endeavor is to enable the continuous enhancement of the quality of life for the underprivileged. It is hoped that through increased crowdsourcing of data from the public, the Generosity Curve and the Need Curve will intersect, leading to a better world for all.

Keywords: crowdsourcing, homelessness, socio-economic policies, statistical analysis

Procedia PDF Downloads 42
25110 Crowdsensing Project in the Brazilian Municipality of Florianópolis for the Number of Visitors Measurement

Authors: Carlos Roberto De Rolt, Julio da Silva Dias, Rafael Tezza, Luca Foschini, Matteo Mura

Abstract:

The seasonal population fluctuation presents a challenge to touristic cities since the number of inhabitants can double according to the season. The aim of this work is to develop a model that correlates the waste collected with the population of the city and also allow cooperation between the inhabitants and the local government. The model allows public managers to evaluate the impact of the seasonal population fluctuation on waste generation and also to improve planning resource utilization throughout the year. The study uses data from the company that collects the garbage in Florianópolis, a Brazilian city that presents the profile of a city that attracts tourists due to numerous beaches and warm weather. The fluctuations are caused by the number of people that come to the city throughout the year for holidays, summer time vacations or business events. Crowdsensing will be accomplished through smartphones with access to an app for data collection, with voluntary participation of the population. Crowdsensing participants can access information collected in waves for this portal. Crowdsensing represents an innovative and participatory approach which involves the population in gathering information to improve the quality of life. The management of crowdsensing solutions plays an essential role given the complexity to foster collaboration, establish available sensors and collect and process the collected data. Practical implications of this tool described in this paper refer, for example, to the management of seasonal tourism in a large municipality, whose public services are impacted by the floating of the population. Crowdsensing and big data support managers in predicting the arrival, permanence, and movement of people in a given urban area. Also, by linking crowdsourced data to databases from other public service providers - e.g., water, garbage collection, electricity, public transport, telecommunications - it is possible to estimate the floating of the population of an urban area affected by seasonal tourism. This approach supports the municipality in increasing the effectiveness of resource allocation while, at the same time, increasing the quality of the service as perceived by citizens and tourists.

Keywords: big data, dashboards, floating population, smart city, urban management solutions

Procedia PDF Downloads 287
25109 Crowdalert: An Android Application for Increasing the Awareness and Response Initiatives of the Citizens through Crowdsourcing

Authors: John Benedict Bernardo

Abstract:

Crowdsourcing is a way of collecting information provided by the volunteers. This crowdsourced information has the capacity to increase the people’s situational awareness in times of disasters. The research reflected in this paper strives to demonstrate the benefits of crowdsourcing during natural disasters and the ways of utilizing it for disaster response. Shared information regarding natural disasters from social media is often scattered as the inputs from these media are uncategorized. For this reason, the study aims to equip the citizens a medium that is solely intended for sharing and/or obtaining natural disaster-related information. Ergo, an android application was developed to gather and publicize this volunteered information. The capability of crowdsourcing and the effectiveness of the application were evaluated and the result shows overwhelming agreement that this study is indeed efficient in increasing the awareness and response initiatives of the citizens during natural disasters.

Keywords: crowdsourcing, natural disasters, mobile application, social media

Procedia PDF Downloads 318
25108 Data Transformations in Data Envelopment Analysis

Authors: Mansour Mohammadpour

Abstract:

Data transformation refers to the modification of any point in a data set by a mathematical function. When applying transformations, the measurement scale of the data is modified. Data transformations are commonly employed to turn data into the appropriate form, which can serve various functions in the quantitative analysis of the data. This study addresses the investigation of the use of data transformations in Data Envelopment Analysis (DEA). Although data transformations are important options for analysis, they do fundamentally alter the nature of the variable, making the interpretation of the results somewhat more complex.

Keywords: data transformation, data envelopment analysis, undesirable data, negative data

Procedia PDF Downloads 20
25107 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi

Abstract:

Big data is one of the emerging technology, which collects the data from various sensors and those data will be used in many fields. Data retrieval is one of the major issue where there is a need to extract the exact data as per the need. In this paper, large amount of data set is processed by using the feature selection. Feature selection helps to choose the data which are actually needed to process and execute the task. The key value is the one which helps to point out exact data available in the storage space. Here the available data is streamed and R-Center is proposed to achieve this task.

Keywords: big data, key value, feature selection, retrieval, performance

Procedia PDF Downloads 340
25106 Dual-use UAVs in Armed Conflicts: Opportunities and Risks for Cyber and Electronic Warfare

Authors: Piret Pernik

Abstract:

Based on strategic, operational, and technical analysis of the ongoing armed conflict in Ukraine, this paper will examine the opportunities and risks of using small commercial drones (dual-use unmanned aerial vehicles, UAV) for military purposes. The paper discusses the opportunities and risks in the information domain, encompassing both cyber and electromagnetic interference and attacks. The paper will draw conclusions on a possible strategic impact to the battlefield outcomes in the modern armed conflicts by the widespread use of dual-use UAVs. This article will contribute to filling the gap in the literature by examining based on empirical data cyberattacks and electromagnetic interference. Today, more than one hundred states and non-state actors possess UAVs ranging from low cost commodity models, widely are dual-use, available and affordable to anyone, to high-cost combat UAVs (UCAV) with lethal kinetic strike capabilities, which can be enhanced with Artificial Intelligence (AI) and Machine Learning (ML). Dual-use UAVs have been used by various actors for intelligence, reconnaissance, surveillance, situational awareness, geolocation, and kinetic targeting. Thus they function as force multipliers enabling kinetic and electronic warfare attacks and provide comparative and asymmetric operational and tactical advances. Some go as far as argue that automated (or semi-automated) systems can change the character of warfare, while others observe that the use of small drones has not changed the balance of power or battlefield outcomes. UAVs give considerable opportunities for commanders, for example, because they can be operated without GPS navigation, makes them less vulnerable and dependent on satellite communications. They can and have been used to conduct cyberattacks, electromagnetic interference, and kinetic attacks. However, they are highly vulnerable to those attacks themselves. So far, strategic studies, literature, and expert commentary have overlooked cybersecurity and electronic interference dimension of the use of dual use UAVs. The studies that link technical analysis of opportunities and risks with strategic battlefield outcomes is missing. It is expected that dual use commercial UAV proliferation in armed and hybrid conflicts will continue and accelerate in the future. Therefore, it is important to understand specific opportunities and risks related to the crowdsourced use of dual-use UAVs, which can have kinetic effects. Technical countermeasures to protect UAVs differ depending on a type of UAV (small, midsize, large, stealth combat), and this paper will offer a unique analysis of small UAVs both from the view of opportunities and risks for commanders and other actors in armed conflict.

Keywords: dual-use technology, cyber attacks, electromagnetic warfare, case studies of cyberattacks in armed conflicts

Procedia PDF Downloads 102
25105 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitation before jumping on the Big Data bandwagon.

Keywords: big data, learning analytics, analytics, big data in education, Hadoop

Procedia PDF Downloads 424
25104 Analysis of Big Data

Authors: Sandeep Sharma, Sarabjit Singh

Abstract:

As per the user demand and growth trends of large free data the storage solutions are now becoming more challenge-able to protect, store and to retrieve data. The days are not so far when the storage companies and organizations are start saying 'no' to store our valuable data or they will start charging a huge amount for its storage and protection. On the other hand as per the environmental conditions it becomes challenge-able to maintain and establish new data warehouses and data centers to protect global warming threats. A challenge of small data is over now, the challenges are big that how to manage the exponential growth of data. In this paper we have analyzed the growth trend of big data and its future implications. We have also focused on the impact of the unstructured data on various concerns and we have also suggested some possible remedies to streamline big data.

Keywords: big data, unstructured data, volume, variety, velocity

Procedia PDF Downloads 547
25103 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, WangQun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSQL), and gives 6 data cleaning methods based on these algorithms.

Keywords: data cleaning, dependency rules, violation data discovery, data repair

Procedia PDF Downloads 563
25102 Quality Assurance in Translation Crowdsourcing: The TED Open Translation Project

Authors: Ya-Mei Chen

Abstract:

The participatory culture enabled by Web 2.0 technologies has led to the emergence of online translation crowdsourcing, which mainly relies on the collective intelligence of volunteer translators. Due to the fact that many volunteer translators do not have formal translator training, concerns have been raised about the quality of crowdsourced translations. Some empirical research has been done to examine the translation quality of for-profit crowdsourcing initiatives. However, quality assurance of non-profit translation crowdsourcing has rarely been explored in detail. Using the TED Open Translation Project as a case study, this paper investigates how the translation-review-approval method adopted by TED can (1) direct the volunteer translators’ use of translation strategies as well as the reviewers’ adoption of revising strategies and (2) shape the final translation products. To well examine the actual effect of TED’s translation-review-approval method, this paper will focus on its two major quality assurance mechanisms, that is, TED’s style guidelines and quality review. Based on an anonymous questionnaire, this research will first explore whether the volunteer translators and reviewers are aware of the style guidelines and whether their use of translation strategies is similar to that advised in the guidelines. The questionnaire, which will be posted online, will consist of two parts: demographic information and translation strategies. The invitations to complete it will then be distributed through TED Translator Facebook groups. With an aim to investigate if the style guidelines have any substantial impacts on actual subtitling practices, a comparison will be made between the original English subtitles of 20 TED talks (each around 5 to 7 minutes) and their Chinese subtitle translations to identify regularly adopted strategies. Concerning the function of the reviewing stage, a comparative study will be conducted between the drafts of Chinese subtitles for 10 short English talks and the revised versions of these drafts so as to examine the actual revising strategies and their effect on translation quality. According to the results obtained from the questionnaire and textual comparisons, this paper will provide in-depth analysis of quality assurance of the TED Open Translation Project. It is hoped that this research, through a detailed investigation of non-profit translation crowdsourcing, can enable translation researchers and practitioners to have a better understanding of quality control in translation crowdsourcing in the digital age.

Keywords: quality assurance, TED, translation crowdsourcing, volunteer translators

Procedia PDF Downloads 229
25101 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.

Keywords: mining big data, big data, machine learning, telecommunication

Procedia PDF Downloads 408
25100 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications a Software Testing Approach

Authors: Theertha Chandroth

Abstract:

This paper presents a comparative study on how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its unique syntax and structure. The objective is to explore various techniques and methodologies for validating comparison and integration between JSON data to XML and vice versa. By understanding the process of checking JSON data against XML data, testers, developers and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.

Keywords: XML, JSON, data comparison, integration testing, Python, SQL

Procedia PDF Downloads 139
25099 Using Machine Learning Techniques to Extract Useful Information from Dark Data

Authors: Nigar Hussain

Abstract:

It is a subset of big data. Dark data means those data in which we fail to use for future decisions. There are many issues in existing work, but some need powerful tools for utilizing dark data. It needs sufficient techniques to deal with dark data. That enables users to exploit their excellence, adaptability, speed, less time utilization, execution, and accessibility. Another issue is the way to utilize dark data to extract helpful information to settle on better choices. In this paper, we proposed upgrade strategies to remove the dark side from dark data. Using a supervised model and machine learning techniques, we utilized dark data and achieved an F1 score of 89.48%.

Keywords: big data, dark data, machine learning, heatmap, random forest

Procedia PDF Downloads 27
25098 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflects different aspects of people, events and activities. Data generated from various systems are different in form and data source are different because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need fusion together. Data from different sources using different ways of collection raised several issues which need to be resolved. Problem of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of Multi-source data fusion is to form a uniform central database, which includes people data, location data, object data, and institution data, business data and space data. We need to use meta data to be referred to and read when application needs to access, manipulate and display the data. A uniform meta data management ensures effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 392
25097 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad

Abstract:

Nowadays considering human involved in increasing data development some methods such as data mining to extract science are unavoidable. One of the discussions of data mining is inherent distribution of the data usually the bases creating or receiving such data belong to corporate or non-corporate persons and do not give their information freely to others. Yet there is no guarantee to enable someone to mine special data without entering in the owner’s privacy. Sending data and then gathering them by each vertical or horizontal software depends on the type of their preserving type and also executed to improve data privacy. In this study it was attempted to compare comprehensively preserving data methods; also general methods such as random data, coding and strong and weak points of each one are examined.

Keywords: data mining, distributed data mining, privacy protection, privacy preserving

Procedia PDF Downloads 524
25096 The Right to Data Portability and Its Influence on the Development of Digital Services

Authors: Roman Bieda

Abstract:

The General Data Protection Regulation (GDPR) will come into force on 25 May 2018 which will create a new legal framework for the protection of personal data in the European Union. Article 20 of GDPR introduces a right to data portability. This right allows for data subjects to receive the personal data which they have provided to a data controller, in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating transferring personal data between IT environments (e.g.: applications), will also facilitate changing the provider of services (e.g. changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.

Keywords: data portability, digital market, GDPR, personal data

Procedia PDF Downloads 472
25095 Recent Advances in Data Warehouse

Authors: Fahad Hanash Alzahrani

Abstract:

This paper describes some recent advances in a quickly developing area of data storing and processing based on Data Warehouses and Data Mining techniques, which are associated with software, hardware, data mining algorithms and visualisation techniques having common features for any specific problems and tasks of their implementation.

Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing

Procedia PDF Downloads 403
25094 How to Use Big Data in Logistics Issues

Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy

Abstract:

Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does Data. Utilizing massive data sets enable companies to get competitive advantages over their adversaries. Out of many area of Big Data usage, logistics has significance role in both commercial sector and military. This paper lays out what big data is and how it is used in both military and commercial logistics.

Keywords: big data, logistics, operational efficiency, risk management

Procedia PDF Downloads 640
25093 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, more various information can be extracted. In addition, development and dissemination of boards such as Arduino and Raspberry Pie have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, there are many difficulties in using the board to collect data, and there are many difficulties in using it when the user is not a computer programmer, or when using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 377
25092 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, veracity, and come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, provision of insights to users, and for good governance. Government (Big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services like data storage, hosting services to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationship in the government (big) data ecosystem. We also discuss our research findings. We did not find too much published research articles about the government (big) data ecosystem, including its definition and classification of actors and their roles. Therefore, we lent ideas for the government (big) data ecosystem from numerous areas that include scientific research data, humanitarian data, open government data, industry data, in the literature.

Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review

Procedia PDF Downloads 161
25091 Government Big Data Ecosystem: A Systematic Literature Review

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Data that is high in volume, velocity, veracity and comes from a variety of sources is usually generated in all sectors including the government sector. Globally public administrations are pursuing (big) data as new technology and trying to adopt a data-centric architecture for hosting and sharing data. Properly executed, big data and data analytics in the government (big) data ecosystem can be led to data-driven government and have a direct impact on the way policymakers work and citizens interact with governments. In this research paper, we conduct a systematic literature review. The main aims of this paper are to highlight essential aspects of the government (big) data ecosystem and to explore the most critical socio-technical factors that contribute to the successful implementation of government (big) data ecosystem. The essential aspects of government (big) data ecosystem include definition, data types, data lifecycle models, and actors and their roles. We also discuss the potential impact of (big) data in public administration and gaps in the government data ecosystems literature. As this is a new topic, we did not find specific articles on government (big) data ecosystem and therefore focused our research on various relevant areas like humanitarian data, open government data, scientific research data, industry data, etc.

Keywords: applications of big data, big data, big data types. big data ecosystem, critical success factors, data-driven government, egovernment, gaps in data ecosystems, government (big) data, literature review, public administration, systematic review

Procedia PDF Downloads 227
25090 A Machine Learning Decision Support Framework for Industrial Engineering Purposes

Authors: Anli Du Preez, James Bekker

Abstract:

Data is currently one of the most critical and influential emerging technologies. However, the true potential of data is yet to be exploited since, currently, about 1% of generated data are ever actually analyzed for value creation. There is a data gap where data is not explored due to the lack of data analytics infrastructure and the required data analytics skills. This study developed a decision support framework for data analytics by following Jabareen’s framework development methodology. The study focused on machine learning algorithms, which is a subset of data analytics. The developed framework is designed to assist data analysts with little experience, in choosing the appropriate machine learning algorithm given the purpose of their application.

Keywords: Data analytics, Industrial engineering, Machine learning, Value creation

Procedia PDF Downloads 167
25089 Providing Security to Private Cloud Using Advanced Encryption Standard Algorithm

Authors: Annapureddy Srikant Reddy, Atthanti Mahendra, Samala Chinni Krishna, N. Neelima

Abstract:

In our present world, we are generating a lot of data and we, need a specific device to store all these data. Generally, we store data in pen drives, hard drives, etc. Sometimes we may loss the data due to the corruption of devices. To overcome all these issues, we implemented a cloud space for storing the data, and it provides more security to the data. We can access the data with just using the internet from anywhere in the world. We implemented all these with the java using Net beans IDE. Once user uploads the data, he does not have any rights to change the data. Users uploaded files are stored in the cloud with the file name as system time and the directory will be created with some random words. Cloud accepts the data only if the size of the file is less than 2MB.

Keywords: cloud space, AES, FTP, NetBeans IDE

Procedia PDF Downloads 205