Search results for: clustering on flowing data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24965

24455 The Relationship Between Car Drivers' Background Information and Risky Events in the i-DREAMS Project

Authors: Dagim Dessalegn Haile

Abstract:

This study investigated the interaction between the drivers' socio-demographic background information (age, gender, and driving experience) and the risky events score in the i-DREAMS platform. Further, the relationship between the participants' background driving behavior and the i-DREAMS platform behavioral output scores of risky events was also investigated. The i-DREAMS acronym stands for Smart Driver and Road Environment Assessment and Monitoring System. It is a European Union Horizon 2020 funded project consisting of 13 partners, researchers, and industry partners from 8 countries. A total of 25 Belgian car drivers (16 male and 9 female) were considered for analysis. Drivers' ages were categorized into ages 18-25, 26-45, 46-65, and 65 and older. Drivers' driving experience was also categorized into four groups: 1-15, 16-30, 31-45, and 46-60 years. Drivers were classified into two clusters based on the scores recorded during phase 1 (baseline) for six risky events: acceleration, deceleration, speeding, tailgating, overtaking, and lane discipline. Agglomerative hierarchical clustering using SPSS shows that Cluster 1 drivers are safer drivers, while Cluster 2 drivers are identified as risky drivers. The analysis indicated no significant relationship between age groups, gender, and experience groups and the risky events scores, except for acceleration, tailgating, and overtaking in a few phases. This is mainly because the small number of participants limits the variability across socio-demographic background groups. Repeated measures ANOVA shows that Cluster 2 drivers improved more than Cluster 1 drivers for tailgating, lane discipline, and speeding events. A positive relationship between drivers' background behavior and i-DREAMS platform behavioral output scores is observed. It implies that car drivers who report more risky driving behavior in the questionnaire data also demonstrate more risky driving behavior in the i-DREAMS observed driving data.
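As a rough illustration of the clustering step described above, the following is a minimal sketch of agglomerative hierarchical clustering that stops at two clusters. The study used SPSS and the abstract does not state the linkage or distance measure, so average linkage, Euclidean distance, the two-feature score vectors, and all names below are assumptions made for illustration only.

```python
from itertools import combinations
from math import dist


def agglomerative(points, n_clusters=2):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters (a < b by construction).
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical per-driver scores: (speeding, tailgating); not data from the study.
scores = [(1.0, 0.5), (1.2, 0.4), (0.9, 0.6), (4.8, 3.9), (5.1, 4.2)]
groups = agglomerative(scores, n_clusters=2)
print(groups)
```

With these toy scores, the three low-scoring drivers end up in one group and the two high-scoring drivers in the other, which mirrors the safer/riskier split reported in the abstract.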

Keywords: i-dreams, car drivers, socio-demographic background, risky events

Procedia PDF Downloads 60
24454 Semantic Data Schema Recognition

Authors: Aïcha Ben Salem, Faouzi Boufares, Sebastiao Correia

Abstract:

The subject covered in this paper aims at assisting users in their quality approach. The goal is to better extract, mix, interpret and reuse data. It deals with the semantic schema recognition of a data source. This enables the extraction of data semantics from all the available information, including the data and the metadata. Firstly, it consists of categorizing the data by assigning it to a category and possibly a sub-category, and secondly, of establishing relations between columns and possibly discovering the semantics of the manipulated data source. These links detected between columns offer a better understanding of the source and the alternatives for correcting data. This approach allows automatic detection of a large number of syntactic and semantic anomalies.

Keywords: schema recognition, semantic data profiling, meta-categorisation, semantic dependencies inter columns

Procedia PDF Downloads 410
24453 A Frictional-Collisional Closure Model for the Saturated Granular Flow: Experimental Evidence and Two Phase Modelling

Authors: Yunhui Sun, Qingquan Liu, Xiaoliang Wang

Abstract:

Dense granular flows widely exist in geological flows such as debris flow, landslide, or sheet flow, where both the interparticle and solid-liquid interactions are important in modifying the flow. A two-phase approach with both phases correctly modelled is therefore important for a better investigation of saturated granular flows. However, a proper closure model covering a wide range of flowing states for the solid phase is still lacking. This study first employs a chute flow experiment based on the refractive index matching method, which makes it possible to obtain internal flow information such as velocity, shear rate, granular fluctuation, and volume fraction. The granular stress is obtained based on a steady-state assumption. The kinetic theory is found to describe the stress dependence on the flow state well. More importantly, the granular rheology is found to be frictionally dominated under weak shear and collisionally dominated under strong shear. The results presented thus provide direct experimental evidence for a possible frictional-collisional closure model for the granular phase. The data indicate that frictional stress exists over a wide range of volume fractions, whereas traditional theory holds that it vanishes below a critical volume fraction. Based on the findings, a two-phase model is used to simulate the chute flow. Both phases are modelled as continuum media, and the inter-phase interactions, such as drag force and pressure gradient force, are considered. The frictional-collisional model is used for the closure of the solid phase stress. The profiles of the kinematic properties agree well with the experiments. This model, which is derived from steady flow, is further used to simulate immersed granular collapse, which is unsteady in nature, to study its applicability.

Keywords: closure model, collision, friction, granular flow, two-phase model

Procedia PDF Downloads 48
24452 Access Control System for Big Data Application

Authors: Winfred Okoe Addy, Jean Jacques Dominique Beraud

Abstract:

Access control systems (ACs) are some of the most important components in safety areas. Inaccuracies of regulatory frameworks make personal policies and remedies more appropriate than standard models or protocols. This problem is exacerbated by the increasing complexity of software, such as integrated Big Data (BD) software for controlling large volumes of encrypted data and resources embedded in a dedicated BD production system. This paper proposes a general access control strategy system for the diffusion of Big Data domains, since it is crucial to secure the data provided to data consumers (DC). We present a general access control circulation strategy for the Big Data domain by describing the benefit of using designated access control for BD units and performance, taking into consideration the needs of the BD and AC systems. We then present a generic Big Data access control system to improve the dissemination of Big Data.

Keywords: access control, security, Big Data, domain

Procedia PDF Downloads 121
24451 Detection and Quantification of Active Pharmaceutical Ingredients as Adulterants in Garcinia cambogia Slimming Preparations Using NIR Spectroscopy Combined with Chemometrics

Authors: Dina Ahmed Selim, Eman Shawky Anwar, Rasha Mohamed Abu El-Khair

Abstract:

A rapid, simple and efficient method with minimal sample treatment was developed for authentication of Garcinia cambogia fruit peel powder, along with determining undeclared active pharmaceutical ingredients (APIs) in its herbal slimming dietary supplements, using near infrared spectroscopy combined with chemometrics. Five featured adulterants (sibutramine, metformin, orlistat, ephedrine, and theophylline) were selected as target compounds. The near infrared spectral data matrix of authentic Garcinia cambogia fruit peel and of specimens intentionally adulterated with the five selected APIs was subjected to hierarchical clustering analysis to investigate their clustering pattern. SIMCA models were established to ensure the genuineness of Garcinia cambogia fruit peel, which resulted in perfect classification of all tested specimens. Adulterated samples were utilized for construction of PLSR models based on different API contents at minute levels of fraud practices (LOQ < 0.2% w/w). The suggested approach can be applied to enhance and guarantee the safety and quality of Garcinia fruit peel powder as raw material and in dietary supplements.

Keywords: Garcinia cambogia, Quality control, NIR spectroscopy, Chemometrics

Procedia PDF Downloads 69
24450 A Data Envelopment Analysis Model in a Multi-Objective Optimization with Fuzzy Environment

Authors: Michael Gidey Gebru

Abstract:

Most Data Envelopment Analysis models operate in a static environment with input and output parameters that are given by deterministic data. However, due to ambiguity brought on by shifting market conditions, input and output data are not always precisely gathered in real-world scenarios. Fuzzy numbers can be used to address this kind of ambiguity in input and output data. Therefore, this work aims to expand crisp Data Envelopment Analysis into Data Envelopment Analysis with a fuzzy environment. In this study, the input and output data are regarded as fuzzy triangular numbers. Then, the Data Envelopment Analysis model with a fuzzy environment is solved using a multi-objective method to gauge the Decision Making Units' efficiency. Finally, the developed Data Envelopment Analysis model is illustrated with an application to real data from 50 educational institutions.

Keywords: efficiency, Data Envelopment Analysis, fuzzy, higher education, input, output

Procedia PDF Downloads 37
24449 Modeling the Demand for the Healthcare Services Using Data Analysis Techniques

Authors: Elizaveta S. Prokofyeva, Svetlana V. Maltseva, Roman D. Zaitsev

Abstract:

Rapidly evolving modern data analysis technologies in healthcare play a large role in understanding the operation of the system and its characteristics. Nowadays, one of the key tasks in urban healthcare is to optimize resource allocation. Thus, the application of data analysis in medical institutions to solve optimization problems determines the significance of this study. The purpose of this research was to establish the dependence between the indicators of the effectiveness of the medical institution and its resources. Hospital discharges by diagnosis, hospital days of in-patients, and in-patient average length of stay were selected as the performance indicators and the demand of the medical facility. The hospital beds by type of care, medical technology (magnetic resonance tomography, gamma cameras, angiographic complexes and lithotripters) and physicians characterized the resource provision of medical institutions for the developed models. The data source for the research was an open database of the statistical service Eurostat. The choice of the source is due to the fact that the databases contain complete and open information necessary for research tasks in the field of public health. In addition, the statistical database has a user-friendly interface that allows you to quickly build analytical reports. The study provides information on 28 European countries for the period from 2007 to 2016. For all countries included in the study, with the most accurate and complete data for the period under review, predictive models were developed based on historical panel data. An attempt to improve the quality and the interpretation of the models was made by cluster analysis of the investigated set of countries. The main idea was to assess the similarity of the joint behavior of the variables throughout the time period under consideration to identify groups of similar countries and to construct separate regression models for them.
Therefore, the original time series were used as the objects of clustering. The k-medoids algorithm was used. The sampled objects were used as the centers of the clusters obtained, since determining the centroid when working with time series involves additional difficulties. The number of clusters was chosen using the silhouette coefficient. After the cluster analysis it was possible to significantly improve the predictive power of the models: for example, in one of the clusters, the MAPE error was only 0.82%, which makes it possible to conclude that this forecast is highly reliable in the short term. The predicted values obtained from the developed models have a relatively low level of error and can be used to make decisions on the resource provision of the hospital by medical personnel. The research shows strong dependencies between the demand for medical services and the modern medical equipment variable, which highlights the importance of the technological component for the successful development of the medical facility. Currently, data analysis has huge potential, which allows health services to be significantly improved. Medical institutions that are the first to introduce these technologies will certainly have a competitive advantage.
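The clustering procedure described above (medoids taken from the sample itself, number of clusters chosen by silhouette) can be sketched as follows. This is only an illustrative outline under stated assumptions: the abstract does not give the distance measure or the initialisation, so Euclidean distance between equal-length series, a deterministic start from the first k series, and the toy data are all hypothetical.

```python
from math import dist


def k_medoids(series, k, max_iter=100):
    """Alternating k-medoids; every medoid is an actual series from the sample."""
    medoids = list(range(k))  # assumed deterministic start: the first k series
    for _ in range(max_iter):
        # Assign each series to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist(s, series[medoids[c]]))
                  for s in series]
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster became empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the rest.
            new_medoids.append(min(
                members,
                key=lambda i: sum(dist(series[i], series[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels


def silhouette(series, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue
        a = sum(dist(series[i], series[j]) for j in own) / len(own)
        b = min(sum(dist(series[i], series[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        total += (b - a) / max(a, b)
    return total / len(labels)


# Hypothetical three-point time series for six countries (not Eurostat data).
series = [(1.0, 1.1, 1.2), (5.0, 5.2, 5.1), (1.1, 1.0, 1.3),
          (5.1, 5.0, 5.3), (0.9, 1.2, 1.1), (4.9, 5.1, 5.2)]
best_k = max(range(2, 4), key=lambda k: silhouette(series, k_medoids(series, k)[1]))
medoids, labels = k_medoids(series, best_k)
print(best_k, medoids, labels)
```

Using a medoid rather than a mean avoids having to define an "average" time series, which is the difficulty the abstract alludes to.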

Keywords: data analysis, demand modeling, healthcare, medical facilities

Procedia PDF Downloads 133
24448 Topological Analyses of Unstructured Peer to Peer Systems: A Survey

Authors: Hend Alrasheed

Abstract:

Peer to peer systems have properties that avoid several limitations of classic client/server systems, and there has therefore been great interest in developing and improving them. Understanding the properties of complex peer to peer networks is essential for their future improvement. It has been shown that the performance of peer to peer protocols is directly related to their underlying topologies. Therefore, multiple efforts have analyzed the topologies of different peer to peer systems. This study presents an overview of the major findings of experimental analyses of the topologies of three unstructured peer to peer systems: BitTorrent, Gnutella, and FreeNet.

Keywords: peer to peer networks, network topology, graph diameter, clustering coefficient, small-world property, random graph, degree distribution

Procedia PDF Downloads 369
24447 Mass Polarization in Three-Body System with Two Identical Particles

Authors: Igor Filikhin, Vladimir M. Suslov, Roman Ya. Kezerashvili, Branislav Vlahivic

Abstract:

The mass-polarization term of the three-body kinetic energy operator is evaluated for different systems which include two identical particles: A+A+B. The term has to be taken into account for the analysis of AB- and AA-interactions based on experimental data for two- and three-body ground state energies. In this study, we present three-body calculations within the framework of a potential model for the kaonic clusters K−K−p and ppK−, the nucleus ³H and the hypernucleus ⁶ΛΛHe. The systems are well clustered as A+(A+B), with a ground state energy E2 for the pair A+B. The calculations are performed using the method of the Faddeev equations in configuration space. Phenomenological pair potentials were used. We show a correlation between the mass ratio mA/mB and the value δB of the mass-polarization term. For bosonic-like systems, this value is defined as δB = 2E2 − E3, where E3 is the three-body energy when VAA = 0. For the systems including three particles with spin (isospin), models with average AB-potentials are used. In this case, the Faddeev equations reduce to a scalar one, as for the bosonic-like system αΛΛ. We show that the additional energy connected with the mass-polarization term can be decomposed into a sum of two parts: exchange related and reduced mass related. The state of the system can be described as follows: the particle A1 is bound within the A + B pair with the energy E2, and the second particle A2 is bound to the pair with the energy E3 − E2. Due to the identity of the A particles, the particles A1 and A2 are interchangeable in the pair A + B. We show that the mass polarization δB correlates with the type of AB potential, using the system αΛΛ as an example.
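Taken together, the definition δB = 2E2 − E3 and the two binding energies quoted in the abstract (E2 for the first A particle, E3 − E2 for the second) suggest one way to read the mass-polarization value: as the difference between those two binding energies. The line below is only an algebraic rearrangement of the abstract's own formula with the subscripts written out, for the case VAA = 0; it is not a result taken from the paper.

```latex
\delta B \;=\; 2E_2 - E_3 \;=\; E_2 - \left(E_3 - E_2\right)
```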

Keywords: three-body systems, mass polarization, Faddeev equations, nuclear interactions

Procedia PDF Downloads 358
24446 Cluster-Based Multi-Path Routing Algorithm in Wireless Sensor Networks

Authors: Si-Gwan Kim

Abstract:

Small-size and low-power sensors with sensing, signal processing and wireless communication capabilities are suitable for wireless sensor networks. Due to the limited resources and battery constraints, complex routing algorithms used for ad-hoc networks cannot be employed in sensor networks. In this paper, we propose node-disjoint multi-path hexagon-based routing algorithms in wireless sensor networks. We present the details of the algorithm and compare it with other works. Simulation results show that the proposed scheme achieves better performance in terms of efficiency and message delivery ratio.

Keywords: clustering, multi-path, routing protocol, sensor network

Procedia PDF Downloads 388
24445 Identification of Blood Biomarkers Unveiling Early Alzheimer's Disease Diagnosis Through Single-Cell RNA Sequencing Data and Autoencoders

Authors: Hediyeh Talebi, Shokoofeh Ghiam, Changiz Eslahchi

Abstract:

Traditionally, Alzheimer’s disease research has focused on genes with significant fold changes, potentially neglecting subtle but biologically important alterations. Our study introduces an integrative approach that highlights genes crucial to underlying biological processes, regardless of their fold change magnitude. Alzheimer's single-cell RNA-seq data from peripheral blood mononuclear cells (PBMC) were extracted from the Gene Expression Omnibus (GEO). After quality control, normalization, scaling, batch effect correction, and clustering, differentially expressed genes (DEGs) were identified with adjusted p-values less than 0.05. These DEGs were categorized based on cell type, resulting in four datasets, each corresponding to a distinct cell type. To distinguish between cells from healthy individuals and those with Alzheimer's, an adversarial autoencoder with a classifier was employed. This allowed for the separation of healthy and diseased samples. To identify the most influential genes in this classification, the weight matrices of the network's encoder and classifier components were multiplied, and the analysis focused on the top 20 genes. The analysis revealed that while some of these genes exhibit a high fold change, others do not. These genes, which may be overlooked by previous methods due to their low fold change, were shown to be significant in our study. The findings highlight the critical role of genes with subtle alterations in diagnosing Alzheimer's disease, a facet frequently overlooked by conventional methods. These genes demonstrate remarkable discriminatory power, underscoring the need to integrate biological relevance with statistical measures in gene prioritization. This integrative approach enhances our understanding of the molecular mechanisms in Alzheimer’s disease and provides a promising direction for identifying potential therapeutic targets.
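The gene-ranking step (multiplying the encoder and classifier weight matrices and keeping the top genes) might look roughly like the sketch below. It treats both components as single linear layers and scores each gene by the summed absolute value of its combined weights; the abstract does not specify either choice, so these, along with the gene names and toy weights, are assumptions for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def top_genes(gene_names, encoder_w, classifier_w, top_n=20):
    """Rank genes by the magnitude of their combined encoder-classifier weight."""
    combined = matmul(encoder_w, classifier_w)  # shape: genes x outputs
    influence = [sum(abs(v) for v in row) for row in combined]
    ranked = sorted(zip(gene_names, influence), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]


# Hypothetical toy weights: 4 genes, 2 latent units, 1 output (healthy vs. diseased).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
encoder_w = [[0.9, -0.1], [0.05, 0.02], [-0.4, 0.8], [0.1, 0.1]]  # genes x latent
classifier_w = [[1.5], [-2.0]]                                    # latent x outputs
top2 = top_genes(genes, encoder_w, classifier_w, top_n=2)
print(top2)
```

Because the score depends on learned weights rather than expression differences, a gene with a small fold change can still rank highly, which is the point the abstract makes.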

Keywords: alzheimer's disease, single-cell RNA-seq, neural networks, blood biomarkers

Procedia PDF Downloads 51
24444 The Economic Limitations of Defining Data Ownership Rights

Authors: Kacper Tomasz Kröber-Mulawa

Abstract:

This paper will address the topic of data ownership from an economic perspective, and examples of economic limitations of data property rights will be provided, which have been identified using methods and approaches of economic analysis of law. To properly build a background for the economic focus, in the beginning a short perspective of data and data ownership in the EU’s legal system will be provided. It will include a short introduction to its political and social importance and highlight relevant viewpoints. This will stress the importance of a Single Market for data but also far-reaching regulations of data governance and privacy (including the distinction of personal and non-personal data, data held by public bodies and private businesses). The main discussion of this paper will build upon the briefly referred to legal basis as well as methods and approaches of economic analysis of law.

Keywords: antitrust, data, data ownership, digital economy, property rights

Procedia PDF Downloads 65
24443 Orphan Node Inclusion Protocol for Wireless Sensor Network

Authors: Sandeep Singh Waraich

Abstract:

A wireless sensor network (WSN) consists of a large number of sensor nodes. The disparity in their energy consumption usually leads to a loss of equilibrium in the wireless sensor network, which may further result in an energy hole problem in the network. In this paper, we have considered the inclusion of orphan nodes, which usually remain unutilized as intermediate nodes in multi-hop routing. The Orphan Node Inclusion (ONI) Protocol lets cluster members bring the orphan nodes into their clusters, thereby saving important resources and increasing network lifetime in critical applications of WSN.

Keywords: wireless sensor network, orphan node, clustering, ONI protocol

Procedia PDF Downloads 408
24442 Protecting the Cloud Computing Data Through the Data Backups

Authors: Abdullah Alsaeed

Abstract:

Virtualized computing and cloud computing infrastructures are no longer buzzwords or marketing terms. They are a core reality in today’s corporate Information Technology (IT) organizations. Hence, developing effective and efficient methodologies for data backup and data recovery is required more than ever. The purpose of data backup and recovery techniques is to assist organizations in planning their business continuity and disaster recovery approaches. In order to accomplish this strategic objective, a variety of mechanisms have been proposed in recent years. This research paper will explore and examine the latest techniques and solutions to provide data backup and restoration for cloud computing platforms.

Keywords: data backup, data recovery, cloud computing, business continuity, disaster recovery, cost-effective, data encryption.

Procedia PDF Downloads 74
24441 Missing Link Data Estimation with Recurrent Neural Network: An Application Using Speed Data of Daegu Metropolitan Area

Authors: JaeHwan Yang, Da-Woon Jeong, Seung-Young Kho, Dong-Kyu Kim

Abstract:

In terms of ITS, information on link characteristics is an essential factor for planning and operation. In practice, however, not every link has sensors installed. A link without data is called a “Missing Link”. The purpose of this study is to impute data for these missing links. To obtain these data, this study applies a machine learning method. With machine learning, and deep learning in particular, missing link data can be estimated from the data of links that have sensors. For the deep learning process, this study uses a “Recurrent Neural Network” to handle time-series road data. As input data, Dedicated Short-range Communications (DSRC) data of Dalgubul-daero in the Daegu Metropolitan Area were fed into the learning process. The neural network structure takes 17 links with present data as input, has 2 hidden layers, and outputs data for 1 missing link. As a result, the forecasted data of the target link show about 94% accuracy compared with the actual data.
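To make the network shape described above concrete (17 input links, 2 hidden layers, 1 output link), here is a minimal forward-pass sketch of a stacked Elman-style recurrent network. The abstract gives no hidden layer size, cell type, or training details, so the hidden size of 8, the tanh units, the random untrained weights, and the normalised toy inputs are all assumptions; the output values are therefore meaningless except for their shape.

```python
import math
import random


def make_layer(n_in, n_hidden, rng):
    """Random input and recurrent weights for one Elman-style recurrent layer."""
    w_in = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    w_rec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_hidden)]
    return w_in, w_rec


def step(layer, x, h):
    """One time step: h_new = tanh(W_in x + W_rec h)."""
    w_in, w_rec = layer
    return [math.tanh(sum(w * v for w, v in zip(w_in[i], x)) +
                      sum(w * v for w, v in zip(w_rec[i], h)))
            for i in range(len(w_in))]


def predict(sequence, layers, w_out):
    """Run the stacked RNN over a sequence; one output value per time step."""
    states = [[0.0] * len(layer[0]) for layer in layers]
    outputs = []
    for x in sequence:
        for k, layer in enumerate(layers):
            states[k] = step(layer, x, states[k])
            x = states[k]  # this layer's state feeds the next layer
        outputs.append(sum(w * v for w, v in zip(w_out, x)))
    return outputs


rng = random.Random(0)
hidden = 8  # hypothetical hidden size; not stated in the abstract
layers = [make_layer(17, hidden, rng), make_layer(hidden, hidden, rng)]
w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
# Five time steps of normalised speeds for the 17 links that have sensors.
sequence = [[rng.uniform(0.0, 1.0) for _ in range(17)] for _ in range(5)]
preds = predict(sequence, layers, w_out)
print(len(preds))
```

In a real setting the weights would be fitted to the DSRC speed data, and the predicted series for the missing link would then be compared against held-out measurements.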

Keywords: data estimation, link data, machine learning, road network

Procedia PDF Downloads 503
24440 Numerical Study of Effects of Air Dam on the Flow Field and Pressure Distribution of a Passenger Car

Authors: Min Ye Koo, Ji Ho Ahn, Byung Il You, Gyo Woo Lee

Abstract:

Everything that is attached to the outside of a vehicle to improve its driving performance by changing the flow characteristics of the surrounding air, or to personalize its appearance, is called a tuning part. Typical tuning components include front or rear air dams (also known as spoilers), splitters, and side air dams. In particular, the front air dam prevents air from flowing into the lower portion of the vehicle and increases the amount of air flow to the side and front of the vehicle body, thereby reducing the lift force that raises the vehicle body and thus improving the steering and driving performance of the vehicle. The purpose of this study was to investigate the role of the front air dam in the flow around a sedan passenger car using computational fluid dynamics. The effects on the flow velocity, the trajectory of fluid particles, the static pressure distribution, and the pressure distribution on the body surface were investigated by varying the flow velocity and the size of the air dam. As a result, it has been confirmed that the front air dam improves the flow characteristics, thereby reducing the lift force on the vehicle, and so helps steering and driving characteristics.

Keywords: numerical study, air dam, flow field, pressure distribution

Procedia PDF Downloads 195
24439 Customer Data Analysis Model Using Business Intelligence Tools in Telecommunication Companies

Authors: Monica Lia

Abstract:

This article presents a customer data analysis model using business intelligence tools for data modelling, transformation, data visualization and dynamic report building. The customer analysis of an economic organization is based on the information from the transactional systems of the organization. The paper presents how to develop the data model starting from the data that companies have inside their own operational systems. The owned data can be transformed into useful information about customers using business intelligence tools. In a mature market, knowing the information inside the data and making forecasts for strategic decisions become more important. Business intelligence tools are used in business organizations as support for decision-making.

Keywords: customer analysis, business intelligence, data warehouse, data mining, decisions, self-service reports, interactive visual analysis, and dynamic dashboards, use cases diagram, process modelling, logical data model, data mart, ETL, star schema, OLAP, data universes

Procedia PDF Downloads 419
24438 Analysis of the Role of Population Ageing on Crosstown Roads' Traffic Accidents Using Latent Class Clustering

Authors: N. Casado-Sanz, B. Guirao

Abstract:

The population aged 65 and over is projected to double in the coming decades. Due to this increase, the driver population is expected to grow, and in the near future all countries will be faced with population aging of varying intensity and in unique time frames. This is the greatest challenge facing industrialized nations, and because of it, the study of the relationships of dependency between population aging and road safety is becoming increasingly relevant. Although the deterioration of driving skills in the elderly has been analyzed in depth, to our knowledge few research studies have focused on the road infrastructure and the mobility of this particular group of users. In Spain, crosstown roads have one of the highest fatality rates. These rural routes have a higher percentage of elderly people, who are more dependent on driving due to the absence or limitations of urban public transportation. Analysing road safety on these routes is very complex because of the variety of the features, the dispersion of the data and the complete lack of related literature. The objective of this paper is to identify key factors that cause traffic accidents. The cases under study were accidents involving people killed or seriously injured on Spanish crosstown roads during the period 2006-2015. Latent cluster analysis was applied as a preliminary tool for segmentation of accidents, considering population aging as the main input among other socioeconomic indicators. Subsequently, a linear regression analysis was carried out to estimate the degree of dependence between the accident rate and the variables that define each group. The results show that segmenting the data provides further information. Additionally, the results revealed the clear influence of the aging variable in the clusters obtained.
Other variables related to infrastructure and mobility levels, such as the crosstown road layout and the traffic intensity, appeared to be among the key factors in the causality of road accidents.

Keywords: cluster analysis, population ageing, rural roads, road safety

Procedia PDF Downloads 98
24437 Structuring Highly Iterative Product Development Projects by Using Agile-Indicators

Authors: Guenther Schuh, Michael Riesener, Frederic Diels

Abstract:

Nowadays, manufacturing companies are faced with the challenge of meeting heterogeneous customer requirements in short product life cycles with a variety of product functions. So far, some of the functional requirements remain unknown until late stages of the product development. A way to handle these uncertainties is the highly iterative product development (HIP) approach. By structuring the development project as a highly iterative process, this method provides customer oriented and marketable products. There are first approaches for combined, hybrid models comprising deterministic-normative methods like the Stage-Gate process and empirical-adaptive development methods like SCRUM on a project management level. However, almost unconsidered is the question, which development scopes can preferably be realized with either empirical-adaptive or deterministic-normative approaches. In this context, a development scope constitutes a self-contained section of the overall development objective. Therefore, this paper focuses on a methodology that deals with the uncertainty of requirements within the early development stages and the corresponding selection of the most appropriate development approach. For this purpose, internal influencing factors like a company’s technology ability, the prototype manufacturability and the potential solution space as well as external factors like the market accuracy, relevance and volatility will be analyzed and combined into an Agile-Indicator. The Agile-Indicator is derived in three steps. First of all, it is necessary to rate each internal and external factor in terms of the importance for the overall development task. Secondly, each requirement has to be evaluated for every single internal and external factor appropriate to their suitability for empirical-adaptive development. Finally, the total sums of internal and external side are composed in the Agile-Indicator. 
Thus, the Agile-Indicator constitutes a company-specific and application-related criterion, on which the allocation of empirical-adaptive and deterministic-normative development scopes can be made. In a last step, this indicator will be used for a specific clustering of development scopes by application of the fuzzy c-means (FCM) clustering algorithm. The FCM-method determines sub-clusters within functional clusters based on the empirical-adaptive environmental impact of the Agile-Indicator. By means of the methodology presented in this paper, it is possible to classify requirements, which are uncertainly carried out by the market, into empirical-adaptive or deterministic-normative development scopes.
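The two steps described above, a weighted indicator per requirement followed by fuzzy c-means (FCM) clustering, can be sketched as follows. The abstract does not give the exact combination rule, factor weights, or FCM settings, so the single weighted sum, the factor names and weights, the 1-5 ratings, the fuzzifier m = 2, and the initial centers are hypothetical choices for illustration.

```python
def agile_indicator(weights, ratings):
    """Weighted sum of a requirement's suitability ratings over all factors."""
    return sum(weights[f] * ratings[f] for f in weights)


def fuzzy_c_means(values, centers, m=2.0, iters=50):
    """1-D fuzzy c-means; returns final centers and memberships u[i][j]."""
    n_c = len(centers)
    for _ in range(iters):
        u = []
        for x in values:
            d = [abs(x - c) or 1e-12 for c in centers]  # guard against zero distance
            # Standard FCM membership: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
            u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(n_c))
                      for j in range(n_c)])
        # Centers are membership-weighted means of the values.
        centers = [sum((u[i][j] ** m) * x for i, x in enumerate(values)) /
                   sum(u[i][j] ** m for i in range(len(values)))
                   for j in range(n_c)]
    return centers, u


# Hypothetical factor weights (importance) and per-requirement ratings (1-5).
weights = {"technology_ability": 0.3, "prototype_manufacturability": 0.2,
           "market_volatility": 0.5}
requirements = {
    "req_1": {"technology_ability": 1, "prototype_manufacturability": 2, "market_volatility": 1},
    "req_2": {"technology_ability": 2, "prototype_manufacturability": 1, "market_volatility": 2},
    "req_3": {"technology_ability": 4, "prototype_manufacturability": 5, "market_volatility": 5},
    "req_4": {"technology_ability": 5, "prototype_manufacturability": 4, "market_volatility": 4},
}
indicators = [agile_indicator(weights, r) for r in requirements.values()]
centers, u = fuzzy_c_means(indicators, centers=[min(indicators), max(indicators)])
print(centers, u)
```

Unlike hard clustering, each requirement receives a degree of membership in both groups, so borderline requirements can be flagged instead of being forced into one development approach.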

Keywords: agile, highly iterative development, agile-indicator, product development

Procedia PDF Downloads 234
24436 Opening up Government Datasets for Big Data Analysis to Support Policy Decisions

Authors: K. Hardy, A. Maurushat

Abstract:

Policy makers are increasingly looking to make evidence-based decisions. Evidence-based decisions have historically used rigorous methodologies of empirical studies by research institutes, as well as less reliable immediate surveys and polls, often with limited sample sizes. As we move into the era of Big Data analytics, policy makers are looking to different methodologies to deliver reliable empirics in real time. The question is not why these people did this for the last 10 years, but why these people are doing this now, whether this is undesirable, and how we can have an impact to promote change immediately. Big data analytics rely heavily on government data that has been released into the public domain. The open data movement promises greater productivity and more efficient delivery of services; however, Australian government agencies remain reluctant to release their data to the general public. This paper considers the barriers to releasing government data as open data, and how these barriers might be overcome.

Keywords: big data, open data, productivity, data governance

Procedia PDF Downloads 357
24435 A Review on Existing Challenges of Data Mining and Future Research Perspectives

Authors: Hema Bhardwaj, D. Srinivasa Rao

Abstract:

Technology for analysing, processing, and extracting meaningful data from enormous and complicated datasets can be termed "big data." The techniques of big data mining and big data analysis are extremely helpful for business activities such as making decisions, building organisational plans, researching the market efficiently, and improving sales, because typical management tools cannot handle such complicated datasets. Big data brings special computational and statistical issues, such as measurement errors, noise accumulation, spurious correlation, and storage and scalability limitations. These unique problems call for new computational and statistical paradigms. This research paper offers an overview of the literature on big data mining and its process, along with its problems and difficulties, with a focus on the unique characteristics of big data. Organizations face several difficulties when undertaking data mining, which has an impact on their decision-making. Every day, terabytes of data are produced, yet only around 1% of that data is actually analyzed. This study presents the idea of data mining and analysis, together with knowledge discovery techniques that have recently been created along with practical application systems. The article's conclusion also includes a list of issues and difficulties for further research in the area. The report discusses the main big data and data mining challenges faced by management.

Keywords: big data, data mining, data analysis, knowledge discovery techniques, data mining challenges

Procedia PDF Downloads 97
24434 Corporate Social Responsibility and Corporate Reputation: A Bibliometric Analysis

Authors: Songdi Li, Louise Spry, Tony Woodall

Abstract:

Nowadays, Corporate Social Responsibility (CSR) is becoming a buzzword, and more and more academics are devoting effort to CSR studies. It is widely believed that CSR can influence Corporate Reputation (CR), and many hold the view that CSR leads to a positive CR. Specifically, CSR-related activities in the reputational context have been regarded as being associated with excellent financial performance, value creation, etc. It is also argued that CSR and CR are two sides of one coin; hence, to some extent, doing CSR is equal to establishing a good reputation. Still, there is no consensus on the CSR-CR relationship in the literature; thus, a systematic literature review is much needed. This research conducts a systematic literature review with both bibliometric and content analysis. Data are selected from English-language academic journal articles only, and keyword combinations are then applied to identify relevant sources. Data from Scopus and WoS are gathered for bibliometric analysis. Scopus search results were saved in RIS and CSV formats, and Web of Science (WoS) data were saved in TXT and CSV formats in order to process the data in the Bibexcel software for further analysis, which is later visualised with the software VOSviewer. Content analysis was also applied to analyse the data clusters and the key articles. On the topic of CSR-CR, this literature review with bibliometric analysis has made four achievements. First, this paper has developed a systematic study that quantitatively depicts the knowledge structure of CSR and CR by identifying terms closely related to CSR-CR (such as ‘corporate governance’) and clustering subtopics that emerged in co-citation analysis. Second, content analysis is performed to gain insight into the findings of the bibliometric analysis in the discussion section. 
It also highlights some insightful implications for the future research agenda: for example, a psychological link between CSR and CR is identified from the results, and emerging economies and qualitative research methods are new elements that emerged in the CSR-CR big picture. Third, a multidisciplinary perspective runs through the whole bibliometric mapping and the co-word and co-citation analysis; hence, this work builds an interdisciplinary structure that could lead to an integrated conceptual framework in the future. Finally, Scopus and WoS are compared and contrasted in this paper; as a result, Scopus, which has deeper and more comprehensive data, is suggested as a tool for future bibliometric analysis studies. Overall, this paper has fulfilled its initial purposes and contributed to the literature. To the best of the authors' knowledge, this paper conducted the first literature review of CSR-CR research that applied both bibliometric analysis and content analysis; this is the source of its methodological originality. This dual approach has the advantage of enabling a comprehensive and semantic exploration of the CSR-CR area in a scientific and realistic manner. Admittedly, the work may contain subjective bias in the selection of search terms and papers; triangulation could reduce this bias to some degree.

Keywords: corporate social responsibility, corporate reputation, bibliometric analysis, software program

Procedia PDF Downloads 116
24433 A Systematic Review on Challenges in Big Data Environment

Authors: Rimmy Yadav, Anmol Preet Kaur

Abstract:

Big Data has demonstrated vast potential for streamlining operations, supporting decisions, and spotting business trends in different fields, for example, manufacturing, finance, and Information Technology. This paper gives a multi-disciplinary overview of the research issues in big data and of its procedures, tools, and systems related to privacy, data storage management, network and energy utilization, fault tolerance, and data representation. In addition, the resulting challenges and opportunities in the Big Data platform are outlined.

Keywords: big data, privacy, data management, network and energy consumption

Procedia PDF Downloads 292
24432 Survey on Big Data Stream Classification by Decision Tree

Authors: Mansoureh Ghiasabadi Farahani, Samira Kalantary, Sara Taghi-Pour, Mahboubeh Shamsi

Abstract:

Nowadays, the development of computer technology and its recent applications provide access to new types of data that have not been considered by traditional data analysts. Two particularly interesting characteristics of such data sets are their huge size and streaming nature. Incremental learning techniques have been used extensively to address the data stream classification problem. This paper presents a concise survey of the obstacles and requirements in classifying data streams using decision trees. The most important issue is to maintain a balance between accuracy and efficiency: the algorithm should provide good classification performance with a reasonable response time.

Keywords: big data, data streams, classification, decision tree

Procedia PDF Downloads 506
24431 Robust and Dedicated Hybrid Cloud Approach for Secure Authorized Deduplication

Authors: Aishwarya Shekhar, Himanshu Sharma

Abstract:

Data deduplication is one of the important data compression techniques for eliminating duplicate copies of repeating data, and it has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. In this process, duplicate data is expunged, leaving only one copy, that is, a single instance of the data, to be stored. An index of every piece of data is nevertheless still maintained. Data deduplication is an approach for minimizing the storage space an organization requires to retain its data. In most companies, the storage systems carry identical copies of numerous pieces of data. Deduplication eliminates these additional copies by saving just one copy of the data and replacing the other copies with pointers that refer back to the primary copy. To avoid this duplication of data and to preserve confidentiality in the cloud, we apply the concept of the hybrid cloud. A hybrid cloud is a fusion of at least one public and one private cloud. As a proof of concept, we implement Java code that provides security and removes all types of duplicated data from the cloud.
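
The core idea of keeping one stored copy and replacing the rest with pointers can be sketched as below. This is a minimal illustration, not the authors' Java implementation: the class name `DedupStore` and its methods are hypothetical, SHA-256 content hashing is an assumed way of detecting duplicates, and the confidentiality and authorization aspects of the hybrid cloud approach are deliberately left out.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: one physical copy per unique content."""

    def __init__(self):
        self.blobs = {}   # digest -> bytes (the single stored instance)
        self.index = {}   # file name -> digest (pointer to the stored copy)

    def put(self, name, data):
        """Store data under name; return True only if a new copy was written."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        # The index entry is kept for every file, duplicate or not.
        self.index[name] = digest
        return is_new

    def get(self, name):
        """Follow the pointer from the file name back to the stored copy."""
        return self.blobs[self.index[name]]
```

Storing the same bytes under two names leaves two index entries but only one blob, which mirrors the statement that indexing is maintained for every piece of data while duplicate content is stored once.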

Keywords: confidentiality, deduplication, data compression, hybridity of cloud

Procedia PDF Downloads 371
24430 Interaction between River and City Morphology

Authors: Ehsan Abshirini

Abstract:

Rivers, as one of the most important topographic factors, have played a strategic role not only in the appearance of cities but also in their structure and morphology. In this paper, the author examines how the physical network of a city interacts with a river flowing through it. The pilot study is Angers, a city in western France that is influenced by the Maine River. To this end, the space syntax method integrated with GIS is used to extract the properties of the physical form of cities in terms of global and local integration value, accessibility, and choice value. The paper also simulates the absence of the river in this city and compares the result with the current state of the city to assess the effect of the river on the morphology of areas located on its different banks. The results show that although a river is not comparable to the city in size and occupied area, it has a significant effect on the form of the city in both global and local properties. In addition, this study suggests that tracking river-cities and their interaction with rivers through a hybrid of space syntax and GIS may help researchers improve their interpretation of the physical form of these types of cities.

Keywords: river-cities, physical form, space syntax properties, GIS, topographic factor

Procedia PDF Downloads 419
24429 An Approach for Vocal Register Recognition Based on Spectral Analysis of Singing

Authors: Aleksandra Zysk, Pawel Badura

Abstract:

Recognizing and controlling vocal registers during singing is a difficult task for a beginner vocalist. It requires, among other things, identifying which part of the natural resonators is being used when a sound propagates through the body. Thus, an application has been designed that allows for sound recording and automatic vocal register recognition (VRR) and provides a graphical user interface with real-time visualization of the signal and recognition results. Six spectral features are determined for each time frame and passed to a support vector machine classifier, which yields a binary decision on the head or chest register assignment of the segment. The classification training and testing data were recorded by ten professional female singers (soprano, aged 19-29) performing sounds in both chest and head register. The classification accuracy exceeded 93% in each of the various validation schemes. Apart from a hard two-class clustering, the support vector classifier also returns information on the distance between a particular feature vector and the discrimination hyperplane in the feature space. Such information reflects the level of certainty of the vocal register classification in a fuzzy way. Thus, the designed recognition and training application is able to assess and visualize the continuous trend in singing in a user-friendly graphical mode, providing an easy way to control the vocal emission.
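
The distance-to-hyperplane idea can be illustrated with a short sketch. It assumes a linear decision function w·x + b, which the abstract does not state, and the weight vector, bias, label order, and function names below are hypothetical placeholders rather than values or an API from the paper.

```python
import math


def signed_distance(weights, bias, features):
    """Signed distance of a feature vector from the hyperplane w.x + b = 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    norm = math.sqrt(sum(w * w for w in weights))
    return score / norm


def register_decision(weights, bias, features, labels=("chest", "head")):
    """Hard label from the sign, plus |distance| as a rough certainty level."""
    d = signed_distance(weights, bias, features)
    # Assumption: the positive side of the hyperplane is the "head" register.
    label = labels[1] if d >= 0 else labels[0]
    return label, abs(d)
```

With a trained classifier, `weights` would have one entry per spectral feature (six in the described system). A frame far from the hyperplane would be displayed as a confident assignment, and a frame close to it as an uncertain one.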

Keywords: classification, singing, spectral analysis, vocal emission, vocal register

Procedia PDF Downloads 292
24428 A Review of Machine Learning for Big Data

Authors: Devatha Kalyan Kumar, Aravindraj D., Sadathulla A.

Abstract:

Big data are now rapidly expanding in all engineering and science domains and in many others. The potential of large or massive data is undoubtedly significant, but making sense of it requires new ways of thinking and new learning techniques to address the various big data challenges. Machine learning is continuously unleashing its power in a wide range of applications. This paper reviews the latest advances in research on machine learning for big data processing. First, it surveys the machine learning techniques used in recent studies, such as deep learning, representation learning, transfer learning, active learning, and distributed and parallel learning. It then focuses on the challenges and possible solutions of machine learning for big data.

Keywords: active learning, big data, deep learning, machine learning

Procedia PDF Downloads 427
24427 Strengthening Legal Protection of Personal Data through Technical Protection Regulation in Line with Human Rights

Authors: Tomy Prihananto, Damar Apri Sudarmadi

Abstract:

Indonesia recognizes the right to privacy as a human right. Indonesia provides legal protection against data management activities because the protection of personal data is a part of human rights. This paper aims to describe the arrangement of data management in Indonesia. It is descriptive research with a qualitative approach, collecting data from a literature study. The result of this paper is a comprehensive arrangement of data that has been set up as a technical requirement of data protection by encryption methods. Arrangements on encryption and on the protection of personal data are mutually reinforcing in the protection of personal data. Indonesia has two important and immediately enacted laws that provide protection for the privacy of information that is part of human rights.

Keywords: Indonesia, protection, personal data, privacy, human rights, encryption

Procedia PDF Downloads 167
24426 Effect of Hull-Less Barley Flakes and Malt Extract on Yoghurt Quality

Authors: Ilze Beitane, Evita Straumite

Abstract:

The aim of the research was to evaluate the influence of flakes from biologically activated hull-less barley grain and malt extract on the quality of yoghurt during its storage. The results showed that the concentration of added malt extract and the storage time influenced the changes of pH and lactic acid in yoghurt samples. Sensory properties (aroma, taste, consistency, and appearance) of yoghurt enriched with flakes from biologically activated hull-less barley grain and malt extract changed significantly (p<0.05) during storage. Yoghurt with an increased proportion of malt extract had a sweeter taste and a more flowing consistency. Sensory properties (taste, aroma, consistency, and appearance) of yoghurt samples enriched with 5% flakes from biologically activated hull-less barley grain (YFBG 5%) and 5% flakes from biologically activated hull-less barley grain and 2% malt extract (YFBG 5% ME 2%) did not change significantly during one week of storage.

Keywords: barley flakes, malt extract, yoghurt, sensory analysis

Procedia PDF Downloads 290