Search results for: data clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24549

Search results for: data clustering

24129 Study on the Layout of 15-Minute Community-Life Circle in the State of “Community Segregation” Based on Poi: Shengwei Community and Other Two Communities in Chongqing

Authors: Siyuan Cai

Abstract:

This paper takes community segregation during major infectious diseases as the background, based on the physiological needs and safety needs of citizens during home segregation, and based on the selection of convenient facilities and medical facilities as the main research objects. Based on the POI data of public facilities in Chongqing, the spatial distribution characteristics of the convenience and medical facilities in the 15-minute living circle centered on three neighborhoods in Shapingba, namely Shengwei Community, Anju Commmunity and Fengtian Garden Community, were explored by means of GIS spatial analysis. The results show that the spatial distribution of convenience and medical facilities in this area has significant clustering characteristics, with a point-like distribution pattern of "dense in the west and sparse in the east", and a grouped and multi-polar spatial structure. The spatial structure is multi-polar and has an obvious tendency to the intersections and residential areas with dense pedestrian flow. This study provides a preliminary exploration of the distribution of medical and convenience facilities within the 15-minute living circle of a segregated community, which makes up for the lack of spatial research in this area.

Keywords: ArcGIS, community segregation, convenient facilities; distribution pattern, medical facilities, POI, 15-minute community life circle

Procedia PDF Downloads 92
24128 The Economic Limitations of Defining Data Ownership Rights

Authors: Kacper Tomasz Kröber-Mulawa

Abstract:

This paper will address the topic of data ownership from an economic perspective, and examples of economic limitations of data property rights will be provided, which have been identified using methods and approaches of economic analysis of law. To properly build a background for the economic focus, in the beginning a short perspective of data and data ownership in the EU’s legal system will be provided. It will include a short introduction to its political and social importance and highlight relevant viewpoints. This will stress the importance of a Single Market for data but also far-reaching regulations of data governance and privacy (including the distinction of personal and non-personal data, data held by public bodies and private businesses). The main discussion of this paper will build upon the briefly referred to legal basis as well as methods and approaches of economic analysis of law.

Keywords: antitrust, data, data ownership, digital economy, property rights

Procedia PDF Downloads 57
24127 The Relationship Between Car Drivers' Background Information and Risky Events In I- Dreams Project

Authors: Dagim Dessalegn Haile

Abstract:

This study investigated the interaction between the drivers' socio-demographic background information (age, gender, and driving experience) and the risky events score in the i-DREAMS platform. Further, the relationship between the participants' background driving behavior and the i-DREAMS platform behavioral output scores of risky events was also investigated. The i-DREAMS acronym stands for Smart Driver and Road Environment Assessment and Monitoring System. It is a European Union Horizon 2020 funded project consisting of 13 partners, researchers, and industry partners from 8 countries. A total of 25 Belgian car drivers (16 male and nine female) were considered for analysis. Drivers' ages were categorized into ages 18-25, 26-45, 46-65, and 65 and older. Drivers' driving experience was also categorized into four groups: 1-15, 16-30, 31-45, and 46-60 years. Drivers are classified into two clusters based on the recorded score for risky events during phase 1 (baseline) using risky events; acceleration, deceleration, speeding, tailgating, overtaking, and lane discipline. Agglomerative hierarchical clustering using SPSS shows Cluster 1 drivers are safer drivers, and Cluster 2 drivers are identified as risky drivers. The analysis result indicated no significant relationship between age groups, gender, and experience groups except for risky events like acceleration, tailgating, and overtaking in a few phases. This is mainly because the fewer participants create less variability of socio-demographic background groups. Repeated measure ANOVA shows that cluster 2 drivers improved more than cluster 1 drivers for tailgating, lane discipline, and speeding events. A positive relationship between background drivers' behavior and i-DREAMS platform behavioral output scores is observed. It implies that car drivers who in the questionnaire data indicate committing more risky driving behavior demonstrate more risky driver behavior in the i-DREAMS observed driving data.

Keywords: i-dreams, car drivers, socio-demographic background, risky events

Procedia PDF Downloads 47
24126 Protecting the Cloud Computing Data Through the Data Backups

Authors: Abdullah Alsaeed

Abstract:

Virtualized computing and cloud computing infrastructures are no longer fuzz or marketing term. They are a core reality in today’s corporate Information Technology (IT) organizations. Hence, developing an effective and efficient methodologies for data backup and data recovery is required more than any time. The purpose of data backup and recovery techniques are to assist the organizations to strategize the business continuity and disaster recovery approaches. In order to accomplish this strategic objective, a variety of mechanism were proposed in the recent years. This research paper will explore and examine the latest techniques and solutions to provide data backup and restoration for the cloud computing platforms.

Keywords: data backup, data recovery, cloud computing, business continuity, disaster recovery, cost-effective, data encryption.

Procedia PDF Downloads 60
24125 Design of a Fuzzy Luenberger Observer for Fault Nonlinear System

Authors: Mounir Bekaik, Messaoud Ramdani

Abstract:

We present in this work a new technique of stabilization for fault nonlinear systems. The approach we adopt focus on a fuzzy Luenverger observer. The T-S approximation of the nonlinear observer is based on fuzzy C-Means clustering algorithm to find local linear subsystems. The MOESP identification approach was applied to design an empirical model describing the subsystems state variables. The gain of the observer is given by the minimization of the estimation error through Lyapunov-krasovskii functional and LMI approach. We consider a three tank hydraulic system for an illustrative example.

Keywords: nonlinear system, fuzzy, faults, TS, Lyapunov-Krasovskii, observer

Procedia PDF Downloads 307
24124 Small and Medium Enterprises Owner-Managers/Entrepreneurs and Their Risk Perception in Songkhla Province, Thailand

Authors: Patraporn Kaewkhanitarak, Weerawan Marangkun

Abstract:

The objective of this study was to explore the establishment and to investigate the relationship between the gender (male or female) of SME owner-managers/ entrepreneurs and their risk perception in business activity. The study examines the data by interviewing 76 SME owner-managers/entrepreneurs’ responses (37 males, 39 females) in manufacturing, finance, human resources and marketing sector in the economic regions of Songkhla province, Thailand. This study found that four tools which were operation, cash flow, staff, and new market were perceived by the SME owner-managers/entrepreneurs at high level. However, male and female SME owner-managers/entrepreneurs perceived some factors such as the age of SME owner-managers/entrepreneurs, the duration of firm operation, type of firm, and type of business without significant differences. In contrast, the gender affected the risk perception about increasing cost, fierce competition, leapfrog development of firm, substandard staff, namely that male and female perceived these factors with significant differences. According to the research, SME owner-managers/entrepreneurs should develop their risk management competency to deal with the risk efficiently. Secondly, SME firms should gather into groups. Furthermore, it was shown that the five key tools used to manage these risky situations were the use of managerial competencies and clustering.

Keywords: risk perception, owner-managers/entrepreneurs, SME, Songkhla, Thailand

Procedia PDF Downloads 407
24123 Missing Link Data Estimation with Recurrent Neural Network: An Application Using Speed Data of Daegu Metropolitan Area

Authors: JaeHwan Yang, Da-Woon Jeong, Seung-Young Kho, Dong-Kyu Kim

Abstract:

In terms of ITS, information on link characteristic is an essential factor for plan or operation. But in practical cases, not every link has installed sensors on it. The link that does not have data on it is called “Missing Link”. The purpose of this study is to impute data of these missing links. To get these data, this study applies the machine learning method. With the machine learning process, especially for the deep learning process, missing link data can be estimated from present link data. For deep learning process, this study uses “Recurrent Neural Network” to take time-series data of road. As input data, Dedicated Short-range Communications (DSRC) data of Dalgubul-daero of Daegu Metropolitan Area had been fed into the learning process. Neural Network structure has 17 links with present data as input, 2 hidden layers, for 1 missing link data. As a result, forecasted data of target link show about 94% of accuracy compared with actual data.

Keywords: data estimation, link data, machine learning, road network

Procedia PDF Downloads 494
24122 Customer Data Analysis Model Using Business Intelligence Tools in Telecommunication Companies

Authors: Monica Lia

Abstract:

This article presents a customer data analysis model using business intelligence tools for data modelling, transforming, data visualization and dynamic reports building. Economic organizational customer’s analysis is made based on the information from the transactional systems of the organization. The paper presents how to develop the data model starting for the data that companies have inside their own operational systems. The owned data can be transformed into useful information about customers using business intelligence tool. For a mature market, knowing the information inside the data and making forecast for strategic decision become more important. Business Intelligence tools are used in business organization as support for decision-making.

Keywords: customer analysis, business intelligence, data warehouse, data mining, decisions, self-service reports, interactive visual analysis, and dynamic dashboards, use cases diagram, process modelling, logical data model, data mart, ETL, star schema, OLAP, data universes

Procedia PDF Downloads 400
24121 Modeling the Demand for the Healthcare Services Using Data Analysis Techniques

Authors: Elizaveta S. Prokofyeva, Svetlana V. Maltseva, Roman D. Zaitsev

Abstract:

Rapidly evolving modern data analysis technologies in healthcare play a large role in understanding the operation of the system and its characteristics. Nowadays, one of the key tasks in urban healthcare is to optimize the resource allocation. Thus, the application of data analysis in medical institutions to solve optimization problems determines the significance of this study. The purpose of this research was to establish the dependence between the indicators of the effectiveness of the medical institution and its resources. Hospital discharges by diagnosis; hospital days of in-patients and in-patient average length of stay were selected as the performance indicators and the demand of the medical facility. The hospital beds by type of care, medical technology (magnetic resonance tomography, gamma cameras, angiographic complexes and lithotripters) and physicians characterized the resource provision of medical institutions for the developed models. The data source for the research was an open database of the statistical service Eurostat. The choice of the source is due to the fact that the databases contain complete and open information necessary for research tasks in the field of public health. In addition, the statistical database has a user-friendly interface that allows you to quickly build analytical reports. The study provides information on 28 European for the period from 2007 to 2016. For all countries included in the study, with the most accurate and complete data for the period under review, predictive models were developed based on historical panel data. An attempt to improve the quality and the interpretation of the models was made by cluster analysis of the investigated set of countries. The main idea was to assess the similarity of the joint behavior of the variables throughout the time period under consideration to identify groups of similar countries and to construct the separate regression models for them. Therefore, the original time series were used as the objects of clustering. The hierarchical agglomerate algorithm k-medoids was used. The sampled objects were used as the centers of the clusters obtained, since determining the centroid when working with time series involves additional difficulties. The number of clusters used the silhouette coefficient. After the cluster analysis it was possible to significantly improve the predictive power of the models: for example, in the one of the clusters, MAPE error was only 0,82%, which makes it possible to conclude that this forecast is highly reliable in the short term. The obtained predicted values of the developed models have a relatively low level of error and can be used to make decisions on the resource provision of the hospital by medical personnel. The research displays the strong dependencies between the demand for the medical services and the modern medical equipment variable, which highlights the importance of the technological component for the successful development of the medical facility. Currently, data analysis has a huge potential, which allows to significantly improving health services. Medical institutions that are the first to introduce these technologies will certainly have a competitive advantage.

Keywords: data analysis, demand modeling, healthcare, medical facilities

Procedia PDF Downloads 118
24120 Using Two-Mode Network to Access the Connections of Film Festivals

Authors: Qiankun Zhong

Abstract:

In a global cultural context, film festival awards become authorities to define the aesthetic value of films. To study which genres and producing countries are valued by different film festivals and how those evaluations interact with each other, this research explored the interactions between the film festivals through their selection of movies and the factors that lead to the tendency of film festivals to nominate the same movies. To do this, the author employed a two-mode network on the movies that won the highest awards at five international film festivals with the highest attendance in the past ten years (the Venice Film Festival, the Cannes Film Festival, the Toronto International Film Festival, Sundance Film Festival, and the Berlin International Film Festival) and the film festivals that nominated those movies. The title, genre, producing country and language of 50 movies, and the range (regional, national or international) and organizing country or area of 129 film festivals were collected. These created networks connected by nominating the same films and awarding the same movies. The author then assessed the density and centrality of these networks to answer the question: What are the film festivals that tend to have more shared values with other festivals? Based on the Eigenvector centrality of the two-mode network, Palm Springs, Robert Festival, Toronto, Chicago, and San Sebastian are the festivals that tend to nominate commonly appreciated movies. In contrast, Black Movie Film Festival has the unique value of generally not sharing nominations with other film festivals. A homophily test was applied to access the clustering effects of film and film festivals. The result showed that movie genres (E-I index=0.55) and geographic location (E-I index=0.35) are possible indicators of film festival clustering. A blockmodel was also created to examine the structural roles of the film festivals and their meaning in real-world context. By analyzing the same blocks with film festival attributes, it was identified that film festivals either organized in the same area, with the same history, or with the same attitude on independent films would occupy the same structural roles in the network. Through the interpretation of the blocks, language was identified as an indicator that contributes to the role position of a film festival. Comparing the result of blockmodeling in the different periods, it is seen that international film festivals contrast with the Hollywood industry’s dominant value. The structural role dynamics provide evidence for a multi-value film festival network.

Keywords: film festivals, film studies, media industry studies, network analysis

Procedia PDF Downloads 294
24119 Opening up Government Datasets for Big Data Analysis to Support Policy Decisions

Authors: K. Hardy, A. Maurushat

Abstract:

Policy makers are increasingly looking to make evidence-based decisions. Evidence-based decisions have historically used rigorous methodologies of empirical studies by research institutes, as well as less reliable immediate survey/polls often with limited sample sizes. As we move into the era of Big Data analytics, policy makers are looking to different methodologies to deliver reliable empirics in real-time. The question is not why did these people do this for the last 10 years, but why are these people doing this now, and if the this is undesirable, and how can we have an impact to promote change immediately. Big data analytics rely heavily on government data that has been released in to the public domain. The open data movement promises greater productivity and more efficient delivery of services; however, Australian government agencies remain reluctant to release their data to the general public. This paper considers the barriers to releasing government data as open data, and how these barriers might be overcome.

Keywords: big data, open data, productivity, data governance

Procedia PDF Downloads 346
24118 Detection and Quantification of Active Pharmaceutical Ingredients as Adulterants in Garcinia cambogia Slimming Preparations Using NIR Spectroscopy Combined with Chemometrics

Authors: Dina Ahmed Selim, Eman Shawky Anwar, Rasha Mohamed Abu El-Khair

Abstract:

A rapid, simple and efficient method with minimal sample treatment was developed for authentication of Garcinia cambogia fruit peel powder, along with determining undeclared active pharmaceutical ingredients (APIs) in its herbal slimming dietary supplements using near infrared spectroscopy combined with chemometrics. Five featured adulterants, including sibutramine, metformin, orlistat, ephedrine, and theophylline are selected as target compounds. The Near infrared spectral data matrix of authentic Garcinia cambogia fruit peel and specimens degraded by intentional contamination with the five selected APIs was subjected to hierarchical clustering analysis to investigate their bundling figure. SIMCA models were established to ensure the genuiness of Garcinia cambogia fruit peel which resulted in perfect classification of all tested specimens. Adulterated samples were utilized for construction of PLSR models based on different APIs contents at minute levels of fraud practices (LOQ < 0.2% w/w).The suggested approach can be applied to enhance and guarantee the safety and quality of Garcinia fruit peel powder as raw material and in dietary supplements.

Keywords: Garcinia cambogia, Quality control, NIR spectroscopy, Chemometrics

Procedia PDF Downloads 59
24117 A Review on Existing Challenges of Data Mining and Future Research Perspectives

Authors: Hema Bhardwaj, D. Srinivasa Rao

Abstract:

Technology for analysing, processing, and extracting meaningful data from enormous and complicated datasets can be termed as "big data." The technique of big data mining and big data analysis is extremely helpful for business movements such as making decisions, building organisational plans, researching the market efficiently, improving sales, etc., because typical management tools cannot handle such complicated datasets. Special computational and statistical issues, such as measurement errors, noise accumulation, spurious correlation, and storage and scalability limitations, are brought on by big data. These unique problems call for new computational and statistical paradigms. This research paper offers an overview of the literature on big data mining, its process, along with problems and difficulties, with a focus on the unique characteristics of big data. Organizations have several difficulties when undertaking data mining, which has an impact on their decision-making. Every day, terabytes of data are produced, yet only around 1% of that data is really analyzed. The idea of the mining and analysis of data and knowledge discovery techniques that have recently been created with practical application systems is presented in this study. This article's conclusion also includes a list of issues and difficulties for further research in the area. The report discusses the management's main big data and data mining challenges.

Keywords: big data, data mining, data analysis, knowledge discovery techniques, data mining challenges

Procedia PDF Downloads 88
24116 A Systematic Review on Challenges in Big Data Environment

Authors: Rimmy Yadav, Anmol Preet Kaur

Abstract:

Big Data has demonstrated the vast potential in streamlining, deciding, spotting business drifts in different fields, for example, producing, fund, Information Technology. This paper gives a multi-disciplinary diagram of the research issues in enormous information and its procedures, instruments, and system identified with the privacy, data storage management, network and energy utilization, adaptation to non-critical failure and information representations. Other than this, result difficulties and openings accessible in this Big Data platform have made.

Keywords: big data, privacy, data management, network and energy consumption

Procedia PDF Downloads 281
24115 Survey on Big Data Stream Classification by Decision Tree

Authors: Mansoureh Ghiasabadi Farahani, Samira Kalantary, Sara Taghi-Pour, Mahboubeh Shamsi

Abstract:

Nowadays, the development of computers technology and its recent applications provide access to new types of data, which have not been considered by the traditional data analysts. Two particularly interesting characteristics of such data sets include their huge size and streaming nature .Incremental learning techniques have been used extensively to address the data stream classification problem. This paper presents a concise survey on the obstacles and the requirements issues classifying data streams with using decision tree. The most important issue is to maintain a balance between accuracy and efficiency, the algorithm should provide good classification performance with a reasonable time response.

Keywords: big data, data streams, classification, decision tree

Procedia PDF Downloads 492
24114 Robust and Dedicated Hybrid Cloud Approach for Secure Authorized Deduplication

Authors: Aishwarya Shekhar, Himanshu Sharma

Abstract:

Data deduplication is one of important data compression techniques for eliminating duplicate copies of repeating data, and has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. In this process, duplicate data is expunged, leaving only one copy means single instance of the data to be accumulated. Though, indexing of each and every data is still maintained. Data deduplication is an approach for minimizing the part of storage space an organization required to retain its data. In most of the company, the storage systems carry identical copies of numerous pieces of data. Deduplication terminates these additional copies by saving just one copy of the data and exchanging the other copies with pointers that assist back to the primary copy. To ignore this duplication of the data and to preserve the confidentiality in the cloud here we are applying the concept of hybrid nature of cloud. A hybrid cloud is a fusion of minimally one public and private cloud. As a proof of concept, we implement a java code which provides security as well as removes all types of duplicated data from the cloud.

Keywords: confidentiality, deduplication, data compression, hybridity of cloud

Procedia PDF Downloads 361
24113 A Review of Machine Learning for Big Data

Authors: Devatha Kalyan Kumar, Aravindraj D., Sadathulla A.

Abstract:

Big data are now rapidly expanding in all engineering and science and many other domains. The potential of large or massive data is undoubtedly significant, make sense to require new ways of thinking and learning techniques to address the various big data challenges. Machine learning is continuously unleashing its power in a wide range of applications. In this paper, the latest advances and advancements in the researches on machine learning for big data processing. First, the machine learning techniques methods in recent studies, such as deep learning, representation learning, transfer learning, active learning and distributed and parallel learning. Then focus on the challenges and possible solutions of machine learning for big data.

Keywords: active learning, big data, deep learning, machine learning

Procedia PDF Downloads 410
24112 Strengthening Legal Protection of Personal Data through Technical Protection Regulation in Line with Human Rights

Authors: Tomy Prihananto, Damar Apri Sudarmadi

Abstract:

Indonesia recognizes the right to privacy as a human right. Indonesia provides legal protection against data management activities because the protection of personal data is a part of human rights. This paper aims to describe the arrangement of data management and data management in Indonesia. This paper is a descriptive research with qualitative approach and collecting data from literature study. Results of this paper are comprehensive arrangement of data that have been set up as a technical requirement of data protection by encryption methods. Arrangements on encryption and protection of personal data are mutually reinforcing arrangements in the protection of personal data. Indonesia has two important and immediately enacted laws that provide protection for the privacy of information that is part of human rights.

Keywords: Indonesia, protection, personal data, privacy, human rights, encryption

Procedia PDF Downloads 157
24111 Identification of Blood Biomarkers Unveiling Early Alzheimer's Disease Diagnosis Through Single-Cell RNA Sequencing Data and Autoencoders

Authors: Hediyeh Talebi, Shokoofeh Ghiam, Changiz Eslahchi

Abstract:

Traditionally, Alzheimer’s disease research has focused on genes with significant fold changes, potentially neglecting subtle but biologically important alterations. Our study introduces an integrative approach that highlights genes crucial to underlying biological processes, regardless of their fold change magnitude. Alzheimer's Single-cell RNA-seq data related to the peripheral blood mononuclear cells (PBMC) was extracted from the Gene Expression Omnibus (GEO). After quality control, normalization, scaling, batch effect correction, and clustering, differentially expressed genes (DEGs) were identified with adjusted p-values less than 0.05. These DEGs were categorized based on cell-type, resulting in four datasets, each corresponding to a distinct cell type. To distinguish between cells from healthy individuals and those with Alzheimer's, an adversarial autoencoder with a classifier was employed. This allowed for the separation of healthy and diseased samples. To identify the most influential genes in this classification, the weight matrices in the network, which includes the encoder and classifier components, were multiplied, and focused on the top 20 genes. The analysis revealed that while some of these genes exhibit a high fold change, others do not. These genes, which may be overlooked by previous methods due to their low fold change, were shown to be significant in our study. The findings highlight the critical role of genes with subtle alterations in diagnosing Alzheimer's disease, a facet frequently overlooked by conventional methods. These genes demonstrate remarkable discriminatory power, underscoring the need to integrate biological relevance with statistical measures in gene prioritization. This integrative approach enhances our understanding of the molecular mechanisms in Alzheimer’s disease and provides a promising direction for identifying potential therapeutic targets.

Keywords: alzheimer's disease, single-cell RNA-seq, neural networks, blood biomarkers

Procedia PDF Downloads 37
24110 Mass Polarization in Three-Body System with Two Identical Particles

Authors: Igor Filikhin, Vladimir M. Suslov, Roman Ya. Kezerashvili, Branislav Vlahivic

Abstract:

The mass-polarization term of the three-body kinetic energy operator is evaluated for different systems which include two identical particles: A+A+B. The term has to be taken into account for the analysis of AB- and AA-interactions based on experimental data for two- and three-body ground state energies. In this study, we present three-body calculations within the framework of a potential model for the kaonic clusters K−K−p and ppK−, nucleus 3H and hypernucleus 6 ΛΛHe. The systems are well clustering as A+ (A+B) with a ground state energy E2 for the pair A+B. The calculations are performed using the method of the Faddeev equations in configuration space. The phenomenological pair potentials were used. We show a correlation between the mass ratio mA/mB and the value δB of the mass-polarization term. For bosonic-like systems, this value is defined as δB = 2E2 − E3, where E3 is three-body energy when VAA = 0. For the systems including three particles with spin(isospin), the models with average AB-potentials are used. In this case, the Faddeev equations become a scalar one like for the bosonic-like system αΛΛ. We show that the additional energy conected with the mass-polarization term can be decomposite to a sum of the two parts: exchenge related and reduced mass related. The state of the system can be described as the following: the particle A1 is bound within the A + B pair with the energy E2, and the second particle A2 is bound with the pair with the energy E3 − E2. Due to the identity of A particles, the particles A1 and A2 are interchangeable in the pair A + B. We shown that the mass polarization δB correlates with a type of AB potential using the system αΛΛ as an example.

Keywords: three-body systems, mass polarization, Faddeev equations, nuclear interactions

Procedia PDF Downloads 343
24109 The Various Legal Dimensions of Genomic Data

Authors: Amy Gooden

Abstract:

When human genomic data is considered, this is often done through only one dimension of the law, or the interplay between the various dimensions is not considered, thus providing an incomplete picture of the legal framework. This research considers and analyzes the various dimensions in South African law applicable to genomic sequence data – including property rights, personality rights, and intellectual property rights. The effective use of personal genomic sequence data requires the acknowledgement and harmonization of the rights applicable to such data.

Keywords: artificial intelligence, data, law, genomics, rights

Procedia PDF Downloads 123
24108 Big Brain: A Single Database System for a Federated Data Warehouse Architecture

Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf

Abstract:

Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build both some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and why a custom federated approach was the notable recommendation to go further with the implementation. Although the data warehouses are logically federated, the implementation uses a single database system which presented many advantages like: cost reduction and improved data access to global users allowing consumers of the data to have a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.

Keywords: data integration, data warehousing, federated architecture, Online Analytical Processing (OLAP)

Procedia PDF Downloads 217
24107 A Review Paper on Data Mining and Genetic Algorithm

Authors: Sikander Singh Cheema, Jasmeen Kaur

Abstract:

In this paper, the concept of data mining is summarized and its one of the important process i.e KDD is summarized. The data mining based on Genetic Algorithm is researched in and ways to achieve the data mining Genetic Algorithm are surveyed. This paper also conducts a formal review on the area of data mining tasks and genetic algorithm in various fields.

Keywords: data mining, KDD, genetic algorithm, descriptive mining, predictive mining

Procedia PDF Downloads 568
24106 Data-Mining Approach to Analyzing Industrial Process Information for Real-Time Monitoring

Authors: Seung-Lock Seo

Abstract:

This work presents a data-mining empirical monitoring scheme for industrial processes with partially unbalanced data. Measurement data of good operations are relatively easy to gather, but in unusual special events or faults it is generally difficult to collect process information or almost impossible to analyze some noisy data of industrial processes. At this time some noise filtering techniques can be used to enhance process monitoring performance in a real-time basis. In addition, pre-processing of raw process data is helpful to eliminate unwanted variation of industrial process data. In this work, the performance of various monitoring schemes was tested and demonstrated for discrete batch process data. It showed that the monitoring performance was improved significantly in terms of monitoring success rate of given process faults.

Keywords: data mining, process data, monitoring, safety, industrial processes

Procedia PDF Downloads 375
24105 Topological Analyses of Unstructured Peer to Peer Systems: A Survey

Authors: Hend Alrasheed

Abstract:

Due to their different properties that have led to avoid several limitations of classic client/server systems, there has been a great interest in the development and the improvement of different peer to peer systems. Understanding the properties of complex peer to peer networks is essential for their future improvements. It was shown that the performances of peer to peer protocols are directly related to their underlying topologies. Therefore, multiple efforts have analyzed the topologies of different peer to peer systems. This study presents an overview of major findings of close experimental analyses to different topologies of three unstructured peer to peer systems: BitTorrent, Gnutella, and FreeNet.

Keywords: peer to peer networks, network topology, graph diameter, clustering coefficient, small-world property, random graph, degree distribution

Procedia PDF Downloads 358
24104 Analysis of the Role of Population Ageing on Crosstown Roads' Traffic Accidents Using Latent Class Clustering

Authors: N. Casado-Sanz, B. Guirao

Abstract:

The population aged 65 and over is projected to double in the coming decades. Due to this increase, driver population is expected to grow and in the near future, all countries will be faced with population aging of varying intensity and in unique time frames. This is the greatest challenge facing industrialized nations and due to this fact, the study of the relationships of dependency between population aging and road safety is becoming increasingly relevant. Although the deterioration of driving skills in the elderly has been analyzed in depth, to our knowledge few research studies have focused on the road infrastructure and the mobility of this particular group of users. In Spain, crosstown roads have one of the highest fatality rates. These rural routes have a higher percentage of elderly people who are more dependent on driving due to the absence or limitations of urban public transportation. Analysing road safety in these routes is very complex because of the variety of the features, the dispersion of the data and the complete lack of related literature. The objective of this paper is to identify key factors that cause traffic accidents. The individuals under study were the accidents with killed or seriously injured in Spanish crosstown roads during the period 2006-2015. Latent cluster analysis was applied as a preliminary tool for segmentation of accidents, considering population aging as the main input among other socioeconomic indicators. Subsequently, a linear regression analysis was carried out to estimate the degree of dependence between the accident rate and the variables that define each group. The results show that segmenting the data is very interesting and provides further information. Additionally, the results revealed the clear influence of the aging variable in the clusters obtained. Other variables related to infrastructure and mobility levels, such as the crosstown roads layout and the traffic intensity aimed to be one of the key factors in the causality of road accidents.

Keywords: cluster analysis, population ageing, rural roads, road safety

Procedia PDF Downloads 86
24103 Corporate Social Responsibility and Corporate Reputation: A Bibliometric Analysis

Authors: Songdi Li, Louise Spry, Tony Woodall

Abstract:

Nowadays, Corporate Social responsibility (CSR) is becoming a buzz word, and more and more academics are putting efforts on CSR studies. It is believed that CSR could influence Corporate Reputation (CR), and they hold a favourable view that CSR leads to a positive CR. To be specific, the CSR related activities in the reputational context have been regarded as ways that associate to excellent financial performance, value creation, etc. Also, it is argued that CSR and CR are two sides of one coin; hence, to some extent, doing CSR is equal to establishing a good reputation. Still, there is no consensus of the CSR-CR relationship in the literature; thus, a systematic literature review is highly in need. This research conducts a systematic literature review with both bibliometric and content analysis. Data are selected from English language sources, and academic journal articles only, then, keyword combinations are applied to identify relevant sources. Data from Scopus and WoS are gathered for bibliometric analysis. Scopus search results were saved in RIS and CSV formats, and Web of Science (WoS) data were saved in TXT format and CSV formats in order to process data in the Bibexcel software for further analysis which later will be visualised by the software VOSviewer. Also, content analysis was applied to analyse the data clusters and the key articles. In terms of the topic of CSR-CR, this literature review with bibliometric analysis has made four achievements. First, this paper has developed a systematic study which quantitatively depicts the knowledge structure of CSR and CR by identifying terms closely related to CSR-CR (such as ‘corporate governance’) and clustering subtopics emerged in co-citation analysis. Second, content analysis is performed to acquire insight on the findings of bibliometric analysis in the discussion section. And it highlights some insightful implications for the future research agenda, for example, a psychological link between CSR-CR is identified from the result; also, emerging economies and qualitative research methods are new elements emerged in the CSR-CR big picture. Third, a multidisciplinary perspective presents through the whole bibliometric analysis mapping and co-word and co-citation analysis; hence, this work builds a structure of interdisciplinary perspective which potentially leads to an integrated conceptual framework in the future. Finally, Scopus and WoS are compared and contrasted in this paper; as a result, Scopus which has more depth and comprehensive data is suggested as a tool for future bibliometric analysis studies. Overall, this paper has fulfilled its initial purposes and contributed to the literature. To the author’s best knowledge, this paper conducted the first literature review of CSR-CR researches that applied both bibliometric analysis and content analysis; therefore, this paper achieves its methodological originality. And this dual approach brings advantages of carrying out a comprehensive and semantic exploration in the area of CSR-CR in a scientific and realistic method. Admittedly, its work might exist subjective bias in terms of search terms selection and paper selection; hence triangulation could reduce the subjective bias to some degree.

Keywords: corporate social responsibility, corporate reputation, bibliometric analysis, software program

Procedia PDF Downloads 107
24102 Cluster-Based Multi-Path Routing Algorithm in Wireless Sensor Networks

Authors: Si-Gwan Kim

Abstract:

Small-size and low-power sensors with sensing, signal processing and wireless communication capabilities is suitable for the wireless sensor networks. Due to the limited resources and battery constraints, complex routing algorithms used for the ad-hoc networks cannot be employed in sensor networks. In this paper, we propose node-disjoint multi-path hexagon-based routing algorithms in wireless sensor networks. We suggest the details of the algorithm and compare it with other works. Simulation results show that the proposed scheme achieves better performance in terms of efficiency and message delivery ratio.

Keywords: clustering, multi-path, routing protocol, sensor network

Procedia PDF Downloads 377
24101 A Survey of Semantic Integration Approaches in Bioinformatics

Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir

Abstract:

Technological advances of computer science and data analysis are helping to provide continuously huge volumes of biological data, which are available on the web. Such advances involve and require powerful techniques for data integration to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information-integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, we focus on those developed in the ontology community.

Keywords: biological ontology, linked data, semantic data integration, semantic web

Procedia PDF Downloads 422
24100 Classification of Generative Adversarial Network Generated Multivariate Time Series Data Featuring Transformer-Based Deep Learning Architecture

Authors: Thrivikraman Aswathi, S. Advaith

Abstract:

As there can be cases where the use of real data is somehow limited, such as when it is hard to get access to a large volume of real data, we need to go for synthetic data generation. This produces high-quality synthetic data while maintaining the statistical properties of a specific dataset. In the present work, a generative adversarial network (GAN) is trained to produce multivariate time series (MTS) data since the MTS is now being gathered more often in various real-world systems. Furthermore, the GAN-generated MTS data is fed into a transformer-based deep learning architecture that carries out the data categorization into predefined classes. Further, the model is evaluated across various distinct domains by generating corresponding MTS data.

Keywords: GAN, transformer, classification, multivariate time series

Procedia PDF Downloads 100