Search results for: data clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24549

Search results for: data clustering

24159 Optimized Approach for Secure Data Sharing in Distributed Database

Authors: Ahmed Mateen, Zhu Qingsheng, Ahmad Bilal

Abstract:

In the current age of technology, information is the most precious asset of a company. Today, companies have a large amount of data. As the data become larger, access to data for some particular information is becoming slower day by day. Faster data processing to shape it in the form of information is the biggest issue. The major problems in distributed databases are the efficiency of data distribution and response time of data distribution. The security of data distribution is also a big issue. For these problems, we proposed a strategy that can maximize the efficiency of data distribution and also increase its response time. This technique gives better results for secure data distribution from multiple heterogeneous sources. The newly proposed technique facilitates the companies for secure data sharing efficiently and quickly.

Keywords: ER-schema, electronic record, P2P framework, API, query formulation

Procedia PDF Downloads 308
24158 Career Guidance System Using Machine Learning

Authors: Mane Darbinyan, Lusine Hayrapetyan, Elen Matevosyan

Abstract:

Artificial Intelligence in Education (AIED) has been created to help students get ready for the workforce, and over the past 25 years, it has grown significantly, offering a variety of technologies to support academic, institutional, and administrative services. However, this is still challenging, especially considering the labor market's rapid change. While choosing a career, people face various obstacles because they do not take into consideration their own preferences, which might lead to many other problems like shifting jobs, work stress, occupational infirmity, reduced productivity, and manual error. Besides preferences, people should evaluate properly their technical and non-technical skills, as well as their personalities. Professional counseling has become a difficult undertaking for counselors due to the wide range of career choices brought on by changing technological trends. It is necessary to close this gap by utilizing technology that makes sophisticated predictions about a person's career goals based on their personality. Hence, there is a need to create an automated model that would help in decision-making based on user inputs. Improving career guidance can be achieved by embedding machine learning into the career consulting ecosystem. There are various systems of career guidance that work based on the same logic, such as the classification of applicants, matching applications with appropriate departments or jobs, making predictions, and providing suitable recommendations. Methodologies like KNN, neural networks, K-means clustering, D-Tree, and many other advanced algorithms are applied in the fields of data and compute some data, which is helpful to predict the right careers. Besides helping users with their career choice, these systems provide numerous opportunities which are very useful while making this hard decision. They help the candidate to recognize where he/she specifically lacks sufficient skills so that the candidate can improve those skills. They are also capable of offering an e-learning platform, taking into account the user's lack of knowledge. Furthermore, users can be provided with details on a particular job, such as the abilities required to excel in that industry.

Keywords: career guidance system, machine learning, career prediction, predictive decision, data mining, technical and non-technical skills

Procedia PDF Downloads 48
24157 A Methodology of Using Fuzzy Logics and Data Analytics to Estimate the Life Cycle Indicators of Solar Photovoltaics

Authors: Thor Alexis Sazon, Alexander Guzman-Urbina, Yasuhiro Fukushima

Abstract:

This study outlines the method of how to develop a surrogate life cycle model based on fuzzy logic using three fuzzy inference methods: (1) the conventional Fuzzy Inference System (FIS), (2) the hybrid system of Data Analytics and Fuzzy Inference (DAFIS), which uses data clustering for defining the membership functions, and (3) the Adaptive-Neuro Fuzzy Inference System (ANFIS), a combination of fuzzy inference and artificial neural network. These methods were demonstrated with a case study where the Global Warming Potential (GWP) and the Levelized Cost of Energy (LCOE) of solar photovoltaic (PV) were estimated using Solar Irradiation, Module Efficiency, and Performance Ratio as inputs. The effects of using different fuzzy inference types, either Sugeno- or Mamdani-type, and of changing the number of input membership functions to the error between the calibration data and the model-generated outputs were also illustrated. The solution spaces of the three methods were consequently examined with a sensitivity analysis. ANFIS exhibited the lowest error while DAFIS gave slightly lower errors compared to FIS. Increasing the number of input membership functions helped with error reduction in some cases but, at times, resulted in the opposite. Sugeno-type models gave errors that are slightly lower than those of the Mamdani-type. While ANFIS is superior in terms of error minimization, it could generate solutions that are questionable, i.e. the negative GWP values of the Solar PV system when the inputs were all at the upper end of their range. This shows that the applicability of the ANFIS models highly depends on the range of cases at which it was calibrated. FIS and DAFIS generated more intuitive trends in the sensitivity runs. DAFIS demonstrated an optimal design point wherein increasing the input values does not improve the GWP and LCOE anymore. In the absence of data that could be used for calibration, conventional FIS presents a knowledge-based model that could be used for prediction. In the PV case study, conventional FIS generated errors that are just slightly higher than those of DAFIS. The inherent complexity of a Life Cycle study often hinders its widespread use in the industry and policy-making sectors. While the methodology does not guarantee a more accurate result compared to those generated by the Life Cycle Methodology, it does provide a relatively simpler way of generating knowledge- and data-based estimates that could be used during the initial design of a system.

Keywords: solar photovoltaic, fuzzy logic, inference system, artificial neural networks

Procedia PDF Downloads 140
24156 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands

Authors: Julio Albuja, David Zaldumbide

Abstract:

Data analysis is an important step before taking a decision about money. The aim of this work is to analyze the factors that influence the final price of the houses through data mining algorithms. To our best knowledge, previous work was researched just to compare results. Furthermore, before using the data of the data set, the Z-Transformation were used to standardize the data in the same range. Hence, the data was classified into two groups to visualize them in a readability format. A decision tree was built, and graphical data is displayed where clearly is easy to see the results and the factors' influence in these graphics. The definitions of these methods are described, as well as the descriptions of the results. Finally, conclusions and recommendations are presented related to the released results that our research showed making it easier to apply these algorithms using a customized data set.

Keywords: algorithms, data, decision tree, transformation

Procedia PDF Downloads 351
24155 Conjunctive Management of Surface and Groundwater Resources under Uncertainty: A Retrospective Optimization Approach

Authors: Julius M. Ndambuki, Gislar E. Kifanyi, Samuel N. Odai, Charles Gyamfi

Abstract:

Conjunctive management of surface and groundwater resources is a challenging task due to the spatial and temporal variability nature of hydrology as well as hydrogeology of the water storage systems. Surface water-groundwater hydrogeology is highly uncertain; thus it is imperative that this uncertainty is explicitly accounted for, when managing water resources. Various methodologies have been developed and applied by researchers in an attempt to account for the uncertainty. For example, simulation-optimization models are often used for conjunctive water resources management. However, direct application of such an approach in which all realizations are considered at each iteration of the optimization process leads to a very expensive optimization in terms of computational time, particularly when the number of realizations is large. The aim of this paper, therefore, is to introduce and apply an efficient approach referred to as Retrospective Optimization Approximation (ROA) that can be used for optimizing conjunctive use of surface water and groundwater over a multiple hydrogeological model simulations. This work is based on stochastic simulation-optimization framework using a recently emerged technique of sample average approximation (SAA) which is a sampling based method implemented within the Retrospective Optimization Approximation (ROA) approach. The ROA approach solves and evaluates a sequence of generated optimization sub-problems in an increasing number of realizations (sample size). Response matrix technique was used for linking simulation model with optimization procedure. The k-means clustering sampling technique was used to map the realizations. The methodology is demonstrated through the application to a hypothetical example. In the example, the optimization sub-problems generated were solved and analysed using “Active-Set” core optimizer implemented under MATLAB 2014a environment. Through k-means clustering sampling technique, the ROA – Active Set procedure was able to arrive at a (nearly) converged maximum expected total optimal conjunctive water use withdrawal rate within a relatively few number of iterations (6 to 7 iterations). Results indicate that the ROA approach is a promising technique for optimizing conjunctive water use of surface water and groundwater withdrawal rates under hydrogeological uncertainty.

Keywords: conjunctive water management, retrospective optimization approximation approach, sample average approximation, uncertainty

Procedia PDF Downloads 207
24154 Application of Blockchain Technology in Geological Field

Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu

Abstract:

Management and application of geological big data is an important part of China's national big data strategy. With the implementation of a national big data strategy, geological big data management becomes more and more critical. At present, there are still a lot of technology barriers as well as cognition chaos in many aspects of geological big data management and application, such as data sharing, intellectual property protection, and application technology. Therefore, it’s a key task to make better use of new technologies for deeper delving and wider application of geological big data. In this paper, we briefly introduce the basic principle of blockchain technology at the beginning and then make an analysis of the application dilemma of geological data. Based on the current analysis, we bring forward some feasible patterns and scenarios for the blockchain application in geological big data and put forward serval suggestions for future work in geological big data management.

Keywords: blockchain, intellectual property protection, geological data, big data management

Procedia PDF Downloads 60
24153 Frequent Item Set Mining for Big Data Using MapReduce Framework

Authors: Tamanna Jethava, Rahul Joshi

Abstract:

Frequent Item sets play an essential role in many data Mining tasks that try to find interesting patterns from the database. Typically it refers to a set of items that frequently appear together in transaction dataset. There are several mining algorithm being used for frequent item set mining, yet most do not scale to the type of data we presented with today, so called “BIG DATA”. Big Data is a collection of large data sets. Our approach is to work on the frequent item set mining over the large dataset with scalable and speedy way. Big Data basically works with Map Reduce along with HDFS is used to find out frequent item sets from Big Data on large cluster. This paper focuses on using pre-processing & mining algorithm as hybrid approach for big data over Hadoop platform.

Keywords: frequent item set mining, big data, Hadoop, MapReduce

Procedia PDF Downloads 395
24152 The Role Of Data Gathering In NGOs

Authors: Hussaini Garba Mohammed

Abstract:

Background/Significance: The lack of data gathering is affecting NGOs world-wide in general to have good data information about educational and health related issues among communities in any country and around the world. For example, HIV/AIDS smoking (Tuberculosis diseases) and COVID-19 virus carriers is becoming a serious public health problem, especially among old men and women. But there is no full details data survey assessment from communities, villages, and rural area in some countries to show the percentage of victims and patients, especial with this world COVID-19 virus among the people. These data are essential to inform programming targets, strategies, and priorities in getting good information about data gathering in any society.

Keywords: reliable information, data assessment, data mining, data communication

Procedia PDF Downloads 159
24151 The Application of Data Mining Technology in Building Energy Consumption Data Analysis

Authors: Liang Zhao, Jili Zhang, Chongquan Zhong

Abstract:

Energy consumption data, in particular those involving public buildings, are impacted by many factors: the building structure, climate/environmental parameters, construction, system operating condition, and user behavior patterns. Traditional methods for data analysis are insufficient. This paper delves into the data mining technology to determine its application in the analysis of building energy consumption data including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature are reviewed and summarized, the problems faced by data mining technology in the area of energy consumption data analysis are enumerated, and research points for future studies are given.

Keywords: data mining, data analysis, prediction, optimization, building operational performance

Procedia PDF Downloads 829
24150 Analysis of Travel Behavior Patterns of Frequent Passengers after the Section Shutdown of Urban Rail Transit - Taking the Huaqiao Section of Shanghai Metro Line 11 Shutdown During the COVID-19 Epidemic as an Example

Authors: Hongyun Li, Zhibin Jiang

Abstract:

The travel of passengers in the urban rail transit network is influenced by changes in network structure and operational status, and the response of individual travel preferences to these changes also varies. Firstly, the influence of the suspension of urban rail transit line sections on passenger travel along the line is analyzed. Secondly, passenger travel trajectories containing multi-dimensional semantics are described based on network UD data. Next, passenger panel data based on spatio-temporal sequences is constructed to achieve frequent passenger clustering. Then, the Graph Convolutional Network (GCN) is used to model and identify the changes in travel modes of different types of frequent passengers. Finally, taking Shanghai Metro Line 11 as an example, the travel behavior patterns of frequent passengers after the Huaqiao section shutdown during the COVID-19 epidemic are analyzed. The results showed that after the section shutdown, most passengers would transfer to the nearest Anting station for boarding, while some passengers would transfer to other stations for boarding or cancel their travels directly. Among the passengers who transferred to Anting station for boarding, most of passengers maintained the original normalized travel mode, a small number of passengers waited for a few days before transferring to Anting station for boarding, and only a few number of passengers stopped traveling at Anting station or transferred to other stations after a few days of boarding on Anting station. The results can provide a basis for understanding urban rail transit passenger travel patterns and improving the accuracy of passenger flow prediction in abnormal operation scenarios.

Keywords: urban rail transit, section shutdown, frequent passenger, travel behavior pattern

Procedia PDF Downloads 55
24149 To Handle Data-Driven Software Development Projects Effectively

Authors: Shahnewaz Khan

Abstract:

Machine learning (ML) techniques are often used in projects for creating data-driven applications. These tasks typically demand additional research and analysis. The proper technique and strategy must be chosen to ensure the success of data-driven projects. Otherwise, even exerting a lot of effort, the necessary development might not always be possible. In this post, an effort to examine the workflow of data-driven software development projects and its implementation process in order to describe how to manage a project successfully. Which will assist in minimizing the added workload.

Keywords: data, data-driven projects, data science, NLP, software project

Procedia PDF Downloads 57
24148 Heuristic Classification of Hydrophone Recordings

Authors: Daniel M. Wolff, Patricia Gray, Rafael de la Parra Venegas

Abstract:

An unsupervised machine listening system is constructed and applied to a dataset of 17,195 30-second marine hydrophone recordings. The system is then heuristically supplemented with anecdotal listening, contextual recording information, and supervised learning techniques to reduce the number of false positives. Features for classification are assembled by extracting the following data from each of the audio files: the spectral centroid, root-mean-squared values for each frequency band of a 10-octave filter bank, and mel-frequency cepstral coefficients in 5-second frames. In this way both time- and frequency-domain information are contained in the features to be passed to a clustering algorithm. Classification is performed using the k-means algorithm and then a k-nearest neighbors search. Different values of k are experimented with, in addition to different combinations of the available feature sets. Hypothesized class labels are 'primarily anthrophony' and 'primarily biophony', where the best class result conforming to the former label has 104 members after heuristic pruning. This demonstrates how a large audio dataset has been made more tractable with machine learning techniques, forming the foundation of a framework designed to acoustically monitor and gauge biological and anthropogenic activity in a marine environment.

Keywords: anthrophony, hydrophone, k-means, machine learning

Procedia PDF Downloads 137
24147 Examining Social Connectivity through Email Network Analysis: Study of Librarians' Emailing Groups in Pakistan

Authors: Muhammad Arif Khan, Haroon Idrees, Imran Aziz, Sidra Mushtaq

Abstract:

Social platforms like online discussion and mailing groups are well aligned with academic as well as professional learning spaces. Professional communities are increasingly moving to online forums for sharing and capturing the intellectual abilities. This study investigated dynamics of social connectivity of yahoo mailing groups of Pakistani Library and Information Science (LIS) professionals using Graph Theory technique. Design/Methodology: Social Network Analysis is the increasingly concerned domain for scientists in identifying whether people grow together through online social interaction or, whether they just reflect connectivity. We have conducted a longitudinal study using Network Graph Theory technique to analyze the large data-set of email communication. The data was collected from three yahoo mailing groups using network analysis software over a period of six months i.e. January to June 2016. Findings of the network analysis were reviewed through focus group discussion with LIS experts and selected respondents of the study. Data were analyzed in Microsoft Excel and network diagrams were visualized using NodeXL and ORA-Net Scene package. Findings: Findings demonstrate that professionals and students exhibit intellectual growth the more they get tied within a network by interacting and participating in communication through online forums. The study reports on dynamics of the large network by visualizing the email correspondence among group members in a network consisting vertices (members) and edges (randomized correspondence). The model pair wise relationship between group members was illustrated to show characteristics, reasons, and strength of ties. Connectivity of nodes illustrated the frequency of communication among group members through examining node coupling, diffusion of networks, and node clustering has been demonstrated in-depth. Network analysis was found to be a useful technique in investigating the dynamics of the large network.

Keywords: emailing networks, network graph theory, online social platforms, yahoo mailing groups

Procedia PDF Downloads 218
24146 The Relationship Between Artificial Intelligence, Data Science, and Privacy

Authors: M. Naidoo

Abstract:

Artificial intelligence often requires large amounts of good quality data. Within important fields, such as healthcare, the training of AI systems predominately relies on health and personal data; however, the usage of this data is complicated by various layers of law and ethics that seek to protect individuals’ privacy rights. This research seeks to establish the challenges AI and data sciences pose to (i) informational rights, (ii) privacy rights, and (iii) data protection. To solve some of the issues presented, various methods are suggested, such as embedding values in technological development, proper balancing of rights and interests, and others.

Keywords: artificial intelligence, data science, law, policy

Procedia PDF Downloads 87
24145 Simulation Data Summarization Based on Spatial Histograms

Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura

Abstract:

In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.

Keywords: simulation data, data summarization, spatial histograms, exploration, visualization

Procedia PDF Downloads 159
24144 Machine Learning Analysis of Eating Disorders Risk, Physical Activity and Psychological Factors in Adolescents: A Community Sample Study

Authors: Marc Toutain, Pascale Leconte, Antoine Gauthier

Abstract:

Introduction: Eating Disorders (ED), such as anorexia, bulimia, and binge eating, are psychiatric illnesses that mostly affect young people. The main symptoms concern eating (restriction, excessive food intake) and weight control behaviors (laxatives, vomiting). Psychological comorbidities (depression, executive function disorders, etc.) and problematic behaviors toward physical activity (PA) are commonly associated with ED. Acquaintances on ED risk factors are still lacking, and more community sample studies are needed to improve prevention and early detection. To our knowledge, studies are needed to specifically investigate the link between ED risk level, PA, and psychological risk factors in a community sample of adolescents. The aim of this study is to assess the relation between ED risk level, exercise (type, frequency, and motivations for engaging in exercise), and psychological factors based on the Jacobi risk factors model. We suppose that a high risk of ED will be associated with the practice of high caloric cost PA, motivations oriented to weight and shape control, and psychological disturbances. Method: An online survey destined for students has been sent to several middle schools and colleges in northwest France. This survey combined several questionnaires, the Eating Attitude Test-26 assessing ED risk; the Exercise Motivation Inventory–2 assessing motivations toward PA; the Hospital Anxiety and Depression Scale assessing anxiety and depression, the Contour Drawing Rating Scale; and the Body Esteem Scale assessing body dissatisfaction, Rosenberg Self-esteem Scale assessing self-esteem, the Exercise Dependence Scale-Revised assessing PA dependence, the Multidimensional Assessment of Interoceptive Awareness assessing interoceptive awareness and the Frost Multidimensional Perfectionism Scale assessing perfectionism. Machine learning analysis will be performed in order to constitute groups with a tree-based model clustering method, extract risk profile(s) with a bootstrap method comparison, and predict ED risk with a prediction method based on a decision tree-based model. Expected results: 1044 complete records have already been collected, and the survey will be closed at the end of May 2022. Records will be analyzed with a clustering method and a bootstrap method in order to reveal risk profile(s). Furthermore, a predictive tree decision method will be done to extract an accurate predictive model of ED risk. This analysis will confirm typical main risk factors and will give more data on presumed strong risk factors such as exercise motivations and interoceptive deficit. Furthermore, it will enlighten particular risk profiles with a strong level of proof and greatly contribute to improving the early detection of ED and contribute to a better understanding of ED risk factors.

Keywords: eating disorders, risk factors, physical activity, machine learning

Procedia PDF Downloads 64
24143 Microbial Biogeography of Greek Olive Varieties Assessed by Amplicon-Based Metagenomics Analysis

Authors: Lena Payati, Maria Kazou, Effie Tsakalidou

Abstract:

Table olives are one of the most popular fermented vegetables worldwide, which along with olive oil, have a crucial role in the world economy. They are highly appreciated by the consumers for their characteristic taste and pleasant aromas, while several health and nutritional benefits have been reported as well. Until recently, microbial biogeography, i.e., the study of microbial diversity over time and space, has been mainly associated with wine. However, nowadays, the term 'terroir' has been extended to other crops and food products so as to link the geographical origin and environmental conditions to quality aspects of fermented foods. Taking the above into consideration, the present study focuses on the microbial fingerprinting of the most important olive varieties of Greece with the state-of-the-art amplicon-based metagenomics analysis. Towards this, in 2019, 61 samples from 38 different olive varieties were collected at the final stage of ripening from 13 well spread geographical regions in Greece. For the metagenomics analysis, total DNA was extracted from the olive samples, and the 16S rRNA gene and ITS DNA region were sequenced and analyzed using bioinformatics tools for the identification of bacterial and yeasts/fungal diversity, respectively. Furthermore, principal component analysis (PCA) was also performed for data clustering based on the average microbial composition of all samples from each region of origin. According to the composition, results obtained, when samples were analyzed separately, the majority of both bacteria (such as Pantoea, Enterobacter, Roserbergiella, and Pseudomonas) and yeasts/fungi (such as Aureobasidium, Debaromyces, Candida, and Cladosporium) genera identified were found in all 61 samples. Even though interesting differences were observed at the relative abundance level of the identified genera, the bacterial genus Pantoea and the yeast/fungi genus Aureobasidium were the dominant ones in 35 and 40 samples, respectively. Of note, olive samples collected from the same region had similar fingerprint (genera identified and relative abundance level) regardless of the variety, indicating a potential association between the relative abundance of certain taxa and the geographical region. When samples were grouped by region of origin, distinct bacterial profiles per region were observed, which was also evident from the PCA analysis. This was not the case for the yeast/fungi profiles since 10 out of the 13 regions were grouped together mainly due to the dominance of the genus Aureobasidium. A second cluster was formed for the islands Crete and Rhodes, both of which are located in the Southeast Aegean Sea. These two regions clustered together mainly due to the identification of the genus Toxicocladosporium in relatively high abundances. Finally, the Agrinio region was separated from the others as it showed a completely different microbial fingerprinting. However, due to the limited number of olive samples from some regions, a subsequent PCA analysis with more samples from these regions is expected to yield in a more clear clustering. The present study is part of a bigger project, the first of its kind in Greece, with the ultimate goal to analyze a larger set of olive samples of different varieties and from different regions in Greece in order to have a reliable olives’ microbial biogeography.

Keywords: amplicon-based metagenomics analysis, bacteria, microbial biogeography, olive microbiota, yeasts/fungi

Procedia PDF Downloads 96
24142 Development of a Multi-Locus DNA Metabarcoding Method for Endangered Animal Species Identification

Authors: Meimei Shi

Abstract:

Objectives: The identification of endangered species, especially simultaneous detection of multiple species in complex samples, plays a critical role in alleged wildlife crime incidents and prevents illegal trade. This study was to develop a multi-locus DNA metabarcoding method for endangered animal species identification. Methods: Several pairs of universal primers were designed according to the mitochondria conserved gene regions. Experimental mixtures were artificially prepared by mixing well-defined species, including endangered species, e.g., forest musk, bear, tiger, pangolin, and sika deer. The artificial samples were prepared with 1-16 well-characterized species at 1% to 100% DNA concentrations. After multiplex-PCR amplification and parameter modification, the amplified products were analyzed by capillary electrophoresis and used for NGS library preparation. The DNA metabarcoding was carried out based on Illumina MiSeq amplicon sequencing. The data was processed with quality trimming, reads filtering, and OTU clustering; representative sequences were blasted using BLASTn. Results: According to the parameter modification and multiplex-PCR amplification results, five primer sets targeting COI, Cytb, 12S, and 16S, respectively, were selected as the NGS library amplification primer panel. High-throughput sequencing data analysis showed that the established multi-locus DNA metabarcoding method was sensitive and could accurately identify all species in artificial mixtures, including endangered animal species Moschus berezovskii, Ursus thibetanus, Panthera tigris, Manis pentadactyla, Cervus nippon at 1% (DNA concentration). In conclusion, the established species identification method provides technical support for customs and forensic scientists to prevent the illegal trade of endangered animals and their products.

Keywords: DNA metabarcoding, endangered animal species, mitochondria nucleic acid, multi-locus

Procedia PDF Downloads 110
24141 Algorithms used in Spatial Data Mining GIS

Authors: Vahid Bairami Rad

Abstract:

Extracting knowledge from spatial data like GIS data is important to reduce the data and extract information. Therefore, the development of new techniques and tools that support the human in transforming data into useful knowledge has been the focus of the relatively new and interdisciplinary research area ‘knowledge discovery in databases’. Thus, we introduce a set of database primitives or basic operations for spatial data mining which are sufficient to express most of the spatial data mining algorithms from the literature. This approach has several advantages. Similar to the relational standard language SQL, the use of standard primitives will speed-up the development of new data mining algorithms and will also make them more portable. We introduced a database-oriented framework for spatial data mining which is based on the concepts of neighborhood graphs and paths. A small set of basic operations on these graphs and paths were defined as database primitives for spatial data mining. Furthermore, techniques to efficiently support the database primitives by a commercial DBMS were presented.

Keywords: spatial data base, knowledge discovery database, data mining, spatial relationship, predictive data mining

Procedia PDF Downloads 430
24140 Study of Mobile Game Addiction Using Electroencephalography Data Analysis

Authors: Arsalan Ansari, Muhammad Dawood Idrees, Maria Hafeez

Abstract:

Use of mobile phones has been increasing considerably over the past decade. Currently, it is one of the main sources of communication and information. Initially, mobile phones were limited to calls and messages, but with the advent of new technology smart phones were being used for many other purposes including video games. Despite of positive outcomes, addiction to video games on mobile phone has become a leading cause of psychological and physiological problems among many people. Several researchers examined the different aspects of behavior addiction with the use of different scales. Objective of this study is to examine any distinction between mobile game addicted and non-addicted players with the use of electroencephalography (EEG), based upon psycho-physiological indicators. The mobile players were asked to play a mobile game and EEG signals were recorded by BIOPAC equipment with AcqKnowledge as data acquisition software. Electrodes were places, following the 10-20 system. EEG was recorded at sampling rate of 200 samples/sec (12,000samples/min). EEG recordings were obtained from the frontal (Fp1, Fp2), parietal (P3, P4), and occipital (O1, O2) lobes of the brain. The frontal lobe is associated with behavioral control, personality, and emotions. The parietal lobe is involved in perception, understanding logic, and arithmetic. The occipital lobe plays a role in visual tasks. For this study, a 60 second time window was chosen for analysis. Preliminary analysis of the signals was carried out with Acqknowledge software of BIOPAC Systems. From the survey based on CGS manual study 2010, it was concluded that five participants out of fifteen were in addictive category. This was used as prior information to group the addicted and non-addicted by physiological analysis. Statistical analysis showed that by applying clustering analysis technique authors were able to categorize the addicted and non-addicted players specifically on theta frequency range of occipital area.

Keywords: mobile game, addiction, psycho-physiology, EEG analysis

Procedia PDF Downloads 138
24139 Data Stream Association Rule Mining with Cloud Computing

Authors: B. Suraj Aravind, M. H. M. Krishna Prasad

Abstract:

There exist emerging applications of data streams that require association rule mining, such as network traffic monitoring, web click streams analysis, sensor data, data from satellites etc. Data streams typically arrive continuously in high speed with huge amount and changing data distribution. This raises new issues that need to be considered when developing association rule mining techniques for stream data. This paper proposes to introduce an improved data stream association rule mining algorithm by eliminating the limitation of resources. For this, the concept of cloud computing is used. Inclusion of this may lead to additional unknown problems which needs further research.

Keywords: data stream, association rule mining, cloud computing, frequent itemsets

Procedia PDF Downloads 476
24138 Study on the Characteristics of Chinese Urban Network Space from the Perspective of Innovative Collaboration

Authors: Wei Wang, Yilun Xu

Abstract:

With the development of knowledge economy era, deepening the mechanism of cooperation and adhering to sharing and win-win cooperation has become new direction of urban development nowadays. In recent years, innovative collaborations between cities are becoming more and more frequent, whose influence on urban network space has aroused many scholars' attention. Taking 46 cities in China as the research object, the paper builds the connectivity of innovative network between cities and the linkages of urban external innovation using patent cooperation data among cities, and explores urban network space in China by the application of GIS, which is a beneficial exploration to the study of social network space in China in the era of information network. The result shows that the urban innovative network space and geographical entity space exist differences, and the linkages of external innovation are not entirely related to the city innovative capacity and the level of economy development. However, urban innovative network space and geographical entity space are similar in hierarchical clustering. They have both formed Beijing-Tianjin-Hebei, Yangtze River Delta, Pearl River Delta three metropolitan areas and Beijing-Shenzhen-Shanghai-Hangzhou four core cities, which lead the development of innovative network space in China.

Keywords: innovative collaboration, urban network space, the connectivity of innovative network, the linkages of external innovation

Procedia PDF Downloads 155
24137 A Comprehensive Survey and Improvement to Existing Privacy Preserving Data Mining Techniques

Authors: Tosin Ige

Abstract:

Ethics must be a condition of the world, like logic. (Ludwig Wittgenstein, 1889-1951). As important as data mining is, it possess a significant threat to ethics, privacy, and legality, since data mining makes it difficult for an individual or consumer (in the case of a company) to control the accessibility and usage of his data. This research focuses on Current issues and the latest research and development on Privacy preserving data mining methods as at year 2022. It also discusses some advances in those techniques while at the same time highlighting and providing a new technique as a solution to an existing technique of privacy preserving data mining methods. This paper also bridges the wide gap between Data mining and the Web Application Programing Interface (web API), where research is urgently needed for an added layer of security in data mining while at the same time introducing a seamless and more efficient way of data mining.

Keywords: data, privacy, data mining, association rule, privacy preserving, mining technique

Procedia PDF Downloads 136
24136 Big Data: Concepts, Technologies and Applications in the Public Sector

Authors: A. Alexandru, C. A. Alexandru, D. Coardos, E. Tudora

Abstract:

Big Data (BD) is associated with a new generation of technologies and architectures which can harness the value of extremely large volumes of very varied data through real time processing and analysis. It involves changes in (1) data types, (2) accumulation speed, and (3) data volume. This paper presents the main concepts related to the BD paradigm, and introduces architectures and technologies for BD and BD sets. The integration of BD with the Hadoop Framework is also underlined. BD has attracted a lot of attention in the public sector due to the newly emerging technologies that allow the availability of network access. The volume of different types of data has exponentially increased. Some applications of BD in the public sector in Romania are briefly presented.

Keywords: big data, big data analytics, Hadoop, cloud

Procedia PDF Downloads 288
24135 Application of Latent Class Analysis and Self-Organizing Maps for the Prediction of Treatment Outcomes for Chronic Fatigue Syndrome

Authors: Ben Clapperton, Daniel Stahl, Kimberley Goldsmith, Trudie Chalder

Abstract:

Chronic fatigue syndrome (CFS) is a condition characterised by chronic disabling fatigue and other symptoms that currently can't be explained by any underlying medical condition. Although clinical trials support the effectiveness of cognitive behaviour therapy (CBT), the success rate for individual patients is modest. Patients vary in their response and little is known which factors predict or moderate treatment outcomes. The aim of the project is to develop a prediction model from baseline characteristics of patients, such as demographics, clinical and psychological variables, which may predict likely treatment outcome and provide guidance for clinical decision making and help clinicians to recommend the best treatment. The project is aimed at identifying subgroups of patients with similar baseline characteristics that are predictive of treatment effects using modern cluster analyses and data mining machine learning algorithms. The characteristics of these groups will then be used to inform the types of individuals who benefit from a specific treatment. In addition, results will provide a better understanding of for whom the treatment works. The suitability of different clustering methods to identify subgroups and their response to different treatments of CFS patients is compared.

Keywords: chronic fatigue syndrome, latent class analysis, prediction modelling, self-organizing maps

Procedia PDF Downloads 201
24134 Semantic Data Schema Recognition

Authors: Aïcha Ben Salem, Faouzi Boufares, Sebastiao Correia

Abstract:

The subject covered in this paper aims at assisting the user in its quality approach. The goal is to better extract, mix, interpret and reuse data. It deals with the semantic schema recognition of a data source. This enables the extraction of data semantics from all the available information, inculding the data and the metadata. Firstly, it consists of categorizing the data by assigning it to a category and possibly a sub-category, and secondly, of establishing relations between columns and possibly discovering the semantics of the manipulated data source. These links detected between columns offer a better understanding of the source and the alternatives for correcting data. This approach allows automatic detection of a large number of syntactic and semantic anomalies.

Keywords: schema recognition, semantic data profiling, meta-categorisation, semantic dependencies inter columns

Procedia PDF Downloads 399
24133 Screening of Risk Phenotypes among Metabolic Syndrome Subjects in Adult Pakistani Population

Authors: Muhammad Fiaz, Muhammad Saqlain, Abid Mahmood, S. M. Saqlan Naqvi, Rizwan Aziz Qazi, Ghazala Kaukab Raja

Abstract:

Background: Metabolic Syndrome is a clustering of multiple risk factors including central obesity, hypertension, dyslipidemia and hyperglycemia. These risk phenotypes of metabolic syndrome (MetS) prevalent world-wide, Therefore we aimed to identify the frequency of risk phenotypes among metabolic syndrome subjects in local adult Pakistani population. Methods: Screening of subjects visiting out-patient department of medicine, Shaheed Zulfiqar Ali Bhutto Medical University, Islamabad was performed to assess the occurrence of risk phenotypes among MetS subjects in Pakistani population. The Metabolic Syndrome was defined based on International Diabetes Federation (IDF) criteria. Anthropometric and biochemical assay results were recorded. Data was analyzed using SPSS software (16.0). Results: Our results showed that dyslipidemia (31.50%) and hyperglycemia (30.50%) was most population specific risk phenotypes of MetS. The results showed the order of association of metabolic risk phenotypes to MetS as follows hyperglycemia>dyslipidemia>obesity >hypertension. Conclusion: The hyperglycemia and dyslipidemia were found be the major risk phenotypes among the MetS subjects and have greater chances of deceloping MetS among Pakistani Population.

Keywords: dyslipidemia, hypertention, metabolic syndrome, obesity

Procedia PDF Downloads 192
24132 Access Control System for Big Data Application

Authors: Winfred Okoe Addy, Jean Jacques Dominique Beraud

Abstract:

Access control systems (ACs) are some of the most important components in safety areas. Inaccuracies of regulatory frameworks make personal policies and remedies more appropriate than standard models or protocols. This problem is exacerbated by the increasing complexity of software, such as integrated Big Data (BD) software for controlling large volumes of encrypted data and resources embedded in a dedicated BD production system. This paper proposes a general access control strategy system for the diffusion of Big Data domains since it is crucial to secure the data provided to data consumers (DC). We presented a general access control circulation strategy for the Big Data domain by describing the benefit of using designated access control for BD units and performance and taking into consideration the need for BD and AC system. We then presented a generic of Big Data access control system to improve the dissemination of Big Data.

Keywords: access control, security, Big Data, domain

Procedia PDF Downloads 112
24131 A Data Envelopment Analysis Model in a Multi-Objective Optimization with Fuzzy Environment

Authors: Michael Gidey Gebru

Abstract:

Most of Data Envelopment Analysis models operate in a static environment with input and output parameters that are chosen by deterministic data. However, due to ambiguity brought on shifting market conditions, input and output data are not always precisely gathered in real-world scenarios. Fuzzy numbers can be used to address this kind of ambiguity in input and output data. Therefore, this work aims to expand crisp Data Envelopment Analysis into Data Envelopment Analysis with fuzzy environment. In this study, the input and output data are regarded as fuzzy triangular numbers. Then, the Data Envelopment Analysis model with fuzzy environment is solved using a multi-objective method to gauge the Decision Making Units' efficiency. Finally, the developed Data Envelopment Analysis model is illustrated with an application on real data 50 educational institutions.

Keywords: efficiency, Data Envelopment Analysis, fuzzy, higher education, input, output

Procedia PDF Downloads 24
24130 Geographic Mapping of Tourism in Rural Areas: A Case Study of Cumbria, United Kingdom

Authors: Emma Pope, Demos Parapanos

Abstract:

Rural tourism has become more obvious and prevalent, with tourists’ increasingly seeking authentic experiences. This movement accelerated post-Covid, putting destinations in danger of reaching levels of saturation called ‘overtourism’. Whereas the phenomenon of overtourism has been frequently discussed in the urban context by academics and practitioners over recent years, it has hardly been referred to in the context of rural tourism, where perhaps it is even more difficult to manage. Rural tourism was historically considered small-scale, marked by its traditional character and by having little impact on nature and rural society. The increasing number of rural areas experiencing overtourism, however, demonstrates the need for new approaches, especially as the impacts and enablers of overtourism are context specific. Cumbria, with approximately 47 million visitors each year, and 23,000 operational enterprises, is one of these rural areas experiencing overtourism in the UK. Using the county of Cumbria as an example, this paper aims to explore better planning and management in rural destinations by clustering the area into rural and ‘urban-rural’ tourism zones. To achieve the aim, this study uses secondary data from a variety of sources to identify variables relating to visitor economy development and demand. These data include census data relating to population and employment, tourism industry-specific data including tourism revenue, visitor activities, and accommodation stock, and big data sources such as Trip Advisor and All Trails. The combination of these data sources provides a breadth of tourism-related variables. The subsequent analysis of this data draws upon various validated models. For example, tourism and hospitality employment density, territorial tourism pressure, and accommodation density. In addition to these statistical calculations, other data are utilized to further understand the context of these zones, for example, tourist services, attractions, and activities. The data was imported into ARCGIS where the density of the different variables is visualized on maps. This study aims to provide an understanding of the geographical context of visitor economy development and tourist behavior in rural areas. The findings contribute to an understanding of the spatial dynamics of tourism within the region of Cumbria through the creation of thematized maps. Different zones of tourism industry clusters are identified, which include elements relating to attractions, enterprises, infrastructure, tourism employment and economic impact. These maps visualize hot and cold spots relating to a variety of tourism contexts. It is believed that the strategy used to provide a visual overview of tourism development and demand in Cumbria could provide a strategic tool for rural areas to better plan marketing opportunities and avoid overtourism. These findings can inform future sustainability policy and destination management strategies within the areas through an understanding of the processes behind the emergence of both hot and cold spots. It may mean that attract and disperse needs to be reviewed in terms of a strategic option. In other words, to use sector or zonal policies for the individual hot or cold areas with transitional zones dependent upon local economic, social and environmental factors.

Keywords: overtourism, rural tourism, sustainable tourism, tourism planning, tourism zones

Procedia PDF Downloads 52