Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 24446

Search results for: EEG raw data

23516 Spatial Variability of Brahmaputra River Flow Characteristics

Abstract:

According to Hindu mythology, the Brahmaputra River is known as the son of Lord Brahma. In keeping with this name, the Brahmaputra causes mass destruction during the monsoon season in Assam, a state in the north-eastern part of India. Assam is one of the most important of the seven states of north-eastern India, and almost the entire flow of the Brahmaputra is carried through it; the other states carry its tributaries. In the present case study, a number of MODIS datasets were acquired for the spatial analysis. In the change-detection method, the spray content was determined during heavy rainfall and in the flooded monsoon season. With this method, the analysis of the Brahmaputra outflow in particular identifies the flooded season. The charged particles associated with the aerosol content verify the heavy water content below the ground surface, which is validated by trend analysis of rainfall spectrum data. This is confirmed by in-situ sampled view data from different positions on the Brahmaputra River. Further, Hyperion hyperspectral data at 30 m resolution were used to scan the sediment deposits, which is also confirmed by in-situ sampled view data from a different position.

Keywords: aerosol, change detection, spatial analysis, trend analysis

Procedia PDF Downloads 133

23515 Data Mining Model for Predicting the Status of HIV Patients during Drug Regimen Change

Authors: Ermias A. Tegegn, Million Meshesha

Abstract:

Human Immunodeficiency Virus and Acquired Immunodeficiency Syndrome (HIV/AIDS) is a major cause of death in most African countries. Ethiopia is one of the seriously affected countries in sub-Saharan Africa. Previously in Ethiopia, having HIV/AIDS was almost equivalent to a death sentence. With the introduction of Antiretroviral Therapy (ART), HIV/AIDS has become a chronic but manageable disease. The study focused on a data mining technique to predict the future living status of HIV/AIDS patients at the time of drug regimen change, when patients develop toxicity to their current ART drug combination. The data is taken from the University of Gondar Hospital ART program database. A hybrid methodology is followed to explore the application of data mining on the ART program dataset. Data cleaning, handling of missing values, and data transformation were used for preprocessing the data. WEKA 3.7.9 data mining tools, classification algorithms, and domain expertise are utilized as means to address the research problem. By using four different classification algorithms (i.e., J48 Classifier, PART rule induction, Naïve Bayes, and Neural network) and by adjusting their parameters, thirty-two models were built on the pre-processed University of Gondar ART program dataset. The performances of the models were evaluated using the standard metrics of accuracy, precision, recall, and F-measure. The most effective model to predict the status of HIV patients with drug regimen substitution is a pruned J48 decision tree with a classification accuracy of 98.01%. This study extracts interesting attributes such as Ever taking Cotrim, Ever taking TbRx, CD4 count, Age, Weight, and Gender so as to predict the status of drug regimen substitution. The outcome of this study can be used as an assistant tool for clinicians to help them make more appropriate drug regimen substitutions. Future research directions are put forward to arrive at an applicable system in the area of the study.
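The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The labels below are made up for illustration and are not from the study; the function name `classification_metrics` and the choice of "alive" as the positive class are assumptions, and WEKA's own reporting (for example, weighted averages over classes) may differ.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return accuracy, precision, recall and F-measure for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, recall, f_measure


# Hypothetical labels, for illustration only.
y_true = ["alive", "alive", "dead", "alive", "dead", "alive"]
y_pred = ["alive", "dead", "dead", "alive", "alive", "alive"]
acc, prec, rec, f1 = classification_metrics(y_true, y_pred, positive="alive")
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} F={f1:.3f}")
```

Accuracy alone can be misleading when one outcome dominates the dataset, which is presumably why precision, recall, and F-measure are reported alongside it.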

Keywords: HIV drug regimen, data mining, hybrid methodology, predictive model

Procedia PDF Downloads 131

23514 Internal Cycles from Hydrometric Data and Variability Detected Through Hydrological Modelling Results, on the Niger River, over 1901-2020

Authors: Salif Koné

Abstract:

We analyze hydrometric data at the Koulikoro station on the Niger River; this basin drains 120,600 km² and covers three countries in West Africa: Guinea, Mali, and Ivory Coast. Two subsequent decadal cycles are highlighted (1925-1936 and 1929-1939) instead of the presumed single decadal one from the literature. Moreover, the observed hydrometric data show a multidecadal 40-year period, which is confirmed when graphing a spatial coefficient of variation of runoff over decades (starting at 1901-1910). Spatial runoff data are produced on 48 grid cells (0.5 degree by 0.5 degree) through semi-distributed versions of both the SimulHyd model and the GR2M model, variants of a French hydrologic model; GR2M stands for Génie Rural with 2 parameters at a monthly time step. The two extreme decades in terms of runoff coefficient of variation are compared: 1951-1960 has the minimal coefficient of variation, and 1981-1990 shows the maximal value during the three months of high water level (August, September, and October). Mapping the relative variation between these two decadal situations suggests the following hypothesis: the scale of variation between the two extreme situations could serve to fix boundary conditions for further simulations using data from climate scenarios.
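A spatial coefficient of variation per decade, as described above, can be sketched as follows. The runoff values are hypothetical, and the use of the population standard deviation is an assumption; the abstract does not state which estimator is used.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (an assumed estimator)."""
    return statistics.pstdev(values) / statistics.fmean(values)


# Hypothetical runoff (mm) on four grid cells for two decades; not real data.
runoff_by_decade = {
    "1951-1960": [40.0, 42.0, 38.0, 40.0],
    "1981-1990": [10.0, 30.0, 50.0, 70.0],
}

cv_by_decade = {
    decade: coefficient_of_variation(cells)
    for decade, cells in runoff_by_decade.items()
}
most_variable = max(cv_by_decade, key=cv_by_decade.get)
print(cv_by_decade, most_variable)
```

With the real 48-cell grids, the same computation would be repeated for each decade from 1901-1910 onward to locate the extreme decades.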

Keywords: internal cycles, hydrometric data, niger river, gr2m and simulhyd framework, runoff coefficient of variation

Procedia PDF Downloads 79

23513 A Novel Probabilistic Spatial Locality of Reference Technique for Automatic Cleansing of Digital Maps

Authors: A. Abdullah, S. Abushalmat, A. Bakshwain, A. Basuhail, A. Aslam

Abstract:

GIS (Geographic Information System) applications require geo-referenced data; this data could be available as databases or in the form of digital or hard-copy agro-meteorological maps. These parameter maps are color-coded, with different regions corresponding to different parameter values, so converting these maps into a database is not very difficult. However, text and different planimetric elements overlaid on these maps make an accurate image-to-database conversion a challenging problem. The reason is that it is almost impossible to exactly replace what was underneath the text or icons, which points to the need for inpainting. In this paper, we propose a probabilistic inpainting approach that uses the probability of spatial locality of colors in the map for replacing overlaid elements with the underlying color. We tested the limits of our proposed technique using non-textual simulated data and compared text-removal results with a popular image editing tool using public domain data, with promising results.
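The idea of filling overlaid pixels from the colours most likely to occur nearby can be sketched roughly as below. This is not the authors' algorithm: it is a simplified neighbourhood-vote stand-in, and the grid, the overlay marker "T", and the `radius` parameter are all assumptions made for illustration.

```python
from collections import Counter


def inpaint(grid, overlay, radius=1):
    """Replace overlay cells with the most frequent known colour nearby."""
    rows, cols = len(grid), len(grid[0])
    result = [row[:] for row in grid]
    unknown = {(r, c) for r in range(rows) for c in range(cols)
               if grid[r][c] == overlay}
    while unknown:
        filled = {}
        for r, c in unknown:
            # Count colours of already-known cells in the surrounding window;
            # count / total is the empirical local colour probability.
            counts = Counter(
                result[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) not in unknown
            )
            if counts:
                filled[(r, c)] = counts.most_common(1)[0][0]
        if not filled:
            break  # no known neighbours anywhere; leave the rest untouched
        for (r, c), colour in filled.items():
            result[r][c] = colour
        unknown.difference_update(filled)
    return result


# Hypothetical 3x4 map: "G" and "Y" are region colours, "T" is overlaid text.
map_grid = [
    ["G", "G", "Y", "Y"],
    ["G", "T", "Y", "Y"],
    ["G", "G", "Y", "Y"],
]
cleaned = inpaint(map_grid, overlay="T")
print(cleaned[1])
```

Filling in passes from the boundary inward lets large overlaid regions be recovered even when their interior cells have no known neighbours at first.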

Keywords: noise, image, GIS, digital map, inpainting

Procedia PDF Downloads 332

23512 Evaluation of Urban Parks Based on POI Data: Taking Futian District of Shenzhen as an Example

Authors: Juanling Lin

Abstract:

The construction of urban parks is an important part of eco-city construction, and the intervention of big data provides a more scientific and rational platform for the assessment of urban parks by identifying and correcting the irrationality of urban park planning at the macroscopic level and then promoting the rational planning of urban parks. The study builds an urban park assessment system based on urban road network data and POI data, taking Futian District of Shenzhen as the research object, and utilizes a geographic information system (GIS) to assess the park system of Futian District in five aspects: park spatial distribution, accessibility, service capacity, demand, and supply-demand relationship. The urban park assessment system can effectively reflect the current situation of urban park construction and provide a useful exploration for realizing the rationality and fairness of urban park planning.

Keywords: urban parks, assessment system, POI, supply and demand

Procedia PDF Downloads 27

23511 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Model

Authors: Alam Ali, Ashok Kumar Pathak

Abstract:

Path analysis is a statistical technique used to evaluate the strength of the direct and indirect effects of variables. One or more structural regression equations are used to estimate a series of parameters in order to find the best fit to the data. Sometimes, exogenous variables do not show significant direct and indirect effects when the assumptions of classical regression (ordinary least squares (OLS)) are violated by the nature of the data. The main motive of this article is to investigate the efficacy of the copula-based regression approach over the classical regression approach and to calculate the direct and indirect effects of variables when the data violate the OLS assumptions and the variables are linked through an elliptical copula. We perform this study using a well-organized numerical scheme. Finally, a real data application is also presented to demonstrate the superiority of the copula approach.

Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique

Procedia PDF Downloads 55

23510 Reversible Information Hitting in Encrypted JPEG Bitstream by LSB Based on Inherent Algorithm

Authors: Vaibhav Barve

Abstract:

Reversible information hiding has drawn a lot of interest lately. Because it is reversible, the original digital data can be restored completely. It is a scheme in which secret data is stored in digital media such as images, video, and audio to prevent unauthorized access and for security reasons. Generally, a JPEG bitstream is used to store this key data: first the JPEG bitstream is encrypted into a well-organized structure, and then the secret information or key data is embedded into the encrypted region by slightly modifying the JPEG bitstream. Valuable pixels suitable for information embedding are computed, and the key details are embedded accordingly. In our proposed framework, we use the RC4 algorithm to encrypt the JPEG bitstream. The encryption key is supplied by the framework user and is also used at the time of decryption. We implement enhanced least significant bit (LSB) replacement steganography using a genetic algorithm. First, the number of bits to be embedded in a given coefficient is adaptive. By using proper parameters, we can obtain high capacity while ensuring high security. We use a logistic map for shuffling the bits and a GA (Genetic Algorithm) to find the right parameters for the logistic map. An information embedding key is used at the time of information embedding. Using the correct image encryption key and information embedding key, the recipient can easily extract the embedded secret data and completely recover the original image as well as the original secret information. When the embedding key is absent, the original image can still be recovered approximately, with sufficient quality, without the embedding key.
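The bit-shuffling and LSB replacement steps can be illustrated with a small sketch. It covers only those two steps: RC4 encryption, JPEG coefficient selection, the genetic-algorithm parameter search, and recovery of the original cover are all omitted, so this sketch on its own is not reversible. The key values `(x0, r)` and the sample cover values are assumptions; in the described scheme such parameters would presumably come from the GA search.

```python
def logistic_permutation(n, x0, r):
    """Order indices 0..n-1 by successive values of the logistic map."""
    values, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)  # logistic map: x_{k+1} = r * x_k * (1 - x_k)
        values.append(x)
    return sorted(range(n), key=values.__getitem__)


def embed(cover, bits, key):
    """Write the shuffled bits into the least significant bit of each sample."""
    order = logistic_permutation(len(bits), *key)
    stego = list(cover)
    for slot, source in enumerate(order):
        stego[slot] = (stego[slot] & ~1) | bits[source]
    return stego


def extract(stego, n_bits, key):
    """Read the LSBs back and undo the shuffle with the same key."""
    order = logistic_permutation(n_bits, *key)
    bits = [0] * n_bits
    for slot, source in enumerate(order):
        bits[source] = stego[slot] & 1
    return bits


# Hypothetical cover samples, message bits, and embedding key (x0, r).
cover = [200, 13, 54, 77, 128, 255, 0, 91]
message = [1, 0, 1, 1, 0, 0, 1, 0]
key = (0.3141, 3.99)

stego = embed(cover, message, key)
recovered = extract(stego, len(message), key)
print(stego, recovered)
```

Because the permutation depends on the key, a receiver without `(x0, r)` reads the LSBs in the wrong order, while each cover sample changes by at most one.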

Keywords: data embedding, decryption, encryption, reversible data hiding, steganography

Procedia PDF Downloads 277

23509 Streamlining .NET Data Access: Leveraging JSON for Data Operations in .NET

Authors: Tyler T. Procko, Steve Collins

Abstract:

New features in .NET (6 and above) permit streamlined access to information residing in JSON-capable relational databases, such as SQL Server (2016 and above). Traditional methods of data access now comparatively involve unnecessary steps which compromise system performance. This work posits that the established ORM (Object Relational Mapping) based methods of data access in applications and APIs result in common issues, e.g., object-relational impedance mismatch. Recent developments in C# and .NET Core combined with a framework of modern SQL Server coding conventions have allowed better technical solutions to the problem. As an amelioration, this work details the language features and coding conventions which enable this streamlined approach, resulting in an open-source .NET library implementation called Codeless Data Access (CODA). Canonical approaches rely on ad-hoc mapping code to perform type conversions between the client and back-end database; with CODA, no mapping code is needed, as JSON is freely mapped to SQL and vice versa. CODA streamlines API data access by improving on three aspects of immediate concern to web developers, database engineers and cybersecurity professionals: Simplicity, Speed and Security. Simplicity is engendered by cutting out the “middleman” steps, effectively making API data access a whitebox, whereas traditional methods are blackbox. Speed is improved because of the fewer translational steps taken, and security is improved as attack surfaces are minimized. An empirical evaluation of the speed of the CODA approach in comparison to ORM approaches is provided and demonstrates that the CODA approach is significantly faster. CODA presents substantial benefits for API developer workflows by simplifying data access, resulting in better speed and security and allowing developers to focus on productive development rather than being mired in data access code. Future considerations include a generalization of the CODA method and extension outside of the .NET ecosystem to other programming languages.

Keywords: API data access, database, JSON, .NET core, SQL server

Procedia PDF Downloads 52

23508 Blockchain for IoT Security and Privacy in Healthcare Sector

Authors: Umair Shafique, Hafiz Usman Zia, Fiaz Majeed, Samina Naz, Javeria Ahmed, Maleeha Zainab

Abstract:

The Internet of Things (IoT) has been a hot topic over the last couple of years. This innovative technology has shown promising progress in various areas, and the world has witnessed exponential growth in multiple application domains. Researchers are working to investigate its capabilities and harness its true potential. But at the same time, IoT networks open up a new aspect of vulnerability and physical threats to data integrity, privacy, and confidentiality. This is due to centralized control, a data-silo approach to handling information, and a lack of standardization in IoT networks. Blockchain is a new technology that involves creating secure distributed ledgers to store and communicate data. Some of its benefits include resiliency, integrity, anonymity, decentralization, and autonomous control. The potential for blockchain technology to provide the key to managing and controlling IoT has created a new wave of excitement around the idea of putting that data back into the hands of the end-users. In this manuscript, we propose a model that combines blockchain and IoT networks to address potential security and privacy issues in the healthcare domain. We then describe various application areas, challenges, and future directions in the healthcare sector where blockchain platforms merge with IoT networks.

Keywords: IoT, blockchain, cryptocurrency, healthcare, consensus, data

Procedia PDF Downloads 155

23507 Vision-Based Daily Routine Recognition for Healthcare with Transfer Learning

Authors: Bruce X. B. Yu, Yan Liu, Keith C. C. Chan

Abstract:

We propose to record Activities of Daily Living (ADLs) of elderly people using a vision-based system so as to provide better assistive and personalization technologies. Current ADL-related research is based on data collected with help from non-elderly subjects in laboratory environments, and the activities performed are predetermined for the sole purpose of data collection. To obtain more realistic datasets for the application, we recorded ADLs for the elderly with data collected from a real-world environment involving real elderly subjects. Motivated by the need to collect data for more effective research related to elderly care, we chose to collect data in the room of an elderly person. Specifically, we installed Kinect, a vision-based sensor, on the ceiling to capture the activities that the elderly subject performs in the morning every day. Based on the data, we identified 12 morning activities that the elderly person performs daily. To recognize these activities, we created a HARELCARE framework to investigate the effectiveness of existing Human Activity Recognition (HAR) algorithms and propose the use of a transfer learning algorithm for HAR. We compared the algorithms in terms of accuracy and training progress. Although the collected dataset is relatively small, the proposed algorithm has good potential to be applied to all daily routine activities for healthcare purposes such as evidence-based diagnosis and treatment.

Keywords: daily activity recognition, healthcare, IoT sensors, transfer learning

Procedia PDF Downloads 119

23506 Design and Implementation of Security Middleware for Data Warehouse Signature, Framework

Authors: Mayada Al Meghari

Abstract:

Recently, grid middleware has enabled large-scale integrated use of network resources, such as shared data and CPUs, to form a virtual supercomputer. In this work, we present the design and implementation of the middleware for the Data Warehouse Signature (DWS) framework. The aim of using the middleware in our DWS framework is to achieve high performance through parallel computing. This middleware is developed on the Alchemi.Net framework to increase security among the network nodes through an authentication and group-key distribution model. This model achieves key security and prevents intermediate attacks in the middleware. This paper presents the flow process structures of the middleware design. In addition, the paper covers the implementation of security for the DWS middleware, enhanced with the authentication and group-key distribution model. Finally, from the analysis of other middleware approaches, the developed middleware of the DWS framework is the optimal solution for complete coverage of security issues.

Keywords: middleware, parallel computing, data warehouse, security, group-key, high performance

Procedia PDF Downloads 98

23505 Sentiment Classification of Documents

Authors: Swarnadip Ghosh

Abstract:

Sentiment analysis is the process of detecting the contextual polarity of text. In other words, it determines whether a piece of writing is positive, negative, or neutral. Sentiment analysis of documents holds great importance in today's world, when vast amounts of information are stored in databases and on the World Wide Web. An efficient algorithm to elicit such information would be beneficial for social, economic, and medical purposes. In this project, we have developed an algorithm to classify a document as positive or negative. Using our algorithm, we obtained a feature set from the data and classified the documents based on this feature set. It is important to note that, in the classification, we have not used the independence assumption, which is made by many procedures such as Naive Bayes. This makes the algorithm more general in scope. Moreover, because of the sparsity and high dimensionality of such data, we did not use the empirical distribution for estimation, but developed a method based on finding the degree of close clustering of the data points. We applied our algorithm to a movie review data set obtained from IMDb and obtained satisfactory results.

Keywords: sentiment, Run's Test, cross validation, higher dimensional pmf estimation

Procedia PDF Downloads 381

23504 Corporate Governance and Bank Performance: A Study of Selected Deposit Money Banks in Nigeria

Authors: Ayodele Ajayi, John Ajayi

Abstract:

This paper investigates the effect of corporate governance with a view to determining the relationship between board size and bank performance. Data for the study were obtained from the audited financial statements of five sampled banks listed on the Nigerian Stock Exchange. A panel data technique was adopted, and the analysis was carried out using multiple regression and pooled ordinary least squares. Results from the study show that the larger the board size, the greater the profit, implying that corporate governance is positively correlated with bank performance.

Keywords: corporate governance, banks performance, board size, pooled data

Procedia PDF Downloads 339

23503 Empowering a New Frontier in Heart Disease Detection: Unleashing Quantum Machine Learning

Authors: Sadia Nasrin Tisha, Mushfika Sharmin Rahman, Javier Orduz

Abstract:

Machine learning is applied in a variety of fields throughout the world. The healthcare sector has benefited enormously from it. One of the most effective approaches for predicting human heart diseases is to use machine learning applications to classify data and predict the outcome as a classification. However, with the rapid advancement of quantum technology, quantum computing has emerged as a potential game-changer for many applications. Quantum algorithms have the potential to execute substantially faster than their classical equivalents, which can lead to significant improvements in computational performance and efficiency. In this study, we applied quantum machine learning concepts to predict coronary heart diseases from text data. We ran three experiments with three different feature sets. The data set consisted of 100 data points. We carry out a comparative analysis of the two approaches, highlighting the potential benefits of quantum machine learning for predicting heart diseases.

Keywords: quantum machine learning, SVM, QSVM, matrix product state

Procedia PDF Downloads 74

23502 Blockchain’s Feasibility in Military Data Networks

Authors: Brenden M. Shutt, Lubjana Beshaj, Paul L. Goethals, Ambrose Kam

Abstract:

Communication security is of particular interest to military data networks. A relatively novel approach to network security is blockchain, a cryptographically secured distributed ledger with a decentralized consensus mechanism for data transaction processing. Recent advances in blockchain technology have proposed new techniques for both data validation and trust management, as well as different frameworks for managing dataflow. The purpose of this work is to test the feasibility of different blockchain architectures as applied to military command and control networks. Various architectures are tested through discrete-event simulation, and feasibility is determined based upon a blockchain design’s ability to maintain long-term stable performance at industry standards of throughput, network latency, and security. This work proposes a consortium blockchain architecture with a computationally inexpensive consensus mechanism, one that leverages a Proof-of-Identity (PoI) concept and a reputation management mechanism.

Keywords: blockchain, consensus mechanism, discrete-event simulation, fog computing

Procedia PDF Downloads 122

23501 Verification & Validation of Map Reduce Program Model for Parallel K-Mediod Algorithm on Hadoop Cluster

Authors: Trapti Sharma, Devesh Kumar Srivastava

Abstract:

This paper is an analysis study of a MapReduce implementation and also verifies and validates the MapReduce solution model for the Parallel K-Mediod algorithm on a Hadoop cluster. MapReduce is a programming model that enables the processing of huge amounts of data in parallel on a large number of devices. It is especially well suited to constant or moderately changing sets of data since the implementation point of a position is usually high. MapReduce has slowly become the framework of choice for “big data”. The MapReduce model allows systematic and fast processing of large-scale data with a cluster of compute nodes. One of the primary concerns in Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. In this paper, we have verified and validated various MapReduce applications such as wordcount, grep, terasort, and the parallel K-Mediod clustering algorithm. We have found that as the number of nodes increases, the completion time decreases.

Keywords: hadoop, mapreduce, k-mediod, validation, verification

Procedia PDF Downloads 352

23500 An Improved K-Means Algorithm for Gene Expression Data Clustering

Authors: Billel Kenidra, Mohamed Benmohammed

Abstract:

Data mining techniques used in the field of clustering are a subject of active research and assist in biological pattern recognition and extraction of new knowledge from raw data. Clustering means the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters, leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers, and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initialize K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results show that efficiency is significantly improved.
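The abstract does not say which initialization strategy is proposed, so the sketch below substitutes a well-known alternative, farthest-first seeding, purely to illustrate how an initialization step plugs into K-Means. The function names and the sample points are assumptions.

```python
import math


def farthest_first_centers(points, k):
    """Pick k spread-out centres: start at points[0], then repeatedly take
    the point farthest from its nearest already-chosen centre."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def k_means(points, centers, iterations=20):
    """Plain Lloyd iterations starting from the given centres."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster; keep it if empty.
        centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical 2-D points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
initial = farthest_first_centers(points, k=2)
centers, clusters = k_means(points, initial)
print(centers)
```

Seeding both centres inside the same group is the kind of poor start the abstract warns about; spreading the seeds apart avoids it in this toy case.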

Keywords: microarray data mining, biological pattern recognition, partitional clustering, k-means algorithm, centroid initialization

Procedia PDF Downloads 176

23499 "Revolutionizing Geographic Data: CADmapper's Automated Precision in CAD Drawing Transformation"

Authors: Toleen Alaqqad, Kadi Alshabramiy, Suad Zaafarany, Basma Musallam

Abstract:

CADmapper is a significant software tool for transforming geographic data into realistic CAD drawings. It speeds up and simplifies the conversion process by automating it. This allows architects, urban planners, engineers, and geographic information system (GIS) experts to concentrate solely on the imaginative and scientific parts of their projects. While the future incorporation of AI has the potential for further improvements, CADmapper's current capabilities make it an indispensable asset in the business. It covers a combination of 2D and 3D city and urban area models. The user can select a specific square section of the map to view, and the fee is based on the dimensions of the area being viewed. The procedure is straightforward: you choose the area you want, then pick whether or not to include topography and 3D architectural data (if available), and then select the design program or CAD format in which to publish the document. More than 200 free broad town plans are available in DXF format. If you wish to specify a bespoke area, it is free up to 1 km².

Keywords: cadmaper, gdata, 2d and 3d data conversion, automated cad drawing, urban planning software

Procedia PDF Downloads 47

23498 An IoT-Enabled Crop Recommendation System Utilizing Message Queuing Telemetry Transport (MQTT) for Efficient Data Transmission to AI/ML Models

Authors: Prashansa Singh, Rohit Bajaj, Manjot Kaur

Abstract:

In the modern agricultural landscape, precision farming has emerged as a pivotal strategy for enhancing crop yield and optimizing resource utilization. This paper introduces an innovative Crop Recommendation System (CRS) that leverages the Internet of Things (IoT) technology and the Message Queuing Telemetry Transport (MQTT) protocol to collect critical environmental and soil data via sensors deployed across agricultural fields. The system is designed to address the challenges of real-time data acquisition, efficient data transmission, and dynamic crop recommendation through the application of advanced Artificial Intelligence (AI) and Machine Learning (ML) models. The CRS architecture encompasses a network of sensors that continuously monitor environmental parameters such as temperature, humidity, soil moisture, and nutrient levels. This sensor data is then transmitted to a central MQTT server, ensuring reliable and low-latency communication even in bandwidth-constrained scenarios typical of rural agricultural settings. Upon reaching the server, the data is processed and analyzed by AI/ML models trained to correlate specific environmental conditions with optimal crop choices and cultivation practices. These models consider historical crop performance data, current agricultural research, and real-time field conditions to generate tailored crop recommendations. This implementation gets 99% accuracy.

Keywords: IoT, MQTT protocol, machine learning, sensor, publish, subscriber, agriculture, humidity

Procedia PDF Downloads 43

23497 Integration of Microarray Data into a Genome-Scale Metabolic Model to Study Flux Distribution after Gene Knockout

Authors: Mona Heydari, Ehsan Motamedian, Seyed Abbas Shojaosadati

Abstract:

Prediction of perturbations after genetic manipulation (especially gene knockout) is one of the important challenges in systems biology. In this paper, a new algorithm is introduced that integrates microarray data into the metabolic model. The algorithm was used to study the change in the cell phenotype after knockout of the Gss gene in Escherichia coli BW25113. Implementation of the algorithm indicated that gene deletion resulted in greater activation of the metabolic network. Growth yield was higher, and fewer regulating genes were identified, for the mutant than for the wild-type strain.

Keywords: metabolic network, gene knockout, flux balance analysis, microarray data, integration

Procedia PDF Downloads 566

23496 Extracting Opinions from Big Data of Indonesian Customer Reviews Using Hadoop MapReduce

Authors: Veronica S. Moertini, Vinsensius Kevin, Gede Karya

Abstract:

Customer reviews have been collected by many kinds of e-commerce websites selling products, services, hotel rooms, tickets, and so on. Each website collects its own customer reviews. The reviews can be crawled, collected from those websites, and stored as big data. Text analysis techniques can be used to analyze that data to produce summarized information, such as customer opinions. These opinions can then be published by independent service provider websites and used to help customers choose the most suitable products or services. As the opinions are derived from big data of reviews originating from many websites, the results are expected to be more trusted and accurate. Indonesian customers write reviews in the Indonesian language, which comes with its own structures and uniqueness. We found that most of the reviews are expressed in “daily language”, which is informal, does not follow correct grammar, and contains many abbreviations, slang terms, and non-formal words. Hadoop is an emerging platform for storing and analyzing big data in distributed systems. A Hadoop cluster consists of master and slave nodes/computers operating in a network. Hadoop comes with a distributed file system (HDFS) and the MapReduce framework for supporting parallel computation. However, MapReduce has a weakness: it is inefficient for iterative computations, specifically because the cost of reading/writing data (I/O cost) is high. Given this fact, we conclude that MapReduce is best suited to “one-pass” computation. In this research, we develop an efficient technique for extracting or mining opinions from big data of Indonesian reviews, which is based on MapReduce with one-pass computation. In designing the algorithm, we avoid iterative computation and instead adopt a “look up table” technique. The stages of the proposed technique are: (1) crawling the review data from websites; (2) cleaning and finding root words from the raw reviews; (3) computing the frequency of the meaningful opinion words; (4) analyzing customers' sentiments towards defined objects. The experiments for evaluating the performance of the technique were conducted on a Hadoop cluster with 14 slave nodes. The results show that the proposed technique (stages 2 to 4) discovers useful opinions and is capable of processing big data efficiently and scalably.
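Stages 2 to 4 can be sketched as a single map pass and a single reduce pass that consult lookup tables instead of iterating. This is plain Python rather than Hadoop code, and the tiny `ROOT_WORDS` and `OPINION_POLARITY` tables, the sample reviews, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup tables; real ones would be far larger.
ROOT_WORDS = {"bgs": "bagus", "bagus": "bagus", "jlk": "jelek", "jelek": "jelek"}
OPINION_POLARITY = {"bagus": 1, "jelek": -1}  # bagus = good, jelek = bad


def map_review(review):
    """Map step: emit (root opinion word, 1) for every opinion word found."""
    for token in review.lower().split():
        root = ROOT_WORDS.get(token.strip(".,!?"))
        if root in OPINION_POLARITY:
            yield root, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per root word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


# Made-up reviews mixing formal words and abbreviations.
reviews = ["Hotelnya bgs, pelayanan bagus!", "Kamar jlk.", "Makanan bagus"]
counts = reduce_counts(pair for review in reviews for pair in map_review(review))
sentiment = sum(OPINION_POLARITY[word] * n for word, n in counts.items())
print(counts, sentiment)
```

Each review is read exactly once and normalised through a dictionary lookup, so no intermediate results need to be written out and re-read between passes.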

Keywords: big data analysis, Hadoop MapReduce, analyzing text data, mining Indonesian reviews

Procedia PDF Downloads 188

23495 Global City Typologies: 300 Cities and Over 100 Datasets

Authors: M. Novak, E. Munoz, A. Jana, M. Nelemans

Abstract:

Cities and local governments the world over are interested in employing circular strategies as a means to bring about food security, create employment, and increase resilience. The selection and implementation of circular strategies is facilitated by modeling the effects of strategies locally and by understanding the impacts such strategies have had in other (comparable) cities and how those would translate locally. Urban areas are heterogeneous because of their geographic, economic, and social characteristics, governance, and culture. In order to better understand the effect of circular strategies on urban systems, we create a dataset for over 300 cities around the world designed to facilitate circular strategy scenario modeling. This new dataset integrates data from over 20 prominent global, national, and urban data sources, such as the Global Human Settlements layer and the International Labour Organisation, and incorporates employment data from over 150 cities collected bottom-up from local departments and data providers. The dataset is made to be reproducible. Various clustering techniques are explored in the paper. The result is sets of clusters of cities, which can be used for further research and analysis and to support comparative, regional, and national policy making on circular cities.

Keywords: data integration, urban innovation, cluster analysis, circular economy, city profiles, scenario modelling

Procedia PDF Downloads 166

23494 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

Clustering is a well-known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can contain either categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms have been proposed: k-means for clustering numeric datasets and k-modes for categorical datasets. A main problem encountered in data mining applications is clustering the categorical data that are so prevalent in datasets. One way to cluster categorical values is to transform the categorical attributes into numeric measures and apply the k-means algorithm directly instead of k-modes. This paper experiments with such an approach, transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previous method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are evaluated experimentally. The obtained results show that the proposed method outperforms the binary method in all cases.
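
The transformation step can be sketched as follows. This is a minimal illustration that assumes "relative frequency of each modality" means the share of rows in which a value occurs within its own attribute; the function name is hypothetical and the paper's exact encoding may differ.

```python
from collections import Counter


def relative_frequency_encode(rows):
    """Replace each categorical value by its relative frequency within its column."""
    n = len(rows)
    columns = list(zip(*rows))                # one tuple of values per attribute
    freq = [Counter(col) for col in columns]  # value counts per attribute
    return [[freq[j][value] / n for j, value in enumerate(row)] for row in rows]


# Example: two attributes (colour, size) over four records.
rows = [("red", "S"), ("red", "M"), ("blue", "S"), ("red", "S")]
encoded = relative_frequency_encode(rows)
# encoded == [[0.75, 0.75], [0.75, 0.25], [0.25, 0.75], [0.75, 0.75]]
```

The resulting numeric rows can be passed to any standard k-means implementation. One consequence of this encoding is that two different modalities with the same frequency receive the same numeric value.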

Keywords: clustering, unsupervised learning, pattern recognition, categorical datasets, knowledge discovery, k-means

Procedia PDF Downloads 244

23493 Structural Equation Modeling Semiparametric Truncated Spline Using Simulation Data

Authors: Adji Achmad Rinaldo Fernandes

Abstract:

SEM analysis is a complex multivariate analysis because it involves a number of exogenous and endogenous variables that are interconnected to form a model. The measurement model is divided into two types, namely the reflective model (reflecting) and the formative model (forming). Before carrying out further tests on SEM, there are assumptions that must be met, namely the linearity assumption, to determine the form of the relationship. There are three modeling approaches to path analysis: parametric, nonparametric, and semiparametric. The aim of this research is to develop semiparametric SEM and obtain the best model. The data used in the research is secondary data, which serves as the basis for generating simulation data. Simulation data was generated with sample sizes of 100, 300, and 500. In the semiparametric SEM analysis, the forms of the relationship studied were linear and quadratic, with one and two knot points and various levels of error variance (EV=0.5; 1; 5). There are three levels of closeness of relationship for the analysis process in the measurement model: low (0.1-0.3), medium (0.4-0.6), and high (0.7-0.9). The best model lies in the form of the relationship X1Y1 linear, and. In the measurement model, a characteristic of the reflective model is obtained, namely that the higher the closeness of the relationship, the better the model obtained. The originality of this research is the development of semiparametric SEM, which has not been widely studied by researchers.
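
For readers unfamiliar with truncated splines, the linear and quadratic forms with one or two knots can be illustrated with the standard truncated power basis. This is a generic sketch, not the estimator used in the study; the function name and the knot values in the example are hypothetical.

```python
def truncated_spline_basis(x, knots, degree=1):
    """Return [1, x, ..., x^degree, (x - k1)_+^degree, ...] for a scalar x.

    (x - k)_+ is x - k when x > k and 0 otherwise, so each knot adds a term
    that only becomes active to the right of that knot.
    """
    powers = [x ** p for p in range(degree + 1)]
    truncated = [max(x - k, 0.0) ** degree for k in knots]
    return powers + truncated


# Linear spline with two knots at 2.0 and 4.0, evaluated at x = 3.0:
linear = truncated_spline_basis(3.0, [2.0, 4.0])               # [1.0, 3.0, 1.0, 0.0]
# Quadratic spline with one knot at 1.0, evaluated at x = 3.0:
quadratic = truncated_spline_basis(3.0, [1.0], degree=2)       # [1.0, 3.0, 9.0, 4.0]
```

A fitted curve is a weighted sum of these basis values, so the slope (or curvature) is allowed to change at each knot while the curve stays continuous.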

Keywords: semiparametric SEM, measurement model, structural model, reflective model, formative model

Procedia PDF Downloads 18

23492 Quality Assurance for the Climate Data Store

Authors: Judith Klostermann, Miguel Segura, Wilma Jans, Dragana Bojovic, Isadora Christel Jimenez, Francisco Doblas-Reyes, Judit Snethlage

Abstract:

The Climate Data Store (CDS), developed by the Copernicus Climate Change Service (C3S), which is implemented by the European Centre for Medium-Range Weather Forecasts (ECMWF) on behalf of the European Union, is intended to become a key instrument for exploring climate data. The CDS contains both raw and processed data to provide users with information about the past, present, and future climate of the earth. It allows easy and free access to climate data and indicators, making it an important asset for scientists and stakeholders on the path to a more sustainable future. The C3S Evaluation and Quality Control (EQC) function is assessing the quality of the CDS by undertaking a comprehensive user requirement assessment to measure users’ satisfaction. Recommendations will be developed for the improvement and expansion of the CDS datasets and products. User requirements will be identified regarding the fitness of the datasets, the toolbox, and the overall CDS service. The EQC function of the CDS will help C3S to make the service more robust: composed of validated data that follows high-quality standards while being user-friendly. This function will be developed in close cooperation with the users of the service. Through their feedback, suggestions, and contributions, the CDS can become more accessible and meet the requirements of a diverse range of users. Stakeholders and their active engagement are thus an important aspect of CDS development. This will be achieved through direct interactions with users, such as meetings, interviews, or workshops, as well as through feedback mechanisms like surveys or helpdesk services at the CDS. The results provided by the users will be categorized by CDS product so that their specific interests can be monitored and linked to the right product. Through this procedure, we will identify the requirements and criteria for data and products in order to build the corresponding recommendations for the improvement and expansion of the CDS datasets and products.

Keywords: climate data store, Copernicus, quality, user engagement

Procedia PDF Downloads 134

23491 Quantifying the Methods of Monitoring Timers in Electric Water Heater for Grid Balancing on Demand-Side Management: A Systematic Mapping Review

Authors: Yamamah Abdulrazaq, Lahieb A. Abrahim, Samuel E. Davies, Iain Shewring

Abstract:

An electric water heater (EWH) is a powerful appliance that uses electricity in residential, commercial, and industrial settings, and the ability to control it properly will result in cost savings and the prevention of blackouts on the national grid. This article discusses the usage of timers in EWH control strategies for demand-side management (DSM). To the authors' knowledge, no systematic mapping review focusing on the utilisation of EWH control strategies in DSM has yet been conducted. Consequently, the purpose of this research is to identify and examine the main papers exploring EWH procedures in DSM by quantifying and categorising information with regard to publication year and source, kind of method, and source of data for monitoring control techniques. In order to answer the research questions, a total of 31 publications published between 1999 and 2023 were selected according to specific inclusion and exclusion criteria. The data indicate that direct load control (DLC) has been somewhat more prevalent than indirect load control (ILC). Additionally, the mixed method is used much less than the other techniques, and the proportions of real-time data (RTD) and non-real-time data (NRTD) are about equal.

Keywords: demand side management, direct load control, electric water heater, indirect load control, non real-time data, real-time data

Procedia PDF Downloads 67

23490 Implications of Circular Economy on Users Data Privacy: A Case Study on Android Smartphones Second-Hand Market

Authors: Mariia Khramova, Sergio Martinez, Duc Nguyen

Abstract:

Modern electronic devices, particularly smartphones, are characterised by an extremely high environmental footprint and a short product lifecycle. Every year manufacturers release new models with even better performance, which pushes customers towards new purchases. As a result, millions of devices are accumulating in the urban mine. To tackle these challenges, the concept of the circular economy has been introduced to promote the repair, reuse, and recycling of electronics. In this case, electronic devices that previously ended up in landfills or households get a second life, thereby reducing the demand for new raw materials. Smartphone reuse is gradually gaining wider adoption, partly due to the price increase of flagship models, which in turn boosts circular economy implementation. However, along with the reuse of a communication device, the circular economy approach needs to ensure that the data of the previous user have not been 'reused' together with the device. This is especially important since modern smartphones are comparable with computers in terms of performance and amount of data stored. These data range from pictures, videos, and call logs to social security numbers, passport and credit card details, and from personal information to confidential corporate data. To assess how well data privacy requirements are followed on the second-hand smartphone market, a sample of 100 Android smartphones was purchased from IT Asset Disposition (ITAD) facilities responsible for data erasure and resale. Although devices should not have stored any user data by the time they leave ITAD, it was possible to retrieve data from 19% of the sample. The techniques applied ranged from manual device inspection to sophisticated equipment and tools. These findings indicate a significant barrier to the implementation of the circular economy and a limitation of smartphone reuse. Therefore, in order to motivate users to donate or sell their old devices and make the use of electronics more sustainable, data privacy on the second-hand smartphone market should be significantly improved. The presented research was carried out in the framework of the sustainablySMART project, which is part of the Horizon 2020 EU Framework Programme for Research and Innovation.

Keywords: android, circular economy, data privacy, second-hand phones

Procedia PDF Downloads 116

23489 Development of Muay Thai Competition Management for Promoting Sport Tourism in the next Decade (2015-2024)

Authors: Supasak Ngaoprasertwong

Abstract:

The purpose of this research was to develop a model of Muay Thai competition management for promoting sport tourism in the next decade. Moreover, the model was designed for practical use. The study combined quantitative and qualitative methodologies to cover all aspects of the data, especially the tourists’ satisfaction with Muay Thai competition. The data were collected from 400 tourists watching Muay Thai competitions in 4 stadiums to create the model for Muay Thai competition to support sport tourism in the next decade. In addition, Ethnographic Delphi Futures Research (EDFR) was applied to gather data from experts in the boxing industry or with a significant role in Muay Thai competition in both the public and private sectors. The first step of data collection was an in-depth interview with 27 experts associated with Muay Thai competition, Muay Thai management, and tourism. The second and third steps of data collection were conducted to confirm the experts’ opinions on various elements. When the 3 steps of data collection were complete, all data were assembled to draft the model. The model was then proposed to 8 experts in a brainstorming session to affirm it. According to the results of the quantitative research, the tourists were satisfied with the competition personnel at a high level (x=3.87), followed by facilities, services, and safety at a high level (x=3.67). Furthermore, they were satisfied with operations in the competition field at a high level (x=3.62). Regarding the qualitative methodology, including the literature review, theories, concepts, and analysis of qualitative research on the development of the model for Muay Thai competition to promote sport tourism in the next decade, the findings indicated that there were 2 data sets: the first was related to Muay Thai competition to encourage sport tourism, and the second was associated with Muay Thai stadium management to support sport tourism. After the brainstorming, the “EE Muay Thai Model” was finally developed for promoting sport tourism in the next decade (2015-2024).

Keywords: Muay Thai competition management, Muay Thai sport tourism, Muay Thai, Muay Thai for sport tourism management

Procedia PDF Downloads 302

23488 Interpretation and Clustering Framework for Analyzing ECG Survey Data

Authors: Irum Matloob, Shoab Ahmad Khan, Fahim Arif

Abstract:

The Indo-Pak region has suffered from heart diseases for many decades. Many surveys have shown that the percentage of cardiac patients in Pakistan is increasing day by day, and special attention needs to be paid to this issue. A framework is proposed for performing detailed analysis of ECG survey data collected to measure the prevalence of heart diseases in Pakistan. The ECG survey data is evaluated and filtered using automated Minnesota codes, and only those ECGs that fulfil the standardized conditions specified in the Minnesota codes are used for further analysis. Feature selection is then performed by applying a proposed algorithm based on the discernibility matrix to select relevant features from the database. Clustering is performed to expose natural clusters in the ECG survey data by applying a spectral clustering algorithm using the fuzzy c-means algorithm. The hidden patterns and interesting relationships exposed by this analysis are useful for further detailed analysis and for many other purposes.

Keywords: arrhythmias, centroids, ECG, clustering, discernibility matrix

Procedia PDF Downloads 454

23487 LiDAR Based Real Time Multiple Vehicle Detection and Tracking

Authors: Zhongzhen Luo, Saeid Habibi, Martin v. Mohrenschildt

Abstract:

Self-driving vehicles require a high level of situational awareness in order to maneuver safely when driving in real-world conditions. This paper presents a LiDAR-based real-time perception system that is able to process raw sensor data for multiple target detection and tracking in dynamic environments. The proposed algorithm is nonparametric and deterministic; that is, no assumptions or prior knowledge about the input data are needed and no initialization is required. Additionally, the proposed method works directly on the three-dimensional data generated by the LiDAR, without sacrificing the rich information contained in the 3D domain. Moreover, a fast and efficient real-time clustering algorithm based on radially bounded nearest neighbor (RBNN) is applied. The Hungarian algorithm and adaptive Kalman filtering are used for data association and tracking. The proposed algorithm is able to run in real time with an average run time of 70 ms per frame.
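
The clustering step can be sketched in simplified form. The code below groups points into connected components in which every point is within a fixed radius of at least one other point of the same cluster, which is the core idea of radius-bounded neighbor clustering. It is a brute-force illustration with a hypothetical function name, not the paper's implementation; a real-time system would typically use a spatial index such as a k-d tree instead of comparing all pairs.

```python
import math


def rbnn_cluster(points, radius):
    """Label points so that points linked by a chain of neighbors within `radius` share a cluster."""
    parent = list(range(len(points)))  # union-find forest, one node per point

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force pairwise check (O(n^2)); merge the clusters of close points.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)

    # Convert roots to consecutive labels 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


# Example: two well-separated groups of 3D points.
points = [(0, 0, 0), (0.5, 0, 0), (10, 0, 0), (10.4, 0, 0), (0.9, 0, 0)]
labels = rbnn_cluster(points, radius=0.6)
# labels == [0, 0, 1, 1, 0]
```

Like the method described in the abstract, this sketch needs no initial cluster count and gives the same result on every run; the only parameter is the radius.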

Keywords: lidar, segmentation, clustering, tracking

Procedia PDF Downloads 398