Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 26240

Search results for: data recognition

25190 Dataset Quality Index: Development of Composite Indicator Based on Standard Data Quality Indicators

Authors: Sakda Loetpiparwanich, Preecha Vichitthamaros

Abstract:

Poor data quality is now considered one of the major costs of a data project. A data project with data quality awareness spends a great deal of time on data quality processes, while a data project without such awareness suffers negative impacts on financial resources, efficiency, productivity, and credibility. One of the most time-consuming processes is defining the expectations and measurements of data quality, because expectations differ with the purpose of each data project. This is especially true of big data projects, which may involve many datasets and stakeholders and therefore take a long time to discuss and define quality expectations and measurements. Therefore, this study aimed to develop meaningful indicators that describe the overall data quality of each dataset for quick comparison and prioritization. The objectives of this study were to: (1) develop practical data quality indicators and measurements, (2) develop data quality dimensions based on statistical characteristics, and (3) develop a composite indicator that can describe the overall data quality of each dataset. The sample consisted of more than 500 datasets from public sources obtained by random sampling. After the datasets were collected, the Dataset Quality Index (SDQI) was developed in five steps. First, we defined standard data quality expectations. Second, we identified indicators that can be measured directly on the data within datasets. Third, the indicators were aggregated into dimensions using factor analysis. Next, the indicators and dimensions were weighted by the effort required for data preparation and by usability. Finally, the dimensions were aggregated into the composite indicator. The results showed that: (1) the developed indicators and measurements comprised ten indicators; (2) for the data quality dimensions based on statistical characteristics, the ten indicators can be reduced to four dimensions; and (3) the SDQI can describe the overall quality of each dataset and can separate datasets into three levels: Good Quality, Acceptable Quality, and Poor Quality. In conclusion, the SDQI provides an overall description of data quality within datasets together with a meaningful composition. The SDQI can be used to assess all data in a data project and to support effort estimation and prioritization. The SDQI also works well with Agile methods: it can be used for assessment in the first sprint, and after the initial evaluation is passed, more specific data quality indicators can be added in the next sprint.
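
The final aggregation step described above can be illustrated with a minimal sketch. It assumes a plain weighted average of dimension scores in the range 0 to 1. The dimension names, the weights, and the two thresholds are hypothetical placeholders, since the abstract does not state them, and the authors' actual weighting scheme may differ.

```python
def composite_index(dimension_scores, weights):
    """Weighted average of dimension scores (each assumed to lie in [0, 1])."""
    total_weight = sum(weights[name] for name in dimension_scores)
    weighted_sum = sum(score * weights[name] for name, score in dimension_scores.items())
    return weighted_sum / total_weight


def quality_level(index, good=0.8, acceptable=0.5):
    """Map an index to one of three levels; both thresholds are assumptions."""
    if index >= good:
        return "Good Quality"
    if index >= acceptable:
        return "Acceptable Quality"
    return "Poor Quality"


# Hypothetical dimension names and weights, for illustration only.
scores = {"completeness": 0.9, "consistency": 0.7, "uniqueness": 1.0, "validity": 0.6}
weights = {"completeness": 2.0, "consistency": 1.0, "uniqueness": 1.0, "validity": 1.0}

index = composite_index(scores, weights)
level = quality_level(index)
```

A single number per dataset makes it straightforward to sort many datasets and decide which ones need preparation effort first.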

Keywords: data quality, dataset quality, data quality management, composite indicator, factor analysis, principal component analysis

Procedia PDF Downloads 138

25189 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm

Authors: Ameur Abdelkader, Abed Bouarfa Hafida

Abstract:

Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis when applied to a large amount of data. In fact, because of the volume, nature (semi-structured or unstructured), and variety of big data, it is impossible to analyze it efficiently with classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm Classification And Regression Trees (CART) in order to adapt it for big data analysis. The major changes to this algorithm are presented, and then a version of the extended algorithm is defined in order to make it applicable to a huge quantity of data.
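
To make the parallelization point concrete, the sketch below shows one way a CART-style split search can be arranged so that each data partition only produces class counts, which are then merged before the Gini impurity is computed. This is an illustrative assumption, not the extension proposed in the paper; the function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(counts):
    """Gini impurity of a class-count table."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def partition_counts(rows, threshold):
    """Class counts on each side of `x <= threshold` for one data partition."""
    left, right = Counter(), Counter()
    for x, label in rows:
        (left if x <= threshold else right)[label] += 1
    return left, right


def best_split(partitions, thresholds):
    """Pick the threshold with the lowest weighted Gini impurity.

    Each partition is scanned independently (this part could run on
    separate workers); only the small count tables are merged.
    """
    best = None
    for t in thresholds:
        left, right = Counter(), Counter()
        for rows in partitions:
            part_left, part_right = partition_counts(rows, t)
            left += part_left
            right += part_right
        n_left, n_right = sum(left.values()), sum(right.values())
        score = (n_left * gini(left) + n_right * gini(right)) / (n_left + n_right)
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Two toy partitions of (feature value, class label) pairs.
partitions = [[(1, "a"), (2, "a")], [(3, "b"), (4, "b")]]
threshold, impurity = best_split(partitions, thresholds=[1, 2, 3])
```

Because class counts are additive, the merged result is the same as scanning the whole dataset on a single machine.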

Keywords: predictive analysis, big data, predictive analysis algorithms, CART algorithm

Procedia PDF Downloads 139

25188 Canopy Temperature Acquired from Daytime and Nighttime Aerial Data as an Indicator of Trees’ Health Status

Authors: Agata Zakrzewska, Dominik Kopeć, Adrian Ochtyra

Abstract:

The growing number of new cameras, sensors, and research methods allows for a broader application of thermal data in remote sensing vegetation studies. The aim of this research was to check whether thermal infrared data in the 3.6-4.9 μm spectral range, obtained during the day and the night, can be used to assess the health condition of selected species of deciduous trees in an urban environment. For this purpose, research was carried out in the city center of Warsaw (Poland) in 2020. During the airborne data acquisition, thermal data, laser scanning data, and orthophoto map images were collected. Synchronously with the airborne data, ground reference data were obtained for 617 studied trees of five species (Acer platanoides, Acer pseudoplatanus, Aesculus hippocastanum, Tilia cordata, and Tilia × euchlora) in different health condition states. The results were as follows: (i) healthy trees are cooler than trees in poor condition and dying trees in both the daytime and nighttime data; (ii) the mean difference in canopy temperature between healthy and dying trees was 1.06 °C in the nighttime data and 3.28 °C in the daytime data; (iii) condition classes differ significantly in both daytime and nighttime thermal data, but only in the daytime data did all condition classes differ statistically significantly from each other. In conclusion, aerial thermal data can be considered an alternative to hyperspectral data as a method of assessing the health condition of trees in an urban environment, especially data obtained during the day, which can differentiate condition classes better than data obtained at night. The method based on the fusion of thermal infrared and laser scanning data could be a quick and efficient solution for identifying trees in poor health that should be visually checked in the field.

Keywords: middle wave infrared, thermal imagery, tree discoloration, urban trees

Procedia PDF Downloads 114

25187 Social Media Marketing and Blog Usage in Business Schools: An Exploratory Study

Authors: Grzegorz Mazurek, Michal Kucia

Abstract:

This preliminary study presents the first step of a multifaceted study on the usage of social media in higher education institutions (HEIs). It examines the significance, potential, and managerial implications of social media marketing and blog usage in HEIs, specifically in business schools. Social media, particularly blogging and virtual platforms such as Facebook, Twitter, or Instagram, have recently been covered at length in publications of both a theoretical and a practical nature. Still, the amount of information on the framework for applying social media in HEIs is rather limited. A pre-designed observation matrix was used to collect primary data from the websites of different HEIs and to record blog observations. Additionally, a pilot study based on online questionnaires with marketing officers of HEIs was conducted. The main aim of the study was to identify and elaborate on matters such as the scope of social media usage (and blogs in particular) in practice, the functions fulfilled by social media and blogs, and the anticipated potential of social media for HEIs. The study reveals that the majority of business schools ranked highly in the Financial Times rankings use social media and the interactive functionalities of their websites; however, they do so mostly for promotional reasons and target new students. The usage of blogs, though, is not so common, and in most cases blogs are independent platforms, not managed but supported by organizations. Managers and specialists point to a lack of resources, insufficient user engagement, and the lack of a strategic approach to social media as the main reasons for not advancing in the usage of blogs and social media platforms.

Keywords: blogs, social media marketing, higher education institutions, business schools, value co-creation

Procedia PDF Downloads 263

25186 Towards Kurdish Internet Linguistics: A Case Study on the Impact of Social Media on Kurdish Language

Authors: Karwan K. Abdalrahman

Abstract:

Due to the impact of the internet and social media, new words and expressions enter the Kurdish language, and a number of familiar words take on new meanings. This is especially true when the technique of transliteration is taken into consideration. Through transliteration, a number of selected words widely used on social media are entering the Kurdish media discourse. In addition, a selected number of Kurdish words take on new cultural and psychological meanings. The significance of this study is that it delves into the process of word formation in the Kurdish language and explores how new words and expressions are formed by social media users and gain public recognition. First, the study investigates the English words that enter the Kurdish language through different social media platforms. All of these words are transliterated and are used in spoken and written discourses. Second, there are a specific number of Kurdish words that have acquired new meanings on social media. As for these words, there are psychological and cultural factors that make people use these expressions for specific political reasons. It can be argued that they carry an indirect political message along with their new linguistic usages. This is a qualitative study analyzing video content that was published in the last two years on social media platforms, including Facebook and YouTube. The collected data was analyzed based on the themes discussed above. The findings of the research can be summarized as follows: the widely used transliterated words have entered both the spoken and written discourses. Authors in online and offline newspapers, TV presenters, literary writers, and columnists are using these new expressions in their writings. As for the Kurdish words with new meanings, they are also widely used for psychological, cultural, and political reasons.

Keywords: Kurdish language, social media, new meanings, transliteration, vocabulary

Procedia PDF Downloads 180

25185 Hierarchical Clustering Algorithms in Data Mining

Authors: Z. Abdullah, A. R. Hamdan

Abstract:

Clustering is the process of grouping objects and data into clusters so that data objects from the same cluster are similar to each other. Clustering algorithms are one of the areas of data mining, and they can be classified into partition-based, hierarchical, density-based, and grid-based algorithms. In this paper, we survey and review four major hierarchical clustering algorithms: CURE, ROCK, CHAMELEON, and BIRCH. The resulting state of the art of these algorithms will help in eliminating current problems, as well as in deriving more robust and scalable algorithms for clustering.
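
For readers unfamiliar with the hierarchical family, the following is a minimal sketch of basic agglomerative clustering with single linkage. It is not an implementation of CURE, ROCK, CHAMELEON, or BIRCH, which add sampling, link-based similarity, graph partitioning, and summary trees respectively; the function name and the sample points are illustrative assumptions.

```python
import math


def single_linkage(points, k):
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    clusters = [[p] for p in points]

    def distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > k:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters = single_linkage(points, k=3)
```

This naive version recomputes all pairwise cluster distances at every merge, which is exactly the scalability problem that the surveyed algorithms try to avoid.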

Keywords: clustering, unsupervised learning, algorithms, hierarchical

Procedia PDF Downloads 883

25184 Translation And Cultural Adaptation Of The Rivermead Behavioural Memory Test–3rd Edition Into the Arabic Language

Authors: Mai Alharthy, Agnes Shiel, Hynes Sinead

Abstract:

Objectives: The objectives of the study are to translate and culturally adapt the RBMT-3 so that it is appropriate for use within an Arabic-speaking population, to achieve maximum equivalency between the translated and original versions, and to evaluate the psychometric properties of the Arabic version of the RBMT-3. There were 16 participants (10 females and 6 males). All participants were bilingual speakers of Arabic and English, above 18 years old, and with no current or past memory impairment. Methods: The study was conducted in two stages. Translation and cultural adaptation stage: Forward and backward translations were completed by professional translators. Five of the 14 RBMT-3 subtests required cultural adaptations. Half of the faces in the face recognition subtests were replaced with Arabic faces by a professional photographer. Pictures that are irrelevant to Arabic culture in the picture recognition subtests were replaced. The names, story, and orientation subtests were also adapted to suit Arabic culture. An expert committee was formed to compare the translated and original versions and to advise on further changes required for the test materials. Validation of the Arabic RBMT-3 (pilot): The 16 participants were tested on version 1 of the English version and the two versions of the Arabic RBMT-3 (counterbalanced). The assessment period was 6 weeks long, with a two-week gap between tests. All assessments took place in a quiet room at the National University of Ireland Galway. Two qualified occupational therapists completed the assessments. Results: The Wilcoxon signed-rank test was used to compare subtest scores. Significant differences were found in the story, orientation, and names subtests between the English and Arabic versions. No significant differences were found between the subtests of the two Arabic versions except for the story subtest. Conclusion: The story and orientation subtests should be revised by the expert committee members to make further adaptations. The rest of the Arabic RBMT-3 subtests are equivalent to the subtests of the English version. The psychometric properties of the Arabic RBMT-3 will be investigated in a larger Arabic-speaking sample in Saudi Arabia. The outcome of this research is to provide clinicians and researchers with a reliable tool to assess memory problems in the Arabic-speaking population.

Keywords: memory impairment, neuropsychological assessment, cultural adaptation, cognitive assessment

Procedia PDF Downloads 255

25183 End to End Monitoring in Oracle Fusion Middleware for Data Verification

Authors: Syed Kashif Ali, Usman Javaid, Abdullah Chohan

Abstract:

In large enterprises, multiple departments use different sorts of information systems and databases according to their needs. These systems are independent and heterogeneous in nature, and sharing information/data between them is not an easy task. The usage of middleware technologies has made data sharing between systems very easy. However, monitoring the exchange of data/information between target and source systems for verification purposes is often complex or impossible for the maintenance department because of security/access privileges on the target and source systems. In this paper, we present our experience with an end-to-end data monitoring approach at the middleware level, implemented in Oracle BPEL, for data verification without the help of any monitoring tool.

Keywords: service level agreement, SOA, BPEL, oracle fusion middleware, web service monitoring

Procedia PDF Downloads 478

25182 Emotion-Convolutional Neural Network for Perceiving Stress from Audio Signals: A Brain Chemistry Approach

Authors: Anup Anand Deshmukh, Catherine Soladie, Renaud Seguier

Abstract:

Emotion plays a key role in many applications, such as healthcare, where it is used to gather patients’ emotional behavior. Unlike typical ASR (Automated Speech Recognition) problems, which focus on 'what was said', it is equally important to understand 'how it was said.' Certain emotions are given more importance due to their effectiveness in understanding human feelings. In this paper, we propose an approach that models human stress from audio signals. The research challenge in speech emotion detection is finding the appropriate set of acoustic features corresponding to an emotion. Another difficulty lies in defining the very meaning of emotion and being able to categorize it in a precise manner. Supervised Machine Learning models, including state-of-the-art Deep Learning classification methods, rely on the availability of clean and labelled data. One of the problems in affective computation is the limited amount of annotated data. The existing labelled emotion datasets are highly subject to the perception of the annotator. We address the first issue of feature selection by exploiting traditional MFCC (Mel-Frequency Cepstral Coefficients) features in a Convolutional Neural Network. Our proposed Emo-CNN (Emotion-CNN) architecture treats speech representations in a manner similar to how CNNs treat images in a vision problem. Our experiments show that Emo-CNN consistently and significantly outperforms the popular existing methods over multiple datasets. It achieves 90.2% categorical accuracy on the Emo-DB dataset. We claim that Emo-CNN is robust to speaker variations and environmental distortions. The proposed approach achieves 85.5% speaker-dependent categorical accuracy on the SAVEE (Surrey Audio-Visual Expressed Emotion) dataset, beating the existing CNN-based approach by 10.2%. To tackle the second problem of subjectivity in stress labels, we use Lovheim’s cube, which is a 3-dimensional projection of emotions. Monoamine neurotransmitters are chemical messengers in the brain that transmit signals on perceiving emotions. The cube aims at explaining the relationship between these neurotransmitters and the positions of emotions in 3D space. The learnt emotion representations from the Emo-CNN are mapped to the cube using three-component PCA (Principal Component Analysis), and the result is then used to model human stress. This proposed approach not only circumvents the need for labelled stress data but also complies with the psychological theory of emotions given by Lovheim’s cube. We believe that this work is the first step towards creating a connection between Artificial Intelligence and the chemistry of human emotions.

Keywords: deep learning, brain chemistry, emotion perception, Lovheim's cube

Procedia PDF Downloads 153

25181 Emulation Model in Architectural Education

Authors: Ö. Şenyiğit, A. Çolak

Abstract:

In architectural education, it is of great importance for an architecture student to know the parameters through which he/she can conduct his/her design and make it effective. Therefore, an empirical application study was carried out through a designing activity using the emulation model to support the design and design approaches of architecture students. During the investigation period, studies were done on the basic design elements and principles in the fall semester, and the emulation model, one of the designing methods that constitute the subject of the study, was structured in three phases: “recognition-interpretation-application”. As a result of the study, it was observed that when students were given a key method during the design process, their awareness increased and their aspects improved as well.

Keywords: basic design, design education, design methods, emulation

Procedia PDF Downloads 234

25180 Dissimilarity Measure for General Histogram Data and Its Application to Hierarchical Clustering

Authors: K. Umbleja, M. Ichino

Abstract:

Symbolic data mining has been developed to analyze data in very large datasets. It is also useful in cases where entry-specific details should remain hidden. Symbolic data mining is quickly gaining popularity as the datasets that need to be analyzed become ever larger. One type of symbolic data is the histogram, which makes it possible to store huge amounts of information in a single variable with a high level of granularity. Other types of symbolic data can also be described as histograms, which makes the histogram a very important and general symbolic data type: a method developed for histograms can also be applied to other types of symbolic data. Due to their complex structure, analyzing histograms is complicated. This paper proposes a method that compares two histogram-valued variables and thereby finds the dissimilarity between two histograms. The proposed method uses the Ichino-Yaguchi dissimilarity measure for mixed feature-type data analysis as a base and develops a dissimilarity measure specifically for histogram data, which makes it possible to compare histograms with different numbers of bins and bin widths (so-called general histograms). The proposed dissimilarity measure is then used as a measure for clustering. Furthermore, a linkage method based on weighted averages is proposed, together with the concept of cluster compactness to measure the quality of clustering. The method is then validated by application to real datasets. As a result, the proposed dissimilarity measure is found to produce adequate and comparable results with general histograms without loss of detail or the need to transform the data.

Keywords: dissimilarity measure, hierarchical clustering, histograms, symbolic data analysis

Procedia PDF Downloads 160

25179 WiFi Data Offloading: Bundling Method in a Canvas Business Model

Authors: Majid Mokhtarnia, Alireza Amini

Abstract:

Mobile operators face increasing data traffic as a critical issue. As a result, a vital responsibility of the operators is to deal with this trend in order to create added value. This paper addresses a bundling method in a Canvas business model within a WiFi Data Offloading (WDO) strategy, by which some elements of the model may be affected. In the proposed method, a number of data packages are sold to subscribers, some of which include a given volume of complimentary WiFi-offloaded data. This paper analyses the method in terms of attractiveness and profitability. The results demonstrate that the quality of implementation of the WDO strongly affects the final result and helps the decision maker to make the best decision.

Keywords: bundling, canvas business model, telecommunication, WiFi data offloading

Procedia PDF Downloads 198

25178 Distributed Perceptually Important Point Identification for Time Series Data Mining

Authors: Tak-Chung Fu, Ying-Kit Hung, Fu-Lai Chung

Abstract:

In the field of time series data mining, the concept of the Perceptually Important Point (PIP) identification process was first introduced in 2001. This process was originally used for financial time series pattern matching and was then found suitable for time series dimensionality reduction and representation. Its strength lies in preserving the overall shape of the time series by identifying its salient points. With the rise of Big Data, time series data contributes a major proportion, especially data generated by sensors in the Internet of Things (IoT) environment. Given the nature of PIP identification and its successful cases, it is worth further exploring the opportunity to apply PIP to time series ‘Big Data’. However, the performance of PIP identification is always considered a limitation when dealing with ‘Big’ time series data. In this paper, two distributed versions of PIP identification based on the Specialized Binary (SB) Tree are proposed. The proposed approaches remove the bottleneck of running the PIP identification process on a standalone computer. An improvement in terms of speed is obtained by the distributed versions.
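
The basic, non-distributed PIP identification process can be sketched as follows. This is a minimal standalone version assuming the vertical distance to the line joining two adjacent PIPs as the importance measure; it does not use the SB-Tree or the distributed schemes proposed in the paper, and the function names are hypothetical.

```python
def vertical_distance(series, left, right, i):
    """Vertical distance from point i to the line joining points left and right."""
    slope = (series[right] - series[left]) / (right - left)
    value_on_line = series[left] + slope * (i - left)
    return abs(series[i] - value_on_line)


def find_pips(series, n_pips):
    """Return the sorted indices of the first n_pips perceptually important points.

    Assumes the series has at least two points. The first and last points
    are always PIPs; each further PIP is the point farthest from the line
    between its two neighbouring PIPs.
    """
    pips = [0, len(series) - 1]
    while len(pips) < min(n_pips, len(series)):
        best_index, best_distance = None, -1.0
        for left, right in zip(pips, pips[1:]):
            for i in range(left + 1, right):
                d = vertical_distance(series, left, right, i)
                if d > best_distance:
                    best_index, best_distance = i, d
        pips.append(best_index)
        pips.sort()
    return pips


series = [1, 2, 8, 3, 2, 1, 5]
pips = find_pips(series, n_pips=4)
```

Every new PIP triggers a rescan of the remaining points in this naive form, which is the kind of cost that becomes a bottleneck on very long series.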

Keywords: distributed computing, performance analysis, Perceptually Important Point identification, time series data mining

Procedia PDF Downloads 431

25177 Analysing Techniques for Fusing Multimodal Data in Predictive Scenarios Using Convolutional Neural Networks

Authors: Philipp Ruf, Massiwa Chabbi, Christoph Reich, Djaffar Ould-Abdeslam

Abstract:

In recent years, convolutional neural networks (CNNs) have demonstrated high performance in image analysis, but oftentimes only structured data is available for a specific problem. By interpreting structured data as images, CNNs can effectively learn and extract valuable insights from tabular data, leading to improved predictive accuracy and uncovering hidden patterns that may not be apparent in traditional structured data analysis. Applying a single neural network to multimodal data, e.g., both structured and unstructured information, can bring significant advantages in terms of time complexity and energy efficiency. Converting structured data into images and merging them with existing visual material offers a promising solution for applying CNNs to multimodal datasets, as they often occur in a medical context. By employing suitable preprocessing techniques, structured data is transformed into image representations in which the respective features are expressed as different formations of colors and shapes. In an additional step, these representations are fused with existing images to incorporate both types of information. The resulting image is then analyzed using a CNN.
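
As a rough illustration of the first transformation step, the sketch below maps one tabular row to a small grayscale pixel grid by min-max scaling each feature to the range 0 to 255. This is only one simple encoding chosen for illustration; the abstract does not specify the authors' encoding, and the function name, the feature ranges, and the grid size are assumptions.

```python
def row_to_image(row, mins, maxs, side):
    """Encode one tabular row as a side x side grid of grayscale pixels (0-255)."""
    if len(row) > side * side:
        raise ValueError("grid is too small for the number of features")
    pixels = []
    for value, lo, hi in zip(row, mins, maxs):
        # Min-max scale each feature, clamping values outside the known range.
        scaled = 0.0 if hi == lo else (value - lo) / (hi - lo)
        pixels.append(round(255 * min(max(scaled, 0.0), 1.0)))
    # Pad unused cells with black so the grid is always complete.
    pixels += [0] * (side * side - len(pixels))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Three hypothetical features, each with an assumed range of 0 to 10.
image = row_to_image([2, 10, 0], mins=[0, 0, 0], maxs=[10, 10, 10], side=2)
```

A grid produced this way could then be stacked with or placed next to an existing image before being passed to a CNN, which corresponds to the fusion step described above.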

Keywords: CNN, image processing, tabular data, mixed dataset, data transformation, multimodal fusion

Procedia PDF Downloads 122

25176 Knowledge Discovery and Data Mining Techniques in Textile Industry

Authors: Filiz Ersoz, Taner Ersoz, Erkin Guler

Abstract:

This paper addresses issues and techniques in the textile industry using data mining. Data mining was applied to data on the stitching of garment products obtained from a textile company. The techniques applied to the data were the CHAID algorithm, the CART algorithm, regression analysis, and artificial neural networks. Classification-based analyses were used, and a decision model for production per person and the variables affecting production was obtained with this method. The results show that as the daily working time increases, the production per person decreases. In addition, the relationship between total daily working time and production per person is negative, and it is the strongest relationship observed for production per person.

Keywords: data mining, textile production, decision trees, classification

Procedia PDF Downloads 349

25175 Investigation of Delivery of Triple Play Data in GE-PON Fiber to the Home Network

Authors: Ashima Anurag Sharma

Abstract:

Optical fiber based networks can deliver performance that supports the increasing demand for high-speed connections. One of the new technologies that have emerged in recent years is the Passive Optical Network. This research paper aims to show the simultaneous delivery of triple play services (data, voice, and video). A comparison between various data rates is presented. It is demonstrated that as the data rate increases, the number of users that can be supported decreases because of the increase in bit error rate.

Keywords: BER, PON, TDMPON, GPON, CWDM, OLT, ONT

Procedia PDF Downloads 527

25174 Microarray Gene Expression Data Dimensionality Reduction Using PCA

Authors: Fuad M. Alkoot

Abstract:

Different experimental technologies such as microarray sequencing have been proposed to generate high-resolution genetic data in order to understand the complex dynamic interactions between complex diseases and the biological system components of genes and gene products. However, the generated samples have a very large dimension, reaching thousands, which hinders all attempts to design a classifier system that can identify diseases based on such data. Additionally, the high overlap in the class distributions makes the task more difficult. The data we experiment with was generated for the identification of autism. It includes 142 samples, which is small compared to the large dimension of the data. Classifier systems trained on this data yield very low classification rates that are almost equivalent to a guess. We aim at reducing the data dimension and improving the data for classification. Here, we experiment with applying a multistage PCA to the genetic data to reduce its dimensionality. Results show a significant improvement in the classification rates, which increases the possibility of building an automated system for autism detection.

Keywords: PCA, gene expression, dimensionality reduction, classification, autism

Procedia PDF Downloads 559

25173 Accounting Quality and The Adoption of IFRS: Evidence from China

Authors: Khaldoon G. Albitar, Hassan Y. Kikhia, Jin P. Zhang

Abstract:

Since 2007, all companies listed on both the Shanghai Stock Exchange and the Shenzhen Stock Exchange have been required to prepare their consolidated financial statements in accordance with International Financial Reporting Standards (IFRS). This study investigates the impact of adopting IFRS on accounting quality for a sample of listed Chinese companies during the period 2003-2013, with a sample of 10846 observations over a four-year period before and a five-year period after the adoption of IFRS. This study tests whether the level of earnings management is significantly lower after the adoption of IFRS, and whether reported earnings are more value relevant during the IFRS period, by using the Ohlson model and the Jones model as modified by Dechow. The empirical results show that accounting quality improved, with lower earnings management and higher value relevance, after the adoption of IFRS in China. The current study contributes to the literature on IFRS adoption and earnings quality in two ways. First, as most of the existing studies on earnings quality and IFRS have been conducted on data from the U.S. and European countries, this study fills a gap in the existing literature by studying the effect of the adoption of IFRS on earnings quality in an emerging market. Second, the findings of our study have important implications for policymakers, auditors, multinational firms, and users of financial reports. As the rapid growth of China's economy gains global recognition, the Chinese stock market is capturing the attention of international investors.

Keywords: international financial reporting standards (ifrs), accounting quality, earnings management, value relevance, china

Procedia PDF Downloads 333

25172 Nation Branding: Guidelines for Identity Development and Image Perception of Thailand Brand in Health and Wellness Tourism

Authors: Jiraporn Prommaha

Abstract:

The purpose of this research is to study the development of the Thailand brand identity and the perception of its image in order to find guidelines for the identity development and image perception of the Thailand brand in health and wellness tourism. The study uses mixed methods research, combining qualitative and quantitative research. The qualitative part consists of in-depth interviews with 11 key informants: executives from the public and private sectors, and scholars and experts on identity and image issues. The quantitative part used questionnaires to collect data from 800 foreign tourists (400 Chinese tourists and 400 UK tourists). Exploratory Factor Analysis (EFA) with Varimax rotation was used to determine the relationships among the variables by grouping them into factors. This technique revealed how the Thailand brand image is perceived in the two countries, China and the UK. The results yielded the following guidelines for brand identity development and image perception of health and wellness tourism in Thailand: (1) develop communication to build understanding of the meaning of the term 'Health and beauty tourism' throughout the country, (2) develop human resources as a national agenda, (3) raise awareness of the conservation and preservation of the country's natural resources, (4) develop cooperation among all stakeholders in health and wellness businesses, (5) develop digital communication throughout the country, and (6) develop safety in tourism.

Keywords: brand identity, image perception, nation branding, health and wellness tourism, mixed methods research

Procedia PDF Downloads 200

25171 Data Science-Based Key Factor Analysis and Risk Prediction of Diabetic

Authors: Fei Gao, Rodolfo C. Raga Jr.

Abstract:

This research proposal aims to ascertain the major risk factors for diabetes and to design a predictive model for risk assessment. The project aims to improve early detection and management of diabetes by utilizing data science techniques, which may improve patient outcomes and healthcare efficiency. Using the Diabetes Health Indicators Dataset from Kaggle as the research data, the phase relation values of each attribute were used to analyze and choose the attributes that might influence the examiner's survival probability. We compare and evaluate eight machine learning algorithms. Our investigation begins with comprehensive data preprocessing, including feature engineering and dimensionality reduction, aimed at enhancing data quality. The dataset, comprising health indicators and medical data, serves as a foundation for training and testing these algorithms. A rigorous cross-validation process is applied, and we assess their performance using five key metrics: accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). After analyzing the data characteristics, we investigate their impact on the likelihood of diabetes and develop corresponding risk indicators.
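
The five metrics listed in the abstract can be computed from binary labels and predicted scores roughly as follows. This is a minimal sketch using only the standard library; the function names `binary_classification_metrics` and `roc_auc` are hypothetical, labels are assumed to be 0/1 with 1 meaning "diabetic", and nothing here reflects the authors' actual evaluation code.

```python
from typing import Dict, Sequence


def binary_classification_metrics(y_true: Sequence[int],
                                  y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is the O(n_pos * n_neg) form,
    chosen for clarity rather than speed.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a cross-validation setting, these functions would be applied to each held-out fold and the per-fold values averaged.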

Keywords: diabetes, risk factors, predictive model, risk assessment, data science techniques, early detection, data analysis, Kaggle

Procedia PDF Downloads 73

25170 Intelligent Prediction System for Diagnosis of Heart Attack

Authors: Oluwaponmile David Alao

Abstract:

Due to an increase in the death rate from heart attacks, there is a need to develop a system that can be useful in the diagnosis of the disease at medical centres. This system will help prevent misdiagnosis by medical practitioners or physicians. In this research work, a heart disease dataset obtained from the UCI repository has been used to develop an intelligent prediction and diagnosis system. The system is modeled on a feedforward neural network and trained with the backpropagation algorithm. A recognition rate of 86% is obtained from the testing of the network.

Keywords: heart disease, artificial neural network, diagnosis, prediction system

Procedia PDF Downloads 448

25169 A Methodology to Integrate Data in the Company Based on the Semantic Standard in the Context of Industry 4.0

Authors: Chang Qin, Daham Mustafa, Abderrahmane Khiat, Pierre Bienert, Paulo Zanini

Abstract:

Nowadays, companies are facing many challenges in the process of digital transformation, which can be a complex and costly undertaking. Digital transformation involves the collection and analysis of large amounts of data, which can create challenges around data management and governance. Furthermore, it is also challenging to integrate data from multiple systems and technologies. Despite these pains, companies are still pursuing digitalization because, by embracing advanced technologies, they can improve efficiency, quality, decision-making, and customer experience while also creating different business models and revenue streams. This paper focuses on the issue that data is stored in data silos with different schemas and structures. The conventional approaches to addressing this issue involve utilizing data warehousing, data integration tools, data standardization, and business intelligence tools. However, these approaches primarily focus on the grammar and structure of the data and neglect the importance of semantic modeling and semantic standardization, which are essential for achieving data interoperability. In this paper, the challenge of data silos in Industry 4.0 is addressed by developing a semantic modeling approach compliant with Asset Administration Shell (AAS) models as an efficient standard for communication in Industry 4.0. The paper highlights how our approach can facilitate the data mapping process and semantic lifting according to existing industry standards such as ECLASS and other industrial dictionaries. It also incorporates the Asset Administration Shell technology to model and map the company’s data and utilizes a knowledge graph for data storage and exploration.

Keywords: data interoperability in industry 4.0, digital integration, industrial dictionary, semantic modeling

Procedia PDF Downloads 93

25168 Enabling Oral Communication and Accelerating Recovery: The Creation of a Novel Low-Cost Electroencephalography-Based Brain-Computer Interface for the Differently Abled

Authors: Rishabh Ambavanekar

Abstract:

Expressive Aphasia (EA) is an oral disability, common among stroke victims, in which Broca’s area of the brain is damaged, interfering with verbal communication abilities. EA currently has no technological solutions, and its only viable solutions are inefficient or only available to the affluent. This prompts the need for an affordable, innovative solution to facilitate recovery and assist in speech generation. This project proposes a novel concept: using a wearable low-cost electroencephalography (EEG) device-based brain-computer interface (BCI) to translate a user’s inner dialogue into words. A low-cost EEG device was developed and found to be 10 to 100 times less expensive than any current EEG device on the market. As part of the BCI, a machine learning (ML) model was developed and trained using the EEG data. Two stages of testing were conducted to analyze the effectiveness of the device: a proof-of-concept and a final solution test. The proof-of-concept test demonstrated an average accuracy of above 90% and the final solution test demonstrated an average accuracy of above 75%. These two successful tests were used as a basis to demonstrate the viability of BCI research in developing lower-cost verbal communication devices. Additionally, the device not only proved to enable users to communicate verbally but also has the potential to assist in accelerated recovery from the disorder.

Keywords: neurotechnology, brain-computer interface, neuroscience, human-machine interface, BCI, HMI, aphasia, verbal disability, stroke, low-cost, machine learning, ML, image recognition, EEG, signal analysis

Procedia PDF Downloads 118

25167 Classification of Coughing and Breathing Activities Using Wearable and a Light-Weight DL Model

Authors: Subham Ghosh, Arnab Nandi

Abstract:

Background: The proliferation of Wireless Body Area Networks (WBAN) and Internet of Things (IoT) applications demonstrates the potential for continuous monitoring of physical changes in the body. These technologies are vital for health monitoring tasks, such as identifying coughing and breathing activities, which are necessary for disease diagnosis and management. Monitoring activities such as coughing and deep breathing can provide valuable insights into a variety of medical issues. Wearable radio-based antenna sensors, which are lightweight and easy to incorporate into clothing or portable goods, provide continuous monitoring. This mobility gives them a substantial advantage over stationary environmental sensors such as cameras and radar, which are constrained to certain places. Furthermore, using compressive techniques provides benefits such as reduced data transmission speeds and memory needs. These wearable sensors offer more advanced and diverse health monitoring capabilities. Methodology: This study analyzes the feasibility of using a semi-flexible antenna operating at 2.4 GHz (ISM band) and positioned around the neck and near the mouth to identify three activities: coughing, deep breathing, and idleness. A vector network analyzer (VNA) is used to collect time-varying complex reflection coefficient data from the perturbed antenna near field. The reflection coefficient (S11) conveys nuanced information caused by simultaneous variations in the near-field radiation of the three activities across time. The signatures are sparsely represented with Gaussian-windowed Gabor spectrograms. The Gabor spectrogram is used as a sparse representation approach, which reassigns the ridges of the spectrogram images to improve their resolution and focus on essential components. The antenna is biocompatible in terms of specific absorption rate (SAR).
The sparsely represented Gabor spectrogram images are fed into a lightweight deep learning (DL) model for feature extraction and classification. Two antenna locations are investigated in order to determine the most effective localization for three different activities. Findings: Cross-validation techniques were used on data from both locations. Due to the complex form of the recorded S11, separate analyses and assessments were performed on the magnitude, phase, and their combination. The combination of magnitude and phase performed better than the separate analyses. Various sliding window sizes, ranging from 1 to 5 seconds, were tested to find the best window for activity classification. It was discovered that a neck-mounted design was effective at detecting the three unique behaviors.
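
Two preprocessing steps mentioned in the abstract, splitting the complex S11 recording into magnitude and phase and cutting it into fixed-length sliding windows, might look like the sketch below. The function names, the sample rate argument, and the hop size are assumptions introduced for illustration; the abstract states neither the VNA sampling rate nor the window overlap, and the Gabor spectrogram step is not reproduced here.

```python
import cmath
from typing import List, Sequence, Tuple


def magnitude_phase(s11: Sequence[complex]) -> Tuple[List[float], List[float]]:
    """Split complex reflection-coefficient samples into magnitude and phase (radians)."""
    magnitudes = [abs(z) for z in s11]
    phases = [cmath.phase(z) for z in s11]
    return magnitudes, phases


def sliding_windows(samples: Sequence, sample_rate_hz: float,
                    window_s: float, hop_s: float) -> List[Sequence]:
    """Cut a recording into windows of `window_s` seconds, advancing `hop_s` seconds.

    A trailing partial window is dropped, so every window has the same length.
    """
    win = int(round(window_s * sample_rate_hz))  # window length in samples
    hop = int(round(hop_s * sample_rate_hz))     # step between window starts
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must cover at least one sample")
    return [samples[start:start + win]
            for start in range(0, len(samples) - win + 1, hop)]
```

Testing window sizes from 1 to 5 seconds would then amount to calling `sliding_windows` with `window_s` set to each candidate value and comparing the classifier's cross-validated accuracy.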

Keywords: activity recognition, antenna, deep-learning, time-frequency

Procedia PDF Downloads 3

25166 Big Data Analytics and Data Security in the Cloud via Fully Homomorphic Encryption

Authors: Waziri Victor Onomza, John K. Alhassan, Idris Ismaila, Noel Dogonyaro Moses

Abstract:

This paper describes the problem of building secure computational services for encrypted information in cloud computing without decrypting the encrypted data; it therefore meets the need for a computational encryption model that could enhance the security of big data with respect to users' privacy, confidentiality, and availability. The cryptographic model applied for the computational process on the encrypted data is the Fully Homomorphic Encryption scheme. We contribute theoretical presentations of high-level computational processes, based on number theory and algebra, that can easily be integrated and leveraged in cloud computing, with detailed theoretical mathematical concepts for the fully homomorphic encryption models. This contribution enhances the full implementation of a big data analytics-based cryptographic security algorithm.
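
The core idea of computing on ciphertexts without decrypting them can be illustrated with a much simpler scheme than the one the paper discusses. The sketch below uses unpadded textbook RSA, which is only multiplicatively homomorphic (a partially homomorphic scheme, not fully homomorphic encryption) and is insecure with these toy parameters. It is an assumption-laden illustration and not the authors' construction; requires Python 3.8+ for the modular inverse via `pow`.

```python
# Toy textbook-RSA parameters: insecure, for illustration only.
P, Q = 61, 53
N = P * Q                  # public modulus (3233)
PHI = (P - 1) * (Q - 1)    # Euler totient of N (3120)
E = 17                     # public exponent, coprime with PHI
D = pow(E, -1, PHI)        # private exponent: modular inverse of E


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with the public key (E, N)."""
    if not 0 <= m < N:
        raise ValueError("message must be in the range [0, N)")
    return pow(m, E, N)


def decrypt(c: int) -> int:
    """Decrypt a ciphertext with the private key (D, N)."""
    return pow(c, D, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """Multiply two plaintexts by operating on their ciphertexts only.

    Works because (m1^E * m2^E) mod N == (m1 * m2)^E mod N.
    """
    return (c1 * c2) % N
```

For example, a server holding only `encrypt(7)` and `encrypt(3)` can return `multiply_encrypted(...)`, and the key holder decrypts the product 21. A fully homomorphic scheme extends this property to both addition and multiplication, which is what allows arbitrary computation on encrypted data.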

Keywords: big data analytics, security, privacy, bootstrapping, homomorphic, homomorphic encryption scheme

Procedia PDF Downloads 376

25165 Protecting Privacy and Data Security in Online Business

Authors: Bilquis Ferdousi

Abstract:

With the exponential growth of online business, the threat to consumers’ privacy and data security has become a serious challenge. This literature review-based study focuses on a better understanding of those threats and of the legislative measures that have been taken to address those challenges. Research shows that people are increasingly involved in online business using different digital devices and platforms, although this practice varies based on age groups. The threat to consumers’ privacy and data security is a serious hindrance in developing trust among consumers in online businesses. There are some legislative measures taken at the federal and state levels to protect consumers’ privacy and data security. The study was based on an extensive review of current literature on protecting consumers’ privacy and data security and legislative measures that have been taken.

Keywords: privacy, data security, legislation, online business

Procedia PDF Downloads 104

25164 Flowing Online Vehicle GPS Data Clustering Using a New Parallel K-Means Algorithm

Authors: Orhun Vural, Oguz Bayat, Rustu Akay, Osman N. Ucan

Abstract:

This study presents a new parallel approach to clustering GPS data. The evaluation compares the execution time of various clustering algorithms on GPS data. This paper proposes a parallel, neighborhood-based K-means algorithm to make clustering faster. The proposed parallelization approach assumes that each GPS data point represents a vehicle and that vehicles close to each other communicate after the vehicles are clustered. This parallelization approach has been examined on continuously changing GPS data of different sizes and compared with the serial K-means algorithm and other serial clustering algorithms. The results demonstrated that the proposed parallel K-means algorithm works much faster than the other clustering algorithms.
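
The abstract does not describe the neighborhood-based scheme in enough detail to reproduce it, so the sketch below shows only the generic way K-means is usually parallelized: the assignment step is split into chunks of points that are processed concurrently, while the centroid update stays serial. The names `parallel_kmeans`, `n_workers`, and the use of a thread pool are assumptions for illustration. In CPython, threads show the structure but give little speedup for pure-Python arithmetic; a process pool or a vectorized library would be needed for real gains.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # e.g. (latitude, longitude) of one vehicle


def _nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: (point[0] - centroids[k][0]) ** 2
                             + (point[1] - centroids[k][1]) ** 2)


def _assign_chunk(chunk: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Assignment step for one chunk of points; chunks are independent."""
    return [_nearest(p, centroids) for p in chunk]


def parallel_kmeans(points: Sequence[Point], centroids: Sequence[Point],
                    n_workers: int = 4,
                    max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's K-means with the assignment step spread over worker threads."""
    points = list(points)
    centroids = list(centroids)
    labels: List[int] = []
    chunk_size = max(1, -(-len(points) // n_workers))  # ceiling division
    chunks = [points[i:i + chunk_size] for i in range(0, len(points), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(max_iter):
            # Parallel assignment: map() returns results in chunk order.
            parts = pool.map(_assign_chunk, chunks, [centroids] * len(chunks))
            labels = [lab for part in parts for lab in part]
            # Serial update: each centroid becomes the mean of its members.
            new_centroids = []
            for k, old in enumerate(centroids):
                members = [p for p, lab in zip(points, labels) if lab == k]
                if members:
                    new_centroids.append(
                        (sum(x for x, _ in members) / len(members),
                         sum(y for _, y in members) / len(members)))
                else:
                    new_centroids.append(old)  # keep an empty cluster in place
            if new_centroids == centroids:  # converged
                break
            centroids = new_centroids
    return centroids, labels
```

For continuously changing GPS data, one option would be to re-run the function on each new batch with the previous centroids as the starting point.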

Keywords: parallel k-means algorithm, parallel clustering, clustering algorithms, clustering on flowing data

Procedia PDF Downloads 219

25163 An Analysis of Privacy and Security for Internet of Things Applications

Authors: Dhananjay Singh, M. Abdullah-Al-Wadud

Abstract:

The Internet of Things is a concept of a large-scale ecosystem of wireless actuators. The actuators are defined as the things in the IoT, those which contribute or produce data to the ecosystem. However, ubiquitous data collection, data security, privacy preservation, large-volume data processing, and intelligent analytics are some of the key challenges for IoT technologies. In order to address the security requirements, challenges, and threats in the IoT, we have discussed a message authentication mechanism for IoT applications. Finally, we have discussed a data encryption mechanism for message authentication before messages are propagated into IoT networks.
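
The abstract does not name the authentication mechanism, so as one common possibility, a keyed hash (HMAC-SHA256 with a pre-shared key) is sketched below using Python's standard `hmac` and `hashlib` modules. The function names and the pre-shared-key assumption are illustrative and are not taken from the paper.

```python
import hashlib
import hmac


def sign_message(key: bytes, message: bytes) -> bytes:
    """Return an HMAC-SHA256 tag that a device attaches to an outgoing message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_message(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a received tag; compare_digest avoids timing side channels."""
    expected = sign_message(key, message)
    return hmac.compare_digest(expected, tag)
```

A sensor node would send `message` together with `sign_message(key, message)`, and the gateway would drop any message for which `verify_message` returns `False`. Note that a tag of this kind provides integrity and authenticity but not confidentiality; encryption would be a separate step.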

Keywords: Internet of Things (IoT), message authentication, privacy, security

Procedia PDF Downloads 381

25162 Prosodic Realization of Focus in the Public Speeches Delivered by Spanish Learners of English and English Native Speakers

Authors: Raúl Jiménez Vilches

Abstract:

Native (L1) speakers can mark prosodically one part of an utterance and make it more relevant as opposed to the rest of the constituents. Conversely, non-native (L2) speakers encounter problems when it comes to marking prosodically information structure in English. In fact, the L2 speaker’s choice for the prosodic realization of focus is not so clear and often obscures the intended pragmatic meaning and the communicative value in general. This paper reports some of the findings obtained in an L2 prosodic training course for Spanish learners of English within the context of public speaking. More specifically, it analyses the effects of the course experiment in relation to the non-native production of the tonic syllable to mark focus and compares it with the public speeches delivered by native English speakers. The whole experimental training was executed throughout eighteen input sessions (1,440 minutes total time) and all the sessions took place in the classroom. In particular, the first part of the course provided explicit instruction on the recognition and production of the tonic syllable and how the tonic syllable is used to express focus. The non-native and native oral presentations were acoustically analyzed using Praat software for speech analysis (7,356 words in total). The investigation adopted mixed and embedded methodologies. Quantitative information is needed when measuring acoustically the phonetic realization of focus. Qualitative data such as questionnaires, interviews, and observations were also used to interpret the quantitative data. The embedded experiment design was implemented through the analysis of the public speeches before and after the intervention. Results indicate that, even after the L2 prosodic training course, Spanish learners of English still show some major inconsistencies in marking focus effectively. 
Although there was occasional improvement regarding the choice for location and word classes, Spanish learners were, in general, far from achieving similar results to the ones obtained by the English native speakers in the two types of focus. The prosodic realization of focus seems to be one of the hardest areas of the English prosodic system to be mastered by Spanish learners. A funded research project is in the process of moving the present classroom-based experiment to an online environment (mobile app) and determining whether there is a more effective focus usage through CAPT (Computer-Assisted Pronunciation Training) tools.

Keywords: focus, prosody, public speaking, Spanish learners of English

Procedia PDF Downloads 98

25161 Thermo-Hydro-Mechanical-Chemical Coupling in Enhanced Geothermal Systems: Challenges and Opportunities

Authors: Esmael Makarian, Ayub Elyasi, Fatemeh Saberi, Olusegun Stanley Tomomewo

Abstract:

Geothermal reservoirs (GTRs) have garnered global recognition as a sustainable energy source. Thermo-Hydro-Mechanical-Chemical (THMC) coupling proves to be a practical and effective method for optimizing production in GTRs. The study outcomes demonstrate that THMC coupling serves as a versatile and valuable tool, offering in-depth insights into GTRs and enhancing their operational efficiency. This is achieved through the analysis of temperature and pressure changes and their impacts on mechanical properties, structural integrity, fracture aperture, permeability, and heat extraction efficiency. Moreover, THMC coupling facilitates the assessment of the potential benefits and risks associated with different geothermal technologies, considering the complex thermal, hydraulic, mechanical, and chemical interactions within the reservoirs. However, the utilization of THMC coupling in GTRs presents a multitude of challenges. These challenges include accurately modeling and predicting behavior given the interconnected nature of the processes, limited data availability leading to uncertainties, the risks that induced seismic events pose to nearby communities, scaling and mineral deposition that reduce operational efficiency, and the long-term sustainability of the reservoirs. In addition, material degradation, environmental impacts, technical challenges in monitoring and control, accurate assessment of resource potential, and regulatory and social acceptance further complicate geothermal projects. Addressing these multifaceted challenges is crucial for the successful and sustainable utilization of geothermal energy resources. This paper aims to illuminate the challenges and opportunities associated with THMC coupling in enhanced geothermal systems. Practical solutions and strategies for mitigating these challenges are discussed, emphasizing the need for interdisciplinary approaches, improved data collection and modeling techniques, and advanced monitoring and control systems.
Overcoming these challenges is imperative for unlocking the full potential of geothermal energy and making a substantial contribution to the global energy transition and sustainable development.

Keywords: geothermal reservoirs, THMC coupling, interdisciplinary approaches, challenges and opportunities, sustainable utilization

Procedia PDF Downloads 68