Search results for: dataset quality
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 10496

10436 PaSA: A Dataset for Patent Sentiment Analysis to Highlight Patent Paragraphs

Authors: Renukswamy Chikkamath, Vishvapalsinhji Ramsinh Parmar, Christoph Hewel, Markus Endres

Abstract:

Given a patent document, identifying distinct semantic annotations is an interesting research aspect. Text annotation helps patent practitioners, such as examiners and patent attorneys, to quickly identify the key arguments of an invention and thus to mark up a patent text in a timely manner. In manual patent analysis, it is common practice to mark paragraphs according to their semantic information to improve readability. This semantic annotation process is laborious and time-consuming. To alleviate this problem, we propose a dataset to train machine learning algorithms to automate the highlighting process. The contributions of this work are: i) a multi-class dataset of 150k samples, built by traversing a decade of USPTO patents; ii) statistics and distributions of the data, obtained through exploratory data analysis; iii) baseline machine learning models that use the dataset to address the patent paragraph highlighting task; and iv) a path for future work that extends this approach with deep learning and domain-specific pre-trained language models to build a highlighting tool. This work assists patent practitioners in highlighting semantic information automatically and supports sustainable and efficient patent analysis using machine learning.

Keywords: machine learning, patents, patent sentiment analysis, patent information retrieval

Procedia PDF Downloads 80
10435 Evaluating Generative Neural Attention Weights-Based Chatbot on Customer Support Twitter Dataset

Authors: Sinarwati Mohamad Suhaili, Naomie Salim, Mohamad Nazim Jambli

Abstract:

Sequence-to-sequence (seq2seq) models augmented with attention mechanisms are playing an increasingly important role in automated customer service. These models, which are able to recognize complex relationships between input and output sequences, are crucial for optimizing chatbot responses. Central to these mechanisms are neural attention weights that determine the focus of the model during sequence generation. Despite their widespread use, there remains a gap in the comparative analysis of different attention weighting functions within seq2seq models, particularly in the domain of chatbots using the Customer Support Twitter (CST) dataset. This study addresses this gap by evaluating four distinct attention-scoring functions—dot, multiplicative/general, additive, and an extended multiplicative function with a tanh activation parameter — in neural generative seq2seq models. Utilizing the CST dataset, these models were trained and evaluated over 10 epochs with the AdamW optimizer. Evaluation criteria included validation loss and BLEU scores implemented under both greedy and beam search strategies with a beam size of k=3. Results indicate that the model with the tanh-augmented multiplicative function significantly outperforms its counterparts, achieving the lowest validation loss (1.136484) and the highest BLEU scores (0.438926 under greedy search, 0.443000 under beam search, k=3). These results emphasize the crucial influence of selecting an appropriate attention-scoring function in improving the performance of seq2seq models for chatbots. Particularly, the model that integrates tanh activation proves to be a promising approach to improve the quality of chatbots in the customer support context.
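The four scoring functions compared in the abstract above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the dot, general, and additive forms follow the standard attention formulations, while the exact form of the tanh-augmented multiplicative score is not given in the abstract, so `score_tanh_general` below is an assumption (tanh applied to the general score). All function and parameter names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [dot(row, v) for row in W]


def score_dot(s, h):
    """Dot score: s^T h."""
    return dot(s, h)


def score_general(s, h, W):
    """Multiplicative/general score: s^T W h."""
    return dot(s, matvec(W, h))


def score_additive(s, h, W1, W2, v):
    """Additive score: v^T tanh(W1 s + W2 h)."""
    hidden = [math.tanh(a + b) for a, b in zip(matvec(W1, s), matvec(W2, h))]
    return dot(v, hidden)


def score_tanh_general(s, h, W):
    """ASSUMPTION: tanh applied to the general score, tanh(s^T W h).
    The abstract does not specify the exact form."""
    return math.tanh(score_general(s, h, W))


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(s, encoder_states, score_fn):
    """Normalize the scores of decoder state s against every encoder state."""
    return softmax([score_fn(s, h) for h in encoder_states])
```

Swapping `score_fn` is the only change needed to compare scoring functions under otherwise identical conditions, which mirrors the comparison the study describes.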

Keywords: attention weight, chatbot, encoder-decoder, neural generative attention, score function, sequence-to-sequence

Procedia PDF Downloads 71
10434 Feature Based Unsupervised Intrusion Detection

Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein

Abstract:

The goal of a network-based intrusion detection system is to classify network traffic activities into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning play an important role in many sciences, including intrusion detection systems (IDS), using both supervised and unsupervised techniques. One of the essential steps of data mining is feature selection, which helps improve the efficiency, performance, and prediction rate of the proposed approach. This paper applies the unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we used the NSL-KDD dataset, a modified version of the KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a two-class classification (Normal, Attack) was implemented. The Weka framework, a Java-based open-source software package consisting of a collection of machine learning algorithms for data mining tasks, was used in the testing process. The experimental results show that the proposed approach is very accurate, with a low false positive rate and a high true positive rate, and that it takes less learning time than the same algorithm run on the full feature set of the dataset.
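As a rough illustration of the feature-selection step named in the abstract above, information gain for a categorical feature can be computed as the drop in label entropy after splitting on that feature. The sketch below is plain Python rather than Weka, and the feature names and toy records are hypothetical, not taken from NSL-KDD.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after
    grouping the records by the value of one categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(columns, labels):
    """Return feature names sorted from highest to lowest information gain."""
    return sorted(columns,
                  key=lambda name: information_gain(columns[name], labels),
                  reverse=True)


# Hypothetical toy records: two features, four connections.
labels = ["normal", "normal", "attack", "attack"]
columns = {
    "protocol": ["tcp", "tcp", "udp", "udp"],   # separates the classes
    "flag": ["x", "y", "x", "y"],               # carries no class signal
}
ranking = rank_features(columns, labels)
```

Under these assumptions, only the top-ranked features would be passed on to the clustering step, which is how a reduced feature set can shorten learning time.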

Keywords: information gain (IG), intrusion detection system (IDS), k-means clustering, Weka

Procedia PDF Downloads 290
10433 Hydroinformatics of Smart Cities: Real-Time Water Quality Prediction Model Using a Hybrid Approach

Authors: Elisa Coraggio, Dawei Han, Weiru Liu, Theo Tryfonas

Abstract:

Water is one of the most important resources for human society. The world is currently undergoing a wave of urban growth, and pollution problems have a great impact. Monitoring water quality is a key task for the future of the environment and the human species. In recent times, researchers using Smart Cities technologies have been trying to mitigate the problems generated by population growth in urban areas. The availability of huge amounts of data collected by a pervasive urban IoT can increase the transparency of decision making. Several services have already been implemented in Smart Cities, and more will be involved in the future. Water quality monitoring can successfully be implemented in the urban IoT. The combination of water quality sensors, cloud computing, smart city infrastructure, and IoT technology can lead to a bright future for environmental monitoring. In past decades, much effort has been put into monitoring and predicting water quality using traditional approaches based on manual collection and laboratory-based analysis, which are slow and laborious. The present study proposes a methodology for implementing a water quality prediction model using artificial intelligence techniques and comparing the results obtained with different algorithms. Furthermore, a 3D numerical model will be created using the software D-Water Quality, and simulation results will be used as a training dataset for the artificial intelligence algorithm. This study derives the methodology and demonstrates its implementation based on information and data collected at the floating harbour in the city of Bristol (UK). The city of Bristol is blessed with the Bristol-Is-Open infrastructure, which includes a Wi-Fi network and virtual machines. It was also named the UK's smartest city in 2017.

Keywords: artificial intelligence, hydroinformatics, numerical modelling, smart cities, water quality

Procedia PDF Downloads 173
10432 Measurements of Service Quality vs Customer Satisfaction in Government Owned Retail Store at Kochi

Authors: N. S. Ajisha

Abstract:

In today’s competitive world, the quality of the service delivered is one of the important factors that determine customer satisfaction. Service quality is considered an important determinant of customer satisfaction, and the relationship between the two is regarded as the foundation of research on customer satisfaction. This research performs a gap analysis between the perception and expectation of the services delivered and examines the relation between service quality and customer satisfaction. Service quality is measured here using the SERVQUAL model, and the study identifies which dimension of service quality matters most for measuring customer satisfaction. The dimensions measured using SERVQUAL are tangibles, reliability, responsiveness, assurance, and empathy. The study involves primary data collection through a market survey.
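The gap analysis described above reduces to simple arithmetic: for each SERVQUAL dimension, the gap is the mean perception score minus the mean expectation score, and a negative gap means expectations are not being met. The sketch below assumes survey responses on a common numeric scale; the scores are invented for illustration and are not the study's data.

```python
from statistics import mean

DIMENSIONS = ("tangibles", "reliability", "responsiveness",
              "assurance", "empathy")


def servqual_gaps(expectations, perceptions):
    """Gap per dimension = mean perception - mean expectation."""
    return {d: mean(perceptions[d]) - mean(expectations[d])
            for d in DIMENSIONS}


def weakest_dimension(gaps):
    """Dimension with the most negative gap (largest shortfall)."""
    return min(gaps, key=gaps.get)


# Hypothetical responses from two customers on a 1-5 scale.
expectations = {"tangibles": [5, 4], "reliability": [5, 5],
                "responsiveness": [5, 5], "assurance": [4, 4],
                "empathy": [4, 5]}
perceptions = {"tangibles": [4, 3], "reliability": [4, 5],
               "responsiveness": [2, 3], "assurance": [4, 4],
               "empathy": [4, 4]}

gaps = servqual_gaps(expectations, perceptions)
```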

Keywords: customer satisfaction, service quality, retail service quality, Kochi

Procedia PDF Downloads 541
10431 Improving Library Service Quality in Local City of Indonesia

Authors: Prima Fithri, Afri Adnan, Verra Syahmer

Abstract:

A library, as a public service, should be able to provide excellent, high-quality service. A library should offer a collection that is relevant, current, and reliable; qualified and professional employees; and a prompt and appropriate delivery system, all supported by proper infrastructure. The aim of this study is to assess performance as part of an effort to provide services that match the needs and desires of users. To that end, the gap between users' perceptions and expectations of the library's services was calculated. The Servqual and QFD methods are used in this study: Servqual measures the gap that occurs in each dimension of service quality, and QFD determines the priority improvements needed to raise service quality in those dimensions. Results from 97 questionnaires show that the gap measured by Servqual is 27.7% in the responsiveness dimension, which indicates how far user expectations are from being met by the existing services. According to the QFD analysis, construction of the library and library standards are the priority improvements needed to raise the quality of service.

Keywords: library, service quality, QFD

Procedia PDF Downloads 560
10430 Unattended Crowdsensing Method to Monitor the Quality Condition of Dirt Roads

Authors: Matias Micheletto, Rodrigo Santos, Sergio F. Ochoa

Abstract:

In developing countries, most roads in rural areas are dirt roads. They require frequent maintenance since they are affected by erosive events, such as rain or wind, and by the transit of heavy trucks and machinery. Early detection of damage to the road is key, since it reduces maintenance time and cost as well as the restrictions on other vehicles travelling through. Most proposals that help address this problem require the explicit participation of drivers, a permanent internet connection, or substantial instrumentation in vehicles or roads. These constraints limit the suitability of these proposals when applied in developing regions, such as Latin America. This paper proposes an alternative method, based on unattended crowdsensing, to determine the quality of dirt roads in rural areas. This method involves the use of a mobile application that complements the road condition surveys carried out by organizations in charge of road network maintenance, giving them early warnings about road areas that may require maintenance. Drivers can also take advantage of the early warnings while they move along these roads. The method was evaluated using information from a public dataset. Although preliminary, the results indicate that the proposal is potentially suitable for providing awareness of dirt road conditions to drivers, transportation authorities, and road maintenance companies.

Keywords: dirt roads automatic quality assessment, collaborative system, unattended crowdsensing method, roads quality awareness provision

Procedia PDF Downloads 192
10429 Empirical Exploration of Correlations between Software Design Measures: A Replication Study

Authors: Jehad Al Dallal

Abstract:

Software engineers apply different measures to quantify the quality of software design. These measures consider artifacts developed at low or high level software design phases. The results are used to point to design weaknesses and to indicate design points that have to be restructured. Understanding the relationship among the quality measures and among the design quality aspects considered by these measures is important to interpreting the impact of a measure for a quality aspect on other potentially related aspects. In addition, exploring the relationship between quality measures helps to explain the impact of different quality measures on external quality aspects, such as reliability and maintainability. In this paper, we report a replication study that empirically explores the correlation between six well known and commonly applied design quality measures. These measures consider several quality aspects, including complexity, cohesion, coupling, and inheritance. The results indicate that inheritance measures are weakly correlated to other measures, whereas complexity, coupling, and cohesion measures are mostly strongly correlated.  
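The keywords of this entry name Spearman correlation as the statistic used. As a minimal sketch, Spearman's coefficient is the Pearson correlation of the rank-transformed values, with tied values sharing their average rank. The measure names and class-level values below are hypothetical; the abstract does not list the six measures.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-class measurements for two design measures.
complexity = [3, 7, 12, 20, 41]
coupling = [1, 2, 4, 9, 30]
rho = spearman(complexity, coupling)
```

Because it works on ranks, the coefficient captures monotonic relationships between measures even when their scales differ widely.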

Keywords: quality attribute, quality measure, software design quality, Spearman correlation

Procedia PDF Downloads 290
10428 Enhancing Cultural Heritage Data Retrieval by Mapping COURAGE to CIDOC Conceptual Reference Model

Authors: Ghazal Faraj, Andras Micsik

Abstract:

The CIDOC Conceptual Reference Model (CRM) is an extensible ontology that provides integrated access to heterogeneous and digital datasets. The CIDOC-CRM offers a “semantic glue” intended to promote accessibility to several diverse and dispersed sources of cultural heritage data. That is achieved by providing a formal structure for the implicit and explicit concepts and their relationships in the cultural heritage field. The COURAGE (“Cultural Opposition – Understanding the CultuRal HeritAGE of Dissent in the Former Socialist Countries”) project aimed to explore methods about socialist-era cultural resistance during 1950-1990 and planned to serve as a basis for further narratives and digital humanities (DH) research. This project highlights the diversity of the alternative cultural scenes that flourished in Eastern Europe before 1989. Moreover, the dataset of COURAGE is an online RDF-based registry that consists of historical people, organizations, collections, and featured items. To increase the links between different datasets and retrieve more relevant data from various data silos, a shared federated ontology for reconciled data is needed. As a first step towards these goals, a full understanding of the CIDOC CRM ontology (target ontology), as well as the COURAGE dataset, was required to start the work. Subsequently, the queries toward the ontology were determined, and a table of equivalent properties from COURAGE and CIDOC CRM was created. The structural diagrams that clarify the mapping process and the constructed queries are in progress, in order to map person, organization, and collection entities to the ontology. By mapping the COURAGE dataset to the CIDOC-CRM ontology, the dataset will gain a common ontological foundation with several other datasets.
Therefore, the expected results are: 1) retrieving more detailed data about existing entities, 2) retrieving new entities’ data, 3) aligning COURAGE dataset to a standard vocabulary, 4) running distributed SPARQL queries over several CIDOC-CRM datasets and testing the potentials of distributed query answering using SPARQL. The next plan is to map CIDOC-CRM to other upper-level ontologies or large datasets (e.g., DBpedia, Wikidata), and address similar questions on a wide variety of knowledge bases.

Keywords: CIDOC CRM, cultural heritage data, COURAGE dataset, ontology alignment

Procedia PDF Downloads 138
10427 Predictive Analysis of Chest X-rays Using NLP and Large Language Models with the Indiana University Dataset and Random Forest Classifier

Authors: Azita Ramezani, Ghazal Mashhadiagha, Bahareh Sanabakhsh

Abstract:

This study investigates the combination of Random Forest classifiers with large language models (LLMs) and natural language processing (NLP) to improve diagnostic accuracy in chest X-ray analysis using the Indiana University dataset. Utilizing advanced NLP techniques, the research preprocesses textual data from radiological reports to extract key features, which are then merged with image-derived data. This improved dataset is analyzed with Random Forest classifiers to predict specific clinical results, focusing on the identification of health issues and the estimation of case urgency. The findings reveal that the combination of NLP, LLMs, and machine learning increases not only diagnostic precision but also reliability, especially in quickly identifying critical conditions. Achieving an accuracy of 99.35%, the model shows significant advancements over conventional diagnostic techniques. The results emphasize the large potential of machine learning in medical imaging, suggesting that these technologies could greatly enhance clinician judgment and patient outcomes by offering quicker and more precise diagnostic approximations.

Keywords: natural language processing (NLP), large language models (LLMs), random forest classifier, chest x-ray analysis, medical imaging, diagnostic accuracy, indiana university dataset, machine learning in healthcare, predictive modeling, clinical decision support systems

Procedia PDF Downloads 32
10426 Plant Identification Using Convolution Neural Network and Vision Transformer-Based Models

Authors: Virender Singh, Mathew Rees, Simon Hampton, Sivaram Annadurai

Abstract:

Plant identification is a challenging task that aims to identify the family, genus, and species according to plant morphological features. Automated deep learning-based computer vision algorithms are widely used for identifying plants and can help users narrow down the possibilities. However, numerous morphological similarities between and within species render correct classification difficult. In this paper, we tested custom convolutional neural network (CNN) and vision transformer (ViT) based models using the PyTorch framework to classify plants. We used a large dataset of 88,000 images provided by the Royal Horticultural Society (RHS) and a smaller dataset of 16,000 images from the PlantClef 2015 dataset for classifying plants at genus and species levels, respectively. Our results show that for classifying plants at the genus level, ViT models perform better than the CNN-based models ResNet50 and ResNet-RS-420 and other state-of-the-art CNN-based models suggested in previous studies on a similar dataset. The ViT model achieved a top accuracy of 83.3% for classifying plants at the genus level. For classifying plants at the species level, ViT models also perform better than the CNN-based models ResNet50 and ResNet-RS-420, with a top accuracy of 92.5%. We show that the correct set of augmentation techniques plays an important role in classification success. In conclusion, these results could help end users, professionals and the general public alike, to identify plants more quickly and with improved accuracy.

Keywords: plant identification, CNN, image processing, vision transformer, classification

Procedia PDF Downloads 88
10425 Logistics Model for Improving Quality in Railway Transport

Authors: Eva Nedeliakova, Juraj Camaj, Jaroslav Masek

Abstract:

This contribution focuses on a methodology for identifying levels of quality and improving quality through a new logistics model in railway transport. It is oriented toward the application of dynamic quality models, which represent an innovative method of evaluating service quality. With this concept, the time factor and the expected and perceived quality at each moment of the transportation process within the logistics chain can be taken into account. Various models describe quality improvement with an emphasis on the time factor throughout the whole transportation logistics chain. Quality of services in railway transport can be determined from the existing level of service quality and by detecting the causes of dissatisfaction among employees as well as customers, in order to uncover strengths and weaknesses. The new logistics model is able to recognize critical processes in the logistics chain. It includes a service quality rating that must respect the specific properties of services: unrepeatability, intangibility, consumption at the time they are provided, and particularly changeability, which is also a significant factor in the conditions of rail transport. These peculiarities influence service quality in view of constantly increasing requirements, and they lead to new ways of finding progressive approaches to service quality rating.

Keywords: logistics model, quality, railway transport

Procedia PDF Downloads 555
10424 PatchMix: Learning Transferable Semi-Supervised Representation by Predicting Patches

Authors: Arpit Rai

Abstract:

In this work, we propose PatchMix, a semi-supervised method for pre-training visual representations. PatchMix mixes patches of two images and then solves an auxiliary task of predicting the label of each patch in the mixed image. Our experiments on the CIFAR-10, CIFAR-100, and SVHN datasets show that the representations learned by this method encode useful information for transfer to new tasks and outperform the baseline Residual Network encoders: by 12% with ResNet-101 and 2% with ResNet-56 on CIFAR-10, by 4% with ResNet-101 on CIFAR-100, and by 6% with ResNet-101 on SVHN.
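The mixing step described above can be sketched with plain nested lists standing in for image tensors. This is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the patch size, the mixing ratio, or the loss, so the square patch grid, the 50/50 choice per patch, and the function name `patch_mix` are all hypothetical.

```python
import random


def patch_mix(img_a, label_a, img_b, label_b, patch, rng):
    """Build a mixed image from two same-sized 2D images.

    Each patch-by-patch cell is copied from img_a or img_b with equal
    probability (an assumed ratio). Returns the mixed image and a grid
    holding the source label of every patch, which is the target of the
    auxiliary per-patch prediction task.
    """
    height, width = len(img_a), len(img_a[0])
    mixed = [row[:] for row in img_a]          # start from a copy of A
    patch_labels = []
    for top in range(0, height, patch):
        row_labels = []
        for left in range(0, width, patch):
            take_b = rng.random() < 0.5
            if take_b:
                # Overwrite this cell with the matching pixels of B.
                for r in range(top, min(top + patch, height)):
                    mixed[r][left:left + patch] = img_b[r][left:left + patch]
            row_labels.append(label_b if take_b else label_a)
        patch_labels.append(row_labels)
    return mixed, patch_labels


# Toy 4x4 "images": A is all zeros (class 0), B is all ones (class 1).
image_a = [[0] * 4 for _ in range(4)]
image_b = [[1] * 4 for _ in range(4)]
mixed, patch_labels = patch_mix(image_a, 0, image_b, 1, patch=2,
                                rng=random.Random(0))
```

A network trained on `mixed` would then be asked to recover `patch_labels`, so the supervision signal comes for free from the mixing procedure.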

Keywords: self-supervised learning, representation learning, computer vision, generalization

Procedia PDF Downloads 81
10423 Rd-PLS Regression: From the Analysis of Two Blocks of Variables to Path Modeling

Authors: E. Tchandao Mangamana, V. Cariou, E. Vigneau, R. Glele Kakai, E. M. Qannari

Abstract:

A new definition of a latent variable associated with a dataset makes it possible to propose variants of the PLS2 regression and the multi-block PLS (MB-PLS). We shall refer to these variants as Rd-PLS regression and Rd-MB-PLS respectively because they are inspired by both Redundancy analysis and PLS regression. Usually, a latent variable t associated with a dataset Z is defined as a linear combination of the variables of Z with the constraint that the length of the loading weights vector equals 1. Formally, t=Zw with ‖w‖=1. Denoting by Z' the transpose of Z, we define herein, a latent variable by t=ZZ’q with the constraint that the auxiliary variable q has a norm equal to 1. This new definition of a latent variable entails that, as previously, t is a linear combination of the variables in Z and, in addition, the loading vector w=Z’q is constrained to be a linear combination of the rows of Z. More importantly, t could be interpreted as a kind of projection of the auxiliary variable q onto the space generated by the variables in Z, since it is collinear to the first PLS1 component of q onto Z. Consider the situation in which we aim to predict a dataset Y from another dataset X. These two datasets relate to the same individuals and are assumed to be centered. Let us consider a latent variable u=YY’q to which we associate the variable t= XX’YY’q. Rd-PLS consists in seeking q (and therefore u and t) so that the covariance between t and u is maximum. The solution to this problem is straightforward and consists in setting q to the eigenvector of YY’XX’YY’ associated with the largest eigenvalue. For the determination of higher order components, we deflate X and Y with respect to the latent variable t. Extending Rd-PLS to the context of multi-block data is relatively easy. Starting from a latent variable u=YY’q, we consider its ‘projection’ on the space generated by the variables of each block Xk (k=1, ..., K) namely, tk= XkXk'YY’q. 
Thereafter, Rd-MB-PLS seeks q in order to maximize the average of the covariances of u with tk (k=1, ..., K). The solution to this problem is given by q, eigenvector of YY’XX’YY’, where X is the dataset obtained by horizontally merging datasets Xk (k=1, ..., K). For the determination of latent variables of order higher than 1, we use a deflation of Y and Xk with respect to the variable t= XX’YY’q. In the same vein, extending Rd-MB-PLS to the path modeling setting is straightforward. Methods are illustrated on the basis of case studies and performance of Rd-PLS and Rd-MB-PLS in terms of prediction is compared to that of PLS2 and MB-PLS.
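The eigenvector solution stated in the abstract above follows in two lines from the definitions it gives. The derivation below only restates those definitions in LaTeX, writing the abstract's transpose Z' as Z^\top; n denotes the number of individuals and the 1/n scaling of the covariance is an assumed convention that does not affect the maximizer.

```latex
% Single-block case: X and Y are column-centered, \|q\| = 1.
\begin{align}
  u &= Y Y^{\top} q, \qquad t = X X^{\top} u = X X^{\top} Y Y^{\top} q, \\
  \operatorname{cov}(t, u) &= \tfrac{1}{n}\, u^{\top} t
     = \tfrac{1}{n}\, q^{\top} \bigl( Y Y^{\top} X X^{\top} Y Y^{\top} \bigr) q .
\end{align}
% The matrix in parentheses is symmetric, so maximizing this quadratic form
% under \|q\| = 1 gives q as its eigenvector with the largest eigenvalue.

% Multi-block case: t_k = X_k X_k^{\top} Y Y^{\top} q for k = 1, \dots, K.
\begin{align}
  \frac{1}{K} \sum_{k=1}^{K} \operatorname{cov}(u, t_k)
    &= \frac{1}{nK}\, q^{\top} Y Y^{\top}
       \Bigl( \sum_{k=1}^{K} X_k X_k^{\top} \Bigr) Y Y^{\top} q
     = \frac{1}{nK}\, q^{\top} Y Y^{\top} X X^{\top} Y Y^{\top} q ,
\end{align}
% because X = [X_1 | \dots | X_K] implies X X^{\top} = \sum_k X_k X_k^{\top}.
```

This is why the multi-block problem has the same form of solution as the two-block problem once the blocks are merged horizontally.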

Keywords: multiblock data analysis, partial least squares regression, path modeling, redundancy analysis

Procedia PDF Downloads 134
10422 Automated Evaluation Approach for Time-Dependent Question Answering Pairs on Web Crawler Based Question Answering System

Authors: Shraddha Chaudhary, Raksha Agarwal, Niladri Chatterjee

Abstract:

This work demonstrates a web crawler-based generalized end-to-end open domain Question Answering (QA) system. An efficient QA system requires a significant amount of domain knowledge to answer any question, with the aim of finding an exact and correct answer to the user's question in the form of a number, a noun, a short phrase, or a brief piece of text. Analysis of the question, searching for the relevant documents, and choosing an answer are three important steps in a QA system. This work uses a web scraper (Beautiful Soup) to extract K documents from the web. The value of K can be calibrated on the basis of a trade-off between time and accuracy. This is followed by a passage ranking process, using a model trained on 500K queries of the MS-Marco dataset, to extract the most relevant text passage and so shorten the lengthy documents. Further, a QA system is used to extract the answers from the shortened documents based on the query and return the top 3 answers. For evaluation of such systems, accuracy is judged by the exact match between predicted answers and gold answers. But automatic evaluation methods fail due to the linguistic ambiguities inherent in the questions. Moreover, reference answers are often not exhaustive or are out of date. Hence, correct answers predicted by the system are often judged incorrect according to the automated metrics. One such scenario arises from the original Google Natural Question (GNQ) dataset, which was collected and made available in the year 2016. Use of any such dataset proves to be inefficient for questions that have time-varying answers. For illustration, consider the query “where will be the next Olympics?” The gold answer for this query in the GNQ dataset is “Tokyo”. Since the dataset was collected in 2016 and the next Olympics after 2016 were the 2020 Games in Tokyo, this answer was correct at collection time. But if the same question is asked in 2022, the answer is “Paris, 2024”.
Consequently, any evaluation based on the GNQ dataset will be incorrect. Such erroneous predictions are usually given to human evaluators for further validation which is quite expensive and time-consuming. To address this erroneous evaluation, the present work proposes an automated approach for evaluating time-dependent question-answer pairs. In particular, it proposes a metric using the current timestamp along with top-n predicted answers from a given QA system. To test the proposed approach GNQ dataset has been used and the system achieved an accuracy of 78% for a test dataset comprising 100 QA pairs. This test data was automatically extracted using an analysis-based approach from 10K QA pairs of the GNQ dataset. The results obtained are encouraging. The proposed technique appears to have the possibility of developing into a useful scheme for gathering precise, reliable, and specific information in a real-time and efficient manner. Our subsequent experiments will be guided towards establishing the efficacy of the above system for a larger set of time-dependent QA pairs.
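The abstract does not define the proposed metric, so the sketch below is only one hypothetical way a timestamp and top-n predictions could be combined: each question carries a timeline of gold answers with validity windows, and a prediction list counts as correct if any of its first n entries exactly matches the answer valid at the query time. The timeline structure, the function names, and the rounded date windows (Summer Games only) are assumptions made for illustration.

```python
from datetime import date


def normalize(text):
    """Lowercase and collapse whitespace before exact-match comparison."""
    return " ".join(text.lower().split())


def gold_answer_at(timeline, when):
    """Return the answer whose [valid_from, valid_until) window contains
    `when`, or None if no window covers that date."""
    for valid_from, valid_until, answer in timeline:
        if valid_from <= when < valid_until:
            return answer
    return None


def top_n_match(predictions, timeline, when, n=3):
    """True if any of the first n predictions exactly matches the gold
    answer that is valid at time `when`."""
    gold = gold_answer_at(timeline, when)
    if gold is None:
        return False
    return any(normalize(p) == normalize(gold) for p in predictions[:n])


# Hypothetical timeline for "where will be the next Olympics?".
# The windows are rounded and illustrative, not authoritative dates.
next_olympics = [
    (date(2016, 9, 1), date(2021, 9, 1), "Tokyo"),
    (date(2021, 9, 1), date(2024, 9, 1), "Paris"),
]

ok_2018 = top_n_match(["Tokyo"], next_olympics, date(2018, 1, 1))
ok_2022 = top_n_match(["Tokyo", "Paris"], next_olympics, date(2022, 6, 1))
stale_2022 = top_n_match(["Tokyo"], next_olympics, date(2022, 6, 1))
```

With a static gold answer, the 2022 prediction "Paris" would be marked wrong; conditioning the gold answer on the query date is what removes that false negative.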

Keywords: web-based information retrieval, open domain question answering system, time-varying QA, QA evaluation

Procedia PDF Downloads 95
10421 Analysis, Design, and Implementation of Quality Management System for KSA Software Company

Authors: Omar Said Almushyt

Abstract:

Quality management has recently become necessary worldwide for companies to face competitive challenges. Software companies in KSA suffer from two problems, namely low customer satisfaction and low product quality. Implementing quality management in a software company can solve these problems by improving the quality of products and enhancing customer satisfaction, which in turn makes the company more competitive. These goals can be achieved by introducing a quality management system through system analysis, followed by system design and, finally, implementation of that system. Results of the present work showed that the proposed method can increase product quality by 10% and customer satisfaction by 20%.

Keywords: quality, management, software, information engineering

Procedia PDF Downloads 429
10420 Cosmetic Recommendation Approach Using Machine Learning

Authors: Shakila N. Senarath, Dinesh Asanka, Janaka Wijayanayake

Abstract:

Demand for cosmetic products is rising as consumers seek to meet their personal appearance and hygiene needs. A cosmetic product consists of various chemical ingredients, which may help to keep the skin healthy or may cause damage. Not every chemical ingredient in a cosmetic product suits every person. The most appropriate way to select a healthy cosmetic product is to identify the texture of the body first and then select the most suitable product with safe ingredients. The selection process is therefore complicated, and consumer surveys have shown that, most of the time, consumers select cosmetic products improperly. This study proposes a content-based system that recommends cosmetic products according to human factors, namely skin type, gender, and price range. The proposed system will be implemented using machine learning, taking the consumer's skin type, gender, and price range as inputs. The consumer's skin type will be derived using the Baumann Skin Type Questionnaire, a value-based approach that uses a series of questions to assign the user to one of the 16 skin types of the Baumann Skin Type Indicator (BSTI). Two datasets were collected for the research: the user dataset and the cosmetic dataset. The user dataset was collected using a questionnaire given to the public. The cosmetic dataset contains product details for five product categories (Moisturizer, Cleanser, Sun protector, Face Mask, Eye Cream). An alternate approach to TF-IDF (Term Frequency – Inverse Document Frequency) is applied to vectorize the cosmetic ingredients in the generic cosmetic products dataset and the user-preferred dataset. Using the resulting IF-IPF vectors, each user-preferred products dataset and the generic cosmetic products dataset can be represented as sparse vectors. The similarity between each user-preferred product and each generic cosmetic product is calculated using cosine similarity, and the resulting similarity matrix drives the recommendation process: the higher the similarity, the better the match for the consumer. Sorting a user's column of the similarity matrix in descending order retrieves the recommended products in ranked order. Although this returns a list of similar products, user information such as gender and price range has also been gathered, so the results can be further optimized by weighting those parameters once a set of recommended products has been retrieved for a user.
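The vectorize-then-rank step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain TF-IDF weights because the abstract does not define the IF-IPF variant, and the function names, product names, and ingredient lists are all hypothetical.

```python
import math
from collections import Counter


def weight_vectors(docs):
    """Return one sparse {ingredient: weight} dict per ingredient list.

    Assumption: standard TF-IDF weighting, standing in for the IF-IPF
    variant that the abstract names but does not define.
    """
    n = len(docs)
    # Document frequency: in how many ingredient lists each ingredient occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        if not doc:
            vectors.append({})
            continue
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(user_ingredients, catalog):
    """Rank catalog products by similarity to a user's preferred ingredients."""
    names = list(catalog)
    docs = [catalog[name] for name in names] + [user_ingredients]
    vectors = weight_vectors(docs)
    user_vector = vectors[-1]
    scores = [(name, cosine(user_vector, vec)) for name, vec in zip(names, vectors)]
    # Descending similarity: the best match comes first.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical catalog and user preference, for illustration only.
catalog = {
    "Moisturizer A": ["water", "glycerin", "niacinamide"],
    "Cleanser B": ["water", "alcohol", "fragrance"],
    "Eye Cream C": ["glycerin", "niacinamide", "ceramide"],
}
for name, score in recommend(["glycerin", "niacinamide", "squalane"], catalog):
    print(f"{name}: {score:.3f}")
```

Weighting by gender and price range, which the abstract mentions as a later refinement, would be applied to this ranked list and is left out of the sketch.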

Keywords: content-based filtering, cosmetics, machine learning, recommendation system

Procedia PDF Downloads 128
10419 Developing an Intonation Labeled Dataset for Hindi

Authors: Esha Banerjee, Atul Kumar Ojha, Girish Nath Jha

Abstract:

This study aims to develop an intonation-labeled database for Hindi. Although no single standard for prosody labeling exists in Hindi, researchers have employed perceptual and statistical methods to draw inferences about the behavior of prosody patterns in Hindi. Based on this existing research and largely agreed-upon intonational theories of Hindi, this study attempts to develop a manually annotated prosodic corpus of Hindi speech data, which can be used in the future to train speech models for natural-sounding speech. 100 sentences ( 500 words) each for declarative and interrogative types have been labeled using Praat.

Keywords: speech dataset, Hindi, intonation, labeled corpus

Procedia PDF Downloads 184
10418 An Enhanced Support Vector Machine Based Approach for Sentiment Classification of Arabic Tweets of Different Dialects

Authors: Gehad S. Kaseb, Mona F. Ahmed

Abstract:

Arabic Sentiment Analysis (SA) is one of the most common research fields with many open areas. Few studies apply SA to Arabic dialects. This paper proposes different pre-processing steps and a modified methodology to improve accuracy using a standard Support Vector Machine (SVM) classifier. The paper works on two datasets, the Arabic Sentiment Tweets Dataset (ASTD) and the Extended Arabic Tweets Sentiment Dataset (Extended-AATSD), which are publicly available for academic use. The results show that the classification accuracy approaches 86%.

Keywords: Arabic, classification, sentiment analysis, tweets

Procedia PDF Downloads 140
10417 Using Machine Learning to Build a Real-Time COVID-19 Mask Safety Monitor

Authors: Yash Jain

Abstract:

The US Centers for Disease Control and Prevention has recommended wearing masks to slow the spread of the virus. This research uses a video feed from a camera to classify in real time whether a person is wearing a mask correctly, wearing a mask incorrectly, or not wearing a mask at all. A mask detection network was trained using two distinct datasets from the open-source website Kaggle. The first dataset, titled 'Face Mask Detection', was retrieved from Kaggle; the second, titled 'Face Mask Dataset (YOLO Format)', provided the data in YOLO format so that the TinyYoloV3 model could be trained. Based on the data from Kaggle, two machine learning models were implemented and trained: a TinyYoloV3 real-time model and a two-stage neural network classifier. The two-stage neural network classifier first identified distinct faces within the image and then classified the state of the mask on each face: worn correctly, worn incorrectly, or no mask at all. TinyYoloV3 was used for the live feed as well as for comparison against the two-stage classifier, and was trained using the darknet neural network framework. The two-stage classifier attained a mean average precision (MAP) of 80%, while the model trained using TinyYoloV3 real-time detection had a mean average precision (MAP) of 59%. Overall, both models were able to correctly classify the scenarios of no mask, mask, and incorrectly worn mask.

Keywords: datasets, classifier, mask-detection, real-time, TinyYoloV3, two-stage neural network classifier

Procedia PDF Downloads 149
10416 An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data

Authors: Ruchika Malhotra, Megha Khanna

Abstract:

The development of change prediction models can help software practitioners plan testing and inspection resources in the early phases of software development. However, a major challenge faced during the training of any classification model is the imbalanced nature of software quality data. Data with very few instances of the minority outcome category lead to an inefficient learning process, and a classification model developed from imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for adeptly handling imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics when dealing with imbalanced data. In order to empirically validate the different alternatives, the study uses change data from three application packages of the open-source Android dataset and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using resampling methods and robust performance measures.
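As an illustration of what a sampling method does to imbalanced data, the sketch below shows random oversampling, one of the simplest resampling techniques. The abstract does not say which sampling methods were evaluated, so this choice, the function name, and the toy labels are assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size.

    Assumption: plain random oversampling, used here only as an example
    of a resampling method; the study's actual methods are not specified.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        # Pool of original rows belonging to this class.
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical change-proneness labels: 8 stable classes, 2 change-prone ones.
rows = [[i] for i in range(10)]
labels = ["stable"] * 8 + ["change-prone"] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling of this kind is normally applied to the training portion only, so that the evaluation data keep their original class distribution.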

Keywords: change proneness, empirical validation, imbalanced learning, machine learning techniques, object-oriented metrics

Procedia PDF Downloads 412
10415 Discerning Divergent Nodes in Social Networks

Authors: Mehran Asadi, Afrand Agah

Abstract:

In data mining, partitioning is used as a fundamental tool for classification. With the help of partitioning, we study the structure of data, which allows us to envision decision rules that can be applied to classification trees. In this research, we used an online social network dataset and all of its attributes (e.g., node features, labels, etc.) to determine what constitutes an above-average chance of being a divergent node. We used the R statistical computing language to conduct the analyses in this report. The data were found on the UC Irvine Machine Learning Repository. This research introduces the basic concepts of classification in online social networks. In this work, we consider overfitting and describe different approaches for evaluating and comparing the performance of different classification methods. In classification, the main objective is to categorize different items and assign them to different groups based on their properties and similarities. In data mining, recursive partitioning is used to probe the structure of a dataset, which allows us to envision decision rules and apply them to classify data into several groups. Estimating densities is hard, especially in high dimensions with limited data. Of course, we do not know the densities, but we could estimate them using classical techniques. First, we calculated the correlation matrix of the dataset to see whether any predictors are highly correlated with one another. By calculating the correlation coefficients for the predictor variables, we see that density is strongly correlated with transitivity. We initialized a data frame to easily compare the quality of the resulting classification methods and utilized decision trees (with k-fold cross-validation to prune the tree). The method performed on this dataset is decision trees. A decision tree is a non-parametric classification method that uses a set of rules to predict that each observation belongs to the most commonly occurring class label of the training data. Our method aggregates many decision trees to create an optimized model that is not susceptible to overfitting. When using a decision tree, however, it is important to use cross-validation to prune the tree in order to narrow it down to the most important variables.
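The correlation check described in this abstract can be sketched as below. The authors worked in R; this Python version is only an illustration. The column names echo the predictors named in the abstract, but the values, the 0.8 threshold, and the helper names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def highly_correlated(columns, threshold=0.8):
    """List predictor pairs whose absolute correlation meets the threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Made-up node features, for illustration only.
features = {
    "density": [0.10, 0.20, 0.30, 0.40],
    "transitivity": [0.15, 0.22, 0.35, 0.41],
    "degree": [5, 1, 4, 2],
}
for a, b, r in highly_correlated(features):
    print(f"{a} vs {b}: r = {r:.3f}")
```

Pairs flagged this way are candidates for dropping one predictor before a tree is fitted, since strongly correlated predictors carry largely the same information.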

Keywords: online social networks, data mining, social cloud computing, interaction and collaboration

Procedia PDF Downloads 144
10414 The Affect of Total Quality Management on Firm's Innovation Performance: A Literature Review

Authors: Omer Akkaya, Nurullah Ekmekcı, Muammer Zerenler

Abstract:

Innovation for businesses means a new product or service and sometimes a new implementation. Total Quality Management is a management philosophy that focuses on customer, process, and system. There is a certain relationship between the principles of Total Quality Management and innovation performance. The main aim of this study is to show how the implementation and principles of Total Quality Management (TQM) affect a firm's innovation performance. This paper also discusses the positive and negative effects of Total Quality Management on innovation performance and presents some examples.

Keywords: innovation, innovation types, total quality management, principles of total quality management

Procedia PDF Downloads 616
10413 The Contemporary Issues of Quality Management: Relationship between Total Quality Management and Knowledge Management

Authors: Mehrnoosh Askarizadeh

Abstract:

To meet the challenges of the new global environment, companies have started paying great attention to quality management as an integral part of their strategic business plans. The purpose of this article is to investigate the relationship between total quality management (TQM) and knowledge management (KM). Successful implementation of total quality management throughout an organization requires major changes in the four main aspects of knowledge management: creation, storage, sharing, and application. Skill, knowledge, and productivity are important factors in an organization's success, so the TQM management system pays special attention to them. Knowledge as a source is also essential for an organization's survival. Our study points out how quality management and knowledge management have been incorporated into each other for the development of a quality culture within the organization.

Keywords: knowledge management (KM), total quality management (TQM), organizational performance (OP), deming cycle

Procedia PDF Downloads 471
10412 The Quality Health Services and Patient Satisfaction in Hospital

Authors: Nadia Fatima Zahra Malki

Abstract:

Quality is one of the most important modern management approaches that organizations seek to achieve in all areas and sectors in order to meet the needs and desires of customers and to ensure their survival and continuity, as it constitutes a competitive advantage for the organization. Among the organizations for which quality is most essential are health organizations, because they deal with the most valuable component of production: the person and his or her health. Any error threatens the patient's life and may lead to death, so these organizations must provide high-quality health services to achieve the highest degree of patient satisfaction. This research studies the quality of health services and the extent of its impact on patient satisfaction through an applied study that measured the level of quality of health services in the university hospital center of Algeria and its impact on patient satisfaction according to the dimensions of health service quality. We conclude that the determinants of health service quality affect patient satisfaction, which necessitates developing health services according to patients' requirements and improving their quality to obtain patient satisfaction.

Keywords: health service, health quality, quality determinants, patient satisfaction

Procedia PDF Downloads 42
10411 Application of Data Mining Techniques for Tourism Knowledge Discovery

Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee

Abstract:

Five implementations of three data mining classification techniques were applied experimentally to extract important insights from tourism data. The aim was to identify the best-performing algorithm among those compared for tourism knowledge discovery. The knowledge discovery process from data was used as the process model, and 10-fold cross-validation was used for testing. Various data preprocessing activities were performed to obtain the final dataset for model building. Classification models of the selected algorithms were built under different scenarios on the preprocessed dataset. The best-performing algorithm on the tourism dataset was Random Forest (76%) before applying information gain-based attribute selection and J48 (C4.5) (75%) after selecting the attributes most relevant to the class (target) attribute. In terms of model-building time, attribute selection improved the efficiency of all algorithms; the Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented; they show intricate, non-trivial knowledge that would otherwise not be discovered by simple statistical analysis, albeit with the mediocre accuracy of the classification algorithms.
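Information gain-based attribute selection, mentioned in this abstract, scores each attribute by how much knowing its value reduces the entropy of the class label. The sketch below shows that calculation for categorical attributes; the attribute names and values are invented for illustration, and the study's actual tooling is not stated beyond the algorithm names.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / n * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


def rank_attributes(table, labels):
    """Sort attribute names by information gain, highest first."""
    gains = {name: information_gain(column, labels) for name, column in table.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical tourism records: which attribute best predicts a repeat visit?
table = {
    "season": ["summer", "summer", "winter", "winter"],
    "group": ["family", "solo", "family", "solo"],
}
labels = ["yes", "yes", "no", "no"]
for name, gain in rank_attributes(table, labels):
    print(f"{name}: {gain:.3f} bits")
```

Keeping only the top-ranked attributes is what shortens model-building time: each algorithm then trains on fewer columns.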

Keywords: classification algorithms, data mining, knowledge discovery, tourism

Procedia PDF Downloads 286
10410 Implementation of Total Quality Management in Public Sector: Case of Tunisia

Authors: Rafla Hchaichi

Abstract:

Public administration is currently experiencing unprecedented effervescence in the field of quality. In an increasingly competitive globalized world, public services are confronted with the need to improve their performance, which pushes public companies to implement quality approaches. Quality approaches have taken diverse forms such as service commitment, labels, certifications, and the Common Assessment Framework. This paper provides an overview of the strategy for administrative development in Tunisia from the Carthaginian civilization until today. It outlines the evolution of quality management in the Tunisian public context while focusing on the National Referential of Quality of Administrative Services.

Keywords: quality approach, the common assessment framework, service commitment, label, certification, quality of public service, performance of public service, Tunisian Public Service

Procedia PDF Downloads 540
10409 Comparison of Deep Convolutional Neural Networks Models for Plant Disease Identification

Authors: Megha Gupta, Nupur Prakash

Abstract:

Identification of plant diseases has been performed using machine learning and deep learning models on datasets containing images of healthy and diseased plant leaves. The current study evaluates some of the deep learning models based on convolutional neural network (CNN) architectures for the identification of plant diseases. For this purpose, the publicly available New Plant Diseases Dataset, an augmented version of the PlantVillage dataset available on the Kaggle platform and containing 87,900 images, has been used. The dataset contains images of 26 diseases of 14 different plants and images of 12 healthy plants. The CNN models selected for the study presented in this paper are AlexNet, ZFNet, VGGNet (four models), GoogLeNet, and ResNet (three models). The selected models are trained using PyTorch, an open-source machine learning library, on Google Colaboratory. A comparative study has been carried out to analyze the high degree of accuracy achieved using these models. The highest test accuracy and F1-score, 99.59% and 0.996 respectively, were achieved using GoogLeNet with a mini-batch momentum-based gradient descent learning algorithm.

Keywords: comparative analysis, convolutional neural networks, deep learning, plant disease identification

Procedia PDF Downloads 189
10408 K-Means Clustering-Based Infinite Feature Selection Method

Authors: Seyyedeh Faezeh Hassani Ziabari, Sadegh Eskandari, Maziar Salahi

Abstract:

The Infinite Feature Selection (IFS) algorithm is an efficient feature selection algorithm that selects a subset of features of all sizes (including infinity). In this paper, we present an improved version of it, called clustering IFS (CIFS), which clusters the dataset in advance. To do so, we first apply the K-means algorithm to cluster the dataset and then apply IFS. In the CIFS method, the spatial and temporal complexities are reduced compared to the IFS method. Experimental results on 6 datasets show the superiority of CIFS over IFS in terms of accuracy, running time, and memory consumption.
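The clustering step that precedes IFS in this abstract can be illustrated with a bare-bones K-means loop (Lloyd's algorithm). The abstract does not say how the clusters are passed on to IFS, so the sketch stops at the clustering itself; the deterministic initialization from the first k points and the sample points are assumptions chosen to keep the example reproducible.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100):
    """Cluster points into k groups; returns (centroids, assignment).

    Assumption: the first k points seed the centroids, which keeps the
    sketch deterministic. Practical implementations usually randomize this.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


# Two well-separated groups of hypothetical samples.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
print(centroids)
```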

Keywords: feature selection, infinite feature selection, clustering, graph

Procedia PDF Downloads 116
10407 Evalutaion of the Surface Water Quality Using the Water Quality Index and Discriminant Analysis Method

Authors: Lazhar Belkhiri, Ammar Tiri, Lotfi Mouni

Abstract:

Given the complexity of water quality data sets, the protection and management of water quality is a very important problem worldwide. In this study, the water quality index (WQI) and irrigation water quality index (IWQI) were calculated to evaluate the surface water quality for drinking and irrigation purposes based on nine hydrochemical parameters. Discriminant analysis (DA) was applied to identify the variables most responsible for the spatial differentiation. The results show that, based on WQI values, the surface water quality for drinking is poor to very poor; however, the IWQI values indicate that this water is acceptable for irrigation, with a restriction for sensitive plants. The discriminant analysis showed that pH, potassium, chloride, sulfate, and bicarbonate discriminate significantly between the different stations with respect to the spatial variation of surface water quality. The results obtained in this study therefore provide very useful information to decision-makers.

Keywords: surface water quality, drinking and irrigation purposes, water quality index, discriminant analysis

Procedia PDF Downloads 77