Search results for: label annotation
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 343

283 Privacy Label: An Alternative Approach to Present Privacy Policies from Online Services to the User

Authors: Diego Roberto Goncalves De Pontes, Sergio Donizetti Zorzo

Abstract:

Studies show that most users do not read the privacy policies of the online services they use. Some authors claim that one of the main causes is that policies are long and usually hard to understand, which makes users lose interest in reading them. In this scenario, users may agree to terms without knowing what kind of data is being collected and why. Given that, we aimed to develop a model that presents the contents of privacy policies in an easy, graphical way for the user to understand. We call it the Privacy Label. Using information retrieval techniques, we propose an architecture that is able to extract from the policies what kind of data is being collected and to what end, and to show it to the user in an automated way. To assess our model, we calculated the precision, recall, and F-measure of the information extracted by our technique. The results for each metric were 68.53%, 85.61%, and 76.13%, respectively, making it possible for the end user to understand which data was being collected without reading the whole policy. Also, our proposal can facilitate notice-and-choice by presenting privacy policy information in an alternative way for online users.
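As a quick consistency check on the three reported metrics, the sketch below assumes that the F-measure in the abstract is the balanced F1 score, i.e. the harmonic mean of precision and recall. The function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, written as fractions.
reported_precision = 0.6853
reported_recall = 0.8561

f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {f1:.4f}")
```

Under that assumption the result is roughly 0.761, which agrees with the reported 76.13% up to rounding of the inputs.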

Keywords: privacy, policies, user behavior, computer human interaction

Procedia PDF Downloads 303
282 PaSA: A Dataset for Patent Sentiment Analysis to Highlight Patent Paragraphs

Authors: Renukswamy Chikkamath, Vishvapalsinhji Ramsinh Parmar, Christoph Hewel, Markus Endres

Abstract:

Given a patent document, identifying distinct semantic annotations is an interesting research aspect. Text annotation helps the patent practitioners such as examiners and patent attorneys to quickly identify the key arguments of any invention, successively providing a timely marking of a patent text. In the process of manual patent analysis, to attain better readability, recognising the semantic information by marking paragraphs is in practice. This semantic annotation process is laborious and time-consuming. To alleviate such a problem, we proposed a dataset to train machine learning algorithms to automate the highlighting process. The contributions of this work are: i) we developed a multi-class dataset of size 150k samples by traversing USPTO patents over a decade, ii) articulated statistics and distributions of data using imperative exploratory data analysis, iii) baseline Machine Learning models are developed to utilize the dataset to address patent paragraph highlighting task, and iv) future path to extend this work using Deep Learning and domain-specific pre-trained language models to develop a tool to highlight is provided. This work assists patent practitioners in highlighting semantic information automatically and aids in creating a sustainable and efficient patent analysis using the aptitude of machine learning.

Keywords: machine learning, patents, patent sentiment analysis, patent information retrieval

Procedia PDF Downloads 87
281 Domain-Specific Deep Neural Network Model for Classification of Abnormalities on Chest Radiographs

Authors: Nkechinyere Joy Olawuyi, Babajide Samuel Afolabi, Bola Ibitoye

Abstract:

This study collected a preprocessed dataset of chest radiographs and formulated a deep neural network model for detecting abnormalities. It also evaluated the performance of the formulated model and implemented a prototype of it. This was with a view to developing a deep neural network model that automatically classifies abnormalities in chest radiographs. To achieve the overall purpose of this research, a large set of chest x-ray images was sourced from the CheXpert dataset, an online repository of annotated chest radiographs compiled by the Machine Learning Research Group, Stanford University. The chest radiographs were preprocessed into a format that can be fed into a deep neural network. The preprocessing techniques used were standardization and normalization. The classification problem was formulated as a multi-label binary classification model, which used a convolutional neural network architecture to decide whether each abnormality was present or not in the chest radiographs. The classification model was evaluated using specificity, sensitivity, and Area Under Curve (AUC) score as the metrics. A prototype of the classification model was implemented using the Keras open-source deep learning framework in the Python programming language. Based on the AUC ROC curve, the model was able to classify Atelectasis, Support devices, Pleural effusion, Pneumonia, A normal CXR (no finding), Pneumothorax, and Consolidation. However, Lung opacity and Cardiomegaly had a probability of less than 0.5 and thus were classified as absent. Precision, recall, and F1 score values were 0.78; this implies that the numbers of false positives and false negatives are the same, revealing some measure of label imbalance in the dataset. The study concluded that the developed model is sufficient to classify abnormalities in chest radiographs as present or absent.
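The 0.5 cut-off and the remark about equal precision and recall can be illustrated with a small sketch. The probabilities and ground-truth labels below are made-up values, not data from the study, and the helper names `binarize` and `precision_recall` are hypothetical.

```python
def binarize(probabilities, threshold=0.5):
    """Mark a finding as present (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def precision_recall(y_true, y_pred):
    """Return (precision, recall, false_positives, false_negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, fp, fn


# Hypothetical per-label probabilities and ground truth for one radiograph.
probs = [0.9, 0.3, 0.7, 0.6, 0.2, 0.8]
truth = [1, 1, 1, 0, 0, 1]

pred = binarize(probs)
precision, recall, fp, fn = precision_recall(truth, pred)
print(pred, precision, recall, fp, fn)
```

Because precision is TP / (TP + FP) and recall is TP / (TP + FN), the two are equal exactly when FP equals FN (given at least one true positive), which is the relationship the abstract relies on.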

Keywords: transfer learning, convolutional neural network, radiograph, classification, multi-label

Procedia PDF Downloads 124
280 A Genetic Algorithm Based Ensemble Method with Pairwise Consensus Score on Malware Cacophonous Labels

Authors: Shih-Yu Wang, Shun-Wen Hsiao

Abstract:

In the field of cybersecurity, many vendors classify malware samples and name each result with a label that contains important information, also called an AV label. Many researchers rely on AV labels for their research. Unfortunately, AV labels are too cluttered. They do not have a fixed format or fixed naming rules because the naming results are based on each classifier's viewpoint. One way to fix the problem is to take a majority vote. However, voting can sometimes introduce bias. Thus, we create a novel ensemble approach that does not rely on the cacophonous naming results but depends on group identification to aggregate everyone's opinion. To achieve this purpose, we develop a scoring system called Pairwise Consensus Score (PCS) to calculate result similarity. The overall method architecture combines a Genetic Algorithm and PCS to find the maximum consensus in the group. Experimental results revealed that our method outperformed majority voting by 10% in terms of the score.
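The contrast between name-based voting and group-based comparison can be sketched as follows. The abstract does not define PCS, so the `pairwise_agreement` function below is only a Rand-index-style stand-in chosen for illustration, not the authors' score; the vendor label lists are invented.

```python
from collections import Counter
from itertools import combinations


def majority_vote(labels):
    """Return the most frequent label name among vendors for one sample."""
    return Counter(labels).most_common(1)[0][0]


def pairwise_agreement(grouping_a, grouping_b):
    """Fraction of sample pairs that both groupings treat the same way.

    A pair counts as agreement when both vendors put the two samples in the
    same group, or both put them in different groups. Label names are ignored.
    """
    pairs = list(combinations(range(len(grouping_a)), 2))
    agree = sum(
        (grouping_a[i] == grouping_a[j]) == (grouping_b[i] == grouping_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical labels from three vendors for the same four samples.
vendor_a = ["zbot", "zbot", "ramnit", "ramnit"]
vendor_b = ["Trojan.X", "Trojan.X", "Worm.Y", "Worm.Y"]
vendor_c = ["gen", "gen", "gen", "other"]

print(majority_vote(["zbot", "zbot", "zeus"]))
print(pairwise_agreement(vendor_a, vendor_b))
print(pairwise_agreement(vendor_a, vendor_c))
```

Vendors A and B never use the same name, so a name-based vote sees no agreement between them, yet they group the samples identically and the pairwise comparison scores them as fully consistent.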

Keywords: genetic algorithm, ensemble learning, malware family, malware labeling, AV labels

Procedia PDF Downloads 84
279 Automatic Reporting System for Transcriptome Indel Identification and Annotation Based on Snapshot of Next-Generation Sequencing Reads Alignment

Authors: Shuo Mu, Guangzhi Jiang, Jinsa Chen

Abstract:

The analysis of Indel for RNA sequencing of clinical samples is easily affected by sequencing experiment errors and software selection. In order to improve the efficiency and accuracy of analysis, we developed an automatic reporting system for Indel recognition and annotation based on image snapshot of transcriptome reads alignment. This system includes sequence local-assembly and realignment, target point snapshot, and image-based recognition processes. We integrated high-confidence Indel dataset from several known databases as a training set to improve the accuracy of image processing and added a bioinformatical processing module to annotate and filter Indel artifacts. Subsequently, the system will automatically generate data, including data quality levels and images results report. Sanger sequencing verification of the reference Indel mutation of cell line NA12878 showed that the process can achieve 83% sensitivity and 96% specificity. Analysis of the collected clinical samples showed that the interpretation accuracy of the process was equivalent to that of manual inspection, and the processing efficiency showed a significant improvement. This work shows the feasibility of accurate Indel analysis of clinical next-generation sequencing (NGS) transcriptome. This result may be useful for RNA study for clinical samples with microsatellite instability in immunotherapy in the future.

Keywords: automatic reporting, indel, next-generation sequencing, NGS, transcriptome

Procedia PDF Downloads 190
278 TARF: Web Toolkit for Annotating RNA-Related Genomic Features

Authors: Jialin Ma, Jia Meng

Abstract:

Genomic features, i.e., genome-based coordinates, are commonly used to represent biological features such as genes, RNA transcripts, and transcription factor binding sites. For the analysis of RNA-related genomic features, such as RNA modification sites, a common task is to correlate these features with transcript components (5'UTR, CDS, 3'UTR) to explore their distribution characteristics in terms of transcriptomic coordinates, e.g., to examine whether a specific type of biological feature is enriched near transcription start sites. Existing approaches for performing these tasks involve the manipulation of a gene database, conversion from genome-based coordinates to transcript-based coordinates, and visualization methods that are capable of showing RNA transcript components and the distribution of the features. These steps are complicated and time-consuming, especially for researchers who are not familiar with the relevant tools. To overcome this obstacle, we developed a dedicated web app, TARF, which stands for web Toolkit for Annotating RNA-related genomic Features. The TARF web tool is intended to provide a web-based way to easily annotate and visualize RNA-related genomic features. Once a user has uploaded the features in BED format and specified a built-in transcript database or uploaded a customized gene database in GTF format, the tool can fulfill its three main functions. First, it adds annotation on gene and RNA transcript components. For every feature provided by the user, the overlap with RNA transcript components is identified, and the information is combined in one table, which is available for copy and download. Summary statistics about ambiguous assignments are also computed. Second, the tool provides a convenient visualization method of the features at the single gene/transcript level. For the selected gene, the tool shows the features with the gene model in a genome-based view, and also maps the features to transcript-based coordinates and shows the distribution against a single spliced RNA transcript. Third, a global transcriptomic view of the genomic features is generated using the Guitar R/Bioconductor package. The distribution of features on RNA transcripts is normalized with respect to RNA transcript landmarks, and the enrichment of the features on different RNA transcript components is demonstrated. We tested the newly developed TARF toolkit with three different types of genomic features related to chromatin H3K4me3, RNA N6-methyladenosine (m6A), and RNA 5-methylcytosine (m5C), which were obtained from ChIP-Seq, MeRIP-Seq, and RNA BS-Seq data, respectively. TARF successfully revealed their respective distribution characteristics, i.e., H3K4me3, m6A, and m5C are enriched near transcription start sites, stop codons, and 5'UTRs, respectively. Overall, TARF is a useful web toolkit for annotation and visualization of RNA-related genomic features, and should help simplify the analysis of various RNA-related genomic features, especially those related to RNA modifications.
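The conversion from genome-based to transcript-based coordinates mentioned in the abstract can be sketched in a few lines. This is a minimal illustration, not TARF's implementation: it assumes exons are given as 0-based, half-open `(start, end)` intervals (the BED convention), and the function name `genome_to_transcript` is hypothetical.

```python
def genome_to_transcript(pos, exons, strand="+"):
    """Map a 0-based genomic position onto the spliced transcript.

    exons: list of (start, end) intervals, 0-based and half-open.
    Returns the 0-based transcript position, or None when the position
    falls in an intron or outside the transcript.
    """
    ordered = sorted(exons)
    if strand == "-":
        # On the minus strand the transcript starts at the rightmost exon.
        ordered = ordered[::-1]
    offset = 0  # total length of the exons already passed
    for start, end in ordered:
        if start <= pos < end:
            within = pos - start if strand == "+" else end - 1 - pos
            return offset + within
        offset += end - start
    return None


# Hypothetical two-exon transcript with a 100 bp intron.
exons = [(100, 200), (300, 400)]
print(genome_to_transcript(150, exons))        # inside the first exon
print(genome_to_transcript(310, exons))        # inside the second exon
print(genome_to_transcript(250, exons))        # intronic position
print(genome_to_transcript(399, exons, "-"))   # transcript start on minus strand
```

Introns are skipped by accumulating only exon lengths, which is why a feature in the second exon lands directly after the end of the first exon in transcript space.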

Keywords: RNA-related genomic features, annotation, visualization, web server

Procedia PDF Downloads 204
277 Neighborhood Graph-Optimized Preserving Discriminant Analysis for Image Feature Extraction

Authors: Xiaoheng Tan, Xianfang Li, Tan Guo, Yuchuan Liu, Zhijun Yang, Hongye Li, Kai Fu, Yufang Wu, Heling Gong

Abstract:

The image data collected in reality often have high dimensions and contain noise and redundant information. Therefore, it is necessary to extract a compact feature representation of the original perceived image. In this process, effective use of prior knowledge, such as data structure distribution and sample labels, is the key to enhancing image feature discrimination and robustness. Based on the above considerations, this paper proposes a locality preserving discriminant feature learning model based on graph optimization. The model has the following characteristics: (1) The locality preserving constraint can effectively uncover and preserve the local structural relationships between data. (2) The flexibility of graph learning can be improved by constructing a new local geometric structure graph using label information and the nearest neighbor threshold. (3) The L₂,₁ norm is used to redefine LDA, a diagonal matrix is introduced as the scale factor of LDA, and the samples are selected, which improves the robustness of feature learning. The validity and robustness of the proposed algorithm are verified by experiments on two public image datasets.

Keywords: feature extraction, graph optimization local preserving projection, linear discriminant analysis, L₂, ₁ norm

Procedia PDF Downloads 147
276 3D Vision Transformer for Cervical Spine Fracture Detection and Classification

Authors: Obulesh Avuku, Satwik Sunnam, Sri Charan Mohan Janthuka, Keerthi Yalamaddi

Abstract:

In the United States alone, there are over 1.5 million spine fractures per year, resulting in about 17,730 spinal cord injuries. The cervical spine is where fractures in the spine most frequently occur. The prevalence of spinal fractures in the elderly has increased, and in this population, fractures may be harder to see on imaging because of coexisting degenerative illness and osteoporosis. Nowadays, computed tomography (CT) has almost completely replaced radiography (x-rays) for the imaging diagnosis of adult spine fractures. To prevent neurologic degeneration and paralysis following trauma, it is vital to detect any vertebral fractures as early as possible. Many approaches have been proposed for the classification of the cervical spine (2D models). In this paper, we try to go beyond them by using vision transformers (ViT), a state-of-the-art model in image classification, making the minimal changes to the ViT architecture needed to make it 3D-enabled. The model is evaluated using a weighted multi-label logarithmic loss. We have taken this problem statement from a previously held Kaggle competition, i.e., RSNA 2022 Cervical Spine Fracture Detection.
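A weighted multi-label logarithmic loss can be sketched generically as a weighted average of per-label binary cross-entropy terms. The weights below are placeholders: the abstract does not state the weighting scheme, and this sketch does not reproduce the competition's actual metric. The function name is illustrative.

```python
import math


def weighted_multilabel_log_loss(y_true, y_prob, weights, eps=1e-15):
    """Weighted mean of binary cross-entropy over all samples and labels.

    y_true:  list of rows of 0/1 labels
    y_prob:  list of rows of predicted probabilities
    weights: one weight per label column (placeholder values, an assumption)
    """
    total = 0.0
    weight_sum = 0.0
    for true_row, prob_row in zip(y_true, y_prob):
        for t, p, w in zip(true_row, prob_row, weights):
            p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
            total += -w * (t * math.log(p) + (1 - t) * math.log(1 - p))
            weight_sum += w
    return total / weight_sum


# Hypothetical example: two labels, the first weighted twice as heavily.
labels = [[1, 0]]
probabilities = [[0.5, 0.5]]
loss = weighted_multilabel_log_loss(labels, probabilities, weights=[2, 1])
print(loss)
```

With every probability at 0.5 each term equals ln 2, so the weighted mean is also ln 2 regardless of the weights; the weights only matter once the per-label errors differ.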

Keywords: cervical spine, spinal fractures, osteoporosis, computed tomography, 2d-models, ViT, multi-label logarithmic loss, Kaggle, public score, private score

Procedia PDF Downloads 113
275 Mobi-DiQ: A Pervasive Sensing System for Delirium Risk Assessment in Intensive Care Unit

Authors: Subhash Nerella, Ziyuan Guan, Azra Bihorac, Parisa Rashidi

Abstract:

Intensive care units (ICUs) provide care to critically ill patients in severe and life-threatening conditions. However, patient monitoring in the ICU is limited by the time and resource constraints imposed on healthcare providers. Many critical care indices such as mobility are still manually assessed, which can be subjective, prone to human errors, and lack granularity. Other important aspects, such as environmental factors, are not monitored at all. For example, critically ill patients often experience circadian disruptions due to the absence of effective environmental “timekeepers” such as the light/dark cycle and the systemic effect of acute illness on chronobiologic markers. Although the occurrence of delirium is associated with circadian disruption risk factors, these factors are not routinely monitored in the ICU. Hence, there is a critical unmet need to develop systems for precise and real-time assessment through novel enabling technologies. We have developed the mobility and circadian disruption quantification system (Mobi-DiQ) by augmenting biomarker and clinical data with pervasive sensing data to generate mobility and circadian cues related to mobility, nightly disruptions, and light and noise exposure. We hypothesize that Mobi-DiQ can provide accurate mobility and circadian cues that correlate with bedside clinical mobility assessments and circadian biomarkers, ultimately important for delirium risk assessment and prevention. The collected multimodal dataset consists of depth images, Electromyography (EMG) data, patient extremity movement captured by accelerometers, ambient light levels, Sound Pressure Level (SPL), and indoor air quality measured by volatile organic compounds, and the equivalent CO₂ concentration. 
For delirium risk assessment, the system recognizes mobility cues (axial body movement features and body key points) and circadian cues, including nightly disruptions, ambient SPL, and light intensity, as well as other environmental factors such as indoor air quality. The Mobi-DiQ system consists of three major components: the pervasive sensing system, a data storage and analysis server, and a data annotation system. For data collection, six local pervasive sensing systems were deployed, including a local computer and sensors. A video recording tool with graphical user interface (GUI) developed in python was used to capture depth image frames for analyzing patient mobility. All sensor data is encrypted, then automatically uploaded to the Mobi-DiQ server through a secured VPN connection. Several data pipelines are developed to automate the data transfer, curation, and data preparation for annotation and model training. The data curation and post-processing are performed on the server. A custom secure annotation tool with GUI was developed to annotate depth activity data. The annotation tool is linked to the MongoDB database to record the data annotation and to provide summarization. Docker containers are also utilized to manage services and pipelines running on the server in an isolated manner. The processed clinical data and annotations are used to train and develop real-time pervasive sensing systems to augment clinical decision-making and promote targeted interventions. In the future, we intend to evaluate our system as a clinical implementation trial, as well as to refine and validate it by using other data sources, including neurological data obtained through continuous electroencephalography (EEG).

Keywords: deep learning, delirium, healthcare, pervasive sensing

Procedia PDF Downloads 91
274 Real Time Traffic Performance Study over MPLS VPNs with DiffServ

Authors: Naveed Ghani

Abstract:

With the arrival of higher-speed communication links and mature applications running over the internet, the requirement for reliable, efficient, and robust network designs is rising day by day. Multi-Protocol Label Switching (MPLS) Virtual Private Networks (VPNs) are committed to providing optimal network services and are gaining popularity in industry. Enterprise customers are moving to service providers that offer MPLS VPNs. The main reason for this shift is the capability of MPLS VPNs to provide built-in security features and any-to-any connectivity. MPLS VPNs improve network performance through fast label switching compared to traditional IP forwarding, but traffic classification and policing are still required on a per-hop basis to enhance the performance of delay-sensitive real-time traffic (particularly voice and video). Quality of service (QoS) is the most important factor in prioritizing enterprise networks’ real-time traffic such as voice and video. This thesis focuses on the study of QoS parameters (e.g., delay, jitter, and Mean Opinion Score (MOS)) for real-time traffic over MPLS VPNs. The Differentiated Services (DiffServ) QoS model will be used over the MPLS VPN network to achieve end-to-end service quality.

Keywords: network, MPLS, VPN, DiffServ, MPLS VPN, DiffServ QoS, QoS Model, GNS2

Procedia PDF Downloads 426
273 3D Label-Free Bioimaging of Native Tissue with Selective Plane Illumination Optical Microscopy

Authors: Jing Zhang, Yvonne Reinwald, Nick Poulson, Alicia El Haj, Chung See, Mike Somekh, Melissa Mather

Abstract:

Biomedical imaging of native tissue using light offers the potential to obtain excellent structural and functional information in a non-invasive manner with good temporal resolution. Image contrast can be derived from intrinsic absorption, fluorescence, or scatter, or through the use of extrinsic contrast. A major challenge in applying optical microscopy to in vivo tissue imaging is light attenuation, which limits light penetration depth and achievable imaging resolution. Recently, Selective Plane Illumination Microscopy (SPIM) has been used to map the 3D distribution of fluorophores dispersed in biological structures. In this approach, a focused sheet of light is used to illuminate the sample from the side to excite fluorophores within the sample of interest. Images are formed based on detection of fluorescence emission orthogonal to the illumination axis. By scanning the sample along the detection axis and acquiring a stack of images, 3D volumes can be obtained. The combination of rapid image acquisition, low photon dose to samples, and optical sectioning makes SPIM an attractive approach for imaging biological samples in 3D. To date, all implementations of SPIM rely on the use of fluorescence reporters, be they endogenous or exogenous. This approach has the disadvantage that, in the case of exogenous probes, the specimens are altered from their native state, rendering them unsuitable for in vivo studies, and in general fluorescence emission is weak and transient. Here we present, for the first time to our knowledge, a label-free implementation of SPIM that has downstream applications in the clinical setting. The experimental setup used in this work incorporates both label-free and fluorescent illumination arms in addition to a high-specification camera that can be partitioned for simultaneous imaging of both fluorescent emission and scattered light from intrinsic sources of optical contrast in the sample being studied. 
This work first involved calibration of the imaging system and validation of the label-free method with well characterised fluorescent microbeads embedded in agarose gel. 3D constructs of mammalian cells cultured in agarose gel with varying cell concentrations were then imaged. A time course study to track cell proliferation in the 3D construct was also carried out and finally a native tissue sample was imaged. For each sample multiple images were obtained by scanning the sample along the axis of detection and 3D maps reconstructed. The results obtained validated label-free SPIM as a viable approach for imaging cells in a 3D gel construct and native tissue. This technique has the potential use in a near-patient environment that can provide results quickly and be implemented in an easy to use manner to provide more information with improved spatial resolution and depth penetration than current approaches.

Keywords: bioimaging, optics, selective plane illumination microscopy, tissue imaging

Procedia PDF Downloads 246
272 Designing and Implementation of MPLS Based VPN

Authors: Muhammad Kamran Asif

Abstract:

MPLS stands for Multi-Protocol Label Switching. It is the technology that replaces ATM (Asynchronous Transfer Mode) and Frame Relay. In this paper, we have designed a full-fledged, small-scale MPLS-based service provider core network model, which provides communication services (e.g., voice, video, and data) to customers more efficiently using the label switching technique. MPLS VPN provides security to customers on either a LAN or a WAN. It protects each customer site from being attacked by intruders from the outside world while extending a private network over the internet. In this paper, we implemented a service provider network using minimal available resources, i.e., five Cisco 3800 series routers comprising the service provider core, provider edge routers, and customer edge routers. Customers at one end of the network (customer side) are capable of sending any kind of data to customers at the other end through the MPLS VPN-enabled service provider cloud. We have also simulated and emulated the model using GNS3 (Graphical Network Simulator-3) and reproduced real-time scenarios. We have also deployed an NMS that monitors our service provider cloud and generates an alarm in case of any intrusion or malfunction in the network. Moreover, we have provided a video help desk facility between customers and the service provider cloud to resolve network issues more effectively.

Keywords: MPLS, VPN, NMS, ATM, asynchronous transfer mode

Procedia PDF Downloads 330
271 Electrochemical Impedance Spectroscopy Based Label-Free Detection of TSG101 by Electric Field Lysis of Immobilized Exosomes from Human Serum

Authors: Nusrat Praween, Krishna Thej Pammi Guru, Palash Kumar Basu

Abstract:

Designing non-invasive biosensors for cancer diagnosis is essential for developing an affordable and specific tool to measure cancer-related exosome biomarkers. Exosomes, released by healthy as well as cancer cells, contain valuable information about the biomarkers of various diseases, including cancer. Despite the availability of various isolation techniques, ultracentrifugation is the standard technique employed. After isolation, exosomes are traditionally exposed to detergents to extract their proteins, which can often lead to protein degradation. Furthermore, it is essential to develop a sensing platform that quantifies clinically relevant proteins over a wider range to ensure practicality. In this study, exosomes were immobilized on the Au Screen Printed Electrode (SPE) using EDC/NHS chemistry to facilitate binding. After immobilizing the exosomes on the SPE, we investigated the impact of the electric field by applying various voltages to induce exosome lysis and release their contents. The lysed solution was used for sensing TSG101, a crucial biomarker associated with various cancers, using both faradaic and non-faradaic electrochemical impedance spectroscopy (EIS) methods. The results of non-faradaic and faradaic EIS were comparable and showed good consistency, indicating that non-faradaic sensing can be a reliable alternative. Hence, the non-faradaic sensing technique was used for label-free quantification of the TSG101 biomarker. The results were validated using ELISA. Our electrochemical immunosensor demonstrated a consistent response to TSG101 from 125 pg/mL to 8000 pg/mL, with a detection limit of 0.125 pg/mL at room temperature. Additionally, since non-faradaic sensing is label-free, the final sensor can be easier to use and less costly. 
The proposed immunosensor is capable of detecting the TSG101 protein at low levels in healthy serum with good sensitivity and specificity, making it a promising platform for biomarker detection.

Keywords: biosensor, exosomes isolation on SPE, electric field lysis of exosome, EIS sensing of TSG101

Procedia PDF Downloads 44
270 A Theoretical Modelling and Simulation of a Surface Plasmon Resonance Biosensor for the Detection of Glucose Concentration in Blood and Urine

Authors: Natasha Mandal, Rakesh Singh Moirangthem

Abstract:

The present work reports a theoretical model for developing a plasmonic biosensor to detect glucose concentrations in human blood and urine, since abnormal glucose levels are the major cause of diabetes, which has become a life-threatening disease worldwide. This study is based on surface plasmon resonance (SPR) sensing, which is a well-established, highly sensitive, label-free, rapid optical sensing tool. Here we introduce a sandwich assay of two dielectric spacer layers of MgF2 and BaTiO3, which gives better performance than the commonly used SiO2 and TiO2 dielectric spacers due to their low dielectric loss and higher refractive index. The sensitivity of our proposed sensor was found to be approximately 3242 nm/RIU, with an excellent linear response of 0.958, which is higher than that of the conventional single-layer Au SPR sensor. Further, the sensitivity enhancement is also optimized by coating a few layers of two-dimensional (2D) nanomaterials (e.g., Graphene, h-BN, MXene, MoS2, WS2, etc.) on the sensor chip. Hence, our proposed SPR sensor has the potential to detect glucose concentration in blood and urine with enhanced sensitivity and high affinity and could be utilized as a reliable platform for optical biosensing applications in the field of medical diagnosis.

Keywords: biosensor, surface plasmon resonance, dielectric spacer, 2D nanomaterials

Procedia PDF Downloads 104
269 Multi-Label Approach to Facilitate Test Automation Based on Historical Data

Authors: Warda Khan, Remo Lachmann, Adarsh S. Garakahally

Abstract:

The increasing complexity of software and its applicability in a wide range of industries, e.g., automotive, call for enhanced quality assurance techniques. Test automation is one option to tackle the prevailing challenges by supporting test engineers with fast, parallel, and repetitive test executions. A high degree of test automation allows for a shift from mundane (manual) testing tasks to a more analytical assessment of the software under test. However, a high initial investment of test resources is required to establish test automation, which is, in most cases, a limitation to the time constraints provided for quality assurance of complex software systems. Hence, a computer-aided creation of automated test cases is crucial to increase the benefit of test automation. This paper proposes the application of machine learning for the generation of automated test cases. It is based on supervised learning to analyze test specifications and existing test implementations. The analysis facilitates the identification of patterns between test steps and their implementation with test automation components. For the test case generation, this approach exploits historical data of test automation projects. The identified patterns are the foundation to predict the implementation of unknown test case specifications. Based on this support, a test engineer solely has to review and parameterize the test automation components instead of writing them manually, resulting in a significant time reduction for establishing test automation. Compared to other generation approaches, this ML-based solution can handle different writing styles, authors, application domains, and even languages. Furthermore, test automation tools require expert knowledge by means of programming skills, whereas this approach only requires historical data to generate test cases. The proposed solution is evaluated using various multi-label evaluation criteria (EC) and two small-sized real-world systems. 
The most prominent EC is ‘Subset Accuracy’. The promising results show an accuracy of at least 86% for test cases where a 1:1 relationship (Multi-Class) between test step specification and test automation component exists. For complex multi-label problems, i.e., where one test step can be implemented by several components, the prediction accuracy is still 60%, which is better than the current state-of-the-art results. The prediction quality is expected to increase for larger systems with corresponding historical data. Consequently, this technique reduces the time needed to establish test automation and is independent of the application domain and project. As a work in progress, the next steps are to investigate incremental and active learning as additions to increase the usability of this approach, e.g., in case labelled historical data is scarce.
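Subset accuracy, the evaluation criterion highlighted above, counts a prediction as correct only when the predicted label set matches the true label set exactly. The sketch below uses made-up test steps and component names; nothing here is taken from the paper's data.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label set equals the true set exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if set(t) == set(p))
    return matches / len(y_true)


# Hypothetical test steps, each mapped to one or more automation components.
true_components = [{"click"}, {"click", "verify"}, {"open"}]
predicted_components = [{"click"}, {"click"}, {"open"}]

score = subset_accuracy(true_components, predicted_components)
print(score)
```

The second step is predicted only partially, so it counts as a miss. This strictness is one reason the metric is harder to satisfy for multi-label steps than for the 1:1 multi-class case.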

Keywords: machine learning, multi-class, multi-label, supervised learning, test automation

Procedia PDF Downloads 131
268 Chinese “Wolf Warrior” Diplomacy And Foreign Public Opinion

Authors: Chaohong Pan

Abstract:

Through public diplomacy on social media, governments have attempted to influence foreign public opinion. What is the impact of digital public diplomacy? Public diplomacy research often relies on content analysis to study the strategies employed by communicators but has rarely examined its actual impact on the audience. In addition, we do not know if giving a communicator an explicit label, as Twitter does with “government account”, would change the effects of the messages. Can the government label reduce the persuasiveness of public diplomacy messages by sending a warning signal? Using a 2 × 2 survey experiment, the present paper contributes to the study of public diplomacy by randomly exposing American participants to four types of tweets from Chinese diplomats. The stimulus materials vary in terms of the tweets’ content (“positive-China” vs. “negative-US”) and Twitter government labels (with vs. without the labels). I find that positive tweets about China have a significant positive effect on Americans’ attitudes toward China, whereas negative tweets about the US have little effect on their opinions. Furthermore, positive-China tweets are effective only on China-related issues, which indicates that Chinese diplomats’ tweets have limited effects on shaping a foreign audience’s attitudes toward their own country. Lastly, I find that labels largely have no impact on a diplomatic tweet’s effect. These results contribute to our understanding of the effects of public diplomacy in the digital age.

Keywords: public diplomacy, china, foreign public opinion, twitter

Procedia PDF Downloads 191
267 Screening of Thyroid Stimulating Hormone Using Paper-Based Lateral Flow Device

Authors: Pattarachaya Preechakasedkit, Kota Osada, Koji Suzuki, Daniel Citterio, Orawon Chailapakul

Abstract:

A paper-based lateral flow device for screening thyroid stimulating hormone (TSH) is reported. A sandwich immunoassay was performed using two mouse monoclonal TSH antibodies (anti-hTSH 5403 and 5404) as immobilized and labeled antibodies for capturing TSH samples. Test (anti-hTSH 5403) and control (goat anti-Mouse IgG) lines were fabricated on a nitrocellulose membrane (NCM) using a ballpoint pen printed at a speed of 3 cm/s and a thickness setting of 1. A novel gold nanoparticle europium complex (AuNPs@Eu) was used as a fluorescence label and compared with a conventional AuNPs label. The results obtained with this device can be assessed visually with the naked eye and under UV hand lamps, and quantitative analysis can be performed using the ImageJ program. The limit of detection (LOD) under UV hand lamps (0.1 µIU/mL) provided 50-fold greater sensitivity than AuNPs (5 µIU/mL), which is suitable for both hypothyroidism and hyperthyroidism screening within 30 min. A linear relationship between the red intensity and the logarithmic concentrations of TSH was observed with a good correlation (R²=0.992). Furthermore, the device can be effectively applied for screening TSH in spiked human serum with a recovery range of 96.80-104.45% and an RSD of 2.18-3.63%. Therefore, the developed device is an alternative method for TSH screening which provides many advantages, including low cost, short analysis time, ease of use, disposability, portability, and on-site measurement.

Keywords: thyroid stimulating hormone, paper-based lateral flow, hypothyroidism, hyperthyroidism

Procedia PDF Downloads 361
266 Agri-Food Transparency and Traceability: A Marketing Tool to Satisfy Consumer Awareness Needs

Authors: Angelo Corallo, Maria Elena Latino, Marta Menegoli

Abstract:

The link between people and food plays a central role in the social and economic system, where cultural and multidisciplinary aspects intertwine: food is not only nutrition, but also communication, culture, politics, environment, science, ethics, and fashion. This multi-dimensionality has many implications for the food economy. In recent years, consumers have become more conscious of their food choices, leading to a substantial change in consumption models. This change concerns several aspects: awareness of food system issues, socially and environmentally conscious decision-making, and food choices based on characteristics other than nutritional ones, i.e., the origin of the food, how it is produced, and who produces it. In this frame, ‘consumption choices’ and the ‘interests of the citizen’ become part of one another. The figure of the ‘Citizen Consumer’ is born: a responsible and ethically motivated individual who changes his lifestyle to achieve the goal of sustainable consumption. At the same time, branding, which used to be a guarantee of product quality, is now questioned. To meet these needs, Agri-Food companies are developing specific product lines that follow two main philosophies: ‘Back to basics’ and ‘Less is more’. However, the demand for ethical behavior does not seem to be matched by an adequate market offer, most likely because of a lack of attention to the communication strategy used, which is very often based on market logic and rarely on ethical logic. The label, in its classic concept of ‘clean labeling’, can no longer be the only instrument through which product information is conveyed; its evolution towards a ‘clear label’ concept is necessary to embrace ethical and transparent concepts as the democratization of the Food System progresses. 
The implementation of a voluntary traceability path, relying on the technological models of the Internet of Things or Industry 4.0, would enable the Agri-Food Supply Chain to collect data that, if properly treated, could satisfy the information needs of consumers. A change of approach to Agri-Food traceability is therefore proposed: it is no longer intended as a tool for responding to the legislator, but rather as a promotional tool for presenting the company transparently and thus reaching the market segment of food citizens. The use of mobile technology can also facilitate this information transfer. However, to guarantee maximum efficiency, an appropriate communication model based on ethical communication principles should be used, one that aims to overcome the pipeline communication model and to offer the listener a new way of presenting the food product, based on real data collected through process traceability. The Citizen Consumer is therefore placed at the center of the new model of communication, in which he has the opportunity to choose what to know and how. The new label creates a virtual access point capable of presenting the product from different points of view, following personal interests and offering several content modalities to support different situations and usability needs.

Keywords: agri food traceability, agri-food transparency, clear label, food system, internet of things

Procedia PDF Downloads 156
265 Brand Identity Creation for Thai Halal Brands

Authors: Pibool Waijittragum

Abstract:

The purpose of this paper is to synthesize research results on the brand identities of Thai Halal brands, which relate to the way of life of Thai Muslims. The results will be transferred to packaging and label design for Thai Halal brands. The expected benefit is an alternative marketing strategy for the brand-building process of Halal products in Thailand. The research framework consists of four elements of marketing strategy that are necessary for brand identity creation: Attributes, Benefits, Values and Personality. The research applied both qualitative and quantitative methods: 19 marketing experts with dynamic roles in Thai consumer products were interviewed, and a field survey was conducted of 122 Thai Muslims selected from 175 Muslim communities in Bangkok. Data are analyzed according to 5 categories of Thai Halal products: 1) Meat 2) Vegetables and Fruits 3) Instant foods and Garnishing ingredients 4) Beverages, Desserts and Snacks 5) Hygienic daily products. The results suggest suitable approaches for the brand identities of Thai Halal brands: 1) Benefit approach, i.e., the characteristics of the product together with its benefits. The brand identity, when transferred to the packaging design, should be clear and display a fresh product. 2) Value approach, i.e., the value of products that affects consumers’ perception. The brand identity, when transferred to the packaging design, should look simple and use a trustworthy image. 3) Personality approach, i.e., the reflection of consumers’ thoughts. The brand identity, when transferred to the packaging design, should look sincere, enjoyable, merry and flamboyant and use a humorous image.

Keywords: marketing strategies, brand identity, packaging and label design, Thai Halal products

Procedia PDF Downloads 436
264 Saudi Twitter Corpus for Sentiment Analysis

Authors: Adel Assiri, Ahmed Emam, Hmood Al-Dossari

Abstract:

Sentiment analysis (SA) has received growing attention in Arabic language research. However, few studies have applied SA directly to Arabic because of the lack of a publicly available dataset for the language. This paper partially bridges this gap by focusing on one Arabic dialect, the Saudi dialect. It presents an annotated data set of 4700 for Saudi dialect sentiment analysis with (K = 0.807). Our next step is to extend this corpus and create a large-scale lexicon for the Saudi dialect from it.
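The abstract reports an agreement value K without naming the statistic. Assuming it is Cohen's kappa between two annotators (an assumption, since a multi-annotator statistic such as Fleiss' kappa is also possible), agreement can be computed as in the following Python sketch. The labels below are invented for illustration and are not taken from the corpus.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labelled at random with their own label rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented sentiment labels for four items.
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on three of four items (0.75), chance agreement is 0.5, and kappa is therefore 0.5.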

Keywords: Arabic, sentiment analysis, Twitter, annotation

Procedia PDF Downloads 628
263 A Method for Clinical Concept Extraction from Medical Text

Authors: Moshe Wasserblat, Jonathan Mamou, Oren Pereg

Abstract:

Natural Language Processing (NLP) has made a major leap in the last few years in its practical integration into medical solutions, for example, in extracting clinical concepts such as medical conditions, medications, treatments, and symptoms from medical texts. However, training and deploying those models in real environments still demands a large amount of annotated data and NLP/Machine Learning (ML) expertise, which makes this process costly and time-consuming. We present a practical and efficient method for clinical concept extraction that requires neither costly labeled data nor ML expertise. The method includes three steps. Step 1: the user injects a large in-domain text corpus (e.g., PubMed). Then, the system builds a contextual model containing vector representations of concepts in the corpus, in an unsupervised manner (e.g., Phrase2Vec). Step 2: the user provides a seed set of terms representing a specific medical concept (e.g., for the concept of symptoms, the user may provide: ‘dry mouth,’ ‘itchy skin,’ and ‘blurred vision’). Then, the system matches the seed set against the contextual model and extracts the most semantically similar terms (e.g., additional symptoms). The result is a complete set of terms related to the medical concept. Step 3: in production, medical concepts need to be extracted from unseen medical text. The system extracts key-phrases from the new text, then matches them against the complete set of terms from step 2, and the most semantically similar are annotated with the same medical concept category. As an example, the seed symptom concepts would result in the following annotation: “The patient complaints on fatigue [symptom], dry skin [symptom], and Weight loss [symptom], which can be an early sign for Diabetes.” Our evaluations show promising results for extracting concepts from medical corpora. 
The method allows medical analysts to easily and efficiently build taxonomies (in step 2) representing their domain-specific concepts, and automatically annotate a large number of texts (in step 3) for classification/summarization of medical reports.
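Step 2 of the method, expanding a seed set by semantic similarity, can be illustrated with a small Python sketch. It assumes that terms are ranked by cosine similarity to the centroid of the seed vectors; the abstract does not state how matching is done, so this is only one plausible reading. The three-dimensional vectors are toy values, not output of Phrase2Vec or any trained model.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_seed(embeddings: Dict[str, Sequence[float]],
                seed: List[str], top_k: int) -> List[str]:
    """Return the top_k non-seed terms closest to the centroid of the seed terms."""
    dim = len(embeddings[seed[0]])
    centroid = [sum(embeddings[t][i] for t in seed) / len(seed) for i in range(dim)]
    candidates = [t for t in embeddings if t not in seed]
    ranked = sorted(candidates,
                    key=lambda t: cosine(embeddings[t], centroid),
                    reverse=True)
    return ranked[:top_k]


# Toy embeddings (hypothetical values chosen so that symptoms cluster together).
toy_embeddings = {
    "dry mouth": [0.9, 0.1, 0.0],
    "itchy skin": [0.8, 0.2, 0.1],
    "blurred vision": [0.85, 0.15, 0.05],
    "fatigue": [0.8, 0.1, 0.1],
    "aspirin": [0.1, 0.9, 0.0],
    "surgery": [0.0, 0.2, 0.9],
}

expanded = expand_seed(toy_embeddings,
                       ["dry mouth", "itchy skin", "blurred vision"],
                       top_k=1)
```

With these toy vectors, the term nearest to the seed centroid is "fatigue", mirroring how additional symptoms would be pulled into the concept's term set.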

Keywords: clinical concepts, concept expansion, medical records annotation, medical records summarization

Procedia PDF Downloads 133
262 The Importance of Country-of-Origin Information and Perceived Product Quality in Uzbekistan

Authors: Begzod Nishanov, Farhod Karimov

Abstract:

Globalization and the internet have completely changed the way in which businesses operate and have equipped customers with endless potential. Today, consumers’ product choice is affected not only by the branding, price and quality of the product, but also by country-of-origin information. Specifically, the ‘Made In’ label is considered one of the driving factors that directly affect consumers’ preferences. Generally, products manufactured in less developed countries are considered to be of lower quality and riskier than products made in developed countries. It is worth noting that this phenomenon is mainly applicable to western developed countries. However, there is a lack of empirical research on the influence of the country-of-origin phenomenon in emerging economies such as Uzbekistan. Today, the Uzbek market is dominated by a growing number of foreign-made products. Uzbek manufacturers face intense competition not only from local producers but also from suppliers of foreign goods. Consequently, consumers are given a wider choice of products than ever before. It is therefore important to determine the importance of country-of-origin information in order to understand Uzbek consumers’ preferences. The methodology of the research is based on that of previous papers. A total of 527 online questionnaires were completed. Data analysis was conducted using factor analysis and analysis of variance (ANOVA). The findings support the view that Uzbek consumers attach great importance to the country-of-origin information of products. Specifically, Uzbek people judge product quality by the ‘Made in...’ label, especially when buying high-involvement goods such as a car or refrigerator. 
A further finding is that products manufactured in developed countries, including Germany, Japan and the USA, are perceived to be of high quality, while products manufactured in less developed countries are considered to be of lower quality. Marketers can use this information for segmentation purposes. For example, products manufactured in less developed countries can be targeted at low-to-middle income families, while goods manufactured in developed countries can be targeted at higher income families. In conclusion, the perceived quality of products made in Uzbekistan has increased slightly over the past 18 years. This implies that products under the ‘Made in Uzbekistan’ label are becoming increasingly available to consumers in foreign markets, especially in Commonwealth of Independent States (CIS) countries. Therefore, further research to explore the phenomenon of country-of-origin information and perceived product quality in emerging markets is of paramount importance.

Keywords: country-of-origin, consumer behavior, product evaluation, perceived quality

Procedia PDF Downloads 258
261 How Consumers Perceive Health and Nutritional Information and How It Affects Their Purchasing Behavior: Comparative Study between Colombia and the Dominican Republic

Authors: Daniel Herrera Gonzalez, Maria Luisa Montas

Abstract:

Several factors affect consumer decision-making regarding the use of front-of-package labels to find benefits for human well-being. Currently, several labels help influence or change the purchase decision for food products. These labels communicate the impact that food has on human health; therefore, consumers are more critical and intelligent when buying and consuming food products. The research explores the association between front-of-pack labeling and food choice; the association between label content and purchasing decisions is complex and influenced by different factors, including the packaging itself. The main objective of this study was to examine the perception of health labels and nutritional declarations and their influence on buying decisions in the non-alcoholic beverages sector. This comparative study of two developing countries shows how consumers take nutritional labels into account when deciding to buy certain foods. The research applied a quantitative methodology with a correlational scope, in order to analyze the degree of association between variables. Confirmatory factor analysis (CFA) and structural equation modeling (SEM), a powerful multivariate technique, were used as statistical techniques to find the relationships between observable and unobservable variables. The main finding of this research was the identification of three large groups, distinguished by their perception of nutritional and wellness labels and the effects of those labels on them. The first group is characterized by a high level of interest in the issue of requiring a nutritional information label on products and would agree that all packaged products should carry it, given its importance in preventing illness in the consumer. 
Likewise, they almost always care about the brand, the size, the list of ingredients, and the nutritional information of the food, as well as the effect of these on health. The second group stands out for showing some interest in the importance of the label on products as a factor in the purchase decision; its members almost always take into account characteristics such as size, price, and components when deciding what to consume, and are almost never interested in the effect of these products on their health or nutrition. The third group differs from the others in being more neutral on the issue of nutritional information labels and less interested in the purchase decision, the characteristics of the product, and their influence on health and nutrition. This new knowledge is essential for companies that manufacture and market food products, because it gives them information with which to adapt to or anticipate the new laws of developing countries as well as the new needs of health-conscious consumers when they buy food products.

Keywords: healthy labels, consumer behavior, nutritional information, healthy products

Procedia PDF Downloads 107
260 Transformer-Driven Multi-Category Classification for an Automated Academic Strand Recommendation Framework

Authors: Ma Cecilia Siva

Abstract:

This study introduces a Bidirectional Encoder Representations from Transformers (BERT)-based machine learning model aimed at improving educational counseling by automating the process of recommending academic strands for students. The framework is designed to streamline and enhance the strand selection process by analyzing students' profiles and suggesting suitable academic paths based on their interests, strengths, and goals. Data was gathered from a sample of 200 grade 10 students, which included personal essays and survey responses relevant to strand alignment. After thorough preprocessing, the text data was tokenized, label-encoded, and input into a fine-tuned BERT model set up for multi-label classification. The model was optimized for balanced accuracy and computational efficiency, featuring a multi-category classification layer with sigmoid activation for independent strand predictions. Performance metrics showed an F1 score of 88%, indicating a well-balanced model with precision at 80% and recall at 100%, demonstrating its effectiveness in providing reliable recommendations while reducing irrelevant strand suggestions. To facilitate practical use, the final deployment phase created a recommendation framework that processes new student data through the trained model and generates personalized academic strand suggestions. This automated recommendation system presents a scalable solution for academic guidance, potentially enhancing student satisfaction and alignment with educational objectives. The study's findings indicate that expanding the data set, integrating additional features, and refining the model iteratively could improve the framework's accuracy and broaden its applicability in various educational contexts.
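Two ideas in this abstract lend themselves to a short illustration: a sigmoid output scores each strand independently, so several strands can be recommended at once, and the F1 score is the harmonic mean of precision and recall. The Python sketch below is illustrative only. The strand names, logits, and the 0.5 threshold are hypothetical, and no BERT model is involved.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Logistic function mapping a raw score (logit) to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_strands(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Select every strand whose independent sigmoid score reaches the threshold."""
    return [name for name, z in logits.items() if sigmoid(z) >= threshold]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical raw scores from a classification layer for one student.
example_logits = {"strand_a": 2.0, "strand_b": -1.5, "strand_c": 0.3}
selected = predict_strands(example_logits)

# Precision and recall as reported in the abstract.
f1 = f1_score(0.80, 1.00)
```

With precision 0.80 and recall 1.00, the harmonic mean is about 0.889, in line with the F1 score of roughly 88% reported in the abstract.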

Keywords: tokenized, sigmoid activation, transformer, multi category classification

Procedia PDF Downloads 5
259 Multiple Version of Roman Domination in Graphs

Authors: J. C. Valenzuela-Tripodoro, P. Álvarez-Ruíz, M. A. Mateos-Camacho, M. Cera

Abstract:

The concept of Roman domination in graphs was introduced in 2004. It was initially inspired by and related to the defensive strategy of the Roman Empire. An undefended place is a city in which no legions are established, whereas a strong place is a city in which two legions are deployed. This situation may be modeled by labeling the vertices of a finite simple graph with labels {0, 1, 2}, satisfying the condition that any 0-vertex must be adjacent to at least one 2-vertex. Roman domination in graphs is a variant of classic domination. The main aim is to obtain such a labeling of the vertices of the graph with minimum cost, that is to say, with minimum weight (the sum of all vertex labels). Formally, a function $f\colon V(G) \to \{0, 1, 2\}$ is a Roman dominating function (RDF) in the graph $G = (V, E)$ if $f(u) = 0$ implies that $f(v) = 2$ for at least one vertex $v$ adjacent to $u$. The weight of an RDF is the positive integer $w(f) = \sum_{v \in V} f(v)$. The Roman domination number, $\gamma_R(G)$, is the minimum weight among all Roman dominating functions. The set of vertices with a positive label under an RDF $f$ is a dominating set in the graph, and hence $\gamma(G) \le \gamma_R(G)$. In this work, we begin the study of a generalization of RDFs in which any undefended place should be defended from a sudden attack by at least $k$ legions. These legions can be deployed in the city or in any of its neighbours. A function $f\colon V \to \{0, 1, \ldots, k+1\}$ such that $f(N[u]) \ge k + |AN(u)|$ for every vertex $u$ with $f(u) < k$, where $AN(u)$ denotes the set of active neighbours (i.e., those with a positive label) of vertex $u$, is called a [k]-multiple Roman dominating function, denoted [k]-MRDF. The minimum weight of a [k]-MRDF in the graph $G$ is the [k]-multiple Roman domination number ([k]-MRDN) of $G$, denoted by $\gamma_{[kR]}(G)$. First, we prove that the [k]-multiple Roman domination decision problem is NP-complete even when restricted to bipartite and chordal graphs, a problem that had been resolved for other variants and that we sought to generalize. 
We are aware of the difficulty of calculating the exact value of the [k]-MRD number, even for particular families of graphs. Here, we present several upper and lower bounds for the [k]-MRD number that allow us to estimate it as precisely as possible. Finally, some graphs with the exact value of this parameter are characterized.
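The classical definition recalled in this abstract can be checked mechanically on small graphs. The Python sketch below covers only the ordinary Roman dominating function with labels {0, 1, 2}, not the [k]-multiple generalization, and it finds the Roman domination number by exhaustive search, which is feasible only for very small graphs. The adjacency-list representation and function names are choices made for this illustration.

```python
from itertools import product
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency lists of a finite simple graph


def is_roman_dominating(graph: Graph, f: Dict[int, int]) -> bool:
    """True if every vertex labelled 0 has at least one neighbour labelled 2."""
    return all(f[u] != 0 or any(f[v] == 2 for v in graph[u]) for u in graph)


def roman_domination_number(graph: Graph) -> int:
    """Minimum weight over all Roman dominating functions (tries all 3^n labelings)."""
    vertices = sorted(graph)
    # The all-ones labeling is always valid, so its weight is a safe starting bound.
    best = len(vertices)
    for labels in product((0, 1, 2), repeat=len(vertices)):
        f = dict(zip(vertices, labels))
        if is_roman_dominating(graph, f):
            best = min(best, sum(labels))
    return best


# Path on four vertices: 0 - 1 - 2 - 3
path_p4: Graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
# Star with centre 0 and three leaves
star_k13: Graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

gamma_r_path = roman_domination_number(path_p4)
gamma_r_star = roman_domination_number(star_k13)
```

For the path, labeling vertex 1 with 2 and vertex 3 with 1 gives weight 3, and no labeling of weight 2 works; for the star, placing 2 on the centre gives weight 2. The exponential search is consistent with the hardness result stated in the abstract, which is why bounds are of interest for larger graphs.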

Keywords: multiple roman domination function, decision problem np-complete, bounds, exact values

Procedia PDF Downloads 106
258 Variables, Annotation, and Metadata Schemas for Early Modern Greek

Authors: Eleni Karantzola, Athanasios Karasimos, Vasiliki Makri, Ioanna Skouvara

Abstract:

Historical linguistics unveils the historical depth of languages and traces variation and change by analyzing linguistic variables over time. This field of linguistics usually deals with a closed data set that can only be expanded by the (re)discovery of previously unknown manuscripts or editions. In some cases, it is possible to use (almost) the entire closed corpus of a language for research, as is the case with the Thesaurus Linguae Graecae digital library for Ancient Greek, which contains most of the extant ancient Greek literature. However, concerning ‘dynamic’ periods when the production and circulation of texts in printed as well as manuscript form have not been fully mapped, representative samples and corpora of texts are needed. Such material and tools are utterly lacking for Early Modern Greek (16th-18th c.). In this study, the principles of the creation of EMoGReC, a pilot representative corpus of Early Modern Greek (16th-18th c.) are presented. Its design follows the fundamental principles of historical corpora. The selection of texts aims to create a representative and balanced corpus that gives insight into diachronic, diatopic and diaphasic variation. The pilot sample includes data derived from fully machine-readable vernacular texts, which belong to 4-5 different textual genres and come from different geographical areas. We develop a hierarchical linguistic annotation scheme, further customized to fit the characteristics of our text corpus. Regarding variables and their variants, we use as a point of departure the bundle of twenty-four features (or categories of features) for prose demotic texts of the 16th c. Tags are introduced bearing the variants [+old/archaic] or [+novel/vernacular]. On the other hand, further phenomena that are underway (cf. The Cambridge Grammar of Medieval and Early Modern Greek) are selected for tagging. 
The annotated texts are enriched with metalinguistic and sociolinguistic metadata to provide a testbed for the development of the first comprehensive set of tools for the Greek language of that period. Based on a relational management system with interconnection of data, annotations, and their metadata, the EMoGReC database aspires to join a state-of-the-art technological ecosystem for the research of observed language variation and change using advanced computational approaches.

Keywords: early modern Greek, variation and change, representative corpus, diachronic variables

Procedia PDF Downloads 65
257 The Relational Approach under the Angle of the CSR

Authors: Fatima El Kandoussi, Hind Benouakrim, Afafe El Amrani El Hassani

Abstract:

CSR in the relational approach has become a leading concern in both academic and managerial circles. This study presents the issues raised by the CSR dimension in the field of relationship marketing. This exploratory research was conducted with two groups of Moroccan enterprises holding the CSR/CGEM label. It offers a better understanding of the approaches the interviewed companies take to CSR, helps to explain the reasons that lead them to adopt a CSR process, and explains how these enterprises maintain their relationships with their most important customers in a CSR context.

Keywords: relationship marketing, CSR, stakeholders, business

Procedia PDF Downloads 446
256 VIAN-DH: Computational Multimodal Conversation Analysis Software and Infrastructure

Authors: Teodora Vukovic, Christoph Hottiger, Noah Bubenhofer

Abstract:

The development of VIAN-DH aims at bridging two linguistic approaches: conversation analysis/interactional linguistics (IL), so far a predominantly qualitative field, and computational/corpus linguistics with its quantitative and automated methods. Contemporary IL investigates the systematic organization of conversations and interactions composed of speech, gaze, gestures, and body positioning, among others. This highly integrated multimodal behaviour is analysed on the basis of video data, with the aim of uncovering so-called “multimodal gestalts”: patterns of linguistic and embodied conduct that reoccur in specific sequential positions and are employed for specific purposes. Multimodal analyses (and other disciplines using videos) have so far depended on time- and resource-intensive manual transcription of each component from video materials. Automating these tasks requires advanced programming skills, which are often outside the scope of IL. Moreover, the use of different tools makes the integration and analysis of different formats challenging. Consequently, IL research often deals with relatively small samples of annotated data, which are suitable for qualitative analysis but not sufficient for making generalized empirical claims derived quantitatively. VIAN-DH aims to create a workspace where the many annotation layers required for the multimodal analysis of videos can be created, processed, and correlated in one platform. VIAN-DH will provide a graphical interface that operates state-of-the-art tools for automating parts of the data processing. The integration of tools that already exist in computational linguistics and computer vision facilitates data processing for researchers lacking programming skills, speeds up the overall research process, and enables the processing of large amounts of data. 
The main features to be introduced are automatic speech recognition for the transcription of language, automatic image recognition for the extraction of gestures and other visual cues, and grammatical annotation for adding morphological and syntactic information to the verbal content. In the ongoing instance of VIAN-DH, we focus on gesture extraction (pointing gestures, in particular), making use of existing models created for sign language and adapting them for this specific purpose. In order to view and search the data, VIAN-DH will provide a unified format and enable the import of the main existing formats of annotated video data and the export to other formats used in the field, while integrating different data source formats in such a way that they can be combined in research. VIAN-DH will adapt querying methods from corpus linguistics to enable parallel search of many annotation levels, combining token-level and chronological search for various types of data. VIAN-DH strives to bring crucial and potentially revolutionary innovation to the field of IL, which can also extend to other fields using video materials. It will allow large amounts of data to be processed automatically and quantitative analyses to be implemented and combined with the qualitative approach. It will facilitate the investigation of correlations between linguistic patterns (lexical or grammatical) and conversational aspects (turn-taking or gestures). Users will be able to automatically transcribe and annotate visual, spoken and grammatical information from videos, to correlate those different levels, and to perform queries and analyses.

Keywords: multimodal analysis, corpus linguistics, computational linguistics, image recognition, speech recognition

Procedia PDF Downloads 107
255 A Clinician’s Perspective on Electroencephalography Annotation and Analysis for Driver Drowsiness Estimation

Authors: Ruxandra Aursulesei, David O’Callaghan, Cian Ryan, Diarmaid O’Cualain, Viktor Varkarakis, Alina Sultana, Joseph Lemley

Abstract:

Human errors caused by drowsiness are among the leading causes of road accidents. Neurobiological research gives information about the electrical signals emitted by neurons firing within the brain. Electrical signal frequencies can be determined by attaching bio-sensors to the surface of the head. By observing the electrical impulses and the rhythmic interaction of neurons with each other, we can predict the mental state of a person. In this paper, we aim to better understand intersubject and intrasubject variability in the electrophysiological patterns that occur at the onset of drowsiness and their evolution as vigilance decreases. The purpose is to lay the foundations for an algorithm that detects the onset of drowsiness before the physical signs become apparent.

Keywords: electroencephalography, drowsiness, ADAS, annotations, clinician

Procedia PDF Downloads 114
254 A Comprehensive Methodology for Voice Segmentation of Large Sets of Speech Files Recorded in Naturalistic Environments

Authors: Ana Londral, Burcu Demiray, Marcus Cheetham

Abstract:

Speech recording is a methodology used in many different studies related to cognitive and behaviour research. Modern advances in digital equipment have made it possible to continuously record hours of speech in naturalistic environments and to build rich sets of sound files. Speech analysis can then extract multiple features from these files for different scopes of research in Language and Communication. However, tools for analysing a large set of sound files and automatically extracting relevant features from them are often inaccessible to researchers who are not familiar with programming languages. Manual analysis is a common alternative, with a high time and efficiency cost. In the analysis of long sound files, the first step is voice segmentation, i.e., detecting and labelling segments containing speech. We present a comprehensive methodology aiming to support researchers in voice segmentation, as the first step in the data analysis of a large set of sound files. Praat, an open-source software package, is suggested as a tool to run a voice detection algorithm, label segments and files, and extract other quantitative features on a structure of folders containing a large number of sound files. We present the validation of our methodology with a set of 5000 sound files that were collected in the daily life of a group of voluntary participants aged over 65. A smartphone device was used to collect sound using the Electronically Activated Recorder (EAR): an app programmed to record 30-second sound samples that were randomly distributed throughout the day. Results demonstrated that automatic segmentation and labelling of files containing speech segments was 74% faster than a manual analysis performed by two independent coders. Furthermore, the methodology presented allows manual adjustment of voiced segments with visualisation of the sound signal, as well as the automatic extraction of quantitative information on speech. 
In conclusion, we propose a comprehensive methodology for voice segmentation, to be used by researchers that have to work with large sets of sound files and are not familiar with programming tools.
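To make the notion of voice segmentation concrete, the Python sketch below labels spans of a signal as voiced when the mean energy of a frame exceeds a threshold. This is a deliberately simple stand-in and is not the detection algorithm used in Praat or in the study. The frame size, threshold, and synthetic signal are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_size: int) -> List[float]:
    """Mean squared amplitude of each consecutive, non-overlapping full frame."""
    return [
        sum(s * s for s in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def voiced_segments(samples: Sequence[float], frame_size: int,
                    threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of frames above the threshold."""
    energies = frame_energies(samples, frame_size)
    segments: List[Tuple[int, int]] = []
    start = None
    for idx, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = idx * frame_size          # a voiced run begins
        elif energy <= threshold and start is not None:
            segments.append((start, idx * frame_size))  # the run ends
            start = None
    if start is not None:                      # run extends to the last frame
        segments.append((start, len(energies) * frame_size))
    return segments


# Synthetic signal: silence, a burst of sound, then silence again.
signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] * 2 + [0.0] * 4
segments = voiced_segments(signal, frame_size=4, threshold=0.01)
```

On the synthetic signal, the single detected segment spans samples 4 to 12. In a real pipeline, such spans would be written out as labelled intervals that a coder can then adjust by hand, as the abstract describes.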

Keywords: automatic speech analysis, behavior analysis, naturalistic environments, voice segmentation

Procedia PDF Downloads 280