Search results for: short text classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 6213

Search results for: short text classification

5643 A Ratio-Weighted Decision Tree Algorithm for Imbalance Dataset Classification

Authors: Doyin Afolabi, Phillip Adewole, Oladipupo Sennaike

Abstract:

Most well-known classifiers, including the decision tree algorithm, can make predictions on balanced datasets efficiently. However, the decision tree algorithm tends to be biased towards imbalanced datasets because of the skewness of the distribution of such datasets. To overcome this problem, this study proposes a weighted decision tree algorithm that aims to remove the bias toward the majority class and prevents the reduction of majority observations in imbalance datasets classification. The proposed weighted decision tree algorithm was tested on three imbalanced datasets- cancer dataset, german credit dataset, and banknote dataset. The specificity, sensitivity, and accuracy metrics were used to evaluate the performance of the proposed decision tree algorithm on the datasets. The evaluation results show that for some of the weights of our proposed decision tree, the specificity, sensitivity, and accuracy metrics gave better results compared to that of the ID3 decision tree and decision tree induced with minority entropy for all three datasets.

Keywords: data mining, decision tree, classification, imbalance dataset

Procedia PDF Downloads 139
5642 Managing Cognitive Load in Accounting: An Analysis of Three Instructional Designs in Financial Accounting

Authors: Seedwell Sithole

Abstract:

One of the persistent problems in accounting education is how to effectively support students’ learning. A promising technique to this issue is to investigate the extent that learning is determined by the design of instructional material. This study examines the academic performance of students using three instructional designs in financial accounting. Student’s performance scores and reported mental effort ratings were used to determine the instructional effectiveness. The findings of this study show that accounting students prefer graph and text designs that are integrated. The results suggest that spatially separated graph and text presentations in accounting should be reorganized to align with the requirements of human cognitive architecture.

Keywords: accounting, cognitive load, education, instructional preferences, students

Procedia PDF Downloads 153
5641 Land Cover Remote Sensing Classification Advanced Neural Networks Supervised Learning

Authors: Eiman Kattan

Abstract:

This study aims to evaluate the impact of classifying labelled remote sensing images conventional neural network (CNN) architecture, i.e., AlexNet on different land cover scenarios based on two remotely sensed datasets from different point of views such as the computational time and performance. Thus, a set of experiments were conducted to specify the effectiveness of the selected convolutional neural network using two implementing approaches, named fully trained and fine-tuned. For validation purposes, two remote sensing datasets, AID, and RSSCN7 which are publicly available and have different land covers features were used in the experiments. These datasets have a wide diversity of input data, number of classes, amount of labelled data, and texture patterns. A specifically designed interactive deep learning GPU training platform for image classification (Nvidia Digit) was employed in the experiments. It has shown efficiency in training, validation, and testing. As a result, the fully trained approach has achieved a trivial result for both of the two data sets, AID and RSSCN7 by 73.346% and 71.857% within 24 min, 1 sec and 8 min, 3 sec respectively. However, dramatic improvement of the classification performance using the fine-tuning approach has been recorded by 92.5% and 91% respectively within 24min, 44 secs and 8 min 41 sec respectively. The represented conclusion opens the opportunities for a better classification performance in various applications such as agriculture and crops remote sensing.

Keywords: conventional neural network, remote sensing, land cover, land use

Procedia PDF Downloads 372
5640 Faster, Lighter, More Accurate: A Deep Learning Ensemble for Content Moderation

Authors: Arian Hosseini, Mahmudul Hasan

Abstract:

To address the increasing need for efficient and accurate content moderation, we propose an efficient and lightweight deep classification ensemble structure. Our approach is based on a combination of simple visual features, designed for high-accuracy classification of violent content with low false positives. Our ensemble architecture utilizes a set of lightweight models with narrowed-down color features, and we apply it to both images and videos. We evaluated our approach using a large dataset of explosion and blast contents and compared its performance to popular deep learning models such as ResNet-50. Our evaluation results demonstrate significant improvements in prediction accuracy, while benefiting from 7.64x faster inference and lower computation cost. While our approach is tailored to explosion detection, it can be applied to other similar content moderation and violence detection use cases as well. Based on our experiments, we propose a "think small, think many" philosophy in classification scenarios. We argue that transforming a single, large, monolithic deep model into a verification-based step model ensemble of multiple small, simple, and lightweight models with narrowed-down visual features can possibly lead to predictions with higher accuracy.

Keywords: deep classification, content moderation, ensemble learning, explosion detection, video processing

Procedia PDF Downloads 55
5639 Improve Divers Tracking and Classification in Sonar Images Using Robust Diver Wake Detection Algorithm

Authors: Mohammad Tarek Al Muallim, Ozhan Duzenli, Ceyhun Ilguy

Abstract:

Harbor protection systems are so important. The need for automatic protection systems has increased over the last years. Diver detection active sonar has great significance. It used to detect underwater threats such as divers and autonomous underwater vehicle. To automatically detect such threats the sonar image is processed by algorithms. These algorithms used to detect, track and classify of underwater objects. In this work, divers tracking and classification algorithm is improved be proposing a robust wake detection method. To detect objects the sonar images is normalized then segmented based on fixed threshold. Next, the centroids of the segments are found and clustered based on distance metric. Then to track the objects linear Kalman filter is applied. To reduce effect of noise and creation of false tracks, the Kalman tracker is fine tuned. The tuning is done based on our active sonar specifications. After the tracks are initialed and updated they are subjected to a filtering stage to eliminate the noisy and unstable tracks. Also to eliminate object with a speed out of the diver speed range such as buoys and fast boats. Afterwards the result tracks are subjected to a classification stage to deiced the type of the object been tracked. Here the classification stage is to deice wither if the tracked object is an open circuit diver or a close circuit diver. At the classification stage, a small area around the object is extracted and a novel wake detection method is applied. The morphological features of the object with his wake is extracted. We used support vector machine to find the best classifier. The sonar training images and the test images are collected by ARMELSAN Defense Technologies Company using the portable diver detection sonar ARAS-2023. After applying the algorithm to the test sonar data, we get fine and stable tracks of the divers. The total classification accuracy achieved with the diver type is 97%.

Keywords: harbor protection, diver detection, active sonar, wake detection, diver classification

Procedia PDF Downloads 238
5638 Credit Risk Assessment Using Rule Based Classifiers: A Comparative Study

Authors: Salima Smiti, Ines Gasmi, Makram Soui

Abstract:

Credit risk is the most important issue for financial institutions. Its assessment becomes an important task used to predict defaulter customers and classify customers as good or bad payers. To this objective, numerous techniques have been applied for credit risk assessment. However, to our knowledge, several evaluation techniques are black-box models such as neural networks, SVM, etc. They generate applicants’ classes without any explanation. In this paper, we propose to assess credit risk using rules classification method. Our output is a set of rules which describe and explain the decision. To this end, we will compare seven classification algorithms (JRip, Decision Table, OneR, ZeroR, Fuzzy Rule, PART and Genetic programming (GP)) where the goal is to find the best rules satisfying many criteria: accuracy, sensitivity, and specificity. The obtained results confirm the efficiency of the GP algorithm for German and Australian datasets compared to other rule-based techniques to predict the credit risk.

Keywords: credit risk assessment, classification algorithms, data mining, rule extraction

Procedia PDF Downloads 183
5637 Emotions in Health Tweets: Analysis of American Government Official Accounts

Authors: García López

Abstract:

The Government Departments of Health have the task of informing and educating citizens about public health issues. For this, they use channels like Twitter, key in the search for health information and the propagation of content. The tweets, important in the virality of the content, may contain emotions that influence the contagion and exchange of knowledge. The goal of this study is to perform an analysis of the emotional projection of health information shared on Twitter by official American accounts: the disease control account CDCgov, National Institutes of Health, NIH, the government agency HHSGov, and the professional organization PublicHealth. For this, we used Tone Analyzer, an International Business Machines Corporation (IBM) tool specialized in emotion detection in text, corresponding to the categorical model of emotion representation. For 15 days, all tweets from these accounts were analyzed with the emotional analysis tool in text. The results showed that their tweets contain an important emotional load, a determining factor in the success of their communications. This exposes that official accounts also use subjective language and contain emotions. The predominance of emotion joy over sadness and the strong presence of emotions in their tweets stimulate the virality of content, a key in the work of informing that government health departments have.

Keywords: emotions in tweets, emotion detection in the text, health information on Twitter, American health official accounts, emotions on Twitter, emotions and content

Procedia PDF Downloads 145
5636 Computational Linguistic Implications of Gender Bias: Machines Reflect Misogyny in Society

Authors: Irene Yi

Abstract:

Machine learning, natural language processing, and neural network models of language are becoming more and more prevalent in the fields of technology and linguistics today. Training data for machines are at best, large corpora of human literature and at worst, a reflection of the ugliness in society. Computational linguistics is a growing field dealing with such issues of data collection for technological development. Machines have been trained on millions of human books, only to find that in the course of human history, derogatory and sexist adjectives are used significantly more frequently when describing females in history and literature than when describing males. This is extremely problematic, both as training data, and as the outcome of natural language processing. As machines start to handle more responsibilities, it is crucial to ensure that they do not take with them historical sexist and misogynistic notions. This paper gathers data and algorithms from neural network models of language having to deal with syntax, semantics, sociolinguistics, and text classification. Computational analysis on such linguistic data is used to find patterns of misogyny. Results are significant in showing the existing intentional and unintentional misogynistic notions used to train machines, as well as in developing better technologies that take into account the semantics and syntax of text to be more mindful and reflect gender equality. Further, this paper deals with the idea of non-binary gender pronouns and how machines can process these pronouns correctly, given its semantic and syntactic context. This paper also delves into the implications of gendered grammar and its effect, cross-linguistically, on natural language processing. Languages such as French or Spanish not only have rigid gendered grammar rules, but also historically patriarchal societies. The progression of society comes hand in hand with not only its language, but how machines process those natural languages. These ideas are all extremely vital to the development of natural language models in technology, and they must be taken into account immediately.

Keywords: computational analysis, gendered grammar, misogynistic language, neural networks

Procedia PDF Downloads 122
5635 Robust Pattern Recognition via Correntropy Generalized Orthogonal Matching Pursuit

Authors: Yulong Wang, Yuan Yan Tang, Cuiming Zou, Lina Yang

Abstract:

This paper presents a novel sparse representation method for robust pattern classification. Generalized orthogonal matching pursuit (GOMP) is a recently proposed efficient sparse representation technique. However, GOMP adopts the mean square error (MSE) criterion and assign the same weights to all measurements, including both severely and slightly corrupted ones. To reduce the limitation, we propose an information-theoretic GOMP (ITGOMP) method by exploiting the correntropy induced metric. The results show that ITGOMP can adaptively assign small weights on severely contaminated measurements and large weights on clean ones, respectively. An ITGOMP based classifier is further developed for robust pattern classification. The experiments on public real datasets demonstrate the efficacy of the proposed approach.

Keywords: correntropy induced metric, matching pursuit, pattern classification, sparse representation

Procedia PDF Downloads 357
5634 Continual Learning Using Data Generation for Hyperspectral Remote Sensing Scene Classification

Authors: Samiah Alammari, Nassim Ammour

Abstract:

When providing a massive number of tasks successively to a deep learning process, a good performance of the model requires preserving the previous tasks data to retrain the model for each upcoming classification. Otherwise, the model performs poorly due to the catastrophic forgetting phenomenon. To overcome this shortcoming, we developed a successful continual learning deep model for remote sensing hyperspectral image regions classification. The proposed neural network architecture encapsulates two trainable subnetworks. The first module adapts its weights by minimizing the discrimination error between the land-cover classes during the new task learning, and the second module tries to learn how to replicate the data of the previous tasks by discovering the latent data structure of the new task dataset. We conduct experiments on HSI dataset Indian Pines. The results confirm the capability of the proposed method.

Keywords: continual learning, data reconstruction, remote sensing, hyperspectral image segmentation

Procedia PDF Downloads 268
5633 The Prevalence of Cardiovascular Diseases in World-Class Triathletes: An Internet-Based Study from 2006 to 2019

Authors: Lingxia Li, Frédéric Schnell, Shuzhe Ding, Solène Le Douairon Lahaye

Abstract:

Background: The prevalence of cardiovascular diseases (CVD) in different triathlon sports disciplines has not been determined. Purpose: The present study aimed to determine the prevalence of CVD in world-class triathletes according to their sex, sports disciplines (aquathlon, duathlon, triathlon…), and formats (short/medium, long, and ultra-long distance). Methods: Male and female elite athletes from eleven triathlon sport disciplines, ranked in the internationally yearly top 10 between 2006 and 2019, were included. The athlete’s name was associated in a Google search with selected key terms related to heart disease and/or cardiac abnormalities. The prevalence and the hazard function of the variation were calculated, and the differences were then compared. Results: From 1329 athletes (male 639, female 690), 13 cases of CVD (0.98%, 95% CI: [0.45-1.51]) were identified, and the mean age of their occurrence was 29±6 years. Although no sex differences were found in each sport discipline/format (p > 0.05), severe outcomes (sudden cardiac arrest/death and those who had to stop their sports practice) were only observed in males. Short-distance triathlon (5.08%, 95% CI: [1.12-9.05]) was more affected than other disciplines in short/medium, long, and ultra-long formats. The prevalence of CVD in athletes who participated in multi-type of sports disciplines (4.14%, 95% CI: [1.14-7.15]) was higher than in those who participated in one type (0.52%, 95% CI: [0.10-0.93]) (p = 0.0004). Conclusion: Athletes in short-distance triathlon were more affected than other disciplines in short/medium, long and ultra-long formats. Athletes who participate in short/medium distances and those who participate in multi-type of sports disciplines should be closely monitored regardless of sex.

Keywords: cardiovascular diseases, sudden cardiac death, triathlon sport disciplines, world-class athletes

Procedia PDF Downloads 151
5632 Network Word Discovery Framework Based on Sentence Semantic Vector Similarity

Authors: Ganfeng Yu, Yuefeng Ma, Shanliang Yang

Abstract:

The word discovery is a key problem in text information retrieval technology. Methods in new word discovery tend to be closely related to words because they generally obtain new word results by analyzing words. With the popularity of social networks, individual netizens and online self-media have generated various network texts for the convenience of online life, including network words that are far from standard Chinese expression. How detect network words is one of the important goals in the field of text information retrieval today. In this paper, we integrate the word embedding model and clustering methods to propose a network word discovery framework based on sentence semantic similarity (S³-NWD) to detect network words effectively from the corpus. This framework constructs sentence semantic vectors through a distributed representation model, uses the similarity of sentence semantic vectors to determine the semantic relationship between sentences, and finally realizes network word discovery by the meaning of semantic replacement between sentences. The experiment verifies that the framework not only completes the rapid discovery of network words but also realizes the standard word meaning of the discovery of network words, which reflects the effectiveness of our work.

Keywords: text information retrieval, natural language processing, new word discovery, information extraction

Procedia PDF Downloads 100
5631 Audio-Visual Co-Data Processing Pipeline

Authors: Rita Chattopadhyay, Vivek Anand Thoutam

Abstract:

Speech is the most acceptable means of communication where we can quickly exchange our feelings and thoughts. Quite often, people can communicate orally but cannot interact or work with computers or devices. It’s easy and quick to give speech commands than typing commands to computers. In the same way, it’s easy listening to audio played from a device than extract output from computers or devices. Especially with Robotics being an emerging market with applications in warehouses, the hospitality industry, consumer electronics, assistive technology, etc., speech-based human-machine interaction is emerging as a lucrative feature for robot manufacturers. Considering this factor, the objective of this paper is to design the “Audio-Visual Co-Data Processing Pipeline.” This pipeline is an integrated version of Automatic speech recognition, a Natural language model for text understanding, object detection, and text-to-speech modules. There are many Deep Learning models for each type of the modules mentioned above, but OpenVINO Model Zoo models are used because the OpenVINO toolkit covers both computer vision and non-computer vision workloads across Intel hardware and maximizes performance, and accelerates application development. A speech command is given as input that has information about target objects to be detected and start and end times to extract the required interval from the video. Speech is converted to text using the Automatic speech recognition QuartzNet model. The summary is extracted from text using a natural language model Generative Pre-Trained Transformer-3 (GPT-3). Based on the summary, essential frames from the video are extracted, and the You Only Look Once (YOLO) object detection model detects You Only Look Once (YOLO) objects on these extracted frames. Frame numbers that have target objects (specified objects in the speech command) are saved as text. Finally, this text (frame numbers) is converted to speech using text to speech model and will be played from the device. This project is developed for 80 You Only Look Once (YOLO) labels, and the user can extract frames based on only one or two target labels. This pipeline can be extended for more than two target labels easily by making appropriate changes in the object detection module. This project is developed for four different speech command formats by including sample examples in the prompt used by Generative Pre-Trained Transformer-3 (GPT-3) model. Based on user preference, one can come up with a new speech command format by including some examples of the respective format in the prompt used by the Generative Pre-Trained Transformer-3 (GPT-3) model. This pipeline can be used in many projects like human-machine interface, human-robot interaction, and surveillance through speech commands. All object detection projects can be upgraded using this pipeline so that one can give speech commands and output is played from the device.

Keywords: OpenVINO, automatic speech recognition, natural language processing, object detection, text to speech

Procedia PDF Downloads 80
5630 Comparing the Apparent Error Rate of Gender Specifying from Human Skeletal Remains by Using Classification and Cluster Methods

Authors: Jularat Chumnaul

Abstract:

In forensic science, corpses from various homicides are different; there are both complete and incomplete, depending on causes of death or forms of homicide. For example, some corpses are cut into pieces, some are camouflaged by dumping into the river, some are buried, some are burned to destroy the evidence, and others. If the corpses are incomplete, it can lead to the difficulty of personally identifying because some tissues and bones are destroyed. To specify gender of the corpses from skeletal remains, the most precise method is DNA identification. However, this method is costly and takes longer so that other identification techniques are used instead. The first technique that is widely used is considering the features of bones. In general, an evidence from the corpses such as some pieces of bones, especially the skull and pelvis can be used to identify their gender. To use this technique, forensic scientists are required observation skills in order to classify the difference between male and female bones. Although this technique is uncomplicated, saving time and cost, and the forensic scientists can fairly accurately determine gender by using this technique (apparently an accuracy rate of 90% or more), the crucial disadvantage is there are only some positions of skeleton that can be used to specify gender such as supraorbital ridge, nuchal crest, temporal lobe, mandible, and chin. Therefore, the skeletal remains that will be used have to be complete. The other technique that is widely used for gender specifying in forensic science and archeology is skeletal measurements. The advantage of this method is it can be used in several positions in one piece of bones, and it can be used even if the bones are not complete. In this study, the classification and cluster analysis are applied to this technique, including the Kth Nearest Neighbor Classification, Classification Tree, Ward Linkage Cluster, K-mean Cluster, and Two Step Cluster. The data contains 507 particular individuals and 9 skeletal measurements (diameter measurements), and the performance of five methods are investigated by considering the apparent error rate (APER). The results from this study indicate that the Two Step Cluster and Kth Nearest Neighbor method seem to be suitable to specify gender from human skeletal remains because both yield small apparent error rate of 0.20% and 4.14%, respectively. On the other hand, the Classification Tree, Ward Linkage Cluster, and K-mean Cluster method are not appropriate since they yield large apparent error rate of 10.65%, 10.65%, and 16.37%, respectively. However, there are other ways to evaluate the performance of classification such as an estimate of the error rate using the holdout procedure or misclassification costs, and the difference methods can make the different conclusions.

Keywords: skeletal measurements, classification, cluster, apparent error rate

Procedia PDF Downloads 252
5629 Non-intrusive Hand Control of Drone Using an Inexpensive and Streamlined Convolutional Neural Network Approach

Authors: Evan Lowhorn, Rocio Alba-Flores

Abstract:

The purpose of this work is to develop a method for classifying hand signals and using the output in a drone control algorithm. To achieve this, methods based on Convolutional Neural Networks (CNN) were applied. CNN's are a subset of deep learning, which allows grid-like inputs to be processed and passed through a neural network to be trained for classification. This type of neural network allows for classification via imaging, which is less intrusive than previous methods using biosensors, such as EMG sensors. Classification CNN's operate purely from the pixel values in an image; therefore they can be used without additional exteroceptive sensors. A development bench was constructed using a desktop computer connected to a high-definition webcam mounted on a scissor arm. This allowed the camera to be pointed downwards at the desk to provide a constant solid background for the dataset and a clear detection area for the user. A MATLAB script was created to automate dataset image capture at the development bench and save the images to the desktop. This allowed the user to create their own dataset of 12,000 images within three hours. These images were evenly distributed among seven classes. The defined classes include forward, backward, left, right, idle, and land. The drone has a popular flip function which was also included as an additional class. To simplify control, the corresponding hand signals chosen were the numerical hand signs for one through five for movements, a fist for land, and the universal “ok” sign for the flip command. Transfer learning with PyTorch (Python) was performed using a pre-trained 18-layer residual learning network (ResNet-18) to retrain the network for custom classification. An algorithm was created to interpret the classification and send encoded messages to a Ryze Tello drone over its 2.4 GHz Wi-Fi connection. The drone’s movements were performed in half-meter distance increments at a constant speed. When combined with the drone control algorithm, the classification performed as desired with negligible latency when compared to the delay in the drone’s movement commands.

Keywords: classification, computer vision, convolutional neural networks, drone control

Procedia PDF Downloads 212
5628 Nation Branding as Reframing: From the Perspective of Translation Studies

Authors: Ye Tian

Abstract:

Soft power has replaced hard power and become one of the most attractive ways nations pursue to expand their international influence. One of the ways to improve a nation’s soft power is to commercialise the country and brand or rebrand it to the international audience, and thus attract interests or foreign investments. In this process, translation has often been regarded as merely a tool, and researches in it are either in translating literature as culture export or in how (in)accuracy of translation influences the branding campaign. This paper proposes to analyse nation branding campaign with framing theory, and thus gives an entry for translation studies to come to a central stage in today’s soft power research. To frame information or elements of a text, an event, or, as in this paper, a nation is to put them in a mental structure. This structure can be built by outsiders or by those who create the text, the event, or by citizens of the nation. To frame information like this can be regarded as a process of translation, as what translation does in its traditional meaning of ‘translating a text’ is to put a framework on the text to, deliberately or not, highlight some of the elements while hiding the others. In the discourse of nations, then, people unavoidably simplify a national image and put the nation into their imaginary framework. In this way, problems like stereotype and prejudice come into being. Meanwhile, if nations seek ways to frame or reframe themselves, they make efforts to have in control what and who they are in the eyes of international audiences, and thus make profits, economically or politically, from it. The paper takes African nations, which are usually perceived as a whole, and the United Kingdom as examples to justify passive and active framing process, and assesses both positive and negative influence framing has on nations. In conclusion, translation as framing causes problems like prejudice, and the image of a nation is not always in the hands of nation branders, but reframing the nation in a positive way has the potential to turn the tide.

Keywords: framing, nation branding, stereotype, translation

Procedia PDF Downloads 156
5627 Text Mining Past Medical History in Electrophysiological Studies

Authors: Roni Ramon-Gonen, Amir Dori, Shahar Shelly

Abstract:

Background and objectives: Healthcare professionals produce abundant textual information in their daily clinical practice. The extraction of insights from all the gathered information, mainly unstructured and lacking in normalization, is one of the major challenges in computational medicine. In this respect, text mining assembles different techniques to derive valuable insights from unstructured textual data, so it has led to being especially relevant in Medicine. Neurological patient’s history allows the clinician to define the patient’s symptoms and along with the result of the nerve conduction study (NCS) and electromyography (EMG) test, assists in formulating a differential diagnosis. Past medical history (PMH) helps to direct the latter. In this study, we aimed to identify relevant PMH, understand which PMHs are common among patients in the referral cohort and documented by the medical staff, and examine the differences by sex and age in a large cohort based on textual format notes. Methods: We retrospectively identified all patients with abnormal NCS between May 2016 to February 2022. Age, gender, and all NCS attributes reports were recorded, including the summary text. All patients’ histories were extracted from the text report by a query. Basic text cleansing and data preparation were performed, as well as lemmatization. Very popular words (like ‘left’ and ‘right’) were deleted. Several words were replaced with their abbreviations. A bag of words approach was used to perform the analyses. Different visualizations which are common in text analysis, were created to easily grasp the results. Results: We identified 5282 unique patients. Three thousand and five (57%) patients had documented PMH. Of which 60.4% (n=1817) were males. The total median age was 62 years (range 0.12 – 97.2 years), and the majority of patients (83%) presented after the age of forty years. The top two documented medical histories were diabetes mellitus (DM) and surgery. DM was observed in 16.3% of the patients, and surgery at 15.4%. Other frequent patient histories (among the top 20) were fracture, cancer (ca), motor vehicle accident (MVA), leg, lumbar, discopathy, back and carpal tunnel release (CTR). When separating the data by sex, we can see that DM and MVA are more frequent among males, while cancer and CTR are less frequent. On the other hand, the top medical history in females was surgery and, after that, DM. Other frequent histories among females are breast cancer, fractures, and CTR. In the younger population (ages 18 to 26), the frequent PMH were surgery, fractures, trauma, and MVA. Discussion: By applying text mining approaches to unstructured data, we were able to better understand which medical histories are more relevant in these circumstances and, in addition, gain additional insights regarding sex and age differences. These insights might help to collect epidemiological demographical data as well as raise new hypotheses. One limitation of this work is that each clinician might use different words or abbreviations to describe the same condition, and therefore using a coding system can be beneficial.

Keywords: abnormal studies, healthcare analytics, medical history, nerve conduction studies, text mining, textual analysis

Procedia PDF Downloads 96
5626 Recommendations to Improve Classification of Grade Crossings in Urban Areas of Mexico

Authors: Javier Alfonso Bonilla-Chávez, Angélica Lozano

Abstract:

In North America, more than 2,000 people annually die in accidents related to railroad tracks. In 2020, collisions at grade crossings were the main cause of deaths related to railway accidents in Mexico. Railway networks have constant interaction with motor transport users, cyclists, and pedestrians, mainly in grade crossings, where is the greatest vulnerability and risk of accidents. Usually, accidents at grade crossings are directly related to risky behavior and non-compliance with regulations by motorists, cyclists, and pedestrians, especially in developing countries. Around the world, countries classify these crossings in different ways. In Mexico, according to their dangerousness (high, medium, or low), types A, B and C have been established, recommending for each one different type of auditive and visual signaling and gates, as well as horizontal and vertical signaling. This classification is based in a weighting, but regrettably, it is not explained how the weight values were obtained. A review of the variables and the current approach for the grade crossing classification is required, since it is inadequate for some crossings. In contrast, North America (USA and Canada) and European countries consider a broader classification so that attention to each crossing is addressed more precisely and equipment costs are adjusted. Lack of a proper classification, could lead to cost overruns in the equipment and a deficient operation. To exemplify the lack of a good classification, six crossings are studied, three located in the rural area of Mexico and three in Mexico City. These cases show the need of: improving the current regulations, improving the existing infrastructure, and implementing technological systems, including informative signals with nomenclature of the involved crossing and direct telephone line for reporting emergencies. This implementation is unaffordable for most municipal governments. Also, an inventory of the most dangerous grade crossings in urban and rural areas must be obtained. Then, an approach for improving the classification of grade crossings is suggested. This approach must be based on criteria design, characteristics of adjacent roads or intersections which can influence traffic flow through the crossing, accidents related to motorized and non-motorized vehicles, land use and land management, type of area, and services and economic activities in the zone where the grade crossings is located. An expanded classification of grade crossing in Mexico could reduce accidents and improve the efficiency of the railroad.

Keywords: accidents, grade crossing, railroad, traffic safety

Procedia PDF Downloads 109
5625 Tensor Deep Stacking Neural Networks and Bilinear Mapping Based Speech Emotion Classification Using Facial Electromyography

Authors: P. S. Jagadeesh Kumar, Yang Yung, Wenli Hu

Abstract:

Speech emotion classification is a dominant research field in finding a sturdy and profligate classifier appropriate for different real-life applications. This effort accentuates on classifying different emotions from speech signal quarried from the features related to pitch, formants, energy contours, jitter, shimmer, spectral, perceptual and temporal features. Tensor deep stacking neural networks were supported to examine the factors that influence the classification success rate. Facial electromyography signals were composed of several forms of focuses in a controlled atmosphere by means of audio-visual stimuli. Proficient facial electromyography signals were pre-processed using moving average filter, and a set of arithmetical features were excavated. Extracted features were mapped into consistent emotions using bilinear mapping. With facial electromyography signals, a database comprising diverse emotions will be exposed with a suitable fine-tuning of features and training data. A success rate of 92% can be attained deprived of increasing the system connivance and the computation time for sorting diverse emotional states.

Keywords: speech emotion classification, tensor deep stacking neural networks, facial electromyography, bilinear mapping, audio-visual stimuli

Procedia PDF Downloads 256
5624 Evaluating the Effectiveness of Animated Videos in Learning Economics

Authors: J. Chow

Abstract:

In laboratory settings, this study measured and reported the effects of undergraduate students watching animated videos on learning microeconomics as compared with the effectiveness of reading written texts. The study described an experiment on learning microeconomics in higher education using two different types of learning materials. It reported the effectiveness on microeconomics learning of watching animated videos and reading written texts. Undergraduate students in the university were randomly assigned to either a ‘video group’ or a ‘text group’ in the experiment. Previously-validated multiple-choice questions on fundamental concepts of microeconomics were administered. Both groups showed improvement between the pre-test and post-test. The experience of learning using text and video materials was also assessed. After controlling the student characteristics variables, the analyses showed that both types of materials showed comparable level of perceived learning experience. The effect size and statistical significance of these results supported the hypothesis that animated video is an effective alternative to text materials as a learning tool for students. The findings suggest that such animated videos may support teaching microeconomics in higher education.

Keywords: animated videos for education, laboratory experiment, microeconomics education, undergraduate economics education

Procedia PDF Downloads 146
5623 Construction and Analysis of Tamazight (Berber) Text Corpus

Authors: Zayd Khayi

Abstract:

This paper deals with the construction and analysis of the Tamazight text corpus. The grammatical structure of the Tamazight remains poorly understood, and a lack of comparative grammar leads to linguistic issues. In order to fill this gap, even though it is small, by constructed the diachronic corpus of the Tamazight language, and elaborated the program tool. In addition, this work is devoted to constructing that tool to analyze the different aspects of the Tamazight, with its different dialects used in the north of Africa, specifically in Morocco. It also focused on three Moroccan dialects: Tamazight, Tarifiyt, and Tachlhit. The Latin version was good choice because of the many sources it has. The corpus is based on the grammatical parameters and features of that language. The text collection contains more than 500 texts that cover a long historical period. It is free, and it will be useful for further investigations. The texts were transformed into an XML-format standardization goal. The corpus counts more than 200,000 words. Based on the linguistic rules and statistical methods, the original user interface and software prototype were developed by combining the technologies of web design and Python. The corpus presents more details and features about how this corpus provides users with the ability to distinguish easily between feminine/masculine nouns and verbs. The interface used has three languages: TMZ, FR, and EN. Selected texts were not initially categorized. This work was done in a manual way. Within corpus linguistics, there is currently no commonly accepted approach to the classification of texts. Texts are distinguished into ten categories. To describe and represent the texts in the corpus, we elaborated the XML structure according to the TEI recommendations. Using the search function may provide us with the types of words we would search for, like feminine/masculine nouns and verbs. Nouns are divided into two parts. The gender in the corpus has two forms. The neutral form of the word corresponds to masculine, while feminine is indicated by a double t-t affix (the prefix t- and the suffix -t), ex: Tarbat (girl), Tamtut (woman), Taxamt (tent), and Tislit (bride). However, there are some words whose feminine form contains only the prefix t- and the suffix –a, ex: Tasa (liver), tawja (family), and tarwa (progenitors). Generally, Tamazight masculine words have prefixes that distinguish them from other words. For instance, 'a', 'u', 'i', ex: Asklu (tree), udi (cheese), ighef (head). Verbs in the corpus are for the first person singular and plural that have suffixes 'agh','ex', 'egh', ex: 'ghrex' (I study), 'fegh' (I go out), 'nadagh' (I call). The program tool permits the following characteristics of this corpus: list of all tokens; list of unique words; lexical diversity; realize different grammatical requests. To conclude, this corpus has only focused on a small group of parts of speech in Tamazight language verbs, nouns. Work is still on the adjectives, prounouns, adverbs and others.

Keywords: Tamazight (Berber) language, corpus linguistic, grammar rules, statistical methods

Procedia PDF Downloads 68
5622 Information Disclosure And Financial Sentiment Index Using a Machine Learning Approach

Authors: Alev Atak

Abstract:

In this paper, we aim to create a financial sentiment index by investigating the company’s voluntary information disclosures. We retrieve structured content from BIST 100 companies’ financial reports for the period 1998-2018 and extract relevant financial information for sentiment analysis through Natural Language Processing. We measure strategy-related disclosures and their cross-sectional variation and classify report content into generic sections using synonym lists divided into four main categories according to their liquidity risk profile, risk positions, intra-annual information, and exposure to risk. We use Word Error Rate and Cosin Similarity for comparing and measuring text similarity and derivation in sets of texts. In addition to performing text extraction, we will provide a range of text analysis options, such as the readability metrics, word counts using pre-determined lists (e.g., forward-looking, uncertainty, tone, etc.), and comparison with reference corpus (word, parts of speech and semantic level). Therefore, we create an adequate analytical tool and a financial dictionary to depict the importance of granular financial disclosure for investors to identify correctly the risk-taking behavior and hence make the aggregated effects traceable.

Keywords: financial sentiment, machine learning, information disclosure, risk

Procedia PDF Downloads 94
5621 Research on Reservoir Lithology Prediction Based on Residual Neural Network and Squeeze-and- Excitation Neural Network

Authors: Li Kewen, Su Zhaoxin, Wang Xingmou, Zhu Jian Bing

Abstract:

Conventional reservoir prediction methods ar not sufficient to explore the implicit relation between seismic attributes, and thus data utilization is low. In order to improve the predictive classification accuracy of reservoir lithology, this paper proposes a deep learning lithology prediction method based on ResNet (Residual Neural Network) and SENet (Squeeze-and-Excitation Neural Network). The neural network model is built and trained by using seismic attribute data and lithology data of Shengli oilfield, and the nonlinear mapping relationship between seismic attribute and lithology marker is established. The experimental results show that this method can significantly improve the classification effect of reservoir lithology, and the classification accuracy is close to 70%. This study can effectively predict the lithology of undrilled area and provide support for exploration and development.

Keywords: convolutional neural network, lithology, prediction of reservoir, seismic attributes

Procedia PDF Downloads 178
5620 Random Forest Classification for Population Segmentation

Authors: Regina Chua

Abstract:

To reduce the costs of re-fielding a large survey, a Random Forest classifier was applied to measure the accuracy of classifying individuals into their assigned segments with the fewest possible questions. Given a long survey, one needed to determine the most predictive ten or fewer questions that would accurately assign new individuals to custom segments. Furthermore, the solution needed to be quick in its classification and usable in non-Python environments. In this paper, a supervised Random Forest classifier was modeled on a dataset with 7,000 individuals, 60 questions, and 254 features. The Random Forest consisted of an iterative collection of individual decision trees that result in a predicted segment with robust precision and recall scores compared to a single tree. A random 70-30 stratified sampling for training the algorithm was used, and accuracy trade-offs at different depths for each segment were identified. Ultimately, the Random Forest classifier performed at 87% accuracy at a depth of 10 with 20 instead of 254 features and 10 instead of 60 questions. With an acceptable accuracy in prioritizing feature selection, new tools were developed for non-Python environments: a worksheet with a formulaic version of the algorithm and an embedded function to predict the segment of an individual in real-time. Random Forest was determined to be an optimal classification model by its feature selection, performance, processing speed, and flexible application in other environments.

Keywords: machine learning, supervised learning, data science, random forest, classification, prediction, predictive modeling

Procedia PDF Downloads 95
5619 Genetic Algorithms for Feature Generation in the Context of Audio Classification

Authors: José A. Menezes, Giordano Cabral, Bruno T. Gomes

Abstract:

Choosing good features is an essential part of machine learning. Recent techniques aim to automate this process. For instance, feature learning intends to learn the transformation of raw data into a useful representation to machine learning tasks. In automatic audio classification tasks, this is interesting since the audio, usually complex information, needs to be transformed into a computationally convenient input to process. Another technique tries to generate features by searching a feature space. Genetic algorithms, for instance, have being used to generate audio features by combining or modifying them. We find this approach particularly interesting and, despite the undeniable advances of feature learning approaches, we wanted to take a step forward in the use of genetic algorithms to find audio features, combining them with more conventional methods, like PCA, and inserting search control mechanisms, such as constraints over a confusion matrix. This work presents the results obtained on particular audio classification problems.

Keywords: feature generation, feature learning, genetic algorithm, music information retrieval

Procedia PDF Downloads 436
5618 Presence and Absence: The Use of Photographs in Paris, Texas

Authors: Yi-Ting Wang, Wen-Shu Lai

Abstract:

The subject of this paper is the photography in the 1983 film Paris, Texas, directed by Wim Wenders. Wenders is well known as a film director as well as a photographer. We have found that photography is shown as a photographic element in many of his films. Some of these photographs serve as details within the films, while others play important roles that are relevant to the story. This paper aims to consider photographs in film as a specific type of text, which is the output of both still photography and the film itself. In the film Paris, Texas, three sets of important photographs appear whose symbolic meanings are as dialectical as their text types. The relationship between the existence of these photos and the storyline is both dependent and isolated. The film’s images fly by and progress into other images, while the photos in the film serve a unique narrative function by stopping the continuously flowing images thus provide the viewer a space for imagination and contemplation. They are more than just artistic forms; they also contained multiple meanings. The photographs in Paris, Texas play the role of both presence and absence according to their shifting meanings. There are references to their presence: photographs exist between film time and narrative time, so in terms of the interaction between the characters in the film, photographs are a common symbol of the beginning and end of the characters’ journeys. In terms of the audience, the film’s photographs are a link in the viewing frame structure, through which the creative motivation of the film director can be explored. Photographs also point to the absence of certain objects: the scenes in the photos represent an imaginary map of emotion. The town of Paris, Texas is therefore isolated from the physical presence of the photograph, and is far more abstract than the reality in the film. This paper embraces the ambiguous nature of photography and demonstrates its presence and absence in film with regard to the meaning of text. However, it is worth reflecting that the temporary nature of the interpretation of the film’s photographs is far greater than any other type of photographic text: the characteristics of the text cause the interpretation results to change along with the variations in the interpretation process, which makes their meaning a dynamic process. The photographs’ presence or absence in the context of Paris, Texas also demonstrates the presence and absence of the creator, time, the truth, and the imagination. The film becomes more complete as a result of the revelation of the photographs, while the intertextual connection between these two forms simultaneously provides multiple possibilities for the interpretation of the photographs in the film.

Keywords: film, Paris, Texas, photography, Wim Wenders

Procedia PDF Downloads 320
5617 Machine Learning-Enabled Classification of Climbing Using Small Data

Authors: Nicholas Milburn, Yu Liang, Dalei Wu

Abstract:

Athlete performance scoring within the climbing do-main presents interesting challenges as the sport does not have an objective way to assign skill. Assessing skill levels within any sport is valuable as it can be used to mark progress while training, and it can help an athlete choose appropriate climbs to attempt. Machine learning-based methods are popular for complex problems like this. The dataset available was composed of dynamic force data recorded during climbing; however, this dataset came with challenges such as data scarcity, imbalance, and it was temporally heterogeneous. Investigated solutions to these challenges include data augmentation, temporal normalization, conversion of time series to the spectral domain, and cross validation strategies. The investigated solutions to the classification problem included light weight machine classifiers KNN and SVM as well as the deep learning with CNN. The best performing model had an 80% accuracy. In conclusion, there seems to be enough information within climbing force data to accurately categorize climbers by skill.

Keywords: classification, climbing, data imbalance, data scarcity, machine learning, time sequence

Procedia PDF Downloads 144
5616 Understanding Music through the Framework of Feminist Confessional Literary Criticism: Heightening Audience Identification and Prioritising the Female Voice

Authors: Katharine Pollock

Abstract:

Feminist scholars assert that a defining aspect of feminist confessional literature is that it expresses both an individual and communal identity, one which is predicated on the commonly-shared aspects of female experience. Reading feminist confessional literature in this way accommodates a plurality of readerly experiences and textual interpretations. It affirms the individual whilst acknowledging those experiences which bind women together, and refuses traditional objective criticism. It invites readers to see themselves reflected in the text, and encourages them to share their own stories. Similarly, music which communicates women’s personal experience, fictive or not, expresses a dual identity. There is an inherent risk of imposing a confessional reading upon a musical or literary text. Understanding music as being multivocal in the same way as confessional literature negates this patriarchal tendency, and allows listeners to engage with both the subjective and collective aspects of a text. By hearing their own stories reflected in the music, listeners engage in an ongoing dialogic process in which female stories are prioritised. This refuses patriarchal silencing and ensures a diversity of female voices. To demonstrate the veracity of these claims, literary criticism is applied to Lily Allen’s music, and memoir My Thoughts Exactly.

Keywords: confession, female, feminist, literature, music

Procedia PDF Downloads 155
5615 Driving in a Short Arm Plaster Cast Steer a Patient off Course: A Randomised, Controlled, Crossover Study

Authors: B. W. Kenny, D.Mansour, K. G. Mansour, J. Attia, B. Meads

Abstract:

There is currently insufficient evidence to make a conclusive statement about safety while immobilized in a short arm cast. There is a paucity of published literature on this topic. The purpose of this study is to specifically evaluate short arm casts and their effect on driving abilities, particularly steering and avoidance of obstacles. The ability to drive safely is extrapolated from this data. In this study, a randomised, controlled, crossover design was used to assess 30 subjects randomised into 2 groups. A Logitech force feedback steering column and simulated driving program with a standardised road course was used. Objective outcome measures were the number of times subjects drove off the track, the number of crashes, time to lap completion and subjective assessment on whether wearing a short arm plaster cast impeded their steering. Recruited subjects had no upper limb pathology. The side of the applied plaster cast was randomised. The mean lap completion time reduced with repetition, the difference being statistically significant. There was no significant difference in mean number of times subjects in casts drove off the track (3 with vs. 3.07 without casts), average number of crashes (1.27 vs 0.97). Steering ability was not reduced whilst a subject was immobilised in a short arm Plaster of Paris cast, despite subject’s own impressions that their steering was impeded. This may help guide doctors in their advice to patients regarding driving in these casts.

Keywords: upper limb, arm injury, plaster cast, splint, driving, automobile, bone fracture

Procedia PDF Downloads 247
5614 An Approach for Vocal Register Recognition Based on Spectral Analysis of Singing

Authors: Aleksandra Zysk, Pawel Badura

Abstract:

Recognizing and controlling vocal registers during singing is a difficult task for beginner vocalist. It requires among others identifying which part of natural resonators is being used when a sound propagates through the body. Thus, an application has been designed allowing for sound recording, automatic vocal register recognition (VRR), and a graphical user interface providing real-time visualization of the signal and recognition results. Six spectral features are determined for each time frame and passed to the support vector machine classifier yielding a binary decision on the head or chest register assignment of the segment. The classification training and testing data have been recorded by ten professional female singers (soprano, aged 19-29) performing sounds for both chest and head register. The classification accuracy exceeded 93% in each of various validation schemes. Apart from a hard two-class clustering, the support vector classifier returns also information on the distance between particular feature vector and the discrimination hyperplane in a feature space. Such an information reflects the level of certainty of the vocal register classification in a fuzzy way. Thus, the designed recognition and training application is able to assess and visualize the continuous trend in singing in a user-friendly graphical mode providing an easy way to control the vocal emission.

Keywords: classification, singing, spectral analysis, vocal emission, vocal register

Procedia PDF Downloads 305