Search results for: imbalance dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1409

Search results for: imbalance dataset

1079 Using Machine Learning to Classify Different Body Parts and Determine Healthiness

Authors: Zachary Pan

Abstract:

Our general mission is to solve the problem of classifying images into different body part types and deciding if each of them is healthy or not. However, for now, we will determine healthiness for only one-sixth of the body parts, specifically the chest. We will detect pneumonia in X-ray scans of those chest images. With this type of AI, doctors can use it as a second opinion when they are taking CT or X-ray scans of their patients. Another ad-vantage of using this machine learning classifier is that it has no human weaknesses like fatigue. The overall ap-proach to this problem is to split the problem into two parts: first, classify the image, then determine if it is healthy. In order to classify the image into a specific body part class, the body parts dataset must be split into test and training sets. We can then use many models, like neural networks or logistic regression models, and fit them using the training set. Now, using the test set, we can obtain a realistic accuracy the models will have on images in the real world since these testing images have never been seen by the models before. In order to increase this testing accuracy, we can also apply many complex algorithms to the models, like multiplicative weight update. For the second part of the problem, to determine if the body part is healthy, we can have another dataset consisting of healthy and non-healthy images of the specific body part and once again split that into the test and training sets. We then use another neural network to train on those training set images and use the testing set to figure out its accuracy. We will do this process only for the chest images. A major conclusion reached is that convolutional neural networks are the most reliable and accurate at image classification. In classifying the images, the logistic regression model, the neural network, neural networks with multiplicative weight update, neural networks with the black box algorithm, and the convolutional neural network achieved 96.83 percent accuracy, 97.33 percent accuracy, 97.83 percent accuracy, 96.67 percent accuracy, and 98.83 percent accuracy, respectively. On the other hand, the overall accuracy of the model that de-termines if the images are healthy or not is around 78.37 percent accuracy.

Keywords: body part, healthcare, machine learning, neural networks

Procedia PDF Downloads 103
1078 Detecting Hate Speech And Cyberbullying Using Natural Language Processing

Authors: Nádia Pereira, Paula Ferreira, Sofia Francisco, Sofia Oliveira, Sidclay Souza, Paula Paulino, Ana Margarida Veiga Simão

Abstract:

Social media has progressed into a platform for hate speech among its users, and thus, there is an increasing need to develop automatic detection classifiers of offense and conflicts to help decrease the prevalence of such incidents. Online communication can be used to intentionally harm someone, which is why such classifiers could be essential in social networks. A possible application of these classifiers is the automatic detection of cyberbullying. Even though identifying the aggressive language used in online interactions could be important to build cyberbullying datasets, there are other criteria that must be considered. Being able to capture the language, which is indicative of the intent to harm others in a specific context of online interaction is fundamental. Offense and hate speech may be the foundation of online conflicts, which have become commonly used in social media and are an emergent research focus in machine learning and natural language processing. This study presents two Portuguese language offense-related datasets which serve as examples for future research and extend the study of the topic. The first is similar to other offense detection related datasets and is entitled Aggressiveness dataset. The second is a novelty because of the use of the history of the interaction between users and is entitled the Conflicts/Attacks dataset. Both datasets were developed in different phases. Firstly, we performed a content analysis of verbal aggression witnessed by adolescents in situations of cyberbullying. Secondly, we computed frequency analyses from the previous phase to gather lexical and linguistic cues used to identify potentially aggressive conflicts and attacks which were posted on Twitter. Thirdly, thorough annotation of real tweets was performed byindependent postgraduate educational psychologists with experience in cyberbullying research. Lastly, we benchmarked these datasets with other machine learning classifiers.

Keywords: aggression, classifiers, cyberbullying, datasets, hate speech, machine learning

Procedia PDF Downloads 228
1077 Data Mining Model for Predicting the Status of HIV Patients during Drug Regimen Change

Authors: Ermias A. Tegegn, Million Meshesha

Abstract:

Human Immunodeficiency Virus and Acquired Immunodeficiency Syndrome (HIV/AIDS) is a major cause of death for most African countries. Ethiopia is one of the seriously affected countries in sub Saharan Africa. Previously in Ethiopia, having HIV/AIDS was almost equivalent to a death sentence. With the introduction of Antiretroviral Therapy (ART), HIV/AIDS has become chronic, but manageable disease. The study focused on a data mining technique to predict future living status of HIV/AIDS patients at the time of drug regimen change when the patients become toxic to the currently taking ART drug combination. The data is taken from University of Gondar Hospital ART program database. Hybrid methodology is followed to explore the application of data mining on ART program dataset. Data cleaning, handling missing values and data transformation were used for preprocessing the data. WEKA 3.7.9 data mining tools, classification algorithms, and expertise are utilized as means to address the research problem. By using four different classification algorithms, (i.e., J48 Classifier, PART rule induction, Naïve Bayes and Neural network) and by adjusting their parameters thirty-two models were built on the pre-processed University of Gondar ART program dataset. The performances of the models were evaluated using the standard metrics of accuracy, precision, recall, and F-measure. The most effective model to predict the status of HIV patients with drug regimen substitution is pruned J48 decision tree with a classification accuracy of 98.01%. This study extracts interesting attributes such as Ever taking Cotrim, Ever taking TbRx, CD4 count, Age, Weight, and Gender so as to predict the status of drug regimen substitution. The outcome of this study can be used as an assistant tool for the clinician to help them make more appropriate drug regimen substitution. Future research directions are forwarded to come up with an applicable system in the area of the study.

Keywords: HIV drug regimen, data mining, hybrid methodology, predictive model

Procedia PDF Downloads 142
1076 Lake Water Surface Variations and Its Influencing Factors in Tibetan Plateau in Recent 10 Years

Authors: Shanlong Lu, Jiming Jin, Xiaochun Wang

Abstract:

The Tibetan Plateau has the largest number of inland lakes with the highest elevation on the planet. These massive and large lakes are mostly in natural state and are less affected by human activities. Their shrinking or expansion can truly reflect regional climate and environmental changes and are sensitive indicators of global climate change. However, due to the sparsely populated nature of the plateau and the poor natural conditions, it is difficult to effectively obtain the change data of the lake, which has affected people's understanding of the temporal and spatial processes of lake water changes and their influencing factors. By using the MODIS (Moderate Resolution Imaging Spectroradiometer) MOD09Q1 surface reflectance images as basic data, this study produced the 8-day lake water surface data set of the Tibetan Plateau from 2000 to 2012 at 250 m spatial resolution, with a lake water surface extraction method of combined with lake water surface boundary buffer analyzing and lake by lake segmentation threshold determining. Then based on the dataset, the lake water surface variations and their influencing factors were analyzed, by using 4 typical natural geographical zones of Eastern Qinghai and Qilian, Southern Qinghai, Qiangtang, and Southern Tibet, and the watersheds of the top 10 lakes of Qinghai, Siling Co, Namco, Zhari NamCo, Tangra Yumco, Ngoring, UlanUla, Yamdrok Tso, Har and Gyaring as the analysis units. The accuracy analysis indicate that compared with water surface data of the 134 sample lakes extracted from the 30 m Landsat TM (Thematic Mapper ) images, the average overall accuracy of the lake water surface data set is 91.81% with average commission and omission error of 3.26% and 5.38%; the results also show strong linear (R2=0.9991) correlation with the global MODIS water mask dataset with overall accuracy of 86.30%; and the lake area difference between the Second National Lake Survey and this study is only 4.74%, respectively. This study provides reliable dataset for the lake change research of the plateau in the recent decade. The change trends and influencing factors analysis indicate that the total water surface area of lakes in the plateau showed overall increases, but only lakes with areas larger than 10 km2 had statistically significant increases. Furthermore, lakes with area larger than 100 km2 experienced an abrupt change in 2005. In addition, the annual average precipitation of Southern Tibet and Southern Qinghai experienced significant increasing and decreasing trends, and corresponding abrupt changes in 2004 and 2006, respectively. The annual average temperature of Southern Tibet and Qiangtang showed a significant increasing trend with an abrupt change in 2004. The major reason for the lake water surface variation in Eastern Qinghai and Qilian, Southern Qinghai and Southern Tibet is the changes of precipitation, and that for Qiangtang is the temperature variations.

Keywords: lake water surface variation, MODIS MOD09Q1, remote sensing, Tibetan Plateau

Procedia PDF Downloads 231
1075 Contemporary Technological Developments in Urban Warfare

Authors: Mehmet Ozturk, Serdal Akyuz, Halit Turan

Abstract:

By the evolving technology, the nature of the war has been changed since the beginning of the history. In the first generation war, the bayonet came to the fore in battlefields; successively; in the second-generation firepower; in the third generation maneuver. Today, in the fourth-generation, fighters, sides, and even fighters’ borders are unclear; consequently, lines of the battles have lost their significance. Furthermore, the actors in the battles can be state or non-state, military, paramilitary or civilian. In order to change the balance according to their interests, parties have utilized the urban areas as warfare. The main reason for using urban areas as a battlefield is the imbalance between parties. To balance the power strength, exploiting technological developments has utmost importance. There are many newly developed technologies for urban warfare such as change in the size of the unmanned aerial vehicle, increased usage of unmanned ground vehicles (especially in supply and evacuation purposes), systems showing the behind of the wall, simulations used for educational purposes. This study will focus on the technological equipment being used for urban warfare.

Keywords: urban warfare, unmanned ground vehicles, technological developments, nature of the war

Procedia PDF Downloads 419
1074 Heart Rate Variability Analysis for Early Stage Prediction of Sudden Cardiac Death

Authors: Reeta Devi, Hitender Kumar Tyagi, Dinesh Kumar

Abstract:

In present scenario, cardiovascular problems are growing challenge for researchers and physiologists. As heart disease have no geographic, gender or socioeconomic specific reasons; detecting cardiac irregularities at early stage followed by quick and correct treatment is very important. Electrocardiogram is the finest tool for continuous monitoring of heart activity. Heart rate variability (HRV) is used to measure naturally occurring oscillations between consecutive cardiac cycles. Analysis of this variability is carried out using time domain, frequency domain and non-linear parameters. This paper presents HRV analysis of the online dataset for normal sinus rhythm (taken as healthy subject) and sudden cardiac death (SCD subject) using all three methods computing values for parameters like standard deviation of node to node intervals (SDNN), square root of mean of the sequences of difference between adjacent RR intervals (RMSSD), mean of R to R intervals (mean RR) in time domain, very low-frequency (VLF), low-frequency (LF), high frequency (HF) and ratio of low to high frequency (LF/HF ratio) in frequency domain and Poincare plot for non linear analysis. To differentiate HRV of healthy subject from subject died with SCD, k –nearest neighbor (k-NN) classifier has been used because of its high accuracy. Results show highly reduced values for all stated parameters for SCD subjects as compared to healthy ones. As the dataset used for SCD patients is recording of their ECG signal one hour prior to their death, it is therefore, verified with an accuracy of 95% that proposed algorithm can identify mortality risk of a patient one hour before its death. The identification of a patient’s mortality risk at such an early stage may prevent him/her meeting sudden death if in-time and right treatment is given by the doctor.

Keywords: early stage prediction, heart rate variability, linear and non-linear analysis, sudden cardiac death

Procedia PDF Downloads 339
1073 Disaster Management in Indonesia: A Study on Indonesian Law No. 24 Year 2007

Authors: Eva Fadhilah, Ummi Sholihah Pertiwi Abidin

Abstract:

One common problem in Indonesia is a matter of disaster and its management. Therefore, Indonesia is recognized as ones of disaster-prone nations. The serious problem of a high number of disasters and victims in Indonesia is the lack of attention from various parties related to aid which is given to victims in the evacuation areas. In Indonesia, it is estimated that 25 percents of disaster victims are fertile women, 4 percents of them are pregnants, and 15-20 percents among them encountered complication of pregnancy. Unfortunately, disaster management is frequently viewed as ethnicity, so that, the way to treat them is also done in the same way either to treat men or women, toddler or adult, young or aged. This matter then caused the imbalance in helping distribution which caused an inappropriateness towards help distribution. Whereas if we look in depth, the needs of every human are totally different. Sometimes susceptible groups such as women need to gain priority help compared with man. This is caused such as in the certain times that women could be in menstruation period, pregnancy, suckling period which never be experienced by men. This paper aims to study Indonesian Law No. 24 Year 2007 about Disaster management. This study was done by qualitative study which emphasizes on literature study to discuss the study.

Keywords: disaster management, Indonesian law, disaster victims’ needs, women’s needs

Procedia PDF Downloads 349
1072 Effective Survey Designing for Conducting Opinion Survey to Follow Participatory Approach in a Study of Transport Infrastructure Projects: A Case Study of the City of Kolkata

Authors: Jayanti De

Abstract:

Users of any urban road infrastructure may be classified into various categories. The current paper intends to see whether the opinions on different environmental and transportation criteria vary significantly among different types of road users or not. The paper addresses this issue by using a unique survey data that has been collected from Kolkata, a highly populated city in India. Multiple criteria should be taken into account while planning on infrastructure development programs. Given limited resources, a welfare maximizing government typically resorts to public opinion by designing surveys for prioritization of one project over another. Designing such surveys can be challenging and costly. Deciding upon whom to include in a survey and how to represent each group of consumers/road-users depend crucially on how opinion for different criteria vary across consumer groups. A unique dataset has been collected from various parts of Kolkata to statistically test (using Kolmogorov-Smirnov test) whether assigning of weights to rank the transportation criteria like congestion, air pollution, noise pollution, and morning/evening delay vary significantly across the various groups of users of such infrastructure. The different consumer/user groups in the dataset include pedestrian, private car owner, para-transit (taxi /auto rickshaw) user, public transport (bus) user and freight transporter among others. Very little evidence has been found that ranking of different criteria among these groups vary significantly. This also supports the hypothesis that road- users/consumers form their opinion by using their long-run rather than immediate experience. As a policy prescription, this implies that under-representation or over-representation of a specific consumer group in a survey may not necessarily distort the overall opinion, since opinions across different consumer groups are highly correlated as evident from this particular case study.

Keywords: multi criteria analysis, project-prioritization, road- users, survey designing

Procedia PDF Downloads 280
1071 Comparison Of Virtual Non-Contrast To True Non-Contrast Images Using Dual Layer Spectral Computed Tomography

Authors: O’Day Luke

Abstract:

Purpose: To validate virtual non-contrast reconstructions generated from dual-layer spectral computed tomography (DL-CT) data as an alternative for the acquisition of a dedicated true non-contrast dataset during multiphase contrast studies. Material and methods: Thirty-three patients underwent a routine multiphase clinical CT examination, using Dual-Layer Spectral CT, from March to August 2021. True non-contrast (TNC) and virtual non-contrast (VNC) datasets, generated from both portal venous and arterial phase imaging were evaluated. For every patient in both true and virtual non-contrast datasets, a region-of-interest (ROI) was defined in aorta, liver, fluid (i.e. gallbladder, urinary bladder), kidney, muscle, fat and spongious bone, resulting in 693 ROIs. Differences in attenuation for VNC and TNV images were compared, both separately and combined. Consistency between VNC reconstructions obtained from the arterial and portal venous phase was evaluated. Results: Comparison of CT density (HU) on the VNC and TNC images showed a high correlation. The mean difference between TNC and VNC images (excluding bone results) was 5.5 ± 9.1 HU and > 90% of all comparisons showed a difference of less than 15 HU. For all tissues but spongious bone, the mean absolute difference between TNC and VNC images was below 10 HU. VNC images derived from the arterial and the portal venous phase showed a good correlation in most tissue types. The aortic attenuation was somewhat dependent however on which dataset was used for reconstruction. Bone evaluation with VNC datasets continues to be a problem, as spectral CT algorithms are currently poor in differentiating bone and iodine. Conclusion: Given the increasing availability of DL-CT and proven accuracy of virtual non-contrast processing, VNC is a promising tool for generating additional data during routine contrast-enhanced studies. This study shows the utility of virtual non-contrast scans as an alternative for true non-contrast studies during multiphase CT, with potential for dose reduction, without loss of diagnostic information.

Keywords: dual-layer spectral computed tomography, virtual non-contrast, true non-contrast, clinical comparison

Procedia PDF Downloads 141
1070 Spatial Distribution of Socio-Economic Factors in Kogi State, Nigeria: Development Issues and Implication(s)

Authors: Yahya A. Sadiq, Grace F. Balogun, Olufemi J. Anjorin

Abstract:

This study analyzed the spatial distribution of socio-economic factors in Kogi state with a view to examining its implications on the development of the state. Consequently, questionnaires were administered on both the selected individual respondents (784) in the state and on the administrative offices (local council offices, 21) to solicit relevant information on the spatial distribution of socio-economic factors in their areas. The collected data were tabulated and analyzed using percentages. The study revealed commerce/trade, education, and health care, etc. as the major socio-economic factors in the state but with marked variation/imbalance in their spatial distribution across the study area. The rural-based local government areas have far less of such important facilities. Conclusively, it was recommended that there is need for socio-economic transformation of living conditions of people in the study area especially by positively redistributing local political power and the resources that are abound in the state will be felt by everybody including the commoners.

Keywords: development, local government areas (LGAs), spatial distribution, socio-economic factors

Procedia PDF Downloads 409
1069 Off-Line Text-Independent Arabic Writer Identification Using Optimum Codebooks

Authors: Ahmed Abdullah Ahmed

Abstract:

The task of recognizing the writer of a handwritten text has been an attractive research problem in the document analysis and recognition community with applications in handwriting forensics, paleography, document examination and handwriting recognition. This research presents an automatic method for writer recognition from digitized images of unconstrained writings. Although a great effort has been made by previous studies to come out with various methods, their performances, especially in terms of accuracy, are fallen short, and room for improvements is still wide open. The proposed technique employs optimal codebook based writer characterization where each writing sample is represented by a set of features computed from two codebooks, beginning and ending. Unlike most of the classical codebook based approaches which segment the writing into graphemes, this study is based on fragmenting a particular area of writing which are beginning and ending strokes. The proposed method starting with contour detection to extract significant information from the handwriting and the curve fragmentation is then employed to categorize the handwriting into Beginning and Ending zones into small fragments. The similar fragments of beginning strokes are grouped together to create Beginning cluster, and similarly, the ending strokes are grouped to create the ending cluster. These two clusters lead to the development of two codebooks (beginning and ending) by choosing the center of every similar fragments group. Writings under study are then represented by computing the probability of occurrence of codebook patterns. The probability distribution is used to characterize each writer. Two writings are then compared by computing distances between their respective probability distribution. The evaluations carried out on ICFHR standard dataset of 206 writers using Beginning and Ending codebooks separately. Finally, the Ending codebook achieved the highest identification rate of 98.23%, which is the best result so far on ICFHR dataset.

Keywords: off-line text-independent writer identification, feature extraction, codebook, fragments

Procedia PDF Downloads 512
1068 Census and Mapping of Oil Palms Over Satellite Dataset Using Deep Learning Model

Authors: Gholba Niranjan Dilip, Anil Kumar

Abstract:

Conduct of accurate reliable mapping of oil palm plantations and census of individual palm trees is a huge challenge. This study addresses this challenge and developed an optimized solution implemented deep learning techniques on remote sensing data. The oil palm is a very important tropical crop. To improve its productivity and land management, it is imperative to have accurate census over large areas. Since, manual census is costly and prone to approximations, a methodology for automated census using panchromatic images from Cartosat-2, SkySat and World View-3 satellites is demonstrated. It is selected two different study sites in Indonesia. The customized set of training data and ground-truth data are created for this study from Cartosat-2 images. The pre-trained model of Single Shot MultiBox Detector (SSD) Lite MobileNet V2 Convolutional Neural Network (CNN) from the TensorFlow Object Detection API is subjected to transfer learning on this customized dataset. The SSD model is able to generate the bounding boxes for each oil palm and also do the counting of palms with good accuracy on the panchromatic images. The detection yielded an F-Score of 83.16 % on seven different images. The detections are buffered and dissolved to generate polygons demarcating the boundaries of the oil palm plantations. This provided the area under the plantations and also gave maps of their location, thereby completing the automated census, with a fairly high accuracy (≈100%). The trained CNN was found competent enough to detect oil palm crowns from images obtained from multiple satellite sensors and of varying temporal vintage. It helped to estimate the increase in oil palm plantations from 2014 to 2021 in the study area. The study proved that high-resolution panchromatic satellite image can successfully be used to undertake census of oil palm plantations using CNNs.

Keywords: object detection, oil palm tree census, panchromatic images, single shot multibox detector

Procedia PDF Downloads 160
1067 Integrating Radar Sensors with an Autonomous Vehicle Simulator for an Enhanced Smart Parking Management System

Authors: Mohamed Gazzeh, Bradley Null, Fethi Tlili, Hichem Besbes

Abstract:

The burgeoning global ownership of personal vehicles has posed a significant strain on urban infrastructure, notably parking facilities, leading to traffic congestion and environmental concerns. Effective parking management systems (PMS) are indispensable for optimizing urban traffic flow and reducing emissions. The most commonly deployed systems nowadays rely on computer vision technology. This paper explores the integration of radar sensors and simulation in the context of smart parking management. We concentrate on radar sensors due to their versatility and utility in automotive applications, which extends to PMS. Additionally, radar sensors play a crucial role in driver assistance systems and autonomous vehicle development. However, the resource-intensive nature of radar data collection for algorithm development and testing necessitates innovative solutions. Simulation, particularly the monoDrive simulator, an internal development tool used by NI the Test and Measurement division of Emerson, offers a practical means to overcome this challenge. The primary objectives of this study encompass simulating radar sensors to generate a substantial dataset for algorithm development, testing, and, critically, assessing the transferability of models between simulated and real radar data. We focus on occupancy detection in parking as a practical use case, categorizing each parking space as vacant or occupied. The simulation approach using monoDrive enables algorithm validation and reliability assessment for virtual radar sensors. It meticulously designed various parking scenarios, involving manual measurements of parking spot coordinates, orientations, and the utilization of TI AWR1843 radar. To create a diverse dataset, we generated 4950 scenarios, comprising a total of 455,400 parking spots. This extensive dataset encompasses radar configuration details, ground truth occupancy information, radar detections, and associated object attributes such as range, azimuth, elevation, radar cross-section, and velocity data. The paper also addresses the intricacies and challenges of real-world radar data collection, highlighting the advantages of simulation in producing radar data for parking lot applications. We developed classification models based on Support Vector Machines (SVM) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), exclusively trained and evaluated on simulated data. Subsequently, we applied these models to real-world data, comparing their performance against the monoDrive dataset. The study demonstrates the feasibility of transferring models from a simulated environment to real-world applications, achieving an impressive accuracy score of 92% using only one radar sensor. This finding underscores the potential of radar sensors and simulation in the development of smart parking management systems, offering significant benefits for improving urban mobility and reducing environmental impact. The integration of radar sensors and simulation represents a promising avenue for enhancing smart parking management systems, addressing the challenges posed by the exponential growth in personal vehicle ownership. This research contributes valuable insights into the practicality of using simulated radar data in real-world applications and underscores the role of radar technology in advancing urban sustainability.

Keywords: autonomous vehicle simulator, FMCW radar sensors, occupancy detection, smart parking management, transferability of models

Procedia PDF Downloads 81
1066 Love and Money: Societal Attitudes Toward Income Disparities in Age-Gap Relationships

Authors: Victoria Scarratt

Abstract:

Couples involved in age-gap relationships generally evoke negative stereotypes, opinions, and social disapproval. This research seeks to examine whether financial disparities in age-discrepant relationships cause negative attitudes in study participants. It was hypothesized that an age-gap couple (29 year difference) would receive a greater degree of societal disapproval when the couple also had a large salary gap compared to a similarly aged couple (1 year difference) with a salary gap. Additionally, there would be no significant difference between age-gap couples without a salary-gap compared to a similarly aged couple without a salary gap. To test the hypothesis, participants were given one of four scenarios regarding a couple in a romantic relationship.Then they were asked to respond to nine Likert scale questions. Results indicated that participants perceived age-gap relationships with a salary disparity to be less equitable in regard to a power imbalance between the couple and the financial and general gain that one partner will receive. A significant interaction was also detected for evoking feelings of disgust in participants and how morally correct it is for the couple to continue their relationship.

Keywords: age gap relationships, love, financial disparities, societal stigmas, relationship dynamics

Procedia PDF Downloads 113
1065 Personalizing Human Physical Life Routines Recognition over Cloud-based Sensor Data via AI and Machine Learning

Authors: Kaushik Sathupadi, Sandesh Achar

Abstract:

Pervasive computing is a growing research field that aims to acknowledge human physical life routines (HPLR) based on body-worn sensors such as MEMS sensors-based technologies. The use of these technologies for human activity recognition is progressively increasing. On the other hand, personalizing human life routines using numerous machine-learning techniques has always been an intriguing topic. In contrast, various methods have demonstrated the ability to recognize basic movement patterns. However, it still needs to be improved to anticipate the dynamics of human living patterns. This study introduces state-of-the-art techniques for recognizing static and dy-namic patterns and forecasting those challenging activities from multi-fused sensors. Further-more, numerous MEMS signals are extracted from one self-annotated IM-WSHA dataset and two benchmarked datasets. First, we acquired raw data is filtered with z-normalization and denoiser methods. Then, we adopted statistical, local binary pattern, auto-regressive model, and intrinsic time scale decomposition major features for feature extraction from different domains. Next, the acquired features are optimized using maximum relevance and minimum redundancy (mRMR). Finally, the artificial neural network is applied to analyze the whole system's performance. As a result, we attained a 90.27% recognition rate for the self-annotated dataset, while the HARTH and KU-HAR achieved 83% on nine living activities and 90.94% on 18 static and dynamic routines. Thus, the proposed HPLR system outperformed other state-of-the-art systems when evaluated with other methods in the literature.

Keywords: artificial intelligence, machine learning, gait analysis, local binary pattern (LBP), statistical features, micro-electro-mechanical systems (MEMS), maximum relevance and minimum re-dundancy (MRMR)

Procedia PDF Downloads 20
1064 PsyVBot: Chatbot for Accurate Depression Diagnosis using Long Short-Term Memory and NLP

Authors: Thaveesha Dheerasekera, Dileeka Sandamali Alwis

Abstract:

The escalating prevalence of mental health issues, such as depression and suicidal ideation, is a matter of significant global concern. It is plausible that a variety of factors, such as life events, social isolation, and preexisting physiological or psychological health conditions, could instigate or exacerbate these conditions. Traditional approaches to diagnosing depression entail a considerable amount of time and necessitate the involvement of adept practitioners. This underscores the necessity for automated systems capable of promptly detecting and diagnosing symptoms of depression. The PsyVBot system employs sophisticated natural language processing and machine learning methodologies, including the use of the NLTK toolkit for dataset preprocessing and the utilization of a Long Short-Term Memory (LSTM) model. The PsyVBot exhibits a remarkable ability to diagnose depression with a 94% accuracy rate through the analysis of user input. Consequently, this resource proves to be efficacious for individuals, particularly those enrolled in academic institutions, who may encounter challenges pertaining to their psychological well-being. The PsyVBot employs a Long Short-Term Memory (LSTM) model that comprises a total of three layers, namely an embedding layer, an LSTM layer, and a dense layer. The stratification of these layers facilitates a precise examination of linguistic patterns that are associated with the condition of depression. The PsyVBot has the capability to accurately assess an individual's level of depression through the identification of linguistic and contextual cues. The task is achieved via a rigorous training regimen, which is executed by utilizing a dataset comprising information sourced from the subreddit r/SuicideWatch. The diverse data present in the dataset ensures precise and delicate identification of symptoms linked with depression, thereby guaranteeing accuracy. PsyVBot not only possesses diagnostic capabilities but also enhances the user experience through the utilization of audio outputs. This feature enables users to engage in more captivating and interactive interactions. The PsyVBot platform offers individuals the opportunity to conveniently diagnose mental health challenges through a confidential and user-friendly interface. Regarding the advancement of PsyVBot, maintaining user confidentiality and upholding ethical principles are of paramount significance. It is imperative to note that diligent efforts are undertaken to adhere to ethical standards, thereby safeguarding the confidentiality of user information and ensuring its security. Moreover, the chatbot fosters a conducive atmosphere that is supportive and compassionate, thereby promoting psychological welfare. In brief, PsyVBot is an automated conversational agent that utilizes an LSTM model to assess the level of depression in accordance with the input provided by the user. The demonstrated accuracy rate of 94% serves as a promising indication of the potential efficacy of employing natural language processing and machine learning techniques in tackling challenges associated with mental health. The reliability of PsyVBot is further improved by the fact that it makes use of the Reddit dataset and incorporates Natural Language Toolkit (NLTK) for preprocessing. PsyVBot represents a pioneering and user-centric solution that furnishes an easily accessible and confidential medium for seeking assistance. The present platform is offered as a modality to tackle the pervasive issue of depression and the contemplation of suicide.

Keywords: chatbot, depression diagnosis, LSTM model, natural language process

Procedia PDF Downloads 68
1063 Interfacing Photovoltaic Systems to the Utility Grid: A Comparative Simulation Study to Mitigate the Impact of Unbalanced Voltage Dips

Authors: Badr M. Alshammari, A. Rabeh, A. K. Mohamed

Abstract:

This paper presents the modeling and the control of a grid-connected photovoltaic system (PVS). Firstly, the MPPT control of the PVS and its associated DC/DC converter has been analyzed in order to extract the maximum of available power. Secondly, the control system of the grid side converter (GSC) which is a three-phase voltage source inverter (VSI) has been presented. A special attention has been paid to the control algorithms of the GSC converter during grid voltages imbalances. Especially, three different control objectives are to achieve; the mitigation of the grid imbalance adverse effects, at the point of common coupling (PCC), on the injected currents, the elimination of double frequency oscillations in active power flow, and the elimination of double frequency oscillations in reactive power flow. Simulation results of two control strategies have been performed via MATLAB software in order to demonstrate the particularities of each control strategy according to power quality standards.

Keywords: renewable energies, photovoltaic systems, dc link, voltage source inverter, space vector SVPWM, unbalanced voltage dips, symmetrical components

Procedia PDF Downloads 377
1062 Protective Role of Peroxiredoxin V against Ischemia/Reperfusion-Induced Acute Kidney Injury in Mice

Authors: Eun Gyeong Lee, Ji Young Park, Hyun Ae Woo

Abstract:

Reactive oxygen species (ROS) production is involved in ischemia/reperfusion (I/R) injury in kidney of mice. Oxidative stress develops from an imbalance between ROS production and reduced antioxidant defenses. Many enzymatic and nonenzymatic antioxidant systems including peroxiredoxins (Prxs) are present in kidney to maintain an appropriate level of ROS and prevent oxidative damage. Prxs are a family of peroxidases that reduce peroxides, with a conserved cysteine residue serving as the site of oxidation by peroxides. In this study, we examined the protective role of Prx V against I/R-induced acute kidney injury (AKI) using Prx V wild type (WT) and knockout (KO) mice. We compared the response of Prx V WT and KO mice in mice model of I/R injury. Renal structure, functions, oxidative stress markers, protein levels of oxidative damage marker were worse in Prx V KO mice. Ablation of Prx V enhanced susceptibility to I/R-induced oxidative stress. Prx V KO mice were seen to have more severe renal damage than Prx V WT mice in mice model of I/R injury. Our results demonstrate that Prx V is protective against I/R-induced AKI.

Keywords: peroxiredoxin, ischemia/reperfusion, kidney, oxidative stress

Procedia PDF Downloads 386
1061 Effect of Playing Football or Body Building on Measurements of Forward Head Posture

Authors: Mohamed Gomaa Mohamed

Abstract:

Type of study: Observational cross section study. Background and purpose: Forward head posture (FHP) is a common sagittal faulty posture with anterior head translation relative to vertical posture line. FHP related to temporomandibular joint dysfunctions, neck pain and headache. Sports persons usually overuse one side of the body in training and playing leading to postural imbalance, yet the effect of playing football or bodybuilding on measurements of FHP has never been studied. Participants: Thirty six subjects divided into 3 groups of 12 football players, 12 body builders and 12 students. Method: FHP severity was assessed by measuring the craniovertebral (CVA) and gaze angles, using the photogrammetric method. Photos were taken from right side of subjects while assuming standing position. Analysis of variance was used to assess angles difference between the three groups. Results: No significant differences were found in CVA and gaze angles between the three groups (P > 0.05). Conclusion: Playing football or body building doesn't impose significant FHP.

Keywords: craniovertebral angle, gaze angle, football, body building

Procedia PDF Downloads 416
1060 Evaluation of Video Quality Metrics and Performance Comparison on Contents Taken from Most Commonly Used Devices

Authors: Pratik Dhabal Deo, Manoj P.

Abstract:

With the increasing number of social media users, the amount of video content available has also significantly increased. Currently, the number of smartphone users is at its peak, and many are increasingly using their smartphones as their main photography and recording devices. There have been a lot of developments in the field of Video Quality Assessment (VQA) and metrics like VMAF, SSIM etc. are said to be some of the best performing metrics, but the evaluation of these metrics is dominantly done on professionally taken video contents using professional tools, lighting conditions etc. No study particularly pinpointing the performance of the metrics on the contents taken by users on very commonly available devices has been done. Datasets that contain a huge number of videos from different high-end devices make it difficult to analyze the performance of the metrics on the content from most used devices even if they contain contents taken in poor lighting conditions using lower-end devices. These devices face a lot of distortions due to various factors since the spectrum of contents recorded on these devices is huge. In this paper, we have presented an analysis of the objective VQA metrics on contents taken only from most used devices and their performance on them, focusing on full-reference metrics. To carry out this research, we created a custom dataset containing a total of 90 videos that have been taken from three most commonly used devices, and android smartphone, an IOS smartphone and a DSLR. On the videos taken on each of these devices, the six most common types of distortions that users face have been applied on addition to already existing H.264 compression based on four reference videos. These six applied distortions have three levels of degradation each. A total of the five most popular VQA metrics have been evaluated on this dataset and the highest values and the lowest values of each of the metrics on the distortions have been recorded. Finally, it is found that blur is the artifact on which most of the metrics didn’t perform well. Thus, in order to understand the results better the amount of blur in the data set has been calculated and an additional evaluation of the metrics was done using HEVC codec, which is the next version of H.264 compression, on the camera that proved to be the sharpest among the devices. The results have shown that as the resolution increases, the performance of the metrics tends to become more accurate and the best performing metric among them is VQM with very few inconsistencies and inaccurate results when the compression applied is H.264, but when the compression is applied is HEVC, SSIM and VMAF have performed significantly better.

Keywords: distortion, metrics, performance, resolution, video quality assessment

Procedia PDF Downloads 203
1059 Functional Compounds Activity of Analog Rice Based on Purple Yam and Bran as Alternative Food for People with Diabetes Mellitus Type II

Authors: A. Iqbal Banauaji, Muchamad Sholikun

Abstract:

Diabetes mellitus (DM) is a metabolism disorder that tends to increase its prevalence in the world, including in Indonesia. The development of DM type 2 can cause oxidative stress characterized by an imbalance between oxidants and antioxidants in the body Increased oxidative stress causes type 2 diabetes mellitus to require intake of exogenous antioxidants in large quantities to inhibit oxidative damage in the body. Bran can be defined as a functional food because it consists of 11.39% fiberand 28.7% antioxidants and the purple yam consists of anthocyanin which functions as an antioxidant. With abundant amount and low price, purple yam and bran can be used for analog rice as the effort to diversify functional food. The antioxidant’s activity of analog rice from purple yam and bran which is measured by using DPPH’s method is 12,963%. The rough fiber’s level on the analog rice from purple yam is 2.985%. The water amount of analog rice from purple yam and bran is 8.726%. Analog rice from purple yam and bran has the similar texture as the usual rice, tasted slightly sweet, light purple colored, and smelled like bran.

Keywords: antioxidant, analog rice, functional food, diabetes mellitus

Procedia PDF Downloads 193
1058 Emotion Recognition in Video and Images in the Wild

Authors: Faizan Tariq, Moayid Ali Zaidi

Abstract:

Facial emotion recognition algorithms are expanding rapidly now a day. People are using different algorithms with different combinations to generate best results. There are six basic emotions which are being studied in this area. Author tried to recognize the facial expressions using object detector algorithms instead of traditional algorithms. Two object detection algorithms were chosen which are Faster R-CNN and YOLO. For pre-processing we used image rotation and batch normalization. The dataset I have chosen for the experiments is Static Facial Expression in Wild (SFEW). Our approach worked well but there is still a lot of room to improve it, which will be a future direction.

Keywords: face recognition, emotion recognition, deep learning, CNN

Procedia PDF Downloads 187
1057 Study and Analysis of the Factors Affecting Road Safety Using Decision Tree Algorithms

Authors: Naina Mahajan, Bikram Pal Kaur

Abstract:

The purpose of traffic accident analysis is to find the possible causes of an accident. Road accidents cannot be totally prevented but by suitable traffic engineering and management the accident rate can be reduced to a certain extent. This paper discusses the classification techniques C4.5 and ID3 using the WEKA Data mining tool. These techniques use on the NH (National highway) dataset. With the C4.5 and ID3 technique it gives best results and high accuracy with less computation time and error rate.

Keywords: C4.5, ID3, NH(National highway), WEKA data mining tool

Procedia PDF Downloads 338
1056 Cascaded Multi-Level Single-Phase Switched Boost Inverter

Authors: Van-Thuan Tran, Minh-Khai Nguyen, Geum-Bae Cho

Abstract:

Recently, multilevel inverters have become more attractive for researchers due to low total harmonic distortion (THD) in the output voltage and low electromagnetic interference (EMI). This paper proposes a single-phase cascaded H-bridge quasi switched boost inverter (CHB-qSBI) for renewable energy sources applications. The proposed inverter has the advantage over the cascaded H-bridge quasi-Z-source inverter (CHB-qZSI) in reducing two capacitors and two inductors. As a result, cost, weight, and size are reduced. Furthermore, the dc-link voltage of each module is controlled by individual shoot-through duty cycle to get the same values. Therefore, the proposed inverter solves the imbalance problem of dc-link voltage in traditional CHB inverter. This paper shows the operating principles and analysis of the single-phase cascaded H-bridge quasi switched boost inverter. Also, a control strategy for the proposed inverter is shown. Experimental and simulation results are shown to verify the operating principle of the proposed inverter.

Keywords: renewable energy sources, cascaded h-bridge inverter, quasi switched boost inverter, quasi z-source inverter, multilevel inverter

Procedia PDF Downloads 334
1055 Analysis of Citation Rate and Data Reuse for Openly Accessible Biodiversity Datasets on Global Biodiversity Information Facility

Authors: Nushrat Khan, Mike Thelwall, Kayvan Kousha

Abstract:

Making research data openly accessible has been mandated by most funders over the last 5 years as it promotes reproducibility in science and reduces duplication of effort to collect the same data. There are evidence that articles that publicly share research data have higher citation rates in biological and social sciences. However, how and whether shared data is being reused is not always intuitive as such information is not easily accessible from the majority of research data repositories. This study aims to understand the practice of data citation and how data is being reused over the years focusing on biodiversity since research data is frequently reused in this field. Metadata of 38,878 datasets including citation counts were collected through the Global Biodiversity Information Facility (GBIF) API for this purpose. GBIF was used as a data source since it provides citation count for datasets, not a commonly available feature for most repositories. Analysis of dataset types, citation counts, creation and update time of datasets suggests that citation rate varies for different types of datasets, where occurrence datasets that have more granular information have higher citation rates than checklist and metadata-only datasets. Another finding is that biodiversity datasets on GBIF are frequently updated, which is unique to this field. Majority of the datasets from the earliest year of 2007 were updated after 11 years, with no dataset that was not updated since creation. For each year between 2007 and 2017, we compared the correlations between update time and citation rate of four different types of datasets. While recent datasets do not show any correlations, 3 to 4 years old datasets show weak correlation where datasets that were updated more recently received high citations. The results are suggestive that it takes several years to cumulate citations for research datasets. However, this investigation found that when searched on Google Scholar or Scopus databases for the same datasets, the number of citations is often not the same as GBIF. Hence future aim is to further explore the citation count system adopted by GBIF to evaluate its reliability and whether it can be applicable to other fields of studies as well.

Keywords: data citation, data reuse, research data sharing, webometrics

Procedia PDF Downloads 178
1054 Statistical Models and Time Series Forecasting on Crime Data in Nepal

Authors: Dila Ram Bhandari

Abstract:

Throughout the 20th century, new governments were created where identities such as ethnic, religious, linguistic, caste, communal, tribal, and others played a part in the development of constitutions and the legal system of victim and criminal justice. Acute issues with extremism, poverty, environmental degradation, cybercrimes, human rights violations, crime against, and victimization of both individuals and groups have recently plagued South Asian nations. Everyday massive number of crimes are steadfast, these frequent crimes have made the lives of common citizens restless. Crimes are one of the major threats to society and also for civilization. Crime is a bone of contention that can create a societal disturbance. The old-style crime solving practices are unable to live up to the requirement of existing crime situations. Crime analysis is one of the most important activities of the majority of intelligent and law enforcement organizations all over the world. The South Asia region lacks such a regional coordination mechanism, unlike central Asia of Asia Pacific regions, to facilitate criminal intelligence sharing and operational coordination related to organized crime, including illicit drug trafficking and money laundering. There have been numerous conversations in recent years about using data mining technology to combat crime and terrorism. The Data Detective program from Sentient as a software company, uses data mining techniques to support the police (Sentient, 2017). The goals of this internship are to test out several predictive model solutions and choose the most effective and promising one. First, extensive literature reviews on data mining, crime analysis, and crime data mining were conducted. Sentient offered a 7-year archive of crime statistics that were daily aggregated to produce a univariate dataset. Moreover, a daily incidence type aggregation was performed to produce a multivariate dataset. Each solution's forecast period lasted seven days. Statistical models and neural network models were the two main groups into which the experiments were split. For the crime data, neural networks fared better than statistical models. This study gives a general review of the applied statistics and neural network models. A detailed image of each model's performance on the available data and generalizability is provided by a comparative analysis of all the models on a comparable dataset. Obviously, the studies demonstrated that, in comparison to other models, Gated Recurrent Units (GRU) produced greater prediction. The crime records of 2005-2019 which was collected from Nepal Police headquarter and analysed by R programming. In conclusion, gated recurrent unit implementation could give benefit to police in predicting crime. Hence, time series analysis using GRU could be a prospective additional feature in Data Detective.

Keywords: time series analysis, forecasting, ARIMA, machine learning

Procedia PDF Downloads 164
1053 Physics Informed Deep Residual Networks Based Type-A Aortic Dissection Prediction

Authors: Joy Cao, Min Zhou

Abstract:

Purpose: Acute Type A aortic dissection is a well-known cause of extremely high mortality rate. A highly accurate and cost-effective non-invasive predictor is critically needed so that the patient can be treated at earlier stage. Although various CFD approaches have been tried to establish some prediction frameworks, they are sensitive to uncertainty in both image segmentation and boundary conditions. Tedious pre-processing and demanding calibration procedures requirement further compound the issue, thus hampering their clinical applicability. Using the latest physics informed deep learning methods to establish an accurate and cost-effective predictor framework are amongst the main goals for a better Type A aortic dissection treatment. Methods: Via training a novel physics-informed deep residual network, with non-invasive 4D MRI displacement vectors as inputs, the trained model can cost-effectively calculate all these biomarkers: aortic blood pressure, WSS, and OSI, which are used to predict potential type A aortic dissection to avoid the high mortality events down the road. Results: The proposed deep learning method has been successfully trained and tested with both synthetic 3D aneurysm dataset and a clinical dataset in the aortic dissection context using Google colab environment. In both cases, the model has generated aortic blood pressure, WSS, and OSI results matching the expected patient’s health status. Conclusion: The proposed novel physics-informed deep residual network shows great potential to create a cost-effective, non-invasive predictor framework. Additional physics-based de-noising algorithm will be added to make the model more robust to clinical data noises. Further studies will be conducted in collaboration with big institutions such as Cleveland Clinic with more clinical samples to further improve the model’s clinical applicability.

Keywords: type-a aortic dissection, deep residual networks, blood flow modeling, data-driven modeling, non-invasive diagnostics, deep learning, artificial intelligence.

Procedia PDF Downloads 89
1052 Constructing a Semi-Supervised Model for Network Intrusion Detection

Authors: Tigabu Dagne Akal

Abstract:

While advances in computer and communications technology have made the network ubiquitous, they have also rendered networked systems vulnerable to malicious attacks devised from a distance. These attacks or intrusions start with attackers infiltrating a network through a vulnerable host and then launching further attacks on the local network or Intranet. Nowadays, system administrators and network professionals can attempt to prevent such attacks by developing intrusion detection tools and systems using data mining technology. In this study, the experiments were conducted following the Knowledge Discovery in Database Process Model. The Knowledge Discovery in Database Process Model starts from selection of the datasets. The dataset used in this study has been taken from Massachusetts Institute of Technology Lincoln Laboratory. After taking the data, it has been pre-processed. The major pre-processing activities include fill in missed values, remove outliers; resolve inconsistencies, integration of data that contains both labelled and unlabelled datasets, dimensionality reduction, size reduction and data transformation activity like discretization tasks were done for this study. A total of 21,533 intrusion records are used for training the models. For validating the performance of the selected model a separate 3,397 records are used as a testing set. For building a predictive model for intrusion detection J48 decision tree and the Naïve Bayes algorithms have been tested as a classification approach for both with and without feature selection approaches. The model that was created using 10-fold cross validation using the J48 decision tree algorithm with the default parameter values showed the best classification accuracy. The model has a prediction accuracy of 96.11% on the training datasets and 93.2% on the test dataset to classify the new instances as normal, DOS, U2R, R2L and probe classes. The findings of this study have shown that the data mining methods generates interesting rules that are crucial for intrusion detection and prevention in the networking industry. Future research directions are forwarded to come up an applicable system in the area of the study.

Keywords: intrusion detection, data mining, computer science, data mining

Procedia PDF Downloads 296
1051 Saudi Twitter Corpus for Sentiment Analysis

Authors: Adel Assiri, Ahmed Emam, Hmood Al-Dossari

Abstract:

Sentiment analysis (SA) has received growing attention in Arabic language research. However, few studies have yet to directly apply SA to Arabic due to lack of a publicly available dataset for this language. This paper partially bridges this gap due to its focus on one of the Arabic dialects which is the Saudi dialect. This paper presents annotated data set of 4700 for Saudi dialect sentiment analysis with (K= 0.807). Our next work is to extend this corpus and creation a large-scale lexicon for Saudi dialect from the corpus.

Keywords: Arabic, sentiment analysis, Twitter, annotation

Procedia PDF Downloads 629
1050 Shedding Light on the Black Box: Explaining Deep Neural Network Prediction of Clinical Outcome

Authors: Yijun Shao, Yan Cheng, Rashmee U. Shah, Charlene R. Weir, Bruce E. Bray, Qing Zeng-Treitler

Abstract:

Deep neural network (DNN) models are being explored in the clinical domain, following the recent success in other domains such as image recognition. For clinical adoption, outcome prediction models require explanation, but due to the multiple non-linear inner transformations, DNN models are viewed by many as a black box. In this study, we developed a deep neural network model for predicting 1-year mortality of patients who underwent major cardio vascular procedures (MCVPs), using temporal image representation of past medical history as input. The dataset was obtained from the electronic medical data warehouse administered by Veteran Affairs Information and Computing Infrastructure (VINCI). We identified 21,355 veterans who had their first MCVP in 2014. Features for prediction included demographics, diagnoses, procedures, medication orders, hospitalizations, and frailty measures extracted from clinical notes. Temporal variables were created based on the patient history data in the 2-year window prior to the index MCVP. A temporal image was created based on these variables for each individual patient. To generate the explanation for the DNN model, we defined a new concept called impact score, based on the presence/value of clinical conditions’ impact on the predicted outcome. Like (log) odds ratio reported by the logistic regression (LR) model, impact scores are continuous variables intended to shed light on the black box model. For comparison, a logistic regression model was fitted on the same dataset. In our cohort, about 6.8% of patients died within one year. The prediction of the DNN model achieved an area under the curve (AUC) of 78.5% while the LR model achieved an AUC of 74.6%. A strong but not perfect correlation was found between the aggregated impact scores and the log odds ratios (Spearman’s rho = 0.74), which helped validate our explanation.

Keywords: deep neural network, temporal data, prediction, frailty, logistic regression model

Procedia PDF Downloads 153