Search results for: KaraAgroAI cocoa dataset
924 Automatic Near-Infrared Image Colorization Using Synthetic Images
Authors: Yoganathan Karthik, Guhanathan Poravi
Abstract:
Colorizing near-infrared (NIR) images poses unique challenges due to the absence of color information and the nuances in light absorption. In this paper, we present an approach to NIR image colorization utilizing a synthetic dataset generated from visible light images. Our method addresses two major challenges encountered in NIR image colorization: accurately colorizing objects with color variations and avoiding over/under saturation in dimly lit scenes. To tackle these challenges, we propose a Generative Adversarial Network (GAN)-based framework that learns to map NIR images to their corresponding colorized versions. The synthetic dataset ensures diverse color representations, enabling the model to effectively handle objects with varying hues and shades. Furthermore, the GAN architecture facilitates the generation of realistic colorizations while preserving the integrity of dimly lit scenes, thus mitigating issues related to over/under saturation. Experimental results on benchmark NIR image datasets demonstrate the efficacy of our approach in producing high-quality colorizations with improved color accuracy and naturalness. Quantitative evaluations and comparative studies validate the superiority of our method over existing techniques, showcasing its robustness and generalization capability across diverse NIR image scenarios. Our research not only contributes to advancing NIR image colorization but also underscores the importance of synthetic datasets and GANs in addressing domain-specific challenges in image processing tasks. The proposed framework holds promise for various applications in remote sensing, medical imaging, and surveillance where accurate color representation of NIR imagery is crucial for analysis and interpretation.
Keywords: computer vision, near-infrared images, automatic image colorization, generative adversarial networks, synthetic data
Procedia PDF Downloads 439
923 Machine Learning Techniques for COVID-19 Detection: A Comparative Analysis
Authors: Abeer A. Aljohani
Abstract:
The spread of the COVID-19 virus has been one of the most severe pandemics across the globe. Also referred to as coronavirus, it is a contagious disease that continuously mutates into numerous variants. Currently, the B.1.1.529 variant, labeled Omicron, has been detected in South Africa. The wide spread of COVID-19 has affected many lives and has put exceptional pressure on healthcare systems worldwide. Everyday life and the global economy have also been at stake. This research aims to predict COVID-19 at its initial stage in order to reduce the death count. Machine learning (ML) is nowadays used in almost every area. The large number of COVID-19 cases has placed a huge burden on hospitals as well as health workers. To reduce this burden, this paper predicts COVID-19 based on the symptoms and medical history of the patient. It presents a unique architecture for COVID-19 detection using ML techniques integrated with feature dimensionality reduction. The paper uses a standard UCI dataset comprising the symptoms of 5434 patients. It also compares several supervised ML techniques to the presented architecture. The architecture uses 10-fold cross-validation for generalization and principal component analysis (PCA) for feature reduction. Standard parameters are used to evaluate the proposed architecture, including F1-score, precision, accuracy, recall, receiver operating characteristic (ROC), and area under the curve (AUC). The results show that decision tree, random forest, and neural networks outperform all other state-of-the-art ML techniques. This result can help in effectively identifying COVID-19 infection cases.
Keywords: supervised machine learning, COVID-19 prediction, healthcare analytics, random forest, neural network
Procedia PDF Downloads 92
922 Quality Analysis of Vegetables Through Image Processing
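The 10-fold cross-validation step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `k_fold_indices` and `cross_validate`, the shuffling seed, and the `fit`/`predict` callbacks are assumptions made for illustration, and the PCA feature-reduction step is deliberately left out.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs so that every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # fixed seed keeps the split reproducible
    folds = [idx[i::k] for i in range(k)]     # k folds whose sizes differ by at most one
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def cross_validate(X, y, fit, predict, k=10):
    """Return the mean test accuracy over k folds.

    `fit(X_train, y_train)` returns a model and `predict(model, x)` returns a label;
    both are hypothetical callbacks supplied by the caller.
    """
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)
```

Any classifier can be plugged in through the two callbacks; averaging the per-fold accuracies gives a single estimate of how well the model generalizes to unseen patients.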
Authors: Abdul Khalique Baloch, Ali Okatan
Abstract:
Image-based quality analysis of food and vegetables is currently a hot topic, with researchers improving on previous findings through different techniques and methods. In this research, we reviewed the literature, identified gaps in it, proposed an improved approach, designed the algorithm, and developed software to measure quality from images; we then compared the accuracy of the results with previous work. The application uses an open-source dataset and the Python language with the TensorFlow Lite framework. The research focuses on sorting food and vegetables from images: after processing the images, the application sorts and grades the items, and it could produce fewer errors than manual grading by humans. Digital picture datasets were created, and the collected images were arranged by class. The classification accuracy of the system was about 94%. Because fruits and vegetables play a main role in day-to-day life, their quality is essential in evaluating agricultural produce, and customers always prefer to buy good-quality fruits and vegetables. This document is about quality detection of fruits and vegetables using images. Many customers suffer from unhealthy foods and vegetables delivered by suppliers, and no proper quality measurement level is followed by hotel managements. The developed software measures the quality of fruits and vegetables from images and reports whether they are fresh or rotten. The algorithms reviewed in this thesis include digital images, ResNet, VGG16, CNN, and transfer learning for grading feature extraction. The work also designs a framework for the system.
Keywords: deep learning, computer vision, image processing, rotten fruit detection, fruits quality criteria, vegetables quality criteria
Procedia PDF Downloads 70
921 Machine Learning for Disease Prediction Using Symptoms and X-Ray Images
Authors: Ravija Gunawardana, Banuka Athuraliya
Abstract:
Machine learning has emerged as a powerful tool for disease diagnosis and prediction. The use of machine learning algorithms has the potential to improve the accuracy of disease prediction, thereby enabling medical professionals to provide more effective and personalized treatments. This study focuses on developing a machine-learning model for disease prediction using symptoms and X-ray images. The importance of this study lies in its potential to assist medical professionals in accurately diagnosing diseases, thereby improving patient outcomes. Respiratory diseases are a significant cause of morbidity and mortality worldwide, and chest X-rays are commonly used in the diagnosis of these diseases. However, accurately interpreting X-ray images requires significant expertise and can be time-consuming, making it difficult to diagnose respiratory diseases in a timely manner. By incorporating machine learning algorithms, we can significantly enhance disease prediction accuracy, ultimately leading to better patient care. The study utilized the Mask R-CNN algorithm, which is a state-of-the-art method for object detection and segmentation in images, to process chest X-ray images. The model was trained and tested on a large dataset of patient information, which included both symptom data and X-ray images. The performance of the model was evaluated using a range of metrics, including accuracy, precision, recall, and F1-score. The results showed that the model achieved an accuracy rate of over 90%, indicating that it was able to accurately detect and segment regions of interest in the X-ray images. In addition to X-ray images, the study also incorporated symptoms as input data for disease prediction. The study used three different classifiers, namely Random Forest, K-Nearest Neighbor and Support Vector Machine, to predict diseases based on symptoms. These classifiers were trained and tested using the same dataset of patient information as the X-ray model. 
The results showed promising accuracy rates for predicting diseases using symptoms, with the ensemble learning techniques significantly improving the accuracy of disease prediction. The study's findings indicate that the use of machine learning algorithms can significantly enhance disease prediction accuracy, ultimately leading to better patient care. The model developed in this study has the potential to assist medical professionals in diagnosing respiratory diseases more accurately and efficiently. However, it is important to note that the accuracy of the model can be affected by several factors, including the quality of the X-ray images, the size of the dataset used for training, and the complexity of the disease being diagnosed. In conclusion, the study demonstrated the potential of machine learning algorithms for disease prediction using symptoms and X-ray images. The use of these algorithms can improve the accuracy of disease diagnosis, ultimately leading to better patient care. Further research is needed to validate the model's accuracy and effectiveness in a clinical setting and to expand its application to other diseases.
Keywords: K-nearest neighbor, mask R-CNN, random forest, support vector machine
Procedia PDF Downloads 155
920 Female Autism Spectrum Disorder and Understanding Rigid Repetitive Behaviors
Authors: Erin Micali, Katerina Tolstikova, Cheryl Maykel, Elizabeth Harwood
Abstract:
Female ASD is seldom studied separately from male ASD. Further, females with ASD are disproportionately underrepresented in the research at a rate of 3:1 (male to female). As such, much of the current understanding of female rigid repetitive behaviors (RRBs) stems from research on male RRBs. This can be detrimental to understanding female ASD because this understanding of female RRBs largely discounts female camouflaging and the possibility that females present their autistic symptoms differently. Current literature suggests that females with ASD engage in fewer RRBs than males with ASD and that, when females do engage in RRBs, they are likely to engage in more subtle, less overt obsessions and repetitive behaviors than males. Method: The current study utilized a mixed methods research design to identify the type and frequency of RRBs that females with ASD engaged in by using a cross-sectional design. The researcher recruited only females to be part of the present study, with the criteria that they be at least age six and not have co-occurring cognitive impairment. Results: The researcher collected previous testing data (Autism Diagnostic Interview-Revised (ADI-R), Child or Adolescent/Adult Sensory Profile-2, Autism/Empathy Quotient, Yale Brown Obsessive Compulsive Checklist, Rigid Repetitive Behavior Checklist (evaluator-created list), and demographic questionnaire) from 25 total participants. The participants' ages ranged from six to 52. The participants were 96% Caucasian and 4% Latin American. Qualitative analysis found that the current participant pool engaged in six RRB themes: repetitive behaviors, socially restrictive behaviors, repetitive speech, difficulty with transition, obsessive behaviors, and restricted interests. Participants in the current dataset engaged in socially restrictive behaviors and restrictive interests most frequently. Within the main themes, 40 subthemes were isolated, defined, and analyzed.
Further, preliminary quantitative analysis was run to determine whether age impacted camouflaging behaviors and the overall presentation of RRBs. Within this dataset, no such effect was found. Further qualitative analysis will be run to determine whether this dataset engaged in more overt or subtle RRBs, to confirm or rebut previous research. The researcher intends to run SPSS analysis to determine whether there was a statistical difference between each RRB theme and overall presentation. Secondly, each participant will be analyzed for presentation of RRB, age, and previous diagnoses. Conclusion: The present study aimed to assist in diagnostic clarity. This was achieved by collecting data from a female-only participant pool across the lifespan. The current data helped clarify the types of RRBs that females with ASD engage in. A limited sample size was a barrier in this study.
Keywords: autism spectrum disorder, camouflaging, rigid repetitive behaviors, gender disparity
Procedia PDF Downloads 143
919 Generating Synthetic Chest X-ray Images for Improved COVID-19 Detection Using Generative Adversarial Networks
Authors: Muneeb Ullah, Daishihan, Xiadong Young
Abstract:
Deep learning plays a crucial role in identifying COVID-19 and preventing its spread. To improve the accuracy of COVID-19 diagnoses, it is important to have access to a sufficient number of training images of CXRs (chest X-rays) depicting the disease. However, there is currently a shortage of such images. To address this issue, this paper introduces COVID-19 GAN, a model that uses generative adversarial networks (GANs) to generate realistic CXR images of COVID-19, which can be used to train identification models. Initially, a generator model is created that uses digressive channels to generate images of CXR scans for COVID-19. To differentiate between real and fake disease images, an efficient discriminator is developed by combining the dense connectivity strategy and instance normalization. This approach makes use of their feature extraction capabilities on CXR hazy areas. Lastly, the deep regret gradient penalty technique is utilized to ensure stable training of the model. With the use of 4,062 grape leaf disease images, the Leaf GAN model successfully produces 8,124 COVID-19 CXR images. The COVID-19 GAN model produces COVID-19 CXR images that outperform DCGAN and WGAN in terms of the Fréchet inception distance. Experimental findings suggest that the COVID-19 GAN-generated CXR images possess noticeable haziness, offering a promising approach to address the limited training data available for COVID-19 model training. When the dataset was expanded, CNN-based classification models outperformed other models, yielding higher accuracy rates than those of the initial dataset and other augmentation techniques. Among these models, ImagNet exhibited the best recognition accuracy of 99.70% on the testing set. These findings suggest that the proposed augmentation method is a solution to address overfitting issues in disease identification and can enhance identification accuracy effectively.
Keywords: classification, deep learning, medical images, CXR, GAN
Procedia PDF Downloads 96
918 Influence of HDI in the Spread of RSV Bronchiolitis in Children Aged 0 to 2 Years
Authors: Chloé Kernaléguen, Laura Kundun, Tessie Lery, Ryan Laleg, Zhangyun Tan
Abstract:
This study explores global disparities in respiratory syncytial virus (RSV) bronchiolitis incidence among children aged 0-2 years, focusing on the human development index (HDI) as a key determinant. RSV bronchiolitis poses a significant health risk to young children, influenced by factors, including socio-economic conditions captured by the HDI. Through a comprehensive systematic review and dataset selection (Switzerland, Brazil, United States of America), we formulated an HDI-SEIRS numerical model within the SEIRS framework. Results show variations in RSV bronchiolitis dynamics across countries, emphasizing the influence of HDI. Modelling reveals a correlation between higher HDI and increased bronchiolitis spread, notably in the USA and Switzerland. The ratios HDIcountry over HDImax strengthen this association, while climate disparities contribute to variations, especially in colder climates like the USA and Switzerland. The study raises the hypothesis of an indirect link between higher HDI and more frequent bronchiolitis, underlining the need for nuanced understanding. Factors like improved healthcare access, population density, mobility, and social behaviors in higher HDI countries might contribute to unexpected trends. Limitations include dataset quality and restricted RSV bronchiolitis data. Future research should encompass diverse HDI datasets to refine HDI's role in bronchiolitis dynamics. In conclusion, HDI-SEIRS models offer insights into factors influencing RSV bronchiolitis spread. While HDI is a significant indicator, its impact is indirect, necessitating a holistic approach to effective public health policies. This analysis sets the stage for further investigations into multifaceted interactions shaping bronchiolitis dynamics in diverse socio-economic contexts.
Keywords: bronchiolitis propagation, HDI influence, respiratory syncytial virus, SEIRS model
Procedia PDF Downloads 67
917 Strategies for Synchronizing Chocolate Conching Data Using Dynamic Time Warping
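To make the SEIRS framework named in this abstract concrete, here is a minimal forward-Euler sketch of a standard SEIRS compartment model (susceptible, exposed, infectious, recovered, with waning immunity). The abstract does not say how HDI enters the authors' HDI-SEIRS model, so scaling the transmission rate by the ratio HDIcountry/HDImax is purely an assumption for illustration; the function name `simulate_seirs` and every parameter value used with it are likewise hypothetical.

```python
def simulate_seirs(beta, sigma, gamma, omega, hdi_ratio, days,
                   dt=0.1, n=1_000_000.0, i0=10.0):
    """Integrate a basic SEIRS model with forward Euler steps.

    beta  : transmission rate        sigma : 1 / latent period
    gamma : 1 / infectious period    omega : 1 / duration of immunity
    hdi_ratio : HDI_country / HDI_max, used here to scale beta.
                ASSUMPTION: this coupling is illustrative, not the paper's model.
    Returns a list of (S, E, I, R) tuples, one per time step.
    """
    s, e, i, r = n - i0, 0.0, i0, 0.0
    beta_eff = beta * hdi_ratio               # hypothetical HDI coupling
    history = [(s, e, i, r)]
    for _ in range(round(days / dt)):
        new_exposed = beta_eff * s * i / n    # S -> E flow
        ds = -new_exposed + omega * r         # recovered lose immunity (R -> S)
        de = new_exposed - sigma * e          # E -> I after the latent period
        di = sigma * e - gamma * i            # I -> R on recovery
        dr = gamma * i - omega * r
        s, e, i, r = s + dt * ds, e + dt * de, i + dt * di, r + dt * dr
        history.append((s, e, i, r))
    return history
```

Because every outflow of one compartment is an inflow of another, the total population stays constant, which is a convenient sanity check on any implementation of this kind.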
Authors: Fernanda A. P. Peres, Thiago N. Peres, Flavio S. Fogliatto, Michel J. Anzanello
Abstract:
Batch processes are widely used in food industry and have an important role in the production of high added value products, such as chocolate. Process performance is usually described by variables that are monitored as the batch progresses. Data arising from these processes are likely to display a strong correlation-autocorrelation structure, and are usually monitored using control charts based on multiway principal components analysis (MPCA). Process control of a new batch is carried out comparing the trajectories of its relevant process variables with those in a reference set of batches that yielded products within specifications; it is clear that proper determination of the reference set is key for the success of a correct signalization of non-conforming batches in such quality control schemes. In chocolate manufacturing, misclassifications of non-conforming batches in the conching phase may lead to significant financial losses. In such context, the accuracy of process control grows in relevance. In addition to that, the main assumption in MPCA-based monitoring strategies is that all batches are synchronized in duration, both the new batch being monitored and those in the reference set. Such assumption is often not satisfied in chocolate manufacturing process. As a consequence, traditional techniques as MPCA-based charts are not suitable for process control and monitoring. To address that issue, the objective of this work is to compare the performance of three dynamic time warping (DTW) methods in the alignment and synchronization of chocolate conching process variables’ trajectories, aimed at properly determining the reference distribution for multivariate statistical process control. The power of classification of batches in two categories (conforming and non-conforming) was evaluated using the k-nearest neighbor (KNN) algorithm. 
Real data from a milk chocolate conching process was collected and the following variables were monitored over time: frequency of soybean lecithin dosage, rotation speed of the shovels, current of the main motor of the conche, and chocolate temperature. A set of 62 batches with durations between 495 and 1,170 minutes was considered; 53% of the batches were known to be conforming based on lab test results and experts’ evaluations. Results showed that all three DTW methods tested were able to align and synchronize the conching dataset. However, the synchronized datasets obtained from these methods performed differently when used as input to the KNN classification algorithm. Kassidas, MacGregor and Taylor’s (named KMT) method was deemed the best DTW method for aligning and synchronizing a milk chocolate conching dataset, presenting 93.7% accuracy, 97.2% sensitivity and 90.3% specificity in batch classification, and was considered the best option to determine the reference set for the milk chocolate dataset. This method was recommended because it required the lowest number of iterations to achieve convergence and had the highest average accuracy in the testing portion using the KNN classification technique.
Keywords: batch process monitoring, chocolate conching, dynamic time warping, reference set distribution, variable duration
Procedia PDF Downloads 167
916 Using Machine Learning to Classify Different Body Parts and Determine Healthiness
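The idea of aligning batches of different durations with dynamic time warping can be sketched as follows. This is the textbook DTW recurrence on a single variable with an absolute-difference cost, not the KMT method or either of the other two variants compared in the paper; the function names `dtw` and `synchronize` and the averaging rule are assumptions chosen for illustration.

```python
def dtw(ref, query):
    """Classic DTW between two 1-D sequences; returns (total cost, warping path)."""
    n, m = len(ref), len(query)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning ref[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - query[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end to recover which samples were matched.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda step: step[0])
    path.reverse()
    return cost[n][m], path


def synchronize(ref, query):
    """Resample `query` onto the time axis of `ref` by averaging matched samples."""
    _, path = dtw(ref, query)
    buckets = [[] for _ in ref]
    for i, j in path:
        buckets[i].append(query[j])
    return [sum(b) / len(b) for b in buckets]
```

After synchronization every batch trajectory has the same length as the reference, which is the precondition the abstract cites for MPCA-based monitoring and for feeding batches to a KNN classifier.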
Authors: Zachary Pan
Abstract:
Our general mission is to solve the problem of classifying images into different body part types and deciding if each of them is healthy or not. However, for now, we will determine healthiness for only one-sixth of the body parts, specifically the chest. We will detect pneumonia in X-ray scans of those chest images. With this type of AI, doctors can use it as a second opinion when they are taking CT or X-ray scans of their patients. Another advantage of using this machine learning classifier is that it has no human weaknesses like fatigue. The overall approach to this problem is to split the problem into two parts: first, classify the image, then determine if it is healthy. In order to classify the image into a specific body part class, the body parts dataset must be split into test and training sets. We can then use many models, like neural networks or logistic regression models, and fit them using the training set. Now, using the test set, we can obtain a realistic accuracy the models will have on images in the real world since these testing images have never been seen by the models before. In order to increase this testing accuracy, we can also apply many complex algorithms to the models, like multiplicative weight update. For the second part of the problem, to determine if the body part is healthy, we can have another dataset consisting of healthy and non-healthy images of the specific body part and once again split that into the test and training sets. We then use another neural network to train on those training set images and use the testing set to figure out its accuracy. We will do this process only for the chest images. A major conclusion reached is that convolutional neural networks are the most reliable and accurate at image classification.
In classifying the images, the logistic regression model, the neural network, neural networks with multiplicative weight update, neural networks with the black box algorithm, and the convolutional neural network achieved 96.83 percent accuracy, 97.33 percent accuracy, 97.83 percent accuracy, 96.67 percent accuracy, and 98.83 percent accuracy, respectively. On the other hand, the overall accuracy of the model that determines if the images are healthy or not is around 78.37 percent accuracy.
Keywords: body part, healthcare, machine learning, neural networks
Procedia PDF Downloads 103
915 Detecting Hate Speech And Cyberbullying Using Natural Language Processing
Authors: Nádia Pereira, Paula Ferreira, Sofia Francisco, Sofia Oliveira, Sidclay Souza, Paula Paulino, Ana Margarida Veiga Simão
Abstract:
Social media has progressed into a platform for hate speech among its users, and thus, there is an increasing need to develop automatic detection classifiers of offense and conflicts to help decrease the prevalence of such incidents. Online communication can be used to intentionally harm someone, which is why such classifiers could be essential in social networks. A possible application of these classifiers is the automatic detection of cyberbullying. Even though identifying the aggressive language used in online interactions could be important to build cyberbullying datasets, there are other criteria that must be considered. Being able to capture the language, which is indicative of the intent to harm others in a specific context of online interaction is fundamental. Offense and hate speech may be the foundation of online conflicts, which have become commonly used in social media and are an emergent research focus in machine learning and natural language processing. This study presents two Portuguese language offense-related datasets which serve as examples for future research and extend the study of the topic. The first is similar to other offense detection related datasets and is entitled Aggressiveness dataset. The second is a novelty because of the use of the history of the interaction between users and is entitled the Conflicts/Attacks dataset. Both datasets were developed in different phases. Firstly, we performed a content analysis of verbal aggression witnessed by adolescents in situations of cyberbullying. Secondly, we computed frequency analyses from the previous phase to gather lexical and linguistic cues used to identify potentially aggressive conflicts and attacks which were posted on Twitter. Thirdly, thorough annotation of real tweets was performed by independent postgraduate educational psychologists with experience in cyberbullying research.
Lastly, we benchmarked these datasets with other machine learning classifiers.
Keywords: aggression, classifiers, cyberbullying, datasets, hate speech, machine learning
Procedia PDF Downloads 228
914 Data Mining Model for Predicting the Status of HIV Patients during Drug Regimen Change
Authors: Ermias A. Tegegn, Million Meshesha
Abstract:
Human Immunodeficiency Virus and Acquired Immunodeficiency Syndrome (HIV/AIDS) is a major cause of death in most African countries. Ethiopia is one of the seriously affected countries in sub-Saharan Africa. Previously in Ethiopia, having HIV/AIDS was almost equivalent to a death sentence. With the introduction of Antiretroviral Therapy (ART), HIV/AIDS has become a chronic but manageable disease. The study focused on a data mining technique to predict the future living status of HIV/AIDS patients at the time of drug regimen change, when patients develop toxicity to the ART drug combination they are currently taking. The data were taken from the University of Gondar Hospital ART program database. A hybrid methodology was followed to explore the application of data mining to the ART program dataset. Data cleaning, handling of missing values, and data transformation were used to preprocess the data. WEKA 3.7.9 data mining tools, classification algorithms, and expertise were utilized as means to address the research problem. By using four different classification algorithms (J48 classifier, PART rule induction, Naïve Bayes, and neural network) and adjusting their parameters, thirty-two models were built on the pre-processed University of Gondar ART program dataset. The performance of the models was evaluated using the standard metrics of accuracy, precision, recall, and F-measure. The most effective model for predicting the status of HIV patients with drug regimen substitution is the pruned J48 decision tree, with a classification accuracy of 98.01%. This study extracts interesting attributes such as Ever taking Cotrim, Ever taking TbRx, CD4 count, Age, Weight, and Gender to predict the status of drug regimen substitution. The outcome of this study can be used as an assistive tool to help clinicians make more appropriate drug regimen substitutions.
Future research directions are proposed for developing an applicable system in the area of the study.
Keywords: HIV drug regimen, data mining, hybrid methodology, predictive model
Procedia PDF Downloads 142
913 Lake Water Surface Variations and Its Influencing Factors in Tibetan Plateau in Recent 10 Years
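The evaluation metrics named in this abstract (accuracy, precision, recall, and F-measure) follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name `classification_metrics` and the choice to return 0.0 for an empty denominator are assumptions, and the counts in any call are made up rather than taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives, and true negatives.
    """
    def ratio(num, den):
        return num / den if den else 0.0      # assumed convention for empty classes

    precision = ratio(tp, tp + fp)            # share of predicted positives that are right
    recall = ratio(tp, tp + fn)               # share of actual positives that are found
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }
```

Reporting precision and recall alongside accuracy matters when the classes are imbalanced, since a model can reach high accuracy by favoring the majority class.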
Authors: Shanlong Lu, Jiming Jin, Xiaochun Wang
Abstract:
The Tibetan Plateau has the largest number of inland lakes with the highest elevation on the planet. These large lakes are mostly in a natural state and are less affected by human activities. Their shrinking or expansion can truly reflect regional climate and environmental changes, and they are sensitive indicators of global climate change. However, because the plateau is sparsely populated and its natural conditions are poor, it is difficult to obtain lake change data effectively, which has limited understanding of the temporal and spatial processes of lake water changes and their influencing factors. Using MODIS (Moderate Resolution Imaging Spectroradiometer) MOD09Q1 surface reflectance images as basic data, this study produced an 8-day lake water surface dataset of the Tibetan Plateau from 2000 to 2012 at 250 m spatial resolution, with a lake water surface extraction method that combines lake water surface boundary buffer analysis with lake-by-lake segmentation threshold determination. Then, based on the dataset, the lake water surface variations and their influencing factors were analyzed, using 4 typical natural geographical zones (Eastern Qinghai and Qilian, Southern Qinghai, Qiangtang, and Southern Tibet) and the watersheds of the top 10 lakes (Qinghai, Siling Co, Namco, Zhari NamCo, Tangra Yumco, Ngoring, UlanUla, Yamdrok Tso, Har and Gyaring) as the analysis units. The accuracy analysis indicates that, compared with water surface data of the 134 sample lakes extracted from 30 m Landsat TM (Thematic Mapper) images, the average overall accuracy of the lake water surface dataset is 91.81%, with average commission and omission errors of 3.26% and 5.38%, respectively; the results also show a strong linear correlation (R² = 0.9991) with the global MODIS water mask dataset, with an overall accuracy of 86.30%; and the lake area difference between the Second National Lake Survey and this study is only 4.74%.
This study provides a reliable dataset for lake change research on the plateau in the recent decade. The analysis of change trends and influencing factors indicates that the total water surface area of lakes in the plateau showed overall increases, but only lakes with areas larger than 10 km² had statistically significant increases. Furthermore, lakes with areas larger than 100 km² experienced an abrupt change in 2005. In addition, the annual average precipitation of Southern Tibet and Southern Qinghai experienced significant increasing and decreasing trends, and corresponding abrupt changes in 2004 and 2006, respectively. The annual average temperature of Southern Tibet and Qiangtang showed a significant increasing trend with an abrupt change in 2004. The major reason for the lake water surface variation in Eastern Qinghai and Qilian, Southern Qinghai and Southern Tibet is the change in precipitation, and that for Qiangtang is the temperature variation.
Keywords: lake water surface variation, MODIS MOD09Q1, remote sensing, Tibetan Plateau
Procedia PDF Downloads 231
912 Heart Rate Variability Analysis for Early Stage Prediction of Sudden Cardiac Death
Authors: Reeta Devi, Hitender Kumar Tyagi, Dinesh Kumar
Abstract:
In present scenario, cardiovascular problems are growing challenge for researchers and physiologists. As heart disease have no geographic, gender or socioeconomic specific reasons; detecting cardiac irregularities at early stage followed by quick and correct treatment is very important. Electrocardiogram is the finest tool for continuous monitoring of heart activity. Heart rate variability (HRV) is used to measure naturally occurring oscillations between consecutive cardiac cycles. Analysis of this variability is carried out using time domain, frequency domain and non-linear parameters. This paper presents HRV analysis of the online dataset for normal sinus rhythm (taken as healthy subject) and sudden cardiac death (SCD subject) using all three methods computing values for parameters like standard deviation of node to node intervals (SDNN), square root of mean of the sequences of difference between adjacent RR intervals (RMSSD), mean of R to R intervals (mean RR) in time domain, very low-frequency (VLF), low-frequency (LF), high frequency (HF) and ratio of low to high frequency (LF/HF ratio) in frequency domain and Poincare plot for non linear analysis. To differentiate HRV of healthy subject from subject died with SCD, k –nearest neighbor (k-NN) classifier has been used because of its high accuracy. Results show highly reduced values for all stated parameters for SCD subjects as compared to healthy ones. As the dataset used for SCD patients is recording of their ECG signal one hour prior to their death, it is therefore, verified with an accuracy of 95% that proposed algorithm can identify mortality risk of a patient one hour before its death. The identification of a patient’s mortality risk at such an early stage may prevent him/her meeting sudden death if in-time and right treatment is given by the doctor.Keywords: early stage prediction, heart rate variability, linear and non-linear analysis, sudden cardiac death
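The three time-domain parameters listed in this abstract can be computed from a series of RR intervals in a few lines. This is a minimal sketch using the conventional definitions of mean RR, SDNN, and RMSSD; the function name `time_domain_hrv`, the millisecond unit, and the use of the sample (n - 1) standard deviation are assumptions, and the frequency-domain and Poincaré analyses are not covered.

```python
import math
import statistics


def time_domain_hrv(rr_ms):
    """Return mean RR, SDNN, and RMSSD for a list of RR intervals in milliseconds."""
    # Successive differences between adjacent RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    squared = [d * d for d in diffs]
    return {
        "mean_rr": statistics.fmean(rr_ms),
        "sdnn": statistics.stdev(rr_ms),               # sample standard deviation (assumed)
        "rmssd": math.sqrt(statistics.fmean(squared)), # root mean square of the differences
    }
```

SDNN reflects overall variability across the recording, while RMSSD emphasizes beat-to-beat changes, so the two values can diverge on the same recording.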
Procedia PDF Downloads 342
911 In-House Enzyme Blends from Polyporus ciliatus CBS 366.74 for Enzymatic Saccharification of Pretreated Corn Stover
Authors: Joseph A. Bentil, Anders Thygesen, Lene Langea, Moses Mensah, Anne Meyer
Abstract:
The study investigated the saccharification potential of in-house enzymes produced from a white-rot basidiomycete strain, Polyporus ciliatus CBS 366.74. The in-house enzymes were produced by growing the fungus on mono and composite substrates of cocoa pod husk (CPH) and green seaweed (GS) (Ulva lactuca sp.) with and without the addition of 25mM ammonium nitrate at 4%w/v substrate concentration in submerged condition for 144 hours. The crude enzyme extracts preparations (CEE 1-5 and CEE 1-5+AN) obtained from the fungal cultivation process were sterile-filtered and used as enzyme sources for enzymatic hydrolysis of hydrothermally pretreated corn stover using a commercial cocktail enzyme, Cellic Ctec3, as benchmark. The hydrolysis was conducted at 50ᵒC with 50mM sodium acetate buffer, pH 5 based on enzyme dosages of 5 and 10 CMCase Units/g biomass at 1%w/v dry weight substrate concentration at time points of 6, 24, and 72 hours. The enzyme activity profile of the in-house enzymes varied among the growth substrates with the composite substrates (50-75% GS and AN inclusion), yielding better enzyme activities, especially endoglucanases (0.4-0.5U/mL), β-glucosidases (0.1-0.2 U/mL), and xylanases (3-10 U/mL). However, nitrogen supplementation had no significant effect on enzyme activities of crude extracts from 100% GS substituted substrates. From the enzymatic hydrolysis, it was observed that the in-house enzymes were capable of hydrolysing the pretreated corn stover at varying degrees; however, the saccharification yield was less than 10%. Consequently, theoretical glucose yield was ten times lower than Cellic Ctec3 at both dosage levels. There was no linear correlation between glucose yield and enzyme dosage for the in-house enzymes, unlike the benchmark enzyme. 
It is therefore recommended that the in-house enzymes be used to complement the dosage of commercial enzymes to reduce the cost of biomass saccharification.
Keywords: enzyme production, hydrolysis yield, feedstock, enzyme blend, Polyporus ciliatus
Procedia PDF Downloads 267
910 Effective Survey Designing for Conducting Opinion Survey to Follow Participatory Approach in a Study of Transport Infrastructure Projects: A Case Study of the City of Kolkata
Authors: Jayanti De
Abstract:
Users of any urban road infrastructure may be classified into various categories. This paper examines whether opinions on different environmental and transportation criteria vary significantly among different types of road users. It addresses this issue using unique survey data collected from Kolkata, a highly populated city in India. Multiple criteria should be taken into account when planning infrastructure development programs. Given limited resources, a welfare-maximizing government typically resorts to public opinion by designing surveys to prioritize one project over another. Designing such surveys can be challenging and costly. Deciding whom to include in a survey and how to represent each group of consumers/road users depends crucially on how opinions on different criteria vary across consumer groups. A unique dataset has been collected from various parts of Kolkata to test statistically (using the Kolmogorov-Smirnov test) whether the weights assigned to rank transportation criteria such as congestion, air pollution, noise pollution, and morning/evening delay vary significantly across the various groups of users of such infrastructure. The consumer/user groups in the dataset include pedestrians, private car owners, para-transit (taxi/auto rickshaw) users, public transport (bus) users, and freight transporters, among others. Very little evidence has been found that the ranking of different criteria varies significantly among these groups. This also supports the hypothesis that road users/consumers form their opinions from their long-run rather than immediate experience. 
As a policy prescription, this implies that under-representation or over-representation of a specific consumer group in a survey may not necessarily distort the overall opinion, since opinions across different consumer groups are highly correlated, as is evident from this particular case study.
Keywords: multi criteria analysis, project-prioritization, road-users, survey designing
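The abstract names the Kolmogorov-Smirnov test for comparing user groups. As a hedged illustration, the sketch below computes only the two-sample KS statistic (the largest gap between two empirical distribution functions), not its p-value. The function name `ks_statistic` and the weight values are hypothetical and are not taken from the survey.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all data points."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    for x in a + b:
        # Empirical CDFs: fraction of each sample that is <= x.
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Made-up criterion weights from two user groups, for illustration only.
pedestrians = [1, 2, 3, 4]
car_owners = [3, 4, 5, 6]
print(ks_statistic(pedestrians, car_owners))
```

A small statistic means the two groups' weight distributions are similar, which is the pattern the abstract reports; a significance decision would additionally require the KS null distribution.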
Procedia PDF Downloads 280
909 Comparison Of Virtual Non-Contrast To True Non-Contrast Images Using Dual Layer Spectral Computed Tomography
Authors: O’Day Luke
Abstract:
Purpose: To validate virtual non-contrast reconstructions generated from dual-layer spectral computed tomography (DL-CT) data as an alternative to the acquisition of a dedicated true non-contrast dataset during multiphase contrast studies. Material and methods: Thirty-three patients underwent a routine multiphase clinical CT examination, using dual-layer spectral CT, from March to August 2021. True non-contrast (TNC) and virtual non-contrast (VNC) datasets, the latter generated from both portal venous and arterial phase imaging, were evaluated. For every patient, in both the true and virtual non-contrast datasets, a region of interest (ROI) was defined in the aorta, liver, fluid (i.e. gallbladder, urinary bladder), kidney, muscle, fat, and spongious bone, resulting in 693 ROIs. Differences in attenuation between VNC and TNC images were compared, both separately and combined. Consistency between VNC reconstructions obtained from the arterial and portal venous phases was evaluated. Results: Comparison of CT density (HU) on the VNC and TNC images showed a high correlation. The mean difference between TNC and VNC images (excluding bone results) was 5.5 ± 9.1 HU, and > 90% of all comparisons showed a difference of less than 15 HU. For all tissues but spongious bone, the mean absolute difference between TNC and VNC images was below 10 HU. VNC images derived from the arterial and the portal venous phase showed a good correlation in most tissue types. The aortic attenuation, however, was somewhat dependent on which dataset was used for reconstruction. Bone evaluation with VNC datasets continues to be a problem, as spectral CT algorithms are currently poor at differentiating bone and iodine. Conclusion: Given the increasing availability of DL-CT and the proven accuracy of virtual non-contrast processing, VNC is a promising tool for generating additional data during routine contrast-enhanced studies. 
This study shows the utility of virtual non-contrast scans as an alternative for true non-contrast studies during multiphase CT, with potential for dose reduction, without loss of diagnostic information.
Keywords: dual-layer spectral computed tomography, virtual non-contrast, true non-contrast, clinical comparison
Procedia PDF Downloads 141
908 Off-Line Text-Independent Arabic Writer Identification Using Optimum Codebooks
Authors: Ahmed Abdullah Ahmed
Abstract:
The task of recognizing the writer of a handwritten text has been an attractive research problem in the document analysis and recognition community, with applications in handwriting forensics, paleography, document examination, and handwriting recognition. This research presents an automatic method for writer recognition from digitized images of unconstrained writings. Although previous studies have made a great effort to devise various methods, their performance, especially in terms of accuracy, has fallen short, and there is still wide room for improvement. The proposed technique employs optimal codebook-based writer characterization, where each writing sample is represented by a set of features computed from two codebooks, beginning and ending. Unlike most classical codebook-based approaches, which segment the writing into graphemes, this study fragments particular areas of the writing, namely the beginning and ending strokes. The proposed method starts with contour detection to extract significant information from the handwriting; curve fragmentation is then employed to divide the beginning and ending zones of the handwriting into small fragments. Similar fragments of beginning strokes are grouped together to create the beginning cluster, and similarly, the ending strokes are grouped to create the ending cluster. These two clusters lead to the development of two codebooks (beginning and ending) by choosing the center of every group of similar fragments. Writings under study are then represented by computing the probability of occurrence of codebook patterns. The probability distribution is used to characterize each writer. Two writings are then compared by computing distances between their respective probability distributions. The evaluations were carried out on the ICFHR standard dataset of 206 writers using the beginning and ending codebooks separately. 
Finally, the Ending codebook achieved the highest identification rate of 98.23%, which is the best result so far on the ICFHR dataset.
Keywords: off-line text-independent writer identification, feature extraction, codebook, fragments
Procedia PDF Downloads 512
907 Census and Mapping of Oil Palms Over Satellite Dataset Using Deep Learning Model
Authors: Gholba Niranjan Dilip, Anil Kumar
Abstract:
Accurate, reliable mapping of oil palm plantations and a census of individual palm trees is a huge challenge. This study addresses this challenge and develops an optimized solution that applies deep learning techniques to remote sensing data. The oil palm is a very important tropical crop. To improve its productivity and land management, it is imperative to have an accurate census over large areas. Since a manual census is costly and prone to approximations, a methodology for automated census using panchromatic images from the Cartosat-2, SkySat, and WorldView-3 satellites is demonstrated. Two different study sites in Indonesia were selected. A customized set of training data and ground-truth data was created for this study from Cartosat-2 images. The pre-trained Single Shot MultiBox Detector (SSD) Lite MobileNet V2 Convolutional Neural Network (CNN) model from the TensorFlow Object Detection API was subjected to transfer learning on this customized dataset. The SSD model is able to generate bounding boxes for each oil palm and to count the palms with good accuracy on the panchromatic images. The detection yielded an F-score of 83.16% on seven different images. The detections are buffered and dissolved to generate polygons demarcating the boundaries of the oil palm plantations. This provided the area under the plantations and maps of their location, thereby completing the automated census with a fairly high accuracy (≈100%). The trained CNN was found competent enough to detect oil palm crowns in images obtained from multiple satellite sensors and of varying temporal vintage. It helped to estimate the increase in oil palm plantations from 2014 to 2021 in the study area. The study showed that high-resolution panchromatic satellite images can successfully be used to undertake a census of oil palm plantations using CNNs.
Keywords: object detection, oil palm tree census, panchromatic images, single shot multibox detector
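The abstract reports detection quality as an F-score. As a minimal sketch of how such a score is usually derived from detection counts, the function below computes precision, recall, and the balanced F-score (F1). The function name `detection_scores` and the counts are hypothetical; they do not reproduce the study's 83.16% figure, and the abstract does not state which F-score variant was used.

```python
def detection_scores(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from detection counts."""
    detected = true_pos + false_pos   # all boxes the detector produced
    actual = true_pos + false_neg     # all palms present in the ground truth
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Made-up counts, for illustration only.
print(detection_scores(true_pos=80, false_pos=20, false_neg=20))
```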
Procedia PDF Downloads 160
906 Integrating Radar Sensors with an Autonomous Vehicle Simulator for an Enhanced Smart Parking Management System
Authors: Mohamed Gazzeh, Bradley Null, Fethi Tlili, Hichem Besbes
Abstract:
The burgeoning global ownership of personal vehicles has placed a significant strain on urban infrastructure, notably parking facilities, leading to traffic congestion and environmental concerns. Effective parking management systems (PMS) are indispensable for optimizing urban traffic flow and reducing emissions. The most commonly deployed systems nowadays rely on computer vision technology. This paper explores the integration of radar sensors and simulation in the context of smart parking management. We concentrate on radar sensors due to their versatility and utility in automotive applications, which extends to PMS. Additionally, radar sensors play a crucial role in driver assistance systems and autonomous vehicle development. However, the resource-intensive nature of radar data collection for algorithm development and testing necessitates innovative solutions. Simulation, particularly with the monoDrive simulator, an internal development tool used by NI, the Test and Measurement division of Emerson, offers a practical means to overcome this challenge. The primary objectives of this study are to simulate radar sensors to generate a substantial dataset for algorithm development and testing and, critically, to assess the transferability of models between simulated and real radar data. We focus on occupancy detection in parking as a practical use case, categorizing each parking space as vacant or occupied. The simulation approach using monoDrive enables algorithm validation and reliability assessment for virtual radar sensors. We meticulously designed various parking scenarios, which involved manual measurements of parking spot coordinates and orientations and the use of the TI AWR1843 radar. To create a diverse dataset, we generated 4950 scenarios, comprising a total of 455,400 parking spots. 
This extensive dataset encompasses radar configuration details, ground truth occupancy information, radar detections, and associated object attributes such as range, azimuth, elevation, radar cross-section, and velocity data. The paper also addresses the intricacies and challenges of real-world radar data collection, highlighting the advantages of simulation in producing radar data for parking lot applications. We developed classification models based on Support Vector Machines (SVM) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), exclusively trained and evaluated on simulated data. Subsequently, we applied these models to real-world data, comparing their performance against the monoDrive dataset. The study demonstrates the feasibility of transferring models from a simulated environment to real-world applications, achieving an impressive accuracy score of 92% using only one radar sensor. This finding underscores the potential of radar sensors and simulation in the development of smart parking management systems, offering significant benefits for improving urban mobility and reducing environmental impact. The integration of radar sensors and simulation represents a promising avenue for enhancing smart parking management systems, addressing the challenges posed by the exponential growth in personal vehicle ownership. This research contributes valuable insights into the practicality of using simulated radar data in real-world applications and underscores the role of radar technology in advancing urban sustainability.Keywords: autonomous vehicle simulator, FMCW radar sensors, occupancy detection, smart parking management, transferability of models
Procedia PDF Downloads 81
905 Personalizing Human Physical Life Routines Recognition over Cloud-based Sensor Data via AI and Machine Learning
Authors: Kaushik Sathupadi, Sandesh Achar
Abstract:
Pervasive computing is a growing research field that aims to recognize human physical life routines (HPLR) based on body-worn sensors such as MEMS sensor-based technologies. The use of these technologies for human activity recognition is progressively increasing. On the other hand, personalizing human life routines using numerous machine-learning techniques has always been an intriguing topic. Various methods have demonstrated the ability to recognize basic movement patterns. However, they still need to be improved to anticipate the dynamics of human living patterns. This study introduces state-of-the-art techniques for recognizing static and dynamic patterns and forecasting those challenging activities from multi-fused sensors. Furthermore, numerous MEMS signals are extracted from one self-annotated IM-WSHA dataset and two benchmarked datasets. First, the acquired raw data is filtered with z-normalization and denoising methods. Then, we adopted statistical, local binary pattern, auto-regressive model, and intrinsic time scale decomposition features for feature extraction from different domains. Next, the acquired features are optimized using maximum relevance and minimum redundancy (mRMR). Finally, an artificial neural network is applied to analyze the whole system's performance. As a result, we attained a 90.27% recognition rate for the self-annotated dataset, while the HARTH and KU-HAR datasets achieved 83% on nine living activities and 90.94% on 18 static and dynamic routines. Thus, the proposed HPLR system outperformed other state-of-the-art systems when evaluated against other methods in the literature.
Keywords: artificial intelligence, machine learning, gait analysis, local binary pattern (LBP), statistical features, micro-electro-mechanical systems (MEMS), maximum relevance and minimum redundancy (MRMR)
Procedia PDF Downloads 21
904 A Study on the Application of Machine Learning and Deep Learning Techniques for Skin Cancer Detection
Authors: Hritwik Ghosh, Irfan Sadiq Rahat, Sachi Nandan Mohanty, J. V. R. Ravindra
Abstract:
In the rapidly evolving landscape of medical diagnostics, the early detection and accurate classification of skin cancer remain paramount for effective treatment outcomes. This research delves into the transformative potential of Artificial Intelligence (AI), specifically Deep Learning (DL), as a tool for discerning and categorizing various skin conditions. Utilizing a diverse dataset of 3,000 images representing nine distinct skin conditions, we confront the inherent challenge of class imbalance. This imbalance, where conditions like melanomas are over-represented, is addressed by incorporating class weights during the model training phase, ensuring an equitable representation of all conditions in the learning process. Our pioneering approach introduces a hybrid model, amalgamating the strengths of two renowned Convolutional Neural Networks (CNNs), VGG16 and ResNet50. These networks, pre-trained on the ImageNet dataset, are adept at extracting intricate features from images. By synergizing these models, our research aims to capture a holistic set of features, thereby bolstering classification performance. Preliminary findings underscore the hybrid model's superiority over individual models, showcasing its prowess in feature extraction and classification. Moreover, the research emphasizes the significance of rigorous data pre-processing, including image resizing, color normalization, and segmentation, in ensuring data quality and model reliability. In essence, this study illuminates the promising role of AI and DL in revolutionizing skin cancer diagnostics, offering insights into its potential applications in broader medical domains.Keywords: artificial intelligence, machine learning, deep learning, skin cancer, dermatology, convolutional neural networks, image classification, computer vision, healthcare technology, cancer detection, medical imaging
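The abstract says class imbalance is handled with class weights during training but does not give the formula. The sketch below shows one common choice, inverse-frequency weighting of the form n_samples / (n_classes * class_count), which is the heuristic behind scikit-learn's "balanced" option. It is an assumption, not the authors' stated method; the function name `balanced_class_weights` and the labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Map each class to n_samples / (n_classes * count_of_that_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Frequent classes get weights below 1, rare classes above 1.
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up labels: three melanoma images and one nevus image.
print(balanced_class_weights(["melanoma", "melanoma", "melanoma", "nevus"]))
```

Weights of this kind are typically passed to the training loss so that errors on rare classes count for more than errors on over-represented ones.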
Procedia PDF Downloads 87
903 PsyVBot: Chatbot for Accurate Depression Diagnosis using Long Short-Term Memory and NLP
Authors: Thaveesha Dheerasekera, Dileeka Sandamali Alwis
Abstract:
The escalating prevalence of mental health issues, such as depression and suicidal ideation, is a matter of significant global concern. It is plausible that a variety of factors, such as life events, social isolation, and preexisting physiological or psychological health conditions, could instigate or exacerbate these conditions. Traditional approaches to diagnosing depression entail a considerable amount of time and necessitate the involvement of adept practitioners. This underscores the necessity for automated systems capable of promptly detecting and diagnosing symptoms of depression. The PsyVBot system employs sophisticated natural language processing and machine learning methodologies, including the use of the NLTK toolkit for dataset preprocessing and the utilization of a Long Short-Term Memory (LSTM) model. The PsyVBot exhibits a remarkable ability to diagnose depression with a 94% accuracy rate through the analysis of user input. Consequently, this resource proves to be efficacious for individuals, particularly those enrolled in academic institutions, who may encounter challenges pertaining to their psychological well-being. The PsyVBot employs a Long Short-Term Memory (LSTM) model that comprises a total of three layers, namely an embedding layer, an LSTM layer, and a dense layer. The stratification of these layers facilitates a precise examination of linguistic patterns that are associated with the condition of depression. The PsyVBot has the capability to accurately assess an individual's level of depression through the identification of linguistic and contextual cues. The task is achieved via a rigorous training regimen, which is executed by utilizing a dataset comprising information sourced from the subreddit r/SuicideWatch. The diverse data present in the dataset ensures precise and delicate identification of symptoms linked with depression, thereby guaranteeing accuracy. 
PsyVBot not only possesses diagnostic capabilities but also enhances the user experience through the utilization of audio outputs. This feature enables users to engage in more captivating and interactive interactions. The PsyVBot platform offers individuals the opportunity to conveniently diagnose mental health challenges through a confidential and user-friendly interface. Regarding the advancement of PsyVBot, maintaining user confidentiality and upholding ethical principles are of paramount significance. It is imperative to note that diligent efforts are undertaken to adhere to ethical standards, thereby safeguarding the confidentiality of user information and ensuring its security. Moreover, the chatbot fosters a conducive atmosphere that is supportive and compassionate, thereby promoting psychological welfare. In brief, PsyVBot is an automated conversational agent that utilizes an LSTM model to assess the level of depression in accordance with the input provided by the user. The demonstrated accuracy rate of 94% serves as a promising indication of the potential efficacy of employing natural language processing and machine learning techniques in tackling challenges associated with mental health. The reliability of PsyVBot is further improved by the fact that it makes use of the Reddit dataset and incorporates Natural Language Toolkit (NLTK) for preprocessing. PsyVBot represents a pioneering and user-centric solution that furnishes an easily accessible and confidential medium for seeking assistance. The present platform is offered as a modality to tackle the pervasive issue of depression and the contemplation of suicide.Keywords: chatbot, depression diagnosis, LSTM model, natural language process
Procedia PDF Downloads 69
902 Evaluation of Video Quality Metrics and Performance Comparison on Contents Taken from Most Commonly Used Devices
Authors: Pratik Dhabal Deo, Manoj P.
Abstract:
With the increasing number of social media users, the amount of video content available has also increased significantly. Currently, the number of smartphone users is at its peak, and many are increasingly using their smartphones as their main photography and recording devices. There have been many developments in the field of Video Quality Assessment (VQA), and metrics such as VMAF and SSIM are said to be among the best performing, but these metrics are predominantly evaluated on professionally shot video content using professional tools, lighting conditions, etc. No study has specifically examined the performance of the metrics on content captured by users on very commonly available devices. Datasets that contain a huge number of videos from different high-end devices make it difficult to analyze the performance of the metrics on content from the most used devices, even if they contain content taken in poor lighting conditions using lower-end devices. These devices face many distortions due to various factors, since the spectrum of content recorded on them is huge. In this paper, we present an analysis of objective VQA metrics on content taken only from the most used devices, focusing on full-reference metrics. To carry out this research, we created a custom dataset containing a total of 90 videos taken from three commonly used devices: an Android smartphone, an iOS smartphone, and a DSLR. On the videos taken on each of these devices, the six most common types of distortion that users face have been applied, in addition to the already existing H.264 compression, based on four reference videos. Each of these six distortions has three levels of degradation. The five most popular VQA metrics have been evaluated on this dataset, and the highest and lowest values of each metric on the distortions have been recorded. 
Finally, it is found that blur is the artifact on which most of the metrics did not perform well. Thus, to understand the results better, the amount of blur in the dataset was calculated, and an additional evaluation of the metrics was done using the HEVC codec, the successor to H.264, on the camera that proved to be the sharpest among the devices. The results show that as the resolution increases, the metrics tend to become more accurate, and that the best-performing metric is VQM, with very few inconsistencies and inaccurate results, when the compression applied is H.264; when the compression applied is HEVC, however, SSIM and VMAF perform significantly better.
Keywords: distortion, metrics, performance, resolution, video quality assessment
Procedia PDF Downloads 203
901 Emotion Recognition in Video and Images in the Wild
Authors: Faizan Tariq, Moayid Ali Zaidi
Abstract:
Facial emotion recognition algorithms are expanding rapidly nowadays. Researchers use different algorithms in different combinations to obtain the best results. Six basic emotions are studied in this area. We tried to recognize facial expressions using object detection algorithms instead of traditional algorithms. Two object detection algorithms were chosen: Faster R-CNN and YOLO. For pre-processing, we used image rotation and batch normalization. The dataset chosen for the experiments is Static Facial Expressions in the Wild (SFEW). Our approach worked well, but there is still considerable room for improvement, which will be a future direction.
Keywords: face recognition, emotion recognition, deep learning, CNN
Procedia PDF Downloads 187
900 Study and Analysis of the Factors Affecting Road Safety Using Decision Tree Algorithms
Authors: Naina Mahajan, Bikram Pal Kaur
Abstract:
The purpose of traffic accident analysis is to find the possible causes of an accident. Road accidents cannot be totally prevented, but suitable traffic engineering and management can reduce the accident rate to a certain extent. This paper discusses the classification techniques C4.5 and ID3 using the WEKA data mining tool. These techniques are applied to the NH (National Highway) dataset. The C4.5 and ID3 techniques give the best results and high accuracy with less computation time and a lower error rate.
Keywords: C4.5, ID3, NH (National Highway), WEKA data mining tool
Procedia PDF Downloads 338
899 Analysis of Citation Rate and Data Reuse for Openly Accessible Biodiversity Datasets on Global Biodiversity Information Facility
Authors: Nushrat Khan, Mike Thelwall, Kayvan Kousha
Abstract:
Making research data openly accessible has been mandated by most funders over the last 5 years, as it promotes reproducibility in science and reduces duplication of effort in collecting the same data. There is evidence that articles that publicly share research data have higher citation rates in the biological and social sciences. However, how and whether shared data is being reused is not always apparent, as such information is not easily accessible from the majority of research data repositories. This study aims to understand the practice of data citation and how data has been reused over the years, focusing on biodiversity, since research data is frequently reused in this field. For this purpose, metadata of 38,878 datasets, including citation counts, were collected through the Global Biodiversity Information Facility (GBIF) API. GBIF was used as a data source since it provides citation counts for datasets, a feature not commonly available in most repositories. Analysis of dataset types, citation counts, and the creation and update times of datasets suggests that citation rate varies across types of datasets: occurrence datasets, which have more granular information, have higher citation rates than checklist and metadata-only datasets. Another finding is that biodiversity datasets on GBIF are frequently updated, which is unique to this field. The majority of the datasets from the earliest year, 2007, were updated after 11 years, and every dataset had been updated at least once since creation. For each year between 2007 and 2017, we compared the correlations between update time and citation rate for four different types of datasets. While recent datasets do not show any correlation, datasets that are 3 to 4 years old show a weak correlation, with more recently updated datasets receiving more citations. The results suggest that it takes several years for research datasets to accumulate citations. 
However, this investigation found that when the same datasets are searched for on Google Scholar or Scopus, the number of citations often differs from the count on GBIF. Hence, a future aim is to further explore the citation count system adopted by GBIF, to evaluate its reliability and whether it is applicable to other fields of study as well.
Keywords: data citation, data reuse, research data sharing, webometrics
Procedia PDF Downloads 178
898 Statistical Models and Time Series Forecasting on Crime Data in Nepal
Authors: Dila Ram Bhandari
Abstract:
Throughout the 20th century, new governments were created in which ethnic, religious, linguistic, caste, communal, tribal, and other identities played a part in the development of constitutions and the legal system of victim and criminal justice. Acute issues with extremism, poverty, environmental degradation, cybercrime, human rights violations, and crimes against, and victimization of, both individuals and groups have recently plagued South Asian nations. Every day a massive number of crimes are committed, and these frequent crimes have made the lives of common citizens restless. Crime is one of the major threats to society and to civilization, and it can create societal disturbance. Old-style crime-solving practices are unable to meet the requirements of existing crime situations. Crime analysis is one of the most important activities of the majority of intelligence and law enforcement organizations all over the world. Unlike Central Asia or the Asia-Pacific region, the South Asia region lacks a regional coordination mechanism to facilitate criminal intelligence sharing and operational coordination related to organized crime, including illicit drug trafficking and money laundering. There have been numerous conversations in recent years about using data mining technology to combat crime and terrorism. The Data Detective program from Sentient, a software company, uses data mining techniques to support the police (Sentient, 2017). The goals of this internship are to test several predictive model solutions and choose the most effective and promising one. First, extensive literature reviews on data mining, crime analysis, and crime data mining were conducted. Sentient offered a 7-year archive of crime statistics that were aggregated daily to produce a univariate dataset. In addition, a daily aggregation by incident type was performed to produce a multivariate dataset. Each solution's forecast period lasted seven days. 
Statistical models and neural network models were the two main groups into which the experiments were split. For the crime data, neural networks fared better than statistical models. This study gives a general review of the applied statistics and neural network models. A detailed image of each model's performance on the available data and generalizability is provided by a comparative analysis of all the models on a comparable dataset. Obviously, the studies demonstrated that, in comparison to other models, Gated Recurrent Units (GRU) produced greater prediction. The crime records of 2005-2019 which was collected from Nepal Police headquarter and analysed by R programming. In conclusion, gated recurrent unit implementation could give benefit to police in predicting crime. Hence, time series analysis using GRU could be a prospective additional feature in Data Detective.Keywords: time series analysis, forecasting, ARIMA, machine learning
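The daily aggregation and seven-day forecast horizon described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline (the abstract mentions R); the record layout, the function names, and the 10-day lookback are assumptions made only for illustration.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# Hypothetical incident records: (date of incident, incident type).
records = [
    (date(2019, 1, 1), "theft"),
    (date(2019, 1, 1), "assault"),
    (date(2019, 1, 1), "theft"),
    (date(2019, 1, 3), "theft"),
]

def daily_univariate(records, start, end):
    """Total incidents per calendar day, zero-filled for days with none."""
    counts = Counter(day for day, _ in records)
    n_days = (end - start).days + 1
    return [counts.get(start + timedelta(days=i), 0) for i in range(n_days)]

def daily_multivariate(records, start, end):
    """Incidents per day and per type: one zero-filled series per type."""
    by_type = defaultdict(Counter)
    for day, kind in records:
        by_type[kind][day] += 1
    n_days = (end - start).days + 1
    return {
        kind: [c.get(start + timedelta(days=i), 0) for i in range(n_days)]
        for kind, c in by_type.items()
    }

def make_windows(series, lookback, horizon=7):
    """Pair each lookback-day input window with the following horizon days."""
    pairs = []
    for i in range(len(series) - lookback - horizon + 1):
        inputs = series[i:i + lookback]
        targets = series[i + lookback:i + lookback + horizon]
        pairs.append((inputs, targets))
    return pairs

start, end = date(2019, 1, 1), date(2019, 1, 3)
uni = daily_univariate(records, start, end)
multi = daily_multivariate(records, start, end)
windows = make_windows(list(range(20)), lookback=10)  # assumed 10-day lookback
```

Each (inputs, targets) pair from make_windows could be fed to either a statistical or a recurrent forecaster; the window length is a tuning choice that the abstract does not specify.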
Procedia PDF Downloads 164
897 Physics Informed Deep Residual Networks Based Type-A Aortic Dissection Prediction
Abstract:
Purpose: Acute Type A aortic dissection is well known for its extremely high mortality rate. A highly accurate and cost-effective non-invasive predictor is critically needed so that patients can be treated at an earlier stage. Although various CFD approaches have been used to build prediction frameworks, they are sensitive to uncertainty in both image segmentation and boundary conditions. Tedious pre-processing and demanding calibration procedures compound the issue and hamper their clinical applicability. Using the latest physics-informed deep learning methods to establish an accurate and cost-effective predictor framework is among the main goals for better treatment of Type A aortic dissection. Methods: A novel physics-informed deep residual network is trained with non-invasive 4D MRI displacement vectors as inputs. The trained model can then cost-effectively calculate the following biomarkers: aortic blood pressure, wall shear stress (WSS), and oscillatory shear index (OSI), which are used to predict potential Type A aortic dissection and so avoid high-mortality events later on. Results: The proposed deep learning method has been successfully trained and tested on both a synthetic 3D aneurysm dataset and a clinical dataset in the aortic dissection context, using the Google Colab environment. In both cases, the model generated aortic blood pressure, WSS, and OSI results matching the patient’s expected health status. Conclusion: The proposed novel physics-informed deep residual network shows great potential for creating a cost-effective, non-invasive predictor framework. An additional physics-based de-noising algorithm will be added to make the model more robust to noise in clinical data. Further studies will be conducted in collaboration with large institutions such as Cleveland Clinic, using more clinical samples, to further improve the model’s clinical applicability.
Keywords: type-a aortic dissection, deep residual networks, blood flow modeling, data-driven modeling, non-invasive diagnostics, deep learning, artificial intelligence
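The abstract does not state how WSS and OSI are computed. For orientation only, the definitions commonly used in hemodynamics are given below; the symbols (wall shear stress vector \boldsymbol{\tau}_w, cardiac period T) are assumptions and are not taken from the paper.

```latex
% Time-averaged wall shear stress magnitude over one cardiac cycle of length T
\mathrm{TAWSS} = \frac{1}{T} \int_0^T \lvert \boldsymbol{\tau}_w \rvert \, dt

% Oscillatory shear index: 0 for unidirectional shear, up to 0.5 for purely oscillatory shear
\mathrm{OSI} = \frac{1}{2} \left( 1 - \frac{\left\lvert \int_0^T \boldsymbol{\tau}_w \, dt \right\rvert}{\int_0^T \lvert \boldsymbol{\tau}_w \rvert \, dt} \right)
```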
Procedia PDF Downloads 89
896 Constructing a Semi-Supervised Model for Network Intrusion Detection
Authors: Tigabu Dagne Akal
Abstract:
While advances in computer and communications technology have made networks ubiquitous, they have also left networked systems vulnerable to malicious attacks launched from a distance. These attacks, or intrusions, start with attackers infiltrating a network through a vulnerable host and then launching further attacks on the local network or intranet. System administrators and network professionals can attempt to prevent such attacks by developing intrusion detection tools and systems based on data mining technology. In this study, the experiments followed the Knowledge Discovery in Databases process model, which starts with dataset selection. The dataset used in this study was taken from the Massachusetts Institute of Technology Lincoln Laboratory and then pre-processed. The major pre-processing activities were filling in missing values, removing outliers, resolving inconsistencies, integrating data containing both labelled and unlabelled datasets, dimensionality reduction, size reduction, and data transformation such as discretization. A total of 21,533 intrusion records were used for training the models, and a separate set of 3,397 records was used as a test set to validate the performance of the selected model. To build a predictive model for intrusion detection, the J48 decision tree and Naïve Bayes algorithms were tested as classifiers, both with and without feature selection. The model built with the J48 decision tree algorithm using default parameter values and 10-fold cross-validation showed the best classification accuracy. It achieved a prediction accuracy of 96.11% on the training dataset and 93.2% on the test dataset when classifying new instances as normal, DOS, U2R, R2L, or probe. The findings show that data mining methods generate interesting rules that are crucial for intrusion detection and prevention in the networking industry. Future research directions are proposed toward a deployable system in this area.
Keywords: intrusion detection, data mining, computer science
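The 10-fold cross-validation mentioned in this abstract can be sketched in plain Python as follows. This is not the authors' setup: J48 is Weka's decision tree learner, and a majority-class baseline stands in for it here so that the example stays self-contained. The label counts and function names are hypothetical, and the sketch assumes there are at least as many records as folds.

```python
import random
from collections import Counter

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def majority_fit(labels):
    """Stand-in 'model': always predict the most common training label."""
    return Counter(labels).most_common(1)[0][0]

def cross_val_accuracy(labels, k=10, seed=0):
    """Mean held-out accuracy over k folds for the majority baseline."""
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        # Train on every record that is not in the held-out fold.
        train = [labels[i] for i in range(len(labels)) if i not in held]
        guess = majority_fit(train)
        hits = sum(labels[i] == guess for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / k

# Hypothetical label distribution, not the study's data.
labels = ["normal"] * 70 + ["DOS"] * 20 + ["probe"] * 10
folds = k_fold_indices(len(labels))
acc = cross_val_accuracy(labels)
```

Swapping majority_fit for a real classifier only requires a fit step on the training fold and a predict step on the held-out fold; the fold bookkeeping stays the same.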
Procedia PDF Downloads 296
895 Saudi Twitter Corpus for Sentiment Analysis
Authors: Adel Assiri, Ahmed Emam, Hmood Al-Dossari
Abstract:
Sentiment analysis (SA) has received growing attention in Arabic language research. However, few studies have yet directly applied SA to Arabic, owing to the lack of a publicly available dataset for this language. This paper partially bridges the gap by focusing on one Arabic dialect, the Saudi dialect. It presents an annotated dataset of 4700 items for Saudi dialect sentiment analysis (K = 0.807). Our next work is to extend this corpus and to create a large-scale lexicon for the Saudi dialect from it.
Keywords: Arabic, sentiment analysis, Twitter, annotation
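The abstract reports K = 0.807 without naming the statistic. Assuming it is an inter-annotator agreement score such as Cohen's kappa for two annotators (an assumption, not something the abstract states), the calculation can be sketched as below. The labels in the example are made up.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' labels over the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement: share of items given the same label by both.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's own label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations for four tweets.
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "pos", "neg", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa corrects raw agreement for agreement expected by chance, which is why it is usually preferred over simple percent agreement when reporting annotation quality.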
Procedia PDF Downloads 630