Search results for: atrous convolutional
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 377

Search results for: atrous convolutional

77 Wolof Voice Response Recognition System: A Deep Learning Model for Wolof Audio Classification

Authors: Krishna Mohan Bathula, Fatou Bintou Loucoubar, FNU Kaleemunnisa, Christelle Scharff, Mark Anthony De Castro

Abstract:

Voice recognition algorithms such as automatic speech recognition and text-to-speech systems with African languages can play an important role in bridging the digital divide of Artificial Intelligence in Africa, contributing to the establishment of a fully inclusive information society. This paper proposes a Deep Learning model that can classify the user responses as inputs for an interactive voice response system. A dataset with Wolof language words ‘yes’ and ‘no’ is collected as audio recordings. A two stage Data Augmentation approach is adopted for enhancing the dataset size required by the deep neural network. Data preprocessing and feature engineering with Mel-Frequency Cepstral Coefficients are implemented. Convolutional Neural Networks (CNNs) have proven to be very powerful in image classification and are promising for audio processing when sounds are transformed into spectra. For performing voice response classification, the recordings are transformed into sound frequency feature spectra and then applied image classification methodology using a deep CNN model. The inference model of this trained and reusable Wolof voice response recognition system can be integrated with many applications associated with both web and mobile platforms.

Keywords: automatic speech recognition, interactive voice response, voice response recognition, wolof word classification

Procedia PDF Downloads 90
76 Multimodal Deep Learning for Human Activity Recognition

Authors: Ons Slimene, Aroua Taamallah, Maha Khemaja

Abstract:

In recent years, human activity recognition (HAR) has been a key area of research due to its diverse applications. It has garnered increasing attention in the field of computer vision. HAR plays an important role in people’s daily lives as it has the ability to learn advanced knowledge about human activities from data. In HAR, activities are usually represented by exploiting different types of sensors, such as embedded sensors or visual sensors. However, these sensors have limitations, such as local obstacles, image-related obstacles, sensor unreliability, and consumer concerns. Recently, several deep learning-based approaches have been proposed for HAR and these approaches are classified into two categories based on the type of data used: vision-based approaches and sensor-based approaches. This research paper highlights the importance of multimodal data fusion from skeleton data obtained from videos and data generated by embedded sensors using deep neural networks for achieving HAR. We propose a deep multimodal fusion network based on a twostream architecture. These two streams use the Convolutional Neural Network combined with the Bidirectional LSTM (CNN BILSTM) to process skeleton data and data generated by embedded sensors and the fusion at the feature level is considered. The proposed model was evaluated on a public OPPORTUNITY++ dataset and produced a accuracy of 96.77%.

Keywords: human activity recognition, action recognition, sensors, vision, human-centric sensing, deep learning, context-awareness

Procedia PDF Downloads 71
75 Detection of Atrial Fibrillation Using Wearables via Attentional Two-Stream Heterogeneous Networks

Authors: Huawei Bai, Jianguo Yao, Fellow, IEEE

Abstract:

Atrial fibrillation (AF) is the most common form of heart arrhythmia and is closely associated with mortality and morbidity in heart failure, stroke, and coronary artery disease. The development of single spot optical sensors enables widespread photoplethysmography (PPG) screening, especially for AF, since it represents a more convenient and noninvasive approach. To our knowledge, most existing studies based on public and unbalanced datasets can barely handle the multiple noises sources in the real world and, also, lack interpretability. In this paper, we construct a large- scale PPG dataset using measurements collected from PPG wrist- watch devices worn by volunteers and propose an attention-based two-stream heterogeneous neural network (TSHNN). The first stream is a hybrid neural network consisting of a three-layer one-dimensional convolutional neural network (1D-CNN) and two-layer attention- based bidirectional long short-term memory (Bi-LSTM) network to learn representations from temporally sampled signals. The second stream extracts latent representations from the PPG time-frequency spectrogram using a five-layer CNN. The outputs from both streams are fed into a fusion layer for the outcome. Visualization of the attention weights learned demonstrates the effectiveness of the attention mechanism against noise. The experimental results show that the TSHNN outperforms all the competitive baseline approaches and with 98.09% accuracy, achieves state-of-the-art performance.

Keywords: PPG wearables, atrial fibrillation, feature fusion, attention mechanism, hyber network

Procedia PDF Downloads 87
74 An Investigation into Computer Vision Methods to Identify Material Other Than Grapes in Harvested Wine Grape Loads

Authors: Riaan Kleyn

Abstract:

Mass wine production companies across the globe are provided with grapes from winegrowers that predominantly utilize mechanical harvesting machines to harvest wine grapes. Mechanical harvesting accelerates the rate at which grapes are harvested, allowing grapes to be delivered faster to meet the demands of wine cellars. The disadvantage of the mechanical harvesting method is the inclusion of material-other-than-grapes (MOG) in the harvested wine grape loads arriving at the cellar which degrades the quality of wine that can be produced. Currently, wine cellars do not have a method to determine the amount of MOG present within wine grape loads. This paper seeks to find an optimal computer vision method capable of detecting the amount of MOG within a wine grape load. A MOG detection method will encourage winegrowers to deliver MOG-free wine grape loads to avoid penalties which will indirectly enhance the quality of the wine to be produced. Traditional image segmentation methods were compared to deep learning segmentation methods based on images of wine grape loads that were captured at a wine cellar. The Mask R-CNN model with a ResNet-50 convolutional neural network backbone emerged as the optimal method for this study to determine the amount of MOG in an image of a wine grape load. Furthermore, a statistical analysis was conducted to determine how the MOG on the surface of a grape load relates to the mass of MOG within the corresponding grape load.

Keywords: computer vision, wine grapes, machine learning, machine harvested grapes

Procedia PDF Downloads 62
73 Preprocessing and Fusion of Multiple Representation of Finger Vein patterns using Conventional and Machine Learning techniques

Authors: Tomas Trainys, Algimantas Venckauskas

Abstract:

Application of biometric features to the cryptography for human identification and authentication is widely studied and promising area of the development of high-reliability cryptosystems. Biometric cryptosystems typically are designed for patterns recognition, which allows biometric data acquisition from an individual, extracts feature sets, compares the feature set against the set stored in the vault and gives a result of the comparison. Preprocessing and fusion of biometric data are the most important phases in generating a feature vector for key generation or authentication. Fusion of biometric features is critical for achieving a higher level of security and prevents from possible spoofing attacks. The paper focuses on the tasks of initial processing and fusion of multiple representations of finger vein modality patterns. These tasks are solved by applying conventional image preprocessing methods and machine learning techniques, Convolutional Neural Network (SVM) method for image segmentation and feature extraction. An article presents a method for generating sets of biometric features from a finger vein network using several instances of the same modality. Extracted features sets were fused at the feature level. The proposed method was tested and compared with the performance and accuracy results of other authors.

Keywords: bio-cryptography, biometrics, cryptographic key generation, data fusion, information security, SVM, pattern recognition, finger vein method.

Procedia PDF Downloads 122
72 Deep Vision: A Robust Dominant Colour Extraction Framework for T-Shirts Based on Semantic Segmentation

Authors: Kishore Kumar R., Kaustav Sengupta, Shalini Sood Sehgal, Poornima Santhanam

Abstract:

Fashion is a human expression that is constantly changing. One of the prime factors that consistently influences fashion is the change in colour preferences. The role of colour in our everyday lives is very significant. It subconsciously explains a lot about one’s mindset and mood. Analyzing the colours by extracting them from the outfit images is a critical study to examine the individual’s/consumer behaviour. Several research works have been carried out on extracting colours from images, but to the best of our knowledge, there were no studies that extract colours to specific apparel and identify colour patterns geographically. This paper proposes a framework for accurately extracting colours from T-shirt images and predicting dominant colours geographically. The proposed method consists of two stages: first, a U-Net deep learning model is adopted to segment the T-shirts from the images. Second, the colours are extracted only from the T-shirt segments. The proposed method employs the iMaterialist (Fashion) 2019 dataset for the semantic segmentation task. The proposed framework also includes a mechanism for gathering data and analyzing India’s general colour preferences. From this research, it was observed that black and grey are the dominant colour in different regions of India. The proposed method can be adapted to study fashion’s evolving colour preferences.

Keywords: colour analysis in t-shirts, convolutional neural network, encoder-decoder, k-means clustering, semantic segmentation, U-Net model

Procedia PDF Downloads 83
71 Tomato-Weed Classification by RetinaNet One-Step Neural Network

Authors: Dionisio Andujar, Juan lópez-Correa, Hugo Moreno, Angela Ri

Abstract:

The increased number of weeds in tomato crops highly lower yields. Weed identification with the aim of machine learning is important to carry out site-specific control. The last advances in computer vision are a powerful tool to face the problem. The analysis of RGB (Red, Green, Blue) images through Artificial Neural Networks had been rapidly developed in the past few years, providing new methods for weed classification. The development of the algorithms for crop and weed species classification looks for a real-time classification system using Object Detection algorithms based on Convolutional Neural Networks. The site study was located in commercial corn fields. The classification system has been tested. The procedure can detect and classify weed seedlings in tomato fields. The input to the Neural Network was a set of 10,000 RGB images with a natural infestation of Cyperus rotundus l., Echinochloa crus galli L., Setaria italica L., Portulaca oeracea L., and Solanum nigrum L. The validation process was done with a random selection of RGB images containing the aforementioned species. The mean average precision (mAP) was established as the metric for object detection. The results showed agreements higher than 95 %. The system will provide the input for an online spraying system. Thus, this work plays an important role in Site Specific Weed Management by reducing herbicide use in a single step.

Keywords: deep learning, object detection, cnn, tomato, weeds

Procedia PDF Downloads 81
70 Refined Edge Detection Network

Authors: Omar Elharrouss, Youssef Hmamouche, Assia Kamal Idrissi, Btissam El Khamlichi, Amal El Fallah-Seghrouchni

Abstract:

Edge detection is represented as one of the most challenging tasks in computer vision, due to the complexity of detecting the edges or boundaries in real-world images that contains objects of different types and scales like trees, building as well as various backgrounds. Edge detection is represented also as a key task for many computer vision applications. Using a set of backbones as well as attention modules, deep-learning-based methods improved the detection of edges compared with the traditional methods like Sobel and Canny. However, images of complex scenes still represent a challenge for these methods. Also, the detected edges using the existing approaches suffer from non-refined results while the image output contains many erroneous edges. To overcome this, n this paper, by using the mechanism of residual learning, a refined edge detection network is proposed (RED-Net). By maintaining the high resolution of edges during the training process, and conserving the resolution of the edge image during the network stage, we make the pooling outputs at each stage connected with the output of the previous layer. Also, after each layer, we use an affined batch normalization layer as an erosion operation for the homogeneous region in the image. The proposed methods are evaluated using the most challenging datasets including BSDS500, NYUD, and Multicue. The obtained results outperform the designed edge detection networks in terms of performance metrics and quality of output images.

Keywords: edge detection, convolutional neural networks, deep learning, scale-representation, backbone

Procedia PDF Downloads 71
69 Enhancer: An Effective Transformer Architecture for Single Image Super Resolution

Authors: Pitigalage Chamath Chandira Peiris

Abstract:

A widely researched domain in the field of image processing in recent times has been single image super-resolution, which tries to restore a high-resolution image from a single low-resolution image. Many more single image super-resolution efforts have been completed utilizing equally traditional and deep learning methodologies, as well as a variety of other methodologies. Deep learning-based super-resolution methods, in particular, have received significant interest. As of now, the most advanced image restoration approaches are based on convolutional neural networks; nevertheless, only a few efforts have been performed using Transformers, which have demonstrated excellent performance on high-level vision tasks. The effectiveness of CNN-based algorithms in image super-resolution has been impressive. However, these methods cannot completely capture the non-local features of the data. Enhancer is a simple yet powerful Transformer-based approach for enhancing the resolution of images. A method for single image super-resolution was developed in this study, which utilized an efficient and effective transformer design. This proposed architecture makes use of a locally enhanced window transformer block to alleviate the enormous computational load associated with non-overlapping window-based self-attention. Additionally, it incorporates depth-wise convolution in the feed-forward network to enhance its ability to capture local context. This study is assessed by comparing the results obtained for popular datasets to those obtained by other techniques in the domain.

Keywords: single image super resolution, computer vision, vision transformers, image restoration

Procedia PDF Downloads 79
68 Downscaling Seasonal Sea Surface Temperature Forecasts over the Mediterranean Sea Using Deep Learning

Authors: Redouane Larbi Boufeniza, Jing-Jia Luo

Abstract:

This study assesses the suitability of deep learning (DL) for downscaling sea surface temperature (SST) over the Mediterranean Sea in the context of seasonal forecasting. We design a set of experiments that compare different DL configurations and deploy the best-performing architecture to downscale one-month lead forecasts of June–September (JJAS) SST from the Nanjing University of Information Science and Technology Climate Forecast System version 1.0 (NUIST-CFS1.0) for the period of 1982–2020. We have also introduced predictors over a larger area to include information about the main large-scale circulations that drive SST over the Mediterranean Sea region, which improves the downscaling results. Finally, we validate the raw model and downscaled forecasts in terms of both deterministic and probabilistic verification metrics, as well as their ability to reproduce the observed precipitation extreme and spell indicator indices. The results showed that the convolutional neural network (CNN)-based downscaling consistently improves the raw model forecasts, with lower bias and more accurate representations of the observed mean and extreme SST spatial patterns. Besides, the CNN-based downscaling yields a much more accurate forecast of extreme SST and spell indicators and reduces the significant relevant biases exhibited by the raw model predictions. Moreover, our results show that the CNN-based downscaling yields better skill scores than the raw model forecasts over most portions of the Mediterranean Sea. The results demonstrate the potential usefulness of CNN in downscaling seasonal SST predictions over the Mediterranean Sea, particularly in providing improved forecast products.

Keywords: Mediterranean Sea, sea surface temperature, seasonal forecasting, downscaling, deep learning

Procedia PDF Downloads 50
67 Automatic Classification of Lung Diseases from CT Images

Authors: Abobaker Mohammed Qasem Farhan, Shangming Yang, Mohammed Al-Nehari

Abstract:

Pneumonia is a kind of lung disease that creates congestion in the chest. Such pneumonic conditions lead to loss of life of the severity of high congestion. Pneumonic lung disease is caused by viral pneumonia, bacterial pneumonia, or Covidi-19 induced pneumonia. The early prediction and classification of such lung diseases help to reduce the mortality rate. We propose the automatic Computer-Aided Diagnosis (CAD) system in this paper using the deep learning approach. The proposed CAD system takes input from raw computerized tomography (CT) scans of the patient's chest and automatically predicts disease classification. We designed the Hybrid Deep Learning Algorithm (HDLA) to improve accuracy and reduce processing requirements. The raw CT scans have pre-processed first to enhance their quality for further analysis. We then applied a hybrid model that consists of automatic feature extraction and classification. We propose the robust 2D Convolutional Neural Network (CNN) model to extract the automatic features from the pre-processed CT image. This CNN model assures feature learning with extremely effective 1D feature extraction for each input CT image. The outcome of the 2D CNN model is then normalized using the Min-Max technique. The second step of the proposed hybrid model is related to training and classification using different classifiers. The simulation outcomes using the publically available dataset prove the robustness and efficiency of the proposed model compared to state-of-art algorithms.

Keywords: CT scan, Covid-19, deep learning, image processing, lung disease classification

Procedia PDF Downloads 115
66 A Comparative Study on Deep Learning Models for Pneumonia Detection

Authors: Hichem Sassi

Abstract:

Pneumonia, being a respiratory infection, has garnered global attention due to its rapid transmission and relatively high mortality rates. Timely detection and treatment play a crucial role in significantly reducing mortality associated with pneumonia. Presently, X-ray diagnosis stands out as a reasonably effective method. However, the manual scrutiny of a patient's X-ray chest radiograph by a proficient practitioner usually requires 5 to 15 minutes. In situations where cases are concentrated, this places immense pressure on clinicians for timely diagnosis. Relying solely on the visual acumen of imaging doctors proves to be inefficient, particularly given the low speed of manual analysis. Therefore, the integration of artificial intelligence into the clinical image diagnosis of pneumonia becomes imperative. Additionally, AI recognition is notably rapid, with convolutional neural networks (CNNs) demonstrating superior performance compared to human counterparts in image identification tasks. To conduct our study, we utilized a dataset comprising chest X-ray images obtained from Kaggle, encompassing a total of 5216 training images and 624 test images, categorized into two classes: normal and pneumonia. Employing five mainstream network algorithms, we undertook a comprehensive analysis to classify these diseases within the dataset, subsequently comparing the results. The integration of artificial intelligence, particularly through improved network architectures, stands as a transformative step towards more efficient and accurate clinical diagnoses across various medical domains.

Keywords: deep learning, computer vision, pneumonia, models, comparative study

Procedia PDF Downloads 28
65 Early Depression Detection for Young Adults with a Psychiatric and AI Interdisciplinary Multimodal Framework

Authors: Raymond Xu, Ashley Hua, Andrew Wang, Yuru Lin

Abstract:

During COVID-19, the depression rate has increased dramatically. Young adults are most vulnerable to the mental health effects of the pandemic. Lower-income families have a higher ratio to be diagnosed with depression than the general population, but less access to clinics. This research aims to achieve early depression detection at low cost, large scale, and high accuracy with an interdisciplinary approach by incorporating clinical practices defined by American Psychiatric Association (APA) as well as multimodal AI framework. The proposed approach detected the nine depression symptoms with Natural Language Processing sentiment analysis and a symptom-based Lexicon uniquely designed for young adults. The experiments were conducted on the multimedia survey results from adolescents and young adults and unbiased Twitter communications. The result was further aggregated with the facial emotional cues analyzed by the Convolutional Neural Network on the multimedia survey videos. Five experiments each conducted on 10k data entries reached consistent results with an average accuracy of 88.31%, higher than the existing natural language analysis models. This approach can reach 300+ million daily active Twitter users and is highly accessible by low-income populations to promote early depression detection to raise awareness in adolescents and young adults and reveal complementary cues to assist clinical depression diagnosis.

Keywords: artificial intelligence, COVID-19, depression detection, psychiatric disorder

Procedia PDF Downloads 108
64 A Survey of Skin Cancer Detection and Classification from Skin Lesion Images Using Deep Learning

Authors: Joseph George, Anne Kotteswara Roa

Abstract:

Skin disease is one of the most common and popular kinds of health issues faced by people nowadays. Skin cancer (SC) is one among them, and its detection relies on the skin biopsy outputs and the expertise of the doctors, but it consumes more time and some inaccurate results. At the early stage, skin cancer detection is a challenging task, and it easily spreads to the whole body and leads to an increase in the mortality rate. Skin cancer is curable when it is detected at an early stage. In order to classify correct and accurate skin cancer, the critical task is skin cancer identification and classification, and it is more based on the cancer disease features such as shape, size, color, symmetry and etc. More similar characteristics are present in many skin diseases; hence it makes it a challenging issue to select important features from a skin cancer dataset images. Hence, the skin cancer diagnostic accuracy is improved by requiring an automated skin cancer detection and classification framework; thereby, the human expert’s scarcity is handled. Recently, the deep learning techniques like Convolutional neural network (CNN), Deep belief neural network (DBN), Artificial neural network (ANN), Recurrent neural network (RNN), and Long and short term memory (LSTM) have been widely used for the identification and classification of skin cancers. This survey reviews different DL techniques for skin cancer identification and classification. The performance metrics such as precision, recall, accuracy, sensitivity, specificity, and F-measures are used to evaluate the effectiveness of SC identification using DL techniques. By using these DL techniques, the classification accuracy increases along with the mitigation of computational complexities and time consumption.

Keywords: skin cancer, deep learning, performance measures, accuracy, datasets

Procedia PDF Downloads 99
63 COVID-19 Detection from Computed Tomography Images Using UNet Segmentation, Region Extraction, and Classification Pipeline

Authors: Kenan Morani, Esra Kaya Ayana

Abstract:

This study aimed to develop a novel pipeline for COVID-19 detection using a large and rigorously annotated database of computed tomography (CT) images. The pipeline consists of UNet-based segmentation, lung extraction, and a classification part, with the addition of optional slice removal techniques following the segmentation part. In this work, a batch normalization was added to the original UNet model to produce lighter and better localization, which is then utilized to build a full pipeline for COVID-19 diagnosis. To evaluate the effectiveness of the proposed pipeline, various segmentation methods were compared in terms of their performance and complexity. The proposed segmentation method with batch normalization outperformed traditional methods and other alternatives, resulting in a higher dice score on a publicly available dataset. Moreover, at the slice level, the proposed pipeline demonstrated high validation accuracy, indicating the efficiency of predicting 2D slices. At the patient level, the full approach exhibited higher validation accuracy and macro F1 score compared to other alternatives, surpassing the baseline. The classification component of the proposed pipeline utilizes a convolutional neural network (CNN) to make final diagnosis decisions. The COV19-CT-DB dataset, which contains a large number of CT scans with various types of slices and rigorously annotated for COVID-19 detection, was utilized for classification. The proposed pipeline outperformed many other alternatives on the dataset.

Keywords: classification, computed tomography, lung extraction, macro F1 score, UNet segmentation

Procedia PDF Downloads 104
62 Remote Sensing through Deep Neural Networks for Satellite Image Classification

Authors: Teja Sai Puligadda

Abstract:

Satellite images in detail can serve an important role in the geographic study. Quantitative and qualitative information provided by the satellite and remote sensing images minimizes the complexity of work and time. Data/images are captured at regular intervals by satellite remote sensing systems, and the amount of data collected is often enormous, and it expands rapidly as technology develops. Interpreting remote sensing images, geographic data mining, and researching distinct vegetation types such as agricultural and forests are all part of satellite image categorization. One of the biggest challenge data scientists faces while classifying satellite images is finding the best suitable classification algorithms based on the available that could able to classify images with utmost accuracy. In order to categorize satellite images, which is difficult due to the sheer volume of data, many academics are turning to deep learning machine algorithms. As, the CNN algorithm gives high accuracy in image recognition problems and automatically detects the important features without any human supervision and the ANN algorithm stores information on the entire network (Abhishek Gupta., 2020), these two deep learning algorithms have been used for satellite image classification. This project focuses on remote sensing through Deep Neural Networks i.e., ANN and CNN with Deep Sat (SAT-4) Airborne dataset for classifying images. Thus, in this project of classifying satellite images, the algorithms ANN and CNN are implemented, evaluated & compared and the performance is analyzed through evaluation metrics such as Accuracy and Loss. Additionally, the Neural Network algorithm which gives the lowest bias and lowest variance in solving multi-class satellite image classification is analyzed.

Keywords: artificial neural network, convolutional neural network, remote sensing, accuracy, loss

Procedia PDF Downloads 130
61 Credit Card Fraud Detection with Ensemble Model: A Meta-Heuristic Approach

Authors: Gong Zhilin, Jing Yang, Jian Yin

Abstract:

The purpose of this paper is to develop a novel system for credit card fraud detection based on sequential modeling of data using hybrid deep learning models. The projected model encapsulates five major phases are pre-processing, imbalance-data handling, feature extraction, optimal feature selection, and fraud detection with an ensemble classifier. The collected raw data (input) is pre-processed to enhance the quality of the data through alleviation of the missing data, noisy data as well as null values. The pre-processed data are class imbalanced in nature, and therefore they are handled effectively with the K-means clustering-based SMOTE model. From the balanced class data, the most relevant features like improved Principal Component Analysis (PCA), statistical features (mean, median, standard deviation) and higher-order statistical features (skewness and kurtosis). Among the extracted features, the most optimal features are selected with the Self-improved Arithmetic Optimization Algorithm (SI-AOA). This SI-AOA model is the conceptual improvement of the standard Arithmetic Optimization Algorithm. The deep learning models like Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and optimized Quantum Deep Neural Network (QDNN). The LSTM and CNN are trained with the extracted optimal features. The outcomes from LSTM and CNN will enter as input to optimized QDNN that provides the final detection outcome. Since the QDNN is the ultimate detector, its weight function is fine-tuned with the Self-improved Arithmetic Optimization Algorithm (SI-AOA).

Keywords: credit card, data mining, fraud detection, money transactions

Procedia PDF Downloads 104
60 Medical Diagnosis of Retinal Diseases Using Artificial Intelligence Deep Learning Models

Authors: Ethan James

Abstract:

Over one billion people worldwide suffer from some level of vision loss or blindness as a result of progressive retinal diseases. Many patients, particularly in developing areas, are incorrectly diagnosed or undiagnosed whatsoever due to unconventional diagnostic tools and screening methods. Artificial intelligence (AI) based on deep learning (DL) convolutional neural networks (CNN) have recently gained a high interest in ophthalmology for its computer-imaging diagnosis, disease prognosis, and risk assessment. Optical coherence tomography (OCT) is a popular imaging technique used to capture high-resolution cross-sections of retinas. In ophthalmology, DL has been applied to fundus photographs, optical coherence tomography, and visual fields, achieving robust classification performance in the detection of various retinal diseases including macular degeneration, diabetic retinopathy, and retinitis pigmentosa. However, there is no complete diagnostic model to analyze these retinal images that provide a diagnostic accuracy above 90%. Thus, the purpose of this project was to develop an AI model that utilizes machine learning techniques to automatically diagnose specific retinal diseases from OCT scans. The algorithm consists of neural network architecture that was trained from a dataset of over 20,000 real-world OCT images to train the robust model to utilize residual neural networks with cyclic pooling. This DL model can ultimately aid ophthalmologists in diagnosing patients with these retinal diseases more quickly and more accurately, therefore facilitating earlier treatment, which results in improved post-treatment outcomes.

Keywords: artificial intelligence, deep learning, imaging, medical devices, ophthalmic devices, ophthalmology, retina

Procedia PDF Downloads 153
59 Similar Script Character Recognition on Kannada and Telugu

Authors: Gurukiran Veerapur, Nytik Birudavolu, Seetharam U. N., Chandravva Hebbi, R. Praneeth Reddy

Abstract:

This work presents a robust approach for the recognition of characters in Telugu and Kannada, two South Indian scripts with structural similarities in characters. To recognize the characters exhaustive datasets are required, but there are only a few publicly available datasets. As a result, we decided to create a dataset for one language (source language),train the model with it, and then test it with the target language.Telugu is the target language in this work, whereas Kannada is the source language. The suggested method makes use of Canny edge features to increase character identification accuracy on pictures with noise and different lighting. A dataset of 45,150 images containing printed Kannada characters was created. The Nudi software was used to automatically generate printed Kannada characters with different writing styles and variations. Manual labelling was employed to ensure the accuracy of the character labels. The deep learning models like CNN (Convolutional Neural Network) and Visual Attention neural network (VAN) are used to experiment with the dataset. A Visual Attention neural network (VAN) architecture was adopted, incorporating additional channels for Canny edge features as the results obtained were good with this approach. The model's accuracy on the combined Telugu and Kannada test dataset was an outstanding 97.3%. Performance was better with Canny edge characteristics applied than with a model that solely used the original grayscale images. The accuracy of the model was found to be 80.11% for Telugu characters and 98.01% for Kannada words when it was tested with these languages. This model, which makes use of cutting-edge machine learning techniques, shows excellent accuracy when identifying and categorizing characters from these scripts.

Keywords: base characters, modifiers, guninthalu, aksharas, vattakshara, VAN

Procedia PDF Downloads 28
58 Analysis of Travel Behavior Patterns of Frequent Passengers after the Section Shutdown of Urban Rail Transit - Taking the Huaqiao Section of Shanghai Metro Line 11 Shutdown During the COVID-19 Epidemic as an Example

Authors: Hongyun Li, Zhibin Jiang

Abstract:

The travel of passengers in the urban rail transit network is influenced by changes in network structure and operational status, and the response of individual travel preferences to these changes also varies. Firstly, the influence of the suspension of urban rail transit line sections on passenger travel along the line is analyzed. Secondly, passenger travel trajectories containing multi-dimensional semantics are described based on network UD data. Next, passenger panel data based on spatio-temporal sequences is constructed to achieve frequent passenger clustering. Then, the Graph Convolutional Network (GCN) is used to model and identify the changes in travel modes of different types of frequent passengers. Finally, taking Shanghai Metro Line 11 as an example, the travel behavior patterns of frequent passengers after the Huaqiao section shutdown during the COVID-19 epidemic are analyzed. The results showed that after the section shutdown, most passengers would transfer to the nearest Anting station for boarding, while some passengers would transfer to other stations for boarding or cancel their travels directly. Among the passengers who transferred to Anting station for boarding, most of passengers maintained the original normalized travel mode, a small number of passengers waited for a few days before transferring to Anting station for boarding, and only a few number of passengers stopped traveling at Anting station or transferred to other stations after a few days of boarding on Anting station. The results can provide a basis for understanding urban rail transit passenger travel patterns and improving the accuracy of passenger flow prediction in abnormal operation scenarios.

Keywords: urban rail transit, section shutdown, frequent passenger, travel behavior pattern

Procedia PDF Downloads 54
57 Loss Function Optimization for CNN-Based Fingerprint Anti-Spoofing

Authors: Yehjune Heo

Abstract:

As biometric systems become widely deployed, the security of identification systems can be easily attacked by various spoof materials. This paper contributes to finding a reliable and practical anti-spoofing method using Convolutional Neural Networks (CNNs) based on the types of loss functions and optimizers. The types of CNNs used in this paper include AlexNet, VGGNet, and ResNet. By using various loss functions including Cross-Entropy, Center Loss, Cosine Proximity, and Hinge Loss, and various loss optimizers which include Adam, SGD, RMSProp, Adadelta, Adagrad, and Nadam, we obtained significant performance changes. We realize that choosing the correct loss function for each model is crucial since different loss functions lead to different errors on the same evaluation. By using a subset of the Livdet 2017 database, we validate our approach to compare the generalization power. It is important to note that we use a subset of LiveDet and the database is the same across all training and testing for each model. This way, we can compare the performance, in terms of generalization, for the unseen data across all different models. The best CNN (AlexNet) with the appropriate loss function and optimizers result in more than 3% of performance gain over the other CNN models with the default loss function and optimizer. In addition to the highest generalization performance, this paper also contains the models with high accuracy associated with parameters and mean average error rates to find the model that consumes the least memory and computation time for training and testing. Although AlexNet has less complexity over other CNN models, it is proven to be very efficient. For practical anti-spoofing systems, the deployed version should use a small amount of memory and should run very fast with high anti-spoofing performance. For our deployed version on smartphones, additional processing steps, such as quantization and pruning algorithms, have been applied in our final model.

Keywords: anti-spoofing, CNN, fingerprint recognition, loss function, optimizer

Procedia PDF Downloads 109
56 Applying Multiplicative Weight Update to Skin Cancer Classifiers

Authors: Animish Jain

Abstract:

This study deals with using Multiplicative Weight Update within artificial intelligence and machine learning to create models that can diagnose skin cancer using microscopic images of cancer samples. In this study, the multiplicative weight update method is used to take the predictions of multiple models to try and acquire more accurate results. Logistic Regression, Convolutional Neural Network (CNN), and Support Vector Machine Classifier (SVMC) models are employed within the Multiplicative Weight Update system. These models are trained on pictures of skin cancer from the ISIC-Archive, to look for patterns to label unseen scans as either benign or malignant. These models are utilized in a multiplicative weight update algorithm which takes into account the precision and accuracy of each model through each successive guess to apply weights to their guess. These guesses and weights are then analyzed together to try and obtain the correct predictions. The research hypothesis for this study stated that there would be a significant difference in the accuracy of the three models and the Multiplicative Weight Update system. The SVMC model had an accuracy of 77.88%. The CNN model had an accuracy of 85.30%. The Logistic Regression model had an accuracy of 79.09%. Using Multiplicative Weight Update, the algorithm received an accuracy of 72.27%. The final conclusion that was drawn was that there was a significant difference in the accuracy of the three models and the Multiplicative Weight Update system. The conclusion was made that using a CNN model would be the best option for this problem rather than a Multiplicative Weight Update system. This is due to the possibility that Multiplicative Weight Update is not effective in a binary setting where there are only two possible classifications. In a categorical setting with multiple classes and groupings, a Multiplicative Weight Update system might become more proficient as it takes into account the strengths of multiple different models to classify images into multiple categories rather than only two categories, as shown in this study. This experimentation and computer science project can help to create better algorithms and models for the future of artificial intelligence in the medical imaging field.

Keywords: artificial intelligence, machine learning, multiplicative weight update, skin cancer

Procedia PDF Downloads 51
55 Improving Fingerprinting-Based Localization System Using Generative AI

Authors: Getaneh Berie Tarekegn

Abstract:

A precise localization system is crucial for many artificial intelligence Internet of Things (AI-IoT) applications in the era of smart cities. Their applications include traffic monitoring, emergency alarming, environmental monitoring, location-based advertising, intelligent transportation, and smart health care. The most common method for providing continuous positioning services in outdoor environments is by using a global navigation satellite system (GNSS). Due to nonline-of-sight, multipath, and weather conditions, GNSS systems do not perform well in dense urban, urban, and suburban areas.This paper proposes a generative AI-based positioning scheme for large-scale wireless settings using fingerprinting techniques. In this article, we presented a semi-supervised deep convolutional generative adversarial network (S-DCGAN)-based radio map construction method for real-time device localization. It also employed a reliable signal fingerprint feature extraction method with t-distributed stochastic neighbor embedding (t-SNE), which extracts dominant features while eliminating noise from hybrid WLAN and long-term evolution (LTE) fingerprints. The proposed scheme reduced the workload of site surveying required to build the fingerprint database by up to 78.5% and significantly improved positioning accuracy. The results show that the average positioning error of GAILoc is less than 0.39 m, and more than 90% of the errors are less than 0.82 m. According to numerical results, SRCLoc improves positioning performance and reduces radio map construction costs significantly compared to traditional methods.

Keywords: location-aware services, feature extraction technique, generative adversarial network, long short-term memory, support vector machine

Procedia PDF Downloads 25
54 Improving Chest X-Ray Disease Detection with Enhanced Data Augmentation Using Novel Approach of Diverse Conditional Wasserstein Generative Adversarial Networks

Authors: Malik Muhammad Arslan, Muneeb Ullah, Dai Shihan, Daniyal Haider, Xiaodong Yang

Abstract:

Chest X-rays are instrumental in the detection and monitoring of a wide array of diseases, including viral infections such as COVID-19, tuberculosis, pneumonia, lung cancer, and various cardiac and pulmonary conditions. To enhance the accuracy of diagnosis, artificial intelligence (AI) algorithms, particularly deep learning models like Convolutional Neural Networks (CNNs), are employed. However, these deep learning models demand a substantial and varied dataset to attain optimal precision. Generative Adversarial Networks (GANs) can be employed to create new data, thereby supplementing the existing dataset and enhancing the accuracy of deep learning models. Nevertheless, GANs have their limitations, such as issues related to stability, convergence, and the ability to distinguish between authentic and fabricated data. In order to overcome these challenges and advance the detection and classification of CXR normal and abnormal images, this study introduces a distinctive technique known as DCWGAN (Diverse Conditional Wasserstein GAN) for generating synthetic chest X-ray (CXR) images. The study evaluates the effectiveness of this Idiosyncratic DCWGAN technique using the ResNet50 model and compares its results with those obtained using the traditional GAN approach. The findings reveal that the ResNet50 model trained on the DCWGAN-generated dataset outperformed the model trained on the classic GAN-generated dataset. Specifically, the ResNet50 model utilizing DCWGAN synthetic images achieved impressive performance metrics with an accuracy of 0.961, precision of 0.955, recall of 0.970, and F1-Measure of 0.963. These results indicate the promising potential for the early detection of diseases in CXR images using this Inimitable approach.

Keywords: CNN, classification, deep learning, GAN, Resnet50

Procedia PDF Downloads 54
53 Digi-Buddy: A Smart Cane with Artificial Intelligence and Real-Time Assistance

Authors: Amaladhithyan Krishnamoorthy, Ruvaitha Banu

Abstract:

Vision is considered as the most important sense in humans, without which leading a normal can be often difficult. There are many existing smart canes for visually impaired with obstacle detection using ultrasonic transducer to help them navigate. Though the basic smart cane increases the safety of the users, it does not help in filling the void of visual loss. This paper introduces the concept of Digi-Buddy which is an evolved smart cane for visually impaired. The cane consists for several modules, apart from the basic obstacle detection features; the Digi-Buddy assists the user by capturing video/images and streams them to the server using a wide-angled camera, which then detects the objects using Deep Convolutional Neural Network. In addition to determining what the particular image/object is, the distance of the object is assessed by the ultrasonic transducer. The sound generation application, modelled with the help of Natural Language Processing is used to convert the processed images/object into audio. The object detected is signified by its name which is transmitted to the user with the help of Bluetooth hear phones. The object detection is extended to facial recognition which maps the faces of the person the user meets in the database of face images and alerts the user about the person. One of other crucial function consists of an automatic-intimation-alarm which is triggered when the user is in an emergency. If the user recovers within a set time, a button is provisioned in the cane to stop the alarm. Else an automatic intimation is sent to friends and family about the whereabouts of the user using GPS. In addition to safety and security by the existing smart canes, the proposed concept devices to be implemented as a prototype helping visually-impaired visualize their surroundings through audio more in an amicable way.

Keywords: artificial intelligence, facial recognition, natural language processing, internet of things

Procedia PDF Downloads 323
52 Improving Fingerprinting-Based Localization System Using Generative Artificial Intelligence

Authors: Getaneh Berie Tarekegn

Abstract:

A precise localization system is crucial for many artificial intelligence Internet of Things (AI-IoT) applications in the era of smart cities. Their applications include traffic monitoring, emergency alarming, environmental monitoring, location-based advertising, intelligent transportation, and smart health care. The most common method for providing continuous positioning services in outdoor environments is by using a global navigation satellite system (GNSS). Due to nonline-of-sight, multipath, and weather conditions, GNSS systems do not perform well in dense urban, urban, and suburban areas.This paper proposes a generative AI-based positioning scheme for large-scale wireless settings using fingerprinting techniques. In this article, we presented a novel semi-supervised deep convolutional generative adversarial network (S-DCGAN)-based radio map construction method for real-time device localization. We also employed a reliable signal fingerprint feature extraction method with t-distributed stochastic neighbor embedding (t-SNE), which extracts dominant features while eliminating noise from hybrid WLAN and long-term evolution (LTE) fingerprints. The proposed scheme reduced the workload of site surveying required to build the fingerprint database by up to 78.5% and significantly improved positioning accuracy. The results show that the average positioning error of GAILoc is less than 39 cm, and more than 90% of the errors are less than 82 cm. That is, numerical results proved that, in comparison to traditional methods, the proposed SRCLoc method can significantly improve positioning performance and reduce radio map construction costs.

Keywords: location-aware services, feature extraction technique, generative adversarial network, long short-term memory, support vector machine

Procedia PDF Downloads 45
51 Detecting Memory-Related Gene Modules in sc/snRNA-seq Data by Deep-Learning

Authors: Yong Chen

Abstract:

To understand the detailed molecular mechanisms of memory formation in engram cells is one of the most fundamental questions in neuroscience. Recent single-cell RNA-seq (scRNA-seq) and single-nucleus RNA-seq (snRNA-seq) techniques have allowed us to explore the sparsely activated engram ensembles, enabling access to the molecular mechanisms that underlie experience-dependent memory formation and consolidation. However, the absence of specific and powerful computational methods to detect memory-related genes (modules) and their regulatory relationships in the sc/snRNA-seq datasets has strictly limited the analysis of underlying mechanisms and memory coding principles in mammalian brains. Here, we present a deep-learning method named SCENTBOX, to detect memory-related gene modules and causal regulatory relationships among themfromsc/snRNA-seq datasets. SCENTBOX first constructs codifferential expression gene network (CEGN) from case versus control sc/snRNA-seq datasets. It then detects the highly correlated modules of differential expression genes (DEGs) in CEGN. The deep network embedding and attention-based convolutional neural network strategies are employed to precisely detect regulatory relationships among DEG genes in a module. We applied them on scRNA-seq datasets of TRAP; Ai14 mouse neurons with fear memory and detected not only known memory-related genes, but also the modules and potential causal regulations. Our results provided novel regulations within an interesting module, including Arc, Bdnf, Creb, Dusp1, Rgs4, and Btg2. Overall, our methods provide a general computational tool for processing sc/snRNA-seq data from case versus control studie and a systematic investigation of fear-memory-related gene modules.

Keywords: sc/snRNA-seq, memory formation, deep learning, gene module, causal inference

Procedia PDF Downloads 92
50 GAILoc: Improving Fingerprinting-Based Localization System Using Generative Artificial Intelligence

Authors: Getaneh Berie Tarekegn

Abstract:

A precise localization system is crucial for many artificial intelligence Internet of Things (AI-IoT) applications in the era of smart cities. Their applications include traffic monitoring, emergency alarming, environmental monitoring, location-based advertising, intelligent transportation, and smart health care. The most common method for providing continuous positioning services in outdoor environments is by using a global navigation satellite system (GNSS). Due to nonline-of-sight, multipath, and weather conditions, GNSS systems do not perform well in dense urban, urban, and suburban areas.This paper proposes a generative AI-based positioning scheme for large-scale wireless settings using fingerprinting techniques. In this article, we presented a novel semi-supervised deep convolutional generative adversarial network (S-DCGAN)-based radio map construction method for real-time device localization. We also employed a reliable signal fingerprint feature extraction method with t-distributed stochastic neighbor embedding (t-SNE), which extracts dominant features while eliminating noise from hybrid WLAN and long-term evolution (LTE) fingerprints. The proposed scheme reduced the workload of site surveying required to build the fingerprint database by up to 78.5% and significantly improved positioning accuracy. The results show that the average positioning error of GAILoc is less than 39 cm, and more than 90% of the errors are less than 82 cm. That is, numerical results proved that, in comparison to traditional methods, the proposed SRCLoc method can significantly improve positioning performance and reduce radio map construction costs.

Keywords: location-aware services, feature extraction technique, generative adversarial network, long short-term memory, support vector machine

Procedia PDF Downloads 41
49 American Sign Language Recognition System

Authors: Rishabh Nagpal, Riya Uchagaonkar, Venkata Naga Narasimha Ashish Mernedi, Ahmed Hambaba

Abstract:

The rapid evolution of technology in the communication sector continually seeks to bridge the gap between different communities, notably between the deaf community and the hearing world. This project develops a comprehensive American Sign Language (ASL) recognition system, leveraging the advanced capabilities of convolutional neural networks (CNNs) and vision transformers (ViTs) to interpret and translate ASL in real-time. The primary objective of this system is to provide an effective communication tool that enables seamless interaction through accurate sign language interpretation. The architecture of the proposed system integrates dual networks -VGG16 for precise spatial feature extraction and vision transformers for contextual understanding of the sign language gestures. The system processes live input, extracting critical features through these sophisticated neural network models, and combines them to enhance gesture recognition accuracy. This integration facilitates a robust understanding of ASL by capturing detailed nuances and broader gesture dynamics. The system is evaluated through a series of tests that measure its efficiency and accuracy in real-world scenarios. Results indicate a high level of precision in recognizing diverse ASL signs, substantiating the potential of this technology in practical applications. Challenges such as enhancing the system’s ability to operate in varied environmental conditions and further expanding the dataset for training were identified and discussed. Future work will refine the model’s adaptability and incorporate haptic feedback to enhance the interactivity and richness of the user experience. This project demonstrates the feasibility of an advanced ASL recognition system and lays the groundwork for future innovations in assistive communication technologies.

Keywords: sign language, computer vision, vision transformer, VGG16, CNN

Procedia PDF Downloads 1
48 Census and Mapping of Oil Palms Over Satellite Dataset Using Deep Learning Model

Authors: Gholba Niranjan Dilip, Anil Kumar

Abstract:

Conduct of accurate reliable mapping of oil palm plantations and census of individual palm trees is a huge challenge. This study addresses this challenge and developed an optimized solution implemented deep learning techniques on remote sensing data. The oil palm is a very important tropical crop. To improve its productivity and land management, it is imperative to have accurate census over large areas. Since, manual census is costly and prone to approximations, a methodology for automated census using panchromatic images from Cartosat-2, SkySat and World View-3 satellites is demonstrated. It is selected two different study sites in Indonesia. The customized set of training data and ground-truth data are created for this study from Cartosat-2 images. The pre-trained model of Single Shot MultiBox Detector (SSD) Lite MobileNet V2 Convolutional Neural Network (CNN) from the TensorFlow Object Detection API is subjected to transfer learning on this customized dataset. The SSD model is able to generate the bounding boxes for each oil palm and also do the counting of palms with good accuracy on the panchromatic images. The detection yielded an F-Score of 83.16 % on seven different images. The detections are buffered and dissolved to generate polygons demarcating the boundaries of the oil palm plantations. This provided the area under the plantations and also gave maps of their location, thereby completing the automated census, with a fairly high accuracy (≈100%). The trained CNN was found competent enough to detect oil palm crowns from images obtained from multiple satellite sensors and of varying temporal vintage. It helped to estimate the increase in oil palm plantations from 2014 to 2021 in the study area. The study proved that high-resolution panchromatic satellite image can successfully be used to undertake census of oil palm plantations using CNNs.

Keywords: object detection, oil palm tree census, panchromatic images, single shot multibox detector

Procedia PDF Downloads 141