Search results for: VGG16
18 A Comprehensive Study of Camouflaged Object Detection Using Deep Learning
Authors: Khalak Bin Khair, Saqib Jahir, Mohammed Ibrahim, Fahad Bin, Debajyoti Karmaker
Abstract:
Object detection is a computer technology that deals with searching through digital images and videos for occurrences of semantic elements of a particular class. It is associated with image processing and computer vision. On top of object detection, we detect camouflage objects within an image using Deep Learning techniques. Deep learning may be a subset of machine learning that's essentially a three-layer neural network Over 6500 images that possess camouflage properties are gathered from various internet sources and divided into 4 categories to compare the result. Those images are labeled and then trained and tested using vgg16 architecture on the jupyter notebook using the TensorFlow platform. The architecture is further customized using Transfer Learning. Methods for transferring information from one or more of these source tasks to increase learning in a related target task are created through transfer learning. The purpose of this transfer of learning methodologies is to aid in the evolution of machine learning to the point where it is as efficient as human learning.Keywords: deep learning, transfer learning, TensorFlow, camouflage, object detection, architecture, accuracy, model, VGG16
Procedia PDF Downloads 14617 Recognition of Gene Names from Gene Pathway Figures Using Siamese Network
Authors: Muhammad Azam, Micheal Olaolu Arowolo, Fei He, Mihail Popescu, Dong Xu
Abstract:
The number of biological papers is growing quickly, which means that the number of biological pathway figures in those papers is also increasing quickly. Each pathway figure shows extensive biological information, like the names of genes and how the genes are related. However, manually annotating pathway figures takes a lot of time and work. Even though using advanced image understanding models could speed up the process of curation, these models still need to be made more accurate. To improve gene name recognition from pathway figures, we applied a Siamese network to map image segments to a library of pictures containing known genes in a similar way to person recognition from photos in many photo applications. We used a triple loss function and a triplet spatial pyramid pooling network by combining the triplet convolution neural network and the spatial pyramid pooling (TSPP-Net). We compared VGG19 and VGG16 as the Siamese network model. VGG16 achieved better performance with an accuracy of 93%, which is much higher than OCR results.Keywords: biological pathway, image understanding, gene name recognition, object detection, Siamese network, VGG
Procedia PDF Downloads 28816 Optimizing Perennial Plants Image Classification by Fine-Tuning Deep Neural Networks
Authors: Khairani Binti Supyan, Fatimah Khalid, Mas Rina Mustaffa, Azreen Bin Azman, Amirul Azuani Romle
Abstract:
Perennial plant classification plays a significant role in various agricultural and environmental applications, assisting in plant identification, disease detection, and biodiversity monitoring. Nevertheless, attaining high accuracy in perennial plant image classification remains challenging due to the complex variations in plant appearance, the diverse range of environmental conditions under which images are captured, and the inherent variability in image quality stemming from various factors such as lighting conditions, camera settings, and focus. This paper proposes an adaptation approach to optimize perennial plant image classification by fine-tuning the pre-trained DNNs model. This paper explores the efficacy of fine-tuning prevalent architectures, namely VGG16, ResNet50, and InceptionV3, leveraging transfer learning to tailor the models to the specific characteristics of perennial plant datasets. A subset of the MYLPHerbs dataset consisted of 6 perennial plant species of 13481 images under various environmental conditions that were used in the experiments. Different strategies for fine-tuning, including adjusting learning rates, training set sizes, data augmentation, and architectural modifications, were investigated. The experimental outcomes underscore the effectiveness of fine-tuning deep neural networks for perennial plant image classification, with ResNet50 showcasing the highest accuracy of 99.78%. Despite ResNet50's superior performance, both VGG16 and InceptionV3 achieved commendable accuracy of 99.67% and 99.37%, respectively. The overall outcomes reaffirm the robustness of the fine-tuning approach across different deep neural network architectures, offering insights into strategies for optimizing model performance in the domain of perennial plant image classification.Keywords: perennial plants, image classification, deep neural networks, fine-tuning, transfer learning, VGG16, ResNet50, InceptionV3
Procedia PDF Downloads 6315 Brain Tumor Detection and Classification Using Pre-Trained Deep Learning Models
Authors: Aditya Karade, Sharada Falane, Dhananjay Deshmukh, Vijaykumar Mantri
Abstract:
Brain tumors pose a significant challenge in healthcare due to their complex nature and impact on patient outcomes. The application of deep learning (DL) algorithms in medical imaging have shown promise in accurate and efficient brain tumour detection. This paper explores the performance of various pre-trained DL models ResNet50, Xception, InceptionV3, EfficientNetB0, DenseNet121, NASNetMobile, VGG19, VGG16, and MobileNet on a brain tumour dataset sourced from Figshare. The dataset consists of MRI scans categorizing different types of brain tumours, including meningioma, pituitary, glioma, and no tumour. The study involves a comprehensive evaluation of these models’ accuracy and effectiveness in classifying brain tumour images. Data preprocessing, augmentation, and finetuning techniques are employed to optimize model performance. Among the evaluated deep learning models for brain tumour detection, ResNet50 emerges as the top performer with an accuracy of 98.86%. Following closely is Xception, exhibiting a strong accuracy of 97.33%. These models showcase robust capabilities in accurately classifying brain tumour images. On the other end of the spectrum, VGG16 trails with the lowest accuracy at 89.02%.Keywords: brain tumour, MRI image, detecting and classifying tumour, pre-trained models, transfer learning, image segmentation, data augmentation
Procedia PDF Downloads 7414 Autism Disease Detection Using Transfer Learning Techniques: Performance Comparison between Central Processing Unit vs. Graphics Processing Unit Functions for Neural Networks
Authors: Mst Shapna Akter, Hossain Shahriar
Abstract:
Neural network approaches are machine learning methods used in many domains, such as healthcare and cyber security. Neural networks are mostly known for dealing with image datasets. While training with the images, several fundamental mathematical operations are carried out in the Neural Network. The operation includes a number of algebraic and mathematical functions, including derivative, convolution, and matrix inversion and transposition. Such operations require higher processing power than is typically needed for computer usage. Central Processing Unit (CPU) is not appropriate for a large image size of the dataset as it is built with serial processing. While Graphics Processing Unit (GPU) has parallel processing capabilities and, therefore, has higher speed. This paper uses advanced Neural Network techniques such as VGG16, Resnet50, Densenet, Inceptionv3, Xception, Mobilenet, XGBOOST-VGG16, and our proposed models to compare CPU and GPU resources. A system for classifying autism disease using face images of an autistic and non-autistic child was used to compare performance during testing. We used evaluation matrices such as Accuracy, F1 score, Precision, Recall, and Execution time. It has been observed that GPU runs faster than the CPU in all tests performed. Moreover, the performance of the Neural Network models in terms of accuracy increases on GPU compared to CPU.Keywords: autism disease, neural network, CPU, GPU, transfer learning
Procedia PDF Downloads 11713 Melanoma and Non-Melanoma, Skin Lesion Classification, Using a Deep Learning Model
Authors: Shaira L. Kee, Michael Aaron G. Sy, Myles Joshua T. Tan, Hezerul Abdul Karim, Nouar AlDahoul
Abstract:
Skin diseases are considered the fourth most common disease, with melanoma and non-melanoma skin cancer as the most common type of cancer in Caucasians. The alarming increase in Skin Cancer cases shows an urgent need for further research to improve diagnostic methods, as early diagnosis can significantly improve the 5-year survival rate. Machine Learning algorithms for image pattern analysis in diagnosing skin lesions can dramatically increase the accuracy rate of detection and decrease possible human errors. Several studies have shown the diagnostic performance of computer algorithms outperformed dermatologists. However, existing methods still need improvements to reduce diagnostic errors and generate efficient and accurate results. Our paper proposes an ensemble method to classify dermoscopic images into benign and malignant skin lesions. The experiments were conducted using the International Skin Imaging Collaboration (ISIC) image samples. The dataset contains 3,297 dermoscopic images with benign and malignant categories. The results show improvement in performance with an accuracy of 88% and an F1 score of 87%, outperforming other existing models such as support vector machine (SVM), Residual network (ResNet50), EfficientNetB0, EfficientNetB4, and VGG16.Keywords: deep learning - VGG16 - efficientNet - CNN – ensemble – dermoscopic images - melanoma
Procedia PDF Downloads 8012 Neural Network based Risk Detection for Dyslexia and Dysgraphia in Sinhala Language Speaking Children
Authors: Budhvin T. Withana, Sulochana Rupasinghe
Abstract:
The educational system faces a significant concern with regards to Dyslexia and Dysgraphia, which are learning disabilities impacting reading and writing abilities. This is particularly challenging for children who speak the Sinhala language due to its complexity and uniqueness. Commonly used methods to detect the risk of Dyslexia and Dysgraphia rely on subjective assessments, leading to limited coverage and time-consuming processes. Consequently, delays in diagnoses and missed opportunities for early intervention can occur. To address this issue, the project developed a hybrid model that incorporates various deep learning techniques to detect the risk of Dyslexia and Dysgraphia. Specifically, Resnet50, VGG16, and YOLOv8 models were integrated to identify handwriting issues. The outputs of these models were then combined with other input data and fed into an MLP model. Hyperparameters of the MLP model were fine-tuned using Grid Search CV, enabling the identification of optimal values for the model. This approach proved to be highly effective in accurately predicting the risk of Dyslexia and Dysgraphia, providing a valuable tool for early detection and intervention. The Resnet50 model exhibited a training accuracy of 0.9804 and a validation accuracy of 0.9653. The VGG16 model achieved a training accuracy of 0.9991 and a validation accuracy of 0.9891. The MLP model demonstrated impressive results with a training accuracy of 0.99918, a testing accuracy of 0.99223, and a loss of 0.01371. These outcomes showcase the high accuracy achieved by the proposed hybrid model in predicting the risk of Dyslexia and Dysgraphia.Keywords: neural networks, risk detection system, dyslexia, dysgraphia, deep learning, learning disabilities, data science
Procedia PDF Downloads 6311 Neural Network-based Risk Detection for Dyslexia and Dysgraphia in Sinhala Language Speaking Children
Authors: Budhvin T. Withana, Sulochana Rupasinghe
Abstract:
The problem of Dyslexia and Dysgraphia, two learning disabilities that affect reading and writing abilities, respectively, is a major concern for the educational system. Due to the complexity and uniqueness of the Sinhala language, these conditions are especially difficult for children who speak it. The traditional risk detection methods for Dyslexia and Dysgraphia frequently rely on subjective assessments, making it difficult to cover a wide range of risk detection and time-consuming. As a result, diagnoses may be delayed and opportunities for early intervention may be lost. The project was approached by developing a hybrid model that utilized various deep learning techniques for detecting risk of Dyslexia and Dysgraphia. Specifically, Resnet50, VGG16 and YOLOv8 were integrated to detect the handwriting issues, and their outputs were fed into an MLP model along with several other input data. The hyperparameters of the MLP model were fine-tuned using Grid Search CV, which allowed for the optimal values to be identified for the model. This approach proved to be effective in accurately predicting the risk of Dyslexia and Dysgraphia, providing a valuable tool for early detection and intervention of these conditions. The Resnet50 model achieved an accuracy of 0.9804 on the training data and 0.9653 on the validation data. The VGG16 model achieved an accuracy of 0.9991 on the training data and 0.9891 on the validation data. The MLP model achieved an impressive training accuracy of 0.99918 and a testing accuracy of 0.99223, with a loss of 0.01371. These results demonstrate that the proposed hybrid model achieved a high level of accuracy in predicting the risk of Dyslexia and Dysgraphia.Keywords: neural networks, risk detection system, Dyslexia, Dysgraphia, deep learning, learning disabilities, data science
Procedia PDF Downloads 11210 American Sign Language Recognition System
Authors: Rishabh Nagpal, Riya Uchagaonkar, Venkata Naga Narasimha Ashish Mernedi, Ahmed Hambaba
Abstract:
The rapid evolution of technology in the communication sector continually seeks to bridge the gap between different communities, notably between the deaf community and the hearing world. This project develops a comprehensive American Sign Language (ASL) recognition system, leveraging the advanced capabilities of convolutional neural networks (CNNs) and vision transformers (ViTs) to interpret and translate ASL in real-time. The primary objective of this system is to provide an effective communication tool that enables seamless interaction through accurate sign language interpretation. The architecture of the proposed system integrates dual networks -VGG16 for precise spatial feature extraction and vision transformers for contextual understanding of the sign language gestures. The system processes live input, extracting critical features through these sophisticated neural network models, and combines them to enhance gesture recognition accuracy. This integration facilitates a robust understanding of ASL by capturing detailed nuances and broader gesture dynamics. The system is evaluated through a series of tests that measure its efficiency and accuracy in real-world scenarios. Results indicate a high level of precision in recognizing diverse ASL signs, substantiating the potential of this technology in practical applications. Challenges such as enhancing the system’s ability to operate in varied environmental conditions and further expanding the dataset for training were identified and discussed. Future work will refine the model’s adaptability and incorporate haptic feedback to enhance the interactivity and richness of the user experience. This project demonstrates the feasibility of an advanced ASL recognition system and lays the groundwork for future innovations in assistive communication technologies.Keywords: sign language, computer vision, vision transformer, VGG16, CNN
Procedia PDF Downloads 429 FracXpert: Ensemble Machine Learning Approach for Localization and Classification of Bone Fractures in Cricket Athletes
Authors: Madushani Rodrigo, Banuka Athuraliya
Abstract:
In today's world of medical diagnosis and prediction, machine learning stands out as a strong tool, transforming old ways of caring for health. This study analyzes the use of machine learning in the specialized domain of sports medicine, with a focus on the timely and accurate detection of bone fractures in cricket athletes. Failure to identify bone fractures in real time can result in malunion or non-union conditions. To ensure proper treatment and enhance the bone healing process, accurately identifying fracture locations and types is necessary. When interpreting X-ray images, it relies on the expertise and experience of medical professionals in the identification process. Sometimes, radiographic images are of low quality, leading to potential issues. Therefore, it is necessary to have a proper approach to accurately localize and classify fractures in real time. The research has revealed that the optimal approach needs to address the stated problem and employ appropriate radiographic image processing techniques and object detection algorithms. These algorithms should effectively localize and accurately classify all types of fractures with high precision and in a timely manner. In order to overcome the challenges of misidentifying fractures, a distinct model for fracture localization and classification has been implemented. The research also incorporates radiographic image enhancement and preprocessing techniques to overcome the limitations posed by low-quality images. A classification ensemble model has been implemented using ResNet18 and VGG16. In parallel, a fracture segmentation model has been implemented using the enhanced U-Net architecture. Combining the results of these two implemented models, the FracXpert system can accurately localize exact fracture locations along with fracture types from the available 12 different types of fracture patterns, which include avulsion, comminuted, compressed, dislocation, greenstick, hairline, impacted, intraarticular, longitudinal, oblique, pathological, and spiral. This system will generate a confidence score level indicating the degree of confidence in the predicted result. Using ResNet18 and VGG16 architectures, the implemented fracture segmentation model, based on the U-Net architecture, achieved a high accuracy level of 99.94%, demonstrating its precision in identifying fracture locations. Simultaneously, the classification ensemble model achieved an accuracy of 81.0%, showcasing its ability to categorize various fracture patterns, which is instrumental in the fracture treatment process. In conclusion, FracXpert has become a promising ML application in sports medicine, demonstrating its potential to revolutionize fracture detection processes. By leveraging the power of ML algorithms, this study contributes to the advancement of diagnostic capabilities in cricket athlete healthcare, ensuring timely and accurate identification of bone fractures for the best treatment outcomes.Keywords: multiclass classification, object detection, ResNet18, U-Net, VGG16
Procedia PDF Downloads 1188 Deep-Learning Based Approach to Facial Emotion Recognition through Convolutional Neural Network
Authors: Nouha Khediri, Mohammed Ben Ammar, Monji Kherallah
Abstract:
Recently, facial emotion recognition (FER) has become increasingly essential to understand the state of the human mind. Accurately classifying emotion from the face is a challenging task. In this paper, we present a facial emotion recognition approach named CV-FER, benefiting from deep learning, especially CNN and VGG16. First, the data is pre-processed with data cleaning and data rotation. Then, we augment the data and proceed to our FER model, which contains five convolutions layers and five pooling layers. Finally, a softmax classifier is used in the output layer to recognize emotions. Based on the above contents, this paper reviews the works of facial emotion recognition based on deep learning. Experiments show that our model outperforms the other methods using the same FER2013 database and yields a recognition rate of 92%. We also put forward some suggestions for future work.Keywords: CNN, deep-learning, facial emotion recognition, machine learning
Procedia PDF Downloads 947 Multi-Classification Deep Learning Model for Diagnosing Different Chest Diseases
Authors: Bandhan Dey, Muhsina Bintoon Yiasha, Gulam Sulaman Choudhury
Abstract:
Chest disease is one of the most problematic ailments in our regular life. There are many known chest diseases out there. Diagnosing them correctly plays a vital role in the process of treatment. There are many methods available explicitly developed for different chest diseases. But the most common approach for diagnosing these diseases is through X-ray. In this paper, we proposed a multi-classification deep learning model for diagnosing COVID-19, lung cancer, pneumonia, tuberculosis, and atelectasis from chest X-rays. In the present work, we used the transfer learning method for better accuracy and fast training phase. The performance of three architectures is considered: InceptionV3, VGG-16, and VGG-19. We evaluated these deep learning architectures using public digital chest x-ray datasets with six classes (i.e., COVID-19, lung cancer, pneumonia, tuberculosis, atelectasis, and normal). The experiments are conducted on six-classification, and we found that VGG16 outperforms other proposed models with an accuracy of 95%.Keywords: deep learning, image classification, X-ray images, Tensorflow, Keras, chest diseases, convolutional neural networks, multi-classification
Procedia PDF Downloads 916 Deep Learning based Image Classifiers for Detection of CSSVD in Cacao Plants
Authors: Atuhurra Jesse, N'guessan Yves-Roland Douha, Pabitra Lenka
Abstract:
The detection of diseases within plants has attracted a lot of attention from computer vision enthusiasts. Despite the progress made to detect diseases in many plants, there remains a research gap to train image classifiers to detect the cacao swollen shoot virus disease or CSSVD for short, pertinent to cacao plants. This gap has mainly been due to the unavailability of high quality labeled training data. Moreover, institutions have been hesitant to share their data related to CSSVD. To fill these gaps, image classifiers to detect CSSVD-infected cacao plants are presented in this study. The classifiers are based on VGG16, ResNet50 and Vision Transformer (ViT). The image classifiers are evaluated on a recently released and publicly accessible KaraAgroAI Cocoa dataset. The best performing image classifier, based on ResNet50, achieves 95.39\% precision, 93.75\% recall, 94.34\% F1-score and 94\% accuracy on only 20 epochs. There is a +9.75\% improvement in recall when compared to previous works. These results indicate that the image classifiers learn to identify cacao plants infected with CSSVD.Keywords: CSSVD, image classification, ResNet50, vision transformer, KaraAgroAI cocoa dataset
Procedia PDF Downloads 1025 An Empirical Study on Switching Activation Functions in Shallow and Deep Neural Networks
Authors: Apoorva Vinod, Archana Mathur, Snehanshu Saha
Abstract:
Though there exists a plethora of Activation Functions (AFs) used in single and multiple hidden layer Neural Networks (NN), their behavior always raised curiosity, whether used in combination or singly. The popular AFs –Sigmoid, ReLU, and Tanh–have performed prominently well for shallow and deep architectures. Most of the time, AFs are used singly in multi-layered NN, and, to the best of our knowledge, their performance is never studied and analyzed deeply when used in combination. In this manuscript, we experiment with multi-layered NN architecture (both on shallow and deep architectures; Convolutional NN and VGG16) and investigate how well the network responds to using two different AFs (Sigmoid-Tanh, Tanh-ReLU, ReLU-Sigmoid) used alternately against a traditional, single (Sigmoid-Sigmoid, Tanh-Tanh, ReLUReLU) combination. Our results show that using two different AFs, the network achieves better accuracy, substantially lower loss, and faster convergence on 4 computer vision (CV) and 15 Non-CV (NCV) datasets. When using different AFs, not only was the accuracy greater by 6-7%, but we also accomplished convergence twice as fast. We present a case study to investigate the probability of networks suffering vanishing and exploding gradients when using two different AFs. Additionally, we theoretically showed that a composition of two or more AFs satisfies Universal Approximation Theorem (UAT).Keywords: activation function, universal approximation function, neural networks, convergence
Procedia PDF Downloads 1574 Computer Aided Analysis of Breast Based Diagnostic Problems from Mammograms Using Image Processing and Deep Learning Methods
Authors: Ali Berkan Ural
Abstract:
This paper presents the analysis, evaluation, and pre-diagnosis of early stage breast based diagnostic problems (breast cancer, nodulesorlumps) by Computer Aided Diagnosing (CAD) system from mammogram radiological images. According to the statistics, the time factor is crucial to discover the disease in the patient (especially in women) as possible as early and fast. In the study, a new algorithm is developed using advanced image processing and deep learning method to detect and classify the problem at earlystagewithmoreaccuracy. This system first works with image processing methods (Image acquisition, Noiseremoval, Region Growing Segmentation, Morphological Operations, Breast BorderExtraction, Advanced Segmentation, ObtainingRegion Of Interests (ROIs), etc.) and segments the area of interest of the breast and then analyzes these partly obtained area for cancer detection/lumps in order to diagnosis the disease. After segmentation, with using the Spectrogramimages, 5 different deep learning based methods (specified Convolutional Neural Network (CNN) basedAlexNet, ResNet50, VGG16, DenseNet, Xception) are applied to classify the breast based problems.Keywords: computer aided diagnosis, breast cancer, region growing, segmentation, deep learning
Procedia PDF Downloads 943 Quality Analysis of Vegetables Through Image Processing
Authors: Abdul Khalique Baloch, Ali Okatan
Abstract:
The quality analysis of food and vegetable from image is hot topic now a day, where researchers make them better then pervious findings through different technique and methods. In this research we have review the literature, and find gape from them, and suggest better proposed approach, design the algorithm, developed a software to measure the quality from images, where accuracy of image show better results, and compare the results with Perouse work done so for. The Application we uses an open-source dataset and python language with tensor flow lite framework. In this research we focus to sort food and vegetable from image, in the images, the application can sorts and make them grading after process the images, it could create less errors them human base sorting errors by manual grading. Digital pictures datasets were created. The collected images arranged by classes. The classification accuracy of the system was about 94%. As fruits and vegetables play main role in day-to-day life, the quality of fruits and vegetables is necessary in evaluating agricultural produce, the customer always buy good quality fruits and vegetables. This document is about quality detection of fruit and vegetables using images. Most of customers suffering due to unhealthy foods and vegetables by suppliers, so there is no proper quality measurement level followed by hotel managements. it have developed software to measure the quality of the fruits and vegetables by using images, it will tell you how is your fruits and vegetables are fresh or rotten. Some algorithms reviewed in this thesis including digital images, ResNet, VGG16, CNN and Transfer Learning grading feature extraction. This application used an open source dataset of images and language used python, and designs a framework of system.Keywords: deep learning, computer vision, image processing, rotten fruit detection, fruits quality criteria, vegetables quality criteria
Procedia PDF Downloads 682 A Study on the Application of Machine Learning and Deep Learning Techniques for Skin Cancer Detection
Authors: Hritwik Ghosh, Irfan Sadiq Rahat, Sachi Nandan Mohanty, J. V. R. Ravindra
Abstract:
In the rapidly evolving landscape of medical diagnostics, the early detection and accurate classification of skin cancer remain paramount for effective treatment outcomes. This research delves into the transformative potential of Artificial Intelligence (AI), specifically Deep Learning (DL), as a tool for discerning and categorizing various skin conditions. Utilizing a diverse dataset of 3,000 images representing nine distinct skin conditions, we confront the inherent challenge of class imbalance. This imbalance, where conditions like melanomas are over-represented, is addressed by incorporating class weights during the model training phase, ensuring an equitable representation of all conditions in the learning process. Our pioneering approach introduces a hybrid model, amalgamating the strengths of two renowned Convolutional Neural Networks (CNNs), VGG16 and ResNet50. These networks, pre-trained on the ImageNet dataset, are adept at extracting intricate features from images. By synergizing these models, our research aims to capture a holistic set of features, thereby bolstering classification performance. Preliminary findings underscore the hybrid model's superiority over individual models, showcasing its prowess in feature extraction and classification. Moreover, the research emphasizes the significance of rigorous data pre-processing, including image resizing, color normalization, and segmentation, in ensuring data quality and model reliability. In essence, this study illuminates the promising role of AI and DL in revolutionizing skin cancer diagnostics, offering insights into its potential applications in broader medical domains.Keywords: artificial intelligence, machine learning, deep learning, skin cancer, dermatology, convolutional neural networks, image classification, computer vision, healthcare technology, cancer detection, medical imaging
Procedia PDF Downloads 861 Deep Learning for Image Correction in Sparse-View Computed Tomography
Authors: Shubham Gogri, Lucia Florescu
Abstract:
Medical diagnosis and radiotherapy treatment planning using Computed Tomography (CT) rely on the quantitative accuracy and quality of the CT images. At the same time, requirements for CT imaging include reducing the radiation dose exposure to patients and minimizing scanning time. A solution to this is the sparse-view CT technique, based on a reduced number of projection views. This, however, introduces a new problem— the incomplete projection data results in lower quality of the reconstructed images. To tackle this issue, deep learning methods have been applied to enhance the quality of the sparse-view CT images. A first approach involved employing Mir-Net, a dedicated deep neural network designed for image enhancement. This showed promise, utilizing an intricate architecture comprising encoder and decoder networks, along with the incorporation of the Charbonnier Loss. However, this approach was computationally demanding. Subsequently, a specialized Generative Adversarial Network (GAN) architecture, rooted in the Pix2Pix framework, was implemented. This GAN framework involves a U-Net-based Generator and a Discriminator based on Convolutional Neural Networks. To bolster the GAN's performance, both Charbonnier and Wasserstein loss functions were introduced, collectively focusing on capturing minute details while ensuring training stability. The integration of the perceptual loss, calculated based on feature vectors extracted from the VGG16 network pretrained on the ImageNet dataset, further enhanced the network's ability to synthesize relevant images. A series of comprehensive experiments with clinical CT data were conducted, exploring various GAN loss functions, including Wasserstein, Charbonnier, and perceptual loss. The outcomes demonstrated significant image quality improvements, confirmed through pertinent metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) between the corrected images and the ground truth. Furthermore, learning curves and qualitative comparisons added evidence of the enhanced image quality and the network's increased stability, while preserving pixel value intensity. The experiments underscored the potential of deep learning frameworks in enhancing the visual interpretation of CT scans, achieving outcomes with SSIM values close to one and PSNR values reaching up to 76.Keywords: generative adversarial networks, sparse view computed tomography, CT image correction, Mir-Net
Procedia PDF Downloads 159