Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 12021

Search results for: deep learning network

11601 An Improved Discrete Version of Teaching–Learning-Based ‎Optimization for Supply Chain Network Design

Abstract:

While there are several metaheuristics and exact approaches to solving the Supply Chain Network Design (SCND) problem, there still remains an unfilled gap in using the Teaching-Learning-Based Optimization (TLBO) algorithm. The algorithm has demonstrated desirable results with problems with complicated combinational optimization. The present study introduces a Discrete Self-Study TLBO (DSS-TLBO) with priority-based solution representation that can solve a supply chain network configuration model to lower the total expenses of establishing facilities and the flow of materials. The network features four layers, namely suppliers, plants, distribution centers (DCs), and customer zones. It is designed to meet the customer’s demand through transporting the material between layers of network and providing facilities in the best economic Potential locations. To have a higher quality of the solution and increase the speed of TLBO, a distinct operator was introduced that ensures self-adaptation (self-study) in the algorithm based on the four types of local search. In addition, while TLBO is used in continuous solution representation and priority-based solution representation is discrete, a few modifications were added to the algorithm to remove the solutions that are infeasible. As shown by the results of experiments, the superiority of DSS-TLBO compared to pure TLBO, genetic algorithm (GA) and firefly Algorithm (FA) was established.

Keywords: supply chain network design, teaching–learning-based optimization, improved metaheuristics, discrete solution representation

Procedia PDF Downloads 48

11600 A Global Organizational Theory for the 21st Century

Authors: Troy A. Tyre

Abstract:

Organizational behavior and organizational change are elements of the ever-changing global business environment. Leadership and organizational behavior are 21st century disciplines. Network marketing organizations need to understand the ever-changing nature of global business and be ready and willing to adapt to the environment. Network marketing organizations have a challenge keeping up with a rapid escalation in global growth. Network marketing growth has been steady and global. Network marketing organizations have been slow to develop a 21st century global strategy to manage the rapid escalation of growth degrading organizational behavior, job satisfaction, increasing attrition, and degrading customer service. Development of an organizational behavior and leadership theory for the 21st century to help network marketing develops a global business strategy to manage the rapid escalation in growth that affects organizational behavior. Managing growth means organizational leadership must develop and adapt to the organizational environment. Growth comes with an open mind and one’s departure from the comfort zone. Leadership growth operates in the tacit dimension. Systems thinking and adaptation of mental models can help shift organizational behavior. Shifting the organizational behavior requires organizational learning. Organizational learning occurs through single-loop, double-loop, and triple-loop learning. Triple-loop learning is the most difficult, but the most rewarding. Tools such as theory U can aid in developing a landscape for organizational behavioral development. Additionally, awareness to espoused and portrayed actions is imperatives. Theories of motivation, cross-cultural diversity, and communications are instrumental in founding an organizational behavior suited for the 21st century.

Keywords: global, leadership, network marketing, organizational behavior

Procedia PDF Downloads 550

11599 A Comparative Study of Twin Delayed Deep Deterministic Policy Gradient and Soft Actor-Critic Algorithms for Robot Exploration and Navigation in Unseen Environments

Authors: Romisaa Ali

Abstract:

This paper presents a comparison between twin-delayed Deep Deterministic Policy Gradient (TD3) and Soft Actor-Critic (SAC) reinforcement learning algorithms in the context of training robust navigation policies for Jackal robots. By leveraging an open-source framework and custom motion control environments, the study evaluates the performance, robustness, and transferability of the trained policies across a range of scenarios. The primary focus of the experiments is to assess the training process, the adaptability of the algorithms, and the robot’s ability to navigate in previously unseen environments. Moreover, the paper examines the influence of varying environmental complexities on the learning process and the generalization capabilities of the resulting policies. The results of this study aim to inform and guide the development of more efficient and practical reinforcement learning-based navigation policies for Jackal robots in real-world scenarios.

Keywords: Jackal robot environments, reinforcement learning, TD3, SAC, robust navigation, transferability, custom environment

Procedia PDF Downloads 96

11598 Automating 2D CAD to 3D Model Generation Process: Wall pop-ups

Authors: Mohit Gupta, Chialing Wei, Thomas Czerniawski

Abstract:

In this paper, we have built a neural network that can detect walls on 2D sheets and subsequently create a 3D model in Revit using Dynamo. The training set includes 3500 labeled images, and the detection algorithm used is YOLO. Typically, engineers/designers make concentrated efforts to convert 2D cad drawings to 3D models. This costs a considerable amount of time and human effort. This paper makes a contribution in automating the task of 3D walls modeling. 1. Detecting Walls in 2D cad and generating 3D pop-ups in Revit. 2. Saving designer his/her modeling time in drafting elements like walls from 2D cad to 3D representation. An object detection algorithm YOLO is used for wall detection and localization. The neural network is trained over 3500 labeled images of size 256x256x3. Then, Dynamo is interfaced with the output of the neural network to pop-up 3D walls in Revit. The research uses modern technological tools like deep learning and artificial intelligence to automate the process of generating 3D walls without needing humans to manually model them. Thus, contributes to saving time, human effort, and money.

Keywords: neural networks, Yolo, 2D to 3D transformation, CAD object detection

Procedia PDF Downloads 141

11597 Implementation of Data Science in Field of Homologation

Authors: Shubham Bhonde, Nekzad Doctor, Shashwat Gawande

Abstract:

For the use and the import of Keys and ID Transmitter as well as Body Control Modules with radio transmission in a lot of countries, homologation is required. Final deliverables in homologation of the product are certificates. In considering the world of homologation, there are approximately 200 certificates per product, with most of the certificates in local languages. It is challenging to manually investigate each certificate and extract relevant data from the certificate, such as expiry date, approval date, etc. It is most important to get accurate data from the certificate as inaccuracy may lead to missing re-homologation of certificates that will result in an incompliance situation. There is a scope of automation in reading the certificate data in the field of homologation. We are using deep learning as a tool for automation. We have first trained a model using machine learning by providing all country's basic data. We have trained this model only once. We trained the model by feeding pdf and jpg files using the ETL process. Eventually, that trained model will give more accurate results later. As an outcome, we will get the expiry date and approval date of the certificate with a single click. This will eventually help to implement automation features on a broader level in the database where certificates are stored. This automation will help to minimize human error to almost negligible.

Keywords: homologation, re-homologation, data science, deep learning, machine learning, ETL (extract transform loading)

Procedia PDF Downloads 158

11596 Measuring Human Perception and Negative Elements of Public Space Quality Using Deep Learning: A Case Study of Area within the Inner Road of Tianjin City

Authors: Jiaxin Shi, Kaifeng Hao, Qingfan An, Zeng Peng

Abstract:

Due to a lack of data sources and data processing techniques, it has always been difficult to quantify public space quality, which includes urban construction quality and how it is perceived by people, especially in large urban areas. This study proposes a quantitative research method based on the consideration of emotional health and physical health of the built environment. It highlights the low quality of public areas in Tianjin, China, where there are many negative elements. Deep learning technology is then used to measure how effectively people perceive urban areas. First, this work suggests a deep learning model that might simulate how people can perceive the quality of urban construction. Second, we perform semantic segmentation on street images to identify visual elements influencing scene perception. Finally, this study correlated the scene perception score with the proportion of visual elements to determine the surrounding environmental elements that influence scene perception. Using a small-scale labeled Tianjin street view data set based on transfer learning, this study trains five negative spatial discriminant models in order to explore the negative space distribution and quality improvement of urban streets. Then it uses all Tianjin street-level imagery to make predictions and calculate the proportion of negative space. Visualizing the spatial distribution of negative space along the Tianjin Inner Ring Road reveals that the negative elements are mainly found close to the five key districts. The map of Tianjin was combined with the experimental data to perform the visual analysis. Based on the emotional assessment, the distribution of negative materials, and the direction of street guidelines, we suggest guidance content and design strategy points of the negative phenomena in Tianjin street space in the two dimensions of perception and substance. This work demonstrates the utilization of deep learning techniques to understand how people appreciate high-quality urban construction, and it complements both theory and practice in urban planning. It illustrates the connection between human perception and the actual physical public space environment, allowing researchers to make urban interventions.

Keywords: human perception, public space quality, deep learning, negative elements, street images

Procedia PDF Downloads 109

11595 Improving Chest X-Ray Disease Detection with Enhanced Data Augmentation Using Novel Approach of Diverse Conditional Wasserstein Generative Adversarial Networks

Authors: Malik Muhammad Arslan, Muneeb Ullah, Dai Shihan, Daniyal Haider, Xiaodong Yang

Abstract:

Chest X-rays are instrumental in the detection and monitoring of a wide array of diseases, including viral infections such as COVID-19, tuberculosis, pneumonia, lung cancer, and various cardiac and pulmonary conditions. To enhance the accuracy of diagnosis, artificial intelligence (AI) algorithms, particularly deep learning models like Convolutional Neural Networks (CNNs), are employed. However, these deep learning models demand a substantial and varied dataset to attain optimal precision. Generative Adversarial Networks (GANs) can be employed to create new data, thereby supplementing the existing dataset and enhancing the accuracy of deep learning models. Nevertheless, GANs have their limitations, such as issues related to stability, convergence, and the ability to distinguish between authentic and fabricated data. In order to overcome these challenges and advance the detection and classification of CXR normal and abnormal images, this study introduces a distinctive technique known as DCWGAN (Diverse Conditional Wasserstein GAN) for generating synthetic chest X-ray (CXR) images. The study evaluates the effectiveness of this Idiosyncratic DCWGAN technique using the ResNet50 model and compares its results with those obtained using the traditional GAN approach. The findings reveal that the ResNet50 model trained on the DCWGAN-generated dataset outperformed the model trained on the classic GAN-generated dataset. Specifically, the ResNet50 model utilizing DCWGAN synthetic images achieved impressive performance metrics with an accuracy of 0.961, precision of 0.955, recall of 0.970, and F1-Measure of 0.963. These results indicate the promising potential for the early detection of diseases in CXR images using this Inimitable approach.

Keywords: CNN, classification, deep learning, GAN, Resnet50

Procedia PDF Downloads 79

11594 DocPro: A Framework for Processing Semantic and Layout Information in Business Documents

Authors: Ming-Jen Huang, Chun-Fang Huang, Chiching Wei

Abstract:

With the recent advance of the deep neural network, we observe new applications of NLP (natural language processing) and CV (computer vision) powered by deep neural networks for processing business documents. However, creating a real-world document processing system needs to integrate several NLP and CV tasks, rather than treating them separately. There is a need to have a unified approach for processing documents containing textual and graphical elements with rich formats, diverse layout arrangement, and distinct semantics. In this paper, a framework that fulfills this unified approach is presented. The framework includes a representation model definition for holding the information generated by various tasks and specifications defining the coordination between these tasks. The framework is a blueprint for building a system that can process documents with rich formats, styles, and multiple types of elements. The flexible and lightweight design of the framework can help build a system for diverse business scenarios, such as contract monitoring and reviewing.

Keywords: document processing, framework, formal definition, machine learning

Procedia PDF Downloads 208

11593 Rejuvenate: Face and Body Retouching Using Image Inpainting

Authors: Hossam Abdelrahman, Sama Rostom, Reem Yassein, Yara Mohamed, Salma Salah, Nour Awny

Abstract:

In today’s environment, people are becoming increasingly interested in their appearance. However, they are afraid of their unknown appearance after a plastic surgery or treatment. Accidents, burns and genetic problems such as bowing of body parts of people have a negative impact on their mental health with their appearance and this makes them feel uncomfortable and underestimated. The approach presents a revolutionary deep learning-based image inpainting method that analyses the various picture structures and corrects damaged images. In this study, A model is proposed based on the in-painting of medical images with Stable Diffusion Inpainting method. Reconstructing missing and damaged sections of an image is known as image inpainting is a key progress facilitated by deep neural networks. The system uses the input of the user of an image to indicate a problem, the system will then modify the image and output the fixed image, facilitating for the patient to see the final result.

Keywords: generative adversarial network, large mask inpainting, stable diffusion inpainting, plastic surgery

Procedia PDF Downloads 70

11592 Shedding Light on the Black Box: Explaining Deep Neural Network Prediction of Clinical Outcome

Authors: Yijun Shao, Yan Cheng, Rashmee U. Shah, Charlene R. Weir, Bruce E. Bray, Qing Zeng-Treitler

Abstract:

Deep neural network (DNN) models are being explored in the clinical domain, following the recent success in other domains such as image recognition. For clinical adoption, outcome prediction models require explanation, but due to the multiple non-linear inner transformations, DNN models are viewed by many as a black box. In this study, we developed a deep neural network model for predicting 1-year mortality of patients who underwent major cardio vascular procedures (MCVPs), using temporal image representation of past medical history as input. The dataset was obtained from the electronic medical data warehouse administered by Veteran Affairs Information and Computing Infrastructure (VINCI). We identified 21,355 veterans who had their first MCVP in 2014. Features for prediction included demographics, diagnoses, procedures, medication orders, hospitalizations, and frailty measures extracted from clinical notes. Temporal variables were created based on the patient history data in the 2-year window prior to the index MCVP. A temporal image was created based on these variables for each individual patient. To generate the explanation for the DNN model, we defined a new concept called impact score, based on the presence/value of clinical conditions’ impact on the predicted outcome. Like (log) odds ratio reported by the logistic regression (LR) model, impact scores are continuous variables intended to shed light on the black box model. For comparison, a logistic regression model was fitted on the same dataset. In our cohort, about 6.8% of patients died within one year. The prediction of the DNN model achieved an area under the curve (AUC) of 78.5% while the LR model achieved an AUC of 74.6%. A strong but not perfect correlation was found between the aggregated impact scores and the log odds ratios (Spearman’s rho = 0.74), which helped validate our explanation.

Keywords: deep neural network, temporal data, prediction, frailty, logistic regression model

Procedia PDF Downloads 152

11591 Data Augmentation for Early-Stage Lung Nodules Using Deep Image Prior and Pix2pix

Authors: Qasim Munye, Juned Islam, Haseeb Qureshi, Syed Jung

Abstract:

Lung nodules are commonly identified in computed tomography (CT) scans by experienced radiologists at a relatively late stage. Early diagnosis can greatly increase survival. We propose using a pix2pix conditional generative adversarial network to generate realistic images simulating early-stage lung nodule growth. We have applied deep images prior to 2341 slices from 895 computed tomography (CT) scans from the Lung Image Database Consortium (LIDC) dataset to generate pseudo-healthy medical images. From these images, 819 were chosen to train a pix2pix network. We observed that for most of the images, the pix2pix network was able to generate images where the nodule increased in size and intensity across epochs. To evaluate the images, 400 generated images were chosen at random and shown to a medical student beside their corresponding original image. Of these 400 generated images, 384 were defined as satisfactory - meaning they resembled a nodule and were visually similar to the corresponding image. We believe that this generated dataset could be used as training data for neural networks to detect lung nodules at an early stage or to improve the accuracy of such networks. This is particularly significant as datasets containing the growth of early-stage nodules are scarce. This project shows that the combination of deep image prior and generative models could potentially open the door to creating larger datasets than currently possible and has the potential to increase the accuracy of medical classification tasks.

Keywords: medical technology, artificial intelligence, radiology, lung cancer

Procedia PDF Downloads 62

11590 Predicting Subsurface Abnormalities Growth Using Physics-Informed Neural Networks

Authors: Mehrdad Shafiei Dizaji, Hoda Azari

Abstract:

The research explores the pioneering integration of Physics-Informed Neural Networks (PINNs) into the domain of Ground-Penetrating Radar (GPR) data prediction, akin to advancements in medical imaging for tracking tumor progression in the human body. This research presents a detailed development framework for a specialized PINN model proficient at interpreting and forecasting GPR data, much like how medical imaging models predict tumor behavior. By harnessing the synergy between deep learning algorithms and the physical laws governing subsurface structures—or, in medical terms, human tissues—the model effectively embeds the physics of electromagnetic wave propagation into its architecture. This ensures that predictions not only align with fundamental physical principles but also mirror the precision needed in medical diagnostics for detecting and monitoring tumors. The suggested deep learning structure comprises three components: a CNN, a spatial feature channel attention (SFCA) mechanism, and ConvLSTM, along with temporal feature frame attention (TFFA) modules. The attention mechanism computes channel attention and temporal attention weights using self-adaptation, thereby fine-tuning the visual and temporal feature responses to extract the most pertinent and significant visual and temporal features. By integrating physics directly into the neural network, our model has shown enhanced accuracy in forecasting GPR data. This improvement is vital for conducting effective assessments of bridge deck conditions and other evaluations related to civil infrastructure. The use of Physics-Informed Neural Networks (PINNs) has demonstrated the potential to transform the field of Non-Destructive Evaluation (NDE) by enhancing the precision of infrastructure deterioration predictions. Moreover, it offers a deeper insight into the fundamental mechanisms of deterioration, viewed through the prism of physics-based models.

Keywords: physics-informed neural networks, deep learning, ground-penetrating radar (GPR), NDE, ConvLSTM, physics, data driven

Procedia PDF Downloads 30

11589 Domain Adaptation Save Lives - Drowning Detection in Swimming Pool Scene Based on YOLOV8 Improved by Gaussian Poisson Generative Adversarial Network Augmentation

Authors: Simiao Ren, En Wei

Abstract:

Drowning is a significant safety issue worldwide, and a robust computer vision-based alert system can easily prevent such tragedies in swimming pools. However, due to domain shift caused by the visual gap (potentially due to lighting, indoor scene change, pool floor color etc.) between the training swimming pool and the test swimming pool, the robustness of such algorithms has been questionable. The annotation cost for labeling each new swimming pool is too expensive for mass adoption of such a technique. To address this issue, we propose a domain-aware data augmentation pipeline based on Gaussian Poisson Generative Adversarial Network (GP-GAN). Combined with YOLOv8, we demonstrate that such a domain adaptation technique can significantly improve the model performance (from 0.24 mAP to 0.82 mAP) on new test scenes. As the augmentation method only require background imagery from the new domain (no annotation needed), we believe this is a promising, practical route for preventing swimming pool drowning.

Keywords: computer vision, deep learning, YOLOv8, detection, swimming pool, drowning, domain adaptation, generative adversarial network, GAN, GP-GAN

Procedia PDF Downloads 89

11588 Multilayer Perceptron Neural Network for Rainfall-Water Level Modeling

Authors: Thohidul Islam, Md. Hamidul Haque, Robin Kumar Biswas

Abstract:

Floods are one of the deadliest natural disasters which are very complex to model; however, machine learning is opening the door for more reliable and accurate flood prediction. In this research, a multilayer perceptron neural network (MLP) is developed to model the rainfall-water level relation, in a subtropical monsoon climatic region of the Bangladesh-India border. Our experiments show promising empirical results to forecast the water level for 1 day lead time. Our best performing MLP model achieves 98.7% coefficient of determination with lower model complexity which surpasses previously reported results on similar forecasting problems.

Keywords: flood forecasting, machine learning, multilayer perceptron network, regression

Procedia PDF Downloads 167

11587 D3Advert: Data-Driven Decision Making for Ad Personalization through Personality Analysis Using BiLSTM Network

Authors: Sandesh Achar

Abstract:

Personalized advertising holds greater potential for higher conversion rates compared to generic advertisements. However, its widespread application in the retail industry faces challenges due to complex implementation processes. These complexities impede the swift adoption of personalized advertisement on a large scale. Personalized advertisement, being a data-driven approach, necessitates consumer-related data, adding to its complexity. This paper introduces an innovative data-driven decision-making framework, D3Advert, which personalizes advertisements by analyzing personalities using a BiLSTM network. The framework utilizes the Myers–Briggs Type Indicator (MBTI) dataset for development. The employed BiLSTM network, specifically designed and optimized for D3Advert, classifies user personalities into one of the sixteen MBTI categories based on their social media posts. The classification accuracy is 86.42%, with precision, recall, and F1-Score values of 85.11%, 84.14%, and 83.89%, respectively. The D3Advert framework personalizes advertisements based on these personality classifications. Experimental implementation and performance analysis of D3Advert demonstrate a 40% improvement in impressions. D3Advert’s innovative and straightforward approach has the potential to transform personalized advertising and foster widespread personalized advertisement adoption in marketing.

Keywords: personalized advertisement, deep Learning, MBTI dataset, BiLSTM network, NLP.

Procedia PDF Downloads 38

11586 Chinese Sentence Level Lip Recognition

Authors: Peng Wang, Tigang Jiang

Abstract:

The computer based lip reading method of different languages cannot be universal. At present, for the research of Chinese lip reading, whether the work on data sets or recognition algorithms, is far from mature. In this paper, we study the Chinese lipreading method based on machine learning, and propose a Chinese Sentence-level lip-reading network (CNLipNet) model which consists of spatio-temporal convolutional neural network(CNN), recurrent neural network(RNN) and Connectionist Temporal Classification (CTC) loss function. This model can map variable-length sequence of video frames to Chinese Pinyin sequence and is trained end-to-end. More over, We create CNLRS, a Chinese Lipreading Dataset, which contains 5948 samples and can be shared through github. The evaluation of CNLipNet on this dataset yielded a 41% word correct rate and a 70.6% character correct rate. This evaluation result is far superior to the professional human lip readers, indicating that CNLipNet performs well in lipreading.

Keywords: lipreading, machine learning, spatio-temporal, convolutional neural network, recurrent neural network

Procedia PDF Downloads 123

11585 DenseNet and Autoencoder Architecture for COVID-19 Chest X-Ray Image Classification and Improved U-Net Lung X-Ray Segmentation

Authors: Jonathan Gong

Abstract:

Purpose AI-driven solutions are at the forefront of many pathology and medical imaging methods. Using algorithms designed to better the experience of medical professionals within their respective fields, the efficiency and accuracy of diagnosis can improve. In particular, X-rays are a fast and relatively inexpensive test that can diagnose diseases. In recent years, X-rays have not been widely used to detect and diagnose COVID-19. The under use of Xrays is mainly due to the low diagnostic accuracy and confounding with pneumonia, another respiratory disease. However, research in this field has expressed a possibility that artificial neural networks can successfully diagnose COVID-19 with high accuracy. Models and Data The dataset used is the COVID-19 Radiography Database. This dataset includes images and masks of chest X-rays under the labels of COVID-19, normal, and pneumonia. The classification model developed uses an autoencoder and a pre-trained convolutional neural network (DenseNet201) to provide transfer learning to the model. The model then uses a deep neural network to finalize the feature extraction and predict the diagnosis for the input image. This model was trained on 4035 images and validated on 807 separate images from the ones used for training. The images used to train the classification model include an important feature: the pictures are cropped beforehand to eliminate distractions when training the model. The image segmentation model uses an improved U-Net architecture. This model is used to extract the lung mask from the chest X-ray image. The model is trained on 8577 images and validated on a validation split of 20%. These models are calculated using the external dataset for validation. The models’ accuracy, precision, recall, f1-score, IOU, and loss are calculated. Results The classification model achieved an accuracy of 97.65% and a loss of 0.1234 when differentiating COVID19-infected, pneumonia-infected, and normal lung X-rays. The segmentation model achieved an accuracy of 97.31% and an IOU of 0.928. Conclusion The models proposed can detect COVID-19, pneumonia, and normal lungs with high accuracy and derive the lung mask from a chest X-ray with similarly high accuracy. The hope is for these models to elevate the experience of medical professionals and provide insight into the future of the methods used.

Keywords: artificial intelligence, convolutional neural networks, deep learning, image processing, machine learning

Procedia PDF Downloads 125

11584 Detection of COVID-19 Cases From X-Ray Images Using Capsule-Based Network

Authors: Donya Ashtiani Haghighi, Amirali Baniasadi

Abstract:

Coronavirus (COVID-19) disease has spread abruptly all over the world since the end of 2019. Computed tomography (CT) scans and X-ray images are used to detect this disease. Different Deep Neural Network (DNN)-based diagnosis solutions have been developed, mainly based on Convolutional Neural Networks (CNNs), to accelerate the identification of COVID-19 cases. However, CNNs lose important information in intermediate layers and require large datasets. In this paper, Capsule Network (CapsNet) is used. Capsule Network performs better than CNNs for small datasets. Accuracy of 0.9885, f1-score of 0.9883, precision of 0.9859, recall of 0.9908, and Area Under the Curve (AUC) of 0.9948 are achieved on the Capsule-based framework with hyperparameter tuning. Moreover, different dropout rates are investigated to decrease overfitting. Accordingly, a dropout rate of 0.1 shows the best results. Finally, we remove one convolution layer and decrease the number of trainable parameters to 146,752, which is a promising result.

Keywords: capsule network, dropout, hyperparameter tuning, classification

Procedia PDF Downloads 72

11583 Enhancer: An Effective Transformer Architecture for Single Image Super Resolution

Authors: Pitigalage Chamath Chandira Peiris

Abstract:

A widely researched domain in the field of image processing in recent times has been single image super-resolution, which tries to restore a high-resolution image from a single low-resolution image. Many more single image super-resolution efforts have been completed utilizing equally traditional and deep learning methodologies, as well as a variety of other methodologies. Deep learning-based super-resolution methods, in particular, have received significant interest. As of now, the most advanced image restoration approaches are based on convolutional neural networks; nevertheless, only a few efforts have been performed using Transformers, which have demonstrated excellent performance on high-level vision tasks. The effectiveness of CNN-based algorithms in image super-resolution has been impressive. However, these methods cannot completely capture the non-local features of the data. Enhancer is a simple yet powerful Transformer-based approach for enhancing the resolution of images. A method for single image super-resolution was developed in this study, which utilized an efficient and effective transformer design. This proposed architecture makes use of a locally enhanced window transformer block to alleviate the enormous computational load associated with non-overlapping window-based self-attention. Additionally, it incorporates depth-wise convolution in the feed-forward network to enhance its ability to capture local context. This study is assessed by comparing the results obtained for popular datasets to those obtained by other techniques in the domain.

Keywords: single image super resolution, computer vision, vision transformers, image restoration

Procedia PDF Downloads 100

11582 Semi-Supervised Learning for Spanish Speech Recognition Using Deep Neural Networks

Authors: B. R. Campomanes-Alvarez, P. Quiros, B. Fernandez

Abstract:

Automatic Speech Recognition (ASR) is a machine-based process of decoding and transcribing oral speech. A typical ASR system receives acoustic input from a speaker or an audio file, analyzes it using algorithms, and produces an output in the form of a text. Some speech recognition systems use Hidden Markov Models (HMMs) to deal with the temporal variability of speech and Gaussian Mixture Models (GMMs) to determine how well each state of each HMM fits a short window of frames of coefficients that represents the acoustic input. Another way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition systems. Acoustic models for state-of-the-art ASR systems are usually training on massive amounts of data. However, audio files with their corresponding transcriptions can be difficult to obtain, especially in the Spanish language. Hence, in the case of these low-resource scenarios, building an ASR model is considered as a complex task due to the lack of labeled data, resulting in an under-trained system. Semi-supervised learning approaches arise as necessary tasks given the high cost of transcribing audio data. The main goal of this proposal is to develop a procedure based on acoustic semi-supervised learning for Spanish ASR systems by using DNNs. This semi-supervised learning approach consists of: (a) Training a seed ASR model with a DNN using a set of audios and their respective transcriptions. A DNN with a one-hidden-layer network was initialized; increasing the number of hidden layers in training, to a five. A refinement, which consisted of the weight matrix plus bias term and a Stochastic Gradient Descent (SGD) training were also performed. The objective function was the cross-entropy criterion. (b) Decoding/testing a set of unlabeled data with the obtained seed model. (c) Selecting a suitable subset of the validated data to retrain the seed model, thereby improving its performance on the target test set. To choose the most precise transcriptions, three confidence scores or metrics, regarding the lattice concept (based on the graph cost, the acoustic cost and a combination of both), was performed as selection technique. The performance of the ASR system will be calculated by means of the Word Error Rate (WER). The test dataset was renewed in order to extract the new transcriptions added to the training dataset. Some experiments were carried out in order to select the best ASR results. A comparison between a GMM-based model without retraining and the DNN proposed system was also made under the same conditions. Results showed that the semi-supervised ASR-model based on DNNs outperformed the GMM-model, in terms of WER, in all tested cases. The best result obtained an improvement of 6% relative WER. Hence, these promising results suggest that the proposed technique could be suitable for building ASR models in low-resource environments.

Keywords: automatic speech recognition, deep neural networks, machine learning, semi-supervised learning

Procedia PDF Downloads 336

11581 Cultivating Concentration and Flow: Evaluation of a Strategy for Mitigating Digital Distractions in University Education

Authors: Vera G. Dianova, Lori P. Montross, Charles M. Burke

Abstract:

In the digital age, the widespread and frequently excessive use of mobile phones amongst university students is recognized as a significant distractor which interferes with their ability to enter a deep state of concentration during studies and diminishes their prospects of experiencing the enjoyable and instrumental state of flow, as defined and described by psychologist M. Csikszentmihalyi. This study has targeted 50 university students with the aim of teaching them to cultivate their ability to engage in deep work and to attain the state of flow, fostering more effective and enjoyable learning experiences. Prior to the start of the intervention, all participating students completed a comprehensive survey based on a variety of validated scales assessing their inclination toward lifelong learning, frequency of flow experiences during study, frustration tolerance, sense of agency, as well as their love of learning and daily time devoted to non-academic mobile phone activities. Several days after this initial assessment, students received a 90-minute lecture on the principles of flow and deep work, accompanied by a critical discourse on the detrimental effects of excessive mobile phone usage. They were encouraged to practice deep work and strive for frequent flow states throughout the semester. Subsequently, students submitted weekly surveys, including the 10-item CORE Dispositional Flow Scale, a 3-item agency scale and furthermore disclosed their average daily hours spent on non-academic mobile phone usage. As a final step, at the end of the semester students engaged in reflective report writing, sharing their experiences and evaluating the intervention's effectiveness. They considered alterations in their love of learning, reflected on the implications of their mobile phone usage, contemplated improvements in their tolerance for boredom and perseverance in complex tasks, and pondered the concept of lifelong learning. Additionally, students assessed whether they actively took steps towards managing their recreational phone usage and towards improving their commitment to becoming lifelong learners. Employing a mixed-methods approach our study offers insights into the dynamics of concentration, flow, mobile phone usage and attitudes towards learning among undergraduate and graduate university students. The findings of this study aim to promote profound contemplation, on the part of both students and instructors, on the rapidly evolving digital-age higher education environment. In an era defined by digital and AI advancements, the ability to concentrate, to experience the state of flow, and to love learning has never been more crucial. This study underscores the significance of addressing mobile phone distractions and providing strategies for cultivating deep concentration. The insights gained can guide educators in shaping effective learning strategies for the digital age. By nurturing a love for learning and encouraging lifelong learning, educational institutions can better prepare students for a rapidly changing labor market, where adaptability and continuous learning are paramount for success in a dynamic career landscape.

Keywords: deep work, flow, higher education, lifelong learning, love of learning

Procedia PDF Downloads 66

11580 Road Condition Monitoring Using Built-in Vehicle Technology Data, Drones, and Deep Learning

Authors: Judith Mwakalonge, Geophrey Mbatta, Saidi Siuhi, Gurcan Comert, Cuthbert Ruseruka

Abstract:

Transportation agencies worldwide continuously monitor their roads' conditions to minimize road maintenance costs and maintain public safety and rideability quality. Existing methods for carrying out road condition surveys involve manual observations of roads using standard survey forms done by qualified road condition surveyors or engineers either on foot or by vehicle. Automated road condition survey vehicles exist; however, they are very expensive since they require special vehicles equipped with sensors for data collection together with data processing and computing devices. The manual methods are expensive, time-consuming, infrequent, and can hardly provide real-time information for road conditions. This study contributes to this arena by utilizing built-in vehicle technologies, drones, and deep learning to automate road condition surveys while using low-cost technology. A single model is trained to capture flexible pavement distresses (Potholes, Rutting, Cracking, and raveling), thereby providing a more cost-effective and efficient road condition monitoring approach that can also provide real-time road conditions. Additionally, data fusion is employed to enhance the road condition assessment with data from vehicles and drones.

Keywords: road conditions, built-in vehicle technology, deep learning, drones

Procedia PDF Downloads 118

11579 Application of Deep Learning Algorithms in Agriculture: Early Detection of Crop Diseases

Authors: Manaranjan Pradhan, Shailaja Grover, U. Dinesh Kumar

Abstract:

Farming community in India, as well as other parts of the world, is one of the highly stressed communities due to reasons such as increasing input costs (cost of seeds, fertilizers, pesticide), droughts, reduced revenue leading to farmer suicides. Lack of integrated farm advisory system in India adds to the farmers problems. Farmers need right information during the early stages of crop’s lifecycle to prevent damage and loss in revenue. In this paper, we use deep learning techniques to develop an early warning system for detection of crop diseases using images taken by farmers using their smart phone. The research work leads to building a smart assistant using analytics and big data which could help the farmers with early diagnosis of the crop diseases and corrective actions. The classical approach for crop disease management has been to identify diseases at crop level. Recently, ImageNet Classification using the convolutional neural network (CNN) has been successfully used to identify diseases at individual plant level. Our model uses convolution filters, max pooling, dense layers and dropouts (to avoid overfitting). The models are built for binary classification (healthy or not healthy) and multi class classification (identifying which disease). Transfer learning is used to modify the weights of parameters learnt through ImageNet dataset and apply them on crop diseases, which reduces number of epochs to learn. One shot learning is used to learn from very few images, while data augmentation techniques are used to improve accuracy with images taken from farms by using techniques such as rotation, zoom, shift and blurred images. Models built using combination of these techniques are more robust for deploying in the real world. Our model is validated using tomato crop. In India, tomato is affected by 10 different diseases. Our model achieves an accuracy of more than 95% in correctly classifying the diseases. The main contribution of our research is to create a personal assistant for farmers for managing plant disease, although the model was validated using tomato crop, it can be easily extended to other crops. The advancement of technology in computing and availability of large data has made possible the success of deep learning applications in computer vision, natural language processing, image recognition, etc. With these robust models and huge smartphone penetration, feasibility of implementation of these models is high resulting in timely advise to the farmers and thus increasing the farmers' income and reducing the input costs.

Keywords: analytics in agriculture, CNN, crop disease detection, data augmentation, image recognition, one shot learning, transfer learning

Procedia PDF Downloads 115

11578 Microgrid Design Under Optimal Control With Batch Reinforcement Learning

Authors: Valentin Père, Mathieu Milhé, Fabien Baillon, Jean-Louis Dirion

Abstract:

Microgrids offer potential solutions to meet the need for local grid stability and increase isolated networks autonomy with the integration of intermittent renewable energy production and storage facilities. In such a context, sizing production and storage for a given network is a complex task, highly depending on input data such as power load profile and renewable resource availability. This work aims at developing an operating cost computation methodology for different microgrid designs based on the use of deep reinforcement learning (RL) algorithms to tackle the optimal operation problem in stochastic environments. RL is a data-based sequential decision control method based on Markov decision processes that enable the consideration of random variables for control at a chosen time scale. Agents trained via RL constitute a promising class of Energy Management Systems (EMS) for the operation of microgrids with energy storage. Microgrid sizing (or design) is generally performed by minimizing investment costs and operational costs arising from the EMS behavior. The latter might include economic aspects (power purchase, facilities aging), social aspects (load curtailment), and ecological aspects (carbon emissions). Sizing variables are related to major constraints on the optimal operation of the network by the EMS. In this work, an islanded mode microgrid is considered. Renewable generation is done with photovoltaic panels; an electrochemical battery ensures short-term electricity storage. The controllable unit is a hydrogen tank that is used as a long-term storage unit. The proposed approach focus on the transfer of agent learning for the near-optimal operating cost approximation with deep RL for each microgrid size. Like most data-based algorithms, the training step in RL leads to important computer time. The objective of this work is thus to study the potential of Batch-Constrained Q-learning (BCQ) for the optimal sizing of microgrids and especially to reduce the computation time of operating cost estimation in several microgrid configurations. BCQ is an off-line RL algorithm that is known to be data efficient and can learn better policies than on-line RL algorithms on the same buffer. The general idea is to use the learned policy of agents trained in similar environments to constitute a buffer. The latter is used to train BCQ, and thus the agent learning can be performed without update during interaction sampling. A comparison between online RL and the presented method is performed based on the score by environment and on the computation time.

Keywords: batch-constrained reinforcement learning, control, design, optimal

Procedia PDF Downloads 117

11577 Improving Axial-Attention Network via Cross-Channel Weight Sharing

Authors: Nazmul Shahadat, Anthony S. Maida

Abstract:

In recent years, hypercomplex inspired neural networks improved deep CNN architectures due to their ability to share weights across input channels and thus improve cohesiveness of representations within the layers. The work described herein studies the effect of replacing existing layers in an Axial Attention ResNet with their quaternion variants that use cross-channel weight sharing to assess the effect on image classification. We expect the quaternion enhancements to produce improved feature maps with more interlinked representations. We experiment with the stem of the network, the bottleneck layer, and the fully connected backend by replacing them with quaternion versions. These modifications lead to novel architectures which yield improved accuracy performance on the ImageNet300k classification dataset. Our baseline networks for comparison were the original real-valued ResNet, the original quaternion-valued ResNet, and the Axial Attention ResNet. Since improvement was observed regardless of which part of the network was modified, there is a promise that this technique may be generally useful in improving classification accuracy for a large class of networks.

Keywords: axial attention, representational networks, weight sharing, cross-channel correlations, quaternion-enhanced axial attention, deep networks

Procedia PDF Downloads 79

11576 Thick Data Analytics for Learning Cataract Severity: A Triplet Loss Siamese Neural Network Model

Authors: Jinan Fiaidhi, Sabah Mohammed

Abstract:

Diagnosing cataract severity is an important factor in deciding to undertake surgery. It is usually conducted by an ophthalmologist or through taking a variety of fundus photography that needs to be examined by the ophthalmologist. This paper carries out an investigation using a Siamese neural net that can be trained with small anchor samples to score cataract severity. The model used in this paper is based on a triplet loss function that takes the ophthalmologist best experience in rating positive and negative anchors to a specific cataract scaling system. This approach that takes the heuristics of the ophthalmologist is generally called the thick data approach, which is a kind of machine learning approach that learn from a few shots. Clinical Relevance: The lens of the eye is mostly made up of water and proteins. A cataract occurs when these proteins at the eye lens start to clump together and block lights causing impair vision. This research aims at employing thick data machine learning techniques to rate the severity of the cataract using Siamese neural network.

Keywords: thick data analytics, siamese neural network, triplet-loss model, few shot learning

Procedia PDF Downloads 104

11575 Deep Reinforcement Learning for Advanced Pressure Management in Water Distribution Networks

Authors: Ahmed Negm, George Aggidis, Xiandong Ma

Abstract:

With the diverse nature of urban cities, customer demand patterns, landscape topologies or even seasonal weather trends; managing our water distribution networks (WDNs) has proved a complex task. These unpredictable circumstances manifest as pipe failures, intermittent supply and burst events thus adding to water loss, energy waste and increased carbon emissions. Whilst these events are unavoidable, advanced pressure management has proved an effective tool to control and mitigate them. Henceforth, water utilities have struggled with developing a real-time control method that is resilient when confronting the challenges of water distribution. In this paper we use deep reinforcement learning (DRL) algorithms as a novel pressure control strategy to minimise pressure violations and leakage under both burst and background leakage conditions. Agents based on asynchronous actor critic (A2C) and recurrent proximal policy optimisation (Recurrent PPO) were trained and compared to benchmarked optimisation algorithms (differential evolution, particle swarm optimisation. A2C manages to minimise leakage by 32.48% under burst conditions and 67.17% under background conditions which was the highest performance in the DRL algorithms. A2C and Recurrent PPO performed well in comparison to the benchmarks with higher processing speed and lower computational effort.

Keywords: deep reinforcement learning, pressure management, water distribution networks, leakage management

Procedia PDF Downloads 81

11574 Churn Prediction for Telecommunication Industry Using Artificial Neural Networks

Authors: Ulas Vural, M. Ergun Okay, E. Mesut Yildiz

Abstract:

Telecommunication service providers demand accurate and precise prediction of customer churn probabilities to increase the effectiveness of their customer relation services. The large amount of customer data owned by the service providers is suitable for analysis by machine learning methods. In this study, expenditure data of customers are analyzed by using an artificial neural network (ANN). The ANN model is applied to the data of customers with different billing duration. The proposed model successfully predicts the churn probabilities at 83% accuracy for only three months expenditure data and the prediction accuracy increases up to 89% when the nine month data is used. The experiments also show that the accuracy of ANN model increases on an extended feature set with information of the changes on the bill amounts.

Keywords: customer relationship management, churn prediction, telecom industry, deep learning, artificial neural networks

Procedia PDF Downloads 140

11573 Real-Time Pedestrian Detection Method Based on Improved YOLOv3

Authors: Jingting Luo, Yong Wang, Ying Wang

Abstract:

Pedestrian detection in image or video data is a very important and challenging task in security surveillance. The difficulty of this task is to locate and detect pedestrians of different scales in complex scenes accurately. To solve these problems, a deep neural network (RT-YOLOv3) is proposed to realize real-time pedestrian detection at different scales in security monitoring. RT-YOLOv3 improves the traditional YOLOv3 algorithm. Firstly, the deep residual network is added to extract vehicle features. Then six convolutional neural networks with different scales are designed and fused with the corresponding scale feature maps in the residual network to form the final feature pyramid to perform pedestrian detection tasks. This method can better characterize pedestrians. In order to further improve the accuracy and generalization ability of the model, a hybrid pedestrian data set training method is used to extract pedestrian data from the VOC data set and train with the INRIA pedestrian data set. Experiments show that the proposed RT-YOLOv3 method achieves 93.57% accuracy of mAP (mean average precision) and 46.52f/s (number of frames per second). In terms of accuracy, RT-YOLOv3 performs better than Fast R-CNN, Faster R-CNN, YOLO, SSD, YOLOv2, and YOLOv3. This method reduces the missed detection rate and false detection rate, improves the positioning accuracy, and meets the requirements of real-time detection of pedestrian objects.

Keywords: pedestrian detection, feature detection, convolutional neural network, real-time detection, YOLOv3

Procedia PDF Downloads 138

11572 Distributed Coverage Control by Robot Networks in Unknown Environments Using a Modified EM Algorithm

Authors: Mohammadhosein Hasanbeig, Lacra Pavel

Abstract:

In this paper, we study a distributed control algorithm for the problem of unknown area coverage by a network of robots. The coverage objective is to locate a set of targets in the area and to minimize the robots’ energy consumption. The robots have no prior knowledge about the location and also about the number of the targets in the area. One efficient approach that can be used to relax the robots’ lack of knowledge is to incorporate an auxiliary learning algorithm into the control scheme. A learning algorithm actually allows the robots to explore and study the unknown environment and to eventually overcome their lack of knowledge. The control algorithm itself is modeled based on game theory where the network of the robots use their collective information to play a non-cooperative potential game. The algorithm is tested via simulations to verify its performance and adaptability.

Keywords: distributed control, game theory, multi-agent learning, reinforcement learning

Procedia PDF Downloads 454