Search results for: images processing
5375 Traffic Density Measurement by Automatic Detection of the Vehicles Using Gradient Vectors from Aerial Images
Authors: Saman Ghaffarian, Ilgin Gökaşar
Abstract:
This paper presents a new automatic vehicle detection method for measuring traffic density from very high resolution aerial images. The proposed method starts by extracting road regions from the image using road vector data. Then, the road image is divided into equal sections according to the resolution of the images. Gradient vectors of the road image are computed from the edge map of the corresponding image. The gradient vectors on each section boundary are split into groups wherever they significantly change direction. Finally, the number of vehicles in each section is determined by calculating the standard deviation of the gradient vectors in each group and accepting as a vehicle any group whose standard deviation is above a predefined threshold. The proposed method was tested on four very high resolution aerial images acquired over Istanbul, Turkey, which show roads and vehicles with diverse characteristics. The results show the reliability of the proposed method in detecting vehicles, with an overall F1 score of 86%.
Keywords: aerial images, intelligent transportation systems, traffic density measurement, vehicle detection
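The final thresholding step can be illustrated with a minimal sketch. The abstract does not say which quantity the standard deviation is taken over, so this example assumes it is the gradient direction; the function names, the sample vectors, and the threshold value are all hypothetical.

```python
import math
from statistics import pstdev

def direction_std(gradients):
    """Population standard deviation of gradient directions (radians).

    Assumption: 'standard deviation of the gradient vectors' is read here as
    the spread of their directions. Angle wrap-around at +/-pi is ignored.
    """
    angles = [math.atan2(gy, gx) for gx, gy in gradients]
    return pstdev(angles)

def count_vehicles(groups, threshold):
    """Count groups whose direction spread exceeds the threshold."""
    return sum(1 for g in groups if len(g) > 1 and direction_std(g) > threshold)

# Hypothetical data: one uniform group (road surface) and one varied group.
groups = [
    [(1, 0), (1, 0), (1, 0)],
    [(1, 0), (0, 1), (-1, 0)],
]
print(count_vehicles(groups, threshold=0.5))
```

A circular statistic would be a more careful choice for directions near the wrap-around point; the plain standard deviation is used only to keep the sketch short.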
Procedia PDF Downloads 379
5374 Optimization of Solar Tracking Systems
Authors: A. Zaher, A. Traore, F. Thiéry, T. Talbert, B. Shaer
Abstract:
In this paper, an intelligent approach is proposed to optimize the orientation of continuous solar tracking systems on cloudy days. Under a clear sky, direct sunlight contributes more than diffuse radiation, so the panel is always pointed towards the sun. Under an overcast sky, the direct solar beam is close to zero, and the panel is placed horizontally to receive the maximum diffuse radiation. Under partly cloudy conditions, the panel must be pointed towards the source that emits the most solar energy, which may be anywhere in the sky dome. The idea of our approach is therefore to analyze images captured by a ground-based sky camera system in order to detect the zone of the sky dome that is the optimal source of energy under cloudy conditions. The proposed approach is implemented using an experimental setup developed at the PROMES-CNRS laboratory in Perpignan, France. Under overcast conditions, the results were very satisfactory, and the intelligent approach provided efficiency gains of up to 9% relative to conventional continuous sun tracking systems.
Keywords: clouds detection, fuzzy inference systems, images processing, sun trackers
Procedia PDF Downloads 192
5373 Classifier for Liver Ultrasound Images
Authors: Soumya Sajjan
Abstract:
Liver cancer is the most common cancer disease worldwide in men and women, and is one of the few cancers still on the rise. Liver disease is the 4th leading cause of death. According to new NHS (National Health Service) figures, deaths from liver diseases have reached record levels, rising by 25% in less than a decade; heavy drinking, obesity, and hepatitis are believed to be behind the rise. In this study, we focus on the development of a diagnostic classifier for ultrasound liver lesions. Ultrasound (US) sonography is an easy-to-use and widely popular imaging modality because of its ability to visualize many human soft tissues and organs without any harmful effect. This paper provides an overview of the underlying concepts, along with algorithms for processing liver ultrasound images. Ultrasound liver lesion images naturally contain considerable speckle noise, so developing a classifier for them is a challenging task. We take a fully automatic machine learning approach to developing this classifier. First, we segment the liver image by calculating textural features from the co-occurrence matrix and the run-length method. For classification, a Support Vector Machine (SVM) is used, based on the risk bounds of statistical learning theory. The textural features from the different feature methods are given as input to the SVM individually. Performance analysis on the training and test datasets is carried out separately using the SVM model. Whenever an ultrasound liver lesion image is given to the SVM classifier system, the features are calculated and the image is classified as a normal or diseased liver lesion. We hope the result will help physicians identify liver cancer non-invasively.
Keywords: segmentation, Support Vector Machine, ultrasound liver lesion, co-occurrence matrix
Procedia PDF Downloads 411
5372 Dark and Bright Envelopes for Dehazing Images
Authors: Zihan Yu, Kohei Inoue, Kiichi Urahama
Abstract:
We present a method for dehazing images. A dark envelope image is derived with the bilateral minimum filter and a bright envelope is derived with the bilateral maximum filter. The ambient light and transmission of the scene are estimated from these two envelope images. An image without haze is reconstructed from the estimated ambient light and transmission.
Keywords: image dehazing, bilateral minimum filter, bilateral maximum filter, local contrast
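As a rough illustration of the envelope idea, the sketch below computes dark and bright envelopes of a small grayscale image. It substitutes plain sliding-window minimum and maximum filters for the bilateral filters named in the abstract, and it does not reproduce the authors' estimation of ambient light and transmission. The reconstruction line inverts the commonly used haze model I = J·t + A·(1 − t), which is an assumption about the model rather than something the abstract states; all names and values are hypothetical.

```python
def envelope(image, radius, pick):
    """Apply a sliding-window filter; pick is min (dark) or max (bright)."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = pick(window)
    return out

def recover_radiance(intensity, ambient, transmission, t_min=0.1):
    """Invert I = J*t + A*(1 - t) for J, clamping t to avoid division blow-up."""
    return (intensity - ambient) / max(transmission, t_min) + ambient

# Hypothetical 3x3 grayscale image.
img = [[5, 1, 7],
       [3, 9, 2],
       [8, 4, 6]]
dark = envelope(img, 1, min)
bright = envelope(img, 1, max)
print(dark)
print(bright)
```

A bilateral variant would additionally weight neighbours by intensity similarity so that the envelopes follow edges more closely.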
Procedia PDF Downloads 263
5371 Differences in the Processing of Sentences with Lexical Ambiguity and Structural Ambiguity: An Experimental Study
Authors: Mariana T. Teixeira, Joana P. Luz
Abstract:
This paper is based on assumptions of psycholinguistics and investigates the processing of ambiguous sentences in Brazilian Portuguese. Specifically, it aims to verify whether there is a difference in processing time between sentences with lexical ambiguity and sentences with structural (or syntactic) ambiguity. We hypothesize, based on the Garden Path Theory, that the two types of ambiguity entail different cognitive efforts, since sentences with structural ambiguity require that two structures be processed, whereas sentences whose ambiguity is rooted in a word require the processing of a single structure, which admits a variation of punctual meaning within the scope of only one lexical item. To test this hypothesis, 25 undergraduate students, native speakers of Brazilian Portuguese with an average age of 27.66 years, performed a self-monitoring reading task on sentences with lexical and structural ambiguity. The results suggest that unambiguous sentences are processed faster than ambiguous sentences, whether the ambiguity is lexical or structural. In addition, participants showed a greater mean reading time for sentences with syntactic ambiguity than for sentences with lexical ambiguity, indicating a greater cognitive effort in processing sentences with structural ambiguity.
Keywords: Brazilian Portuguese, lexical ambiguity, sentence processing, syntactic ambiguity
Procedia PDF Downloads 228
5370 Blind Data Hiding Technique Using Interpolation of Subsampled Images
Authors: Singara Singh Kasana, Pankaj Garg
Abstract:
In this paper, a blind data hiding technique based on interpolation of subsampled versions of a cover image is proposed. A subsampled image is taken as a reference image, and an interpolated image is generated from this reference image. The difference between the original cover image and the interpolated image is then used to embed secret data. Comparisons with existing interpolation-based techniques show that the proposed technique provides higher embedding capacity and marked images of better visual quality. Moreover, the performance of the proposed technique is more stable across different images.
Keywords: interpolation, image subsampling, PSNR, SIM
Procedia PDF Downloads 578
5369 Using Machine Learning to Classify Different Body Parts and Determine Healthiness
Authors: Zachary Pan
Abstract:
Our general mission is to solve the problem of classifying images into different body part types and deciding if each of them is healthy or not. However, for now, we will determine healthiness for only one-sixth of the body parts, specifically the chest. We will detect pneumonia in X-ray scans of those chest images. Doctors can use this type of AI as a second opinion when they are taking CT or X-ray scans of their patients. Another advantage of using this machine learning classifier is that it has no human weaknesses like fatigue. The overall approach is to split the problem into two parts: first, classify the image, then determine if it is healthy. In order to classify the image into a specific body part class, the body parts dataset must be split into test and training sets. We can then use many models, like neural networks or logistic regression models, and fit them using the training set. Using the test set, we can then obtain a realistic estimate of the accuracy the models will have on images in the real world, since these testing images have never been seen by the models before. In order to increase this testing accuracy, we can also apply more complex algorithms to the models, like multiplicative weight update. For the second part of the problem, to determine if the body part is healthy, we can use another dataset consisting of healthy and non-healthy images of the specific body part and once again split it into test and training sets. We then train another neural network on those training set images and use the testing set to measure its accuracy. We will do this process only for the chest images. A major conclusion reached is that convolutional neural networks are the most reliable and accurate at image classification.
In classifying the images, the logistic regression model, the neural network, neural networks with multiplicative weight update, neural networks with the black box algorithm, and the convolutional neural network achieved 96.83 percent, 97.33 percent, 97.83 percent, 96.67 percent, and 98.83 percent accuracy, respectively. On the other hand, the overall accuracy of the model that determines if the images are healthy or not is around 78.37 percent.
Keywords: body part, healthcare, machine learning, neural networks
Procedia PDF Downloads 103
5368 A Comparative Study of Medical Image Segmentation Methods for Tumor Detection
Authors: Mayssa Bensalah, Atef Boujelben, Mouna Baklouti, Mohamed Abid
Abstract:
Image segmentation has a fundamental role in analysis and interpretation for many applications. The automated segmentation of organs and tissues throughout the body using computed imaging has been rapidly increasing. Indeed, it represents one of the most important parts of clinical diagnostic tools. In this paper, we present a thorough literature review of recent methods for tumour segmentation from medical images, which are briefly explained along with the recent contributions of various researchers. We then compare these methods in order to define new directions for developing and improving the segmentation of the tumour area in medical images.
Keywords: features extraction, image segmentation, medical images, tumor detection
Procedia PDF Downloads 168
5367 An Event-Related Potentials Study on the Processing of English Subjunctive Mood by Chinese ESL Learners
Authors: Yan Huang
Abstract:
The event-related potentials (ERPs) technique helps researchers make continuous measurements over the whole process of language comprehension, with an excellent temporal resolution at the level of milliseconds. Research on sentence processing has developed from the behavioral level to the neuropsychological level, which has brought about a variety of sentence processing theories and models. However, the applicability of these models to L2 learners is still under debate. Therefore, the present study aims to investigate the neural mechanisms underlying English subjunctive mood processing by Chinese ESL learners. To this end, English subject clauses with subjunctive moods are used as the stimuli, all of which follow the same syntactic structure, “It is + adjective + that … + (should) do + …” In addition, in order to examine the role that language proficiency plays in L2 processing, this research deals with two groups of Chinese ESL learners (18 males and 22 females, mean age = 21.68), namely, a high proficiency group (Group H) and a low proficiency group (Group L). The behavioral and neurophysiological data analysis reveals the following findings: 1) Syntax and semantics interact with each other in the second phase (300-500 ms) of sentence processing, which is partially in line with the Three-phase Sentence Model; 2) Language proficiency does affect L2 processing. Specifically, for Group H, syntactic processing plays the dominant role in sentence processing, while for Group L, semantic processing also affects syntactic parsing during the third phase of sentence processing (500-700 ms). Moreover, Group H, compared to Group L, demonstrates a richer native-like ERPs pattern, which further demonstrates the role of language proficiency in L2 processing. Based on the research findings, this paper also offers some insights for L2 pedagogy as well as L2 proficiency assessment.
Keywords: Chinese ESL learners, English subjunctive mood, ERPs, L2 processing
Procedia PDF Downloads 131
5366 Tumor Size and Lymph Node Metastasis Detection in Colon Cancer Patients Using MR Images
Authors: Mohammadreza Hedyehzadeh, Mahdi Yousefi
Abstract:
Colon cancer is one of the most common cancers, and its prevalence is predicted to increase because of poor eating habits. Nowadays, because people are busy, the consumption of fast food is increasing, and therefore diagnosis of this disease and its treatment are of particular importance. To determine the best treatment approach for each colon cancer patient, the oncologist must know the stage of the tumor. The most common method of determining the tumor stage is the TNM staging system. In this system, M indicates the presence of metastasis, N indicates the extent of spread to the lymph nodes, and T indicates the size of the tumor. To determine all three of these parameters, an imaging method must be used, and the gold standard imaging protocols for this purpose are CT and PET/CT. In CT imaging, because X-rays are used, the risk of cancer and the dose absorbed by the patient are high, while access to PET/CT devices is limited by their high cost. Therefore, in this study, we aimed to estimate the tumor size and the extent of its spread to the lymph nodes using MR images. More than 1300 MR images were collected from the TCIA portal, and in the first step (pre-processing), histogram equalization was applied to improve image quality and the images were resized to a common size. Two expert radiologists, each with more than 21 years of experience with colon cancer cases, segmented the images and extracted the tumor region from them. The next step is to extract features from the segmented images and then classify the data into three classes: T0N0, T3N1 and T3N2. In this article, the VGG-16 convolutional neural network is used to perform both of these tasks, i.e., feature extraction and classification. This network has 13 convolution layers for feature extraction and three fully connected layers with the softmax activation function for classification.
In order to validate the proposed method, the 10-fold cross validation method was used in such a way that the data was randomly divided into three parts: training (70% of data), validation (10% of data) and the rest for testing. This is repeated 10 times; each time, the accuracy, sensitivity and specificity of the model are calculated, and the average of the ten repetitions is reported as the result. The accuracy, specificity and sensitivity of the proposed method on the testing dataset were 89.09%, 95.8% and 96.4%, respectively. Compared to previous studies, the advantages of this study include the use of a safe imaging technique (MRI) and the avoidance of predefined hand-crafted imaging features in determining the stage of colon cancer patients.
Keywords: colon cancer, VGG-16, magnetic resonance imaging, tumor size, lymph node metastasis
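The three reported metrics and their averaging over repeated runs can be sketched as follows. This is a generic illustration using the standard confusion-matrix definitions, not the authors' code; for the three-class problem above, sensitivity and specificity would typically be computed per class in a one-vs-rest fashion. The function names and counts are hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity

def mean_over_runs(runs):
    """Average each metric over a list of (acc, sens, spec) tuples."""
    n = len(runs)
    return tuple(sum(run[i] for run in runs) / n for i in range(3))

# Hypothetical counts for two of the repeated runs.
runs = [binary_metrics(40, 5, 45, 10), binary_metrics(42, 6, 44, 8)]
print(mean_over_runs(runs))
```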
Procedia PDF Downloads 59
5365 Evaluation of Fusion Sonar and Stereo Camera System for 3D Reconstruction of Underwater Archaeological Object
Authors: Yadpiroon Onmek, Jean Triboulet, Sebastien Druon, Bruno Jouvencel
Abstract:
The objective of this paper is to develop a 3D underwater reconstruction of archaeological objects based on the fusion of a sonar system and a stereo camera system. The underwater images are obtained from a calibrated camera system. Multiple image pairs are the input, and we first address image processing by applying a well-known filter to improve the quality of the underwater images. The features of interest between image pairs are selected by well-known methods: a FAST detector and FLANN descriptor. Subsequently, the RANSAC method is applied to reject outlier points. The putative inliers are matched by triangulation to produce local sparse point clouds in 3D space, using a pinhole camera model and Euclidean distance estimation. The SFM technique is used to produce the global sparse point clouds. Finally, the ICP method is used to fuse the sonar information with the stereo model. The precision of the final 3D models is assessed by comparing measurements against the real object.
Keywords: 3D reconstruction, archaeology, fusion, stereo system, sonar system, underwater
Procedia PDF Downloads 299
5364 Computer-Aided Exudate Diagnosis for the Screening of Diabetic Retinopathy
Authors: Shu-Min Tsao, Chung-Ming Lo, Shao-Chun Chen
Abstract:
Most diabetes patients tend to suffer from retinal complications. Therefore, early detection and early treatment are important. In clinical examinations, color fundus imaging is the most convenient and available examination method. The status of the retina can be confirmed from the exudates that appear in the retinal image. However, routine screening for diabetic retinopathy using color fundus images is time-consuming for physicians. This study thus proposes a computer-aided exudate diagnosis system for the screening of diabetic retinopathy. After the vessels and optic disc are removed from the retinal image, six quantitative features, including region number, region area, and gray-scale values, are extracted from the remaining regions for classification. All six features were found to be statistically significant (p-value < 0.001). The accuracy of classifying the retinal images as normal or diabetic retinopathy reached 82%. With this system, the clinical workload could be reduced and the examination procedure could be made more efficient.
Keywords: computer-aided diagnosis, diabetic retinopathy, exudate, image processing
Procedia PDF Downloads 271
5363 Rigorous Photogrammetric Push-Broom Sensor Modeling for Lunar and Planetary Image Processing
Authors: Ahmed Elaksher, Islam Omar
Abstract:
Accurate geometric relation algorithms are imperative in Earth and planetary satellite and aerial image processing, particularly for high-resolution images that are used for topographic mapping. Most of these satellites carry push-broom sensors. These sensors are optical scanners equipped with linear arrays of CCDs. These sensors have been deployed on most EOSs. In addition, the LROC is equipped with two push NACs that provide 0.5 meter-scale panchromatic images over a 5 km swath of the Moon. The HiRISE carried by the MRO and the HRSC carried by MEX are examples of push-broom sensors that produce images of the surface of Mars. Sensor models developed in photogrammetry relate image space coordinates in two or more images with the 3D coordinates of ground features. Rigorous sensor models use the actual interior orientation parameters and exterior orientation parameters of the camera, unlike approximate models. In this research, we generate a generic push-broom sensor model to process imagery acquired through linear array cameras and investigate its performance, advantages, and disadvantages in generating topographic models for the Earth, Mars, and the Moon. We also compare and contrast the utilization, effectiveness, and applicability of available photogrammetric techniques and softcopies with the developed model. We start by defining an image reference coordinate system to unify image coordinates from all three arrays. The transformation from an image coordinate system to a reference coordinate system involves a translation and three rotations. For any image point within the linear array, its image reference coordinates, the coordinates of the exposure center of the array in the ground coordinate system at the imaging epoch (t), and the corresponding ground point coordinates are related through the collinearity condition, which states that all three points must lie on the same line.
The rotation angles for each CCD array at the epoch t are defined and included in the transformation model. The exterior orientation parameters of an image line, i.e., the coordinates of the exposure station and the rotation angles, are computed by a polynomial interpolation function in time (t). The parameter (t) is the time at a certain epoch from a certain orbit position. Depending on the types of observations, coordinates and parameters may be treated as knowns or unknowns in different situations. The unknown coefficients are determined in a bundle adjustment. The orientation process starts by extracting the sensor position, orientation, and raw images from the PDS. The parameters of each image line are then estimated and imported into the push-broom sensor model. We also define tie points between image pairs to aid the bundle adjustment model, determine the refined camera parameters, and generate highly accurate topographic maps. The model was tested on different satellite images such as IKONOS, QuickBird, WorldView-2, and HiRISE. It was found that the accuracy of our model is comparable to those of commercial and open-source software, the computational efficiency of the developed model is high, the model could be used in different environments with various sensors, and the implementation process is much more cost- and effort-consuming.
Keywords: photogrammetry, push-broom sensors, IKONOS, HiRISE, collinearity condition
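For reference, the collinearity condition for a single image line can be written in a common textbook form as below. The notation is assumed, not taken from the abstract: f is the focal length, (x, y) are image reference coordinates, (X, Y, Z) is the ground point, (X_S, Y_S, Z_S) is the exposure station at epoch t, and r_ij(t) are the elements of the ground-to-image rotation matrix built from the rotation angles at t. The second-order polynomial is only an example; the abstract does not state the polynomial degree.

```latex
% Collinearity equations for the image line exposed at epoch t (assumed notation)
\begin{align}
x &= -f\,\frac{r_{11}(t)\,[X - X_S(t)] + r_{12}(t)\,[Y - Y_S(t)] + r_{13}(t)\,[Z - Z_S(t)]}
              {r_{31}(t)\,[X - X_S(t)] + r_{32}(t)\,[Y - Y_S(t)] + r_{33}(t)\,[Z - Z_S(t)]} \\
y &= -f\,\frac{r_{21}(t)\,[X - X_S(t)] + r_{22}(t)\,[Y - Y_S(t)] + r_{23}(t)\,[Z - Z_S(t)]}
              {r_{31}(t)\,[X - X_S(t)] + r_{32}(t)\,[Y - Y_S(t)] + r_{33}(t)\,[Z - Z_S(t)]}
\end{align}

% Exterior orientation modeled as polynomials in time (degree 2 assumed for illustration)
\begin{align}
X_S(t) &= a_0 + a_1 t + a_2 t^2, &
\omega(t) &= d_0 + d_1 t + d_2 t^2
\end{align}
% and analogously for Y_S, Z_S, \varphi, \kappa.
```

In many push-broom formulations the along-track image coordinate is nominally zero for every line, so one of the two equations acts as a constraint that ties each ground point to its imaging epoch t. The polynomial coefficients are the unknowns estimated in the bundle adjustment.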
Procedia PDF Downloads 63
5362 Leukocyte Detection Using Image Stitching and Color Overlapping Windows
Authors: Lina, Arlends Chris, Bagus Mulyawan, Agus B. Dharmawan
Abstract:
Blood cell analysis plays a significant role in the diagnosis of human health. As an alternative to the traditional technique conducted by laboratory technicians, this paper presents an automatic white blood cell (leukocyte) detection system using Image Stitching and Color Overlapping Windows. The advantage of this method is that its detection of white blood cells is robust to imperfectly shaped cells and to varying image quality. The input for this application is a set of images from a microscope-slide translation video. The preprocessing stage is performed by stitching the input images. First, the overlapping parts of the images are determined; then the stitching and blending of two input images are performed. Next, Color Overlapping Windows is applied for white blood cell detection, which consists of color filtering, window candidate checking, window marking, window overlap finding, and window cropping. Experimental results show that this method achieves an average detection accuracy of 82.12% on the leukocyte images.
Keywords: color overlapping windows, image stitching, leukocyte detection, white blood cell detection
Procedia PDF Downloads 310
5361 A Transformer-Based Approach for Multi-Human 3D Pose Estimation Using Color and Depth Images
Authors: Qiang Wang, Hongyang Yu
Abstract:
Multi-human 3D pose estimation is a challenging task in computer vision, which aims to recover the 3D joint locations of multiple people from multi-view images. In contrast to traditional methods, which typically only use color (RGB) images as input, our approach utilizes both color and depth (D) information contained in RGB-D images. We also employ a transformer-based model as the backbone of our approach, which is able to capture long-range dependencies and has been shown to perform well on various sequence modeling tasks. Our method is trained and tested on the Carnegie Mellon University (CMU) Panoptic dataset, which contains a diverse set of indoor and outdoor scenes with multiple people in varying poses and clothing. We evaluate the performance of our model using the standard 3D pose estimation metric, mean per-joint position error (MPJPE). Our results show that the transformer-based approach outperforms traditional methods and achieves competitive results on the CMU Panoptic dataset. We also perform an ablation study to understand the impact of different design choices on the overall performance of the model. In summary, our work demonstrates the effectiveness of using a transformer-based approach with RGB-D images for multi-human 3D pose estimation and has potential applications in real-world scenarios such as human-computer interaction, robotics, and augmented reality.
Keywords: multi-human 3D pose estimation, RGB-D images, transformer, 3D joint locations
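The MPJPE metric named above is the mean Euclidean distance between predicted and ground-truth joint positions. A minimal sketch for a single pose follows; the function name and the sample joints are hypothetical, and any alignment step an evaluation protocol may apply beforehand is omitted.

```python
import math

def mpjpe(predicted, ground_truth):
    """Mean per-joint position error for one pose.

    Both arguments are equal-length sequences of (x, y, z) joint positions
    expressed in the same units and coordinate frame.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("poses must be non-empty and have the same joint count")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)

# Hypothetical two-joint pose: one joint off by 5 units, one exact.
pred = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
truth = [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)]
print(mpjpe(pred, truth))  # 2.5
```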
Procedia PDF Downloads 80
5360 Smartphone Photography in Urban China
Authors: Wen Zhang
Abstract:
The smartphone plays a significant role in media convergence, and smartphone photography is reconstructing the way we communicate and think. This article aims to explore the smartphone photography practices of urban Chinese smartphone users and images produced by smartphones from a techno-cultural perspective. The analysis draws on two types of data: semi-structured interviews with 21 participants, and the images created by the participants. The findings are organised in two parts. The first part summarises the current tendencies of capturing, editing, sharing and archiving digital images via smartphones. The second part shows that food and selfie/anti-selfie are the preferred subjects of smartphone photographic images from a technical and multi-purpose perspective and demonstrates that screenshots and image texts are new genres of non-photographic images that are frequently made by smartphones, which contributes to improving operational efficiency, disseminating information and sharing knowledge. The analyses illustrate the positive impacts between smartphones and photography enthusiasm and practices based on the diffusion of innovation theory, which also makes us rethink the value of photographs and the practice of ‘photographic seeing’ from the screen itself.
Keywords: digital photography, image-text, media convergence, photographic-seeing, selfie/anti-selfie, smartphone, technological innovation
Procedia PDF Downloads 354
5359 Investigating the Relationship and Interaction between Auditory Processing Disorder and Auditory Attention
Authors: Amirreza Razzaghipour Sorkhab
Abstract:
The exploration of the connection between cognition and Auditory Processing Disorder (APD) holds significant value. Individuals with APD experience challenges in processing auditory information through the central auditory nervous system's varied pathways. Understanding the importance of auditory attention in individuals with APD, as well as the primary diagnostic tools such as language and auditory attention tests, highlights the critical need for assessing their auditory attention abilities. While not all children with APD show deficits in auditory attention, there are often deficiencies in cognitive and attentional performance. The link between various types of attention deficits and APD suggests impairments in sustained and divided auditory attention. Research into the origins of APD should also encompass higher-level processes, such as auditory attention. It is evident that investigating the interaction between APD and auditory and cognitive functions holds significant value. Furthermore, it was demonstrated that APD tests may be influenced by cognitive factors, but despite signs of auditory attention interaction with auditory processing skills and the influence of cognitive factors on tests for this disorder, auditory attention measures are not typically included in APD diagnostic protocols. Therefore, incorporating attention assessment tests into the battery of tests for individuals with auditory processing disorder will be beneficial for obtaining useful insights into their attentional abilities.
Keywords: auditory processing disorder, auditory attention, central auditory processing disorder, top-down pathway
Procedia PDF Downloads 66
5358 Multi-Atlas Segmentation Based on Dynamic Energy Model: Application to Brain MR Images
Authors: Jie Huo, Jonathan Wu
Abstract:
Segmentation of anatomical structures in medical images is essential for scientific inquiry into the complex relationships between biological structure and clinical diagnosis, treatment and assessment. As a method of incorporating prior knowledge and the anatomical structure similarity between a target image and atlases, multi-atlas segmentation has been successfully applied to a variety of medical images, including brain, cardiac, and abdominal images. The basic idea of multi-atlas segmentation is to transfer the labels in the atlases to the coordinates of the target image by matching each target patch to the atlas patches in its neighborhood. However, this technique is limited by the pairwise registration between the target image and the atlases. In this paper, a novel multi-atlas segmentation approach is proposed by introducing a dynamic energy model. First, the target is mapped to each atlas image by minimizing the dynamic energy function; then the segmentation of the target image is generated by weighted fusion based on the energy. The method is tested on the MICCAI 2012 Multi-Atlas Labeling Challenge dataset, which includes 20 target images and 15 atlas images. The paper also analyzes the influence of different parameters of the dynamic energy model on the segmentation accuracy and measures the Dice coefficient obtained with different feature terms in the energy model. The highest mean Dice coefficient obtained with the proposed method is 0.861, which is competitive with recently published methods.
Keywords: brain MRI segmentation, dynamic energy model, multi-atlas segmentation, energy minimization
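The Dice coefficient used as the accuracy measure above is twice the overlap of two label regions divided by the sum of their sizes. A minimal sketch follows, representing each region as a set of voxel indices; the function name and the sample regions are hypothetical.

```python
def dice_coefficient(region_a, region_b):
    """Dice overlap between two regions given as collections of voxel indices."""
    a, b = set(region_a), set(region_b)
    if not a and not b:
        return 1.0  # convention: two empty regions agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical segmentations of one structure: two of four voxels agree.
automatic = {1, 2, 3, 4}
manual = {3, 4, 5, 6}
print(dice_coefficient(automatic, manual))  # 0.5
```

For a multi-label segmentation, the score is usually computed per structure and then averaged, which is one common way a mean value such as the one reported above is obtained.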
Procedia PDF Downloads 336
5357 Best Timing for Capturing Satellite Thermal Images, Asphalt, and Concrete Objects
Authors: Toufic Abd El-Latif Sadek
Abstract:
The asphalt object represents asphalted areas such as roads, and the concrete object represents concrete areas such as concrete buildings. Efficient extraction of asphalt and concrete objects from a single satellite thermal image is possible at specific times, by avoiding the time gaps in which asphalt, concrete, and other objects have close or identical brightness values. This yields efficient extraction and, in turn, better analysis. Seven sample objects were used in this study: asphalt, concrete, metal, rock, dry soil, vegetation, and water. It was found that the best timing for capturing satellite thermal images to extract both asphalt and concrete from a single image, saving time and money, occurs at specific times in different months. A table is derived that shows the optimal timing for capturing satellite thermal images to extract these two objects effectively.
Keywords: asphalt, concrete, satellite thermal images, timing
Procedia PDF Downloads 322
5356 Deep-Learning to Generation of Weights for Image Captioning Using Part-of-Speech Approach
Authors: Tiago do Carmo Nogueira, Cássio Dener Noronha Vinhal, Gélson da Cruz Júnior, Matheus Rudolfo Diedrich Ullmann
Abstract:
Generating automatic image descriptions through natural language is a challenging task. Image captioning is a task that consistently describes an image by combining computer vision and natural language processing techniques. To accomplish this task, cutting-edge models use encoder-decoder structures: Convolutional Neural Networks (CNNs) are used to extract the characteristics of the images, and Recurrent Neural Networks (RNNs) generate the descriptive sentences of the images. However, cutting-edge approaches still suffer from problems of generating incorrect captions and accumulating errors in the decoders. To solve this problem, we propose a model based on the encoder-decoder structure, introducing a module that generates weights according to the importance of each word in forming the sentence, using part-of-speech (PoS) information. The results demonstrate that our model surpasses state-of-the-art models.
Keywords: gated recurrent units, caption generation, convolutional neural network, part-of-speech
Procedia PDF Downloads 102
5355 Leaf Image Processing: Review
Authors: T. Vijayashree, A. Gopal
Abstract:
The aim of the work is to classify and authenticate medicinal plant materials and herbs widely used in Indian herbal medicinal preparations. The quality and authenticity of these raw materials must be ensured for the preparation of herbal medicines. These raw materials must be carefully screened, analyzed, and documented because they can be mistaken for look-alike materials that do not have medicinal characteristics.
Keywords: authenticity, standardization, principal component analysis, imaging processing, signal processing
Procedia PDF Downloads 246
5354 A New 3D Shape Descriptor Based on Multi-Resolution and Multi-Block CS-LBP
Authors: Nihad Karim Chowdhury, Mohammad Sanaullah Chowdhury, Muhammed Jamshed Alam Patwary, Rubel Biswas
Abstract:
In content-based 3D shape retrieval systems, achieving high search performance has become an important research problem. A challenging aspect of this problem is to find an effective shape descriptor that can discriminate similar shapes adequately. To address this problem, we propose a new shape descriptor for 3D shape models by combining multi-resolution filtering with a multi-block center-symmetric local binary pattern (CS-LBP) operator. Given an arbitrary 3D shape, we first apply pose normalization and generate a set of multi-view 2D rendered images. Second, we apply a Gaussian multi-resolution filter to generate several levels of images from each 2D rendered image. Then, overlapped sub-images are computed for each level of a multi-resolution image. Our multi-block CS-LBP comes next: it allows the center to be composed of an m-by-n rectangular block of pixels instead of a single pixel. This process is repeated for all the 2D rendered images, derived from both ‘depth-buffer’ and ‘silhouette’ rendering. Finally, we concatenate all the feature vectors into a one-dimensional histogram as our proposed 3D shape descriptor. Through several experiments on a benchmark dataset, we demonstrate that our proposed 3D shape descriptor outperforms previous methods.
Keywords: 3D shape retrieval, 3D shape descriptor, CS-LBP, overlapped sub-images
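The operator described in this abstract can be sketched roughly as follows. This is one possible reading, not the authors' implementation: each cell of a 3x3 grid is taken to be an m-by-n block whose mean intensity stands in for a single pixel, and the four center-symmetric pairs of neighbours are compared as in standard CS-LBP. The function names and the `threshold` parameter are illustrative assumptions.

```python
def block_mean(img, top, left, m, n):
    """Mean intensity of the m-by-n block whose top-left corner is (top, left)."""
    total = sum(img[r][c] for r in range(top, top + m) for c in range(left, left + n))
    return total / (m * n)

def mb_cs_lbp(img, top, left, m=1, n=1, threshold=0.0):
    """CS-LBP code (0..15) for a 3x3 grid of m-by-n blocks starting at (top, left).

    With m = n = 1 this reduces to the ordinary single-pixel CS-LBP.
    """
    # Mean of each of the nine blocks, in row-major order.
    cells = [[block_mean(img, top + i * m, left + j * n, m, n) for j in range(3)]
             for i in range(3)]
    # The eight neighbours of the centre block, clockwise from the top-left.
    ring = [cells[0][0], cells[0][1], cells[0][2], cells[1][2],
            cells[2][2], cells[2][1], cells[2][0], cells[1][0]]
    code = 0
    for i in range(4):
        # Compare each neighbour with the one diametrically opposite it.
        if ring[i] - ring[i + 4] > threshold:
            code |= 1 << i
    return code
```

A descriptor in the spirit of the abstract would then histogram these 16 possible codes over each overlapped sub-image and concatenate the histograms.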
Procedia PDF Downloads 445
5353 Strength Evaluation by Finite Element Analysis of Mesoscale Concrete Models Developed from CT Scan Images of Concrete Cube
Authors: Nirjhar Dhang, S. Vinay Kumar
Abstract:
Concrete is a non-homogeneous mix of coarse aggregates, sand, cement, air voids, and the interfacial transition zone (ITZ) around aggregates. Incorporating these complex structures and material properties in numerical simulation would lead to a better understanding and design of concrete. In this work, a mesoscale model of concrete has been prepared from X-ray computerized tomography (CT) images. These images are converted into a computer model and numerically simulated using commercially available finite element software. The mesoscale models are simulated under compressive displacement. The effects of the shape and distribution of aggregates, continuous and discrete ITZ thickness, voids, and variation of mortar strength have been investigated. The CT scan of a concrete cube consists of a series of two-dimensional slices. In total, 49 slices are obtained from a 150 mm cube, at an interval of approximately 3 mm. With CT scanning, the same cube can be scanned non-destructively and later subjected to a compression test in a universal testing machine (UTM) to find its strength. The image processing and extraction of mortar and aggregates from the CT scan slices are performed by programming in Python. The digital colour image consists of red, green, and blue (RGB) pixels. The RGB image is converted to a black and white (BW) image, and the mesoscale constituents are identified by assigning values between 0 and 255. A pixel matrix is created for modeling mortar, aggregates, and ITZ. Pixels are normalized to a 0-9 scale according to relative strength: zero is assigned to voids, 4-6 to mortar, and 7-9 to aggregates, while values of 1-3 identify the boundary between aggregates and mortar. In the next step, triangular and quadrilateral elements for plane stress and plane strain models are generated, depending on the option selected.
Properties of materials, boundary conditions, and analysis scheme are specified in this module. The responses, such as displacements, stresses, and damage, are evaluated in ABAQUS by importing the input file. This simulation evaluates the compressive strengths of 49 slices of the cube. The model is meshed with more than sixty thousand elements. The effects of the shape and distribution of aggregates, the inclusion of voids, and the variation of ITZ layer thickness on the load-carrying capacity, stress-strain response, and strain localization of concrete have been studied. The plane strain condition carried more load than the plane stress condition due to confinement. The CT scan technique can be used to obtain slices from concrete cores taken from an actual structure, and digital image processing can be used to find the shape and content of aggregates in the concrete. These results may be further compared with test results of concrete cores, and the approach can be used as an important tool for strength evaluation of concrete.
Keywords: concrete, image processing, plane strain, interfacial transition zone
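The pixel classification step described in this abstract might look something like the sketch below. The 0-9 class boundaries come from the abstract; the linear mapping from the 0-255 grey level to the 0-9 scale, the BT.601 luma weights for the RGB conversion, and all function names are assumptions, since the abstract does not say how the normalization is done.

```python
def to_gray(rgb):
    """Convert an (R, G, B) pixel to a grey level using the common BT.601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def to_scale(gray):
    """Map a grey level in 0..255 onto the 0..9 scale (assumed linear binning)."""
    return min(9, int(gray * 10 / 256))

def phase(level):
    """Name the mesoscale constituent for a 0..9 level, per the ranges in the abstract."""
    if level == 0:
        return "void"
    if level <= 3:
        return "ITZ"        # boundary between aggregates and mortar
    if level <= 6:
        return "mortar"
    return "aggregate"

def classify_slice(rgb_rows):
    """Turn one CT slice (rows of RGB tuples) into a matrix of constituent names."""
    return [[phase(to_scale(to_gray(px))) for px in row] for row in rgb_rows]
```

Each resulting matrix entry could then be assigned the material properties of its constituent before the mesh is generated.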
Procedia PDF Downloads 241
5352 Using Deep Learning Real-Time Object Detection Convolution Neural Networks for Fast Fruit Recognition in the Tree
Authors: K. Bresilla, L. Manfrini, B. Morandi, A. Boini, G. Perulli, L. C. Grappadelli
Abstract:
Image/video processing for fruit in the tree using hard-coded feature extraction algorithms has shown high accuracy in recent years. While accurate, these approaches, even with high-end hardware, are computationally intensive and too slow for real-time systems. This paper details the use of deep convolutional neural networks (CNNs), specifically an algorithm (YOLO - You Only Look Once) with 24+2 convolution layers. Using deep-learning techniques eliminated the need to hard-code specific features for specific fruit shapes, colors, and/or other attributes. This CNN was trained on more than 5000 images of apple and pear fruits on a 960-core GPU (Graphics Processing Unit). The testing set showed an accuracy of 90%. After this, the trained data were transferred to an embedded device (Raspberry Pi gen. 3) with a camera for greater portability. Based on the correlation between the number of visible or detected fruits in one frame and the real number of fruits on one tree, a model was created to accommodate this error rate. The processing and detection speed of the whole platform was higher than 40 frames per second. This speed is fast enough for any grasping/harvesting robotic arm or other real-time applications.
Keywords: artificial intelligence, computer vision, deep learning, fruit recognition, harvesting robot, precision agriculture
Procedia PDF Downloads 420
5351 Image Quality and Dose Optimisations in Digital and Computed Radiography X-ray Radiography Using Lumbar Spine Phantom
Authors: Elhussaien Elshiekh
Abstract:
A study was performed to manage and compare radiation doses and image quality during lumbar spine PA and lumbar spine LAT X-ray radiography using Computed Radiography (CR) and Digital Radiography (DR). Standard exposure factors such as kV, mAs, and FFD used for imaging the lumbar spine anthropomorphic phantom were obtained from the average exposure factors used with CR in five radiology centres. The lumbar spine phantom was imaged using CR and DR systems. Entrance surface air kerma (ESAK) was calculated from the X-ray tube output and patient exposure factors. Images were evaluated using a visual grading system based on the European Guidelines on Quality Criteria for diagnostic radiographic images. The ESAK corresponding to each image was measured at the surface of the phantom. Six experienced specialists evaluated hard copies of all the images; the image score (IS) was calculated for each image as the average score of the six evaluators. The IS value was also used to determine whether an image was diagnostically acceptable. The optimum recommended exposure factors found here for lumbar spine PA and lumbar spine LAT were, respectively, (80 kVp, 25 mAs at 100 cm FFD) and (75 kVp, 15 mAs at 100 cm FFD) for the CR system, and (80 kVp, 15 mAs at 100 cm FFD) and (75 kVp, 10 mAs at 100 cm FFD) for the DR system. For lumbar spine PA, the lowest ESAK values required to obtain a diagnostically acceptable image were 0.80 mGy for DR and 1.20 mGy for CR systems. Similarly, for the lumbar spine LAT projection, the lowest ESAK values required to obtain a diagnostically acceptable image were 0.62 mGy for DR and 0.76 mGy for CR systems. At standard kVp and mAs values, the image quality did not vary significantly between the CR and DR systems, but at higher kVp and mAs values, the DR images were found to be of better quality than the CR images.
In addition, the lower limit of entrance skin dose consistent with diagnostically acceptable DR images was 40% lower than that for CR images.
Keywords: image quality, dosimetry, radiation protection, optimization, digital radiography, computed radiography
Procedia PDF Downloads 51
5350 Influence of Processing Regime and Contaminants on the Properties of Postconsumer Thermoplastics
Authors: Fares Alsewailem
Abstract:
Material recycling of thermoplastic waste offers a practical solution for municipal solid waste reduction. Post-consumer plastics such as polyethylene (PE), polyethylene terephthalate (PET), and polystyrene (PS) may be separated from each other by physical methods such as density difference and hence processed as single plastics. However, one should be cautious about contaminants present in the waste stream in the form of paper, glue, etc., since these, even in trace amounts, may deteriorate the properties of the recycled plastics, especially the mechanical properties. Furthermore, melt processing methods used to recycle thermoplastics, such as extrusion and compression molding, may induce degradation of some of the recycled plastics, such as PET and PS. In this research, it is shown that care should be taken in two respects when processing recycled plastics by melt processing: first, contaminants should be minimized as far as possible, and second, the number of melt processing steps should also be kept to a minimum.
Keywords: Recycling, PET, PS, HDPE, mechanical
Procedia PDF Downloads 284
5349 Object Detection Based on Plane Segmentation and Features Matching for a Service Robot
Authors: António J. R. Neves, Rui Garcia, Paulo Dias, Alina Trifan
Abstract:
With the aging of the world population and the continuous growth in technology, service robots are increasingly explored as alternatives to healthcare givers or personal assistants for elderly or disabled people. Any service robot should be capable of interacting with its human companion, receiving commands, navigating through the environment, whether known or unknown, and recognizing objects. This paper proposes an approach for object recognition based on the use of depth information and color images for a service robot. We present a study on two of the most used methods for object detection, where 3D data is used to detect the position of the objects to be classified, which are found on horizontal surfaces. Since most of the objects of interest accessible to service robots are on these surfaces, the proposed 3D segmentation reduces the processing time and simplifies the scene for object recognition. The first approach for object recognition is based on color histograms, while the second is based on the use of the SIFT and SURF feature descriptors. We present comparative experimental results obtained with a real service robot.
Keywords: object detection, feature, descriptors, SIFT, SURF, depth images, service robots
Procedia PDF Downloads 546
5348 Object Detection in Digital Images under Non-Standardized Conditions Using Illumination and Shadow Filtering
Authors: Waqqas-ur-Rehman Butt, Martin Servin, Marion Pause
Abstract:
In recent years, object detection has gained much attention and become a very encouraging research area in the field of computer vision. Robust detection of object boundaries in an image is required in numerous applications of human-computer interaction and automated surveillance systems. Many methods and approaches have been developed for automatic object detection in various fields, such as automotive, quality control management, and environmental services. Unfortunately, to the best of our knowledge, object detection under illumination with shadow consideration has not been well solved yet. Furthermore, this problem is one of the major hurdles keeping object detection methods from practical applications. This paper presents an approach to automatic object detection in images under non-standardized environmental conditions. A key challenge is how to detect the object, particularly under uneven illumination conditions. Because image capturing conditions vary, the algorithms need to consider a variety of possible environmental factors, as the colour information, lighting, and shadows vary from image to image. Existing methods mostly fail to produce appropriate results due to variation in colour information, lighting effects, threshold specifications, histogram dependencies, and colour ranges. To overcome these limitations, we propose an object detection algorithm, with pre-processing methods, to reduce the interference caused by shadow and illumination effects without fixed parameters. We use the YCrCb colour model without any specific colour ranges or predefined threshold values. The segmented object regions are further classified using morphological operations (erosion and dilation) and contours. The proposed approach was applied to a large image data set acquired under various environmental conditions for wood stack detection.
Experiments show promising results for the proposed approach in comparison with existing methods.
Keywords: image processing, illumination equalization, shadow filtering, object detection
Procedia PDF Downloads 216
5347 Modeling Visual Memorability Assessment with Autoencoders Reveals Characteristics of Memorable Images
Authors: Elham Bagheri, Yalda Mohsenzadeh
Abstract:
Image memorability refers to the phenomenon where certain images are more likely to be remembered by humans than others. It is a quantifiable and intrinsic attribute of an image. Understanding how visual perception and memory interact is important in both cognitive science and artificial intelligence. It reveals the complex processes that support human cognition and helps to improve machine learning algorithms by mimicking the brain's efficient data processing and storage mechanisms. To explore the computational underpinnings of image memorability, this study examines the relationship between an image's reconstruction error, distinctiveness in latent space, and its memorability score. A trained autoencoder is used to replicate human-like memorability assessment inspired by the visual memory game employed in memorability estimations. This study leverages a VGG-based autoencoder that is pre-trained on the vast ImageNet dataset, enabling it to recognize patterns and features that are common to a wide and diverse range of images. An empirical analysis is conducted using the MemCat dataset, which includes 10,000 images from five broad categories: animals, sports, food, landscapes, and vehicles, along with their corresponding memorability scores. The memorability score assigned to each image represents the probability of that image being remembered by participants after a single exposure. The autoencoder is finetuned for one epoch with a batch size of one, attempting to create a scenario similar to human memorability experiments where memorability is quantified by the likelihood of an image being remembered after being seen only once. The reconstruction error, which is quantified as the difference between the original and reconstructed images, serves as a measure of how well the autoencoder has learned to represent the data. 
The reconstruction error of each image, the error reduction, and its distinctiveness in latent space are calculated and correlated with the memorability score. Distinctiveness is measured as the Euclidean distance between each image's latent representation and its nearest neighbor within the autoencoder's latent space. Different structural and perceptual loss functions are considered to quantify the reconstruction error. The results indicate that there is a strong correlation between the reconstruction error and the distinctiveness of images and their memorability scores. This suggests that images with more unique, distinct features that challenge the autoencoder's compressive capacities are inherently more memorable. There is also a negative correlation between memorability and the reduction in reconstruction error relative to the autoencoder pre-trained on ImageNet, which suggests that highly memorable images are harder to reconstruct, probably because they have features that are more difficult for the autoencoder to learn. These insights suggest a new pathway for evaluating image memorability, which could potentially impact industries reliant on visual content and mark a step forward in merging the fields of artificial intelligence and cognitive science. The current research opens avenues for utilizing neural representations as instruments for understanding and predicting visual memory.
Keywords: autoencoder, computational vision, image memorability, image reconstruction, memory retention, reconstruction error, visual perception
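The distinctiveness measure defined in this abstract, and its correlation with a score, can be sketched in a few lines. The function names are illustrative, and the use of a Pearson coefficient is an assumption, since the abstract does not say which correlation statistic was used.

```python
import math

def distinctiveness(latents):
    """Euclidean distance from each latent vector to its nearest other vector."""
    return [
        min(math.dist(z, other) for j, other in enumerate(latents) if j != i)
        for i, z in enumerate(latents)
    ]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In use, `latents` would hold one encoder output per image, and `pearson(distinctiveness(latents), memorability_scores)` would give the kind of correlation the abstract refers to, where `memorability_scores` is a hypothetical list of per-image scores.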
Procedia PDF Downloads 91
5346 Mapping of Geological Structures Using Aerial Photography
Authors: Ankit Sharma, Mudit Sachan, Anurag Prakash
Abstract:
Rapid growth in drone-based data acquisition technologies has led to advances and interest in collecting high-resolution images of geological fields. Although drones can capture a high volume of data in short flights, a number of challenges have to be overcome for efficient analysis of these data, especially in data acquisition, image interpretation, and processing. We introduce a method that allows effective mapping of geological fields using photogrammetric data of surfaces, drainage areas, water bodies, etc., captured by airborne vehicles such as UAVs. We do not use satellite images because of inadequate resolution, outdated capture times (an image may be a year old), limited availability, difficulty in capturing the exact area, night-vision limitations, etc. This method combines advanced automated image interpretation technology with human data interaction to model structures. First, geological structures are detected from the primary photographic dataset, and the equivalent three-dimensional structures are then identified using a digital elevation model (DEM). The dip and its direction can be calculated from this information. The structural map is generated by following a specified methodology: choosing the appropriate camera, the camera's mounting system, and the UAV design (based on the area and application); addressing challenges in airborne systems such as errors in image orientation, payload problems, mosaicking, georeferencing, and registration of different images; and applying the DEM. The paper shows the potential of our method for accurate and efficient modeling of geological structures, particularly those captured from remote, inaccessible, and hazardous sites.
Keywords: digital elevation model, mapping, photogrammetric data analysis, geological structures
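The dip calculation mentioned in this abstract can be illustrated with a standard three-point construction. This is a generic sketch, not the authors' procedure: it assumes three points on the structure have been read from the DEM as (east, north, elevation) coordinates in the same units, and the function name is hypothetical.

```python
import math

def dip_and_direction(p1, p2, p3):
    """Dip angle and dip direction (both in degrees) of the plane through three points.

    Points are (east, north, up). Dip direction is the azimuth of steepest
    descent, measured clockwise from north. For a horizontal plane the dip is 0
    and the direction is reported as 0 by convention.
    """
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # Plane normal as the cross product u x v.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    if nz < 0:
        # Flip so the normal points upward; its horizontal part then points downslope.
        nx, ny, nz = -nx, -ny, -nz
    horizontal = math.hypot(nx, ny)
    dip = math.degrees(math.atan2(horizontal, nz))
    direction = math.degrees(math.atan2(nx, ny)) % 360.0
    return dip, direction
```

For example, a surface that drops one unit for every unit travelled east gives a dip of 45 degrees toward azimuth 90.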
Procedia PDF Downloads 686