Search results for: synthetic dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2180

Search results for: synthetic dataset

1880 3D Interferometric Imaging Using Compressive Hardware Technique

Authors: Mor Diama L. O., Matthieu Davy, Laurent Ferro-Famil

Abstract:

In this article, inverse synthetic aperture radar (ISAR) is combined with compressive imaging techniques in order to perform 3D interferometric imaging. Interferometric ISAR (InISAR) imaging relies on a two-dimensional antenna array providing diversities in the elevation and azimuth directions. However, the signals measured over several antennas must be acquired by coherent receivers resulting in costly and complex hardware. This paper proposes to use a chaotic cavity as a compressive device to encode the signals arising from several antennas into a single output port. These signals are then reconstructed by solving an inverse problem. Our approach is demonstrated experimentally with a 3-elements L-shape array connected to a metallic compressive enclosure. The interferometric phases estimated from a unique broadband signal are used to jointly estimate the target’s effective rotation rate and the height of the dominant scattering centers of our target. Our experimental results show that the use of the compressive device does not adversely affect the performance of our imaging process. This study opens new perspectives to reduce the hardware complexity of high-resolution ISAR systems.

Keywords: interferometric imaging, inverse synthetic aperture radar, compressive device, computational imaging

Procedia PDF Downloads 155
1879 DMBR-Net: Deep Multiple-Resolution Bilateral Networks for Real-Time and Accurate Semantic Segmentation

Authors: Pengfei Meng, Shuangcheng Jia, Qian Li

Abstract:

We proposed a real-time high-precision semantic segmentation network based on a multi-resolution feature fusion module, the auxiliary feature extracting module, upsampling module, and atrous spatial pyramid pooling (ASPP) module. We designed a feature fusion structure, which is integrated with sufficient features of different resolutions. We also studied the effect of side-branch structure on the network and made discoveries. Based on the discoveries about the side-branch of the network structure, we used a side-branch auxiliary feature extraction layer in the network to improve the effectiveness of the network. We also designed upsampling module, which has better results than the original upsampling module. In addition, we also re-considered the locations and number of atrous spatial pyramid pooling (ASPP) modules and modified the network structure according to the experimental results to further improve the effectiveness of the network. The network presented in this paper takes the backbone network of Bisenetv2 as a basic network, based on which we constructed a network structure on which we made improvements. We named this network deep multiple-resolution bilateral networks for real-time, referred to as DMBR-Net. After experimental testing, our proposed DMBR-Net network achieved 81.2% mIoU at 119FPS on the Cityscapes validation dataset, 80.7% mIoU at 109FPS on the CamVid test dataset, 29.9% mIoU at 78FPS on the COCOStuff test dataset. Compared with all lightweight real-time semantic segmentation networks, our network achieves the highest accuracy at an appropriate speed.

Keywords: multi-resolution feature fusion, atrous convolutional, bilateral networks, pyramid pooling

Procedia PDF Downloads 146
1878 Discrete Group Search Optimizer for the Travelling Salesman Problem

Authors: Raed Alnajjar, Mohd Zakree, Ahmad Nazri

Abstract:

In this study, we apply Discrete Group Search Optimizer (DGSO) for solving Traveling Salesman Problem (TSP). The DGSO is a nature inspired optimization algorithm that imitates the animal behavior, especially animal searching behavior. The proposed DGSO uses a vector representation and some discrete operators, such as destruction, construction, differential evolution, swap and insert. The TSP is a well-known hard combinatorial optimization problem, which seeks to find the shortest path among numbers of cities. The performance of the proposed DGSO is evaluated and tested on benchmark instances which listed in LIBTSP dataset. The experimental results show that the performance of the proposed DGSO is comparable with the other methods in the state of the art for some instances. The results show that DGSO outperform Ant Colony System (ACS) in some instances whilst outperform other metaheuristic in most instances. In addition to that, the new results obtained a number of optimal solutions and some best known results. DGSO was able to obtain feasible and good quality solution across all dataset.

Keywords: discrete group search optimizer (DGSO); Travelling salesman problem (TSP); Variable neighborhood search(VNS)

Procedia PDF Downloads 320
1877 Solvent Extraction, Spectrophotometric Determination of Antimony(III) from Real Samples and Synthetic Mixtures Using O-Methylphenyl Thiourea as a Sensitive Reagent

Authors: Shashikant R. Kuchekar, Shivaji D. Pulate, Vishwas B. Gaikwad

Abstract:

A simple and selective method is developed for solvent extraction spectrophotometric determination of antimony(III) using O-Methylphenyl Thiourea (OMPT) as a sensitive chromogenic chelating agent. The basis of proposed method is formation of antimony(III)-OMPT complex was extracted with 0.0025 M OMPT in chloroform from aqueous solution of antimony(III) in 1.0 M perchloric acid. The absorbance of this complex was measured at 297 nm against reagent blank. Beer’s law was obeyed up to 15µg mL-1 of antimony(III). The Molar absorptivity and Sandell’s sensitivity of the antimony(III)-OMPT complex in chloroform are 16.6730 × 103 L mol-1 cm-1 and 0.00730282 µg cm-2 respectively. The stoichiometry of antimony(III)-OMPT complex was established from slope ratio method, mole ratio method and Job’s continuous variation method was 1:2. The complex was stable for more than 48 h. The interfering effect of various foreign ions was studied and suitable masking agents are used wherever necessary to enhance selectivity of the method. The proposed method is successfully applied for determination of antimony(III) from real samples alloy and synthetic mixtures. Repetition of the method was checked by finding relative standard deviation (RSD) for 10 determinations which was 0.42%.

Keywords: solvent extraction, antimony, spectrophotometry, real sample analysis

Procedia PDF Downloads 330
1876 Machine Learning Model to Predict TB Bacteria-Resistant Drugs from TB Isolates

Authors: Rosa Tsegaye Aga, Xuan Jiang, Pavel Vazquez Faci, Siqing Liu, Simon Rayner, Endalkachew Alemu, Markos Abebe

Abstract:

Tuberculosis (TB) is a major cause of disease globally. In most cases, TB is treatable and curable, but only with the proper treatment. There is a time when drug-resistant TB occurs when bacteria become resistant to the drugs that are used to treat TB. Current strategies to identify drug-resistant TB bacteria are laboratory-based, and it takes a longer time to identify the drug-resistant bacteria and treat the patient accordingly. But machine learning (ML) and data science approaches can offer new approaches to the problem. In this study, we propose to develop an ML-based model to predict the antibiotic resistance phenotypes of TB isolates in minutes and give the right treatment to the patient immediately. The study has been using the whole genome sequence (WGS) of TB isolates as training data that have been extracted from the NCBI repository and contain different countries’ samples to build the ML models. The reason that different countries’ samples have been included is to generalize the large group of TB isolates from different regions in the world. This supports the model to train different behaviors of the TB bacteria and makes the model robust. The model training has been considering three pieces of information that have been extracted from the WGS data to train the model. These are all variants that have been found within the candidate genes (F1), predetermined resistance-associated variants (F2), and only resistance-associated gene information for the particular drug. Two major datasets have been constructed using these three information. F1 and F2 information have been considered as two independent datasets, and the third information is used as a class to label the two datasets. Five machine learning algorithms have been considered to train the model. These are Support Vector Machine (SVM), Random forest (RF), Logistic regression (LR), Gradient Boosting, and Ada boost algorithms. The models have been trained on the datasets F1, F2, and F1F2 that is the F1 and the F2 dataset merged. Additionally, an ensemble approach has been used to train the model. The ensemble approach has been considered to run F1 and F2 datasets on gradient boosting algorithm and use the output as one dataset that is called F1F2 ensemble dataset and train a model using this dataset on the five algorithms. As the experiment shows, the ensemble approach model that has been trained on the Gradient Boosting algorithm outperformed the rest of the models. In conclusion, this study suggests the ensemble approach, that is, the RF + Gradient boosting model, to predict the antibiotic resistance phenotypes of TB isolates by outperforming the rest of the models.

Keywords: machine learning, MTB, WGS, drug resistant TB

Procedia PDF Downloads 45
1875 A Dynamic Neural Network Model for Accurate Detection of Masked Faces

Authors: Oladapo Tolulope Ibitoye

Abstract:

Neural networks have become prominent and widely engaged in algorithmic-based machine learning networks. They are perfect in solving day-to-day issues to a certain extent. Neural networks are computing systems with several interconnected nodes. One of the numerous areas of application of neural networks is object detection. This is a prominent area due to the coronavirus disease pandemic and the post-pandemic phases. Wearing a face mask in public slows the spread of the virus, according to experts’ submission. This calls for the development of a reliable and effective model for detecting face masks on people's faces during compliance checks. The existing neural network models for facemask detection are characterized by their black-box nature and large dataset requirement. The highlighted challenges have compromised the performance of the existing models. The proposed model utilized Faster R-CNN Model on Inception V3 backbone to reduce system complexity and dataset requirement. The model was trained and validated with very few datasets and evaluation results shows an overall accuracy of 96% regardless of skin tone.

Keywords: convolutional neural network, face detection, face mask, masked faces

Procedia PDF Downloads 62
1874 A Phishing Email Detection Approach Using Machine Learning Techniques

Authors: Kenneth Fon Mbah, Arash Habibi Lashkari, Ali A. Ghorbani

Abstract:

Phishing e-mails are a security issue that not only annoys online users, but has also resulted in significant financial losses for businesses. Phishing advertisements and pornographic e-mails are difficult to detect as attackers have been becoming increasingly intelligent and professional. Attackers track users and adjust their attacks based on users’ attractions and hot topics that can be extracted from community news and journals. This research focuses on deceptive Phishing attacks and their variants such as attacks through advertisements and pornographic e-mails. We propose a framework called Phishing Alerting System (PHAS) to accurately classify e-mails as Phishing, advertisements or as pornographic. PHAS has the ability to detect and alert users for all types of deceptive e-mails to help users in decision making. A well-known email dataset has been used for these experiments and based on previously extracted features, 93.11% detection accuracy is obtainable by using J48 and KNN machine learning techniques. Our proposed framework achieved approximately the same accuracy as the benchmark while using this dataset.

Keywords: phishing e-mail, phishing detection, anti phishing, alarm system, machine learning

Procedia PDF Downloads 334
1873 Analysis of Diabetes Patients Using Pearson, Cost Optimization, Control Chart Methods

Authors: Devatha Kalyan Kumar, R. Poovarasan

Abstract:

In this paper, we have taken certain important factors and health parameters of diabetes patients especially among children by birth (pediatric congenital) where using the above three metrics methods we are going to assess the importance of each attributes in the dataset and thereby determining the most highly responsible and co-related attribute causing diabetics among young patients. We use cost optimization, control chart and Spearmen methodologies for the real-time application of finding the data efficiency in this diabetes dataset. The Spearmen methodology is the correlation methodologies used in software development process to identify the complexity between the various modules of the software. Identifying the complexity is important because if the complexity is higher, then there is a higher chance of occurrence of the risk in the software. With the use of control; chart mean, variance and standard deviation of data are calculated. With the use of Cost optimization model, we find to optimize the variables. Hence we choose the Spearmen, control chart and cost optimization methods to assess the data efficiency in diabetes datasets.

Keywords: correlation, congenital diabetics, linear relationship, monotonic function, ranking samples, pediatric

Procedia PDF Downloads 252
1872 Intelligent Computing with Bayesian Regularization Artificial Neural Networks for a Nonlinear System of COVID-19 Epidemic Model for Future Generation Disease Control

Authors: Tahir Nawaz Cheema, Dumitru Baleanu, Ali Raza

Abstract:

In this research work, we design intelligent computing through Bayesian Regularization artificial neural networks (BRANNs) introduced to solve the mathematical modeling of infectious diseases (Covid-19). The dynamical transmission is due to the interaction of people and its mathematical representation based on the system's nonlinear differential equations. The generation of the dataset of the Covid-19 model is exploited by the power of the explicit Runge Kutta method for different countries of the world like India, Pakistan, Italy, and many more. The generated dataset is approximately used for training, testing, and validation processes for every frequent update in Bayesian Regularization backpropagation for numerical behavior of the dynamics of the Covid-19 model. The performance and effectiveness of designed methodology BRANNs are checked through mean squared error, error histograms, numerical solutions, absolute error, and regression analysis.

Keywords: mathematical models, beysian regularization, bayesian-regularization backpropagation networks, regression analysis, numerical computing

Procedia PDF Downloads 137
1871 Biosorption of Heavy Metals by Low Cost Adsorbents

Authors: Azam Tabatabaee, Fereshteh Dastgoshadeh, Akram Tabatabaee

Abstract:

This paper describes the use of by-products as adsorbents for removing heavy metals from aqueous effluent solutions. Products of almond skin, walnut shell, saw dust, rice bran and egg shell were evaluated as metal ion adsorbents in aqueous solutions. A comparative study was done with commercial adsorbents like ion exchange resins and activated carbon too. Batch experiments were investigated to determine the affinity of all of biomasses for, Cd(ΙΙ), Cr(ΙΙΙ), Ni(ΙΙ), and Pb(ΙΙ) metal ions at pH 5. The rate of metal ion removal in the synthetic wastewater by the biomass was evaluated by measuring final concentration of synthetic wastewater. At a concentration of metal ion (50 mg/L), egg shell adsorbed high levels (98.6 – 99.7%) of Pb(ΙΙ) and Cr(ΙΙΙ) and walnut shell adsorbed high levels (35.3 – 65.4%) of Ni(ΙΙ) and Cd(ΙΙ). In this study, it has been shown that by-products were excellent adsorbents for removal of toxic ions from wastewater with efficiency comparable to commercially available adsorbents, but at a reduced cost. Also statistical studies using Independent Sample t Test and ANOVA Oneway for statistical comparison between various elements adsorption showed that there isn’t a significant difference in some elements adsorption percentage by by-products and commercial adsorbents.

Keywords: adsorbents, heavy metals, commercial adsorbents, wastewater, by-products

Procedia PDF Downloads 407
1870 Application of Multilinear Regression Analysis for Prediction of Synthetic Shear Wave Velocity Logs in Upper Assam Basin

Authors: Triveni Gogoi, Rima Chatterjee

Abstract:

Shear wave velocity (Vs) estimation is an important approach in the seismic exploration and characterization of a hydrocarbon reservoir. There are varying methods for prediction of S-wave velocity, if recorded S-wave log is not available. But all the available methods for Vs prediction are empirical mathematical models. Shear wave velocity can be estimated using P-wave velocity by applying Castagna’s equation, which is the most common approach. The constants used in Castagna’s equation vary for different lithologies and geological set-ups. In this study, multiple regression analysis has been used for estimation of S-wave velocity. The EMERGE module from Hampson-Russel software has been used here for generation of S-wave log. Both single attribute and multi attributes analysis have been carried out for generation of synthetic S-wave log in Upper Assam basin. Upper Assam basin situated in North Eastern India is one of the most important petroleum provinces of India. The present study was carried out using four wells of the study area. Out of these wells, S-wave velocity was available for three wells. The main objective of the present study is a prediction of shear wave velocities for wells where S-wave velocity information is not available. The three wells having S-wave velocity were first used to test the reliability of the method and the generated S-wave log was compared with actual S-wave log. Single attribute analysis has been carried out for these three wells within the depth range 1700-2100m, which corresponds to Barail group of Oligocene age. The Barail Group is the main target zone in this study, which is the primary producing reservoir of the basin. A system generated list of attributes with varying degrees of correlation appeared and the attribute with the highest correlation was concerned for the single attribute analysis. Crossplot between the attributes shows the variation of points from line of best fit. The final result of the analysis was compared with the available S-wave log, which shows a good visual fit with a correlation of 72%. Next multi-attribute analysis has been carried out for the same data using all the wells within the same analysis window. A high correlation of 85% has been observed between the output log from the analysis and the recorded S-wave. The almost perfect fit between the synthetic S-wave and the recorded S-wave log validates the reliability of the method. For further authentication, the generated S-wave data from the wells have been tied to the seismic and correlated them. Synthetic share wave log has been generated for the well M2 where S-wave is not available and it shows a good correlation with the seismic. Neutron porosity, density, AI and P-wave velocity are proved to be the most significant variables in this statistical method for S-wave generation. Multilinear regression method thus can be considered as a reliable technique for generation of shear wave velocity log in this study.

Keywords: Castagna's equation, multi linear regression, multi attribute analysis, shear wave logs

Procedia PDF Downloads 222
1869 A Ferutinin Analogue with Enhanced Potency and Selectivity against Estrogen Receptor Positive Breast Cancer Cells in vitro

Authors: Remi Safi, Aline Hamade, Najat Bteich, Jamal El Saghir, Mona Diab Assaf, Marwan El-Sabban, Fadia Najjar

Abstract:

Estrogen is considered a risk factor for breast cancer since it promotes breast-cell proliferation. The jaesckeanadiol-3-p-hydroxyphenylpropanoate, a hemi-synthetic analogue of the natural phytoestrogen ferutinin (jaesckeanadiol-p-hydroxybenzoate), is designed to be devoid of estrogenic activity. This analogue induces a cytotoxic effect 30 times higher than that of ferutinin towards MCF-7 breast cancer cell line. We compared these two compounds with respect to their effect on proliferation, cell cycle distribution and cancer stem-like cells in the MCF-7 cell line. Treatment with ferutinin (30 μM) and its analogue (1 μM) produced a significant accumulation of cells at the pre G0/G1 cell cycle phase and triggered apoptosis. Importantly, this compound retains its anti-proliferative activity against breast cancer stem/progenitor cells that are naturally insensitive to ferutinin at the same dose. These results position ferutinin analogue as an effective compound inhibiting the proliferation of estrogen-dependent breast cancer cells and consistently targeting their stem-like cells.

Keywords: ferutinin, hemi-synthetic analogue, breast cancer, estrogen, stem/progenitor cells

Procedia PDF Downloads 182
1868 Scalable and Accurate Detection of Pathogens from Whole-Genome Shotgun Sequencing

Authors: Janos Juhasz, Sandor Pongor, Balazs Ligeti

Abstract:

Next-generation sequencing, especially whole genome shotgun sequencing, is becoming a common approach to gain insight into the microbiomes in a culture-independent way, even in clinical practice. It does not only give us information about the species composition of an environmental sample but opens the possibility to detect antimicrobial resistance and novel, or currently unknown, pathogens. Accurately and reliably detecting the microbial strains is a challenging task. Here we present a sensitive approach for detecting pathogens in metagenomics samples with special regard to detecting novel variants of known pathogens. We have developed a pipeline that uses fast, short read aligner programs (i.e., Bowtie2/BWA) and comprehensive nucleotide databases. Taxonomic binning is based on the lowest common ancestor (LCA) principle; each read is assigned to a taxon, covering the most significantly hit taxa. This approach helps in balancing between sensitivity and running time. The program was tested both on experimental and synthetic data. The results implicate that our method performs as good as the state-of-the-art BLAST-based ones, furthermore, in some cases, it even proves to be better, while running two orders magnitude faster. It is sensitive and capable of identifying taxa being present only in small abundance. Moreover, it needs two orders of magnitude less reads to complete the identification than MetaPhLan2 does. We analyzed an experimental anthrax dataset (B. anthracis strain BA104). The majority of the reads (96.50%) was classified as Bacillus anthracis, a small portion, 1.2%, was classified as other species from the Bacillus genus. We demonstrate that the evaluation of high-throughput sequencing data is feasible in a reasonable time with good classification accuracy.

Keywords: metagenomics, taxonomy binning, pathogens, microbiome, B. anthracis

Procedia PDF Downloads 131
1867 Evaluating Generative Neural Attention Weights-Based Chatbot on Customer Support Twitter Dataset

Authors: Sinarwati Mohamad Suhaili, Naomie Salim, Mohamad Nazim Jambli

Abstract:

Sequence-to-sequence (seq2seq) models augmented with attention mechanisms are playing an increasingly important role in automated customer service. These models, which are able to recognize complex relationships between input and output sequences, are crucial for optimizing chatbot responses. Central to these mechanisms are neural attention weights that determine the focus of the model during sequence generation. Despite their widespread use, there remains a gap in the comparative analysis of different attention weighting functions within seq2seq models, particularly in the domain of chatbots using the Customer Support Twitter (CST) dataset. This study addresses this gap by evaluating four distinct attention-scoring functions—dot, multiplicative/general, additive, and an extended multiplicative function with a tanh activation parameter — in neural generative seq2seq models. Utilizing the CST dataset, these models were trained and evaluated over 10 epochs with the AdamW optimizer. Evaluation criteria included validation loss and BLEU scores implemented under both greedy and beam search strategies with a beam size of k=3. Results indicate that the model with the tanh-augmented multiplicative function significantly outperforms its counterparts, achieving the lowest validation loss (1.136484) and the highest BLEU scores (0.438926 under greedy search, 0.443000 under beam search, k=3). These results emphasize the crucial influence of selecting an appropriate attention-scoring function in improving the performance of seq2seq models for chatbots. Particularly, the model that integrates tanh activation proves to be a promising approach to improve the quality of chatbots in the customer support context.

Keywords: attention weight, chatbot, encoder-decoder, neural generative attention, score function, sequence-to-sequence

Procedia PDF Downloads 75
1866 The Clarification of Palm Oil Wastewater Treatment by Coagulant Composite from Palm Oil Ash

Authors: Rewadee Anuwattana, Narumol Soparatana, Pattamaphorn Phuangngamphan, Worapong Pattayawan, Atiporn Jinprayoon, Saroj Klangkongsap, Supinya Sutthima

Abstract:

In this work focus on clarification in palm oil wastewater treatment by using coagulant composite from palm oil ash. The design of this study was carried out by two steps; first, synthesis of new coagulant composite from palm oil ash which was fused by using Al source combined with Fe source and form to the crystal by the hydrothermal crystallization process. The characterization of coagulant composite from palm oil ash was analyzed by advanced instruments, and The pattern was analyzed by X-ray Diffraction (XRD), chemical composition by X-Ray Fluorescence (XRFS) and morphology characterized by SEM. The second step, the clarification wastewater treatment efficiency of synthetic coagulant composite, was evaluated by coagulation/flocculation process based on the COD, turbidity, phosphate and color removal of wastewater from palm oil factory by varying the coagulant dosage (1-8 %w/v) with no adjusted pH and commercial coagulants (Alum, Ferric Chloride and poly aluminum chloride) which adjusted the pH (6). The results found that the maximum removal of 6% w/v of synthetic coagulant from palm oil ash can remove COD, turbidity, phosphate and color was 88.44%, 93.32%, 93.32% and 93.32%, respectively. The experiments were compared using 6% w/v of commercial coagulants (Alum, Ferric Chloride and Polyaluminum Chloride) can remove COD of 74.29%, 71.43% and 57.14%, respectively.

Keywords: coagulation, coagulant, wastewater treatment, waste utilization, palm oil ash

Procedia PDF Downloads 185
1865 Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings

Authors: Gaelle Candel, David Naccache

Abstract:

t-SNE is an embedding method that the data science community has widely used. It helps two main tasks: to display results by coloring items according to the item class or feature value; and for forensic, giving a first overview of the dataset distribution. Two interesting characteristics of t-SNE are the structure preservation property and the answer to the crowding problem, where all neighbors in high dimensional space cannot be represented correctly in low dimensional space. t-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the cluster area is proportional to its size in number, and relationships between clusters are materialized by closeness on the embedding. This algorithm is non-parametric. The transformation from a high to low dimensional space is described but not learned. Two initializations of the algorithm would lead to two different embeddings. In a forensic approach, analysts would like to compare two or more datasets using their embedding. A naive approach would be to embed all datasets together. However, this process is costly as the complexity of t-SNE is quadratic and would be infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built with a subset of data. While this approach is highly scalable, points could be mapped at the same exact position, making them indistinguishable. This type of model would be unable to adapt to new outliers nor concept drift. This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the second relative to the support embedding’ match. The embedding with the support process can be repeated more than once, with the newly obtained embedding. The successive embedding can be used to study the impact of one variable over the dataset distribution or monitor changes over time. This method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity would be reduced from O(n²) to O(n²=k), and the memory requirement from n² to 2(n=k)², which enables computation on recent laptops. The method showed promising results on a real-world dataset, allowing to observe the birth, evolution, and death of clusters. The proposed approach facilitates identifying significant trends and changes, which empowers the monitoring high dimensional datasets’ dynamics.

Keywords: concept drift, data visualization, dimension reduction, embedding, monitoring, reusability, t-SNE, unsupervised learning

Procedia PDF Downloads 140
1864 Deep Learning based Image Classifiers for Detection of CSSVD in Cacao Plants

Authors: Atuhurra Jesse, N'guessan Yves-Roland Douha, Pabitra Lenka

Abstract:

The detection of diseases within plants has attracted a lot of attention from computer vision enthusiasts. Despite the progress made to detect diseases in many plants, there remains a research gap to train image classifiers to detect the cacao swollen shoot virus disease or CSSVD for short, pertinent to cacao plants. This gap has mainly been due to the unavailability of high quality labeled training data. Moreover, institutions have been hesitant to share their data related to CSSVD. To fill these gaps, image classifiers to detect CSSVD-infected cacao plants are presented in this study. The classifiers are based on VGG16, ResNet50 and Vision Transformer (ViT). The image classifiers are evaluated on a recently released and publicly accessible KaraAgroAI Cocoa dataset. The best performing image classifier, based on ResNet50, achieves 95.39\% precision, 93.75\% recall, 94.34\% F1-score and 94\% accuracy on only 20 epochs. There is a +9.75\% improvement in recall when compared to previous works. These results indicate that the image classifiers learn to identify cacao plants infected with CSSVD.

Keywords: CSSVD, image classification, ResNet50, vision transformer, KaraAgroAI cocoa dataset

Procedia PDF Downloads 96
1863 A Radioprotective Effect of Nanoceria (CNPs), Magnetic Flower-Like Iron Oxide Microparticles (FIOMPs), and Vitamins C and E on Irradiated BSA Protein

Authors: Hajar Zarei, AliAkbar Zarenejadatashgah, Vuk Uskoković, Hiroshi Watabe

Abstract:

The reactive oxygen species (ROS) generated by radiation in nuclear diagnostic imaging and radiotherapy could damage the structure of the proteins in noncancerous cells surrounding the tumor. The critical factor in many age-related diseases, such as Alzheimer, Parkinson, or Huntington diseases, is the oxidation of proteins by the ROS as molecular triggers of the given pathologies. Our studies by spectroscopic experiments showed doses close to therapeutic ones (1 to 5 Gy) could lead to changes of secondary and tertiary structures in BSA protein macromolecule as a protein model as well as the aggregation of polypeptide chain but without the fragmentation. For this reason, we investigated the radioprotective effects of natural (vitamin C and E) and synthetic materials (CNPs and FIOMPs) on the structural changes in BSA protein induced by gamma irradiation at a therapeutic dose (3Gy). In the presence of both vitamins and synthetic materials, the spectroscopic studies revealed that irradiated BSA was protected from the structural changes caused by ROS, according to in vitro research. The radioprotective property of CNPs and FIOMPs arises from enzyme mimetic activities (catalase, superoxide dismutase, and peroxidase) and their antioxidant capability against hydroxyl radicals. In the case of FIOMPs, a porous structure also leads to increased ROS recombination with each other in the same radiolytic track and subsequently decreased encounters with BSA. The hydrophilicity of vitamin C resulted in the major scavenging of ROS in the solvent, whereas hydrophobic vitamin E localized on the nonpolar patches of the BSA surface, where it did not only neutralize them thanks to the moderate BSA binding constant but also formed a barrier for diffusing ROS. To the best of our knowledge, there has been a persistent lack of studies investigating the radioactive effect of mentioned materials on proteins. Therefore, the results of our studies can open a new widow for application of these common dietary ingredients and new synthetic NPs in improving the safety of radiotherapy.

Keywords: reactive oxygen species, spectroscopy, bovine serum albumin, gamma radiation, radioprotection

Procedia PDF Downloads 81
1862 Automatic Identification and Classification of Contaminated Biodegradable Plastics using Machine Learning Algorithms and Hyperspectral Imaging Technology

Authors: Nutcha Taneepanichskul, Helen C. Hailes, Mark Miodownik

Abstract:

Plastic waste has emerged as a critical global environmental challenge, primarily driven by the prevalent use of conventional plastics derived from petrochemical refining and manufacturing processes in modern packaging. While these plastics serve vital functions, their persistence in the environment post-disposal poses significant threats to ecosystems. Addressing this issue necessitates approaches, one of which involves the development of biodegradable plastics designed to degrade under controlled conditions, such as industrial composting facilities. It is imperative to note that compostable plastics are engineered for degradation within specific environments and are not suited for uncontrolled settings, including natural landscapes and aquatic ecosystems. The full benefits of compostable packaging are realized when subjected to industrial composting, preventing environmental contamination and waste stream pollution. Therefore, effective sorting technologies are essential to enhance composting rates for these materials and diminish the risk of contaminating recycling streams. In this study, it leverage hyperspectral imaging technology (HSI) coupled with advanced machine learning algorithms to accurately identify various types of plastics, encompassing conventional variants like Polyethylene terephthalate (PET), Polypropylene (PP), Low density polyethylene (LDPE), High density polyethylene (HDPE) and biodegradable alternatives such as Polybutylene adipate terephthalate (PBAT), Polylactic acid (PLA), and Polyhydroxyalkanoates (PHA). The dataset is partitioned into three subsets: a training dataset comprising uncontaminated conventional and biodegradable plastics, a validation dataset encompassing contaminated plastics of both types, and a testing dataset featuring real-world packaging items in both pristine and contaminated states. Five distinct machine learning algorithms, namely Partial Least Squares Discriminant Analysis (PLS-DA), Support Vector Machine (SVM), Convolutional Neural Network (CNN), Logistic Regression, and Decision Tree Algorithm, were developed and evaluated for their classification performance. Remarkably, the Logistic Regression and CNN model exhibited the most promising outcomes, achieving a perfect accuracy rate of 100% for the training and validation datasets. Notably, the testing dataset yielded an accuracy exceeding 80%. The successful implementation of this sorting technology within recycling and composting facilities holds the potential to significantly elevate recycling and composting rates. As a result, the envisioned circular economy for plastics can be established, thereby offering a viable solution to mitigate plastic pollution.

Keywords: biodegradable plastics, sorting technology, hyperspectral imaging technology, machine learning algorithms

Procedia PDF Downloads 75
1861 Graph Based Traffic Analysis and Delay Prediction Using a Custom Built Dataset

Authors: Gabriele Borg, Alexei Debono, Charlie Abela

Abstract:

There on a constant rise in the availability of high volumes of data gathered from multiple sources, resulting in an abundance of unprocessed information that can be used to monitor patterns and trends in user behaviour. Similarly, year after year, Malta is also constantly experiencing ongoing population growth and an increase in mobilization demand. This research takes advantage of data which is continuously being sourced and converting it into useful information related to the traffic problem on the Maltese roads. The scope of this paper is to provide a methodology to create a custom dataset (MalTra - Malta Traffic) compiled from multiple participants from various locations across the island to identify the most common routes taken to expose the main areas of activity. This use of big data is seen being used in various technologies and is referred to as ITSs (Intelligent Transportation Systems), which has been concluded that there is significant potential in utilising such sources of data on a nationwide scale. Furthermore, a series of traffic prediction graph neural network models are conducted to compare MalTra to large-scale traffic datasets.

Keywords: graph neural networks, traffic management, big data, mobile data patterns

Procedia PDF Downloads 123
1860 Agile Software Effort Estimation Using Regression Techniques

Authors: Mikiyas Adugna

Abstract:

Effort estimation is among the activities carried out in software development processes. An accurate model of estimation leads to project success. The method of agile effort estimation is a complex task because of the dynamic nature of software development. Researchers are still conducting studies on agile effort estimation to enhance prediction accuracy. Due to these reasons, we investigated and proposed a model on LASSO and Elastic Net regression to enhance estimation accuracy. The proposed model has major components: preprocessing, train-test split, training with default parameters, and cross-validation. During the preprocessing phase, the entire dataset is normalized. After normalization, a train-test split is performed on the dataset, setting training at 80% and testing set to 20%. We chose two different phases for training the two algorithms (Elastic Net and LASSO) regression following the train-test-split. In the first phase, the two algorithms are trained using their default parameters and evaluated on the testing data. In the second phase, the grid search technique (the grid is used to search for tuning and select optimum parameters) and 5-fold cross-validation to get the final trained model. Finally, the final trained model is evaluated using the testing set. The experimental work is applied to the agile story point dataset of 21 software projects collected from six firms. The results show that both Elastic Net and LASSO regression outperformed the compared ones. Compared to the proposed algorithms, LASSO regression achieved better predictive performance and has acquired PRED (8%) and PRED (25%) results of 100.0, MMRE of 0.0491, MMER of 0.0551, MdMRE of 0.0593, MdMER of 0.063, and MSE of 0.0007. The result implies LASSO regression algorithm trained model is the most acceptable, and higher estimation performance exists in the literature.

Keywords: agile software development, effort estimation, elastic net regression, LASSO

Procedia PDF Downloads 63
1859 Deep Learning for Qualitative and Quantitative Grain Quality Analysis Using Hyperspectral Imaging

Authors: Ole-Christian Galbo Engstrøm, Erik Schou Dreier, Birthe Møller Jespersen, Kim Steenstrup Pedersen

Abstract:

Grain quality analysis is a multi-parameterized problem that includes a variety of qualitative and quantitative parameters such as grain type classification, damage type classification, and nutrient regression. Currently, these parameters require human inspection, a multitude of instruments employing a variety of sensor technologies, and predictive model types or destructive and slow chemical analysis. This paper investigates the feasibility of applying near-infrared hyperspectral imaging (NIR-HSI) to grain quality analysis. For this study two datasets of NIR hyperspectral images in the wavelength range of 900 nm - 1700 nm have been used. Both datasets contain images of sparsely and densely packed grain kernels. The first dataset contains ~87,000 image crops of bulk wheat samples from 63 harvests where protein value has been determined by the FOSS Infratec NOVA which is the golden industry standard for protein content estimation in bulk samples of cereal grain. The second dataset consists of ~28,000 image crops of bulk grain kernels from seven different wheat varieties and a single rye variety. In the first dataset, protein regression analysis is the problem to solve while variety classification analysis is the problem to solve in the second dataset. Deep convolutional neural networks (CNNs) have the potential to utilize spatio-spectral correlations within a hyperspectral image to simultaneously estimate the qualitative and quantitative parameters. CNNs can autonomously derive meaningful representations of the input data reducing the need for advanced preprocessing techniques required for classical chemometric model types such as artificial neural networks (ANNs) and partial least-squares regression (PLS-R). A comparison between different CNN architectures utilizing 2D and 3D convolution is conducted. These results are compared to the performance of ANNs and PLS-R. Additionally, a variety of preprocessing techniques from image analysis and chemometrics are tested. These include centering, scaling, standard normal variate (SNV), Savitzky-Golay (SG) filtering, and detrending. The results indicate that the combination of NIR-HSI and CNNs has the potential to be the foundation for an automatic system unifying qualitative and quantitative grain quality analysis within a single sensor technology and predictive model type.

Keywords: deep learning, grain analysis, hyperspectral imaging, preprocessing techniques

Procedia PDF Downloads 96
1858 Early Recognition and Grading of Cataract Using a Combined Log Gabor/Discrete Wavelet Transform with ANN and SVM

Authors: Hadeer R. M. Tawfik, Rania A. K. Birry, Amani A. Saad

Abstract:

Eyes are considered to be the most sensitive and important organ for human being. Thus, any eye disorder will affect the patient in all aspects of life. Cataract is one of those eye disorders that lead to blindness if not treated correctly and quickly. This paper demonstrates a model for automatic detection, classification, and grading of cataracts based on image processing techniques and artificial intelligence. The proposed system is developed to ease the cataract diagnosis process for both ophthalmologists and patients. The wavelet transform combined with 2D Log Gabor Wavelet transform was used as feature extraction techniques for a dataset of 120 eye images followed by a classification process that classified the image set into three classes; normal, early, and advanced stage. A comparison between the two used classifiers, the support vector machine SVM and the artificial neural network ANN were done for the same dataset of 120 eye images. It was concluded that SVM gave better results than ANN. SVM success rate result was 96.8% accuracy where ANN success rate result was 92.3% accuracy.

Keywords: cataract, classification, detection, feature extraction, grading, log-gabor, neural networks, support vector machines, wavelet

Procedia PDF Downloads 329
1857 Targeting Mre11 Nuclease Overcomes Platinum Resistance and Induces Synthetic Lethality in Platinum Sensitive XRCC1 Deficient Epithelial Ovarian Cancers

Authors: Adel Alblihy, Reem Ali, Mashael Algethami, Ahmed Shoqafi, Michael S. Toss, Juliette Brownlie, Natalie J. Tatum, Ian Hickson, Paloma Ordonez Moran, Anna Grabowska, Jennie N. Jeyapalan, Nigel P. Mongan, Emad A. Rakha, Srinivasan Madhusudan

Abstract:

Platinum resistance is a clinical challenge in ovarian cancer. Platinating agents induce DNA damage which activate Mre11 nuclease directed DNA damage signalling and response (DDR). Upregulation of DDR may promote chemotherapy resistance. Here we have comprehensively evaluated Mre11 in epithelial ovarian cancers. In clinical cohort that received platinum- based chemotherapy (n=331), Mre11 protein overexpression was associated with aggressive phenotype and poor progression free survival (PFS) (p=0.002). In the ovarian cancer genome atlas (TCGA) cohort (n=498), Mre11 gene amplification was observed in a subset of serous tumours (5%) which correlated highly with Mre11 mRNA levels (p<0.0001). Altered Mre11 levels was linked with genome wide alterations that can influence platinum sensitivity. At the transcriptomic level (n=1259), Mre11 overexpression was associated with poor PFS (p=0.003). ROC analysis showed an area under the curve (AUC) of 0.642 for response to platinum-based chemotherapy. Pre-clinically, Mre11 depletion by gene knock down or blockade by small molecule inhibitor (Mirin) reversed platinum resistance in ovarian cancer cells and in 3D spheroid models. Importantly, Mre11 inhibition was synthetically lethal in platinum sensitive XRCC1 deficient ovarian cancer cells and 3D-spheroids. Selective cytotoxicity was associated with DNA double strand break (DSB) accumulation, S-phase cell cycle arrest and increased apoptosis. We conclude that pharmaceutical development of Mre11 inhibitors is a viable clinical strategy for platinum sensitization and synthetic lethality in ovarian cancer.

Keywords: MRE11; XRCC1, ovarian cancer, platinum sensitization, synthetic lethality

Procedia PDF Downloads 125
1856 Use of Fuzzy Logic in the Corporate Reputation Assessment: Stock Market Investors’ Perspective

Authors: Tomasz L. Nawrocki, Danuta Szwajca

Abstract:

The growing importance of reputation in building enterprise value and achieving long-term competitive advantage creates the need for its measurement and evaluation for the management purposes (effective reputation and its risk management). The paper presents practical application of self-developed corporate reputation assessment model from the viewpoint of stock market investors. The model has a pioneer character and example analysis performed for selected industry is a form of specific test for this tool. In the proposed solution, three aspects - informational, financial and development, as well as social ones - were considered. It was also assumed that the individual sub-criteria will be based on public sources of information, and as the calculation apparatus, capable of obtaining synthetic final assessment, fuzzy logic will be used. The main reason for developing this model was to fulfill the gap in the scope of synthetic measure of corporate reputation that would provide higher degree of objectivity by relying on "hard" (not from surveys) and publicly available data. It should be also noted that results obtained on the basis of proposed corporate reputation assessment method give possibilities of various internal as well as inter-branch comparisons and analysis of corporate reputation impact.

Keywords: corporate reputation, fuzzy logic, fuzzy model, stock market investors

Procedia PDF Downloads 243
1855 Investigating the Effectiveness of Multilingual NLP Models for Sentiment Analysis

Authors: Othmane Touri, Sanaa El Filali, El Habib Benlahmar

Abstract:

Natural Language Processing (NLP) has gained significant attention lately. It has proved its ability to analyze and extract insights from unstructured text data in various languages. It is found that one of the most popular NLP applications is sentiment analysis which aims to identify the sentiment expressed in a piece of text, such as positive, negative, or neutral, in multiple languages. While there are several multilingual NLP models available for sentiment analysis, there is a need to investigate their effectiveness in different contexts and applications. In this study, we aim to investigate the effectiveness of different multilingual NLP models for sentiment analysis on a dataset of online product reviews in multiple languages. The performance of several NLP models, including Google Cloud Natural Language API, Microsoft Azure Cognitive Services, Amazon Comprehend, Stanford CoreNLP, spaCy, and Hugging Face Transformers are being compared. The models based on several metrics, including accuracy, precision, recall, and F1 score, are being evaluated and compared to their performance across different categories of product reviews. In order to run the study, preprocessing of the dataset has been performed by cleaning and tokenizing the text data in multiple languages. Then training and testing each model has been applied using a cross-validation approach where randomly dividing the dataset into training and testing sets and repeating the process multiple times has been used. A grid search approach to optimize the hyperparameters of each model and select the best-performing model for each category of product reviews and language has been applied. The findings of this study provide insights into the effectiveness of different multilingual NLP models for Multilingual Sentiment Analysis and their suitability for different languages and applications. The strengths and limitations of each model were identified, and recommendations for selecting the most performant model based on the specific requirements of a project were provided. This study contributes to the advancement of research methods in multilingual NLP and provides a practical guide for researchers and practitioners in the field.

Keywords: NLP, multilingual, sentiment analysis, texts

Procedia PDF Downloads 95
1854 Enhanced Extra Trees Classifier for Epileptic Seizure Prediction

Authors: Maurice Ntahobari, Levin Kuhlmann, Mario Boley, Zhinoos Razavi Hesabi

Abstract:

For machine learning based epileptic seizure prediction, it is important for the model to be implemented in small implantable or wearable devices that can be used to monitor epilepsy patients; however, current state-of-the-art methods are complex and computationally intensive. We use Shapley Additive Explanation (SHAP) to find relevant intracranial electroencephalogram (iEEG) features and improve the computational efficiency of a state-of-the-art seizure prediction method based on the extra trees classifier while maintaining prediction performance. Results for a small contest dataset and a much larger dataset with continuous recordings of up to 3 years per patient from 15 patients yield better than chance prediction performance (p < 0.004). Moreover, while the performance of the SHAP-based model is comparable to that of the benchmark, the overall training and prediction time of the model has been reduced by a factor of 1.83. It can also be noted that the feature called zero crossing value is the best EEG feature for seizure prediction. These results suggest state-of-the-art seizure prediction performance can be achieved using efficient methods based on optimal feature selection.

Keywords: machine learning, seizure prediction, extra tree classifier, SHAP, epilepsy

Procedia PDF Downloads 107
1853 Short Text Classification for Saudi Tweets

Authors: Asma A. Alsufyani, Maram A. Alharthi, Maha J. Althobaiti, Manal S. Alharthi, Huda Rizq

Abstract:

Twitter is one of the most popular microblogging sites that allows users to publish short text messages called 'tweets'. Increasing the number of accounts to follow (followings) increases the number of tweets that will be displayed from different topics in an unclassified manner in the timeline of the user. Therefore, it can be a vital solution for many Twitter users to have their tweets in a timeline classified into general categories to save the user’s time and to provide easy and quick access to tweets based on topics. In this paper, we developed a classifier for timeline tweets trained on a dataset consisting of 3600 tweets in total, which were collected from Saudi Twitter and annotated manually. We experimented with the well-known Bag-of-Words approach to text classification, and we used support vector machines (SVM) in the training process. The trained classifier performed well on a test dataset, with an average F1-measure equal to 92.3%. The classifier has been integrated into an application, which practically proved the classifier’s ability to classify timeline tweets of the user.

Keywords: corpus creation, feature extraction, machine learning, short text classification, social media, support vector machine, Twitter

Procedia PDF Downloads 151
1852 Mapping Iron Content in the Brain with Magnetic Resonance Imaging and Machine Learning

Authors: Gabrielle Robertson, Matthew Downs, Joseph Dagher

Abstract:

Iron deposition in the brain has been linked with a host of neurological disorders such as Alzheimer’s, Parkinson’s, and Multiple Sclerosis. While some treatment options exist, there are no objective measurement tools that allow for the monitoring of iron levels in the brain in vivo. An emerging Magnetic Resonance Imaging (MRI) method has been recently proposed to deduce iron concentration through quantitative measurement of magnetic susceptibility. This is a multi-step process that involves repeated modeling of physical processes via approximate numerical solutions. For example, the last two steps of this Quantitative Susceptibility Mapping (QSM) method involve I) mapping magnetic field into magnetic susceptibility and II) mapping magnetic susceptibility into iron concentration. Process I involves solving an ill-posed inverse problem by using regularization via injection of prior belief. The end result from Process II highly depends on the model used to describe the molecular content of each voxel (type of iron, water fraction, etc.) Due to these factors, the accuracy and repeatability of QSM have been an active area of research in the MRI and medical imaging community. This work aims to estimate iron concentration in the brain via a single step. A synthetic numerical model of the human head was created by automatically and manually segmenting the human head on a high-resolution grid (640x640x640, 0.4mm³) yielding detailed structures such as microvasculature and subcortical regions as well as bone, soft tissue, Cerebral Spinal Fluid, sinuses, arteries, and eyes. Each segmented region was then assigned tissue properties such as relaxation rates, proton density, electromagnetic tissue properties and iron concentration. These tissue property values were randomly selected from a Probability Distribution Function derived from a thorough literature review. In addition to having unique tissue property values, different synthetic head realizations also possess unique structural geometry created by morphing the boundary regions of different areas within normal physical constraints. This model of the human brain is then used to create synthetic MRI measurements. This is repeated thousands of times, for different head shapes, volume, tissue properties and noise realizations. Collectively, this constitutes a training-set that is similar to in vivo data, but larger than datasets available from clinical measurements. This 3D convolutional U-Net neural network architecture was used to train data-driven Deep Learning models to solve for iron concentrations from raw MRI measurements. The performance was then tested on both synthetic data not used in training as well as real in vivo data. Results showed that the model trained on synthetic MRI measurements is able to directly learn iron concentrations in areas of interest more effectively than other existing QSM reconstruction methods. For comparison, models trained on random geometric shapes (as proposed in the Deep QSM method) are less effective than models trained on realistic synthetic head models. Such an accurate method for the quantitative measurement of iron deposits in the brain would be of important value in clinical studies aiming to understand the role of iron in neurological disease.

Keywords: magnetic resonance imaging, MRI, iron deposition, machine learning, quantitative susceptibility mapping

Procedia PDF Downloads 131
1851 Meta-Learning for Hierarchical Classification and Applications in Bioinformatics

Authors: Fabio Fabris, Alex A. Freitas

Abstract:

Hierarchical classification is a special type of classification task where the class labels are organised into a hierarchy, with more generic class labels being ancestors of more specific ones. Meta-learning for classification-algorithm recommendation consists of recommending to the user a classification algorithm, from a pool of candidate algorithms, for a dataset, based on the past performance of the candidate algorithms in other datasets. Meta-learning is normally used in conventional, non-hierarchical classification. By contrast, this paper proposes a meta-learning approach for more challenging task of hierarchical classification, and evaluates it in a large number of bioinformatics datasets. Hierarchical classification is especially relevant for bioinformatics problems, as protein and gene functions tend to be organised into a hierarchy of class labels. This work proposes meta-learning approach for recommending the best hierarchical classification algorithm to a hierarchical classification dataset. This work’s contributions are: 1) proposing an algorithm for splitting hierarchical datasets into new datasets to increase the number of meta-instances, 2) proposing meta-features for hierarchical classification, and 3) interpreting decision-tree meta-models for hierarchical classification algorithm recommendation.

Keywords: algorithm recommendation, meta-learning, bioinformatics, hierarchical classification

Procedia PDF Downloads 308