Search results for: dataset development
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 17198

Search results for: dataset development

17048 Machine Learning Approach for Yield Prediction in Semiconductor Production

Authors: Heramb Somthankar, Anujoy Chakraborty

Abstract:

This paper presents a classification study on yield prediction in semiconductor production using machine learning approaches. A complicated semiconductor production process is generally monitored continuously by signals acquired from sensors and measurement sites. A monitoring system contains a variety of signals, all of which contain useful information, irrelevant information, and noise. In the case of each signal being considered a feature, "Feature Selection" is used to find the most relevant signals. The open-source UCI SECOM Dataset provides 1567 such samples, out of which 104 fail in quality assurance. Feature extraction and selection are performed on the dataset, and useful signals were considered for further study. Afterward, common machine learning algorithms were employed to predict whether the signal yields pass or fail. The most relevant algorithm is selected for prediction based on the accuracy and loss of the ML model.

Keywords: deep learning, feature extraction, feature selection, machine learning classification algorithms, semiconductor production monitoring, signal processing, time-series analysis

Procedia PDF Downloads 109
17047 Food Consumption and Adaptation to Climate Change: Evidence from Ghana

Authors: Frank Adusah-Poku, John Bosco Dramani, Prince Boakye Frimpong

Abstract:

Climate change is considered a principal threat to human existence and livelihood. The persistence and intensity of droughts and floods in recent years have adversely affected food production systems and value chains, making it impossible to end global hunger by 2030. Thus, this study aims to examine the effect of climate change on food consumption for both farm and non-farm households in Ghana. An important focus of the analysis is to investigate how climate change affects alternative dimensions of food security, examine the extent to which these effects vary across heterogeneous groups, and explore the channels through which climate change affects food consumption. Finally, we conducted a pilot study to understand the significance of farm and non-farm diversification measures in reducing the harmful impact of climate change on farm households. The approach of this article is to use two secondary and one primary datasets. The first secondary dataset is the Ghana Socioeconomic Panel Survey (GSPS). The GSPS is a household panel dataset collected during the period 2009 to 2019. The second dataset is monthly district rainfall and temperature gridded data from the Ghana Meteorological Agency. This data was matched to the GSPS dataset at the district level. Finally, the primary data was obtained from a survey of farm and non-farm adaptation practices used by farmers in three regions in Northern Ghana. The study employed the household fixed effects model to estimate the effect of climate change (measured by temperature and rainfall) on food consumption in Ghana. Again, it used the spatial and temporal variation in temperature and rainfall across the districts in Ghana to estimate the household-level model. Evidence of potential mechanisms through which climate change affects food consumption was explored using two steps. First, the potential mechanism variables were regressed on temperature, rainfall, and the control variables. In the second and final step, the potential mechanism variables were included as extra covariates in the first model. The results revealed that extreme average temperature and drought had caused a decrease in food consumption as well as reduced the intake of important food nutrients such as carbohydrates, protein and vitamins. The results further indicated that low rainfall increased food insecurity among households with no education compared with those with primary and secondary education. Again, non-farm activity and silos have been revealed as the transmission pathways through which the effect of climate change on farm households can be moderated. Finally, the results indicated over 90% of the small-holder farmers interviewed had no farm diversification adaptation strategies for climate change, and a little over 50% of the farmers owned unskilled or manual non-farm economic ventures. This makes it very difficult for the majority of the farmers to withstand climate-related shocks. These findings suggest that achieving the Sustainable Development Goal of Zero Hunger by 2030 needs an integrated approach, such as reducing the over-reliance on rainfed agriculture, educating farmers, and implementing non-farm interventions to improve food consumption in Ghana.

Keywords: climate change, food consumption, Ghana, non-farm activity

Procedia PDF Downloads 6
17046 Representativity Based Wasserstein Active Regression

Authors: Benjamin Bobbia, Matthias Picard

Abstract:

In recent years active learning methodologies based on the representativity of the data seems more promising to limit overfitting. The presented query methodology for regression using the Wasserstein distance measuring the representativity of our labelled dataset compared to the global distribution. In this work a crucial use of GroupSort Neural Networks is made therewith to draw a double advantage. The Wasserstein distance can be exactly expressed in terms of such neural networks. Moreover, one can provide explicit bounds for their size and depth together with rates of convergence. However, heterogeneity of the dataset is also considered by weighting the Wasserstein distance with the error of approximation at the previous step of active learning. Such an approach leads to a reduction of overfitting and high prediction performance after few steps of query. After having detailed the methodology and algorithm, an empirical study is presented in order to investigate the range of our hyperparameters. The performances of this method are compared, in terms of numbers of query needed, with other classical and recent query methods on several UCI datasets.

Keywords: active learning, Lipschitz regularization, neural networks, optimal transport, regression

Procedia PDF Downloads 80
17045 A Machine Learning-Based Model to Screen Antituberculosis Compound Targeted against LprG Lipoprotein of Mycobacterium tuberculosis

Authors: Syed Asif Hassan, Syed Atif Hassan

Abstract:

Multidrug-resistant Tuberculosis (MDR-TB) is an infection caused by the resistant strains of Mycobacterium tuberculosis that do not respond either to isoniazid or rifampicin, which are the most important anti-TB drugs. The increase in the occurrence of a drug-resistance strain of MTB calls for an intensive search of novel target-based therapeutics. In this context LprG (Rv1411c) a lipoprotein from MTB plays a pivotal role in the immune evasion of Mtb leading to survival and propagation of the bacterium within the host cell. Therefore, a machine learning method will be developed for generating a computational model that could predict for a potential anti LprG activity of the novel antituberculosis compound. The present study will utilize dataset from PubChem database maintained by National Center for Biotechnology Information (NCBI). The dataset involves compounds screened against MTB were categorized as active and inactive based upon PubChem activity score. PowerMV, a molecular descriptor generator, and visualization tool will be used to generate the 2D molecular descriptors for the actives and inactive compounds present in the dataset. The 2D molecular descriptors generated from PowerMV will be used as features. We feed these features into three different classifiers, namely, random forest, a deep neural network, and a recurring neural network, to build separate predictive models and choosing the best performing model based on the accuracy of predicting novel antituberculosis compound with an anti LprG activity. Additionally, the efficacy of predicted active compounds will be screened using SMARTS filter to choose molecule with drug-like features.

Keywords: antituberculosis drug, classifier, machine learning, molecular descriptors, prediction

Procedia PDF Downloads 391
17044 Chinese Sentence Level Lip Recognition

Authors: Peng Wang, Tigang Jiang

Abstract:

The computer based lip reading method of different languages cannot be universal. At present, for the research of Chinese lip reading, whether the work on data sets or recognition algorithms, is far from mature. In this paper, we study the Chinese lipreading method based on machine learning, and propose a Chinese Sentence-level lip-reading network (CNLipNet) model which consists of spatio-temporal convolutional neural network(CNN), recurrent neural network(RNN) and Connectionist Temporal Classification (CTC) loss function. This model can map variable-length sequence of video frames to Chinese Pinyin sequence and is trained end-to-end. More over, We create CNLRS, a Chinese Lipreading Dataset, which contains 5948 samples and can be shared through github. The evaluation of CNLipNet on this dataset yielded a 41% word correct rate and a 70.6% character correct rate. This evaluation result is far superior to the professional human lip readers, indicating that CNLipNet performs well in lipreading.

Keywords: lipreading, machine learning, spatio-temporal, convolutional neural network, recurrent neural network

Procedia PDF Downloads 128
17043 Identification of Thermally Critical Zones Based on Inter Seasonal Variation in Temperature

Authors: Sakti Mandal

Abstract:

Varying distribution of land surface temperature in an urbanized environment is a globally addressed phenomenon. Usually has been noticed that criticality of surface temperature increases from the periphery to the urban centre. As the centre experiences maximum severity of heat throughout the year, it also represents most critical zone in terms of thermal condition. In this present study, an attempt has been taken to propose a quantitative approach of thermal critical zonation (TCZ) on the basis of seasonal temperature variation. Here the zonation is done by calculating thermal critical value (TCV). From the Landsat 8 thermal digital data of summer and winter seasons for the year 2014, the land surface temperature maps and thermally critical zonation has been prepared, and corresponding dataset has been computed to conduct the overall study of that particular study area. It is shown that TCZ can be clearly identified and analyzed by the help of inter-seasonal temperature range. The results of this study can be utilized effectively in future urban development and planning projects as well as a framework for implementing rules and regulations by the authorities for a sustainable urban development through an environmentally affable approach.

Keywords: thermal critical values (TCV), thermally critical zonation (TCZ), land surface temperature (LST), Landsat 8, Kolkata Municipal Corporation (KMC)

Procedia PDF Downloads 197
17042 Dual-Channel Reliable Breast Ultrasound Image Classification Based on Explainable Attribution and Uncertainty Quantification

Authors: Haonan Hu, Shuge Lei, Dasheng Sun, Huabin Zhang, Kehong Yuan, Jian Dai, Jijun Tang

Abstract:

This paper focuses on the classification task of breast ultrasound images and conducts research on the reliability measurement of classification results. A dual-channel evaluation framework was developed based on the proposed inference reliability and predictive reliability scores. For the inference reliability evaluation, human-aligned and doctor-agreed inference rationals based on the improved feature attribution algorithm SP-RISA are gracefully applied. Uncertainty quantification is used to evaluate the predictive reliability via the test time enhancement. The effectiveness of this reliability evaluation framework has been verified on the breast ultrasound clinical dataset YBUS, and its robustness is verified on the public dataset BUSI. The expected calibration errors on both datasets are significantly lower than traditional evaluation methods, which proves the effectiveness of the proposed reliability measurement.

Keywords: medical imaging, ultrasound imaging, XAI, uncertainty measurement, trustworthy AI

Procedia PDF Downloads 101
17041 Keypoint Detection Method Based on Multi-Scale Feature Fusion of Attention Mechanism

Authors: Xiaoxiao Li, Shuangcheng Jia, Qian Li

Abstract:

Keypoint detection has always been a challenge in the field of image recognition. This paper proposes a novelty keypoint detection method which is called Multi-Scale Feature Fusion Convolutional Network with Attention (MFFCNA). We verified that the multi-scale features with the attention mechanism module have better feature expression capability. The feature fusion between different scales makes the information that the network model can express more abundant, and the network is easier to converge. On our self-made street sign corner dataset, we validate the MFFCNA model with an accuracy of 97.8% and a recall of 81%, which are 5 and 8 percentage points higher than the HRNet network, respectively. On the COCO dataset, the AP is 71.9%, and the AR is 75.3%, which are 3 points and 2 points higher than HRNet, respectively. Extensive experiments show that our method has a remarkable improvement in the keypoint recognition tasks, and the recognition effect is better than the existing methods. Moreover, our method can be applied not only to keypoint detection but also to image classification and semantic segmentation with good generality.

Keywords: keypoint detection, feature fusion, attention, semantic segmentation

Procedia PDF Downloads 119
17040 The Effect of Artificial Intelligence on Human Rights Resources and Development

Authors: Tharwat Girgis Farag Girgis

Abstract:

The link between development and human rights has long been the subject of scholarly debate. As a result, a number of principles have been adopted, from the right to development to the human rights-based development approach, to understand the dynamics between the two concepts. Despite the initiatives taken, the exact relationship between development and human rights remains unclear. However, the rapprochement between the two concepts and the need for development efforts regarding human rights have increased in recent years. On the other hand, the emergence of sustainable development as an acceptable method in development goals and policies makes this consensus even more unstable. The place of sustainable development in the legal debate on human rights and its role in promoting sustainable development programs require further research. Therefore, this article attempts to map the relationship between development and human rights, with particular emphasis on the place given to sustainable development principles in international human rights law. It will continue to investigate whether it recognizes sustainable development rights. The article will therefore give a positive answer to question mentioned here. The jurisprudence and interpretive guidelines of human rights institutions travel to confirm this hypothesis.

Keywords: sustainable development, human rights, the right to development, the human rights-based approach to development, environmental rights, economic development, social sustainability human rights protection, human rights violations, workers’ rights, justice, security

Procedia PDF Downloads 55
17039 Domain-Specific Deep Neural Network Model for Classification of Abnormalities on Chest Radiographs

Authors: Nkechinyere Joy Olawuyi, Babajide Samuel Afolabi, Bola Ibitoye

Abstract:

This study collected a preprocessed dataset of chest radiographs and formulated a deep neural network model for detecting abnormalities. It also evaluated the performance of the formulated model and implemented a prototype of the formulated model. This was with the view to developing a deep neural network model to automatically classify abnormalities in chest radiographs. In order to achieve the overall purpose of this research, a large set of chest x-ray images were sourced for and collected from the CheXpert dataset, which is an online repository of annotated chest radiographs compiled by the Machine Learning Research Group, Stanford University. The chest radiographs were preprocessed into a format that can be fed into a deep neural network. The preprocessing techniques used were standardization and normalization. The classification problem was formulated as a multi-label binary classification model, which used convolutional neural network architecture to make a decision on whether an abnormality was present or not in the chest radiographs. The classification model was evaluated using specificity, sensitivity, and Area Under Curve (AUC) score as the parameter. A prototype of the classification model was implemented using Keras Open source deep learning framework in Python Programming Language. The AUC ROC curve of the model was able to classify Atelestasis, Support devices, Pleural effusion, Pneumonia, A normal CXR (no finding), Pneumothorax, and Consolidation. However, Lung opacity and Cardiomegaly had a probability of less than 0.5 and thus were classified as absent. Precision, recall, and F1 score values were 0.78; this implies that the number of False Positive and False Negative is the same, revealing some measure of label imbalance in the dataset. The study concluded that the developed model is sufficient to classify abnormalities present in chest radiographs into present or absent.

Keywords: transfer learning, convolutional neural network, radiograph, classification, multi-label

Procedia PDF Downloads 127
17038 Incorporating Spatial Transcriptome Data into Ligand-Receptor Analyses to Discover Regional Activation in Cells

Authors: Eric Bang

Abstract:

Interactions between receptors and ligands are crucial for many essential biological processes, including neurotransmission and metabolism. Ligand-receptor analyses that examine cell behavior and interactions often utilize cell type-specific RNA expressions from single-cell RNA sequencing (scRNA-seq) data. Using CellPhoneDB, a public repository consisting of ligands, receptors, and ligand-receptor interactions, the cell-cell interactions were explored in a specific scRNA-seq dataset from kidney tissue and portrayed the results with dot plots and heat maps. Depending on the type of cell, each ligand-receptor pair was aligned with the interacting cell type and calculated the positori probabilities of these associations, with corresponding P values reflecting average expression values between the triads and their significance. Using single-cell data (sample kidney cell references), genes in the dataset were cross-referenced with ones in the existing CellPhoneDB dataset. For example, a gene such as Pleiotrophin (PTN) present in the single-cell data also needed to be present in the CellPhoneDB dataset. Using the single-cell transcriptomics data via slide-seq and reference data, the CellPhoneDB program defines cell types and plots them in different formats, with the two main ones being dot plots and heat map plots. The dot plot displays derived measures of the cell to cell interaction scores and p values. For the dot plot, each row shows a ligand-receptor pair, and each column shows the two interacting cell types. CellPhoneDB defines interactions and interaction levels from the gene expression level, so since the p-value is on a -log10 scale, the larger dots represent more significant interactions. By performing an interaction analysis, a significant interaction was discovered for myeloid and T-cell ligand-receptor pairs, including those between Secreted Phosphoprotein 1 (SPP1) and Fibronectin 1 (FN1), which is consistent with previous findings. It was proposed that an effective protocol would involve a filtration step where cell types would be filtered out, depending on which ligand-receptor pair is activated in that part of the tissue, as well as the incorporation of the CellPhoneDB data in a streamlined workflow pipeline. The filtration step would be in the form of a Python script that expedites the manual process necessary for dataset filtration. Being in Python allows it to be integrated with the CellPhoneDB dataset for future workflow analysis. The manual process involves filtering cell types based on what ligand/receptor pair is activated in kidney cells. One limitation of this would be the fact that some pairings are activated in multiple cells at a time, so the manual manipulation of the data is reflected prior to analysis. Using the filtration script, accurate sorting is incorporated into the CellPhoneDB database rather than waiting until the output is produced and then subsequently applying spatial data. It was envisioned that this would reveal wherein the cell various ligands and receptors are interacting with different cell types, allowing for easier identification of which cells are being impacted and why, for the purpose of disease treatment. The hope is this new computational method utilizing spatially explicit ligand-receptor association data can be used to uncover previously unknown specific interactions within kidney tissue.

Keywords: bioinformatics, Ligands, kidney tissue, receptors, spatial transcriptome

Procedia PDF Downloads 139
17037 Influence of HDI in the Spread of RSV Bronchiolitis in Children Aged 0 to 2 Years

Authors: Chloé Kernaléguen, Laura Kundun, Tessie Lery, Ryan Laleg, Zhangyun Tan

Abstract:

This study explores global disparities in respiratory syncytial virus (RSV) bronchiolitis incidence among children aged 0-2 years, focusing on the human development index (HDI) as a key determinant. RSV bronchiolitis poses a significant health risk to young children, influenced by factors, including socio-economic conditions captured by the HDI. Through a comprehensive systematic review and dataset selection (Switzerland, Brazil, United States of America), we formulated an HDI-SEIRS numerical model within the SEIRS framework. Results show variations in RSV bronchiolitis dynamics across countries, emphasizing the influence of HDI. Modelling reveals a correlation between higher HDI and increased bronchiolitis spread, notably in the USA and Switzerland. The ratios HDIcountry over HDImax strengthen this association, while climate disparities contribute to variations, especially in colder climates like the USA and Switzerland. The study raises the hypothesis of an indirect link between higher HDI and more frequent bronchiolitis, underlining the need for nuanced understanding. Factors like improved healthcare access, population density, mobility, and social behaviors in higher HDI countries might contribute to unexpected trends. Limitations include dataset quality and restricted RSV bronchiolitis data. Future research should encompass diverse HDI datasets to refine HDI's role in bronchiolitis dynamics. In conclusion, HDI-SEIRS models offer insights into factors influencing RSV bronchiolitis spread. While HDI is a significant indicator, its impact is indirect, necessitating a holistic approach to effective public health policies. This analysis sets the stage for further investigations into multifaceted interactions shaping bronchiolitis dynamics in diverse socio-economic contexts.

Keywords: bronchiolitis propagation, HDI influence, respiratory syncytial virus, SEIRS model

Procedia PDF Downloads 67
17036 DMBR-Net: Deep Multiple-Resolution Bilateral Networks for Real-Time and Accurate Semantic Segmentation

Authors: Pengfei Meng, Shuangcheng Jia, Qian Li

Abstract:

We proposed a real-time high-precision semantic segmentation network based on a multi-resolution feature fusion module, the auxiliary feature extracting module, upsampling module, and atrous spatial pyramid pooling (ASPP) module. We designed a feature fusion structure, which is integrated with sufficient features of different resolutions. We also studied the effect of side-branch structure on the network and made discoveries. Based on the discoveries about the side-branch of the network structure, we used a side-branch auxiliary feature extraction layer in the network to improve the effectiveness of the network. We also designed upsampling module, which has better results than the original upsampling module. In addition, we also re-considered the locations and number of atrous spatial pyramid pooling (ASPP) modules and modified the network structure according to the experimental results to further improve the effectiveness of the network. The network presented in this paper takes the backbone network of Bisenetv2 as a basic network, based on which we constructed a network structure on which we made improvements. We named this network deep multiple-resolution bilateral networks for real-time, referred to as DMBR-Net. After experimental testing, our proposed DMBR-Net network achieved 81.2% mIoU at 119FPS on the Cityscapes validation dataset, 80.7% mIoU at 109FPS on the CamVid test dataset, 29.9% mIoU at 78FPS on the COCOStuff test dataset. Compared with all lightweight real-time semantic segmentation networks, our network achieves the highest accuracy at an appropriate speed.

Keywords: multi-resolution feature fusion, atrous convolutional, bilateral networks, pyramid pooling

Procedia PDF Downloads 149
17035 Discrete Group Search Optimizer for the Travelling Salesman Problem

Authors: Raed Alnajjar, Mohd Zakree, Ahmad Nazri

Abstract:

In this study, we apply Discrete Group Search Optimizer (DGSO) for solving Traveling Salesman Problem (TSP). The DGSO is a nature inspired optimization algorithm that imitates the animal behavior, especially animal searching behavior. The proposed DGSO uses a vector representation and some discrete operators, such as destruction, construction, differential evolution, swap and insert. The TSP is a well-known hard combinatorial optimization problem, which seeks to find the shortest path among numbers of cities. The performance of the proposed DGSO is evaluated and tested on benchmark instances which listed in LIBTSP dataset. The experimental results show that the performance of the proposed DGSO is comparable with the other methods in the state of the art for some instances. The results show that DGSO outperform Ant Colony System (ACS) in some instances whilst outperform other metaheuristic in most instances. In addition to that, the new results obtained a number of optimal solutions and some best known results. DGSO was able to obtain feasible and good quality solution across all dataset.

Keywords: discrete group search optimizer (DGSO); Travelling salesman problem (TSP); Variable neighborhood search(VNS)

Procedia PDF Downloads 324
17034 The Impact of Artificial Intelligence on Human Rights Development

Authors: Romany Wagih Farag Zaky

Abstract:

The relationship between development and human rights has long been the subject of academic debate. To understand the dynamics between these two concepts, various principles are adopted, from the right to development to development-based human rights. Despite the initiatives taken, the relationship between development and human rights remains unclear. However, the overlap between these two views and the idea that efforts should be made in the field of human rights have increased in recent years. It is then evaluated whether the right to sustainable development is acceptable or not. The article concludes that the principles of sustainable development are directly or indirectly recognized in various human rights instruments, which is a good answer to the question posed above. This book therefore cites regional and international human rights agreements such as , as well as the jurisprudence and interpretative guidelines of human rights institutions, to prove this hypothesis.

Keywords: sustainable development, human rights, the right to development, the human rights-based approach to development, environmental rights, economic development, social sustainability human rights protection, human rights violations, workers’ rights, justice, security

Procedia PDF Downloads 55
17033 Machine Learning Model to Predict TB Bacteria-Resistant Drugs from TB Isolates

Authors: Rosa Tsegaye Aga, Xuan Jiang, Pavel Vazquez Faci, Siqing Liu, Simon Rayner, Endalkachew Alemu, Markos Abebe

Abstract:

Tuberculosis (TB) is a major cause of disease globally. In most cases, TB is treatable and curable, but only with the proper treatment. There is a time when drug-resistant TB occurs when bacteria become resistant to the drugs that are used to treat TB. Current strategies to identify drug-resistant TB bacteria are laboratory-based, and it takes a longer time to identify the drug-resistant bacteria and treat the patient accordingly. But machine learning (ML) and data science approaches can offer new approaches to the problem. In this study, we propose to develop an ML-based model to predict the antibiotic resistance phenotypes of TB isolates in minutes and give the right treatment to the patient immediately. The study has been using the whole genome sequence (WGS) of TB isolates as training data that have been extracted from the NCBI repository and contain different countries’ samples to build the ML models. The reason that different countries’ samples have been included is to generalize the large group of TB isolates from different regions in the world. This supports the model to train different behaviors of the TB bacteria and makes the model robust. The model training has been considering three pieces of information that have been extracted from the WGS data to train the model. These are all variants that have been found within the candidate genes (F1), predetermined resistance-associated variants (F2), and only resistance-associated gene information for the particular drug. Two major datasets have been constructed using these three information. F1 and F2 information have been considered as two independent datasets, and the third information is used as a class to label the two datasets. Five machine learning algorithms have been considered to train the model. These are Support Vector Machine (SVM), Random forest (RF), Logistic regression (LR), Gradient Boosting, and Ada boost algorithms. The models have been trained on the datasets F1, F2, and F1F2 that is the F1 and the F2 dataset merged. Additionally, an ensemble approach has been used to train the model. The ensemble approach has been considered to run F1 and F2 datasets on gradient boosting algorithm and use the output as one dataset that is called F1F2 ensemble dataset and train a model using this dataset on the five algorithms. As the experiment shows, the ensemble approach model that has been trained on the Gradient Boosting algorithm outperformed the rest of the models. In conclusion, this study suggests the ensemble approach, that is, the RF + Gradient boosting model, to predict the antibiotic resistance phenotypes of TB isolates by outperforming the rest of the models.

Keywords: machine learning, MTB, WGS, drug resistant TB

Procedia PDF Downloads 51
17032 A Phishing Email Detection Approach Using Machine Learning Techniques

Authors: Kenneth Fon Mbah, Arash Habibi Lashkari, Ali A. Ghorbani

Abstract:

Phishing e-mails are a security issue that not only annoys online users, but has also resulted in significant financial losses for businesses. Phishing advertisements and pornographic e-mails are difficult to detect as attackers have been becoming increasingly intelligent and professional. Attackers track users and adjust their attacks based on users’ attractions and hot topics that can be extracted from community news and journals. This research focuses on deceptive Phishing attacks and their variants such as attacks through advertisements and pornographic e-mails. We propose a framework called Phishing Alerting System (PHAS) to accurately classify e-mails as Phishing, advertisements or as pornographic. PHAS has the ability to detect and alert users for all types of deceptive e-mails to help users in decision making. A well-known email dataset has been used for these experiments and based on previously extracted features, 93.11% detection accuracy is obtainable by using J48 and KNN machine learning techniques. Our proposed framework achieved approximately the same accuracy as the benchmark while using this dataset.

Keywords: phishing e-mail, phishing detection, anti phishing, alarm system, machine learning

Procedia PDF Downloads 338
17031 Intelligent Computing with Bayesian Regularization Artificial Neural Networks for a Nonlinear System of COVID-19 Epidemic Model for Future Generation Disease Control

Authors: Tahir Nawaz Cheema, Dumitru Baleanu, Ali Raza

Abstract:

In this research work, we design intelligent computing through Bayesian Regularization artificial neural networks (BRANNs) introduced to solve the mathematical modeling of infectious diseases (Covid-19). The dynamical transmission is due to the interaction of people and its mathematical representation based on the system's nonlinear differential equations. The generation of the dataset of the Covid-19 model is exploited by the power of the explicit Runge Kutta method for different countries of the world like India, Pakistan, Italy, and many more. The generated dataset is approximately used for training, testing, and validation processes for every frequent update in Bayesian Regularization backpropagation for numerical behavior of the dynamics of the Covid-19 model. The performance and effectiveness of designed methodology BRANNs are checked through mean squared error, error histograms, numerical solutions, absolute error, and regression analysis.

Keywords: mathematical models, beysian regularization, bayesian-regularization backpropagation networks, regression analysis, numerical computing

Procedia PDF Downloads 146
17030 The Impact of Artificial Intelligence on Human Rights Development

Authors: Kerols Seif Said Botros

Abstract:

The relationship between development and human rights has been debated for a long time. Various principles, from the right to development to development-based human rights, are applied to understand the dynamics between these two concepts. Despite the measures calculated, the connection between enhancement and human rights remains vague. Despite, the connection between these two opinions and the need to strengthen human rights have increased in recent years. It will then be examined whether the right to sustainable development is acceptable or not. In various human rights instruments and this is a good vibe to the request cited above. The book then cites domestic and international human rights treaties, as well as jurisprudence and regulations defining human rights institutions, to support this view.

Keywords: sustainable development, human rights, the right to development, the human rights-based approach to development, environmental rights, economic development, social sustainability human rights protection, human rights violations, workers’ rights, justice, security.

Procedia PDF Downloads 54
17029 The Specificity of Employee Development in Polish Small Enterprises

Authors: E. Rak

Abstract:

The aim of the paper is to identify some of the specific characteristics of employee development, as observed in the practice of small enterprises in Poland. Results suggest that a sizeable percentage of employers are not interested in improving the development of their employee base. This aspect is often perceived as insignificant. In addition, many employers have no theoretical or practical knowledge of employee development methods. Lack of sufficient financial support is reported as third on the list of the most important barriers to employee development. Employees, on the other hand, typically offload the responsibility of initiating this type of activities onto the employer. Employee development plans are typically flexible and accommodating. The original value offered by this research comes in the form of a detailed characteristics of employee development in small enterprises, accompanied by identification of specificity of human resource development in Polish companies.

Keywords: employee development, human resources development, small enterprises, trainings

Procedia PDF Downloads 374
17028 Hate Speech Detection in Tunisian Dialect

Authors: Helmi Baazaoui, Mounir Zrigui

Abstract:

This study addresses the challenge of hate speech detection in Tunisian Arabic text, a critical issue for online safety and moderation. Leveraging the strengths of the AraBERT model, we fine-tuned and evaluated its performance against the Bi-LSTM model across four distinct datasets: T-HSAB, TNHS, TUNIZI-Dataset, and a newly compiled dataset with diverse labels such as Offensive Language, Racism, and Religious Intolerance. Our experimental results demonstrate that AraBERT significantly outperforms Bi-LSTM in terms of Recall, Precision, F1-Score, and Accuracy across all datasets. The findings underline the robustness of AraBERT in capturing the nuanced features of Tunisian Arabic and its superior capability in classification tasks. This research not only advances the technology for hate speech detection but also provides practical implications for social media moderation and policy-making in Tunisia. Future work will focus on expanding the datasets and exploring more sophisticated architectures to further enhance detection accuracy, thus promoting safer online interactions.

Keywords: hate speech detection, Tunisian Arabic, AraBERT, Bi-LSTM, Gemini annotation tool, social media moderation

Procedia PDF Downloads 11
17027 Effective Survey Designing for Conducting Opinion Survey to Follow Participatory Approach in a Study of Transport Infrastructure Projects: A Case Study of the City of Kolkata

Authors: Jayanti De

Abstract:

Users of any urban road infrastructure may be classified into various categories. The current paper intends to see whether the opinions on different environmental and transportation criteria vary significantly among different types of road users or not. The paper addresses this issue by using a unique survey data that has been collected from Kolkata, a highly populated city in India. Multiple criteria should be taken into account while planning on infrastructure development programs. Given limited resources, a welfare maximizing government typically resorts to public opinion by designing surveys for prioritization of one project over another. Designing such surveys can be challenging and costly. Deciding upon whom to include in a survey and how to represent each group of consumers/road-users depend crucially on how opinion for different criteria vary across consumer groups. A unique dataset has been collected from various parts of Kolkata to statistically test (using Kolmogorov-Smirnov test) whether assigning of weights to rank the transportation criteria like congestion, air pollution, noise pollution, and morning/evening delay vary significantly across the various groups of users of such infrastructure. The different consumer/user groups in the dataset include pedestrian, private car owner, para-transit (taxi /auto rickshaw) user, public transport (bus) user and freight transporter among others. Very little evidence has been found that ranking of different criteria among these groups vary significantly. This also supports the hypothesis that road- users/consumers form their opinion by using their long-run rather than immediate experience. As a policy prescription, this implies that under-representation or over-representation of a specific consumer group in a survey may not necessarily distort the overall opinion, since opinions across different consumer groups are highly correlated as evident from this particular case study.

Keywords: multi criteria analysis, project-prioritization, road- users, survey designing

Procedia PDF Downloads 280
17026 Evaluating Generative Neural Attention Weights-Based Chatbot on Customer Support Twitter Dataset

Authors: Sinarwati Mohamad Suhaili, Naomie Salim, Mohamad Nazim Jambli

Abstract:

Sequence-to-sequence (seq2seq) models augmented with attention mechanisms are playing an increasingly important role in automated customer service. These models, which are able to recognize complex relationships between input and output sequences, are crucial for optimizing chatbot responses. Central to these mechanisms are neural attention weights that determine the focus of the model during sequence generation. Despite their widespread use, there remains a gap in the comparative analysis of different attention weighting functions within seq2seq models, particularly in the domain of chatbots using the Customer Support Twitter (CST) dataset. This study addresses this gap by evaluating four distinct attention-scoring functions—dot, multiplicative/general, additive, and an extended multiplicative function with a tanh activation parameter — in neural generative seq2seq models. Utilizing the CST dataset, these models were trained and evaluated over 10 epochs with the AdamW optimizer. Evaluation criteria included validation loss and BLEU scores implemented under both greedy and beam search strategies with a beam size of k=3. Results indicate that the model with the tanh-augmented multiplicative function significantly outperforms its counterparts, achieving the lowest validation loss (1.136484) and the highest BLEU scores (0.438926 under greedy search, 0.443000 under beam search, k=3). These results emphasize the crucial influence of selecting an appropriate attention-scoring function in improving the performance of seq2seq models for chatbots. Particularly, the model that integrates tanh activation proves to be a promising approach to improve the quality of chatbots in the customer support context.

Keywords: attention weight, chatbot, encoder-decoder, neural generative attention, score function, sequence-to-sequence

Procedia PDF Downloads 78
17025 Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings

Authors: Gaelle Candel, David Naccache

Abstract:

t-SNE is an embedding method that the data science community has widely used. It helps two main tasks: to display results by coloring items according to the item class or feature value; and for forensic, giving a first overview of the dataset distribution. Two interesting characteristics of t-SNE are the structure preservation property and the answer to the crowding problem, where all neighbors in high dimensional space cannot be represented correctly in low dimensional space. t-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the cluster area is proportional to its size in number, and relationships between clusters are materialized by closeness on the embedding. This algorithm is non-parametric. The transformation from a high to low dimensional space is described but not learned. Two initializations of the algorithm would lead to two different embeddings. In a forensic approach, analysts would like to compare two or more datasets using their embedding. A naive approach would be to embed all datasets together. However, this process is costly as the complexity of t-SNE is quadratic and would be infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built with a subset of data. While this approach is highly scalable, points could be mapped at the same exact position, making them indistinguishable. This type of model would be unable to adapt to new outliers nor concept drift. This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the second relative to the support embedding’ match. The embedding with the support process can be repeated more than once, with the newly obtained embedding. The successive embedding can be used to study the impact of one variable over the dataset distribution or monitor changes over time. This method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity would be reduced from O(n²) to O(n²=k), and the memory requirement from n² to 2(n=k)², which enables computation on recent laptops. The method showed promising results on a real-world dataset, allowing to observe the birth, evolution, and death of clusters. The proposed approach facilitates identifying significant trends and changes, which empowers the monitoring high dimensional datasets’ dynamics.

Keywords: concept drift, data visualization, dimension reduction, embedding, monitoring, reusability, t-SNE, unsupervised learning

Procedia PDF Downloads 143
17024 Deep Learning based Image Classifiers for Detection of CSSVD in Cacao Plants

Authors: Atuhurra Jesse, N'guessan Yves-Roland Douha, Pabitra Lenka

Abstract:

The detection of diseases within plants has attracted a lot of attention from computer vision enthusiasts. Despite the progress made to detect diseases in many plants, there remains a research gap to train image classifiers to detect the cacao swollen shoot virus disease or CSSVD for short, pertinent to cacao plants. This gap has mainly been due to the unavailability of high quality labeled training data. Moreover, institutions have been hesitant to share their data related to CSSVD. To fill these gaps, image classifiers to detect CSSVD-infected cacao plants are presented in this study. The classifiers are based on VGG16, ResNet50 and Vision Transformer (ViT). The image classifiers are evaluated on a recently released and publicly accessible KaraAgroAI Cocoa dataset. The best performing image classifier, based on ResNet50, achieves 95.39\% precision, 93.75\% recall, 94.34\% F1-score and 94\% accuracy on only 20 epochs. There is a +9.75\% improvement in recall when compared to previous works. These results indicate that the image classifiers learn to identify cacao plants infected with CSSVD.

Keywords: CSSVD, image classification, ResNet50, vision transformer, KaraAgroAI cocoa dataset

Procedia PDF Downloads 103
17023 Graph Based Traffic Analysis and Delay Prediction Using a Custom Built Dataset

Authors: Gabriele Borg, Alexei Debono, Charlie Abela

Abstract:

There on a constant rise in the availability of high volumes of data gathered from multiple sources, resulting in an abundance of unprocessed information that can be used to monitor patterns and trends in user behaviour. Similarly, year after year, Malta is also constantly experiencing ongoing population growth and an increase in mobilization demand. This research takes advantage of data which is continuously being sourced and converting it into useful information related to the traffic problem on the Maltese roads. The scope of this paper is to provide a methodology to create a custom dataset (MalTra - Malta Traffic) compiled from multiple participants from various locations across the island to identify the most common routes taken to expose the main areas of activity. This use of big data is seen being used in various technologies and is referred to as ITSs (Intelligent Transportation Systems), which has been concluded that there is significant potential in utilising such sources of data on a nationwide scale. Furthermore, a series of traffic prediction graph neural network models are conducted to compare MalTra to large-scale traffic datasets.

Keywords: graph neural networks, traffic management, big data, mobile data patterns

Procedia PDF Downloads 128
17022 Deep Learning for Qualitative and Quantitative Grain Quality Analysis Using Hyperspectral Imaging

Authors: Ole-Christian Galbo Engstrøm, Erik Schou Dreier, Birthe Møller Jespersen, Kim Steenstrup Pedersen

Abstract:

Grain quality analysis is a multi-parameterized problem that includes a variety of qualitative and quantitative parameters such as grain type classification, damage type classification, and nutrient regression. Currently, these parameters require human inspection, a multitude of instruments employing a variety of sensor technologies, and predictive model types or destructive and slow chemical analysis. This paper investigates the feasibility of applying near-infrared hyperspectral imaging (NIR-HSI) to grain quality analysis. For this study two datasets of NIR hyperspectral images in the wavelength range of 900 nm - 1700 nm have been used. Both datasets contain images of sparsely and densely packed grain kernels. The first dataset contains ~87,000 image crops of bulk wheat samples from 63 harvests where protein value has been determined by the FOSS Infratec NOVA which is the golden industry standard for protein content estimation in bulk samples of cereal grain. The second dataset consists of ~28,000 image crops of bulk grain kernels from seven different wheat varieties and a single rye variety. In the first dataset, protein regression analysis is the problem to solve while variety classification analysis is the problem to solve in the second dataset. Deep convolutional neural networks (CNNs) have the potential to utilize spatio-spectral correlations within a hyperspectral image to simultaneously estimate the qualitative and quantitative parameters. CNNs can autonomously derive meaningful representations of the input data reducing the need for advanced preprocessing techniques required for classical chemometric model types such as artificial neural networks (ANNs) and partial least-squares regression (PLS-R). A comparison between different CNN architectures utilizing 2D and 3D convolution is conducted. These results are compared to the performance of ANNs and PLS-R. Additionally, a variety of preprocessing techniques from image analysis and chemometrics are tested. These include centering, scaling, standard normal variate (SNV), Savitzky-Golay (SG) filtering, and detrending. The results indicate that the combination of NIR-HSI and CNNs has the potential to be the foundation for an automatic system unifying qualitative and quantitative grain quality analysis within a single sensor technology and predictive model type.

Keywords: deep learning, grain analysis, hyperspectral imaging, preprocessing techniques

Procedia PDF Downloads 99
17021 Development of Star Tracker for Satellite

Authors: S. Yelubayev, V. Ten, B. Albazarov, E. Sarsenbayev, К. Аlipbayev, A. Shamro, Т. Bopeyev, А. Sukhenko

Abstract:

Currently in Kazakhstan much attention is paid to the development of space branch. Successful launch of two Earth remote sensing satellite is carried out, projects on development of components for satellite are being carried out. In particular, the project on development of star tracker experimental model is completed. In the future it is planned to use this experimental model for development of star tracker prototype. Main stages of star tracker experimental model development are considered in this article.

Keywords: development, prototype, satellite, star tracker

Procedia PDF Downloads 477
17020 Early Recognition and Grading of Cataract Using a Combined Log Gabor/Discrete Wavelet Transform with ANN and SVM

Authors: Hadeer R. M. Tawfik, Rania A. K. Birry, Amani A. Saad

Abstract:

Eyes are considered to be the most sensitive and important organ for human being. Thus, any eye disorder will affect the patient in all aspects of life. Cataract is one of those eye disorders that lead to blindness if not treated correctly and quickly. This paper demonstrates a model for automatic detection, classification, and grading of cataracts based on image processing techniques and artificial intelligence. The proposed system is developed to ease the cataract diagnosis process for both ophthalmologists and patients. The wavelet transform combined with 2D Log Gabor Wavelet transform was used as feature extraction techniques for a dataset of 120 eye images followed by a classification process that classified the image set into three classes; normal, early, and advanced stage. A comparison between the two used classifiers, the support vector machine SVM and the artificial neural network ANN were done for the same dataset of 120 eye images. It was concluded that SVM gave better results than ANN. SVM success rate result was 96.8% accuracy where ANN success rate result was 92.3% accuracy.

Keywords: cataract, classification, detection, feature extraction, grading, log-gabor, neural networks, support vector machines, wavelet

Procedia PDF Downloads 332
17019 Developing a Machine Learning-based Cost Prediction Model for Construction Projects using Particle Swarm Optimization

Authors: Soheila Sadeghi

Abstract:

Accurate cost prediction is essential for effective project management and decision-making in the construction industry. This study aims to develop a cost prediction model for construction projects using Machine Learning techniques and Particle Swarm Optimization (PSO). The research utilizes a comprehensive dataset containing project cost estimates, actual costs, resource details, and project performance metrics from a road reconstruction project. The methodology involves data preprocessing, feature selection, and the development of an Artificial Neural Network (ANN) model optimized using PSO. The study investigates the impact of various input features, including cost estimates, resource allocation, and project progress, on the accuracy of cost predictions. The performance of the optimized ANN model is evaluated using metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared. The results demonstrate the effectiveness of the proposed approach in predicting project costs, outperforming traditional benchmark models. The feature selection process identifies the most influential variables contributing to cost variations, providing valuable insights for project managers. However, this study has several limitations. Firstly, the model's performance may be influenced by the quality and quantity of the dataset used. A larger and more diverse dataset covering different types of construction projects would enhance the model's generalizability. Secondly, the study focuses on a specific optimization technique (PSO) and a single Machine Learning algorithm (ANN). Exploring other optimization methods and comparing the performance of various ML algorithms could provide a more comprehensive understanding of the cost prediction problem. Future research should focus on several key areas. Firstly, expanding the dataset to include a wider range of construction projects, such as residential buildings, commercial complexes, and infrastructure projects, would improve the model's applicability. Secondly, investigating the integration of additional data sources, such as economic indicators, weather data, and supplier information, could enhance the predictive power of the model. Thirdly, exploring the potential of ensemble learning techniques, which combine multiple ML algorithms, may further improve cost prediction accuracy. Additionally, developing user-friendly interfaces and tools to facilitate the adoption of the proposed cost prediction model in real-world construction projects would be a valuable contribution to the industry. The findings of this study have significant implications for construction project management, enabling proactive cost estimation, resource allocation, budget planning, and risk assessment, ultimately leading to improved project performance and cost control. This research contributes to the advancement of cost prediction techniques in the construction industry and highlights the potential of Machine Learning and PSO in addressing this critical challenge. However, further research is needed to address the limitations and explore the identified future research directions to fully realize the potential of ML-based cost prediction models in the construction domain.

Keywords: cost prediction, construction projects, machine learning, artificial neural networks, particle swarm optimization, project management, feature selection, road reconstruction

Procedia PDF Downloads 58