Search results for: synthetic dataset
1851 Synthesis and Characterization of Hydroxyapatite from Biowaste for Potential Medical Application
Authors: M. D. H. Beg, John O. Akindoyo, Suriati Ghazali, Nitthiyah Jeyaratnam
Abstract:
Over time, several approaches have been undertaken to mitigate the challenges associated with bone regeneration. These include, but are not limited to, xenografts, allografts, and autografts, as well as artificial substitutes such as bioceramics, synthetic cements, and metals. The former three techniques often come with particular limitations and problems such as morbidity, limited availability, disease transmission, collateral site damage, or outright rejection by the body. Synthetic routes remain the only feasible alternative option for treatment of bone defects. Hydroxyapatite (HA) is highly compatible and suitable for this application. However, most of the common methods for HA synthesis are expensive, complicated, or environmentally unfriendly. Interestingly, extraction of HA from bio-wastes has been perceived to be not only cost-effective but also environmentally friendly. In this research, HA was synthesized from bio-waste, namely bovine bones, through three different methods: hydrothermal chemical processing, ultrasound-assisted synthesis, and ordinary calcination. Structure and property analysis of the HA was carried out through characterization techniques such as TGA, FTIR, and XRD. All the methods applied were able to produce HA with compositional properties similar to those of biomaterials found in human calcified tissues. The calcination process was, however, observed to be more efficient, as it eliminated all the organic components from the produced HA. The HA synthesized is notable for its minimal cost and environmental friendliness. It is also perceived to be suitable for tissue and bone engineering applications.
Keywords: hydroxyapatite, bone, calcination, biowaste
Procedia PDF Downloads 249
1850 Data Science-Based Key Factor Analysis and Risk Prediction of Diabetic
Authors: Fei Gao, Rodolfo C. Raga Jr.
Abstract:
This research proposal aims to ascertain the major risk factors for diabetes and to design a predictive model for risk assessment. The project aims to improve diabetes early detection and management by utilizing data science techniques, which may improve patient outcomes and healthcare efficiency. Using the Diabetes Health Indicators Dataset from Kaggle as the research data, the phase relation values of each attribute were used to analyze and choose the attributes that might influence the examiner's survival probability. We compare and evaluate eight machine learning algorithms. Our investigation begins with comprehensive data preprocessing, including feature engineering and dimensionality reduction, aimed at enhancing data quality. The dataset, comprising health indicators and medical data, serves as a foundation for training and testing these algorithms. A rigorous cross-validation process is applied, and we assess the algorithms' performance using five key metrics: accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). After analyzing the data characteristics, we investigate their impact on the likelihood of diabetes and develop corresponding risk indicators.
Keywords: diabetes, risk factors, predictive model, risk assessment, data science techniques, early detection, data analysis, Kaggle
Procedia PDF Downloads 75
1849 Leveraging Natural Language Processing for Legal Artificial Intelligence: A Longformer Approach for Taiwanese Legal Cases
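The five metrics named in this abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = diabetic, 0 = not) and made-up predictions; the function names and data are hypothetical and are not taken from the study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC-ROC as the probability that a random positive outranks a random
    negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions and predicted risk scores.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

In a cross-validated comparison, these values would typically be computed per fold and then averaged for each algorithm.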
Abstract:
Legal artificial intelligence (LegalAI) has seen increasing application within legal systems, propelled by advancements in natural language processing (NLP). Compared with general documents, legal case documents are typically long text sequences with intrinsic logical structures. Most existing language models have difficulty understanding the long-distance dependencies between different structures. Another unique challenge is that while the Judiciary of Taiwan has released legal judgments from various levels of courts over the years, there remains a significant obstacle in the lack of labeled datasets. This deficiency makes it difficult to train models with strong generalization capabilities, as well as to accurately evaluate model performance. To date, models in Taiwan have yet to be specifically trained on judgment data. Given these challenges, this research proposes a Longformer-based pre-trained language model explicitly devised for retrieving similar judgments in Taiwanese legal documents. This model is trained on a self-constructed dataset, which this research has independently labeled to measure judgment similarities, thereby addressing a void left by the lack of an existing labeled dataset for Taiwanese judgments. This research adopts strategies such as early stopping and gradient clipping to prevent overfitting and manage gradient explosion, respectively, thereby enhancing the model's performance. The model in this research is evaluated using both the dataset and the Average Entropy of Offense-charged Clustering (AEOC) metric, which utilizes the notion of similar case scenarios within the same type of legal cases. Our experimental results illustrate our model's significant advancements in handling similarity comparisons within extensive legal judgments.
By enabling more efficient retrieval and analysis of legal case documents, our model holds the potential to facilitate legal research, aid legal decision-making, and contribute to the further development of LegalAI in Taiwan.
Keywords: legal artificial intelligence, computation and language, language model, Taiwanese legal cases
Procedia PDF Downloads 72
1848 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification
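The two training safeguards mentioned in this abstract, early stopping and gradient clipping, can be sketched in a framework-free way. This is only an illustration under assumed settings (clipping by global L2 norm, a patience counter on validation loss); the abstract does not state which variants or thresholds the authors used, and all names below are hypothetical.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradients so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)  # already within bounds (includes the all-zero case)
    scale = max_norm / norm
    return [g * scale for g in grads]


def early_stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training stops: the first epoch where
    validation loss has failed to improve for `patience` consecutive epochs,
    or the last epoch if that never happens."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical gradient vector and validation-loss curve.
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
stop_at = early_stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.7], patience=2)
```

Clipping bounds the size of each update when gradients explode, while the patience counter halts training once validation loss stops improving, which limits overfitting.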
Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh
Abstract:
Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets representing genome sequences. In this paper, a hybrid method for the classification of miRNA data is proposed. Due to the variety of cancers and the high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features relative to the number of samples is high, and the data are imbalanced. A feature selection method is used to select the features best able to distinguish classes and to eliminate obscure features. Afterward, a Convolutional Neural Network (CNN) classifier is used to classify cancer types, with a Genetic Algorithm employed to find optimized hyper-parameters for the CNN. To make classification by the CNN faster, a Graphics Processing Unit (GPU) is recommended for performing the mathematical computations in parallel. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.
Keywords: cancer classification, feature selection, deep learning, genetic algorithm
Procedia PDF Downloads 111
1847 Evaluation of Arsenic Removal in Soils Contaminated by the Phytoremediation Technique
Authors: V. Ibujes, A. Guevara, P. Barreto
Abstract:
The concentration of arsenic represents a serious threat to human health. It is a bioaccumulative toxic element and is transferred through the food chain. In Ecuador, values of 0.0423 mg/kg As are registered in potatoes grown on the slopes of the Tungurahua volcano. The increase of arsenic contamination in Ecuador is mainly due to mining activity, since the process of gold extraction generates toxic tailings with mercury. In the province of Azuay, due to the mining activity, the soil reaches concentrations of 2,500 to 6,420 mg/kg As, whereas in the province of Tungurahua arsenic concentrations of 6.9 to 198.7 mg/kg can be found due to volcanic eruptions. Given this arsenic contamination, the present investigation is directed at the remediation of the soils in the provinces of Azuay and Tungurahua by the phytoremediation technique and at the definition of an extraction methodology by means of analysis of arsenic in the soil-plant system. The methodology consists of selecting the two types of plants that have the best arsenic removal capacity in synthetic solutions of 60 μM As, a lower percentage of mortality, and resistance to hydroponics. The arsenic concentrations in each plant were obtained by taking 10 ml aliquots and subsequently analyzing them with ICP-OES (inductively coupled plasma-optical emission spectrometry) equipment. Soils were contaminated with synthetic solutions of arsenic using the capillarity method to achieve arsenic concentrations of 13 and 15 mg/kg. Subsequently, the two types of plants were evaluated for their ability to reduce the concentration of arsenic in soils over 7 weeks. The global variance for soil types was obtained with the InfoStat program. To measure the changes in arsenic concentration in the soil-plant system, the Rhizo and Wenzel arsenic extraction methodologies were used, and the extracts were subsequently analyzed with the ICP-OES (Optima 8000, PerkinElmer).
As a result, the selected plants were bluegrass and llanten, due to their high arsenic removal percentages of 55% and 67% and low mortality rates of 9% and 8%, respectively. In conclusion, Azuay soil with an initial concentration of 13 mg/kg As reached concentrations of 11.49 and 11.04 mg/kg As for bluegrass and llanten, respectively, and with an initial concentration of 15 mg/kg As it reached 11.79 and 11.10 mg/kg As for bluegrass and llanten after 7 weeks. Tungurahua soil with an initial concentration of 13 mg/kg As reached concentrations of 11.56 and 12.16 mg/kg As for bluegrass and llanten, respectively, and with an initial concentration of 15 mg/kg As it reached 11.97 and 12.27 mg/kg As for bluegrass and llanten after 7 weeks. The best arsenic extraction methodology for the soil-plant system is Wenzel.
Keywords: blue grass, llanten, phytoremediation, soil of Azuay, soil of Tungurahua, synthetic arsenic solution
Procedia PDF Downloads 103
1846 Disaggregation the Daily Rainfall Dataset into Sub-Daily Resolution in the Temperate Oceanic Climate Region
Authors: Mohammad Bakhshi, Firas Al Janabi
Abstract:
High-resolution rainfall data are very important as input for hydrological models. Among the methods for generating high-resolution rainfall data, temporal disaggregation was chosen for this study. The paper attempts to generate three different rainfall resolutions (4-hourly, hourly and 10-minute) from daily data for a record period of around 20 years. The process was carried out with the DiMoN tool, which is based on a random cascade model and the method of fragments. Differences between the observed and simulated rainfall datasets are evaluated with a variety of statistical and empirical methods: the Kolmogorov-Smirnov (K-S) test, usual statistics, and exceedance probability. The tool worked well at preserving the daily rainfall values on wet days; however, the generated rainfall is concentrated in shorter time periods, producing stronger storms. It is demonstrated that the difference between the generated and observed cumulative distribution function curves of the 4-hourly datasets passes the K-S test criteria, while for the hourly and 10-minute datasets the p-value should be employed to show that their differences are reasonable. The results are encouraging considering the overestimation of the generated high-resolution rainfall data.
Keywords: DiMoN Tool, disaggregation, exceedance probability, Kolmogorov-Smirnov test, rainfall
Procedia PDF Downloads 201
1845 ECG Based Reliable User Identification Using Deep Learning
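The K-S comparison described in this abstract rests on the largest gap between two empirical cumulative distribution functions. The sketch below computes that two-sample statistic for made-up rainfall values; it is not the DiMoN tool's code, and the sample data and function name are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    i = j = 0
    # Evaluate both empirical CDFs at every distinct observed value.
    for v in sorted(set(a) | set(b)):
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical 4-hourly rainfall depths (mm): observed vs. generated.
observed = [0, 0, 1, 2]
generated = [0, 1, 1, 3]
d_stat = ks_statistic(observed, generated)
```

The statistic is then compared with a critical value, or converted to a p-value, to decide whether the generated and observed distributions differ significantly.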
Authors: R. N. Begum, Ambalika Sharma, G. K. Singh
Abstract:
Identity theft has serious ramifications beyond data and personal information loss. This necessitates the implementation of robust and efficient user identification systems. Automatic biometric recognition systems are therefore the need of the hour, and ECG-based systems are an attractive choice due to their appealing inherent characteristics. CNNs are the current state-of-the-art technique for ECG-based user identification systems. However, the results obtained are significantly below standards, and the situation worsens as the number of users and types of heartbeats in the dataset grows. As a result, this study proposes a highly accurate and resilient ECG-based person identification system using the dense learning framework of CNNs. The proposed research explicitly explores the capability of dense CNNs in the field of ECG-based human recognition. The study tests four different configurations of dense CNN, which are trained on a dataset of recordings collected from eight popular ECG databases. With the highest FAR of 0.04 percent and the highest FRR of 5 percent, the best performing network achieved an identification accuracy of 99.94 percent. The best network is also tested with various train/test split ratios. The findings show that DenseNets are not only extremely reliable but also highly efficient. Thus, they might also be implemented in real-time ECG-based human recognition systems.
Keywords: Biometrics, Dense Networks, Identification Rate, Train/Test split ratio
Procedia PDF Downloads 160
1844 Polyphosphate Kinase 1 Active Site Characterization for the Identification of Novel Antimicrobial Targets
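The FAR and FRR figures quoted in this abstract are error rates of a biometric decision rule. As a rough sketch, assuming a system that accepts a claim when a match score reaches a threshold, they can be computed as follows; the scores, threshold and function name are hypothetical and do not come from the study.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """False acceptance rate and false rejection rate at a given threshold.

    A claim is accepted when its match score is >= threshold.
    FAR = accepted impostor attempts / all impostor attempts.
    FRR = rejected genuine attempts / all genuine attempts.
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


# Hypothetical match scores for genuine users and impostors.
genuine = [0.9, 0.8, 0.4, 0.95]
impostor = [0.1, 0.3, 0.6, 0.2, 0.05]
far, frr = far_frr(genuine, impostor, threshold=0.5)
```

Raising the threshold lowers FAR at the cost of a higher FRR, so the two rates are usually reported together.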
Authors: Sanaa Bardaweel
Abstract:
Inorganic polyphosphate (poly P) is present in all living forms tested to date, from each of the three kingdoms of life. Studied mainly in prokaryotes, poly P and its associated enzymes are vital in diverse basic metabolism, in at least some structural functions and, notably, in stress responses. These plentiful and unrelated roles for poly P are probably the consequence of its presence in life-forms early in evolution. The genomes of many bacterial species, including pathogens, encode a homologue of a major poly P synthetic enzyme, poly P kinase 1 (PPK1). Genetic deletion of ppk1 results in reduced poly P levels and loss of pathogen virulence towards protozoa and animals. Thus far, no PPK1 homologue has been identified in higher-order eukaryotes and, therefore, PPK1 represents a novel target for chemotherapy. The aim of the current study is to purify PPK1 from Escherichia coli to homogeneity in order to study the effect of active site point mutations on PPK1 catalysis via site-directed mutagenesis. The knowledge obtained about the active site of PPK1 will be utilized to characterize the catalytic and kinetic mechanism of PPK1 with model substrates. A comprehensive understanding of the enzyme's kinetic mechanism and catalysis will be used to design and screen a library of synthetic compounds for the potential discovery of selective PPK1 inhibitors.
Keywords: antimicrobial, Escherichia coli, inorganic polyphosphate, PPK1-inhibitors
Procedia PDF Downloads 279
1843 Integrated Risk Assessment of Storm Surge and Climate Change for the Coastal Infrastructure
Authors: Sergey V. Vinogradov
Abstract:
Coastal communities are presently facing increased vulnerabilities due to rising sea levels and shifts in global climate patterns, a trend expected to escalate in the long run. To address the needs of government entities, the public sector, and private enterprises, there is an urgent need to thoroughly investigate, assess, and manage the present and projected risks associated with coastal flooding, including storm surges, sea level rise, and nuisance flooding. In response to these challenges, a practical approach to evaluating storm surge inundation risks has been developed. This methodology offers an integrated assessment of potential flood risk in targeted coastal areas. The physical modeling framework involves simulating synthetic storms and utilizing hydrodynamic models that align with projected future climate and ocean conditions. Both publicly available and site-specific data form the basis for a risk assessment methodology designed to translate inundation model outputs into statistically significant projections of expected financial and operational consequences. This integrated approach produces measurable indicators of impacts stemming from floods, encompassing economic and other dimensions. By establishing connections between the frequency of modeled flood events and their consequences across a spectrum of potential future climate conditions, our methodology generates probabilistic risk assessments. These assessments not only account for future uncertainty but also yield comparable metrics, such as expected annual losses for each inundation event. These metrics furnish stakeholders with a dependable dataset to guide strategic planning and inform investments in mitigation. Importantly, the model's adaptability ensures its relevance across diverse coastal environments, even in instances where site-specific data for analysis may be limited.
Keywords: climate, coastal, surge, risk
Procedia PDF Downloads 56
1842 Quantitative Analysis of Contract Variations Impact on Infrastructure Project Performance
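The expected annual loss metric mentioned in this abstract combines event frequency with consequence. A minimal sketch, assuming each modeled flood event is summarized by an annual occurrence rate and a loss figure (both invented here), is a frequency-weighted sum; the abstract does not specify the authors' actual formulation.

```python
def expected_annual_loss(events):
    """Sum of (annual occurrence rate * loss) over modeled flood events.

    `events` is a list of (annual_rate, loss) pairs; both values are
    hypothetical inputs for illustration.
    """
    return sum(rate * loss for rate, loss in events)


# Hypothetical events: (annual occurrence rate, loss in currency units).
flood_events = [
    (0.1, 1_000_000),      # frequent, minor inundation
    (0.01, 20_000_000),    # rarer, larger surge
    (0.002, 100_000_000),  # extreme event
]
eal = expected_annual_loss(flood_events)
```

Repeating the calculation for each projected climate scenario gives directly comparable figures across scenarios.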
Authors: Soheila Sadeghi
Abstract:
Infrastructure projects often encounter contract variations that can significantly deviate from the original tender estimates, leading to cost overruns, schedule delays, and financial implications. This research aims to quantitatively assess the impact of changes in contract variations on project performance by conducting an in-depth analysis of a comprehensive dataset from the Regional Airport Car Park project. The dataset includes tender budget, contract quantities, rates, claims, and revenue data, providing a unique opportunity to investigate the effects of variations on project outcomes. The study focuses on 21 specific variations identified in the dataset, which represent changes or additions to the project scope. The research methodology involves establishing a baseline for the project's planned cost and scope by examining the tender budget and contract quantities. Each variation is then analyzed in detail, comparing the actual quantities and rates against the tender estimates to determine their impact on project cost and schedule. The claims data is utilized to track the progress of work and identify deviations from the planned schedule. The study employs statistical analysis using R to examine the dataset, including tender budget, contract quantities, rates, claims, and revenue data. Time series analysis is applied to the claims data to track progress and detect variations from the planned schedule. Regression analysis is utilized to investigate the relationship between variations and project performance indicators, such as cost overruns and schedule delays. The research findings highlight the significance of effective variation management in construction projects. The analysis reveals that variations can have a substantial impact on project cost, schedule, and financial outcomes. 
The study identifies specific variations that had the most significant influence on the Regional Airport Car Park project's performance, such as PV03 (additional fill, road base gravel, spray seal, and asphalt), PV06 (extension to the commercial car park), and PV07 (additional box out and general fill). These variations contributed to increased costs, schedule delays, and changes in the project's revenue profile. The study also examines the effectiveness of project management practices in managing variations and mitigating their impact. The research suggests that proactive risk management, thorough scope definition, and effective communication among project stakeholders can help minimize the negative consequences of variations. The findings emphasize the importance of establishing clear procedures for identifying, assessing, and managing variations throughout the project lifecycle. The outcomes of this research contribute to the body of knowledge in construction project management by demonstrating the value of analyzing tender, contract, claims, and revenue data in variation impact assessment. However, the research acknowledges the limitations imposed by the dataset, particularly the absence of detailed contract and tender documents. This constraint restricts the depth of analysis possible in investigating the root causes and full extent of variations' impact on the project. Future research could build upon this study by incorporating more comprehensive data sources to further explore the dynamics of variations in construction projects.
Keywords: contract variation impact, quantitative analysis, project performance, claims analysis
Procedia PDF Downloads 40
1841 Mental Health Diagnosis through Machine Learning Approaches
Authors: Md Rafiqul Islam, Ashir Ahmed, Anwaar Ulhaq, Abu Raihan M. Kamal, Yuan Miao, Hua Wang
Abstract:
People's mental health is as important as their physical health. Mental health and well-being are influenced not only by individual attributes but also by the social circumstances in which people find themselves and the environment in which they live. As with physical health, there are a number of internal and external factors, such as biological, social and occupational factors, that could influence people's mental health. People living in poverty, people suffering from chronic health conditions, minority groups, and those exposed to or displaced by war or conflict are generally more likely to develop mental health conditions. However, to the best of the authors' knowledge, there is a dearth of knowledge on the impact of the workplace (especially the highly stressed IT/tech workplace) on the mental health of its workers. This study attempts to examine the factors influencing the mental health of tech workers. A publicly available dataset containing more than 65,000 cells and 100 attributes is examined for this purpose. A number of machine learning techniques, such as ‘Decision Tree’, ‘K nearest neighbor’, ‘Support Vector Machine’ and ‘Ensemble’, are then applied to the selected dataset to draw the findings. It is anticipated that the analysis reported in this study will contribute useful insights into the attributes affecting the mental health of tech workers using relevant machine learning techniques.
Keywords: mental disorder, diagnosis, occupational stress, IT workplace
Procedia PDF Downloads 288
1840 Brain Tumor Detection and Classification Using Pre-Trained Deep Learning Models
Authors: Aditya Karade, Sharada Falane, Dhananjay Deshmukh, Vijaykumar Mantri
Abstract:
Brain tumours pose a significant challenge in healthcare due to their complex nature and impact on patient outcomes. The application of deep learning (DL) algorithms in medical imaging has shown promise in accurate and efficient brain tumour detection. This paper explores the performance of various pre-trained DL models (ResNet50, Xception, InceptionV3, EfficientNetB0, DenseNet121, NASNetMobile, VGG19, VGG16, and MobileNet) on a brain tumour dataset sourced from Figshare. The dataset consists of MRI scans categorized into different types of brain tumour, including meningioma, pituitary, and glioma, as well as no tumour. The study involves a comprehensive evaluation of these models’ accuracy and effectiveness in classifying brain tumour images. Data preprocessing, augmentation, and fine-tuning techniques are employed to optimize model performance. Among the evaluated deep learning models for brain tumour detection, ResNet50 emerges as the top performer with an accuracy of 98.86%. Following closely is Xception, exhibiting a strong accuracy of 97.33%. These models showcase robust capabilities in accurately classifying brain tumour images. On the other end of the spectrum, VGG16 trails with the lowest accuracy at 89.02%.
Keywords: brain tumour, MRI image, detecting and classifying tumour, pre-trained models, transfer learning, image segmentation, data augmentation
Procedia PDF Downloads 74
1839 Analysis of the Lung Microbiome in Cystic Fibrosis Patients Using 16S Sequencing
Authors: Manasvi Pinnaka, Brianna Chrisman
Abstract:
Cystic fibrosis patients often develop lung infections that range anywhere in severity from mild to life-threatening due to the presence of thick and sticky mucus that fills their airways. Since many of these infections are chronic, they not only affect a patient’s ability to breathe but also increase the chances of mortality by respiratory failure. With a publicly available dataset of DNA sequences from bacterial species in the lung microbiome of cystic fibrosis patients, the correlations between different microbial species in the lung and the extent of deterioration of lung function were investigated. 16S sequencing technologies were used to determine the microbiome composition of the samples in the dataset. For the statistical analyses, referencing helped distinguish between taxonomies, and the proportions of certain taxa relative to another were determined. It was found that the Fusobacterium, Actinomyces, and Leptotrichia microbial types all had a positive correlation with the FEV1 score, indicating the potential displacement of these species by pathogens as the disease progresses. However, the dominant pathogens themselves, including Pseudomonas aeruginosa and Staphylococcus aureus, did not have statistically significant negative correlations with the FEV1 score as described by past literature. Examining the lung microbiology of cystic fibrosis patients can help with the prediction of the current condition of lung function, with the potential to guide doctors when designing personalized treatment plans for patients.
Keywords: bacterial infections, cystic fibrosis, lung microbiome, 16S sequencing
Procedia PDF Downloads 99
1838 Evaluation and Compression of Different Language Transformer Models for Semantic Textual Similarity Binary Task Using Minority Language Resources
Authors: Ma. Gracia Corazon Cayanan, Kai Yuen Cheong, Li Sha
Abstract:
Training a language model for a minority language has been a challenging task. The lack of available corpora to train and fine-tune state-of-the-art language models is still a challenge in the area of Natural Language Processing (NLP). Moreover, the need for high computational resources and bulk data limits the attainment of this task. In this paper, we present the following contributions: (1) we introduce and use a translation pair set of Tagalog and English (TL-EN) for pre-training a language model on a minority language resource; (2) we fine-tune and evaluate top-ranking, pre-trained semantic textual similarity binary task (STSB) models on both TL-EN and STS dataset pairs; and (3) we reduce the size of the model to offset the need for high computational resources. Based on our results, the models that were pre-trained on translation pairs and STS pairs can perform well on the STSB task. Also, reducing the model to a smaller dimension has no negative effect on performance but rather yields a notable increase in the similarity scores. Moreover, pre-training on a similar dataset has a tremendous effect on the model’s performance scores.
Keywords: semantic matching, semantic textual similarity binary task, low resource minority language, fine-tuning, dimension reduction, transformer models
Procedia PDF Downloads 211
1837 Use of Gaussian-Euclidean Hybrid Function Based Artificial Immune System for Breast Cancer Diagnosis
Authors: Cuneyt Yucelbas, Seral Ozsen, Sule Yucelbas, Gulay Tezel
Abstract:
Because only a small number of complex artificial immune system (AIS) approaches can solve nonlinear problems, nonlinear AIS approaches, among the well-known solution techniques, need to be developed. The Gaussian function is usually used for similarity estimation in classification problems and pattern recognition. In this study, diagnosis of breast cancer, the second most widespread cancer in women, was performed with different distance calculation functions, namely Euclidean, Gaussian and Gaussian-Euclidean hybrid functions, in the clonal selection model of classical AIS on the Wisconsin Breast Cancer Dataset (WBCD), which was taken from the University of California, Irvine Machine-Learning Repository. We used the 3-fold cross-validation method to train and test on the dataset. According to the results, the maximum test classification accuracy was 97.35%, obtained using the Gaussian-Euclidean hybrid function for fold 3. The mean test classification accuracies were 94.78%, 94.45% and 95.31% for the Euclidean, Gaussian and Gaussian-Euclidean functions, respectively. With these results, the Gaussian-Euclidean hybrid function appears to be a promising distance calculation method, and it may be considered as an alternative for hard nonlinear classification problems.
Keywords: artificial immune system, breast cancer diagnosis, Euclidean function, Gaussian function
Procedia PDF Downloads 435
1836 Efficient Ni(II)-Containing Layered Triple Hydroxide-Based Catalysts: Synthesis, Characterisation and Their Role in the Heck Reaction
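For orientation, the two basic functions named in this abstract can be sketched as follows. The exact form of the Gaussian-Euclidean hybrid is not given in the abstract, so it is deliberately not reproduced here; the Gaussian form, the `sigma` parameter and the function names are assumptions.

```python
import math


def euclidean_distance(x, y):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def gaussian_similarity(x, y, sigma=1.0):
    """Gaussian similarity in (0, 1]; equals 1 for identical vectors.

    `sigma` (the kernel width) is an assumed illustrative parameter.
    """
    d = euclidean_distance(x, y)
    return math.exp(-(d ** 2) / (2 * sigma ** 2))


# Hypothetical two-feature samples.
dist = euclidean_distance([0.0, 0.0], [3.0, 4.0])
sim = gaussian_similarity([0.0], [1.0], sigma=1.0)
```

In a clonal selection model, such a function scores how closely an antibody (candidate prototype) matches an antigen (training sample).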
Authors: Gabor Varga, Krisztina Karadi, Zoltan Konya, Akos Kukovecz, Pal Sipos, Istvan Palinko
Abstract:
Nickel can efficiently replace palladium in the Heck, Suzuki and Negishi reactions. This study focuses on the synthesis and catalytic application of Ni(II)-containing layered double hydroxides (LDHs) and layered triple hydroxides (LTHs). Our goals were to incorporate Ni(II) ions between the layers of LDHs or LTHs, to bind them to the surface, or to build them into the layers in such a way that their catalytic activities are maintained or even increased. The LDHs and LTHs were prepared by the co-precipitation method using ethylene glycol as co-solvent. In several cases, post-synthetic modifications (e.g., thermal treatment) were performed. After optimizing the synthesis conditions, the composites displayed good crystallinity and were free of byproducts. The success of the syntheses and the post-synthetic modifications was confirmed by relevant characterization methods (XRD, SEM, SEM-EDX and combined IR techniques). Catalytic activities of the produced and well-characterized solids were investigated through the Heck reaction. The composites behaved as efficient, recyclable catalysts in the Heck reaction between 4-bromoanisole and styrene. Through varying the reaction parameters, we were able to obtain acceptable conversions under mild conditions. Our study highlights the possibility of applying Ni(II)-containing composites as efficient catalysts in coupling reactions.
Keywords: layered double hydroxide, layered triple hydroxide, heterogeneous catalysis, Heck reaction
Procedia PDF Downloads 174
1835 JaCoText: A Pretrained Model for Java Code-Text Generation
Authors: Jessica Lopez Espejel, Mahaman Sanoussi Yahaya Alassan, Walid Dahhane, El Hassane Ettifouri
Abstract:
Pretrained transformer-based models have shown high performance in natural language generation tasks. However, a new wave of interest has surged: automatic programming language code generation. This task consists of translating natural language instructions to a source code. Despite the fact that well-known pre-trained models on language generation have achieved good performance in learning programming languages, effort is still needed in automatic code generation. In this paper, we introduce JaCoText, a model based on Transformer neural network. It aims to generate java source code from natural language text. JaCoText leverages the advantages of both natural language and code generation models. More specifically, we study some findings from state of the art and use them to (1) initialize our model from powerful pre-trained models, (2) explore additional pretraining on our java dataset, (3) lead experiments combining the unimodal and bimodal data in training, and (4) scale the input and output length during the fine-tuning of the model. Conducted experiments on CONCODE dataset show that JaCoText achieves new state-of-the-art results.Keywords: java code generation, natural language processing, sequence-to-sequence models, transformer neural networks
Procedia PDF Downloads 284
1834 Brief Inquisition of Photocatalytic Degradation of Azo Dyes by Magnetically Enhanced Zinc Oxide Nanoparticles
Authors: Thian Khoon Tan, Poi Sim Khiew, Wee Siong Chiu, Chin Hua Chia
Abstract:
This study investigates the efficacy of magnetically enhanced zinc oxide (MZnO) nanoparticles as a photocatalyst in the photodegradation of synthetic dyes, especially azo dyes. The magnetised zinc oxide was fabricated simply, by mechanical mixing followed by low-temperature calcination. The MZnO was analysed through several analytical measurements, including FESEM, XRD, BET, EDX, and TEM, as well as VSM analysis, which reflect successful fabrication. A high volume of azo dyes is found in industrial effluent wastewater. They contribute to serious environmental instability and are very harmful to human health due to their high stability and carcinogenic properties. Therefore, five azo dyes, Reactive Red 120 (RR120), Disperse Blue 15 (DB15), Acid Brown 14 (AB14), Orange G (OG), and Acid Orange 7 (AO7), were randomly selected to study their photodegradation with reference to a few characteristics, such as the number of azo functional groups, the number of benzene groups, molecular mass, and absorbance. The photocatalytic degradation efficiency was analysed using a UV-vis spectrophotometer, from which the reaction rate constant was obtained. It was found that the azo dyes were significantly degraded following first-order kinetics, with a higher kinetic constant as the number of azo functional groups and benzene groups increases. However, the kinetic constant is inversely proportional to the molecular weight of these azo dyes.
Keywords: nanoparticles, photocatalyst, magnetically enhanced, wastewater, synthetic dyes, azo dyes
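The first-order rate constant mentioned in the abstract can be illustrated with a minimal sketch. It assumes that absorbance is proportional to dye concentration, so that ln(A0/At) = k·t, and it fits k by least squares through the origin. The function name and the data are hypothetical and are not taken from the study.

```python
import math

def first_order_rate_constant(times, absorbances):
    """Fit k in ln(A0 / A_t) = k * t by least squares through the origin.

    Assumes absorbance is proportional to concentration (an assumption
    of this sketch, not a statement from the abstract).
    """
    a0 = absorbances[0]
    ys = [math.log(a0 / a) for a in absorbances]
    numerator = sum(t * y for t, y in zip(times, ys))
    denominator = sum(t * t for t in times)
    return numerator / denominator

# Hypothetical readings generated from an exact first-order decay.
times = [0, 10, 20, 30, 40]                        # minutes
absorbances = [1.2 * math.exp(-0.05 * t) for t in times]
k = first_order_rate_constant(times, absorbances)  # per minute
```

A larger k means faster degradation, which is the quantity the abstract compares across dyes.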
Procedia PDF Downloads 11
1833 The Reproducibility and Repeatability of Modified Likelihood Ratio for Forensics Handwriting Examination
Authors: O. Abiodun Adeyinka, B. Adeyemo Adesesan
Abstract:
The forensic use of handwriting depends on the analysis, comparison, and evaluation decisions made by forensic document examiners. When using biometric technology in forensic applications, it is necessary to compute the Likelihood Ratio (LR) to quantify the strength of evidence under two competing hypotheses, namely the prosecution and the defense hypotheses, wherein a set of assumptions and methods is adopted for a given data set. It is therefore important to know how repeatable and reproducible the estimated LR is. This paper evaluated the accuracy and reproducibility of examiners' decisions. Confidence intervals for the estimated LR were presented so as not to obtain an incorrect estimate that would be used to deliver a wrong judgment in a court of law. The estimate of LR is fundamentally a Bayesian concept, and we used two LR estimators for this paper, namely Logistic Regression (LoR) and the Kernel Density Estimator (KDE). The repeatability evaluation was carried out by repeating the initial experiment after an interval of six months to observe whether examiners would repeat their decisions for the estimated LR. The experimental results, which are based on a handwriting dataset, show that LR has different confidence intervals, which implies that LR cannot be estimated with the same certainty everywhere. Though the LoR performed better than the KDE when tested on the same dataset, the two LR estimators investigated showed a consistent region in which the LR value can be estimated confidently. These two findings advance our understanding of LR when used to compute the strength of handwriting evidence in forensics.
Keywords: confidence interval, handwriting, kernel density estimator, KDE, logistic regression LoR, repeatability, reproducibility
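As a rough illustration of a KDE-based likelihood ratio, the sketch below estimates the density of a comparison score under each hypothesis with a Gaussian kernel and divides the two. The score values, the bandwidth, and the function names are assumptions made for this example; the abstract does not specify how the authors configured their estimator.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from samples with a Gaussian kernel."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def likelihood_ratio(score, same_writer_scores, different_writer_scores, bandwidth=0.1):
    """LR = p(score | same writer) / p(score | different writers)."""
    p_same = gaussian_kde(same_writer_scores, bandwidth)(score)
    p_diff = gaussian_kde(different_writer_scores, bandwidth)(score)
    return p_same / p_diff

# Hypothetical similarity scores in [0, 1].
same_scores = [0.80, 0.85, 0.90]
diff_scores = [0.20, 0.25, 0.30]
lr_high = likelihood_ratio(0.85, same_scores, diff_scores)
lr_low = likelihood_ratio(0.25, same_scores, diff_scores)
```

An LR above 1 supports the same-writer hypothesis and an LR below 1 supports the different-writer hypothesis. In regions where few training scores fall, both densities are poorly estimated, which is one way to read the abstract's remark that LR cannot be estimated with the same certainty everywhere.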
Procedia PDF Downloads 124
1832 Content-Aware Image Augmentation for Medical Imaging Applications
Authors: Filip Rusak, Yulia Arzhaeva, Dadong Wang
Abstract:
Machine learning based Computer-Aided Diagnosis (CAD) is gaining much popularity in medical imaging and diagnostic radiology. However, it requires a large amount of high-quality, labeled training image data. The training images may come from different sources and be acquired from different radiography machines produced by different manufacturers, as digital or digitized copies of film radiographs, with various sizes as well as different pixel intensity distributions. In this paper, a content-aware image augmentation method is presented to deal with these variations. The results of the proposed method have been validated graphically by plotting the removed and added seams of pixels on the original images. Two different chest X-ray (CXR) datasets are used in the experiments. The CXRs in the datasets differ in size; some are digital CXRs, while the others are digitized from analog CXR films. With the proposed content-aware augmentation method, the Seam Carving algorithm is employed to resize CXRs and the corresponding labels in the form of image masks, followed by histogram matching used to normalize the pixel intensities of digital radiographs based on the pixel intensity values of digitized radiographs. We implemented the algorithms, resized the well-known Montgomery dataset to the size of the most frequently used Japanese Society of Radiological Technology (JSRT) dataset, and normalized our digital CXRs for testing. This work resulted in a unified off-the-shelf CXR dataset composed of radiographs included in both the Montgomery and JSRT datasets. The experimental results show that even though the amount of augmentation is large, our algorithm can adequately preserve the important information in lung fields, local structures, and global visual effect.
The proposed method can be used to augment training and testing image data sets so that the trained machine learning model can be used to process CXRs from various sources, and it can potentially be used broadly in any medical imaging application.
Keywords: computer-aided diagnosis, image augmentation, lung segmentation, medical imaging, seam carving
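The histogram matching step described in the abstract can be sketched as follows. This is a minimal, generic version of the technique on flat lists of integer grey levels, not the authors' implementation; the function names and the toy pixel values are assumptions.

```python
from bisect import bisect_left

def cumulative_distribution(pixels, levels=256):
    """Cumulative distribution of grey levels 0..levels-1 in a flat pixel list."""
    histogram = [0] * levels
    for p in pixels:
        histogram[p] += 1
    cdf, running = [], 0
    for count in histogram:
        running += count
        cdf.append(running / len(pixels))
    return cdf

def match_histogram(source, reference, levels=256):
    """Remap source grey levels so their distribution follows the reference."""
    src_cdf = cumulative_distribution(source, levels)
    ref_cdf = cumulative_distribution(reference, levels)
    # For each source level, pick the first reference level whose CDF
    # reaches the same cumulative fraction.
    mapping = [min(bisect_left(ref_cdf, src_cdf[v]), levels - 1) for v in range(levels)]
    return [mapping[p] for p in source]

# Toy example: a dark image is remapped onto the reference intensities.
matched = match_histogram([0, 0, 1, 1], [10, 10, 20, 20])
```

In the setting of the abstract, `source` would hold the pixels of a digital radiograph and `reference` those of a digitized one, so that both end up with comparable intensity distributions.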
Procedia PDF Downloads 222
1831 Nano-Zinc Oxide: A Powerful and Recyclable Catalyst for Chemospecific Synthesis of Dicoumarols Based on Aryl Glyoxals
Authors: F. Jafari, S. GharehzadehShirazi, S. Khodabakhshi
Abstract:
An efficient, simple, and environmentally benign procedure for the one-pot synthesis of dicoumarols is reported. The reaction entails the condensation of aryl glyoxals and 4-hydroxycoumarin in the presence of a catalytic amount of zinc oxide nanoparticles (ZnO NPs) as a recyclable catalyst in aqueous media. High product yields and the use of clean conditions are important factors in green chemistry. This work is part of our continued interest in achieving highly atom-economic reactions through the use of safe catalysts. The reaction mixture was refluxed with a catalytic amount (3 mol%) of zinc oxide nanoparticles. Reducing the amount of toxic waste and byproducts arising from chemical reactions is an important issue in the context of green chemistry. In comparison with common organic solvents, aqueous media are cheaper and more environmentally friendly. Avoiding the use of organic solvents is an important way to prevent waste in chemical processes. In the context of green and sustainable chemistry, one of the most promising approaches is the use of water as the reaction medium. In recent years, there has been increasing recognition that water is an attractive medium for many organic reactions. Using water continues to attract wide attention among synthetic chemists in the design of new synthetic methods.
Keywords: zinc oxide, dicoumarol, aryl glyoxal, green chemistry, catalyst
Procedia PDF Downloads 354
1830 Multivariate Rainfall Disaggregation Using MuDRain Model: Malaysia Experience
Authors: Ibrahim Suliman Hanaish
Abstract:
This paper discusses the disaggregation of daily rainfall using a stochastic model formulated on a multivariate approach (MuDRain). Seven rain gauge stations in Peninsular Malaysia are considered, at distances from the reference station ranging from 4 km to 160 km. The hourly rainfall data cover the period from 1973 to 2008, and the months of July and November are taken as examples of dry and wet periods. The cross-correlation among the rain gauges is considered both with and without available hourly rainfall information at the neighboring stations. The paper discusses the applicability of the MuDRain model for disaggregating daily rainfall to hourly rainfall for both sources of cross-correlation. The goodness of fit of the model was judged by the reproduction of statistics such as the means, variances, coefficients of skewness, lag-zero cross-correlation coefficients and lag-one autocorrelation coefficients. The correlation coefficients extracted from daily data were found to be slightly higher than those based on available hourly rainfall, especially for neighboring stations no more than 28 km apart. The results also showed that the MuDRain model did not reproduce the statistics very well. In addition, the synthetic hourly rainfall data reproduced the actual hyetographs poorly. Meanwhile, a good fit was found between the distribution functions of the historical and synthetic hourly rainfall. These discrepancies are unavoidable because of the low cross-correlation of hourly rainfall. The overall performance indicated that the MuDRain model would not be an appropriate choice for disaggregating daily rainfall.
Keywords: rainfall disaggregation, multivariate disaggregation rainfall model, correlation, stochastic model
Procedia PDF Downloads 515
1829 Algae Biofertilizers Promote Sustainable Food Production and Nutrient Efficiency: An Integrated Empirical-Modeling Study
Authors: Zeenat Rupawalla, Nicole Robinson, Susanne Schmidt, Sijie Li, Selina Carruthers, Elodie Buisset, John Roles, Ben Hankamer, Juliane Wolf
Abstract:
Agriculture has radically changed the global biogeochemical cycle of nitrogen (N). Fossil fuel-enabled synthetic N-fertiliser is a foundation of modern agriculture, but crops use only about half of the fertiliser applied to soil. To address N-pollution from cropping and the large carbon and energy footprint of N-fertiliser synthesis, new technologies delivering enhanced energy efficiency, decarbonisation, and a circular nutrient economy are needed. We characterised algae fertiliser (AF) as an alternative to synthetic N-fertiliser (SF) using empirical and modelling approaches. We cultivated microalgae in nutrient solution and modelled up-scaled production in nutrient-rich wastewater. Over four weeks, AF released 63.5% of N as ammonium and nitrate, and 25% of phosphorus (P) as phosphate to the growth substrate, while SF released 100% N and 20% P. To maximise crop N-use and minimise N-leaching, we explored AF and SF dose-response curves with spinach in glasshouse conditions. AF-grown spinach produced 36% less biomass than SF-grown plants due to AF’s slower and linear N-release, while SF resulted in 5-times higher N-leaching loss than AF. Optimised blends of AF and SF boosted crop yield and minimised N-loss due to greater synchrony of N-release and crop uptake. Additional benefits of AF included greener leaves, lower leaf nitrate concentration, and higher microbial diversity and water holding capacity in the growth substrate. Life-cycle analysis showed that replacing the most effective SF dosage with AF lowered the carbon footprint of fertiliser production from 2.02 g CO₂ (C-producing) to -4.62 g CO₂ (C-sequestering), with a further 12% reduction when AF is produced on wastewater. Embodied energy was lowest for AF-SF blends and could be reduced by 32% when cultivating algae on wastewater.
We conclude that (i) microalgae offer a sustainable alternative to synthetic N-fertiliser in spinach production and potentially other crop systems, and (ii) microalgae biofertilisers support the circular nutrient economy and several sustainable development goals.
Keywords: bioeconomy, decarbonisation, energy footprint, microalgae
Procedia PDF Downloads 137
1828 Development and Characterization of Synthetic Non-Woven for Sound Absorption
Authors: P. Sam Vimal Rajkumar, K. Priyanga
Abstract:
Acoustics is the scientific study of sound, which includes the effects of reflection, refraction, absorption, diffraction and interference. Sound can be considered a wave phenomenon. A sound wave is a longitudinal wave in which particles of the medium are temporarily displaced in a direction parallel to energy transport and then return to their original position. The vibration in a medium produces alternating regions of relatively dense and sparse particles, called compression and rarefaction respectively. The resultant variation from normal ambient pressure is translated by the ear and perceived as sound. Today much importance is given to the acoustical environment. Noise sources are increasing day by day, and acceptable noise levels are strongly exceeded in different locations by traffic, sound systems, and industries. There is clear evidence that high noise levels cause sleep disturbance, hearing loss, decreased productivity, learning disability, lower scholastic performance and an increase in stress-related hormones and blood pressure. Therefore, achieving a pleasing and noise-free environment is one of the endeavours of many research groups. This can be achieved using various techniques. One such technique is the use of suitable materials with good sound absorbing properties. The conventionally used materials that possess sound absorbing properties are rock wool and glass wool. In this work, an attempt is made to use synthetic material in both fibrous and sheet form for manufacturing a non-woven for sound absorption.
Keywords: acoustics, fibre, non-woven, noise, sound absorption properties, sound absorption coefficient
Procedia PDF Downloads 301
1827 Deep Feature Augmentation with Generative Adversarial Networks for Class Imbalance Learning in Medical Images
Authors: Rongbo Shen, Jianhua Yao, Kezhou Yan, Kuan Tian, Cheng Jiang, Ke Zhou
Abstract:
This study proposes a generative adversarial network (GAN) framework to perform synthetic sampling in feature space, i.e., feature augmentation, to address the class imbalance problem in medical image analysis. A feature extraction network is first trained to convert images into feature space. Then the GAN framework incorporates adversarial learning to train a feature generator for the minority class by playing a minimax game with a discriminator. The feature generator then generates features for the minority class from arbitrary latent distributions to balance the data between the majority class and the minority class. Additionally, a data cleaning technique, i.e., Tomek link, is employed to clean up undesirable conflicting features introduced by the feature augmentation and thus establish well-defined class clusters for the training. The experiment section evaluates the proposed method on two medical image analysis tasks, i.e., mass classification on mammograms and cancer metastasis classification on histopathological images. Experimental results suggest that the proposed method obtains superior or comparable performance relative to the state-of-the-art counterparts. Compared to all counterparts, our proposed method improves accuracy by more than 1.5 percentage points.
Keywords: class imbalance, synthetic sampling, feature augmentation, generative adversarial networks, data cleaning
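The Tomek link cleaning step named in the abstract can be sketched in a few lines. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour. The code below is a brute-force illustration using Euclidean distance on small feature vectors; the function names and the toy data are assumptions, and the abstract does not say which distance or which removal policy the authors used.

```python
import math

def nearest_neighbor(index, points):
    """Index of the point closest to points[index], excluding itself."""
    best, best_distance = None, math.inf
    for j, candidate in enumerate(points):
        if j == index:
            continue
        distance = math.dist(points[index], candidate)
        if distance < best_distance:
            best, best_distance = j, distance
    return best

def tomek_links(points, labels):
    """Pairs (i, j), i < j, of mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]

# Toy feature vectors: samples 0 and 1 are close but belong to different classes.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.0)]
classes = [0, 1, 1, 1]
links = tomek_links(features, classes)
```

Removing one or both members of each returned pair is a common way to sharpen the boundary between class clusters before training.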
Procedia PDF Downloads 127
1826 A Comparative Asessment of Some Algorithms for Modeling and Forecasting Horizontal Displacement of Ialy Dam, Vietnam
Authors: Kien-Trinh Thi Bui, Cuong Manh Nguyen
Abstract:
To simulate and reproduce the operational characteristics of a dam visually, it is necessary to capture the displacement at different measurement points and analyze the observed movement data promptly to forecast dam safety. The accuracy of forecasts is further improved by applying machine learning methods to the data analysis process. In this study, the horizontal displacement monitoring data of the Ialy hydropower dam (Vietnam) were analysed with three machine learning algorithms, namely Gaussian processes (GP), multi-layer perceptron neural networks (MLPs), and the M5-Rules algorithm, to model and forecast the horizontal displacement of the dam. The database used in this research was built by collecting time series of data from 2006 to 2021 and was divided into two parts: a training dataset and a validation dataset. The final results show that all three algorithms perform well in both training and validation, but the MLPs model is the best. Their usability is further investigated by comparison with a benchmark model created by multi-linear regression. The results show that the GP, MLPs and M5-Rules models all perform much better than the benchmark; therefore, these three models should be used to analyze and predict the horizontal displacement of the dam.
Keywords: Gaussian processes, horizontal displacement, hydropower dam, Ialy dam, M5-Rules, multi-layer perceptron neural networks
Procedia PDF Downloads 210
1825 Development and Validation of HPLC Method on Determination of Acesulfame-K in Jelly Drink Product
Authors: Candra Irawan, David Yudianto, Ahsanu Nadiyya, Dewi Anna Br Sitepu, Hanafi, Erna Styani
Abstract:
Jelly drinks are produced from a combination of natural and synthetic materials, including acesulfame potassium (acesulfame-K) as a synthetic sweetener. The acesulfame-K content of jelly drink can be determined by High-Performance Liquid Chromatography (HPLC), but the method needed validation because it was modified in two ways to make it more efficient and cheaper: the Carrez addition was skipped in the reagent addition step, and the ratio of the mixed mobile phase (potassium dihydrogen phosphate and acetonitrile) was changed from 75:25 to 90:10. This study was conducted to evaluate the performance of the HPLC method for determining acesulfame-K content in jelly drink. The method referred to Deutsches Institut für Normung European Standard International Organization for Standardization (DIN EN ISO) 12856 (1999), Foodstuffs: Determination of acesulfame-K, aspartame and saccharin. The correlation coefficient (r) in the linearity test was 0.9987 over the concentration range 5-100 mg/L. The detection limit was 0.9153 ppm, while the quantitation limit was 1.1932 ppm. The recovery in the accuracy test, for samples spiked at 100 mg/L, was 102-105%. The Relative Standard Deviation (RSD) values for the precision and homogenization tests were 2.815% and 4.978%, respectively. Meanwhile, the comparative and stability tests gave tstat (0.136) < ttable (2.101) and |µ1-µ2| (1.502) ≤ 0.3×CV Horwitz. The obstinacy test gave tstat < ttable. It can be concluded that the HPLC method for the determination of acesulfame-K in jelly drink products is valid and can be used for analysis with good performance.
Keywords: acesulfame-K, jelly drink, HPLC, validation
Procedia PDF Downloads 129
1824 Non-intrusive Hand Control of Drone Using an Inexpensive and Streamlined Convolutional Neural Network Approach
Authors: Evan Lowhorn, Rocio Alba-Flores
Abstract:
The purpose of this work is to develop a method for classifying hand signals and using the output in a drone control algorithm. To achieve this, methods based on Convolutional Neural Networks (CNNs) were applied. CNNs are a subset of deep learning that allows grid-like inputs to be processed and passed through a neural network to be trained for classification. This type of neural network allows for classification via imaging, which is less intrusive than previous methods using biosensors, such as EMG sensors. Classification CNNs operate purely from the pixel values in an image; therefore, they can be used without additional exteroceptive sensors. A development bench was constructed using a desktop computer connected to a high-definition webcam mounted on a scissor arm. This allowed the camera to be pointed downwards at the desk to provide a constant solid background for the dataset and a clear detection area for the user. A MATLAB script was created to automate dataset image capture at the development bench and save the images to the desktop. This allowed the user to create their own dataset of 12,000 images within three hours. These images were evenly distributed among seven classes. The defined classes include forward, backward, left, right, idle, and land. The drone has a popular flip function, which was also included as an additional class. To simplify control, the corresponding hand signals chosen were the numerical hand signs for one through five for movements, a fist for land, and the universal “ok” sign for the flip command. Transfer learning with PyTorch (Python) was performed using a pre-trained 18-layer residual learning network (ResNet-18) to retrain the network for custom classification. An algorithm was created to interpret the classification and send encoded messages to a Ryze Tello drone over its 2.4 GHz Wi-Fi connection. The drone’s movements were performed in half-meter distance increments at a constant speed.
When combined with the drone control algorithm, the classification performed as desired with negligible latency when compared to the delay in the drone’s movement commands.
Keywords: classification, computer vision, convolutional neural networks, drone control
Procedia PDF Downloads 210
1823 Network and Sentiment Analysis of U.S. Congressional Tweets
Authors: Chaitanya Kanakamedala, Hansa Pradhan, Carter Gilbert
Abstract:
Social media platforms, such as Twitter, are excellent sources of data for understanding human interactions and sentiments. This report explores social dynamics among US Congressional members through a network analysis applied to a dataset of tweets spanning 2008 to 2017 from the ’US Congressional Tweets Dataset’. In this report, we perform network analysis where connections between users (edges) are established based on a similarity threshold: two users are connected if the tweets they post are similar. By utilizing the Natural Language Toolkit (NLTK) and NetworkX, we quantified tweet similarity and constructed a graph comprising various interconnected components. Each component represents a cluster of users with closely aligned content. We then perform sentiment analysis on each cluster to explore the prevalent emotions and opinions within these groups. Our findings reveal that, despite the initial expectation of distinct ideological divisions aligning with party lines, there is a high degree of topical convergence across tweets from different political affiliations. The analysis performed in this report not only highlights the potential of social media as a tool for political communication but also suggests a complex layer of interaction that transcends traditional partisan boundaries, reflecting a complicated landscape of politics in the digital age.
Keywords: natural language processing, sentiment analysis, centrality analysis, topic modeling
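The thresholded similarity graph described in the abstract can be sketched without external libraries. The example below uses Jaccard similarity over lower-cased word sets and a union-find structure to extract connected components. Both choices are assumptions made for illustration: the abstract states only that NLTK and NetworkX were used and does not name the similarity measure or the threshold.

```python
def jaccard(text_a, text_b):
    """Jaccard similarity between the word sets of two texts."""
    tokens_a, tokens_b = set(text_a.lower().split()), set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def user_clusters(user_texts, threshold):
    """Connect users whose texts are similar enough; return the components."""
    users = list(user_texts)
    parent = {u: u for u in users}

    def find(u):
        # Follow parent links to the root, halving the path on the way.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for i, u in enumerate(users):
        for v in users[i + 1:]:
            if jaccard(user_texts[u], user_texts[v]) >= threshold:
                parent[find(u)] = find(v)  # add an edge: merge the two components

    groups = {}
    for u in users:
        groups.setdefault(find(u), []).append(u)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical accounts and tweet text.
tweets = {
    "rep_a": "tax reform now",
    "rep_b": "tax reform bill now",
    "rep_c": "healthcare for veterans",
}
clusters = user_clusters(tweets, threshold=0.5)
```

Each returned cluster corresponds to one connected component of the graph, which is the unit on which the abstract's sentiment analysis is then run.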
Procedia PDF Downloads 33
1822 Formulation of Value Added Beff Meatballs with the Addition of Pomegranate (Punica granatum) Extract as a Source of Natural Antioxident
Authors: M. A. Hashem, I. Jahan
Abstract:
The experiment was conducted to find out the effect of different levels of pomegranate (Punica granatum) extract and the synthetic antioxidant BHA (butylated hydroxyanisole) on fresh and preserved beef meatballs in order to make a functional food. For this purpose, ground beef samples were divided into five treatment groups: a control group, a 0.1% synthetic antioxidant group, and 0.1%, 0.2% and 0.3% pomegranate extract groups, designated T1, T2, T3, T4 and T5 respectively. Proximate analysis, sensory tests (color, flavor, tenderness, juiciness, overall acceptability), cooking loss, pH value, free fatty acids (FFA), thiobarbituric acid values (TBARS), peroxide value (POV) and microbiological examination were carried out to evaluate the antioxidant and antimicrobial activities of pomegranate extract compared to BHA, on the first day before freezing and for maintaining meatball quality over the shelf life of beef meatballs stored for 60 days under frozen conditions. The freezing temperature was -20˚C. Measurements were taken on days 0, 15, 30 and 60. Dry matter (DM) content differed significantly (p<0.05) among the treatment groups and increased significantly (p<0.05) over the storage intervals. CP content increased significantly (p<0.05) across the treatment groups. EE and ash content decreased significantly (p<0.05) at different treatment levels. FFA values, TBARS and POV decreased significantly (p<0.05) at different treatment levels. Color, odor, tenderness, juiciness and overall acceptability decreased significantly (p<0.05) over the storage intervals. Raw pH and cooked pH increased significantly (p<0.05) at different treatment levels. The cooking loss (%) differed significantly (p<0.05) among treatment levels.
TVC (logCFU/g), TCC (logCFU/g) and TYMC (logCFU/g) decreased significantly (p<0.05) at different treatment levels and storage intervals in comparison to the control. Considering CP, tenderness, juiciness, overall acceptability, cooking loss, FFA, POV, TBARS value and microbial analysis, it can be concluded that pomegranate extract at 0.1%, 0.2% and 0.3% can be used instead of the synthetic antioxidant BHA in beef meatballs. On the basis of sensory evaluation, nutrient quality, physicochemical properties, biochemical analysis and microbial analysis, 0.3% pomegranate extract can be recommended for the formulation of value-added beef meatballs enriched with natural antioxidant.
Keywords: antioxidant, pomegranate, BHA, value added meat products
Procedia PDF Downloads 246