Search results for: imbalanced datasets

86 Parallel Fuzzy Rough Support Vector Machine for Data Classification in Cloud Environment

Authors: Arindam Chaudhuri

Abstract:

Classification of data has long been one of the most effective and efficient means of conveying knowledge and information to users. The primary focus has always been on techniques for extracting useful knowledge from data such that returns are maximized. With the emergence of huge datasets, existing classification techniques often fail to produce desirable results. The challenge lies in analyzing and understanding the characteristics of massive data sets by retrieving useful geometric and statistical patterns. We propose a supervised parallel fuzzy rough support vector machine (PFRSVM) for data classification in a cloud environment. The classification is performed by PFRSVM using a hyperbolic tangent kernel. The fuzzy rough set model takes care of the sensitivity to noisy samples and handles imprecision in training samples, bringing robustness to the results. The membership function is a function of the center and radius of each class in feature space and is represented with the kernel. It plays an important role in sampling the decision surface. The success of PFRSVM is governed by choosing appropriate parameter values. The training samples are either linearly or nonlinearly separable. Different input points make unique contributions to the decision surface. The algorithm is parallelized to reduce training times. The system is built on a support vector machine library using the Hadoop implementation of MapReduce. The algorithm is tested on large data sets to check its feasibility and convergence. The performance of the classifier is also assessed in terms of the number of support vectors. The challenges encountered in implementing big data classification in machine learning frameworks are also discussed. The experiments are done on the cloud environment available at the University of Technology and Management, India. The results are illustrated for Gaussian RBF and Bayesian kernels. The effect of variability in prediction and generalization of PFRSVM is examined with respect to values of the parameter C. It effectively resolves outlier effects, imbalance and overlapping class problems, generalizes to unseen data and relaxes the dependency between features and labels. The average classification accuracy for PFRSVM is better than that of other classifiers for both Gaussian RBF and Bayesian kernels. The experimental results on both synthetic and real data sets clearly demonstrate the superiority of the proposed technique.
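
The abstract gives no implementation details, but the core ingredients, an SVM with a hyperbolic tangent (sigmoid) kernel whose training points are down-weighted by a fuzzy membership derived from their distance to the class centre, can be sketched in Python with scikit-learn. The centroid-based membership below and all parameter values are illustrative assumptions, not the authors' FRSVM formulation, and the MapReduce parallelization is omitted:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # toy two-class data with a few far-away points standing in for noisy samples
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)

    def class_memberships(X, y):
        """Crude fuzzy membership: points near their own class centroid get a weight
        close to 1, distant (likely noisy) points are down-weighted."""
        w = np.empty(len(y))
        for c in np.unique(y):
            d = np.linalg.norm(X[y == c] - X[y == c].mean(axis=0), axis=1)
            w[y == c] = 1.0 - d / (d.max() + 1e-9)
        return np.clip(w, 0.1, 1.0)

    # kernel="sigmoid" is scikit-learn's tanh(gamma * <x, x'> + coef0) kernel
    clf = make_pipeline(StandardScaler(), SVC(kernel="sigmoid", gamma="scale", C=1.0))
    clf.fit(X, y, svc__sample_weight=class_memberships(X, y))
    print("training accuracy:", clf.score(X, y))

Varying C here is the single-machine analogue of the paper's sensitivity study, and the number of support vectors can be read from clf.named_steps["svc"].n_support_.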

Keywords: FRSVM, Hadoop, MapReduce, PFRSVM

85 Biofilm Text Classifiers Developed Using Natural Language Processing and Unsupervised Learning Approach

Authors: Kanika Gupta, Ashok Kumar

Abstract:

Biofilms are dense, highly hydrated cell clusters that are irreversibly attached to a substratum, to an interface or to each other, and are embedded in a self-produced gelatinous matrix composed of extracellular polymeric substances. Research in the biofilm field has become very significant, as biofilms show high mechanical resilience and resistance to antibiotic treatment and constitute a significant problem in both healthcare and other industries related to microorganisms. The massive information, both stated and hidden, in the biofilm literature is growing exponentially; therefore, it is not possible for researchers and practitioners to automatically extract and relate information from different written resources. The current work therefore proposes and discusses the use of text mining techniques for the extraction of information from a biofilm literature corpus containing 34,306 documents. It is very difficult and expensive to obtain annotated material for biomedical literature, as the literature is unstructured, i.e. free text. Therefore, we considered an unsupervised approach, where no annotated training data are necessary, and used it to develop a system that classifies the text on the basis of growth and development, drug effects, radiation effects, classification and physiology of biofilms. For this, a two-step structure was used: the first step extracts keywords from the biofilm literature using a metathesaurus and standard natural language processing tools such as RapidMiner v5.3, and the second step discovers relations between the genes extracted from the whole set of biofilm literature using pubmed.mineR v1.0.11. We applied the unsupervised approach, which is the machine learning task of inferring a function to describe hidden structure from 'unlabeled' data, to the above-extracted datasets to develop classifiers using WinPython 64-bit v3.5.4.0Qt5 and RStudio v0.99.467, which automatically classify the text into the mentioned sets. The developed classifiers were tested on a large data set of biofilm literature, which showed that the proposed unsupervised approach is promising and well suited for semi-automatic labeling of the extracted relations. The entire information was stored in a relational database hosted locally on the server. The generated biofilm vocabulary and gene relations will be significant for researchers dealing with biofilm research, making their searches easy and efficient, as the keywords and genes can be directly mapped to the documents used for database development.
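
As a rough illustration of the unsupervised pipeline described above (keyword features followed by label-free grouping into topics such as growth, drug effects or radiation effects), the sketch below clusters a handful of toy abstracts with TF-IDF features and k-means in scikit-learn. It is an assumption-laden stand-in for the RapidMiner/pubmed.mineR workflow, not a reproduction of it:

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "biofilm growth and development on medical implants",
        "antibiotic drug effects on mature biofilm matrix",
        "uv radiation effects on biofilm viability",
        "physiology of extracellular polymeric substances in biofilms",
    ]

    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(docs)                      # keyword (TF-IDF) features

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print("cluster labels:", km.labels_)

    # the highest-weighted terms per cluster suggest the topic label to assign by hand
    terms = vec.get_feature_names_out()
    for c, centre in enumerate(km.cluster_centers_):
        top = centre.argsort()[::-1][:3]
        print("cluster", c, "->", [terms[i] for i in top])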

Keywords: biofilms literature, classifiers development, text mining, unsupervised learning approach, unstructured data, relational database

84 Comparing Xbar Charts: Conventional versus Reweighted Robust Estimation Methods for Univariate Data Sets

Authors: Ece Cigdem Mutlu, Burak Alakent

Abstract:

Maintaining the quality of manufactured products at a desired level depends on the stability of process dispersion and location parameters and detection of perturbations in these parameters as promptly as possible. The Shewhart control chart is the most widely used technique in statistical process monitoring to monitor the quality of products and control process mean and variability. In the application of Xbar control charts, the sample standard deviation and sample mean are known to be the most efficient conventional estimators of process dispersion and location parameters, respectively, based on the assumption of independent and normally distributed datasets. On the other hand, there is no guarantee that real-world data will be normally distributed. When process parameters are estimated from Phase I data clouded with outliers, the efficiency of traditional estimators is significantly reduced and the performance of Xbar charts is undesirably low; e.g. occasional outliers in the rational subgroups in the Phase I data set may considerably affect the sample mean and standard deviation, resulting in a serious delay in the detection of inferior products in Phase II. For a more efficient application of control charts, it is necessary to use estimators that are robust against the contaminations which may exist in Phase I. In the current study, we present a simple approach to construct robust Xbar control charts using the average distance to the median, the Qn-estimator of scale and the M-estimator of scale with logistic psi-function in the estimation of the process dispersion parameter, and the Harrell-Davis qth quantile estimator, the Hodges-Lehmann estimator and the M-estimator of location with Huber and logistic psi-functions in the estimation of the process location parameter. Phase I efficiency of the proposed estimators and Phase II performance of the Xbar charts constructed from these estimators are compared with the conventional mean and standard deviation statistics, both under normality and against diffuse-localized and symmetric-asymmetric contaminations, using 50,000 Monte Carlo simulations in MATLAB. Consequently, it is found that robust estimators yield parameter estimates with higher efficiency against all types of contaminations, and Xbar charts constructed using robust estimators have higher power in detecting disturbances compared to conventional methods. Additionally, utilizing individuals charts to screen outlier subgroups and employing different combinations of dispersion and location estimators on subgroups and individual observations are found to improve the performance of Xbar charts.
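
For readers who want to experiment with the idea, the sketch below builds Phase I control limits from robust statistics using only NumPy and SciPy. It uses the Hodges-Lehmann estimator for location, as in the paper, but substitutes the normal-consistent median absolute deviation for the paper's Qn and M-estimators of scale; the 3-sigma limit width is likewise a conventional assumption rather than the authors' tuning:

    import numpy as np
    from itertools import combinations_with_replacement
    from scipy.stats import norm

    def hodges_lehmann(x):
        """Median of the Walsh averages (x_i + x_j)/2, i <= j."""
        return np.median([(a + b) / 2 for a, b in combinations_with_replacement(x, 2)])

    def mad_scale(x):
        """Median absolute deviation rescaled to estimate sigma under normality."""
        return np.median(np.abs(x - np.median(x))) / norm.ppf(0.75)

    def robust_xbar_limits(phase1, k=3.0):
        """phase1: (m subgroups x n observations) Phase I data, possibly contaminated."""
        phase1 = np.asarray(phase1, dtype=float)
        n = phase1.shape[1]
        centre = np.mean([hodges_lehmann(g) for g in phase1])
        sigma = np.mean([mad_scale(g) for g in phase1])
        half_width = k * sigma / np.sqrt(n)
        return centre - half_width, centre, centre + half_width

    rng = np.random.default_rng(1)
    data = rng.normal(10, 1, (25, 5))
    data[3, 0] += 8                     # a localized outlier in one rational subgroup
    print(robust_xbar_limits(data))     # limits barely move despite the contamination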

Keywords: average run length, M-estimators, quality control, robust estimators

83 Unveiling Comorbidities in Irritable Bowel Syndrome: A UK BioBank Study utilizing Supervised Machine Learning

Authors: Uswah Ahmad Khan, Muhammad Moazam Fraz, Humayoon Shafique Satti, Qasim Aziz

Abstract:

Approximately 10-14% of the global population experiences a functional disorder known as irritable bowel syndrome (IBS). The disorder is defined by persistent abdominal pain and an irregular bowel pattern. IBS significantly impairs work productivity and disrupts patients' daily lives and activities. Although IBS is widespread, there is still an incomplete understanding of its underlying pathophysiology. This study aims to help characterize the phenotype of IBS patients by differentiating the comorbidities found in IBS patients from those in non-IBS patients using machine learning algorithms. In this study, we extracted samples coding for IBS from the UK BioBank cohort and randomly selected patients without a code for IBS to create a total sample size of 18,000. We selected the codes for comorbidities of these cases from 2 years before and after their IBS diagnosis and compared them to the comorbidities in the non-IBS cohort. Machine learning models, including Decision Trees, Gradient Boosting, Support Vector Machine (SVM), AdaBoost, Logistic Regression, and XGBoost, were employed to assess their accuracy in predicting IBS. The most accurate model was then chosen to identify the features associated with IBS. In our case, we used XGBoost feature importance as a feature selection method. We applied different models to the top 10% of features, which numbered 50. The Gradient Boosting, Logistic Regression and XGBoost algorithms yielded a diagnosis of IBS with optimal accuracies of 71.08%, 71.427%, and 71.53%, respectively. The comorbidities most closely associated with IBS included gut diseases (haemorrhoids, diverticular diseases), atopic conditions (asthma), and psychiatric comorbidities (depressive episodes or disorder, anxiety). This finding emphasizes the need for a comprehensive approach when evaluating the phenotype of IBS, suggesting the possibility of identifying new subsets of IBS rather than relying solely on the conventional classification based on stool type. Additionally, our study demonstrates the potential of machine learning algorithms in predicting the development of IBS based on comorbidities, which may enhance diagnosis and facilitate better management of modifiable risk factors for IBS. Further research is necessary to confirm our findings and establish cause and effect. Alternative feature selection methods and even larger and more diverse datasets may lead to more accurate classification models. Despite these limitations, our findings highlight the effectiveness of Logistic Regression and XGBoost in predicting IBS diagnosis.
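
A minimal end-to-end sketch of the workflow described above (train a boosted-tree model, keep the top-ranked comorbidity codes by feature importance, then fit a logistic regression on that subset) is given below. It uses synthetic data and scikit-learn's GradientBoostingClassifier in place of the UK Biobank extract and the XGBoost implementation, so every number and parameter is illustrative only:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # stand-in for a binary comorbidity-code matrix (patients x diagnosis codes)
    X, y = make_classification(n_samples=2000, n_features=500, n_informative=40,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    gb = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
    top = np.argsort(gb.feature_importances_)[::-1][:50]   # top 10% of 500 features

    lr = LogisticRegression(max_iter=1000).fit(X_tr[:, top], y_tr)
    print("held-out accuracy on top features:", lr.score(X_te[:, top], y_te))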

Keywords: comorbidities, disease association, irritable bowel syndrome (IBS), predictive analytics

82 Real Estate Trend Prediction with Artificial Intelligence Techniques

Authors: Sophia Liang Zhou

Abstract:

For investors, businesses, consumers, and governments, an accurate assessment of future housing prices is crucial to critical decisions in resource allocation, policy formation, and investment strategies. Previous studies are contradictory about the macroeconomic determinants of housing prices and largely focused on one or two areas using point prediction. This study aims to develop data-driven models to accurately predict future housing market trends in different markets. This work studied five different metropolitan areas representing different market trends and compared three time-lag settings: no lag, a 6-month lag, and a 12-month lag. Linear regression (LR), random forest (RF), and artificial neural network (ANN) models were employed to model the real estate price using datasets with the S&P/Case-Shiller home price index and 12 demographic and macroeconomic features, such as gross domestic product (GDP), resident population, personal income, etc., in five metropolitan areas: Boston, Dallas, New York, Chicago, and San Francisco. The data from March 2005 to December 2018 were collected from the Federal Reserve Bank, FBI, and Freddie Mac. In the original data, some factors are reported monthly, some quarterly, and some yearly. Thus, two methods of compensating for missing values, backfill and interpolation, were compared. The models were evaluated by accuracy, mean absolute error, and root mean square error. The LR and ANN models outperformed the RF model due to RF's inherent limitations. Both the ANN and LR methods generated predictive models with high accuracy ( > 95%). It was found that personal income, GDP, population, and measures of debt consistently appeared as the most important factors. It was also shown that the technique used to compensate for missing values in the dataset and the implementation of a time lag can have a significant influence on model performance and require further investigation. The best performing models varied for each area, but the backfilled 12-month lag LR models and the interpolated no-lag ANN models showed the most stable performance overall, with accuracies > 95% for each city. This study reveals the influence of input variables in different markets. It also provides evidence to support future studies to identify the optimal time lag and data imputation methods for establishing accurate predictive models.
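
The data-handling choices discussed above (imputing mixed-frequency predictors by backfill or interpolation and shifting the target by a 6- or 12-month lag) can be sketched with pandas and scikit-learn as below. The synthetic monthly frame, the two illustrative predictors and the linear-regression step are assumptions for demonstration, not the study's actual data or feature set:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error

    # monthly frame: home-price index plus a quarterly predictor with gaps
    idx = pd.date_range("2005-03-01", "2018-12-01", freq="MS")
    df = pd.DataFrame({"hpi": np.linspace(150, 220, len(idx)),
                       "gdp": np.nan,
                       "income": np.linspace(35, 55, len(idx))}, index=idx)
    df.loc[df.index.month.isin([3, 6, 9, 12]), "gdp"] = np.linspace(14, 20, 56)

    filled = df.interpolate()       # or df.bfill() for the backfill variant
    lag = 12                        # predict the index 12 months ahead
    X = filled[["gdp", "income"]].iloc[:-lag]
    y = filled["hpi"].shift(-lag).dropna()

    model = LinearRegression().fit(X, y)
    print("in-sample MAE:", mean_absolute_error(y, model.predict(X)))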

Keywords: linear regression, random forest, artificial neural network, real estate price prediction

81 Undernutrition Among Children Below Five Years of Age in Uganda: A Deep Dive into Space and Time

Authors: Vallence Ngabo Maniragaba

Abstract:

This study aimed at examining the variations in undernutrition among children below 5 years of age in Uganda. A spatial and spatiotemporal analysis approach was used to identify cluster patterns, hot spots and emerging hot spots. Data from the 6 Uganda Demographic and Health Surveys spanning 1990 to 2016 were used, with the main outcome variable being undernutrition among children <5 years of age. All data relevant to this study were retrieved from the survey datasets and combined with the 214 shape files for the districts of Uganda to enable spatial and spatiotemporal analysis. Spatial maps showing the distribution of the prevalence of undernutrition, both in space and time, were generated using ArcGIS Pro version 2.8. Moran’s I, an index of spatial autocorrelation, was used to rule out spatial randomness and to identify spatially clustered patterns of hot or cold spot areas. Furthermore, space-time cubes were generated to establish the trend in undernutrition as well as to mirror its variations over time and across Uganda. Moreover, emerging hot spot analysis was done to help identify the patterns of undernutrition over time. The results indicate a heterogeneous distribution of undernutrition across Uganda, and the same variations were also evident over time. Moran’s I confirmed spatially clustered patterns as opposed to a random distribution of undernutrition prevalence. Four hot spot areas, namely the Karamoja, Sebei, West Nile and Toro regions, were significantly evident; most of the central parts of Uganda were identified as cold spot clusters, while most of Western Uganda and the Acholi and Lango regions had no statistically significant spatial patterns by the year 2016. The spatio-temporal analysis identified the Karamoja and Sebei regions as clusters of persistent, consecutive and intensifying hot spots, the West Nile region was identified as a sporadic hot spot area, while the Toro region was identified with both sporadic and emerging hot spots. In conclusion, undernutrition is a silent pandemic that needs to be addressed decisively. At 31.2 percent, the prevalence is still very high. The distribution across the country is nonuniform, with some areas such as the Karamoja, West Nile, Sebei and Toro regions being epicenters of undernutrition in Uganda. Over time, the same areas have experienced and exhibited high undernutrition prevalence. Policymakers, as well as implementers, should bear in mind the spatial variations across the country and prioritize hot spot areas in order to have efficient, timely and region-specific interventions.
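
Global Moran's I, the spatial-autocorrelation index used above, is simple enough to compute directly from a district-level prevalence vector and a spatial weights matrix; the NumPy sketch below shows the calculation on a toy four-district example (the contiguity matrix and prevalence values are invented for illustration, and the hot spot and space-time cube analyses from ArcGIS Pro are not reproduced):

    import numpy as np

    def morans_i(values, weights):
        """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
        values = np.asarray(values, dtype=float)
        weights = np.asarray(weights, dtype=float)
        n = len(values)
        z = values - values.mean()
        num = (weights * np.outer(z, z)).sum()
        den = (z ** 2).sum()
        return (n / weights.sum()) * (num / den)

    # toy example: 4 districts with simple contiguity weights
    prevalence = [0.42, 0.38, 0.12, 0.10]
    w = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])
    print(morans_i(prevalence, w))   # a clearly positive value -> spatial clustering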

Keywords: undernutrition, spatial autocorrelation, hotspots analysis, geographically weighted regressions, emerging hotspots analysis, under-fives, Uganda

80 AS-Geo: Arbitrary-Sized Image Geolocalization with Learnable Geometric Enhancement Resizer

Authors: Huayuan Lu, Chunfang Yang, Ma Zhu, Baojun Qi, Yaqiong Qiao, Jiangqian Xu

Abstract:

Image geolocalization has great application prospects in fields such as autonomous driving and virtual/augmented reality. In practical application scenarios, the size of the image to be localized is not fixed, and it is impractical to train different networks for all possible sizes. When the image size does not match the input size of the descriptor extraction model, existing image geolocalization methods usually directly scale or crop the image in some common way. This results in the loss of information important to the geolocalization task, thus affecting the performance of the method. For example, excessive down-sampling can lead to blurred building contours, and inappropriate cropping can lead to the loss of key semantic elements, resulting in incorrect geolocalization results. To address this problem, this paper designs a learnable image resizer and proposes an arbitrary-sized image geolocalization method. (1) The designed learnable image resizer employs the self-attention mechanism to enhance the geometric features of the resized image. Firstly, it applies bilinear interpolation to the input image and its feature maps to obtain the initial resized image and the resized feature maps. Then, SKNet (selective kernel net) is used to approximate the best receptive field, thus keeping the geometric shapes consistent with the original image, and SENet (squeeze-and-excitation net) is used to automatically select the feature maps with strong contour information, enhancing the geometric features. Finally, the enhanced geometric features are fused with the initial resized image to obtain the final resized image. (2) The proposed image geolocalization method embeds the above image resizer as a front-end layer of the descriptor extraction network. It not only enables the network to be compatible with arbitrary-sized input images but also enhances the geometric features that are crucial to the image geolocalization task. Moreover, a triplet attention mechanism is added after the first convolutional layer of the backbone network to optimize the utilization of the geometric elements extracted by that layer. Finally, the local features extracted by the backbone network are aggregated to form image descriptors for image geolocalization. The proposed method was evaluated on several mainstream datasets, such as Pittsburgh30K, Tokyo24/7, and Places365. The results show that the proposed method has excellent size compatibility and compares favorably to recent mainstream geolocalization methods.
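
A compact PyTorch sketch of the kind of learnable, geometry-aware resizer described in (1) is given below: a bilinear resize plus a small convolutional branch with a squeeze-and-excitation style channel gate and a residual connection. It is an assumed simplification, with the SKNet receptive-field selection, the self-attention details and the downstream descriptor network omitted:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GeometricResizer(nn.Module):
        """Resize any input to a fixed size and re-emphasise contour-rich channels."""
        def __init__(self, out_size=(224, 224), channels=16):
            super().__init__()
            self.out_size = out_size
            self.conv_in = nn.Conv2d(3, channels, 3, padding=1)
            self.gate = nn.Sequential(                    # SE-style channel attention
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // 4, 1), nn.ReLU(),
                nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid())
            self.conv_out = nn.Conv2d(channels, 3, 3, padding=1)

        def forward(self, x):
            base = F.interpolate(x, self.out_size, mode="bilinear",
                                 align_corners=False)     # plain bilinear resize
            feat = F.relu(self.conv_in(base))
            feat = feat * self.gate(feat)                 # select informative channels
            return base + self.conv_out(feat)             # residual geometric enhancement

    resizer = GeometricResizer()
    out = resizer(torch.randn(2, 3, 357, 412))            # arbitrary size in, 224 x 224 out
    print(out.shape)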

Keywords: image geolocalization, self-attention mechanism, image resizer, geometric feature

79 AI Applications in Accounting: Transforming Finance with Technology

Authors: Alireza Karimi

Abstract:

Artificial Intelligence (AI) is reshaping various industries, and accounting is no exception. With the ability to process vast amounts of data quickly and accurately, AI is revolutionizing how financial professionals manage, analyze, and report financial information. In this article, we will explore the diverse applications of AI in accounting and its profound impact on the field. Automation of Repetitive Tasks: One of the most significant contributions of AI in accounting is automating repetitive tasks. AI-powered software can handle data entry, invoice processing, and reconciliation with minimal human intervention. This not only saves time but also reduces the risk of errors, leading to more accurate financial records. Pattern Recognition and Anomaly Detection: AI algorithms excel at pattern recognition. In accounting, this capability is leveraged to identify unusual patterns in financial data that might indicate fraud or errors. AI can swiftly detect discrepancies, enabling auditors and accountants to focus on resolving issues rather than hunting for them. Real-Time Financial Insights: AI-driven tools, using natural language processing and computer vision, can process documents faster than ever. This enables organizations to have real-time insights into their financial status, empowering decision-makers with up-to-date information for strategic planning. Fraud Detection and Prevention: AI is a powerful tool in the fight against financial fraud. It can analyze vast transaction datasets, flagging suspicious activities and reducing the likelihood of financial misconduct going unnoticed. This proactive approach safeguards a company's financial integrity. Enhanced Data Analysis and Forecasting: Machine learning, a subset of AI, is used for data analysis and forecasting. By examining historical financial data, AI models can provide forecasts and insights, aiding businesses in making informed financial decisions and optimizing their financial strategies. Artificial Intelligence is fundamentally transforming the accounting profession. From automating mundane tasks to enhancing data analysis and fraud detection, AI is making financial processes more efficient, accurate, and insightful. As AI continues to evolve, its role in accounting will only become more significant, offering accountants and finance professionals powerful tools to navigate the complexities of modern finance. Embracing AI in accounting is not just a trend; it's a necessity for staying competitive in the evolving financial landscape.

Keywords: artificial intelligence, accounting automation, financial analysis, fraud detection, machine learning in finance

78 The Use of Empirical Models to Estimate Soil Erosion in Arid Ecosystems and the Importance of Native Vegetation

Authors: Meshal M. Abdullah, Rusty A. Feagin, Layla Musawi

Abstract:

When humans mismanage arid landscapes, soil erosion can become a primary mechanism that leads to desertification. This study focuses on applying soil erosion models to a disturbed landscape in Umm Nigga, Kuwait, and identifying its predicted change under restoration plans. The northern portion of Umm Nigga, containing both coastal and desert ecosystems, falls within the boundaries of the Demilitarized Zone (DMZ) adjacent to Iraq, and has been fenced off to restrict public access since 1994. The central objective of this project was to utilize GIS and remote sensing to compare the MPSIAC (Modified Pacific Southwest Inter-Agency Committee), EMP (Erosion Potential Method), and USLE (Universal Soil Loss Equation) soil erosion models and determine their applicability for arid regions such as Kuwait. Spatial analysis was used to develop the necessary datasets for factors such as soil characteristics, vegetation cover, runoff, climate, and topography. Results showed that the MPSIAC and EMP models produced a similar spatial distribution of erosion, though the MPSIAC had more variability. For the MPSIAC model, approximately 45% of the land surface ranged from moderate to high soil loss, while 35% ranged from moderate to high for the EMP model. The USLE model had contrasting results and a different spatial distribution of soil loss, with 25% of the area ranging from moderate to high erosion, and 75% ranging from low to very low. We concluded that MPSIAC and EMP were the most suitable models for arid regions in general, with the MPSIAC model performing best. We then applied the MPSIAC model to identify the amount of soil loss between coastal and desert areas, and between fenced and unfenced sites. In the desert area, soil loss differed between fenced and unfenced sites. In the desert fenced sites, 88% of the surface was covered with vegetation and soil loss was very low, while at the desert unfenced sites vegetation cover was only 3% and soil loss was correspondingly higher. In the coastal areas, the amount of soil loss was nearly the same between fenced and unfenced sites. These results implied that vegetation cover plays an important role in reducing soil erosion, and that fencing is much more important in the desert ecosystems to protect against overgrazing. When applying the MPSIAC model predictively, we found that vegetation cover could be increased from 3% to 37% in unfenced areas, and soil erosion would then decrease by 39%. We conclude that the MPSIAC model is best suited to predict soil erosion for arid regions such as Kuwait.
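
Of the three models compared, the USLE has the simplest closed form, A = R · K · LS · C · P, and is easy to evaluate on GIS raster layers; the NumPy sketch below does exactly that on a tiny invented grid (all factor values are placeholders, and the MPSIAC and EMP scoring tables are not reproduced here):

    import numpy as np

    def usle_soil_loss(R, K, LS, C, P):
        """Universal Soil Loss Equation: A = R * K * LS * C * P (per-cell soil loss)."""
        return R * K * LS * C * P

    # toy 3x3 raster layers such as would be derived from spatial analysis
    R  = np.full((3, 3), 40.0)                             # rainfall erosivity
    K  = np.array([[0.3, 0.3, 0.4],
                   [0.2, 0.3, 0.4],
                   [0.2, 0.2, 0.3]])                       # soil erodibility
    LS = np.array([[0.5, 1.2, 2.0],
                   [0.4, 0.9, 1.5],
                   [0.3, 0.6, 1.0]])                       # slope length and steepness
    veg = np.array([[1, 1, 0],
                    [1, 0, 0],
                    [0, 0, 0]])                            # 1 = vegetated cell
    C  = np.where(veg == 1, 0.05, 0.45)                    # cover factor: vegetation lowers loss
    P  = np.ones((3, 3))                                   # no support practices

    A = usle_soil_loss(R, K, LS, C, P)
    print(A.round(2))                                      # vegetated cells erode far less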

Keywords: soil erosion, GIS, Modified Pacific Southwest Inter-Agency Committee model (MPSIAC), Erosion Potential Method (EMP), Universal Soil Loss Equation (USLE)

77 Colored Image Classification Using Quantum Convolutional Neural Networks Approach

Authors: Farina Riaz, Shahab Abdulla, Srinjoy Ganguly, Hajime Suzuki, Ravinesh C. Deo, Susan Hopkins

Abstract:

Recently, quantum machine learning has received significant attention. Numerous quantum machine learning (QML) models have been created and are being tested for various types of data, including text and images. Images are exceedingly complex data components that demand more processing power. Despite being mature, classical machine learning still has difficulties with big data applications. Furthermore, quantum technology has revolutionized how machine learning is thought of, by employing quantum features to address optimization issues. Since quantum hardware is currently extremely noisy, it is not practicable to run machine learning algorithms on it without risking the production of inaccurate results. To discover the advantages of quantum versus classical approaches, this research has concentrated on colored image data. Deep learning classification models are currently being created on quantum platforms, but they are still at a very early stage. Black-and-white benchmark image datasets like MNIST and Fashion-MNIST have been used in recent research. MNIST and CIFAR-10 were compared for binary classification, but the comparison showed that MNIST performed more accurately than colored CIFAR-10. This research evaluates the performance of a QML algorithm on the colored benchmark dataset CIFAR-10 to advance QML's real-time applicability. However, quantum deep learning classification models such as the Quantum Convolutional Neural Network (QCNN) have not yet been developed for colored images to determine how much better they are than classical approaches. Only a few models, such as quantum variational circuits, take colored images. The methodology adopted in this research is a hybrid approach using PennyLane as a simulator. To process the 10 classes of CIFAR-10, the image data were translated into grey scale, and 28 × 28-pixel images comprising 10,000 test and 50,000 training images were used. The objective of this work is to determine how much the quantum approach can outperform a classical approach for a comprehensive dataset of color images. After pre-processing 50,000 images on a classical computer, the QCNN model adopted a hybrid method and encoded the images into a quantum simulator for feature extraction using quantum gate rotations. The measurements were carried out on the classical computer after the rotations were applied. According to the results, we note that the QCNN approach is ~12% more effective than traditional classical CNN approaches, and it is possible that applying data augmentation may increase the accuracy. This study has demonstrated that quantum machine and deep learning models can be relatively superior to classical machine learning approaches in terms of processing speed and accuracy when used to perform classification on colored image classes.
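
The hybrid encode-rotate-measure loop described above can be sketched with a few lines of PennyLane; the tiny circuit below angle-encodes a grey-scale pixel patch, applies one layer of parameterised rotations and entangling gates, and returns Pauli-Z expectation values to be fed to a classical classifier. The patch size, gate layout and parameter values are illustrative assumptions and not the authors' QCNN architecture:

    import pennylane as qml
    from pennylane import numpy as np

    n_qubits = 4
    dev = qml.device("default.qubit", wires=n_qubits)

    @qml.qnode(dev)
    def quantum_features(pixels, weights):
        # angle-encode normalised grey-scale pixel values
        for i in range(n_qubits):
            qml.RY(np.pi * pixels[i], wires=i)
        # one "convolutional" layer: trainable rotations plus entanglement
        for i in range(n_qubits):
            qml.RY(weights[i], wires=i)
        for i in range(n_qubits - 1):
            qml.CNOT(wires=[i, i + 1])
        return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

    patch = np.array([0.1, 0.5, 0.8, 0.3])                 # one 2x2 patch of an image
    weights = np.random.uniform(0, np.pi, n_qubits)
    print(quantum_features(patch, weights))                # features for a classical head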

Keywords: CIFAR-10, quantum convolutional neural networks, quantum deep learning, quantum machine learning

76 The Association of Work Stress with Job Satisfaction and Occupational Burnout in Nurse Anesthetists

Authors: I. Ling Tsai, Shu Fen Wu, Chen-Fuh Lam, Chia Yu Chen, Shu Jiuan Chen, Yen Lin Liu

Abstract:

Purpose: Following the implementation of the National Health Insurance (NHI) system in Taiwan in 1995, the demand for anesthesia services has continued to increase in operating rooms and other medical units. It is well recognized that increased work stress not only affects the clinical performance of medical staff but that long-term workload may also result in occupational burnout. Our study aimed to determine the influence of the working environment, work stress and job satisfaction on occupational burnout in nurse anesthetists. The ultimate goal of this research project is to develop a strategy for establishing a friendly, less stressful workplace for nurse anesthetists to enhance their job satisfaction, thereby reducing occupational burnout and lengthening the careers of nurse anesthetists. Methods: This was a cross-sectional, descriptive study performed in a metropolitan teaching hospital in southern Taiwan between May 2017 and July 2017. A structured self-administered questionnaire, modified from the Practice Environment Scale of the Nursing Work Index (PES-NWI), Occupational Stress Indicator 2 (OSI-2) and Maslach Burnout Inventory (MBI) manual, was collected from the nurse anesthetists. The relationships between pairs of numeric variables were analyzed by the Pearson correlation test (SPSS 20.0). Results: A total of 66 completed questionnaires were collected from 75 nurses (response rate 88%). The average scores for the working environment, job satisfaction, and work stress were 69.6%, 61.5%, and 63.9%, respectively. The three perspectives used to assess occupational burnout, namely emotional exhaustion, depersonalization and sense of personal accomplishment, scored 26.3, 13.0 and 24.5, respectively, suggesting the presence of moderate to high degrees of burnout in our nurse anesthetists. The presence of occupational burnout was closely correlated with an unsatisfactory working environment (r=-0.385, P=0.001) and reduced job satisfaction (r=-0.430, P<0.001). Junior nurse anesthetists (<1 year of clinical experience) reported higher satisfaction with the working environment than seniors (5 to 10 years of clinical experience) (P=0.02). Although the average scores for work stress, job satisfaction, and occupational burnout were lower in junior nurses, the differences were not statistically significant. In the linear regression model, the working environment was the independent factor that predicted up to 19.8% of occupational burnout in nurse anesthetists. Conclusions: High occupational burnout is more likely to develop in senior nurse anesthetists who experience a dissatisfying working environment, work stress and lower job satisfaction. In addition to the regulation of clinical duties, the increased workload involved in supervising junior nurse anesthetists may result in emotional stress and burnout in senior nurse anesthetists. Therefore, appropriate adjustment of the clinical and teaching load of senior nurse anesthetists could help to reduce occupational burnout and enhance the retention rate.

Keywords: nurse anesthetists, working environment, work stress, job satisfaction, occupational burnout

75 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find an efficient way to predict the yield of corn based on meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to their different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with the environment as dynamical systems. But the calibration process of such a dynamical system is difficult, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of dataset (climatic data and final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach to yield prediction is free of the complex biophysical process, but it has some strict requirements on the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods: methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-fold cross-validation process and two accuracy metrics, root mean square error of prediction (RMSEP) and mean absolute error of prediction (MAEP), were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from easily accessible datasets offers several side perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.
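
The data-driven side of the comparison, 5-fold cross-validation of a Random Forest with RMSEP and MAEP as accuracy metrics, can be sketched with scikit-learn as follows (the synthetic regression data stands in for the 720 USDA county records, so the numbers produced are not comparable to the paper's 4.27%):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold, cross_val_predict

    # stand-in for 720 county-level records of climatic features and corn yield
    X, y = make_regression(n_samples=720, n_features=20, noise=10.0, random_state=1)

    rf = RandomForestRegressor(n_estimators=300, random_state=1)
    pred = cross_val_predict(rf, X, y,
                             cv=KFold(n_splits=5, shuffle=True, random_state=1))

    rmsep = np.sqrt(np.mean((y - pred) ** 2))   # root mean square error of prediction
    maep = np.mean(np.abs(y - pred))            # mean absolute error of prediction
    print(f"RMSEP={rmsep:.2f}  MAEP={maep:.2f}")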

Keywords: crop yield prediction, crop model, sensitivity analysis, parameter estimation, particle swarm optimization, random forest

74 Improving Predictions of Coastal Benthic Invertebrate Occurrence and Density Using a Multi-Scalar Approach

Authors: Stephanie Watson, Fabrice Stephenson, Conrad Pilditch, Carolyn Lundquist

Abstract:

Spatial data detailing both the distribution and density of functionally important marine species are needed to inform management decisions. Species distribution models (SDMs) have proven helpful in this regard; however, models often focus only on species occurrences derived from spatially expansive datasets and lack the resolution and detail required to inform regional management decisions. Boosted regression trees (BRT) were used to produce high-resolution SDMs (250 m) at two spatial scales predicting probability of occurrence, abundance (count per sample unit), density (count per km2) and uncertainty for seven coastal seafloor taxa that vary in habitat usage and distribution to examine prediction differences and implications for coastal management. We investigated if small scale regionally focussed models (82,000 km2) can provide improved predictions compared to data-rich national scale models (4.2 million km2). We explored the variability in predictions across model type (occurrence vs abundance) and model scale to determine if specific taxa models or model types are more robust to geographical variability. National scale occurrence models correlated well with broad-scale environmental predictors, resulting in higher AUC (Area under the receiver operating curve) and deviance explained scores; however, they tended to overpredict in the coastal environment and lacked spatially differentiated detail for some taxa. Regional models had lower overall performance, but for some taxa, spatial predictions were more differentiated at a localised ecological scale. National density models were often spatially refined and highlighted areas of ecological relevance producing more useful outputs than regional-scale models. The utility of a two-scale approach aids the selection of the most optimal combination of models to create a spatially informative density model, as results contrasted for specific taxa between model type and scale. However, it is vital that robust predictions of occurrence and abundance are generated as inputs for the combined density model as areas that do not spatially align between models can be discarded. This study demonstrates the variability in SDM outputs created over different geographical scales and highlights implications and opportunities for managers utilising these tools for regional conservation, particularly in data-limited environments.

Keywords: benthic ecology, spatial modelling, multi-scalar modelling, marine conservation

73 A Novel Chicken W Chromosome Specific Tandem Repeat

Authors: Alsu F. Saifitdinova, Alexey S. Komissarov, Svetlana A. Galkina, Elena I. Koshel, Maria M. Kulak, Stephen J. O'Brien, Elena R. Gaginskaya

Abstract:

The mystery of sex determination is one of the most ancient and still not fully solved. In many species, sex determination is genetic and often accompanied by the presence of dimorphic sex chromosomes in the karyotype. Genomic sequencing has provided information about the gene content of sex chromosomes, which has revealed their origin from ordinary autosomes and allowed their evolutionary history to be traced. The female-specific W chromosome in birds, like the mammalian male-specific Y chromosome, is characterized by degeneration of gene content and accumulation of repetitive DNA. Tandem repeats complicate the analysis of genomic data. Despite the best efforts, the chicken W chromosome assembly includes only 1.2 Mb of the expected 55 Mb. Supplementing the information on sex chromosome composition not only helps to complete genome assemblies but also moves us toward understanding the evolution of sex-determination systems. A whole-genome survey of the assembly Gallus_gallus WASHUC 2.60 was applied to search for repeats in the assembled genome, and high-copy-number repeats were searched for and assembled from unassembled reads of the SRR867748 short-read dataset. For cytogenetic analysis, conventional fluorescence in situ hybridization was used for previously cloned W-specific satellites, and a specifically designed, directly labeled synthetic oligonucleotide DNA probe was used for the bioinformatically identified repetitive sequence. Hybridization was performed with mitotic chicken chromosomes and manually isolated giant meiotic lampbrush chromosomes from growing oocytes. A novel chicken W-specific satellite, (GGAAA)n, which does not co-localize with any previously described class of W-specific repeats, was identified and mapped with high resolution. In the composition of autosomes, units of this repeat were found as part of the upstream regions of gonad-specific protein-coding sequences. These findings may contribute to the understanding of the role of tandem repeats in the regulation of sex-specific differentiation in birds and in sex chromosome evolution. This work was supported by postdoctoral fellowships from St. Petersburg State University (#1.50.1623.2013 and #1.50.1043.2014), the grant for Leading Scientific Schools (#3553.2014.4) and a grant from the Russian Foundation for Basic Research (#15-04-05684). The equipment and software of the Research Resource Center “Chromas” and the Theodosius Dobzhansky Center for Genome Bioinformatics of Saint Petersburg State University were used.
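
Locating tandem arrays of a short unit such as (GGAAA)n in assembled sequence or reads is, at its simplest, a pattern-matching exercise; the small Python sketch below finds runs of at least a given number of unit copies and reports their position and copy number (the minimum-copy threshold and the toy read are arbitrary choices, and the actual repeat-assembly pipeline applied to the SRR867748 reads was considerably more involved):

    import re

    def tandem_arrays(seq, unit="GGAAA", min_copies=3):
        """Return (start, copy_number) for each run of >= min_copies tandem units."""
        pattern = re.compile(f"(?:{unit}){{{min_copies},}}")
        return [(m.start(), len(m.group()) // len(unit)) for m in pattern.finditer(seq)]

    read = "TT" + "GGAAA" * 4 + "CCCTA" + "GGAAAGG"
    print(tandem_arrays(read))   # [(2, 4)]: a 4-copy array starting at position 2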

Keywords: birds, lampbrush chromosomes, sex chromosomes, tandem repeats

72 The Usage of Negative Emotive Words in Twitter

Authors: Martina Katalin Szabó, István Üveges

Abstract:

In this paper, the usage of negative emotive words is examined on the basis of a large Hungarian Twitter database via NLP methods. The data are analysed from a gender point of view, as well as for changes in language usage over time. The term negative emotive word refers to those words that, on their own, without context, have semantic content that can be associated with negative emotion, but in particular cases, they may function as intensifiers (e.g. rohadt jó ’damn good’) or as a sentiment expression with positive polarity despite their negative prior polarity (e.g. brutális, ahogy ez a férfi rajzol ’it’s awesome (lit. brutal) how this guy draws’). Based on the findings of several authors, the same phenomenon can be found in other languages, so it is probably a language-independent feature. For the present analysis, 67783 tweets were collected: 37818 tweets (19580 written by females and 18238 written by males) in 2016 and 48344 (18379 written by females and 29965 written by males) in 2021. The goal of the research was to compile two datasets comparable from the viewpoint of semantic changes, as well as of gender specificities. An exhaustive lexicon of Hungarian negative emotive intensifiers was also compiled (containing 214 words). After basic preprocessing steps, tweets were processed by ‘magyarlanc’, a toolkit written in Java for the linguistic processing of Hungarian texts. Then, the frequency and collocation features of all these words in our corpus were automatically analyzed (via the analysis of the parts of speech and sentiment values of the co-occurring words). Finally, the results of all four subcorpora were compared. Some of the main outcomes of our analyses are as follows: There are almost four times fewer cases in the male corpus compared to the female corpus in which a negative emotive intensifier modifies a negative polarity word in the tweet (e.g., damn bad). At the same time, male authors used these intensifiers more frequently to modify a positive polarity or a neutral word (e.g., damn good and damn big). Results also pointed out that, in contrast to female authors, male authors used these words much more frequently as positive polarity words themselves (e.g., brutális, ahogy ez a férfi rajzol ’it’s awesome (lit. brutal) how this guy draws’). We also observed that male authors use significantly fewer types of emotive intensifiers than female authors, and the frequency proportions of the words are more balanced in the female corpus. As for changes in language usage over time, some notable differences in the frequency and collocation features of the words examined were identified: some of the words collocate with more positive words in the 2nd subcorpus than in the 1st, which points to the semantic change of these words over time.

Keywords: gender differences, negative emotive words, semantic changes over time, twitter

71 Bivariate Analyses of Factors That May Influence HIV Testing among Women Living in the Democratic Republic of the Congo

Authors: Danielle A. Walker, Kyle L. Johnson, Patrick J. Fox, Jacen S. Moore

Abstract:

The HIV Continuum of Care has become a universal model to provide context for the process of HIV testing, linkage to care, treatment, and viral suppression. HIV testing is the first step in moving toward community viral suppression. Countries with a lower socioeconomic status experience the lowest rates of testing and access to care. The Democratic Republic of the Congo is located in the heart of sub-Saharan Africa, where testing and access to care are low and women experience higher HIV prevalence compared to men. In the Democratic Republic of the Congo there is only a 21.6% HIV testing rate among women. Because a critical gap exists between a woman’s risk of contracting HIV and the decision to be tested, this study was conducted to obtain a better understanding of the relationship between factors that could influence HIV testing among women. The datasets analyzed were from the 2013-14 Democratic Republic of the Congo Demographic and Health Survey Program. The data was subset for women with an age range of 18-49 years. All missing cases were removed and one variable was recoded. The total sample size analyzed was 14,982 women. The results showed that there did not seem to be a difference in HIV testing by mean age. Out of 11 religious categories (Catholic, Protestant, Armee de salut, Kimbanguiste, Other Christians, Muslim, Bundu dia kongo, Vuvamu, Animist, no religion, and other), those who identified as Other Christians had the highest testing rate of 25.9% and those identified as Vuvamu had a 0% testing rate (p<0.001). There was a significant difference in testing by religion. Only 0.7% of women surveyed identified as having no religious affiliation. This suggests partnerships with key community and religious leaders could be a tool to increase testing. Over 60% of women who had never been tested for HIV did not know where to be tested. This highlights the need to educate communities on where testing facilities can be located. Almost 80% of women who believed HIV could be transmitted by supernatural means and/or witchcraft had never been tested before (p=0.08). Cultural beliefs could influence risk perception and testing decisions. Consequently, misconceptions need to be considered when implementing HIV testing and prevention programs. Location by province, years of education, and wealth index were also analyzed to control for socioeconomic status. Kinshasa had the highest testing rate of 54.2% of women living there, and both Equateur and Kasai-Occidental had less than a 10% testing rate (p<0.001). As the education level increased up to 12 years, testing increased (p<0.001). Women within the highest quintile of the wealth index had a 56.1% testing rate, and women within the lowest quintile had a 6.5% testing rate (p<0.001). This study concludes that further research is needed to identify culturally competent methods to increase HIV education programs, build partnerships with key community leaders, and improve knowledge on access to care.
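
Bivariate comparisons of a binary outcome (ever tested for HIV) against categorical factors such as religion, province or wealth quintile are typically run as chi-square tests on cross-tabulations; the short pandas/SciPy sketch below shows the pattern on invented survey rows (the variable names and values are placeholders, not DHS variable codes):

    import pandas as pd
    from scipy.stats import chi2_contingency

    # stand-in for the woman-level DHS extract
    df = pd.DataFrame({
        "ever_tested": ["yes", "no", "no", "yes", "no", "no", "yes", "no"],
        "knows_where": ["yes", "no", "no", "yes", "yes", "no", "yes", "no"],
    })

    table = pd.crosstab(df["knows_where"], df["ever_tested"])
    chi2, p, dof, expected = chi2_contingency(table)
    print(table)
    print(f"chi2={chi2:.2f}, dof={dof}, p={p:.3f}")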

Keywords: Democratic Republic of the Congo, cultural beliefs, education, HIV testing

70 A Prospective Study of a Clinically Significant Anatomical Change in Head and Neck Intensity-Modulated Radiation Therapy Using Transit Electronic Portal Imaging Device Images

Authors: Wilai Masanga, Chirapha Tannanonta, Sangutid Thongsawad, Sasikarn Chamchod, Todsaporn Fuangrod

Abstract:

Major factors in radiotherapy for head and neck (HN) cancers include the patient’s anatomical changes and tumour shrinkage. These changes can significantly affect the planned dose distribution and cause the treatment plan to deteriorate. Comparing measured transit EPID images to predicted EPID images using gamma analysis has been clinically implemented to verify dose accuracy as part of an adaptive radiotherapy protocol. However, a global gamma analysis is not sensitive to some critical organ changes, as the entire treatment field is compared. The objective of this feasibility study is to evaluate the dosimetric response to patient anatomical changes during the treatment course in HN IMRT (Head and Neck Intensity-Modulated Radiation Therapy) using a novel comparison method: organ-of-interest gamma analysis. This method provides more sensitive detection of specific organ changes. Five HN IMRT patients who were replanned because of tumour shrinkage and weight loss that critically affected parotid size were randomly selected, and their transit dosimetry was evaluated. A comprehensive physics-based model was used to generate a series of predicted transit EPID images for each gantry angle from the original computed tomography (CT) and replan CT datasets. The patient structures, including the left and right parotids, spinal cord, and planning target volume (PTV56), were projected to the EPID level. The agreement between the transit images generated from the original CT and the replanned CT was quantified using gamma analysis with 3%, 3 mm criteria. Moreover, the gamma pass-rate was calculated only within each projected structure. The gamma pass-rates in the right parotid and PTV56 between the predicted transit images of the original CT and the replan CT were 42.8% (±17.2%) and 54.7% (±21.5%), respectively. The gamma pass-rates for the other projected organs were greater than 80%. Additionally, the results of the organ-of-interest gamma analysis were compared with 3-dimensional cone-beam computed tomography (3D-CBCT) and the rationale for replanning given by the radiation oncologists. This showed that using only the registration of 3D-CBCT to the original CT does not capture the dosimetric impact of anatomical changes. Using transit EPID images with organ-of-interest gamma analysis can provide additional information for assessing the suitability of a treatment plan.
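
The organ-of-interest idea, computing the gamma pass-rate only over the pixels of a projected structure, can be illustrated with a deliberately simplified 2D gamma search in NumPy. The global 3%/3 mm normalisation, the pixel spacing and the brute-force neighbourhood search below are simplifying assumptions for illustration, not the clinical implementation used in the study:

    import numpy as np

    def masked_gamma_pass_rate(ref, evl, mask, pixel_mm=1.0, dd=0.03, dta_mm=3.0):
        """Simplified 2D gamma (dose-difference / distance-to-agreement) pass-rate,
        evaluated only at pixels inside the projected organ mask."""
        search = int(np.ceil(dta_mm / pixel_mm))           # search radius in pixels
        dose_norm = dd * ref.max()                         # global dose-difference criterion
        ny, nx = ref.shape
        gammas = []
        for y, x in zip(*np.nonzero(mask)):
            y0, y1 = max(0, y - search), min(ny, y + search + 1)
            x0, x1 = max(0, x - search), min(nx, x + search + 1)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            dist2 = ((yy - y) ** 2 + (xx - x) ** 2) * pixel_mm ** 2
            diff2 = (evl[y0:y1, x0:x1] - ref[y, x]) ** 2
            gammas.append(np.sqrt((diff2 / dose_norm ** 2 + dist2 / dta_mm ** 2).min()))
        return 100.0 * np.mean(np.array(gammas) <= 1.0)

    ref = np.random.default_rng(0).random((64, 64))
    evl = ref + np.random.default_rng(1).normal(0, 0.01, ref.shape)
    parotid_mask = np.zeros_like(ref, dtype=bool)
    parotid_mask[20:35, 20:35] = True                      # toy projected parotid
    print(masked_gamma_pass_rate(ref, evl, parotid_mask))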

Keywords: re-plan, anatomical change, transit electronic portal imaging device (EPID), head and neck

69 Learning to Translate by Learning to Communicate to an Entailment Classifier

Authors: Szymon Rutkowski, Tomasz Korbak

Abstract:

We present a reinforcement-learning-based method of training neural machine translation models without parallel corpora. The standard encoder-decoder approach to machine translation suffers from two problems we aim to address. First, it needs parallel corpora, which are scarce, especially for low-resource languages. Second, it lacks psychological plausibility of the learning procedure: learning a foreign language is about learning to communicate useful information, not merely learning to transduce from one language’s 'encoding' to another. We instead pose the problem of learning to translate as learning a policy in a communication game between two agents: the translator and the classifier. The classifier is trained beforehand on a natural language inference task (determining the entailment relation between a premise and a hypothesis) in the target language. The translator produces a sequence of actions that correspond to generating translations of both the hypothesis and the premise, which are then passed to the classifier. The translator is rewarded for the classifier’s performance in determining entailment between the sentences translated by the translator into the classifier’s language. The translator’s performance thus reflects its ability to communicate useful information to the classifier. In effect, we train a machine translation model without the need for parallel corpora altogether. While similar reinforcement learning formulations for zero-shot translation have been proposed before, we introduce a number of improvements. While prior research aimed at grounding the translation task in the physical world by evaluating agents on an image captioning task, we found that using a linguistic task is more sample-efficient. Natural language inference (also known as recognizing textual entailment) captures semantic properties of sentence pairs that are poorly correlated with semantic similarity, thus enforcing a basic understanding of the role played by compositionality. It has been shown that models trained to recognize textual entailment produce high-quality general-purpose sentence embeddings transferrable to other tasks. We use the Stanford Natural Language Inference (SNLI) dataset as well as analogous datasets for French (XNLI) and Polish (CDSCorpus). Textual entailment corpora can be obtained relatively easily for any language, which makes our approach more extensible to low-resource languages than traditional approaches based on parallel corpora. We evaluated a number of reinforcement learning algorithms (including policy gradients and actor-critic) to solve the problem of the translator’s policy optimization and found that our attempts yield some promising improvements over previous approaches to reinforcement-learning-based zero-shot machine translation.

Keywords: agent-based language learning, low-resource translation, natural language inference, neural machine translation, reinforcement learning

68 Detailed Quantum Circuit Design and Evaluation of Grover's Algorithm for the Bounded Degree Traveling Salesman Problem Using the Q# Language

Authors: Wenjun Hou, Marek Perkowski

Abstract:

The Traveling Salesman Problem is famous in computing and graph theory. In short, it asks for the Hamiltonian cycle of the least total weight in a given graph with N nodes. All variations on this problem, such as those with K-bounded-degree nodes, are classified as NP-complete in classical computing. Although several papers propose theoretical high-level designs of quantum algorithms for the Traveling Salesman Problem, to the best of our knowledge no quantum circuit implementation of these algorithms has been created. In contrast to previous papers, the goal of this paper is not to optimize some abstract complexity measures based on the number of oracle iterations, but to be able to evaluate the real circuit and time costs of the quantum computer. Using the emerging quantum programming language Q# developed by Microsoft, which runs quantum circuits in a quantum computer simulation, an implementation of the bounded-degree problem and its respective quantum circuit were created. To apply Grover’s algorithm to this problem, a quantum oracle was designed that evaluates the cost of a particular set of edges in the graph as well as its validity as a Hamiltonian cycle. Repeating the Grover algorithm with an oracle that finds a successively lower cost each time makes it possible to transform the decision problem into an optimization problem, finding the minimum cost of Hamiltonian cycles. N log₂ K qubits are put into an equiprobable superposition by applying the Hadamard gate to each qubit. Within these N log₂ K qubits, the method uses an encoding in which every node is mapped to a set of its encoded edges. The oracle consists of several blocks of circuits: a custom-written edge weight adder, node index calculator, uniqueness checker, and comparator, which were all created using only quantum Toffoli gates, including their special forms, the Feynman (CNOT) and Pauli X gates. The oracle begins by using the edge encodings specified by the qubits to calculate each node that the path visits and adding up the edge weights along the way. Next, the oracle uses the calculated nodes from the previous step and checks that all the nodes are unique. Finally, the oracle checks that the calculated cost is less than the previously calculated cost. By performing the oracle an optimal number of times, a correct answer can be generated with very high probability. The oracle of the Grover algorithm is then modified using the recalculated minimum cost value, and this procedure is repeated until the cost cannot be further reduced. This algorithm and circuit design have been verified, using several datasets, to generate correct outputs.
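
Since the paper's contribution is a Q# circuit, any small snippet here can only mirror the control flow classically; the NumPy sketch below simulates the oracle-plus-diffusion structure of Grover amplitude amplification over an abstract search space, where a Boolean oracle marks basis states standing in for edge encodings whose tour cost beats the current best (the encoding itself and the cost-threshold loop are assumptions, not the paper's circuit):

    import numpy as np

    def grover_search(marked, n_qubits):
        """Classical state-vector simulation of Grover's algorithm.
        marked: boolean array of length 2**n_qubits, True where the oracle flips the phase."""
        N = 2 ** n_qubits
        state = np.full(N, 1 / np.sqrt(N))           # Hadamard on every qubit
        m = int(marked.sum())
        if m == 0:
            return None
        iterations = int(np.floor(np.pi / 4 * np.sqrt(N / m)))
        for _ in range(iterations):
            state[marked] *= -1                      # oracle: phase flip on marked states
            state = 2 * state.mean() - state         # diffusion: inversion about the mean
        probs = state ** 2
        return int(np.argmax(probs)), float(probs.max())

    # toy search space of 2**6 encodings; mark the ones the oracle would accept
    rng = np.random.default_rng(0)
    marked = np.zeros(64, dtype=bool)
    marked[rng.choice(64, size=2, replace=False)] = True
    print(grover_search(marked, 6))                  # returns a marked index with high probability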

Keywords: quantum computing, quantum circuit optimization, quantum algorithms, hybrid quantum algorithms, quantum programming, Grover’s algorithm, traveling salesman problem, bounded-degree TSP, minimal cost, Q# language

67 Household Climate-Resilience Index Development for the Health Sector in Tanzania: Use of Demographic and Health Surveys Data Linked with Remote Sensing

Authors: Heribert R. Kaijage, Samuel N. A. Codjoe, Simon H. D. Mamuya, Mangi J. Ezekiel

Abstract:

There is strong evidence that the climate has changed significantly, affecting various sectors including public health. The recommended feasible solution is adopting development trajectories which combine both mitigation and adaptation measures to improve resilience pathways. This approach demands consideration of the complex interactions between climate and social-ecological systems. While other sectors such as agriculture and water have developed climate resilience indices, the public health sector in Tanzania is still lagging behind. The aim of this study was to find out how Demographic and Health Surveys (DHS) linked with remote sensing (RS) technology and meteorological information can be used as tools to inform climate-resilient development and evaluation for the health sector. A methodological review was conducted whereby a number of studies were content-analyzed to find appropriate indicators and indices of household climate resilience and approaches to their integration. These indicators were critically reviewed, listed, filtered and their sources determined. Preliminary identification and ranking of indicators were conducted using a participatory approach of pairwise weighting by selected national stakeholders from meetings/conferences on human health and climate change sciences in Tanzania. DHS datasets were retrieved from the MEASURE Evaluation project, processed and critically analyzed for possible climate change indicators. Other sources of indicators of climate change exposure were also identified. For the purpose of preliminary reporting, the operationalization of selected indicators was discussed to produce the methodological approach to be used in a comparative resilience analysis. It was found that the household climate resilience index depends on the combination of three indices, namely Household Adaptive and Mitigation Capacity (HC), Household Health Sensitivity (HHS) and Household Exposure Status (HES). It was also found that DHS data alone cannot support resilience evaluation unless integrated with other data sources, notably flooding data as a measure of vulnerability, remote sensing imagery of the Normalized Difference Vegetation Index (NDVI) and meteorological data (deviation from rainfall patterns). It can be concluded that if these indices retrieved from DHS datasets are computed and scientifically integrated, they can produce a single climate resilience index, and resilience maps could be generated at different spatial and temporal scales to enhance targeted interventions for climate-resilient development and evaluation. However, further studies are needed to test the sensitivity of the index in comparative resilience analyses among selected regions.

Keywords: climate change, resilience, remote sensing, demographic and health surveys

Procedia PDF Downloads 134
66 Federated Knowledge Distillation with Collaborative Model Compression for Privacy-Preserving Distributed Learning

Authors: Shayan Mohajer Hamidi

Abstract:

Federated learning has emerged as a promising approach for distributed model training while preserving data privacy. However, the challenges of communication overhead, limited network resources, and slow convergence hinder its widespread adoption. On the other hand, knowledge distillation has shown great potential in compressing large models into smaller ones without significant loss in performance. In this paper, we propose an innovative framework that combines federated learning and knowledge distillation to address these challenges and enhance the efficiency of distributed learning. Our approach, called Federated Knowledge Distillation (FKD), enables multiple clients in a federated learning setting to collaboratively distill knowledge from a teacher model. By leveraging the collaborative nature of federated learning, FKD aims to improve model compression while maintaining privacy. The proposed framework utilizes a coded teacher model that acts as a reference for distilling knowledge to the client models. To demonstrate the effectiveness of FKD, we conduct extensive experiments on various datasets and models. We compare FKD with baseline federated learning methods and standalone knowledge distillation techniques. The results show that FKD achieves superior model compression, faster convergence, and improved performance compared to traditional federated learning approaches. Furthermore, FKD effectively preserves privacy by ensuring that sensitive data remains on the client devices and only distilled knowledge is shared during the training process. In our experiments, we explore different knowledge transfer methods within the FKD framework, including Fine-Tuning (FT), FitNet, Correlation Congruence (CC), Similarity-Preserving (SP), and Relational Knowledge Distillation (RKD). We analyze the impact of these methods on model compression and convergence speed, shedding light on the trade-offs between size reduction and performance. Moreover, we address the challenges of communication efficiency and network resource utilization in federated learning by leveraging the knowledge distillation process. FKD reduces the amount of data transmitted across the network, minimizing communication overhead and improving resource utilization. This makes FKD particularly suitable for resource-constrained environments such as edge computing and IoT devices. The proposed FKD framework opens up new avenues for collaborative and privacy-preserving distributed learning. By combining the strengths of federated learning and knowledge distillation, it offers an efficient solution for model compression and convergence speed enhancement. Future research can explore further extensions and optimizations of FKD, as well as its applications in domains such as healthcare, finance, and smart cities, where privacy and distributed learning are of paramount importance.
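
As an illustration of the client-side distillation step that a framework like FKD would rely on, the sketch below shows standard soft-target distillation from a shared teacher in PyTorch. The temperature, loss weighting, and training-loop structure are generic assumptions; the coded-teacher mechanism and the federated aggregation specific to FKD are not reproduced here.

```python
# Minimal sketch of per-client knowledge distillation against a shared teacher.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-target loss: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target loss on the client's own (private) labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard


def local_distillation_round(student, teacher, loader, optimizer, device="cpu"):
    """One local training pass; only the distilled student would be shared upstream."""
    student.train()
    teacher.eval()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            t_logits = teacher(x)
        loss = distillation_loss(student(x), t_logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```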

Keywords: federated learning, knowledge distillation, knowledge transfer, deep learning

Procedia PDF Downloads 44
65 Exploring Male and Female Consumers’ Perceptions of Clothing Retailers’ CSR Initiatives in South Africa

Authors: Gerhard D. Muller, Nadine C. Sonnenberg, Suné Donoghue

Abstract:

This study delves into the intricacies of male and female consumers’ perceptions of Corporate Social Responsibility (CSR) in the South African clothing retail sector, a sector experiencing increasing consumption, yet facing significant environmental and social challenges. The aim is to discern differences between male and female consumers’ perceptions of clothing retailers’ CSR initiatives based on the Triple Bottom Line (TBL) framework, which evaluates organizational sustainability across social, environmental, and economic domains. Methodologically, the study is embedded in a quantitative research paradigm adopting a cross-sectional survey design. A purposive sampling strategy was used to recruit male and female respondents from a diverse South African demographic background. A structured questionnaire was developed and included established consumer CSR perception scales that were adapted for the purposes of this study. The questionnaire was distributed via online platforms. The data collected from the online survey were split by gender to allow for comparison between male and female consumers’ perceptions of clothing retailers’ CSR initiatives. Exploratory Factor Analysis (EFA) was conducted on each of the datasets. The EFA for females revealed a five-factor solution, whereas the male EFA presented a six-factor solution, with the notable addition of an Economic Performance dimension. Results indicate subtle differences in the gender groups’ CSR perceptions. While both genders seem to value clothing retailers’ focus on quality services, females seem to have more pronounced perceptions surrounding clothing retailers’ contributions to social and environmental causes. Males, on the other hand, seem to be more discerning in their perceptions surrounding clothing retailers’ support of social and environmental causes. Ethical stakeholder relationships emerged as a shared concern across genders. Still, males presented a distinct factor, Economic Performance, highlighting a gendered divergence in the weighting of economic success and financial performance in CSR evaluation. The implications of these results are multifaceted. Theoretically, the study enriches the discourse on CSR by integrating gender insights into the TBL framework, offering a greater understanding of consumers’ CSR perceptions in the South African clothing retail context. Practically, it provides actionable insights for clothing retailers, suggesting that CSR initiatives should be gender-sensitive and communicate the TBL's elements effectively to resonate with the pertinent concerns of each segment. Additionally, the findings advocate for a contextualized approach to CSR in emerging markets that aligns with local cultural and social differences.
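
A minimal sketch of the gender-split exploratory factor analysis described above is given below, using the open-source factor_analyzer package. The column names, the file name, the oblique rotation, and the factor-retention choice are illustrative assumptions, not the authors' analysis script.

```python
# Sketch: run EFA separately on the female and male response sets.
import pandas as pd
from factor_analyzer import FactorAnalyzer


def run_efa(responses: pd.DataFrame, n_factors: int) -> pd.DataFrame:
    fa = FactorAnalyzer(n_factors=n_factors, rotation="oblimin")
    fa.fit(responses)
    # Rows are questionnaire items, columns are the extracted CSR-perception factors.
    return pd.DataFrame(fa.loadings_, index=responses.columns)


# survey = pd.read_csv("csr_survey.csv")                          # hypothetical file
# female = survey[survey["gender"] == "F"].drop(columns=["gender"])
# male = survey[survey["gender"] == "M"].drop(columns=["gender"])
# print(run_efa(female, n_factors=5))   # five-factor solution reported for females
# print(run_efa(male, n_factors=6))     # six factors, incl. Economic Performance
```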

Keywords: consumer perceptions, corporate social responsibility, gender differentiation, triple bottom line

Procedia PDF Downloads 34
64 Development of an EEG-Based Real-Time Emotion Recognition System on Edge AI

Authors: James Rigor Camacho, Wansu Lim

Abstract:

Over the last few years, the development of new wearable and processing technologies has accelerated in order to harness physiological data such as electroencephalograms (EEGs) for EEG-based applications. EEG has been demonstrated to be a source of emotion recognition signals with the highest classification accuracy among physiological signals. However, when emotion recognition systems are used for real-time classification, the training unit is frequently left to run offline or in the cloud rather than working locally on the edge. That strategy has hampered research, and the full potential of using an edge AI device has yet to be realized. Edge AI devices are high-performance computers that can process complex algorithms. They can collect, process, and store data on their own, and can run complicated algorithms such as localization, detection, and recognition in real-time applications, making them powerful embedded devices. The NVIDIA Jetson series, specifically the Jetson Nano device, was used in the implementation. The cEEGrid, which is integrated with the open-source brain-computer interface platform OpenBCI, is used to collect EEG signals. An EEG-based real-time emotion recognition system on edge AI is proposed in this paper. Machine learning-based classifiers were used to perform graphical spectrogram categorization of the EEG signals and to predict emotional states from the properties of the input data. The EEG signals are analyzed with the K-Nearest Neighbors (KNN) technique, a supervised learning algorithm, until the emotional state is identified. In EEG signal processing, each EEG signal received in real time is translated from the time domain to the frequency domain using the Fast Fourier Transform (FFT) to observe the frequency bands in the signal. To appropriately capture the variance of each EEG frequency band, the power density, standard deviation, and mean are calculated and employed as features. The next stage is to use these selected features to predict emotion from the EEG data with the KNN technique. Arousal and valence datasets are used to train the parameters defined by the KNN technique. Because classification and recognition of specific classes, as well as emotion prediction, are conducted both online and locally on the edge, the KNN technique improved the performance of the emotion recognition system on the NVIDIA Jetson Nano. Finally, this implementation aims to bridge the research gap on cost-effective and efficient real-time emotion recognition using a resource-constrained hardware device such as the NVIDIA Jetson Nano. EEG-based emotion identification on the edge can be employed in applications that rapidly expand both research and industry adoption.
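
The sketch below illustrates the processing chain described above: FFT of an EEG window, per-band mean, standard deviation, and power features, then KNN classification on arousal/valence-derived labels. The band edges, sampling rate, window handling, and k value are assumptions for illustration.

```python
# Sketch: FFT band features per EEG window, then KNN emotion classification.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def band_features(eeg_window, fs=250):
    """eeg_window: 1-D array of raw EEG samples from one channel."""
    freqs = np.fft.rfftfreq(len(eeg_window), d=1.0 / fs)
    power = np.abs(np.fft.rfft(eeg_window)) ** 2
    feats = []
    for lo, hi in BANDS.values():
        band = power[(freqs >= lo) & (freqs < hi)]
        feats += [band.mean(), band.std(), band.sum()]  # mean, SD, band power
    return np.array(feats)


# X_train: rows of band_features(...) for labelled windows; y_train: emotion labels
# derived from arousal/valence annotations (hypothetical variables).
# knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
# emotion = knn.predict(band_features(new_window, fs=250).reshape(1, -1))
```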

Keywords: edge AI device, EEG, emotion recognition system, supervised learning algorithm, sensors

Procedia PDF Downloads 81
63 Automated Computer-Vision Analysis Pipeline of Calcium Imaging Neuronal Network Activity Data

Authors: David Oluigbo, Erik Hemberg, Nathan Shwatal, Wenqi Ding, Yin Yuan, Susanna Mierau

Abstract:

Introduction: Calcium imaging is an established technique in neuroscience research for detecting activity in neural networks. Bursts of action potentials in neurons lead to transient increases in intracellular calcium visualized with fluorescent indicators. Manual identification of cell bodies and their contours by experts typically takes 10-20 minutes per calcium imaging recording. Our aim, therefore, was to design an automated pipeline to facilitate and optimize calcium imaging data analysis. Our pipeline aims to accelerate cell body and contour identification and the production of graphical representations reflecting changes in neuronal calcium-based fluorescence. Methods: We created a Python-based pipeline that uses OpenCV (a computer vision Python package) to accurately (1) detect neuron contours, (2) extract the mean fluorescence within each contour, and (3) identify transient changes in the fluorescence due to neuronal activity. The pipeline consisted of three Python scripts that could all be easily accessed through a Python Jupyter notebook. In total, we tested this pipeline on ten separate calcium imaging datasets from murine dissociated cortical cultures. We then compared our automated pipeline outputs with the outputs of manually labeled data for neuronal cell location and the corresponding fluorescence time series generated by an expert neuroscientist. Results: Our results show that our automated pipeline efficiently pinpoints neuronal cell body locations and contours and provides a graphical representation of neural network metrics accurately reflecting changes in neuronal calcium-based fluorescence. The pipeline detected the shape, area, and location of most neuronal cell body contours by using grayscale image conversion and binary thresholding to allow computer vision to better distinguish between cells and non-cells. Its results were also comparable to manually analyzed results but with significantly reduced acquisition times of 2-5 minutes per recording versus 10-20 minutes per recording. Based on these findings, our next step is to precisely measure the specificity and sensitivity of the automated pipeline’s cell body and contour detection to extract more robust neural network metrics and dynamics. Conclusion: Our Python-based pipeline performed automated computer vision-based analysis of calcium imaging recordings from neuronal cell bodies in neuronal cell cultures. Our next goal is to improve cell body and contour detection to produce more robust, accurate neural network metrics and dynamic graphs.
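
A minimal OpenCV sketch of the two core stages described above — grayscale conversion with binary thresholding to find cell-body contours, and mean fluorescence per contour across frames — is shown below. The threshold value and minimum-area filter are assumptions; the authors' scripts are not reproduced here.

```python
# Sketch: contour detection and per-contour fluorescence traces with OpenCV.
import cv2
import numpy as np


def detect_contours(frame, thresh=60, min_area=30):
    """Return cell-body contours from a single (grayscale or BGR) frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) if frame.ndim == 3 else frame
    _, binary = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [c for c in contours if cv2.contourArea(c) >= min_area]


def fluorescence_traces(frames, contours):
    """Mean intensity inside each contour for every grayscale frame (rows = frames)."""
    h, w = frames[0].shape[:2]
    masks = []
    for c in contours:
        m = np.zeros((h, w), dtype=np.uint8)
        cv2.drawContours(m, [c], -1, 255, thickness=-1)  # filled cell-body mask
        masks.append(m.astype(bool))
    return np.array([[frame[m].mean() for m in masks] for frame in frames])
```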

Keywords: calcium imaging, computer vision, neural activity, neural networks

Procedia PDF Downloads 60
62 Estimating Evapotranspiration Irrigated Maize in Brazil Using a Hybrid Modelling Approach and Satellite Image Inputs

Authors: Ivo Zution Goncalves, Christopher M. U. Neale, Hiran Medeiros, Everardo Mantovani, Natalia Souza

Abstract:

Multispectral and thermal infrared imagery from satellite sensors, coupled with climate and soil datasets, were used to estimate evapotranspiration and biomass in center pivots planted to maize in Brazil during the 2016 season. The hybrid remote-sensing-based model named Spatial EvapoTranspiration Modelling Interface (SETMI) was applied using multispectral and thermal infrared imagery from the Landsat Thematic Mapper instrument. Field data collected by the IRRIGER center pivot management company included daily weather information such as maximum and minimum temperature, precipitation, and relative humidity for estimating reference evapotranspiration. In addition, soil water content data were obtained every 0.20 m in the soil profile down to 0.60 m depth throughout the season. Early-season soil samples were used to obtain water-holding capacity, wilting point, saturated hydraulic conductivity, initial volumetric soil water content, layer thickness, and saturated volumetric water content. Crop canopy development parameters and irrigation application depths were also inputs to the model. The modeling approach is based on the reflectance-based crop coefficient method contained within the SETMI hybrid ET model, using relationships developed in Nebraska. The model was applied to several fields located in Minas Gerais State, Brazil (approximate latitude -16.630434, longitude -47.192876). The model provides estimates of actual crop evapotranspiration (ET), crop irrigation requirements, and all soil water balance outputs, including biomass estimation, using multi-temporal satellite image inputs. An interpolation scheme based on the growing degree-day concept was used to model the periods between satellite inputs, filling the gaps between image dates and obtaining daily data. Actual and accumulated ET, accumulated cold-temperature and water stress, and crop water requirements estimated by the model were compared with data measured at the experimental fields. Results indicate that the SETMI modeling approach using data assimilation produced reliable daily ET and crop water requirements for maize, interpolated between remote sensing observations, confirming the applicability of the SETMI model, with relationships developed in Nebraska, for estimating ET and water requirements in Brazil under tropical conditions.
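
To illustrate the gap-filling step described above, the sketch below interpolates a reflectance-based basal crop coefficient (Kcb) between image dates along a growing degree-day (GDD) axis rather than a calendar axis. The base temperature and the example values are assumptions; this is not the SETMI implementation.

```python
# Sketch: GDD-based interpolation of Kcb between satellite observation dates.
import numpy as np


def cumulative_gdd(tmax, tmin, t_base=10.0):
    """Daily growing degree days accumulated over the season."""
    daily = np.maximum((np.asarray(tmax) + np.asarray(tmin)) / 2.0 - t_base, 0.0)
    return np.cumsum(daily)


def interpolate_kcb(gdd_daily, gdd_on_image_dates, kcb_on_image_dates):
    """Fill daily Kcb between satellite observations along the GDD axis."""
    return np.interp(gdd_daily, gdd_on_image_dates, kcb_on_image_dates)


# Example with hypothetical numbers: daily weather -> GDD, three image dates.
# gdd = cumulative_gdd(tmax=[30, 31, 29], tmin=[18, 19, 17])
# kcb_daily = interpolate_kcb(gdd, gdd_on_image_dates=[120, 480, 900],
#                             kcb_on_image_dates=[0.2, 0.9, 1.1])
# etc_daily = kcb_daily * eto_daily   # crop ET from reference ET (FAO-56 style)
```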

Keywords: basal crop coefficient, irrigation, remote sensing, SETMI

Procedia PDF Downloads 120
61 Comprehensive, Up-to-Date Climate System Change Indicators, Trends and Interactions

Authors: Peter Carter

Abstract:

Comprehensive climate change indicators and trends inform the state of the climate system with respect to present and future climate change scenarios and the urgency of mitigation and adaptation. With data records now going back many decades, indicator trends can complement model projections. They are provided as datasets by several climate monitoring centers, reviewed by state-of-the-climate reports, and documented by the IPCC assessments. Up-to-date indicators are provided here. Rates of change are instructive, as are extremes. The indicators include greenhouse gas (GHG) emissions (natural and synthetic), cumulative CO2 emissions, atmospheric GHG concentrations (including CO2 equivalent), stratospheric ozone, surface ozone, radiative forcing, global average temperature increase, land temperature increase, zonal temperature increases, carbon sinks, soil moisture, sea surface temperature, ocean heat content, ocean acidification, ocean oxygen, glacier mass, Arctic temperature, Arctic sea ice (extent and volume), northern hemisphere snow cover, permafrost indices, Arctic GHG emissions, ice sheet mass, and sea level rise. Global warming is not the most reliable single metric for the climate state; radiative forcing, atmospheric CO2 equivalent, and ocean heat content are more reliable. Global warming does not capture future commitment, whereas atmospheric CO2 equivalent does. Cumulative carbon is used for estimating carbon budgets. The forcing of aerosols is briefly addressed. Indicator interactions are included. In particular, indicators can provide insight into several crucial global warming amplifying feedback loops, which are explained. All indicators are increasing (adversely), most as fast as ever and some faster. One particularly pressing indicator is the rapidly increasing global atmospheric methane concentration; in this respect, methane emissions and sources are covered in more detail. In their application, indicators used in assessing safe planetary boundaries are included. Indicators are considered with respect to recently published papers on possible catastrophic climate change and climate system tipping thresholds. They are relevant to climate change policy. In particular, relevant policies include the 2015 Paris Agreement on “holding the increase in the global average temperature to well below 2°C above pre-industrial levels and pursuing efforts to limit the temperature increase to 1.5°C above pre-industrial levels” and the 1992 UN Framework Convention on Climate Change, which aims at “stabilization of greenhouse gas concentrations in the atmosphere at a level that would prevent dangerous anthropogenic interference with the climate system.”
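
As a small worked illustration of one of the metrics mentioned above, a CO2-equivalent concentration can be derived from total radiative forcing using the simplified Myhre et al. (1998) expression RF = 5.35 ln(C/C0). The forcing value used below is a placeholder, not a reported figure.

```python
# Sketch: convert total radiative forcing to an atmospheric CO2-equivalent concentration.
import math

C0 = 278.0  # pre-industrial CO2 concentration, ppm


def co2_equivalent(total_forcing_w_m2):
    """Invert RF = 5.35 * ln(C / C0) to express combined forcing as a CO2 concentration."""
    return C0 * math.exp(total_forcing_w_m2 / 5.35)


# e.g. a total forcing of ~3.1 W/m^2 (placeholder value)
# print(round(co2_equivalent(3.1)))   # ≈ 496 ppm CO2e
```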

Keywords: climate change, climate change indicators, climate change trends, climate system change interactions

Procedia PDF Downloads 80
60 Soybean Seed Composition Prediction From Standing Crops Using Planet Scope Satellite Imagery and Machine Learning

Authors: Supria Sarkar, Vasit Sagan, Sourav Bhadra, Meghnath Pokharel, Felix B.Fritschi

Abstract:

Soybean and its derivatives are very important agricultural commodities around the world because of their wide applicability in human food, animal feed, biofuel, and industry. However, the significance of soybean production depends on the quality of the soybean seeds rather than the yield alone. Seed composition is widely dependent on plant physiological properties, aerobic and anaerobic environmental conditions, nutrient content, and plant phenological characteristics, which can be captured by high-temporal-resolution remote sensing datasets. PlanetScope (PS) satellite images have high potential for capturing sequential information on crop growth due to their frequent revisit throughout the world. In this study, we estimate soybean seed composition while the plants are still in the field by utilizing PlanetScope (PS) satellite images and different machine learning algorithms. Several experimental fields were established with varying genotypes, and different seed composition traits were measured from the samples as ground truth data. The PS images were processed to extract 462 hand-crafted vegetative and textural features. Four machine learning algorithms, i.e., partial least squares regression (PLSR), random forest regression (RFR), gradient boosting machine (GBM), and support vector machine (SVM), and two recurrent neural network architectures, i.e., long short-term memory (LSTM) and gated recurrent unit (GRU), were used in this study to predict the oil, protein, sucrose, ash, starch, and fiber content of soybean seed samples. The GRU and LSTM architectures had two separate branches, one for vegetative features and the other for texture features, which were later concatenated to predict seed composition. The results show that sucrose, ash, protein, and oil yielded comparable prediction results. The machine learning algorithms that best predicted the six seed composition traits differed. GRU worked well for oil (R-squared of 0.53) and protein (R-squared of 0.36), whereas SVM and PLSR showed the best results for sucrose (R-squared of 0.74) and ash (R-squared of 0.60), respectively. Although RFR and GBM provided comparable performance, these models tended to overfit severely. Among the features, vegetative features were found to be more important variables than texture features. It is suggested to use many vegetation indices for machine learning training and to select the best ones using feature selection methods. Overall, the study reveals the feasibility and efficiency of PS images and machine learning for plot-level seed composition estimation. However, special care should be given to designing the plot size in the experiments to avoid mixed-pixel issues.
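
A minimal sketch of the two-branch recurrent architecture described above — one GRU branch for the time series of vegetative features, one for textural features, concatenated before a regression head — is given below in Keras. The layer sizes, input shapes, and the split of the 462 features between branches are illustrative assumptions.

```python
# Sketch: two-branch GRU regression model for one seed composition trait.
from tensorflow.keras import layers, models


def build_two_branch_gru(n_dates, n_veg_feats, n_tex_feats):
    veg_in = layers.Input(shape=(n_dates, n_veg_feats), name="vegetative")
    tex_in = layers.Input(shape=(n_dates, n_tex_feats), name="texture")
    veg = layers.GRU(64)(veg_in)          # temporal summary of vegetative indices
    tex = layers.GRU(64)(tex_in)          # temporal summary of texture features
    merged = layers.Concatenate()([veg, tex])
    hidden = layers.Dense(32, activation="relu")(merged)
    out = layers.Dense(1, name="seed_trait")(hidden)
    model = models.Model([veg_in, tex_in], out)
    model.compile(optimizer="adam", loss="mse")
    return model


# Hypothetical dimensions: 12 image dates, 300 vegetative and 162 texture features.
# model = build_two_branch_gru(n_dates=12, n_veg_feats=300, n_tex_feats=162)
# model.fit([X_veg, X_tex], y_protein, epochs=100, validation_split=0.2)
```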

Keywords: agriculture, computer vision, data science, geospatial technology

Procedia PDF Downloads 105
59 A Quantitative Analysis of Rural to Urban Migration in Morocco

Authors: Donald Wright

Abstract:

The ultimate goal of this study is to reinvigorate the philosophical underpinnings of the study of urbanization with scientific data, with the goal of circumventing what seems an inevitable future clash between rural and urban populations. To that end, urban infrastructure must be sustainable economically, politically, and ecologically over the course of several generations as cities continue to grow with the incorporation of climate refugees. Our research will provide data concerning the projected increase in population in Morocco over the coming two decades and how the population will shift from rural areas to urban centers during that period. As a result, urban infrastructure will need to be adapted, developed, or built to fit the demand of future internal migrations from rural areas to urban centers in Morocco. This paper will also examine how past experiences of internally displaced people give insight into the challenges faced by future migrants and, beyond the gathering of data, how people react to internal migration. This study employs four different sets of research tools. First, a large part of this study is archival, which involves compiling the relevant literature on the topic and its complex history; this step also includes gathering data about migrations in Morocco from public data sources. Once the datasets are collected, the next part of the project involves populating the attribute fields and preprocessing the data to make it understandable and usable by machine learning algorithms. In tandem with the mathematical interpretation of data and projected migrations, this study benefits from a theoretical understanding of the critical apparatus surrounding urban development in the 20th and 21st centuries, which gives insight into past infrastructure development and the rationale behind it. Once the data is ready to be analyzed, different machine learning algorithms will be tested (k-means clustering, support vector regression, random forest analysis) and the results compared for visualization of the data. The final computational part of this study involves analyzing the data and determining what we can learn from it. This paper helps us to understand future trends of population movements within and between regions of North Africa, which will have an impact on sectors such as urban development, food distribution, and water purification, not to mention the creation of public policy in the countries of this region. One of the strengths of this project is its multi-pronged and cross-disciplinary methodology, which enables an interchange of knowledge and experiences to facilitate innovative solutions to this complex problem. Multiple and diverse intersecting viewpoints allow an exchange of methodological models that provide fresh and informed interpretations of otherwise objective data.
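
The sketch below illustrates the comparative modelling step described above: an unsupervised clustering of migration profiles alongside cross-validated supervised models of a migration measure. The feature matrix, target variable, and number of clusters are hypothetical; the study's actual variables are not listed in the abstract.

```python
# Sketch: compare the three algorithm families named above on prepared migration data.
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score


def compare_models(X, y, n_clusters=4):
    # Unsupervised view: group regions or households into migration profiles.
    clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    # Supervised view: predict a migration measure (e.g. net rural-to-urban flow).
    scores = {
        "svr": cross_val_score(SVR(), X, y, cv=5, scoring="r2").mean(),
        "random_forest": cross_val_score(
            RandomForestRegressor(n_estimators=200, random_state=0),
            X, y, cv=5, scoring="r2").mean(),
    }
    return clusters, scores
```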

Keywords: climate change, machine learning, migration, Morocco, urban development

Procedia PDF Downloads 116
58 A Comparison of Methods for Estimating Dichotomous Treatment Effects: A Simulation Study

Authors: Jacqueline Y. Thompson, Sam Watson, Lee Middleton, Karla Hemming

Abstract:

Introduction: The odds ratio (estimated via logistic regression) is a well-established and common approach for estimating covariate-adjusted binary treatment effects when comparing a treatment and a control group with dichotomous outcomes. Its popularity is primarily due to its stability and robustness to model misspecification. However, the situation is different for the relative risk and the risk difference, which are arguably easier to interpret and better suited to specific designs such as non-inferiority studies. So far, there is no equivalent, widely accepted approach for estimating an adjusted relative risk or risk difference when conducting clinical trials. This is partly due to the lack of a comprehensive evaluation of the available candidate methods. Methods/Approach: A simulation study is designed to evaluate the performance of relevant candidate methods for estimating relative risks, representing both conditional and marginal estimation approaches. We consider the log-binomial generalised linear model (GLM) with iteratively weighted least-squares (IWLS) and model-based standard errors (SEs); the log-binomial GLM with convex optimisation and model-based SEs; the log-binomial GLM with convex optimisation and permutation tests; the modified-Poisson GLM with IWLS and robust SEs; log-binomial generalised estimating equations (GEE) with robust SEs; marginal standardisation with delta-method SEs; and marginal standardisation with permutation-test SEs. Independent and identically distributed datasets are simulated from a randomised controlled trial to evaluate these candidate methods. Simulations are replicated 10,000 times for each scenario across all possible combinations of sample sizes (200, 1,000, and 5,000), outcome prevalences (10%, 50%, and 80%), and covariate effects (ranging from -0.05 to 0.7) representing weak, moderate, or strong relationships. Treatment effects (0, -0.5, and 1 on the log scale) cover null (H0) and alternative (H1) hypotheses to evaluate coverage and power in realistic scenarios. Performance measures (bias, mean square error (MSE), relative efficiency, and convergence rates) are evaluated across scenarios covering a range of sample sizes, event rates, covariate prognostic strength, and model misspecifications. Potential Results, Relevance & Impact: There are several methods for estimating unadjusted and adjusted relative risks. However, it is unclear which method(s) is the most efficient, preserves the type-I error rate, is robust to model misspecification, or is the most powerful when adjusting for non-prognostic and prognostic covariates. GEE estimation may be biased when the outcome distributions are not from marginal binary data. Also, it seems that marginal standardisation and convex optimisation may perform better than the IWLS log-binomial GLM.
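
To make two of the candidate estimators concrete, the sketch below applies the modified-Poisson GLM with robust standard errors (for an adjusted relative risk) and marginal standardisation from a logistic model (for a risk difference) to one simulated trial. The simulation parameters are placeholders, not the scenarios of the study.

```python
# Sketch: modified-Poisson relative risk and marginally standardised risk difference.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({"treat": rng.integers(0, 2, n), "x": rng.normal(size=n)})
p = 1 / (1 + np.exp(-(-1.5 + 0.5 * df.treat + 0.4 * df.x)))  # placeholder data model
df["y"] = rng.binomial(1, p)

# Modified Poisson: log link, Poisson variance, robust (sandwich) SEs -> adjusted RR.
pois = smf.glm("y ~ treat + x", df, family=sm.families.Poisson()).fit(cov_type="HC1")
relative_risk = np.exp(pois.params["treat"])

# Marginal standardisation: average predicted risks under treat = 1 vs treat = 0.
logit = smf.glm("y ~ treat + x", df, family=sm.families.Binomial()).fit()
risk1 = logit.predict(df.assign(treat=1)).mean()
risk0 = logit.predict(df.assign(treat=0)).mean()
risk_difference = risk1 - risk0
```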

Keywords: binary outcomes, statistical methods, clinical trials, simulation study

Procedia PDF Downloads 83
57 Using Computer Vision and Machine Learning to Improve Facility Design for Healthcare Facility Worker Safety

Authors: Hengameh Hosseini

Abstract:

Design of large healthcare facilities – such as hospitals, multi-service-line clinics, and nursing facilities – that can accommodate patients with wide-ranging disabilities is a challenging endeavor and one that is poorly understood among healthcare facility managers, administrators, and executives. An even less understood extension of this problem is the implication of weakly or insufficiently accommodative facility design for healthcare workers in physically intensive jobs who may also have a range of disabilities and who are therefore at increased risk of workplace accident and injury. Combine this reality with the vast range of facility types, ages, and designs, and the problem of universal accommodation becomes even more daunting and complex. In this study, we focus on the implications of facility design for healthcare workers with low vision who also have physically active jobs. The points of difficulty are myriad: health service infrastructure, the equipment used in health facilities, and transport to and from appointments and other services can all pose a barrier to health care if they are inaccessible, less accessible, or simply less comfortable for people with various disabilities. We conducted a series of surveys and interviews with employees and administrators of seven facilities of a range of sizes and ownership models in the Northeastern United States and combined that corpus with in-facility observations and data collection to identify five major points of failure common to all the facilities that we concluded could pose safety threats to employees with vision impairments, ranging from very minor to severe. We determine that a lack of design empathy is a major commonality among facility management and ownership. We subsequently propose three methods for remedying this lack of empathy-informed design and the dangers it poses to employees: the use of an existing open-source augmented reality application to simulate the low-vision experience for designers and managers; the use of a machine learning model we develop to automatically infer facility shortcomings from large datasets of recorded patient and employee reviews and feedback; and the use of a computer vision model fine-tuned on images of each facility to infer and predict facility features, locations, and workflows that could pose meaningful dangers to visually impaired employees. After conducting a series of real-world comparative experiments with each of these approaches, we conclude that each is a viable solution under particular sets of conditions, and we characterize the range of facility types, workforce composition profiles, and working conditions under which each of these methods would be most apt and successful.
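
As an illustration of the review-mining idea described above (not the authors' model), the sketch below trains a simple supervised text classifier that flags facility reviews mentioning low-vision-related hazards. The example texts, labels, and hazard category are hypothetical.

```python
# Sketch: flag facility reviews that describe potential low-vision hazards.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "Hallway lighting is too dim near the supply room",        # hypothetical examples
    "Signage font is too small to read from a distance",
    "Check-in process was quick and friendly",
    "Stairwell edges are hard to see, nearly tripped",
]
labels = [1, 1, 0, 1]  # 1 = potential low-vision hazard, 0 = no hazard flagged

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(reviews, labels)
print(clf.predict(["The corridor to radiology has poor contrast markings"]))
```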

Keywords: artificial intelligence, healthcare workers, facility design, disability, visually impaired, workplace safety

Procedia PDF Downloads 74