Search results for: Imbalanced dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1148


248 Web Data Scraping Technology Using Term Frequency Inverse Document Frequency to Enhance the Big Data Quality on Sentiment Analysis

Authors: Sangita Pokhrel, Nalinda Somasiri, Rebecca Jeyavadhanam, Swathi Ganesan

Abstract:

Tourism is a booming industry with huge future potential for global wealth and employment. Social media sites generate enormous amounts of data every day, creating numerous opportunities to bring more insights to decision-makers. Integrating Big Data technology into the tourism industry allows companies to determine where their customers have been and what they like. Businesses such as those managing visitor centers or hotels can then use this information, and tourists can get a clear idea of places before visiting. From a natural language processing perspective, we analyse the sentiment features of tourists' online reviews and propose an enhanced long short-term memory (LSTM) framework for sentiment feature extraction from travel reviews. To evaluate the effectiveness of our methodology experimentally, we constructed a web review database using a crawler and web scraping techniques. The review sentences were first labelled with the VADER and RoBERTa models to obtain their polarity. We then studied feature extraction methods such as Count Vectorization and TF-IDF Vectorization. We also implemented a Convolutional Neural Network (CNN) classifier to determine, from the review text that tourists posted online, whether their attitude towards a destination is positive, negative, or neutral. After pre-processing and cleaning the dataset, the CNN achieved an accuracy of 96.12% for positive and negative sentiment classification.
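The two feature extraction methods named above can be illustrated with a minimal pure-Python sketch. It rests on three assumptions:

- tokenization is lowercase whitespace splitting;
- idf uses the smoothed formula ln((1 + n) / (1 + df)) + 1;
- each row is L2-normalized.

The last two mirror the defaults of scikit-learn's TfidfVectorizer. The authors' actual preprocessing and settings are not stated, and the example reviews are hypothetical.

```python
import math
from collections import Counter

# Assumptions: whitespace tokenization and smoothed idf; the paper's settings are not given.


def count_vectorize(docs):
    """Bag-of-words counts; returns (sorted vocabulary, one count row per document)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counters = [Counter(toks) for toks in tokenized]
    return vocab, [[c[term] for term in vocab] for c in counters]


def tfidf_vectorize(docs):
    """TF-IDF with smoothed idf = ln((1 + n) / (1 + df)) + 1 and L2-normalized rows."""
    vocab, counts = count_vectorize(docs)
    n = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weights = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weights)) or 1.0  # guard against empty documents
        rows.append([x / norm for x in weights])
    return vocab, rows


if __name__ == "__main__":
    reviews = ["great hotel great view", "terrible hotel", "great food"]  # hypothetical reviews
    vocab, counts = count_vectorize(reviews)
    _, tfidf = tfidf_vectorize(reviews)
    print(vocab)      # ['food', 'great', 'hotel', 'terrible', 'view']
    print(counts[0])  # [0, 2, 1, 0, 1]
```

Words that occur in more reviews receive a lower idf. In the first review, "view" therefore outweighs "hotel", even though each occurs once.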

Keywords: count vectorization, convolutional neural network, crawler, data technology, long short-term memory, web scraping, sentiment analysis

Procedia PDF Downloads 72
247 Classifying Affective States in Virtual Reality Environments Using Physiological Signals

Authors: Apostolos Kalatzis, Ashish Teotia, Vishnunarayan Girishan Prabhu, Laura Stanley

Abstract:

Emotions are functional behaviors influenced by thoughts, stimuli, and other factors that induce neurophysiological changes in the human body. Understanding and classifying emotions is challenging because individuals perceive their environments differently. It is therefore crucial to have publicly available databases and scientifically validated virtual reality (VR) environments for assessing emotion classification. This study used two commercially available VR applications (Guided Meditation VR™ and Richie’s Plank Experience™) to induce acute stress and calm states among participants. Subjective and objective measures were collected to create a validated multimodal dataset and classification scheme for affective state classification. Participants’ subjective measures included the Self-Assessment Manikin, emotional cards, and a 9-point Visual Analogue Scale for perceived stress, collected using a Virtual Reality Assessment Tool developed by our team. Participants’ objective measures were electrocardiogram and respiration data collected from 25 participants (15 M, 10 F, Mean = 22.28 ± 4.92). The features extracted from these data included heart rate variability components and respiration rate, both of which were used to train two machine learning models. Subjective responses validated the efficacy of the VR applications in eliciting the two desired affective states. To classify the affective states, a logistic regression (LR) model and a support vector machine (SVM) with a linear kernel were developed. The LR outperformed the SVM, achieving 93.8% accuracy, 96.2% precision, and 93.8% recall under leave-one-subject-out cross-validation. The VR assessment tool and the data collected in this study are publicly available to other researchers.
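Leave-one-subject-out cross-validation holds out all samples of one participant per fold, so the model is always tested on a person it has never seen. The sketch below pairs this scheme with a plain gradient-descent logistic regression in pure Python.

Several details are hypothetical: the learning rate, the epoch count, and the two standardized features, which stand in for an HRV component and respiration rate. The SVM baseline is not shown.

```python
import math

# Hyperparameters and feature choices are hypothetical, not taken from the study.


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent; the last weight is the bias."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]  # zip stops before the bias
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w, x):
    """Class 1 (e.g. stress) if the linear score is non-negative, else class 0 (calm)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + w[-1] >= 0 else 0


def leave_one_subject_out(X, y, subjects):
    """Each fold trains on all other subjects and tests on every sample of one held-out subject."""
    correct = 0
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        test = [i for i, s in enumerate(subjects) if s == held_out]
        w = train_logistic([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, X[i]) == y[i] for i in test)
    return correct / len(y)
```

Grouping folds by subject, rather than by individual sample, prevents a participant's own recordings from leaking into the training set. Leakage of that kind would inflate accuracy.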

Keywords: affective computing, biosignals, machine learning, stress database

Procedia PDF Downloads 126
246 Qualitative and Quantitative Methods in Multidisciplinary Fields Collection Development

Authors: Hui Wang

Abstract:

Traditional collection-building approaches are limited in breadth and scope and are not necessarily suitable for developing multidisciplinary fields in the institutes of the Chinese Academy of Sciences. The growth of multidisciplinary research requires a viable approach to collection development in these libraries. This study uses qualitative and quantitative analysis to assess collections. The quantitative analysis consists of three levels of evaluation: realistic demand, potential demand, and trend demand analysis. For each institute, three samples were selected separately: from the institute itself, from several top international institutes in closely related research fields, and from future research hotspots. Each sample contains an appropriate number of papers published in the past five years. Keywords and organization names were combined to search commercial databases and institutional repositories. The publishing information and the citations in the bibliographies of these papers were extracted to build the dataset. A weighted evaluation model and citation analysis were used to calculate the demand intensity index of every journal and book. Selections by principal investigators and database traffic provide qualitative evidence of demand frequency. Demand intensity, demand frequency, and academic committee recommendations were considered together to recommend collection development. Collection gaps or weaknesses were identified by comparing the current collection with the recommended collection. This approach has been applied in more than 80 institute libraries of the Chinese Academy of Sciences over the past three years. The evaluation results provided important evidence for collection building in the following year. The latest user survey showed that the updated collections' capacity to support research in multidisciplinary subject areas has increased significantly.
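The abstract does not give the weighted evaluation model. The sketch below is a hypothetical illustration of how a demand intensity index could combine the three samples, under two assumptions:

- Each sample is mapped to one demand level: the institute's own papers to realistic demand, the top international institutes to potential demand, and the hotspot papers to trend demand.
- Each title is scored by a weighted sum of its citation shares, with weights of 0.5, 0.3 and 0.2.

Both the mapping and the weights are assumptions.

```python
from collections import Counter

# Hypothetical weights for the three demand levels; the abstract does not report the real ones.
WEIGHTS = {"realistic": 0.5, "potential": 0.3, "trend": 0.2}


def citation_shares(cited_titles):
    """Fraction of one sample's citations that point to each journal or book."""
    counts = Counter(cited_titles)
    total = sum(counts.values())
    return {title: c / total for title, c in counts.items()}


def demand_intensity(samples, weights=WEIGHTS):
    """Weighted sum of per-sample citation shares for every title cited in any sample."""
    shares = {level: citation_shares(refs) for level, refs in samples.items()}
    titles = set().union(*shares.values())
    return {t: sum(weights[level] * shares[level].get(t, 0.0) for level in shares)
            for t in titles}


def collection_gaps(index, holdings, top_n=10):
    """Highest-demand titles that the current collection does not hold."""
    ranked = sorted(index, key=index.get, reverse=True)
    return [t for t in ranked[:top_n] if t not in holdings]
```

Because each sample's shares sum to 1, the indices across all titles sum to the total weight. Indices are therefore comparable across institutes of different sizes.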

Keywords: citation analysis, collection assessment, collection development, quantitative analysis

Procedia PDF Downloads 194
245 Automatic Lexicon Generation for Domain Specific Dataset for Mining Public Opinion on China Pakistan Economic Corridor

Authors: Tayyaba Azim, Bibi Amina

Abstract:

The growing popularity of opinion mining, together with the rapid growth of social networks, has created many research opportunities in Sentiment Analysis and Natural Language Processing (NLP) using Artificial Intelligence approaches. This trend makes it possible to use the internet to analyze individuals’ opinions and to explore the effectiveness of published facts. The main aim of this research is to assess public opinion on the China Pakistan Economic Corridor (CPEC), one of the most crucial and extensively discussed development projects. CPEC is considered a game changer because of its promise of bringing economic prosperity to the region. To the best of our knowledge, CPEC has not yet been analyzed for sentiment using machine learning (ML). This research demonstrates the use of ML approaches to automatically analyze public sentiment in tweets about CPEC. A Support Vector Machine (SVM) classifies tweets into positive, negative, and neutral classes. Word2vec and TF-IDF features are used with the SVM model, and the trained model is compared on manually labelled tweets and on an automatically generated lexicon. The contributions of this work are threefold:

- a sentiment analysis system for public tweets on CPEC;
- automatic generation of a lexicon from public tweets on CPEC;
- identification of different themes among the tweets, with sentiments assigned to each theme.

Notably, web mining applications that empower e-democracy have not yet been explored or practised in Pakistan with respect to CPEC. Such applications improve political transparency and public participation in decision-making via social media.
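The abstract does not describe how the lexicon is generated. One common way to derive word polarities automatically from labelled tweets is to score each word by its smoothed log-odds of appearing in positive versus negative tweets. The sketch below uses that swapped-in technique; it is not the authors' method and is separate from their SVM pipeline. The smoothing constant, frequency cut-off, neutral margin and example tweets are all hypothetical.

```python
import math
from collections import Counter

# Log-odds lexicon scoring is a swapped-in illustration, not the paper's procedure.


def build_lexicon(tweets, labels, alpha=1.0, min_count=2):
    """Score each word by the smoothed log-odds of occurring in positive vs negative tweets."""
    pos, neg = Counter(), Counter()
    for text, label in zip(tweets, labels):
        tokens = set(text.lower().split())  # document frequency, not raw counts
        if label == "positive":
            pos.update(tokens)
        elif label == "negative":
            neg.update(tokens)
    n_pos = labels.count("positive")
    n_neg = labels.count("negative")
    lexicon = {}
    for word in set(pos) | set(neg):
        if pos[word] + neg[word] < min_count:
            continue  # too rare to score reliably
        p = (pos[word] + alpha) / (n_pos + 2 * alpha)
        q = (neg[word] + alpha) / (n_neg + 2 * alpha)
        lexicon[word] = math.log(p / q)
    return lexicon


def score_tweet(text, lexicon, margin=0.5):
    """Sum word scores; totals within +/- margin count as neutral."""
    total = sum(lexicon.get(tok, 0.0) for tok in text.lower().split())
    if total > margin:
        return "positive"
    if total < -margin:
        return "negative"
    return "neutral"
```

The neutral margin gives the lexicon a third class to match the positive, negative and neutral labels used with the SVM.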

Keywords: machine learning, natural language processing, sentiment analysis, support vector machine, Word2vec

Procedia PDF Downloads 130
244 In Silico Exploration of Quinazoline Derivatives as EGFR Inhibitors for Lung Cancer: A Multi-Modal Approach Integrating QSAR-3D, ADMET, Molecular Docking, and Molecular Dynamics Analyses

Authors: Mohamed Moussaoui

Abstract:

A series of thirty-one quinazoline-derived potential inhibitors of the epidermal growth factor receptor (EGFR) kinase underwent 3D-QSAR analysis using the CoMFA and CoMSIA methodologies. The training and test sets of quinazoline derivatives were used to construct and validate the QSAR models, respectively, with dataset alignment performed using the lowest-energy conformer of the most active compound. The best-performing CoMFA and CoMSIA models achieved high determination coefficients, with R² values of 0.981 and 0.978, respectively. Their leave-one-out cross-validation determination coefficients (Q²) were 0.645 and 0.729, respectively. External validation on a test set of five compounds yielded predicted determination coefficients (R² test) of 0.929 and 0.909 for CoMFA and CoMSIA, respectively. Building on these results, eighteen new compounds were designed and assessed in silico for drug-likeness and ADMET properties. Molecular docking studies were also conducted to elucidate the binding interactions between the selected compounds and the enzyme. Detailed molecular dynamics simulations were performed to analyze the stability, conformational changes, and binding interactions of the quinazoline derivatives with the EGFR kinase. These simulations provided deeper insight into the dynamic behavior of the compounds within the active site. This comprehensive analysis enhances the understanding of quinazoline derivatives as potential anti-cancer agents. It also provides valuable insights for lead optimization in the early stages of drug discovery, particularly for developing highly potent anticancer therapeutics.
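For reference, the validation statistics quoted above are commonly defined as follows. Here ŷ_i is the fitted activity of compound i, ŷ_(i) is its prediction from a model refit without compound i, and ȳ is the mean observed activity of the training set. These are standard textbook definitions. The abstract does not state which variant of R² test was used, and several variants exist in the QSAR literature.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},
\qquad
Q^2_{\mathrm{LOO}} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_{(i)}\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},
\qquad
R^2_{\mathrm{test}} = 1 - \frac{\sum_{t \in \mathrm{test}}\left(y_t - \hat{y}_t\right)^2}{\sum_{t \in \mathrm{test}}\left(y_t - \bar{y}_{\mathrm{train}}\right)^2}
```

The numerator of Q² is the predictive residual sum of squares (PRESS). Because Q² only scores predictions for compounds the model did not see during fitting, it is normally lower than R². This is consistent with the gap between 0.981 and 0.645 reported for CoMFA.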

Keywords: 3D-QSAR, CoMFA, CoMSIA, ADMET, molecular docking, quinazoline, molecular dynamics, EGFR inhibitors, lung cancer, anticancer

Procedia PDF Downloads 25
243 The Impact of Model Specification Decisions on the Teacher Value-Added Effectiveness: Choosing the Correct Predictors

Authors: Ismail Aslantas

Abstract:

Value-Added Models (VAMs), statistical methods for evaluating the effectiveness of teachers and schools based on student achievement growth, have attracted the attention of decision-makers and researchers over recent decades. As a result, many studies have been conducted in recent years to discuss these models from different perspectives. This research focused on the importance of contextual variables in VAM estimation. It therefore examined the extent to which value-added effectiveness estimates for teachers are affected by the use of contextual predictors. Using three years of longitudinal data from an international school context, teacher value-added effectiveness was estimated with ordinary least squares value-added models, and the effectiveness of the teachers was examined. The longitudinal dataset consisted of three major sources:

- students’ attainment scores over up to three years and their characteristics;
- teacher background information;
- school characteristics.

A total of 1,027 teachers and their 35,355 eighth-grade students were examined to understand the impact of model specification on value-added teacher effectiveness evaluation. Models were built with a stepwise selection procedure. A predictor was added at each step, then removed and replaced by another on a subsequent step, and changes in model fit were evaluated through changes in R² values. Cohen’s effect size statistics were also used to determine the degree of the relationship between teacher characteristics and their effectiveness. Overall, the results indicated that the prior attainment score is the most powerful predictor of the current attainment score: the grade 7 prior attainment score explains 47.1 percent of the variation in grade 8 math scores.
The research findings raise issues to be considered in VAM implementations for teacher evaluations and make suggestions to researchers and practitioners.
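The stepwise comparison described above judges each candidate predictor by the change in R² when it is added to a model. The sketch below illustrates this in pure Python with an OLS fit solved through the normal equations. An R² of 0.471 for a model containing only the prior score would correspond to the 47.1 percent reported. The predictor names and data are hypothetical, and a real analysis would normally use a statistics library.

```python
# Any predictor names and data used with these helpers are hypothetical.


def ols_r2(columns, y):
    """R² of an ordinary least squares fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [float(col[r]) for col in columns] for r in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    coef = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(coef, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def r2_changes(base, candidates, y):
    """R² gained by adding each candidate predictor, one at a time, to the base model."""
    r2_base = ols_r2(list(base.values()), y)
    return {name: ols_r2(list(base.values()) + [col], y) - r2_base
            for name, col in candidates.items()}
```

In-sample R² can only rise or stay flat when a predictor is added. Small gains therefore need an accompanying effect-size or significance check, as the study does with Cohen's statistics.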

Keywords: model specification, teacher effectiveness, teacher performance evaluation, value-added model

Procedia PDF Downloads 111
242 Formal Institutions and Women's Electoral Participation in Four European Countries

Authors: Sophia Francesca D. Lu

Abstract:

This research sought evidence that formal institutions, such as electoral and internal party quotas, can advance women’s active roles in the public sphere. It used the cases of four European countries: Belgium, Germany, Italy, and the Netherlands. The quantitative dataset was provided by the University of Chicago and the Inter-University Consortium for Political and Social Research, based on a two-year study (2008-2010) of political parties. Belgium has constitutionally mandated electoral quotas. Germany, Italy, and the Netherlands, on the other hand, have internal party quotas, which political parties adopt voluntarily. In the chi-square and Pearson’s r analyses, Belgium, as the only country with an electoral quota, was the only country analyzed for electoral quotas. The voluntary internal party quotas of Germany, Italy, and the Netherlands were correlated with women’s descriptive representation. The chi-square analysis showed that the presence of electoral quotas is correlated with an increase in the percentage of women in decision-making bodies. Likewise, the correlational analysis showed that a higher number of political parties employing voluntary internal party quotas is correlated with two increases: in the percentage of women occupying seats in parliament, and in the percentage of women nominees on parties' electoral lists. In conclusion, gender quotas, whether electoral or internal party quotas, are an effective policy tool for increasing women’s representation in political bodies. Political parties and governments should adopt gender quotas of either kind to address the underrepresentation of women in parliament, decision-making bodies, and policy formulation.
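The two statistics named above can be computed directly. The sketch below gives a minimal pure-Python version of the Pearson chi-square statistic for a contingency table and of Pearson's r. An example table might have rows for quota and no quota, and columns for women and men among elected members, with hypothetical counts.

No p-values are computed here. Note that SciPy's `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default, so its statistic can differ from this uncorrected one.

```python
import math

# Illustrative only: any tables or series passed in are hypothetical, not the study's data.


def chi_square(table):
    """Pearson chi-square statistic (no continuity correction) and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Both statistics measure association only. Neither, on its own, establishes that quotas cause the observed increases.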

Keywords: electoral quota, Europe, formal institutions, institutional feminism, internal party quota, women’s electoral participation

Procedia PDF Downloads 407
241 Biostimulant and Abiotic Plant Stress Interactions in Malting Barley: A Glasshouse Study

Authors: Conor Blunt, Mariluz del Pino-de Elias, Grace Cott, Saoirse Tracy, Rainer Melzer

Abstract:

The European Green Deal announced in 2021 sets targets to reduce chemical pesticide use by 50% and synthetic fertilizer application by 20% by 2030. Increasing and maintaining expected yields under these ambitious goals has strained the agricultural sector. This intergovernmental plan has identified plant biostimulants as one potential input to facilitate this new phase of sustainable agriculture. These products are defined as microorganisms or substances that can stimulate soil and plant functioning to enhance crop nutrient use efficiency, quality, and tolerance to abiotic stresses. Spring barley is Ireland’s most widely sown tillage crop, and grain destined for malting commands the highest market price. Heavy, erratic rainfall is forecast for Ireland’s future climate, and barley is particularly susceptible to waterlogging. Recent findings suggest that plant receptivity to biostimulants may depend on the level of stress inflicted on crops to elicit an assisted plant response. In this study, three biostimulants of different origin (seaweed, protein hydrolysate, and bacteria) are applied to ‘RGT Planet’ malting barley under non-stressed and waterlogged conditions. The barley is fertilized at three rates (0 kg/ha, 40 kg/ha, and 75 kg/ha) of calcium ammonium nitrate (27% N). This 4 × 3 × 2 factorial trial was laid out as a randomized complete block design with one plant per experimental unit. Leaf gas exchange data and key agronomic and grain quality parameters were analyzed by ANOVA. Plants receiving 40 kg/ha of N plus a biostimulant showed no productivity penalty compared with the 75 kg/ha N treatments. The main effects of nitrogen application and waterlogging accounted for the most significant variation in the dataset.
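A randomized complete block design places every treatment combination once in each block and randomizes the order within each block independently. The sketch below generates such a layout for the 4 × 3 × 2 factorial.

It assumes the fourth biostimulant level is an untreated control, since the abstract names only three products. The number of blocks and the random seed are hypothetical.

```python
import itertools
import random

# Factor levels from the abstract; "control" as the fourth biostimulant level is an assumption.
BIOSTIMULANTS = ["control", "seaweed", "protein hydrolysate", "bacteria"]
N_RATES_KG_HA = [0, 40, 75]
WATER_REGIMES = ["non-stressed", "waterlogged"]


def rcbd_layout(n_blocks, seed=None):
    """Randomized complete block design: each block holds all 4 x 3 x 2 = 24
    treatment combinations exactly once, in an independently shuffled order."""
    rng = random.Random(seed)
    treatments = list(itertools.product(BIOSTIMULANTS, N_RATES_KG_HA, WATER_REGIMES))
    layout = []
    for block in range(1, n_blocks + 1):
        order = treatments[:]
        rng.shuffle(order)
        # One plant per experimental unit: (block, position, biostimulant, N rate, water regime).
        layout.extend((block, pos, *combo) for pos, combo in enumerate(order, start=1))
    return layout
```

Blocking lets an ANOVA separate bench-position effects in the glasshouse from the treatment effects. The factorial structure supports tests of the biostimulant × nitrogen × waterlogging interactions.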

Keywords: biostimulant, barley, malting, NUE, waterlogging

Procedia PDF Downloads 60
240 Graph Clustering Unveiled: ClusterSyn - A Machine Learning Framework for Predicting Anti-Cancer Drug Synergy Scores

Authors: Babak Bahri, Fatemeh Yassaee Meybodi, Changiz Eslahchi

Abstract:

In the pursuit of effective cancer therapies, exploring combinatorial drug regimens is crucial for leveraging synergistic interactions between drugs, thereby improving treatment efficacy and overcoming drug resistance. However, identifying synergistic drug pairs is challenging because of the vast combinatorial space and the limitations of experimental approaches. This study introduces ClusterSyn, a machine learning (ML)-powered framework for classifying anti-cancer drug synergy scores. ClusterSyn uses a two-step approach: drug clustering, followed by synergy score prediction with a fully connected deep neural network. For each cell line in the training dataset, a drug graph is constructed, with nodes representing drugs and edge weights denoting the synergy scores between drug pairs. Drugs are clustered with the Markov clustering (MCL) algorithm. Vectors representing the similarity of each drug pair to each cluster are then fed into the deep neural network to predict the synergy class (synergy or antagonism). The clustering results show that drugs with similar synergy profiles are grouped together effectively. Next, the neural network predictions are combined with the synergy scores of each of the two drugs with the other drugs in its cluster to predict the synergy score of the drug pair under consideration. Comparative analysis against clustering- and regression-based methods shows that ClusterSyn outperforms state-of-the-art methods such as DeepSynergy and DeepDDS on diverse datasets, including O'Neil and ALMANAC. These results highlight the potential of ClusterSyn as a versatile tool for predicting anti-cancer drug synergy scores.
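Markov clustering alternates two steps on a column-stochastic matrix. Expansion (matrix squaring) spreads random-walk flow, and inflation (raising entries to a power, then renormalizing) strengthens strong flows and weakens weak ones. Dense regions of the graph end up as separate clusters.

The sketch below is a textbook MCL in pure Python, not ClusterSyn's implementation. It assumes non-negative edge weights, so negative (antagonistic) synergy scores would have to be dropped or shifted first. The self-loops and the inflation of 2.0 are common defaults rather than the paper's settings.

```python
# Textbook MCL; non-negative weights, self-loops, and inflation=2.0 are assumptions.


def normalize_columns(M):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(M)
    sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return [[M[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Plain square matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov clustering: alternate expansion (M @ M) and inflation (elementwise power)."""
    n = len(adjacency)
    # Add self-loops, a common MCL convention, then make the matrix column-stochastic.
    M = normalize_columns([[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                           for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(M, M)  # expansion: flow along two-step random walks
        new_M = normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(new_M[i][j] - M[i][j]) for i in range(n) for j in range(n))
        M = new_M
        if delta < tol:
            break
    # At convergence only attractor rows keep mass; the columns they hold form a cluster.
    clusters = []
    for i in range(n):
        members = [j for j in range(n) if M[i][j] > 1e-6]
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

Higher inflation values yield more, smaller clusters. Inflation is therefore the main parameter to tune when drug groups come out too coarse or too fine.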

Keywords: drug synergy, clustering, prediction, machine learning, deep learning

Procedia PDF Downloads 57
239 Selection of Optimal Reduced Feature Sets of Brain Signal Analysis Using Heuristically Optimized Deep Autoencoder

Authors: Souvik Phadikar, Nidul Sinha, Rajdeep Ghosh

Abstract:

In brainwave research using electroencephalogram (EEG) signals, finding the most relevant and effective feature set for identifying activities in the human brain remains a major challenge because of the random nature of the signals. The feature extraction method is key to solving this problem. It is very difficult to find features that give distinctive pictures for different activities and similar pictures for the same activity, especially as the number of activities grows. Classifier accuracy depends on the quality of the feature set. Moreover, more features increase computational complexity, while fewer features compromise performance. This paper presents a novel method for selecting an optimal feature set using a heuristically optimized deep autoencoder. A large number of features are extracted from the EEG signals using various feature extraction methods. These features are fed into the autoencoder deep neural network, which encodes them into a small set of codes. To avoid the vanishing gradient problem and the need to normalize the dataset, a meta-heuristic search algorithm is used to minimize the mean square error (MSE) between the encoder input and the decoder output. To reduce the feature set to a smaller one, four hidden layers are used in the autoencoder network; hence the name Heuristically Optimized Deep Autoencoder (HO-DAE). In this method, no features are rejected; instead, all features are combined into the responses of the hidden layer. The results show that higher accuracy can be achieved using the optimal reduced features. The proposed HO-DAE is also compared with a regular autoencoder. The performance of the proposed method is validated and compared with two other recently reported methods, and it achieves considerably higher classification accuracy than both.
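The abstract does not name the meta-heuristic or give the four-layer architecture. The sketch below shows the general idea of training an autoencoder without gradients by minimizing the reconstruction MSE directly. It simplifies the architecture to a single linear code layer and substitutes a (1+1) evolution strategy as the meta-heuristic. The step count, mutation size and data are hypothetical.

```python
import random

# Stand-ins: one linear code layer instead of four hidden layers, and a (1+1)-ES
# instead of the paper's unspecified meta-heuristic.


def encode(params, x, d, k):
    """Linear encoder: the first k*d parameters form a k x d weight matrix."""
    enc = params[:k * d]
    return [sum(enc[i * d + j] * x[j] for j in range(d)) for i in range(k)]


def reconstruct(params, x, d, k):
    """Linear decoder: the remaining d*k parameters map the code back to d features."""
    dec = params[k * d:]
    code = encode(params, x, d, k)
    return [sum(dec[j * k + i] * code[i] for i in range(k)) for j in range(d)]


def mse(params, data, d, k):
    """Mean squared reconstruction error over all samples and features."""
    total = 0.0
    for x in data:
        total += sum((a - b) ** 2 for a, b in zip(x, reconstruct(params, x, d, k)))
    return total / (len(data) * d)


def train_es(data, k, steps=3000, sigma=0.1, seed=0):
    """(1+1) evolution strategy: mutate every weight, keep the child if its MSE is no worse.

    No gradients are computed, so vanishing gradients cannot occur.
    """
    rng = random.Random(seed)
    d = len(data[0])
    parent = [rng.uniform(-0.5, 0.5) for _ in range(2 * k * d)]
    best = mse(parent, data, d, k)
    for _ in range(steps):
        child = [w + rng.gauss(0.0, sigma) for w in parent]
        score = mse(child, data, d, k)
        if score <= best:
            parent, best = child, score
    return parent, best
```

After training, `encode` returns the reduced feature vector, which is what a downstream classifier would consume in place of the full feature set.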

Keywords: autoencoder, brainwave signal analysis, electroencephalogram, feature extraction, feature selection, optimization

Procedia PDF Downloads 100
238 D-Wave Quantum Computing Ising Model: A Case Study for Forecasting of Heat Waves

Authors: Dmytro Zubov, Francesco Volponi

Abstract:

In this paper, the D-Wave quantum computing Ising model is used to forecast positive extremes of daily mean air temperature. Forecast models are designed with two to five qubits, which represent 2-, 3-, 4-, and 5-day historical data, respectively. The Ising model’s real-valued weights and dimensionless coefficients are calculated using daily mean air temperatures from 119 places around the world, as well as sea level (Aburatsu, Japan). Compared with current methods, this approach is better suited to predicting heat wave values because it does not require estimating a probability distribution from scarce observations. The proposed quantum computing forecast algorithm is simulated on a traditional computer architecture. The simulation uses combinatorial optimization of the Ising model parameters for the Ronald Reagan Washington National Airport dataset, with a 1-day lead time, on the learning sample (1975-2010). Forecast accuracy is the ratio of successful predictions to the total number of predictions. On the validation sample (2011-2014), the Ising model with three qubits has 100% accuracy, which is significant compared with other methods. However, the number of identified heat waves is small (only one out of nineteen in this case). The models with 2, 4, and 5 qubits have 20%, 3.8%, and 3.8% accuracy, respectively. The three-qubit forecast model was applied to predict heat waves at five other locations:

- Aurel Vlaicu, Romania: 28.6% accuracy;
- Bratislava, Slovakia: 21.7%;
- Brussels, Belgium: 33.3%;
- Sofia, Bulgaria: 50%;
- Akhisar, Turkey: 21.4%.

These predictions are not ideal, but their accuracy is non-zero. They can be used independently or together with predictions generated by other methods. The loss of human life, as well as environmental, economic, and material damage from extreme air temperatures, could be reduced if some heat waves are predicted.
Even a small success rate implies a large socio-economic benefit.
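An Ising model assigns each configuration of spins s_i ∈ {-1, +1} an energy made of two terms: local fields h_i and pairwise couplings J_ij. A quantum annealer such as D-Wave's searches for the configuration of lowest energy.

The sketch below finds that ground state classically by exhaustive search, which is feasible only for a handful of qubits like the two to five used here. It does not reproduce the paper's mapping from temperatures to h and J, and the coefficients in any example are hypothetical.

The sketch also reads "accuracy" as correct heat-wave forecasts divided by forecasts issued, with detection reported separately. This is an interpretation of how 100% accuracy can coexist with one of nineteen heat waves identified.

```python
import itertools

# Classical brute force for illustration; the accuracy definition below is an interpretation.


def ising_energy(spins, h, J):
    """E(s) = sum_i h_i s_i + sum_(i<j) J_ij s_i s_j for spins s_i in {-1, +1}."""
    energy = sum(hi * si for hi, si in zip(h, spins))
    for (i, j), coupling in J.items():
        energy += coupling * spins[i] * spins[j]
    return energy


def ground_state(h, J):
    """Exhaustive search over all 2**n spin configurations; only feasible for a few qubits."""
    return min(itertools.product((-1, 1), repeat=len(h)),
               key=lambda s: ising_energy(s, h, J))


def heatwave_scores(predicted, observed):
    """Accuracy = correct heat-wave forecasts / forecasts issued; detection = hits / observed events."""
    issued = sum(predicted)
    events = sum(observed)
    hits = sum(1 for p, o in zip(predicted, observed) if p and o)
    accuracy = hits / issued if issued else 0.0
    detection = hits / events if events else 0.0
    return accuracy, detection
```

Under this reading, a model that issues a single correct heat-wave forecast over a period with nineteen events scores 1.0 accuracy but only 1/19 detection. This is why both numbers are worth reporting.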

Keywords: heat wave, D-wave, forecast, Ising model, quantum computing

Procedia PDF Downloads 482
237 A Proposed Optimized and Efficient Intrusion Detection System for Wireless Sensor Network

Authors: Abdulaziz Alsadhan, Naveed Khan

Abstract:

In recent years, intrusions on computer networks have become a major security threat, so it is important to prevent them. Preventing such intrusions depends entirely on detecting them, which is the primary concern of any security tool such as an Intrusion Detection System (IDS). It is therefore imperative to detect network attacks accurately. Numerous intrusion detection techniques are available, but their main issue is performance. IDS performance can be improved by increasing the detection rate and reducing false positives. Existing intrusion detection techniques are limited by their use of the raw dataset for classification: redundant features can confuse the classifier and lead to incorrect classification. To minimize this problem, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Local Binary Pattern (LBP) can be applied to transform raw features into a principal feature space. Features can then be selected based on their sensitivity, which can be determined from the eigenvalues. The selected features can be further refined with greedy search, backward elimination, and Particle Swarm Optimization (PSO) to obtain a feature subset with optimal sensitivity and the highest discriminatory power. This optimal feature subset is then used for classification. For classification, a Support Vector Machine (SVM) and a Multilayer Perceptron (MLP) are used because of their proven classification ability. The Knowledge Discovery and Data Mining Cup 1999 (KDD’99) dataset was used as the benchmark for evaluating security detection mechanisms. The proposed approach can provide an optimal intrusion detection mechanism that outperforms existing approaches while minimizing the number of features and maximizing detection rates.
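The PCA step described above ranks principal components by their eigenvalues, which measure the variance each component explains, and keeps the leading ones. The sketch below is a minimal pure-Python version using power iteration with deflation. It does not cover LDA, LBP, PSO, greedy search, or backward elimination. The 95% variance threshold is hypothetical.

```python
import math

# The variance threshold is hypothetical; other selection steps in the paper are not shown.


def covariance(X):
    """Sample covariance matrix and column means of a list of feature rows."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    cov = [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in X) / (n - 1)
            for b in range(d)] for a in range(d)]
    return cov, means


def top_eigen(C, iterations=500):
    """Dominant eigenvalue/eigenvector of a symmetric PSD matrix by power iteration."""
    d = len(C)
    v = [1.0 + 0.1 * j for j in range(d)]  # non-uniform start to avoid an orthogonal guess
    for _ in range(iterations):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v


def pca(X, variance_kept=0.95):
    """Keep leading components until their eigenvalues explain `variance_kept` of the variance."""
    C, means = covariance(X)
    d = len(C)
    total = sum(C[i][i] for i in range(d))  # trace = sum of all eigenvalues
    components, kept = [], 0.0
    while kept < variance_kept * total and len(components) < d:
        lam, v = top_eigen(C)
        components.append((lam, v, lam / total))  # (eigenvalue, direction, explained ratio)
        kept += lam
        # Deflation: subtract the found component so the next search finds the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components, means


def project(x, means, components):
    """Scores of one sample on the retained principal components."""
    centered = [a - m for a, m in zip(x, means)]
    return [sum(c * vi for c, vi in zip(centered, v)) for _, v, _ in components]
```

Highly redundant raw features collapse into few components. This is the redundancy reduction the abstract argues helps the downstream SVM or MLP classifier.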

Keywords: Particle Swarm Optimization (PSO), Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Local Binary Pattern (LBP), Support Vector Machine (SVM), Multilayer Perceptron (MLP)

Procedia PDF Downloads 349
236 AI-Based Techniques for Online Social Media Network Sentiment Analysis: A Methodical Review

Authors: A. M. John-Otumu, M. M. Rahman, O. C. Nwokonkwo, M. C. Onuoha

Abstract:

Online social media networks have long served as a primary arena for group conversations, gossip, and text-based information sharing and distribution. Applying natural language processing techniques to text classification and unbiased decision-making is therefore not far-fetched. However, classifying this textual information properly in a given context has been very difficult. We therefore conducted a systematic review of previous literature on sentiment classification and the AI-based techniques used for it, assessing different artificial intelligence techniques. The aim was to better understand how to design and develop a robust, more accurate sentiment classifier that can distinguish hate speech from inverted compliments in social media text of a given context. We evaluated over 250 articles from digital sources such as ScienceDirect, ACM, Google Scholar, and IEEE Xplore and narrowed them down to 31 studies. The findings revealed that deep learning approaches such as CNN, RNN, BERT, and LSTM outperformed various machine learning techniques in accuracy. A large dataset is also necessary for developing a robust sentiment classifier. Such data can be obtained from sources such as Twitter, movie reviews, Kaggle, SST, and SemEval Task 4. Hybrid deep learning techniques such as CNN+LSTM, CNN+GRU, and CNN+BERT outperformed both single deep learning techniques and machine learning techniques. Python outperformed Java for sentiment analyzer development because of its simplicity and its AI-oriented libraries. Based on some of the important findings of this study, we make recommendations for future research.

Keywords: artificial intelligence, natural language processing, sentiment analysis, social network, text

Procedia PDF Downloads 100
235 The Relationship between Representational Conflicts, Generalization, and Encoding Requirements in an Instance Memory Network

Authors: Mathew Wakefield, Matthew Mitchell, Lisa Wise, Christopher McCarthy

Abstract:

The properties of memory representations in artificial neural networks have cognitive implications. Distributed representations that encode instances as a pattern of activity across layers of nodes afford memory compression and enforce the selection of a single point in instance space. These encoding schemes also appear to distort the representational space, as well as trading off the ability to validate that input information is within the bounds of past experience. In contrast, a localist representation which encodes some meaningful information into individual nodes in a network layer affords less memory compression while retaining the integrity of the representational space. This allows the validity of an input to be determined. The validity (or familiarity) of input along with the capacity of localist representation for multiple instance selections affords a memory sampling approach that dynamically balances the bias-variance trade-off. When the input is familiar, bias may be high by referring only to the most similar instances in memory. When the input is less familiar, variance can be increased by referring to more instances that capture a broader range of features. Using this approach in a localist instance memory network, an experiment demonstrates a relationship between representational conflict, generalization performance, and memorization demand. Relatively small sampling ranges produce the best performance on a classic machine learning dataset of visual objects. Combining memory validity with conflict detection produces a reliable confidence judgement that can separate responses with high and low error rates. Confidence can also be used to signal the need for supervisory input. Using this judgement, the need for supervised learning as well as memory encoding can be substantially reduced with only a trivial detriment to classification performance.
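The authors' network is not reproduced here. The sketch below uses a simple nearest-neighbour stand-in for a localist instance memory to illustrate the sampling idea described above:

- Familiarity falls as the distance to the closest stored instance grows.
- Familiar inputs sample few instances, giving higher bias.
- Unfamiliar inputs sample more instances, giving higher variance.
- Confidence combines familiarity with the amount of conflict among the sampled labels.

The familiarity formula, the k range, and the radius are hypothetical. The code requires Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter


def classify_from_memory(memory, query, k_min=1, k_max=7, familiar_radius=1.0):
    """Classify `query` from stored (vector, label) instances with a familiarity-dependent sample size.

    Returns (label, confidence, k). All parameter values are hypothetical.
    """
    ranked = sorted(memory, key=lambda item: math.dist(item[0], query))
    nearest = math.dist(ranked[0][0], query)
    # Familiarity is 1 at a stored instance and falls to 0 beyond familiar_radius.
    familiarity = max(0.0, 1.0 - nearest / familiar_radius)
    # Familiar input: few neighbours (higher bias). Unfamiliar: more neighbours (higher variance).
    k = min(len(ranked), round(k_max - familiarity * (k_max - k_min)))
    votes = Counter(label for _, label in ranked[:k])
    label, count = votes.most_common(1)[0]
    conflict = 1.0 - count / k  # share of sampled instances that disagree with the winner
    confidence = familiarity * (1.0 - conflict)
    return label, confidence, k
```

A low confidence value flags inputs where asking for supervision would be most useful. This corresponds to the abstract's use of confidence to signal the need for supervisory input.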

Keywords: artificial neural networks, representation, memory, conflict monitoring, confidence

Procedia PDF Downloads 111
234 Factors Promoting French-English Tweets in France

Authors: Taoues Hadour

Abstract:

Twitter has become a popular means of communication in a variety of fields, such as politics, journalism, and academia. This widely used online platform influences the way people express themselves and is changing language usage worldwide at an unprecedented pace. The language used online reflects the linguistic battle that has been going on for several decades in French society. This study enables a deeper understanding of users' linguistic behavior online. Its implications are important and can raise awareness of intercultural and cross-language exchanges. This project investigates French-English language mixing among French Twitter users using a topic analysis approach, drawing on Gumperz's theory of conversational code-switching. To collect tweets at a large scale, the data were collected in R with the rtweet package, using RStudio, the integrated development environment for R. The package accessed and retrieved French tweets through Twitter’s REST and streaming APIs (Application Programming Interfaces). The dataset was filtered manually, and certain recurring themes were observed. A total of nine topic categories were identified and analyzed in this study: entertainment, internet/social media, events/community, politics/news, sports, sex/pornography, innovation/technology, fashion/make-up, and business. The study reveals that entertainment, which includes movies, music, games, and books, is the most frequently discussed topic on Twitter. Anglicisms such as trailer, spoil, and live are identified in the data. Change in language usage is inevitable and is a natural result of linguistic interactions. The use of different languages online is just one example of what the real world would look like without linguistic regulations. Social media reveals a multicultural and multilingual richness that can deepen and expand our understanding of contemporary human attitudes.

Keywords: code-switching, French, sociolinguistics, Twitter

Procedia PDF Downloads 117
233 Examining Patterns in Ethnoracial Diversity in Los Angeles County Neighborhoods, 2016, Using Geographic Information System Analysis and Entropy Measure of Diversity

Authors: Joseph F. Cabrera, Rachael Dela Cruz

Abstract:

This study examines patterns that define ethnoracially diverse neighborhoods. Ethnoracial diversity is important because it facilitates cross-racial interactions within neighborhoods, which have been theorized to be associated with such outcomes as intergroup harmony, the reduction of racial and ethnic prejudice and discrimination, and increases in racial tolerance. Los Angeles (LA) is an ideal location to study ethnoracial spatial patterns, as it is one of the most ethnoracially diverse cities in the world. A large influx of Latinos, as well as Asians, has made LA’s urban landscape increasingly diverse over several decades. Our dataset contains all census tracts in Los Angeles County in 2016 and incorporates Census and ACS demographic and spatial data. We quantify ethnoracial diversity using a derivative of Simpson’s Diversity Index and use this measure to test previous literature suggesting that Latinos are one of the key drivers of changing ethnoracial spatial patterns in Los Angeles. Preliminary results suggest an overall increase in ethnoracial diversity in Los Angeles neighborhoods over the past sixteen years. Patterns associated with this trend include decreases in predominantly white and black neighborhoods, increases in predominantly Latino and Asian neighborhoods, and a general decrease in the white populations of the most diverse neighborhoods. A similar pattern is seen in neighborhoods with large Latino increases: a decrease in the white population, but an increase in Asian and black populations. We also found support for previous research suggesting that increases in Latino and Asian populations act as a buffer, allowing black population increases without a sizeable decrease in the white population. Future research is needed to understand the underlying causes of many of the patterns and trends highlighted in this study.
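For reference, the classic Gini-Simpson form of the index is the probability that two residents drawn at random from a tract belong to different groups. The abstract says only that a derivative of Simpson’s index was used, so the exact variant is unknown; the Python sketch below, using hypothetical tract counts, shows the standard form and a version normalized by its maximum value of 1 - 1/k for k groups.

```python
def simpson_diversity(counts):
    """Gini-Simpson index 1 - sum(p_i^2): the probability that two residents
    drawn at random (with replacement) belong to different groups."""
    total = sum(counts)
    if total == 0:
        raise ValueError("tract has no population")
    return 1.0 - sum((c / total) ** 2 for c in counts)


def normalized_simpson(counts):
    """Rescale to [0, 1] by dividing by the maximum value, 1 - 1/k, for k groups (k >= 2)."""
    k = len(counts)
    return simpson_diversity(counts) / (1.0 - 1.0 / k)


# Hypothetical tract counts in the order: white, Latino, Asian, black
even_tract = [250, 250, 250, 250]
single_group_tract = [1000, 0, 0, 0]
print(simpson_diversity(even_tract))          # 0.75
print(normalized_simpson(even_tract))         # 1.0
print(simpson_diversity(single_group_tract))  # 0.0
```

The normalized form makes tracts comparable when the number of groups tracked changes.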

Keywords: race, race and interaction, racial harmony, social interaction

Procedia PDF Downloads 115
232 A Novel Mediterranean Diet Index from the Middle East and North Africa Region: Comparison with Europe

Authors: Farah Naja, Nahla Hwalla, Leila Itani, Shirine Baalbaki, Abla Sibai, Lara Nasreddine

Abstract:

Purpose: To propose an index for assessing adherence to a Middle-Eastern version of the Mediterranean diet, as represented by the traditional Lebanese Mediterranean diet (LMD); to evaluate the association between the LMD and selected European Mediterranean diets (EMD); and to examine socio-demographic and lifestyle correlates of adherence to the Mediterranean diet (MD) among Lebanese adults. Methods: Using nationally representative dietary intake data of Lebanese adults, an index to measure adherence to the LMD was derived. The choice of food groups used for calculating the LMD score was based on the results of previous factor analyses conducted on the same dataset. These food groups included fruits, vegetables, legumes, olive oil, burghol, dairy products, starchy vegetables, dried fruits, and eggs. Using Pearson’s correlation and agreement between score tertile distributions, the derived LMD index was compared with previously published EMD indexes from Greece, Spain, Italy, France, and EPIC. Results: Fruits, vegetables, and olive oil were common denominators of all MD scores. Food groups specific to the LMD included burghol and dried fruits. The LMD score correlated significantly with the EMD scores, being closest to the Italian (r=0.57) and farthest from the French (r=0.21). Percent agreement between the scores’ tertile distributions and kappa statistics confirmed these findings. Multivariate linear regression showed that older age, higher education, female gender, and healthy lifestyle characteristics were associated with increased adherence to all MD scores studied. Conclusion: A novel LMD index was proposed to characterize the Mediterranean diet in Lebanon, complementing international efforts to characterize the MD and its association with disease risk.
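The comparison by tertile agreement and kappa can be sketched in a few lines of Python. This assumes rank-based tertiles and unweighted Cohen's kappa (the paper may have used a weighted variant), and the scores are hypothetical.

```python
def tertiles(scores):
    """Assign each score to tertile 0, 1 or 2 by rank (ties broken by input order)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 3 // n
    return labels


def agreement_and_kappa(a, b, k=3):
    """Percent agreement and unweighted Cohen's kappa for two category lists."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each index's marginal tertile shares
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in range(k))
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical adherence scores for six participants under two indexes
lmd_scores = [3, 7, 5, 9, 1, 6]
emd_scores = [2, 8, 4, 6, 3, 9]
obs, kappa = agreement_and_kappa(tertiles(lmd_scores), tertiles(emd_scores))
# Here 4 of 6 tertiles agree (obs = 4/6), chance agreement is 1/3, so kappa = 0.5
```

Kappa corrects raw agreement for the agreement expected by chance; with three equal tertiles that baseline is 1/3, so the chance term can never reach 1 and the division is safe.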

Keywords: mediterranean diet, adherence, Middle-East, Lebanon, Europe

Procedia PDF Downloads 395
231 Optimizing Perennial Plants Image Classification by Fine-Tuning Deep Neural Networks

Authors: Khairani Binti Supyan, Fatimah Khalid, Mas Rina Mustaffa, Azreen Bin Azman, Amirul Azuani Romle

Abstract:

Perennial plant classification plays a significant role in various agricultural and environmental applications, assisting in plant identification, disease detection, and biodiversity monitoring. Nevertheless, attaining high accuracy in perennial plant image classification remains challenging due to complex variations in plant appearance, the diverse environmental conditions under which images are captured, and the inherent variability in image quality stemming from factors such as lighting conditions, camera settings, and focus. This paper proposes an adaptation approach that optimizes perennial plant image classification by fine-tuning pre-trained deep neural network (DNN) models. It explores the efficacy of fine-tuning prevalent architectures, namely VGG16, ResNet50, and InceptionV3, leveraging transfer learning to tailor the models to the specific characteristics of perennial plant datasets. A subset of the MYLPHerbs dataset, consisting of 13481 images of 6 perennial plant species captured under various environmental conditions, was used in the experiments. Different fine-tuning strategies, including adjusting learning rates, training set sizes, data augmentation, and architectural modifications, were investigated. The experimental outcomes underscore the effectiveness of fine-tuning deep neural networks for perennial plant image classification, with ResNet50 achieving the highest accuracy of 99.78%. VGG16 and InceptionV3 also achieved high accuracies of 99.67% and 99.37%, respectively. The overall outcomes reaffirm the robustness of the fine-tuning approach across different deep neural network architectures, offering insights into strategies for optimizing model performance in perennial plant image classification.

Keywords: perennial plants, image classification, deep neural networks, fine-tuning, transfer learning, VGG16, ResNet50, InceptionV3

Procedia PDF Downloads 38
230 Identifying Indicative Health Behaviours and Psychosocial Factors Affecting Multi-morbidity Conditions in Ageing Populations: Preliminary Results from the ELSA study of Ageing

Authors: Briony Gray, Glenn Simpson, Hajira Dambha-Miller, Andrew Farmer

Abstract:

Multimorbidity may be strongly affected by a variety of conditions, factors, and variables, placing higher demands on health and social care services, infrastructure, and expenses. Holding one or more conditions increases one’s risk of developing further conditions, with patients over 65 years old at highest risk. Psychosocial factors such as anxiety and depression are rising rapidly worldwide, a trend amplified by the COVID-19 pandemic. These factors are highly correlated, predict poorer outcomes when they coexist, and increase the likelihood of comorbid physical health conditions. While future reform of social and healthcare systems may help to alleviate some of these mounting pressures, there remains an urgent need to better understand the role that health behaviours and psychosocial conditions, such as anxiety and depression, may play in ageing populations. Using the UK healthcare scene as a lens for analysis, this study uses big data collected in the English Longitudinal Study of Ageing (ELSA) to examine the role of anxiety and depression in ageing populations (65 years and over). Using logistic regression modelling, the results identify the 10 most significant variables correlated with both anxiety and depression from data categorised into the areas of health behaviour, psychosocial, socioeconomic, and life satisfaction (each shown through literature review to be of significance). These are compared with wider global research findings, with the aim of better understanding the areas in which social and healthcare reform can support multimorbidity interventions, and suggestions are made for improved patient-centred care. The scope of future research is outlined, which includes analysis of 59 multimorbidity variables from the ELSA dataset, going beyond anxiety and depression.

Keywords: multimorbidity, health behaviours, patient centred care, psychosocial factors

Procedia PDF Downloads 75
229 Predicting Radioactive Waste Glass Viscosity, Density and Dissolution with Machine Learning

Authors: Joseph Lillington, Tom Gout, Mike Harrison, Ian Farnan

Abstract:

The vitrification of high-level nuclear waste within borosilicate glass and its emplacement in a multi-barrier repository deep underground is widely accepted as the preferred disposal method. However, for this to happen, any safety case will require validation that the initially localized radionuclides will not be released in considerable amounts into the near/far field. Accurate mechanistic models are therefore necessary to predict glass dissolution, and these should be robust to a variety of incorporated waste species and leaching test conditions, particularly given substantial variations across international waste streams. Here, machine learning is used to predict glass material properties (viscosity, density) and glass leaching model parameters from large-scale industrial data. Several machine learning algorithms were compared to assess performance. Density was predicted solely from composition, whereas viscosity additionally considered temperature. To predict suitable glass leaching model parameters, a large simulated dataset was created by coupling MATLAB and the chemical reactive-transport code HYTEC, using the state-of-the-art GRAAL model (glass reactivity with allowance for the alteration layer). The trained models were then applied to the large-scale industrial experimental data to identify potentially appropriate model parameters. Results indicate that ensemble methods can accurately predict viscosity as a function of temperature and composition across all three industrial datasets. Glass density prediction shows reliable learning performance, with predictions primarily within the experimental uncertainty of the test data. Furthermore, machine learning can predict the behavior of glass dissolution model parameters, demonstrating potential value in GRAAL model development and in assessing suitable model parameters for large-scale industrial glass dissolution data.

Keywords: machine learning, predictive modelling, pattern recognition, radioactive waste glass

Procedia PDF Downloads 99
228 Toward Indoor and Outdoor Surveillance using an Improved Fast Background Subtraction Algorithm

Authors: El Harraj Abdeslam, Raissouni Naoufal

Abstract:

The detection of moving objects in video image sequences is very important for object tracking, activity recognition, and behavior understanding in video surveillance. The most widely used approach for moving object detection and tracking is background subtraction, and many background subtraction algorithms have been suggested. However, these are sensitive to illumination changes, and the solutions proposed to bypass this problem are time-consuming. In this paper, we propose a robust yet computationally efficient background subtraction approach and focus mainly on the ability to detect moving objects in dynamic scenes, for possible applications in monitoring complex and restricted-access areas, where moving and motionless persons must be reliably detected. It consists of three main phases: establishing illumination invariance, background/foreground modeling, and morphological analysis for noise removal. We handle illumination changes using Contrast Limited Adaptive Histogram Equalization (CLAHE), which clips each local histogram at a user-determined limit before equalization, bounding how much the contrast can be amplified. It thus mitigates the degradation due to scene illumination changes and improves the visibility of the video signal. Initially, the background and foreground images are extracted from the video sequence. Then, the background and foreground images are separately enhanced by applying CLAHE. To form multi-modal backgrounds, we model each channel of a pixel as a mixture of K Gaussians (K=5) using a Gaussian Mixture Model (GMM). Finally, we post-process the resulting binary foreground mask using morphological erosion and dilation to remove possible noise. For experimental testing, we used a standard dataset to challenge the efficiency and accuracy of the proposed method on a diverse set of dynamic scenes.
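Two steps of this pipeline are easy to misread, so a minimal pure-Python sketch of each follows. CLAHE does not cap pixel intensities directly: within each image tile it clips the intensity histogram at a limit and redistributes the clipped counts before equalizing, which bounds contrast amplification. The sketch shows that clipping for a single tile only (no tiling or interpolation, and a simple leftover-count policy that is our own assumption). It also shows a morphological opening (erosion followed by dilation) with an assumed 3x3 square structuring element, since the abstract does not state one. In practice, OpenCV's `cv2.createCLAHE` and `cv2.morphologyEx` with `cv2.MORPH_OPEN` provide optimized implementations.

```python
def clip_histogram(hist, clip_limit):
    """Clip every bin at clip_limit and spread the excess evenly over all bins,
    as in the contrast-limiting step of CLAHE (single tile, single pass)."""
    excess = sum(max(0, h - clip_limit) for h in hist)
    clipped = [min(h, clip_limit) for h in hist]
    share, leftover = divmod(excess, len(hist))
    out = [h + share for h in clipped]
    for i in range(leftover):  # assumed policy: leftover counts go to the first bins
        out[i] += 1
    return out


def _neighbourhood_op(mask, combine):
    """Apply combine (all/any) to each 3x3 window; pixels outside the image count as 0."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                mask[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = 1 if combine(window) else 0
    return out


def erode(mask):
    return _neighbourhood_op(mask, all)


def dilate(mask):
    return _neighbourhood_op(mask, any)


def opening(mask):
    """Erosion then dilation: removes foreground specks that cannot contain the 3x3 element."""
    return dilate(erode(mask))


print(clip_histogram([10, 0, 0, 2], 4))  # [6, 2, 1, 3]
```

The total pixel count is preserved by the clipping step, but a single redistribution pass can leave some bins above the limit, as in the example.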

Keywords: video surveillance, background subtraction, contrast limited adaptive histogram equalization, illumination invariance, object tracking, object detection, behavior understanding, dynamic scenes

Procedia PDF Downloads 239
227 Self-Reported Health Status and Its Consistency: Evidence from India

Authors: Dona Ghosh, Zakir Husain

Abstract:

In India, the increasing share of the elderly has generated many social and economic issues, of which health is a major challenge that society must confront in the coming years. Self-reported health (SRH) is a popular health measure in this regard, but it has been questioned in recent years because of its heavy dependence on socioeconomic status. The validity of SRH as a measure of health status in old age therefore needs to be verified. This paper focuses on self-reported health and related inconsistent responses among the elderly in India. The study has two objectives: first, to identify the socioeconomic determinants of subjective health status and its change over time; and second, to analyse the role of socioeconomic components in producing inconsistent responses regarding the health status of the elderly. Inconsistency in responses can arise in two ways: positive response bias (an individual has a health problem but reports his/her health as good) and negative response bias (bad health is reported even though there is no health problem). In the present study, however, we focus only on the negative response bias of elderly individuals. To measure inconsistencies in responses, self-reported health is compared with two types of physical health conditions: the existence of a chronic ailment and physical immobility. Using the 60th and 71st rounds of the NSS dataset, the study found that subjective health has worsened over time in both rural and urban areas. Findings suggest that inconsistency in responses related to chronic ailment varies across social classes, living environments, geographical regions, age groups, and education levels. By contrast, variation in inconsistent responses regarding physical mobility is quite rare and difficult to explain by socioeconomic characteristics, because most of the indicators are found to be insignificant in this regard. The findings indicate that, in the case of chronic ailment, inconsistency between objective and subjective health status largely depends on socioeconomic conditions, but the importance of such factors disappears for physical immobility.
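The negative response bias defined above can be operationalized as a simple flag. The Python sketch below is hypothetical: it assumes the rate is computed among respondents who have no objective condition (chronic ailment or immobility), which the abstract does not state, and it uses made-up records rather than NSS data.

```python
from collections import defaultdict


def negative_bias_rate(records):
    """records: iterable of (group, reports_poor_health, has_condition).
    Returns, per group, the share of respondents WITHOUT the objective
    condition who nevertheless report poor health (negative response bias)."""
    at_risk = defaultdict(int)
    biased = defaultdict(int)
    for group, reports_poor_health, has_condition in records:
        if has_condition:
            continue  # negative bias is only possible without the condition
        at_risk[group] += 1
        if reports_poor_health:
            biased[group] += 1
    return {g: biased[g] / at_risk[g] for g in at_risk}


sample = [
    ("rural", True, False),   # poor health reported, no ailment: biased
    ("rural", False, False),
    ("rural", True, True),    # has the ailment: excluded
    ("urban", False, False),
    ("urban", False, False),
]
print(negative_bias_rate(sample))  # {'rural': 0.5, 'urban': 0.0}
```

A different denominator (for example, all respondents in a group) would give different rates, so the choice should follow the study's own definition.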

Keywords: India, aging, self-reported health, inconsistent responses

Procedia PDF Downloads 271
226 The Impact of Cryptocurrency Classification on Money Laundering: Analyzing the Preferences of Criminals for Stable Coins, Utility Coins, and Privacy Tokens

Authors: Mohamed Saad, Huda Ismail

Abstract:

The purpose of this research is to examine the impact of cryptocurrency classification on money laundering crimes and to analyze how the preferences of criminals differ according to the type of digital currency used. Specifically, we aim to explore the roles of stablecoins, utility coins, and privacy tokens in facilitating or hindering money laundering activities and to identify the key factors that influence criminals' choices among these cryptocurrencies. To achieve our research objectives, we used a dataset of the 32 most highly traded cryptocurrencies published on CoinMarketCap for 2022. In addition to conducting a comprehensive review of the existing literature on cryptocurrency and money laundering, with a focus on stablecoins, utility coins, and privacy tokens, we conducted several multivariate analyses. Our study reveals that the classification of cryptocurrency plays a significant role in money laundering activities, as criminals tend to prefer certain types of digital currencies over others, depending on their specific needs and goals. Specifically, we found that stablecoins are more commonly used in money laundering due to their relatively stable value and low volatility, which make them less risky to hold and transfer. Utility coins, on the other hand, are less frequently used in money laundering due to their lack of anonymity and limited liquidity. Finally, privacy tokens, such as Monero and Zcash, are increasingly becoming a preferred choice among criminals due to their high degree of privacy and untraceability. In summary, our study highlights the importance of understanding the nuances of cryptocurrency classification in the context of money laundering and provides insights into the preferences of criminals in using digital currencies for illegal activities. Based on our findings, we recommend that policymakers address the potential misuse of cryptocurrencies for money laundering. By implementing measures to regulate stablecoins, strengthening cross-border cooperation, fostering public-private partnerships, and increasing cooperation, policymakers can help prevent and detect money laundering activities involving digital currencies.

Keywords: crime, cryptocurrency, money laundering, tokens

Procedia PDF Downloads 70
225 American Sign Language Recognition System

Authors: Rishabh Nagpal, Riya Uchagaonkar, Venkata Naga Narasimha Ashish Mernedi, Ahmed Hambaba

Abstract:

The rapid evolution of technology in the communication sector continually seeks to bridge the gap between different communities, notably between the deaf community and the hearing world. This project develops a comprehensive American Sign Language (ASL) recognition system, leveraging convolutional neural networks (CNNs) and vision transformers (ViTs) to interpret and translate ASL in real time. The primary objective of this system is to provide an effective communication tool that enables seamless interaction through accurate sign language interpretation. The architecture of the proposed system integrates two networks: VGG16 for precise spatial feature extraction and a vision transformer for contextual understanding of sign language gestures. The system processes live input, extracts features with both models, and combines them to improve gesture recognition accuracy. This integration facilitates a robust understanding of ASL by capturing both fine-grained details and broader gesture dynamics. The system is evaluated through a series of tests that measure its efficiency and accuracy in real-world scenarios. Results indicate a high level of precision in recognizing diverse ASL signs, substantiating the potential of this technology in practical applications. Challenges such as enhancing the system’s ability to operate in varied environmental conditions and further expanding the training dataset were identified and discussed. Future work will refine the model’s adaptability and incorporate haptic feedback to enhance the interactivity and richness of the user experience. This project demonstrates the feasibility of an advanced ASL recognition system and lays the groundwork for future innovations in assistive communication technologies.

Keywords: sign language, computer vision, vision transformer, VGG16, CNN

Procedia PDF Downloads 18
224 Integrated Risk Assessment of Storm Surge and Climate Change for the Coastal Infrastructure

Authors: Sergey V. Vinogradov

Abstract:

Coastal communities are presently facing increased vulnerabilities due to rising sea levels and shifts in global climate patterns, a trend expected to escalate in the long run. To address the needs of government entities, the public sector, and private enterprises, there is an urgent need to thoroughly investigate, assess, and manage the present and projected risks associated with coastal flooding, including storm surges, sea level rise, and nuisance flooding. In response to these challenges, a practical approach to evaluating storm surge inundation risks has been developed. This methodology offers an integrated assessment of potential flood risk in targeted coastal areas. The physical modeling framework involves simulating synthetic storms and utilizing hydrodynamic models that align with projected future climate and ocean conditions. Both publicly available and site-specific data form the basis for a risk assessment methodology designed to translate inundation model outputs into statistically significant projections of expected financial and operational consequences. This integrated approach produces measurable indicators of impacts stemming from floods, encompassing economic and other dimensions. By establishing connections between the frequency of modeled flood events and their consequences across a spectrum of potential future climate conditions, our methodology generates probabilistic risk assessments. These assessments not only account for future uncertainty but also yield comparable metrics, such as expected annual losses for each inundation event. These metrics furnish stakeholders with a dependable dataset to guide strategic planning and inform investments in mitigation. Importantly, the model's adaptability ensures its relevance across diverse coastal environments, even in instances where site-specific data for analysis may be limited.
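Expected annual loss is commonly obtained by integrating loss over annual exceedance probability. The abstract does not give the integration scheme, so the Python sketch below is an assumption throughout: a trapezoidal rule over modelled events with hypothetical return periods and losses, no loss counted for events more frequent than the most frequent one modelled, and a flat tail beyond the rarest event.

```python
def expected_annual_loss(events):
    """events: list of (return_period_years, loss).
    Integrates loss over annual exceedance probability p = 1 / T with the
    trapezoidal rule. Assumptions: events more frequent than the most frequent
    modelled one contribute nothing, and the rarest event's loss is held
    constant for all rarer events (flat tail)."""
    points = sorted(((1.0 / T, loss) for T, loss in events), reverse=True)
    eal = 0.0
    for (p1, loss1), (p2, loss2) in zip(points, points[1:]):
        eal += (p1 - p2) * (loss1 + loss2) / 2.0
    p_rarest, loss_rarest = points[-1]
    eal += p_rarest * loss_rarest  # flat tail from p_rarest down to 0
    return eal


# Hypothetical 10-year and 100-year surge events with losses in dollars
print(round(expected_annual_loss([(10, 1.0e6), (100, 5.0e6)])))  # 320000
```

Running the same function over event sets simulated for different future climate scenarios yields directly comparable EAL figures, which is the kind of metric the abstract describes.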

Keywords: climate, coastal, surge, risk

Procedia PDF Downloads 37
223 Detection of Safety Goggles on Humans in Industrial Environment Using Faster Region-Based Convolutional Neural Network with Rotated Bounding Box

Authors: Ankit Kamboj, Shikha Talwar, Nilesh Powar

Abstract:

To successfully deliver our products to the market, employees need to work in a safe environment, especially in industrial and manufacturing settings. Failing to wear safety glasses while working in industrial plants can put employees at high risk, hence the need for a real-time automatic detection system that identifies persons (violators) not wearing safety glasses. In this study, a convolutional neural network (CNN) algorithm called Faster Region-based CNN (Faster RCNN) with rotated bounding boxes has been used to detect safety glasses on persons; the algorithm has the advantage of detecting safety glasses at different orientation angles. The proposed method first detects a person in the images and then determines whether that person is wearing safety glasses. The video data is captured at the entrance of restricted zones of the industrial environment (manufacturing plant) and converted into images at 2 frames per second. In the first step, a CNN with weights pre-trained on the COCO dataset is used for person detection, and the detections are cropped as images. The safety goggles are then labelled on the cropped images using the image labelling tool roLabelImg, which annotates the ground truth of rotated objects more accurately, and the resulting annotations are converted to the four corner coordinates of the rectangular bounding box. Next, Faster RCNN with rotated bounding boxes is used to detect safety goggles, and its detection accuracy (average precision) is compared with that of a conventional axis-aligned Faster RCNN; the comparison shows the effectiveness of the proposed method for detecting rotated objects. The deep learning benchmarking is done on a Dell workstation with a 16 GB NVIDIA GPU.
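The conversion of rotated annotations to four corner coordinates can be sketched as below. It assumes each box is stored as a centre, width, height, and rotation angle in radians; this is our assumption about the export format, not a documented roLabelImg schema. In image coordinates, where y points down, a positive angle appears clockwise on screen.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_rad):
    """Return the four corners of a w x h rectangle centred at (cx, cy) and
    rotated by angle_rad (standard rotation matrix), in order around the box."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    # Corner offsets from the centre before rotation
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


# Hypothetical goggle annotation: centre (50, 40), 20 x 10 pixels, no rotation
print(rotated_box_corners(50, 40, 20, 10, 0.0))
# [(40.0, 35.0), (60.0, 35.0), (60.0, 45.0), (40.0, 45.0)]
```

Storing corners rather than an angle lets downstream code compute polygon overlap directly, at the cost of a less compact label.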

Keywords: CNN, deep learning, faster RCNN, roLabelImg, rotated bounding box, safety goggle detection

Procedia PDF Downloads 117
222 River Habitat Modeling for the Entire Macroinvertebrate Community

Authors: Pinna Beatrice, Laini Alex, Negro Giovanni, Burgazzi Gemma, Viaroli Pierluigi, Vezza Paolo

Abstract:

Habitat models rarely consider macroinvertebrates as ecological targets in rivers. Available approaches mainly focus on single macroinvertebrate species and do not address the ecological needs and functionality of the entire community. This research aimed to provide an approach to model the habitat of the macroinvertebrate community. The approach is based on the recently developed Flow-T index, together with a Random Forest (RF) regression, which is employed to apply the Flow-T index at the mesohabitat scale. Using different datasets gathered from both field data collection and 2D hydrodynamic simulations, the model was calibrated on the Trebbia River (2019 campaign) and then validated on the Trebbia, Taro, and Enza Rivers (2020 campaign). The three rivers are characterized by a braiding morphology, gravel riverbeds, and summer low flows. The RF model selected 12 mesohabitat descriptors as important for the macroinvertebrate community. These descriptors belong to different frequency classes of water depth, flow velocity, substrate grain size, and connectivity to the main river channel. The cross-validation R² coefficient (R²cv) of the training dataset is 0.71 for the Trebbia River (2019), whereas the R² coefficient for the validation datasets (Trebbia, Taro, and Enza Rivers, 2020) is 0.63. The agreement between the simulated results and the experimental data shows sufficient accuracy and reliability. The outcomes of the study reveal that the model can identify the ecological response of the macroinvertebrate community to possible flow regime alterations and river morphological modifications. Lastly, the proposed approach allows the MesoHABSIM methodology, widely used for fish habitat assessment, to be extended to a different ecological target community. Further applications of the approach could include flow design in both perennial and non-perennial rivers, including river reaches where fish fauna is absent.
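For reference, the R² values above follow the usual coefficient-of-determination definition; for the cross-validated R², the predictions would be the Random Forest's out-of-fold predictions. A minimal Python sketch with hypothetical observed and predicted index values:

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical index values for three mesohabitats and a model's predictions
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 0.5
```

An R² of 0 means the model does no better than predicting the mean of the observations, which gives a baseline for reading the 0.71 and 0.63 reported above.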

Keywords: ecological flows, macroinvertebrate community, mesohabitat, river habitat modeling

Procedia PDF Downloads 77
221 Transcriptome Analysis of Saffron (Crocus sativus L.) Stigma Focusing on Identification of Genes Involved in the Biosynthesis of Crocin

Authors: Parvaneh Mahmoudi, Ahmad Moeni, Seyed Mojtaba Khayam Nekoei, Mohsen Mardi, Mehrshad Zeinolabedini, Ghasem Hosseini Salekdeh

Abstract:

Saffron (Crocus sativus L.) is one of the most important spice and medicinal plants. The three-branched style of the C. sativus flower is the most economically important part of the plant; it is known as saffron and has several medicinal properties. Despite the economic and biological significance of this plant, knowledge about its molecular characteristics is very limited. In the present study, we constructed, for the first time, a comprehensive dataset for C. sativus stigma through de novo transcriptome sequencing using Illumina paired-end sequencing technology. A total of 52075128 reads were generated and assembled into 118075 unigenes, with an average length of 629 bp and an N50 of 951 bp. Of these, 66171 (56%) were annotated in the non-redundant (NR) database of the National Center for Biotechnology Information (NCBI), 30938 (26%) were annotated in the Swiss-Prot database, and 10273 (8.7%) were mapped to 141 pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, while 52560 (44%) and 40756 (34%) unigenes were assigned to Gene Ontology (GO) categories and Eukaryotic Orthologous Groups of proteins (KOG), respectively. In addition, 65 candidate genes involved in three stages of crocin biosynthesis were identified. Finally, the saffron stigma transcriptome was used to identify 6779 potential microsatellite (SSR) molecular markers. High-throughput de novo transcriptome sequencing provided a valuable resource of C. sativus transcript sequences in public databases. In addition, most of the candidate genes potentially involved in crocin biosynthesis were identified, which could be further utilized in functional genomics studies. Furthermore, the numerous SSRs obtained might help address open questions about the origin of this amphiploid spice, which probably has little genetic diversity.
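N50, reported above as 951 bp, is the length L such that sequences of length at least L together contain at least half of all assembled bases. A short Python sketch with hypothetical lengths:

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (e.g. unigene lengths in bp)."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("empty assembly")
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # at least half of all bases reached
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Because N50 is weighted toward the longest sequences, it is typically larger than the mean length, consistent with the 951 bp N50 versus the 629 bp average reported here.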

Keywords: saffron, transcriptome, NGS, bioinformatic

Procedia PDF Downloads 75
220 Resisting Adversarial Assaults: A Model-Agnostic Autoencoder Solution

Authors: Massimo Miccoli, Luca Marangoni, Alberto Aniello Scaringi, Alessandro Marceddu, Alessandro Amicone

Abstract:

The susceptibility of deep neural networks (DNNs) to adversarial manipulations is a recognized challenge within the computer vision domain. Adversarial examples, crafted by adding subtle yet malicious alterations to benign images, exploit this vulnerability. Various defense strategies have been proposed to safeguard DNNs against such attacks, stemming from diverse research hypotheses. Building upon prior work, our approach uses autoencoder models. Autoencoders, a type of neural network, are trained to learn representations of training data and to reconstruct inputs from these representations, typically by minimizing a reconstruction error such as the mean squared error (MSE). Our autoencoder was trained on a dataset of benign examples, learning features specific to them. Consequently, when presented with significantly perturbed adversarial examples, the autoencoder exhibited high reconstruction errors. The architecture of the autoencoder was tailored to the dimensions of the images under evaluation: we considered various image sizes, constructing models differently for 256x256 and 512x512 images. Moreover, the choice of computer vision model is crucial, as most adversarial attacks are designed with specific model architectures in mind. To mitigate this, we propose a method that replaces image-specific dimensions with a structure independent of both image dimensions and neural network models, thereby enhancing robustness. Our multi-modal autoencoder reconstructs the spectral representation of images across the red-green-blue (RGB) color channels. To validate our approach, we conducted experiments using diverse datasets and subjected them to adversarial attacks using models such as ResNet50 and ViT_L_16 from the torchvision library. The features extracted by the autoencoder were used in a classification model, resulting in an MSE (RGB) of 0.014, a classification accuracy of 97.33%, and a precision of 99%.
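The detection idea, that inputs unlike the benign training data reconstruct poorly, can be illustrated with a simple threshold rule. This is a hypothetical Python sketch, not the authors' method: `toy_reconstruct` stands in for a trained autoencoder by snapping values to the few levels it has "learned", images are flattened lists of pixel values, and the threshold is taken from benign reconstruction errors at an assumed quantile. The paper instead feeds autoencoder features to a trained classifier.

```python
def mse(a, b):
    """Mean squared error between two equal-length pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def toy_reconstruct(pixels):
    """Hypothetical stand-in for an autoencoder: snaps each value to the
    nearest of 0.0, 0.5, 1.0, the only values present in the benign data."""
    return [round(v * 2) / 2 for v in pixels]


def threshold_from_benign(benign, reconstruct, quantile=0.99):
    """Pick the reconstruction error at the given quantile of benign inputs."""
    errors = sorted(mse(x, reconstruct(x)) for x in benign)
    index = min(len(errors) - 1, int(quantile * len(errors)))
    return errors[index]


def flag_adversarial(inputs, reconstruct, threshold):
    """True where the reconstruction error exceeds the threshold."""
    return [mse(x, reconstruct(x)) > threshold for x in inputs]


benign = [[0.0, 0.5, 1.0], [1.0, 1.0, 0.5]]
perturbed = [[0.2, 0.7, 0.8]]
limit = threshold_from_benign(benign, toy_reconstruct)
print(flag_adversarial(perturbed + benign, toy_reconstruct, limit))  # [True, False, False]
```

The quantile controls the trade-off between false alarms on benign images and missed adversarial ones.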

Keywords: adversarial attacks, malicious images detector, binary classifier, multimodal transformer autoencoder

Procedia PDF Downloads 88
219 Improving Subjective Bias Detection Using Bidirectional Encoder Representations from Transformers and Bidirectional Long Short-Term Memory

Authors: Ebipatei Victoria Tunyan, T. A. Cao, Cheol Young Ock

Abstract:

Detecting subjectively biased statements is a vital task because such bias, when present in text or other information dissemination media such as news, social media, scientific texts, and encyclopedias, can weaken trust in the information and stir conflicts among consumers. Subjective bias detection is also critical for many Natural Language Processing (NLP) tasks, such as sentiment analysis, opinion identification, and bias neutralization. A system that reliably detects subjectivity in text would significantly benefit research in these areas. It can also be useful for platforms like Wikipedia, where the use of neutral language is important. The goal of this work is to identify subjectively biased language in text at the sentence level. Machine learning is well suited to complex language-understanding problems of this kind. A key step in this approach is to train a classifier with BERT (Bidirectional Encoder Representations from Transformers) as the upstream model. BERT by itself can be used as a classifier; however, in this study, we use BERT as a data preprocessor as well as an embedding generator for a Bi-LSTM (Bidirectional Long Short-Term Memory) network incorporating an attention mechanism. This approach produces a deeper and better classifier. We evaluate our model on the Wiki Neutrality Corpus (WNC), a benchmark dataset compiled from Wikipedia edits that removed biased language from sentences, and use it to compare our model with existing approaches. Experimental analysis indicates improved performance, as our model achieved state-of-the-art accuracy in detecting subjective bias. This study focuses on the English language, but the model can be fine-tuned to accommodate other languages.
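The attention step on top of the Bi-LSTM can be pictured as a softmax-weighted average of per-token hidden states. The abstract does not specify the attention form, so this pure-Python sketch assumes dot-product scoring against a learned query vector (`query` here is hypothetical). In a real model, the hidden states would be the concatenated forward and backward LSTM outputs, held as framework tensors.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Score each time step by query . h_t, normalise with softmax, and return
    the weighted sum of hidden states (sentence vector) plus the weights."""
    scores = [sum(q * h for q, h in zip(query, h_t)) for h_t in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h_t[d] for w, h_t in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights


# Hypothetical 2-dimensional hidden states for a three-token sentence
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attention_pool(states, query=[0.0, 0.0])
print(weights)  # equal weights (1/3 each) when the query is all zeros
```

The resulting sentence vector would then feed a final classification layer that predicts biased versus neutral.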

Keywords: subjective bias detection, machine learning, BERT–BiLSTM–Attention, text classification, natural language processing

Procedia PDF Downloads 112