Search results for: speech dataset
528 Identification of Spam Keywords Using Hierarchical Category in C2C E-Commerce
Authors: Shao Bo Cheng, Yong-Jin Han, Se Young Park, Seong-Bae Park
Abstract:
Consumer-to-Consumer (C2C) E-commerce has been growing at a very high speed in recent years. Since identical or nearly-same kinds of products compete one another by relying on keyword search in C2C E-commerce, some sellers describe their products with spam keywords that are popular but are not related to their products. Though such products get more chances to be retrieved and selected by consumers than those without spam keywords, the spam keywords mislead the consumers and waste their time. This problem has been reported in many commercial services like e-bay and taobao, but there have been little research to solve this problem. As a solution to this problem, this paper proposes a method to classify whether keywords of a product are spam or not. The proposed method assumes that a keyword for a given product is more reliable if the keyword is observed commonly in specifications of products which are the same or the same kind as the given product. This is because that a hierarchical category of a product in general determined precisely by a seller of the product and so is the specification of the product. Since higher layers of the hierarchical category represent more general kinds of products, a reliable degree is differently determined according to the layers. Hence, reliable degrees from different layers of a hierarchical category become features for keywords and they are used together with features only from specifications for classification of the keywords. Support Vector Machines are adopted as a basic classifier using the features, since it is powerful, and widely used in many classification tasks. In the experiments, the proposed method is evaluated with a golden standard dataset from Yi-han-wang, a Chinese C2C e-commerce, and is compared with a baseline method that does not consider the hierarchical category. The experimental results show that the proposed method outperforms the baseline in F1-measure, which proves that spam keywords are effectively identified by a hierarchical category in C2C e-commerce.Keywords: spam keyword, e-commerce, keyword features, spam filtering
Procedia PDF Downloads 293527 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection
Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada
Abstract:
With the expansion of machine learning and data mining in the context of Big Data analytics, the common problem that affects data is class imbalance. It refers to an imbalanced distribution of instances belonging to each class. This problem is present in many real world applications such as fraud detection, network intrusion detection, medical diagnostics, etc. In these cases, data instances labeled negatively are significantly more numerous than the instances labeled positively. When this difference is too large, the learning system may face difficulty when tackling this problem, since it is initially designed to work in relatively balanced class distribution scenarios. Another important problem, which usually accompanies these imbalanced data, is the overlapping instances between the two classes. It is commonly referred to as noise or overlapping data. In this article, we propose an approach called: One Side Behavioral Noise Reduction (OSBNR). This approach presents a way to deal with the problem of class imbalance in the presence of a high noise level. OSBNR is based on two steps. Firstly, a cluster analysis is applied to groups similar instances from the minority class into several behavior clusters. Secondly, we select and eliminate the instances of the majority class, considered as behavioral noise, which overlap with behavior clusters of the minority class. The results of experiments carried out on a representative public dataset confirm that the proposed approach is efficient for the treatment of class imbalances in the presence of noise.Keywords: machine learning, imbalanced data, data mining, big data
Procedia PDF Downloads 130526 Towards Law Data Labelling Using Topic Modelling
Authors: Daniel Pinheiro Da Silva Junior, Aline Paes, Daniel De Oliveira, Christiano Lacerda Ghuerren, Marcio Duran
Abstract:
The Courts of Accounts are institutions responsible for overseeing and point out irregularities of Public Administration expenses. They have a high demand for processes to be analyzed, whose decisions must be grounded on severity laws. Despite the existing large amount of processes, there are several cases reporting similar subjects. Thus, previous decisions on already analyzed processes can be a precedent for current processes that refer to similar topics. Identifying similar topics is an open, yet essential task for identifying similarities between several processes. Since the actual amount of topics is considerably large, it is tedious and error-prone to identify topics using a pure manual approach. This paper presents a tool based on Machine Learning and Natural Language Processing to assists in building a labeled dataset. The tool relies on Topic Modelling with Latent Dirichlet Allocation to find the topics underlying a document followed by Jensen Shannon distance metric to generate a probability of similarity between documents pairs. Furthermore, in a case study with a corpus of decisions of the Rio de Janeiro State Court of Accounts, it was noted that data pre-processing plays an essential role in modeling relevant topics. Also, the combination of topic modeling and a calculated distance metric over document represented among generated topics has been proved useful in helping to construct a labeled base of similar and non-similar document pairs.Keywords: courts of accounts, data labelling, document similarity, topic modeling
Procedia PDF Downloads 177525 Time Series Analysis the Case of China and USA Trade Examining during Covid-19 Trade Enormity of Abnormal Pricing with the Exchange rate
Authors: Md. Mahadi Hasan Sany, Mumenunnessa Keya, Sharun Khushbu, Sheikh Abujar
Abstract:
Since the beginning of China's economic reform, trade between the U.S. and China has grown rapidly, and has increased since China's accession to the World Trade Organization in 2001. The US imports more than it exports from China, reducing the trade war between China and the U.S. for the 2019 trade deficit, but in 2020, the opposite happens. In international and U.S. trade, Washington launched a full-scale trade war against China in March 2016, which occurred a catastrophic epidemic. The main goal of our study is to measure and predict trade relations between China and the U.S., before and after the arrival of the COVID epidemic. The ML model uses different data as input but has no time dimension that is present in the time series models and is only able to predict the future from previously observed data. The LSTM (a well-known Recurrent Neural Network) model is applied as the best time series model for trading forecasting. We have been able to create a sustainable forecasting system in trade between China and the US by closely monitoring a dataset published by the State Website NZ Tatauranga Aotearoa from January 1, 2015, to April 30, 2021. Throughout the survey, we provided a 180-day forecast that outlined what would happen to trade between China and the US during COVID-19. In addition, we have illustrated that the LSTM model provides outstanding outcome in time series data analysis rather than RFR and SVR (e.g., both ML models). The study looks at how the current Covid outbreak affects China-US trade. As a comparative study, RMSE transmission rate is calculated for LSTM, RFR and SVR. From our time series analysis, it can be said that the LSTM model has given very favorable thoughts in terms of China-US trade on the future export situation.Keywords: RFR, China-U.S. trade war, SVR, LSTM, deep learning, Covid-19, export value, forecasting, time series analysis
Procedia PDF Downloads 196524 A Pilot Study to Investigate the Use of Machine Translation Post-Editing Training for Foreign Language Learning
Authors: Hong Zhang
Abstract:
The main purpose of this study is to show that machine translation (MT) post-editing (PE) training can help our Chinese students learn Spanish as a second language. Our hypothesis is that they might make better use of it by learning PE skills specific for foreign language learning. We have developed PE training materials based on the data collected in a previous study. Training material included the special error types of the output of MT and the error types that our Chinese students studying Spanish could not detect in the experiment last year. This year we performed a pilot study in order to evaluate the PE training materials effectiveness and to what extent PE training helps Chinese students who study the Spanish language. We used screen recording to record these moments and made note of every action done by the students. Participants were speakers of Chinese with intermediate knowledge of Spanish. They were divided into two groups: Group A performed PE training and Group B did not. We prepared a Chinese text for both groups, and participants translated it by themselves (human translation), and then used Google Translate to translate the text and asked them to post-edit the raw MT output. Comparing the results of PE test, Group A could identify and correct the errors faster than Group B students, Group A did especially better in omission, word order, part of speech, terminology, mistranslation, official names, and formal register. From the results of this study, we can see that PE training can help Chinese students learn Spanish as a second language. In the future, we could focus on the students’ struggles during their Spanish studies and complete the PE training materials to teach Chinese students learning Spanish with machine translation.Keywords: machine translation, post-editing, post-editing training, Chinese, Spanish, foreign language learning
Procedia PDF Downloads 143523 Feature Selection of Personal Authentication Based on EEG Signal for K-Means Cluster Analysis Using Silhouettes Score
Authors: Jianfeng Hu
Abstract:
Personal authentication based on electroencephalography (EEG) signals is one of the important field for the biometric technology. More and more researchers have used EEG signals as data source for biometric. However, there are some disadvantages for biometrics based on EEG signals. The proposed method employs entropy measures for feature extraction from EEG signals. Four type of entropies measures, sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE) and spectral entropy (PE), were deployed as feature set. In a silhouettes calculation, the distance from each data point in a cluster to all another point within the same cluster and to all other data points in the closest cluster are determined. Thus silhouettes provide a measure of how well a data point was classified when it was assigned to a cluster and the separation between them. This feature renders silhouettes potentially well suited for assessing cluster quality in personal authentication methods. In this study, “silhouettes scores” was used for assessing the cluster quality of k-means clustering algorithm is well suited for comparing the performance of each EEG dataset. The main goals of this study are: (1) to represent each target as a tuple of multiple feature sets, (2) to assign a suitable measure to each feature set, (3) to combine different feature sets, (4) to determine the optimal feature weighting. Using precision/recall evaluations, the effectiveness of feature weighting in clustering was analyzed. EEG data from 22 subjects were collected. Results showed that: (1) It is possible to use fewer electrodes (3-4) for personal authentication. (2) There was the difference between each electrode for personal authentication (p<0.01). (3) There is no significant difference for authentication performance among feature sets (except feature PE). Conclusion: The combination of k-means clustering algorithm and silhouette approach proved to be an accurate method for personal authentication based on EEG signals.Keywords: personal authentication, K-mean clustering, electroencephalogram, EEG, silhouettes
Procedia PDF Downloads 283522 Automatic Detection and Filtering of Negative Emotion-Bearing Contents from Social Media in Amharic Using Sentiment Analysis and Deep Learning Methods
Authors: Derejaw Lake Melie, Alemu Kumlachew Tegegne
Abstract:
The increasing prevalence of social media in Ethiopia has exacerbated societal challenges by fostering the proliferation of negative emotional posts and comments. Illicit use of social media has further exacerbated divisions among the population. Addressing these issues through manual identification and aggregation of emotions from millions of users for swift decision-making poses significant challenges, particularly given the rapid growth of Amharic language usage on social platforms. Consequently, there is a critical need to develop an intelligent system capable of automatically detecting and categorizing negative emotional content into social, religious, and political categories while also filtering out toxic online content. This paper aims to leverage sentiment analysis techniques to achieve automatic detection and filtering of negative emotional content from Amharic social media texts, employing a comparative study of deep learning algorithms. The study utilized a dataset comprising 29,962 comments collected from social media platforms using comment exporter software. Data pre-processing techniques were applied to enhance data quality, followed by the implementation of deep learning methods for training, testing, and evaluation. The results showed that CNN, GRU, LSTM, and Bi-LSTM classification models achieved accuracies of 83%, 50%, 84%, and 86%, respectively. Among these models, Bi-LSTM demonstrated the highest accuracy of 86% in the experiment.Keywords: negative emotion, emotion detection, social media filtering sentiment analysis, deep learning.
Procedia PDF Downloads 22521 Copper Price Prediction Model for Various Economic Situations
Authors: Haidy S. Ghali, Engy Serag, A. Samer Ezeldin
Abstract:
Copper is an essential raw material used in the construction industry. During the year 2021 and the first half of 2022, the global market suffered from a significant fluctuation in copper raw material prices due to the aftermath of both the COVID-19 pandemic and the Russia-Ukraine war, which exposed its consumers to an unexpected financial risk. Thereto, this paper aims to develop two ANN-LSTM price prediction models, using Python, that can forecast the average monthly copper prices traded in the London Metal Exchange; the first model is a multivariate model that forecasts the copper price of the next 1-month and the second is a univariate model that predicts the copper prices of the upcoming three months. Historical data of average monthly London Metal Exchange copper prices are collected from January 2009 till July 2022, and potential external factors are identified and employed in the multivariate model. These factors lie under three main categories: energy prices and economic indicators of the three major exporting countries of copper, depending on the data availability. Before developing the LSTM models, the collected external parameters are analyzed with respect to the copper prices using correlation and multicollinearity tests in R software; then, the parameters are further screened to select the parameters that influence the copper prices. Then, the two LSTM models are developed, and the dataset is divided into training, validation, and testing sets. The results show that the performance of the 3-Month prediction model is better than the 1-Month prediction model, but still, both models can act as predicting tools for diverse economic situations.Keywords: copper prices, prediction model, neural network, time series forecasting
Procedia PDF Downloads 111520 Unlocking E-commerce: Analyzing User Behavior and Segmenting Customers for Strategic Insights
Authors: Aditya Patil, Arun Patil, Vaishali Patil, Sudhir Chitnis, Anjum Patel
Abstract:
Rapid growth has given e-commerce platforms a lot of client behavior and spending data. To maximize their strategy, businesses must understand how customers utilize online shopping platforms and what influences their purchases. Our research focuses on e-commerce user behavior and purchasing trends. This extensive study examines spending and user behavior. Regression and grouping disclose relevant data from the dataset. We can understand user spending trends via multilevel regression. We can analyze how pricing, user demographics, and product categories affect customer purchase decisions with this technique. Clustering groups consumers by spending. Important information was found. Purchase habits vary by user group. Our analysis illuminates the complex world of e-commerce consumer behavior and purchase trends. Understanding user behavior helps create effective e-commerce marketing strategies. This market can benefit from K-means clustering. This study focuses on tailoring strategies to user groups and improving product and price effectiveness. Customer buying behaviors across categories were shown via K-means clusters. Average spending is highest in Cluster 4 and lowest in Cluster 3. Clothing is less popular than gadgets and appliances around the holidays. Cluster spending distribution is examined using average variables. Our research enhances e-commerce analytics. Companies can improve customer service and decision-making with this data.Keywords: e-commerce, regression, clustering, k-means
Procedia PDF Downloads 17519 Environmental Controls on the Distribution of Intertidal Foraminifers in Sabkha Al-Kharrar, Saudi Arabia: Implications for Sea-Level Changes
Authors: Talha A. Al-Dubai, Rashad A. Bantan, Ramadan H. Abu-Zied, Brian G. Jones, Aaid G. Al-Zubieri
Abstract:
Contemporary foraminiferal samples sediments were collected from the intertidal sabkha of Al-Kharrar Lagoon, Saudi Arabia, to study the vertical distribution of Foraminifera and, based on a modern training set, their potential to develop a predictor of former sea-level changes in the area. Based on hierarchical cluster analysis, the intertidal sabkha is divided into three vertical zones (A, B & C) represented by three foraminiferal assemblages, where agglutinated species occupied Zone A and calcareous species occupied the other two zones. In Zone A (high intertidal), Agglutinella compressa, Clavulina angularis and C. multicamerata are dominant species with a minor presence of Peneroplis planatus, Coscinospira hemprichii, Sorites orbiculus, Quinqueloculina lamarckiana, Q. seminula, Ammonia convexa and A. tepida. In contrast, in Zone B (middle intertidal) the most abundant species are P. planatus, C. hemprichii, S. orbiculus, Q. lamarckiana, Q. seminula and Q. laevigata, while Zone C (low intertidal) is characterised by C. hemprichii, Q. costata, S. orbiculus, P. planatus, A. convexa, A. tepida, Spiroloculina communis and S. costigera. A transfer function for sea-level reconstruction was developed using a modern dataset of 75 contemporary sediment samples and 99 species collected from several transects across the sabkha. The model provided an error of 0.12m, suggesting that intertidal foraminifers are able to predict the past sea-level changes with high precision in Al-Kharrar Lagoon, and thus the future prediction of those changes in the area.Keywords: Lagoonal foraminifers, intertidal sabkha, vertical zonation, transfer function, sea level
Procedia PDF Downloads 168518 Improving Our Understanding of the in vivo Modelling of Psychotic Disorders
Authors: Zsanett Bahor, Cristina Nunes-Fonseca, Gillian L. Currie, Emily S. Sena, Lindsay D.G. Thomson, Malcolm R. Macleod
Abstract:
Psychosis is ranked as the third most disabling medical condition in the world by the World Health Organization. Despite a substantial amount of research in recent years, available treatments are not universally effective and have a wide range of adverse side effects. Since many clinical drug candidates are identified through in vivo modelling, a deeper understanding of these models, and their strengths and limitations, might help us understand reasons for difficulties in psychosis drug development. To provide an unbiased summary of the preclinical psychosis literature we performed a systematic electronic search of PubMed for publications modelling a psychotic disorder in vivo, identifying 14,721 relevant studies. Double screening of 11,000 publications from this dataset so far established 2403 animal studies of psychosis, with the most common model being schizophrenia (95%). 61% of these models are induced using pharmacological agents. For all the models only 56% of publications test a therapeutic treatment. We propose a systematic review of these studies to assess the prevalence of reporting of measures to reduce risk of bias, and a meta-analysis to assess the internal and external validity of these animal models. Our findings are likely to be relevant to future preclinical studies of psychosis as this generation of strong empirical evidence has the potential to identify weaknesses, areas for improvement and make suggestions on refinement of experimental design. Such a detailed understanding of the data which inform what we think we know will help improve the current attrition rate between bench and bedside in psychosis research.Keywords: animal models, psychosis, systematic review, schizophrenia
Procedia PDF Downloads 289517 The Impact of Food Inflation on Poverty: An Analysis of the Different Households in the Philippines
Authors: Kara Gianina D. Rosas, Jade Emily L. Tong
Abstract:
This study assesses the vulnerability of households to food price shocks. Using the Philippines as a case study, the researchers aim to understand how such shocks can cause food insecurity in different types of households. This paper measures the impact of actual food price changes during the food crisis of 2006-2009 on poverty in relation to their spatial location. Households are classified as rural or urban and agricultural or non-agricultural. By treating food prices and consumption patterns as heterogeneous, this study differs from conventional poverty analysis as actual prices are used. Merging the Family, Income and Expenditure Survey (FIES) with the Consumer Price Index dataset (CPI), the researchers were able to determine the effects on poverty measures, specifically, headcount index, poverty gap, and poverty severity. The study finds that, without other interventions, food inflation would lead to a significant increase in the number of households that fall below the poverty threshold, except for households whose income is derived from agricultural activities. It also finds that much of the inflation during these years was fueled by the rise in staple food prices. Essentially, this paper aims to broaden the economic perspective of policymakers with regard to the heterogeneity of impacts of inflation through analyzing the deeper microeconomic levels of different subgroups. In hopes of finding a solution to lessen the inequality gap of poverty between the rural and urban poor, this paper aims to aid policymakers in creating projects targeted towards food insecurity.Keywords: poverty, food inflation, agricultural households, non-agricultural households, net consumption ratio, urban poor, rural poor, head count index, poverty gap, poverty severity
Procedia PDF Downloads 245516 The Role of Urban Development Patterns for Mitigating Extreme Urban Heat: The Case Study of Doha, Qatar
Authors: Yasuyo Makido, Vivek Shandas, David J. Sailor, M. Salim Ferwati
Abstract:
Mitigating extreme urban heat is challenging in a desert climate such as Doha, Qatar, since outdoor daytime temperature area often too high for the human body to tolerate. Recent studies demonstrate that cities in arid and semiarid areas can exhibit ‘urban cool islands’ - urban areas that are cooler than the surrounding desert. However, the variation of temperatures as a result of the time of day and factors leading to temperature change remain at the question. To address these questions, we examined the spatial and temporal variation of air temperature in Doha, Qatar by conducting multiple vehicle-base local temperature observations. We also employed three statistical approaches to model surface temperatures using relevant predictors: (1) Ordinary Least Squares, (2) Regression Tree Analysis and (3) Random Forest for three time periods. Although the most important determinant factors varied by day and time, distance to the coast was the significant determinant at midday. A 70%/30% holdout method was used to create a testing dataset to validate the results through Pearson’s correlation coefficient. The Pearson’s analysis suggests that the Random Forest model more accurately predicts the surface temperatures than the other methods. We conclude with recommendations about the types of development patterns that show the greatest potential for reducing extreme heat in air climates.Keywords: desert cities, tree-structure regression model, urban cool Island, vehicle temperature traverse
Procedia PDF Downloads 392515 Russian Spatial Impersonal Sentence Models in Translation Perspective
Authors: Marina Fomina
Abstract:
The paper focuses on the category of semantic subject within the framework of a functional approach to linguistics. The semantic subject is related to similar notions such as the grammatical subject and the bearer of predicative feature. It is the multifaceted nature of the category of subject that 1) triggers a number of issues that, syntax-wise, remain to be dealt with (cf. semantic vs. syntactic functions / sentence parts vs. parts of speech issues, etc.); 2) results in a variety of approaches to the category of subject, such as formal grammatical, semantic/syntactic (functional), communicative approaches, etc. Many linguists consider the prototypical approach to the category of subject to be the most instrumental as it reveals the integrity of denotative and linguistic components of the conceptual category. This approach relates to subject as a source of non-passive predicative feature, an element of subject-predicate-object situation that can take on a variety of semantic roles, cf.: 1) an agent (He carefully surveyed the valley stretching before him), 2) an experiencer (I feel very bitter about this), 3) a recipient (I received this book as a gift), 4) a causee (The plane broke into three pieces), 5) a patient (This stove cleans easily), etc. It is believed that the variety of roles stems from the radial (prototypical) structure of the category with some members more central than others. Translation-wise, the most “treacherous” subject types are the peripheral ones. The paper 1) features a peripheral status of spatial impersonal sentence models such as U menia v ukhe zvenit (lit. I-Gen. in ear buzzes) within the category of semantic subject, 2) makes a structural and semantic analysis of the models, 3) focuses on their Russian-English translation patterns, 4) reveals non-prototypical features of subjects in the English equivalents.Keywords: bearer of predicative feature, grammatical subject, impersonal sentence model, semantic subject
Procedia PDF Downloads 369514 An End-to-end Piping and Instrumentation Diagram Information Recognition System
Authors: Taekyong Lee, Joon-Young Kim, Jae-Min Cha
Abstract:
Piping and instrumentation diagram (P&ID) is an essential design drawing describing the interconnection of process equipment and the instrumentation installed to control the process. P&IDs are modified and managed throughout a whole life cycle of a process plant. For the ease of data transfer, P&IDs are generally handed over from a design company to an engineering company as portable document format (PDF) which is hard to be modified. Therefore, engineering companies have to deploy a great deal of time and human resources only for manually converting P&ID images into a computer aided design (CAD) file format. To reduce the inefficiency of the P&ID conversion, various symbols and texts in P&ID images should be automatically recognized. However, recognizing information in P&ID images is not an easy task. A P&ID image usually contains hundreds of symbol and text objects. Most objects are pretty small compared to the size of a whole image and are densely packed together. Traditional recognition methods based on geometrical features are not capable enough to recognize every elements of a P&ID image. To overcome these difficulties, state-of-the-art deep learning models, RetinaNet and connectionist text proposal network (CTPN) were used to build a system for recognizing symbols and texts in a P&ID image. Using the RetinaNet and the CTPN model carefully modified and tuned for P&ID image dataset, the developed system recognizes texts, equipment symbols, piping symbols and instrumentation symbols from an input P&ID image and save the recognition results as the pre-defined extensible markup language format. In the test using a commercial P&ID image, the P&ID information recognition system correctly recognized 97% of the symbols and 81.4% of the texts.Keywords: object recognition system, P&ID, symbol recognition, text recognition
Procedia PDF Downloads 151513 Screening Diversity: Artificial Intelligence and Virtual Reality Strategies for Elevating Endangered African Languages in the Film and Television Industry
Authors: Samuel Ntsanwisi
Abstract:
This study investigates the transformative role of Artificial Intelligence (AI) and Virtual Reality (VR) in the preservation of endangered African languages. The study is contextualized within the film and television industry, highlighting disparities in screen representation for certain languages in South Africa, underscoring the need for increased visibility and preservation efforts; with globalization and cultural shifts posing significant threats to linguistic diversity, this research explores approaches to language preservation. By leveraging AI technologies, such as speech recognition, translation, and adaptive learning applications, and integrating VR for immersive and interactive experiences, the study aims to create a framework for teaching and passing on endangered African languages. Through digital documentation, interactive language learning applications, storytelling, and community engagement, the research demonstrates how these technologies can empower communities to revitalize their linguistic heritage. This study employs a dual-method approach, combining a rigorous literature review to analyse existing research on the convergence of AI, VR, and language preservation with primary data collection through interviews and surveys with ten filmmakers. The literature review establishes a solid foundation for understanding the current landscape, while interviews with filmmakers provide crucial real-world insights, enriching the study's depth. This balanced methodology ensures a comprehensive exploration of the intersection between AI, VR, and language preservation, offering both theoretical insights and practical perspectives from industry professionals.Keywords: language preservation, endangered languages, artificial intelligence, virtual reality, interactive learning
Procedia PDF Downloads 59512 A Survey of Skin Cancer Detection and Classification from Skin Lesion Images Using Deep Learning
Authors: Joseph George, Anne Kotteswara Roa
Abstract:
Skin disease is one of the most common and popular kinds of health issues faced by people nowadays. Skin cancer (SC) is one among them, and its detection relies on the skin biopsy outputs and the expertise of the doctors, but it consumes more time and some inaccurate results. At the early stage, skin cancer detection is a challenging task, and it easily spreads to the whole body and leads to an increase in the mortality rate. Skin cancer is curable when it is detected at an early stage. In order to classify correct and accurate skin cancer, the critical task is skin cancer identification and classification, and it is more based on the cancer disease features such as shape, size, color, symmetry and etc. More similar characteristics are present in many skin diseases; hence it makes it a challenging issue to select important features from a skin cancer dataset images. Hence, the skin cancer diagnostic accuracy is improved by requiring an automated skin cancer detection and classification framework; thereby, the human expert’s scarcity is handled. Recently, the deep learning techniques like Convolutional neural network (CNN), Deep belief neural network (DBN), Artificial neural network (ANN), Recurrent neural network (RNN), and Long and short term memory (LSTM) have been widely used for the identification and classification of skin cancers. This survey reviews different DL techniques for skin cancer identification and classification. The performance metrics such as precision, recall, accuracy, sensitivity, specificity, and F-measures are used to evaluate the effectiveness of SC identification using DL techniques. By using these DL techniques, the classification accuracy increases along with the mitigation of computational complexities and time consumption.Keywords: skin cancer, deep learning, performance measures, accuracy, datasets
Procedia PDF Downloads 127511 Physics-Informed Convolutional Neural Networks for Reservoir Simulation
Authors: Jiangxia Han, Liang Xue, Keda Chen
Abstract:
Despite the significant progress over the last decades in reservoir simulation using numerical discretization, meshing is complex. Moreover, the high degree of freedom of the space-time flow field makes the solution process very time-consuming. Therefore, we present Physics-Informed Convolutional Neural Networks(PICNN) as a hybrid scientific theory and data method for reservoir modeling. Besides labeled data, the model is driven by the scientific theories of the underlying problem, such as governing equations, boundary conditions, and initial conditions. PICNN integrates governing equations and boundary conditions into the network architecture in the form of a customized convolution kernel. The loss function is composed of data matching, initial conditions, and other measurable prior knowledge. By customizing the convolution kernel and minimizing the loss function, the neural network parameters not only fit the data but also honor the governing equation. The PICNN provides a methodology to model and history-match flow and transport problems in porous media. Numerical results demonstrate that the proposed PICNN can provide an accurate physical solution from a limited dataset. We show how this method can be applied in the context of a forward simulation for continuous problems. Furthermore, several complex scenarios are tested, including the existence of data noise, different work schedules, and different good patterns.Keywords: convolutional neural networks, deep learning, flow and transport in porous media, physics-informed neural networks, reservoir simulation
Procedia PDF Downloads 142510 ALEF: An Enhanced Approach to Arabic-English Bilingual Translation
Authors: Abdul Muqsit Abbasi, Ibrahim Chhipa, Asad Anwer, Saad Farooq, Hassan Berry, Sonu Kumar, Sundar Ali, Muhammad Owais Mahmood, Areeb Ur Rehman, Bahram Baloch
Abstract:
Accurate translation between structurally diverse languages, such as Arabic and English, presents a critical challenge in natural language processing due to significant linguistic and cultural differences. This paper investigates the effectiveness of Facebook’s mBART model, fine-tuned specifically for sequence-tosequence (seq2seq) translation tasks between Arabic and English, and enhanced through advanced refinement techniques. Our approach leverages the Alef Dataset, a meticulously curated parallel corpus spanning various domains to capture the linguistic richness, nuances, and contextual accuracy essential for high-quality translation. We further refine the model’s output using advanced language models such as GPT-3.5 and GPT-4, which improve fluency, coherence, and correct grammatical errors in translated texts. The fine-tuned model demonstrates substantial improvements, achieving a BLEU score of 38.97, METEOR score of 58.11, and TER score of 56.33, surpassing widely used systems such as Google Translate. These results underscore the potential of mBART, combined with refinement strategies, to bridge the translation gap between Arabic and English, providing a reliable, context-aware machine translation solution that is robust across diverse linguistic contexts.Keywords: natural language processing, machine translation, fine-tuning, Arabic-English translation, transformer models, seq2seq translation, translation evaluation metrics, cross-linguistic communication
Procedia PDF Downloads 6509 Mapping of Geological Structures Using Aerial Photography
Authors: Ankit Sharma, Mudit Sachan, Anurag Prakash
Abstract:
Rapid growth in data acquisition technologies through drones, have led to advances and interests in collecting high-resolution images of geological fields. Being advantageous in capturing high volume of data in short flights, a number of challenges have to overcome for efficient analysis of this data, especially while data acquisition, image interpretation and processing. We introduce a method that allows effective mapping of geological fields using photogrammetric data of surfaces, drainage area, water bodies etc, which will be captured by airborne vehicles like UAVs, we are not taking satellite images because of problems in adequate resolution, time when it is captured may be 1 yr back, availability problem, difficult to capture exact image, then night vision etc. This method includes advanced automated image interpretation technology and human data interaction to model structures and. First Geological structures will be detected from the primary photographic dataset and the equivalent three dimensional structures would then be identified by digital elevation model. We can calculate dip and its direction by using the above information. The structural map will be generated by adopting a specified methodology starting from choosing the appropriate camera, camera’s mounting system, UAVs design ( based on the area and application), Challenge in air borne systems like Errors in image orientation, payload problem, mosaicing and geo referencing and registering of different images to applying DEM. The paper shows the potential of using our method for accurate and efficient modeling of geological structures, capture particularly from remote, of inaccessible and hazardous sites.Keywords: digital elevation model, mapping, photogrammetric data analysis, geological structures
Procedia PDF Downloads 685508 Unsupervised Echocardiogram View Detection via Autoencoder-Based Representation Learning
Authors: Andrea Treviño Gavito, Diego Klabjan, Sanjiv J. Shah
Abstract:
Echocardiograms serve as pivotal resources for clinicians in diagnosing cardiac conditions, offering non-invasive insights into a heart’s structure and function. When echocardiographic studies are conducted, no standardized labeling of the acquired views is performed. Employing machine learning algorithms for automated echocardiogram view detection has emerged as a promising solution to enhance efficiency in echocardiogram use for diagnosis. However, existing approaches predominantly rely on supervised learning, necessitating labor-intensive expert labeling. In this paper, we introduce a fully unsupervised echocardiographic view detection framework that leverages convolutional autoencoders to obtain lower dimensional representations and the K-means algorithm for clustering them into view-related groups. Our approach focuses on discriminative patches from echocardiographic frames. Additionally, we propose a trainable inverse average layer to optimize decoding of average operations. By integrating both public and proprietary datasets, we obtain a marked improvement in model performance when compared to utilizing a proprietary dataset alone. Our experiments show boosts of 15.5% in accuracy and 9.0% in the F-1 score for frame-based clustering, and 25.9% in accuracy and 19.8% in the F-1 score for view-based clustering. Our research highlights the potential of unsupervised learning methodologies and the utilization of open-sourced data in addressing the complexities of echocardiogram interpretation, paving the way for more accurate and efficient cardiac diagnoses.Keywords: artificial intelligence, echocardiographic view detection, echocardiography, machine learning, self-supervised representation learning, unsupervised learning
Procedia PDF Downloads 31507 Analysing Trends in Rice Cropping Intensity and Seasonality across the Philippines Using 14 Years of Moderate Resolution Remote Sensing Imagery
Authors: Bhogendra Mishra, Andy Nelson, Mirco Boschetti, Lorenzo Busetto, Alice Laborte
Abstract:
Rice is grown on over 100 million hectares in almost every country of Asia. It is the most important staple crop for food security and has high economic and cultural importance in Asian societies. The combination of genetic diversity and management options, coupled with the large geographic extent means that there is a large variation in seasonality (when it is grown) and cropping intensity (how often it is grown per year on the same plot of land), even over relatively small distances. Seasonality and intensity can and do change over time depending on climatic, environmental and economic factors. Detecting where and when these changes happen can provide information to better understand trends in regional and even global rice production. Remote sensing offers a unique opportunity to estimate these trends. We apply the recently published PhenoRice algorithm to 14 years of moderate resolution remote sensing (MODIS) data (utilizing 250m resolution 16 day composites from Terra and Aqua) to estimate seasonality and cropping intensity per year and changes over time. We compare the results to the surveyed data collected by International Rice Research Institute (IRRI). The study results in a unique and validated dataset on the extent and change of extent, the seasonality and change in seasonality and the cropping intensity and change in cropping intensity between 2003 and 2016 for the Philippines. Observed trends and their implications for food security and trade policies are also discussed.Keywords: rice, cropping intensity, moderate resolution remote sensing (MODIS), phenology, seasonality
Procedia PDF Downloads 305506 Lung HRCT Pattern Classification for Cystic Fibrosis Using a Convolutional Neural Network
Authors: Parisa Mansour
Abstract:
Cystic fibrosis (CF) is one of the most common autosomal recessive diseases among whites. It mostly affects the lungs, causing infections and inflammation that account for 90% of deaths in CF patients. Because of this high variability in clinical presentation and organ involvement, investigating treatment responses and evaluating lung changes over time is critical to preventing CF progression. High-resolution computed tomography (HRCT) greatly facilitates the assessment of lung disease progression in CF patients. Recently, artificial intelligence was used to analyze chest CT scans of CF patients. In this paper, we propose a convolutional neural network (CNN) approach to classify CF lung patterns in HRCT images. The proposed network consists of two convolutional layers with 3 × 3 kernels and maximally connected in each layer, followed by two dense layers with 1024 and 10 neurons, respectively. The softmax layer prepares a predicted output probability distribution between classes. This layer has three exits corresponding to the categories of normal (healthy), bronchitis and inflammation. To train and evaluate the network, we constructed a patch-based dataset extracted from more than 1100 lung HRCT slices obtained from 45 CF patients. Comparative evaluation showed the effectiveness of the proposed CNN compared to its close peers. Classification accuracy, average sensitivity and specificity of 93.64%, 93.47% and 96.61% were achieved, indicating the potential of CNNs in analyzing lung CF patterns and monitoring lung health. In addition, the visual features extracted by our proposed method can be useful for automatic measurement and finally evaluation of the severity of CF patterns in lung HRCT images.Keywords: HRCT, CF, cystic fibrosis, chest CT, artificial intelligence
Procedia PDF Downloads 65505 Development of Energy Benchmarks Using Mandatory Energy and Emissions Reporting Data: Ontario Post-Secondary Residences
Authors: C. Xavier Mendieta, J. J McArthur
Abstract:
Governments are playing an increasingly active role in reducing carbon emissions, and a key strategy has been the introduction of mandatory energy disclosure policies. These policies have resulted in a significant amount of publicly available data, providing researchers with a unique opportunity to develop location-specific energy and carbon emission benchmarks from this data set, which can then be used to develop building archetypes and used to inform urban energy models. This study presents the development of such a benchmark using the public reporting data. The data from Ontario’s Ministry of Energy for Post-Secondary Educational Institutions are being used to develop a series of building archetype dynamic building loads and energy benchmarks to fill a gap in the currently available building database. This paper presents the development of a benchmark for college and university residences within ASHRAE climate zone 6 areas in Ontario using the mandatory disclosure energy and greenhouse gas emissions data. The methodology presented includes data cleaning, statistical analysis, and benchmark development, and lessons learned from this investigation are presented and discussed to inform the development of future energy benchmarks from this larger data set. The key findings from this initial benchmarking study are: (1) the importance of careful data screening and outlier identification to develop a valid dataset; (2) the key features used to develop a model of the data are building age, size, and occupancy schedules and these can be used to estimate energy consumption; and (3) policy changes affecting the primary energy generation significantly affected greenhouse gas emissions, and consideration of these factors was critical to evaluate the validity of the reported data.Keywords: building archetypes, data analysis, energy benchmarks, GHG emissions
Procedia PDF Downloads 306504 Neuron Point-of-Care Stem Cell Therapy: Intrathecal Transplant of Autologous Bone Marrow-Derived Stem Cells in Patients with Cerebral Palsy
Authors: F. Ruiz-Navarro, M. Matzner, G. Kobinia
Abstract:
Background: Cerebral palsy (CP) encompasses the largest group of childhood movement disorders, the patterns and severity varies widely. Today, the management focuses only on a rehabilitation therapy that tries to secure the functions remained and prevents complications. However the treatments are not aimed to cure the disease. Stem cells (SCs) transplant via intrathecal is a new approach to the disease. Method: Our aim was to performed a pilot study under the condition of unproven treatment on clinical practice to assessed the safety and efficacy of Neuron Point-of-care Stem cell Therapy (N-POCST), an ambulatory procedure of autologous bone marrow derived SCs (BM-SCs) harvested from the posterior superior iliac crest undergo an on-site cell separation for intrathecal infusion via lumbar puncture. Results: 82 patients were treated in a period of 28 months, with a follow-up after 6 months. They had a mean age of 6,2 years old and male predominance (65,9%). Our preliminary results show that: A. No patient had any major side effects, B. Only 20% presented mild headache due to LP, C. 53% of the patients had an improvement in spasticity, D. 61% improved the coordination abilities, 23% improved the motor function, 15% improved the speech, 23% reduced the number of convulsive events with the same doses or less doses of anti-convulsive medication and 94% of the patients report a subjective general improvement. Conclusions: These results support previous worldwide publications that described the safety and effectiveness of autologous BM-SCs transplant for patients wit CP.Keywords: autologous transplant, cerebral palsy, point of care, childhood movement disorders
Procedia PDF Downloads 413503 Educational Experience, Record Keeping, Genetic Selection and Herd Management Effects on Monthly Milk Yield and Revenues of Dairy Farms in Southern Vietnam
Authors: Ngoc-Hieu Vu
Abstract:
A study was conducted to estimate the record keeping, genetic selection, educational experience, and farm management effect on monthly milk yield per farm, average milk yield per cow, monthly milk revenue per farm, and monthly milk revenue per cow of dairy farms in the Southern region of Vietnam. The dataset contained 5448 monthly record collected from January 2013 to May 2015. Results showed that longer experience increased (P < 0.001) monthly milk yields and revenues. Better educated farmers produced more monthly milk per farm and monthly milk per cow and revenues (P < 0.001) than lower educated farmers. Farm that kept records on individual animals had higher (P < 0.001) for monthly milk yields and revenues than farms that did not. Farms that used hired people produced the highest (p < 0.05) monthly milk yield per farm, milk yield per cow and revenues, followed by farms that used both hire and family members, and lowest values were for farms that used family members only. Farms that used crosses Holstein in herd were higher performance (p < 0.001) for all traits than farms that used purebred Holstein and other breeds. Farms that used genetic information and phenotypes when selecting sires were higher (p < 0.05) for all traits than farms that used only phenotypes and personal option. Farms that received help from Vet, organization staff, or government officials had higher monthly milk yield and revenues than those that decided by owner. These findings suggest that dairy farmers should be training in systematic, must be considered and continuous support to improve farm milk production and revenues, to increase the likelihood of adoption on a sustainable way.Keywords: dairy farming, education, milk yield, Southern Vietnam
Procedia PDF Downloads 330502 Customer Churn Prediction by Using Four Machine Learning Algorithms Integrating Features Selection and Normalization in the Telecom Sector
Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh
Abstract:
A crucial component of maintaining a customer-oriented business as in the telecom industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years. It has become more important to understand customers’ needs in this strong market of telecom industries, especially for those who are looking to turn over their service providers. So, predictive churn is now a mandatory requirement for retaining those customers. Machine learning can be utilized to accomplish this. Churn Prediction has become a very important topic in terms of machine learning classification in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important to building an effective churn prediction model. This paper aims to predict churn and identify factors of customers’ churn based on their past service usage history. Aiming at this objective, the study makes use of feature selection, normalization, and feature engineering. Then, this study compared the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Evaluation of the performance was conducted by using the F1 score and ROC-AUC. Comparing the results of this study with existing models has proven to produce better results. The results showed the Gradients Boosting with feature selection technique outperformed in this study by achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.Keywords: machine learning, gradient boosting, logistic regression, churn, random forest, decision tree, ROC, AUC, F1-score
Procedia PDF Downloads 133501 An Efficient Machine Learning Model to Detect Metastatic Cancer in Pathology Scans Using Principal Component Analysis Algorithm, Genetic Algorithm, and Classification Algorithms
Authors: Bliss Singhal
Abstract:
Machine learning (ML) is a branch of Artificial Intelligence (AI) where computers analyze data and find patterns in the data. The study focuses on the detection of metastatic cancer using ML. Metastatic cancer is the stage where cancer has spread to other parts of the body and is the cause of approximately 90% of cancer-related deaths. Normally, pathologists spend hours each day to manually classifying whether tumors are benign or malignant. This tedious task contributes to mislabeling metastasis being over 60% of the time and emphasizes the importance of being aware of human error and other inefficiencies. ML is a good candidate to improve the correct identification of metastatic cancer, saving thousands of lives and can also improve the speed and efficiency of the process, thereby taking fewer resources and time. So far, the deep learning methodology of AI has been used in research to detect cancer. This study is a novel approach to determining the potential of using preprocessing algorithms combined with classification algorithms in detecting metastatic cancer. The study used two preprocessing algorithms: principal component analysis (PCA) and the genetic algorithm, to reduce the dimensionality of the dataset and then used three classification algorithms: logistic regression, decision tree classifier, and k-nearest neighbors to detect metastatic cancer in the pathology scans. The highest accuracy of 71.14% was produced by the ML pipeline comprising of PCA, the genetic algorithm, and the k-nearest neighbor algorithm, suggesting that preprocessing and classification algorithms have great potential for detecting metastatic cancer.Keywords: breast cancer, principal component analysis, genetic algorithm, k-nearest neighbors, decision tree classifier, logistic regression
Procedia PDF Downloads 80500 The Redundant Kana: A Pragmatic Reading
Authors: Manal Mohammed Hisham Said Najjar
Abstract:
The Arab Grammarians shed light on the redundant kana (was) and gave it a considerable attention. However, their considerations and interpretations pertaining to using this verb varied: is it used to determine tense? Or used for further emphasis or for another function? Does it have a syntactic function? Morphologically, could it be used in other forms than the past? In addition, Arab Grammarians discussed the possibility of using kana to locate itself in between the syntactic constructs of a sentence, a phrase, or a collocation. Others questioned its position whether it is in initial or final. This study found out that the redundant kana (was) is cited in Quran and was used by the Arabs in their speech and poetry. This redundant kana, whether used in initial position or in a final position, or in between the constructs of a sentence, a phrase, or a collocation, implies pragmatic meanings intended by the speaker or the poet to serve different functions, such as to indicate the past tense, to provide emphasis, and to refer to the continuity of the effect and meaning of a verb or adjective. The study concludes that this verb kana can be utilized in different contexts to achieve a specific effect as did the old Arabs who used it to add specific shades of meanings. Kana as a redundant word could be added to further highlight the meaning aimed at in a specific utterance. In addition, this verb can be used in both the past and the present morphological form; and its availability in an utterance could be functional and could not be. In other words, the study found out that the redundant kana can be used in various positions in an utterance, initial, final, or in between a syntactic structure, provided that this use is pragmatically functional. In conclusion, this paper seeks to invite the scholars of the Arabic language to coin a new term which is the “pragmatic kana” to replace the term “kana alzae’da (redundant kana)” which might mean that its use is redundant and void of significance – a fact that is illogical due to its recurrent use in the Holy Quran. NOTE: Please take this study not the other one (sent by mistake) and titled kana alnaqisaKeywords: redundan, kana, grammarians, quran
Procedia PDF Downloads 128499 Campaign Contributions as Freedom of Expression: A Comparative Study Between the United States and Germany
Authors: Kristof Lukas Heidemann
Abstract:
In times of democratic backsliding in Western nations restoring public trust in the electoral process ranks among the most urgent tasks on the public agenda. Addressing the role of money in politics is one major part of this effort, however, such an endeavor might affect the constitutional freedom of expression. Attempts to regulate political spending in the U.S. have in recent decades increasingly been overruled by the U.S. Supreme through an expansion of the protective umbrella of the First Amendment over campaign contributions by private organizations, especially in the decisions Buckley v. Valeo and Citizens United v. FEC. In Germany on the other hand this line of argumentation has so far not been submitted to the national Supreme Court. Given that voices calling for stricter and more transparent political financing laws in Germany are growing, it seems only a matter of time until the issue will have to be addressed by the country’s judiciary as well. Therefore, this paper conducts a comparative analysis of the constitutional right to free expression in these two leading democracies in to assess whether the problem of a lack of regulatory options to achieve stricter campaign spending laws due to constitutional restrictions will also arise in Germany. In order to present a comprehensive picture of the subject, the analysis does not only touch upon doctrinal aspects of both systems but also scrutinizes the practical implications from a socio-legal perspective. Although the list of forms of expression in the wording of Art. 5 of the German constitution is generally considered to be non-exhaustive, the investigation concludes that the subsumption of election campaign donations under it is not justifiable using recognized methods of interpretation, in particular concerning a systematic interpretation in light of the principle of equality in Art. 3 of the German constitution.Keywords: comparative constitutional law, constitutional justice, constitutional law, election law, freedom of speech, fundamental rights, law reform
Procedia PDF Downloads 3