Search results for: k nearest neighbor classifier
142 Vehicle Speed Estimation Using Image Processing
Authors: Prodipta Bhowmik, Poulami Saha, Preety Mehra, Yogesh Soni, Triloki Nath Jha
Abstract:
In India, the smart city concept is growing day by day, and smart city development requires better traffic management and monitoring. Road accidents are increasing as more vehicles take to the road, and reckless driving is responsible for a large share of them. An efficient traffic management system is therefore required on all kinds of roads to control traffic speed, with speed limits that vary from road to road. Radar systems were used previously, but their high cost and limited precision made them unpopular for traffic management. Traffic management systems face many kinds of problems every day, and how to solve them has become an active research topic. This paper proposes an automated system based on computer vision and machine learning for multiple-vehicle detection, tracking, and speed estimation using image processing. Detecting vehicles and estimating their speed from real-time video is a difficult task. The objective of this paper is to detect vehicles and estimate their speed as accurately as possible. To do so, a real-time video is first captured and frames are extracted from it; vehicles are then detected in those frames and tracked, and finally the speed of the moving vehicles is estimated. The goal of this method is to develop a low-cost system that can detect multiple types of vehicles at the same time.
Keywords: OpenCV, Haar Cascade classifier, DLIB, YOLOV3, centroid tracker, vehicle detection, vehicle tracking, vehicle speed estimation, computer vision
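The final step of the pipeline described in this abstract, turning tracked centroid positions into a speed, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name, the frame rate, and the metres-per-pixel calibration factor are all assumptions introduced here, and a real system would need a calibrated camera model.

```python
import math


def estimate_speed_kmh(centroids, fps, metres_per_pixel):
    """Average speed of one tracked vehicle from its per-frame centroids.

    centroids        -- list of (x, y) pixel positions, one per consecutive frame
    fps              -- video frame rate (assumed constant)
    metres_per_pixel -- hypothetical calibration factor for the road plane
    """
    if len(centroids) < 2:
        raise ValueError("need at least two centroid positions")
    # Total path length in pixels across consecutive frames.
    pixels = sum(math.dist(a, b) for a, b in zip(centroids, centroids[1:]))
    # Elapsed time between the first and last frame.
    seconds = (len(centroids) - 1) / fps
    # Convert pixels/second to metres/second, then to km/h.
    return pixels * metres_per_pixel / seconds * 3.6


if __name__ == "__main__":
    track = [(0, 0), (10, 0), (20, 0)]
    print(estimate_speed_kmh(track, fps=10, metres_per_pixel=0.1))
```

A single scale factor only holds when the camera looks at the road roughly perpendicular to the direction of travel; otherwise a perspective correction would be needed before measuring displacement.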
Procedia PDF Downloads 85
141 Intrusion Detection in Cloud Computing Using Machine Learning
Authors: Faiza Babur Khan, Sohail Asghar
Abstract:
With the emergence of distributed environments, cloud computing is proving to be the most stimulating paradigm shift in computer technology, resulting in spectacular expansion of the IT industry. Many companies have augmented their technical infrastructure by adopting a cloud resource-sharing architecture. Cloud computing has opened doors to unlimited opportunities, from application and platform availability to expandable storage and the provision of computing environments. From a security viewpoint, however, clouds introduce an added level of risk, weakening protection mechanisms and making privacy, data security, and on-demand service harder to guarantee. Issues of trust, confidentiality, and integrity are heightened by the multitenant resource-sharing architecture of the cloud. Trust, or reliability, refers to the cloud's capability to provide the needed services precisely and unfailingly. Confidentiality is the ability of the architecture to ensure that only the authorized party can access its private data. The architecture must also guarantee integrity, protecting the data from being fabricated by an unauthorized user. To assure the provision of a secure cloud, a roadmap or model is needed to analyze security problems, design mitigation strategies, and evaluate solutions. The aim of the paper is twofold: first, to highlight the factors that make cloud security critical, along with mitigation strategies; and second, to propose an intrusion detection model that identifies attackers preventively using a machine learning Random Forest classifier with an accuracy of 99.8%. The model uses a reduced number of features. A comparison with other classifiers is also presented.
Keywords: cloud security, threats, machine learning, random forest, classification
Procedia PDF Downloads 320
140 Phylogenetic Analysis of Klebsiella Species from Clinical Specimens from Nelson Mandela Academic Hospital in Mthatha, South Africa
Authors: Sandeep Vasaikar, Lary Obi
Abstract:
Rapid and discriminative genotyping methods are useful for determining the clonality of isolates in nosocomial or household outbreaks. Multilocus sequence typing (MLST) is a nucleotide sequence-based approach for characterising bacterial isolates. The genetic diversity and clinical relevance of the drug-resistant Klebsiella isolates from Mthatha are largely unknown. For this reason, a prospective, experimental study was conducted of the molecular epidemiology of Klebsiella isolates from patients treated in Mthatha over a three-year period. Methodology: PCR amplification and sequencing of the drug-resistance-associated genes, and MLST using 7 housekeeping genes (mdh, pgi, infB, FusAR, phoE, gapA and rpoB), were conducted. A total of 32 isolates were analysed. Results: Of the isolates, 65.6 % (21) were multidrug-resistant (MDR); none (0 %) were extensively drug-resistant (XDR) or pandrug-resistant (PDR). In this study, K. pneumoniae was 19/32 (59.4 %). MLST identified 22 sequence types (STs), which were further separated by Maximum Parsimony into 10 clonal complexes and 12 singletons. The most dominant group was Klebsiella pneumoniae with 23/32 (71.8 %) isolates, followed by Klebsiella oxytoca with 2/32 (6.25 %) isolates and a single (3.1 %) K. variicola isolate, while 6 isolates were of unknown sequences. Conclusions/significance: A phylogenetic analysis of the concatenated sequences of the 7 housekeeping genes showed that strains of K. pneumoniae form a distinct lineage within the genus Klebsiella, with K. oxytoca and K. variicola its nearest phylogenetic neighbours. Analysis of the 7 genes identified 1 K. variicola isolate that had been mistakenly identified as K. pneumoniae by phenotypic methods. Two misidentifications of K. oxytoca were found when phenotypic methods were used. No significant differences were observed between ESBL blaCTX-M, blaTEM and blaSHV groups in the distribution of sequence types (STs) or clonal complexes (CCs).
Keywords: phylogenetic analysis, phylogeny, klebsiella phylogenetic, klebsiella
Procedia PDF Downloads 374
139 Genotyping and Phylogeny of Phaeomoniella Genus Associated with Grapevine Trunk Diseases in Algeria
Authors: A. Berraf-Tebbal, Z. Bouznad, A.J.L. Phillips
Abstract:
Phaeomoniella is a fungus genus in the mitosporic ascomycota which includes Phaeomoniella chlamydospora specie associated with two declining diseases on grapevine (Vitis vinifera) namely Petri disease and esca. Recent studies have shown that several Phaeomoniella species also cause disease on many other woody crops, such as forest trees and woody ornamentals. Two new species, Phaeomoniella zymoides and Phaeomoniella pinifoliorum H.B. Lee, J.Y. Park, R.C. Summerbell et H.S. Jung, were isolated from the needle surface of Pinus densiflora Sieb. et Zucc. in Korea. The identification of species in Phaeomoniella genus can be a difficult task if based solely on morphological and cultural characters. In this respect, the application of molecular methods, particularly PCR-based techniques, may provide an important contribution. MSP-PCR (microsatellite primed-PCR) fingerprinting has proven useful in the molecular typing of fungal strains. The high discriminatory potential of this method is particularly useful when dealing with closely related or cryptic species. In the present study, the application of PCR fingerprinting was performed using the micro satellite primer M13 for the purpose of species identification and strain typing of 84 Phaeomoniella -like isolates collected from grapevines with typical symptoms of dieback. The bands produced by MSP-PCR profiles divided the strains into 3 clusters and 5 singletons with a reproducibility level of 80%. Representative isolates from each group and, when possible, isolates from Eutypa dieback and esca symptoms were selected for sequencing of the ITS region. The ITS sequences for the 16 isolates selected from the MSP-PCR profiles were combined and aligned with sequences of 18 isolates retrieved from GenBank, representing a selection of all known Phaeomoniella species. DNA sequences were compared with those available in GenBank using Neighbor-joining (NJ) and Maximum-parsimony (MP) analyses. 
The phylogenetic trees of the ITS region revealed that the Phaeomoniella isolates clustered with Phaeomoniella chlamydospora reference sequences with a bootstrap support of 100 %. The complexity of the vine trunk disease pathosystems clearly shows the need to identify the fungal component unambiguously, in order to better understand the etiology of these diseases and to justify control strategies against these fungal agents.
Keywords: Genotyping, MSP-PCR, ITS, phylogeny, trunk diseases
Procedia PDF Downloads 481
138 The Reasons for Vegetarianism in Estonia and its Effects to Body Composition
Authors: Ülle Parm, Kata Pedamäe, Jaak Jürimäe, Evelin Lätt, Aivar Orav, Anna-Liisa Tamm
Abstract:
Vegetarianism has gained popularity across the world. It is chosen for multiple reasons, but among Estonians these have remained unknown. Previous work has paid attention to the bone health and probable nutrient deficiencies of vegetarians, and lower body mass index (BMI) and blood cholesterol levels have been found in vegetarians, but the results are inconclusive. The goal was to explain the reasons for choosing a vegetarian diet in Estonia and the impact of vegetarianism on body composition: BMI, fat percentage (fat%), fat mass (FM), and fat free mass (FFM). The study group comprised 68 vegetarians and 103 omnivores. Body composition was determined with DXA (Hologic) in 2013. Body mass (medical electronic scale, A&D Instruments, Abingdon, UK) and height (Martin metal anthropometer, to the nearest 0.1 cm) were measured and BMI was calculated (kg/m2). General data (including physical activity level) were collected with questionnaires. The main reasons for choosing vegetarianism were the healthiness of the vegetarian diet (59%) and the wish to fight for animal rights (72%). Food additives were consumed by less than half of vegetarians, more often by men. Vegetarians had lower BMI than omnivores, especially among men. Based on BMI classification, vegetarians were less obese than omnivores. However, there were no differences in the FM, FFM and fat percentage figures of the two groups. Higher BMI might be the cause of higher physical activity level among omnivores compared with vegetarians. Both BMI and fat% criteria were used to classify people as underweight, normal weight, overweight and obese. BMI classification placed more people in the normal weight group than fat% classification did, whereas fat% classification categorized more people as overweight. It can be concluded that the main reasons for choosing vegetarianism in Estonia are the healthiness of the vegetarian diet and the wish to fight for animal rights, and that a vegetarian diet has no effect on body fat percentage, FM and FFM.
Keywords: body composition, body fat percentage, body mass index, vegetarianism
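The BMI calculation and classification mentioned in this abstract can be sketched as below. The abstract does not state which cut-offs the authors applied, so the standard WHO adult thresholds (18.5, 25, 30 kg/m2) are used here as an assumption, and the function names are illustrative only.

```python
def bmi(mass_kg, height_cm):
    """Body mass index in kg/m^2 from mass in kg and height in cm."""
    height_m = height_cm / 100.0
    return mass_kg / height_m ** 2


def bmi_category(value):
    """Classify a BMI value with the WHO adult cut-offs (assumed, not from the abstract)."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal weight"
    if value < 30.0:
        return "overweight"
    return "obese"


if __name__ == "__main__":
    value = bmi(70.0, 175.0)
    print(round(value, 1), bmi_category(value))
```

Because BMI does not separate fat mass from fat free mass, two people with the same BMI can fall into different fat% categories, which is consistent with the disagreement between the two classifications reported above.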
Procedia PDF Downloads 418
137 Monte Carlo and Biophysics Analysis in a Criminal Trial
Authors: Luca Indovina, Carmela Coppola, Carlo Altucci, Riccardo Barberi, Rocco Romano
Abstract:
In this paper a real court case, held in Italy at the Court of Nola, in which a correct physical description, conducted with both a Monte Carlo and biophysical analysis, would have been sufficient to arrive at conclusions confirmed by documentary evidence, is considered. This will be an example of how forensic physics can be useful in confirming documentary evidence in order to reach hardly questionable conclusions. This was a libel trial in which the defendant, Mr. DS (Defendant for Slander), had falsely accused one of his neighbors, Mr. OP (Offended Person), of having caused him some damages. The damages would have been caused by an external plaster piece that would have detached from the neighbor’s property and would have hit Mr DS while he was in his garden, much more than a meter far away from the facade of the building from which the plaster piece would have detached. In the trial, Mr. DS claimed to have suffered a scratch on his forehead, but he never showed the plaster that had hit him, nor was able to tell from where the plaster would have arrived. Furthermore, Mr. DS presented a medical certificate with a diagnosis of contusion of the cerebral cortex. On the contrary, the images of Mr. OP’s security cameras do not show any movement in the garden of Mr. DS in a long interval of time (about 2 hours) around the time of the alleged accident, nor do they show any people entering or coming out from the house of Mr. DS in the same interval of time. Biophysical analysis shows that both the diagnosis of the medical certificate and the wound declared by the defendant, already in conflict with each other, are not compatible with the fall of external plaster pieces too small to be found. The wind was at a level 1 of the Beaufort scale, that is, unable to raise even dust (level 4 of the Beaufort scale). 
Therefore, the motion of the plaster pieces can be described as projectile motion, whereas collisions with the building cornice can be treated using Newton’s law of restitution. Numerous Monte Carlo simulations show that the pieces of plaster would not have been able to reach even the garden of Mr. DS, let alone a distance of over 1.30 meters. The results agree with the documentary evidence (images from Mr. OP’s security cameras) that Mr. DS could not have been hit by plaster pieces coming from Mr. OP’s property.
Keywords: biophysics analysis, Monte Carlo simulations, Newton’s law of restitution, projectile motion
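The kind of simulation described in this abstract, drag-free projectile motion combined with Newton's law of restitution at the cornice, can be illustrated with the toy model below. It is not the authors' model: the geometry (a horizontal cornice 1 m below the detachment point and 6 m above the ground), the ranges for the initial horizontal speed and the coefficient of restitution, and all function names are hypothetical values chosen only to show the technique.

```python
import math
import random

G = 9.81  # gravitational acceleration in m/s^2


def landing_distance(v_x, drop_to_cornice, cornice_height, e):
    """Horizontal distance travelled by a piece that falls onto a horizontal
    cornice, bounces once, and then flies to the ground (no air drag)."""
    # Phase 1: free fall from the detachment point to the cornice.
    t1 = math.sqrt(2.0 * drop_to_cornice / G)
    v_down = G * t1
    # Newton's law of restitution: the normal (vertical) speed is reversed
    # and scaled by e; the tangential (horizontal) speed is kept.
    v_up = e * v_down
    # Phase 2: projectile from cornice height until it reaches the ground,
    # i.e. the positive root of h + v_up*t - G*t^2/2 = 0.
    t2 = (v_up + math.sqrt(v_up ** 2 + 2.0 * G * cornice_height)) / G
    return v_x * (t1 + t2)


def fraction_beyond(threshold_m, trials=10000, seed=0):
    """Monte Carlo estimate of the share of pieces landing beyond threshold_m."""
    rng = random.Random(seed)
    beyond = 0
    for _ in range(trials):
        v_x = rng.uniform(0.0, 0.3)  # hypothetical initial speed range, m/s
        e = rng.uniform(0.1, 0.5)    # hypothetical restitution range
        if landing_distance(v_x, 1.0, 6.0, e) > threshold_m:
            beyond += 1
    return beyond / trials


if __name__ == "__main__":
    print(fraction_beyond(1.30))
```

The sketch treats the cornice as wide enough for every piece to hit it; a fuller model would also sample the detachment height and check whether the piece clears the cornice edge before impact.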
Procedia PDF Downloads 132
136 Iris Recognition Based on the Low Order Norms of Gradient Components
Authors: Iman A. Saad, Loay E. George
Abstract:
The iris pattern is an important biological feature of the human body, and it has become a very active topic in both research and practical applications. In this paper, an algorithm is proposed for iris recognition, and a simple, efficient and fast method is introduced to extract a set of discriminatory features using a first-order gradient operator applied to grayscale images. The gradient-based features are robust, to a certain extent, against variations that may occur in the contrast or brightness of iris image samples; these variations mostly occur due to lighting differences and camera changes. First, the iris region is located; it is then remapped to a rectangular area of 360x60 pixels. A new method is also proposed for detecting eyelash and eyelid points; it relies on statistical analysis of the image to mark the eyelash and eyelid points as noise. To capture the localization (variation) of features, the rectangular iris image is partitioned into N overlapping sub-images (blocks); from each block, a set of average directional gradient density values is calculated and used as a texture feature vector. The gradient operators are applied along the horizontal, vertical and diagonal directions. The low-order norms of the gradient components were used to establish the feature vector. A Euclidean distance-based classifier was used as the matching metric to determine the degree of similarity between the feature vector extracted from the tested iris image and the template feature vectors stored in the database. Experimental tests were performed using 2639 iris images from the CASIA V4-Interval database; the attained recognition accuracy reached up to 99.92%.
Keywords: iris recognition, contrast stretching, gradient features, texture features, Euclidean metric
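The matching stage described in this abstract, comparing a probe feature vector against stored templates by Euclidean distance, amounts to a nearest-neighbour lookup. The sketch below assumes the feature vectors have already been extracted; the template labels, vector values, and function names are made up for illustration and do not come from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_template(probe, templates):
    """Return (label, distance) of the stored template closest to the probe.

    templates -- dict mapping an identity label to its feature vector
    """
    if not templates:
        raise ValueError("template database is empty")
    scored = ((label, euclidean(probe, vec)) for label, vec in templates.items())
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    database = {"subject_a": [0.0, 0.0], "subject_b": [3.0, 4.0]}
    print(match_template([3.0, 3.0], database))
```

In a verification setting the returned distance would typically also be compared against an acceptance threshold, so that a probe far from every template is rejected instead of being assigned to its nearest neighbour.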
Procedia PDF Downloads 336
135 The Motivation System Development: Case-Study of the Trade Metal Company in Russian Federation
Authors: Elena V. Lysenko
Abstract:
Motivating as the leading function of a modern Human Resources Management involves issues of increasing the effectiveness of the organization in a broader context. During the formation of motivational systems, the top-management of organization should pay equal attention to both external motivation (incentive system) and internal (self-motivation). The balance of internal and external motivation harmonizes the relations between employers and employees, increases the level of job satisfaction by the organization staff, which in turn leads the organization to success and ensures the organization`s profitability and competitiveness in the market environment. The article is devoted to the study of personnel motivation system in the small metal trade company, which is located in Yekaterinburg, Russian Federation. The study took place during November-December, 2016 ordered by the Company Director to analyze the motivational potential of work (managerial aspect of motivation) and motivation of personnel (personnel aspect of motivation) with the purpose to construct a system of employees’ motivation. The research tools included 6 specially selected tests of motivation, which are: “Motivation profile of your job”, “Constructive motivational attitudes”, Tests about Motivation of achievements (1st variant: Test by А.Mehrabian by the theory of D.С.McClelland and 2nd variant: Test about leading needs according with the theory of D.С.MacClelland), Tests by T.Elers (1st variant: “Determination of the motivation towards success or to avoid failure” and 2nd variant: “Trends to achieve results or to avoid failure”). The results of the study showed only one, but fundamental problem of the whole organization: high level of both motivational potential in work and self-motivation, especially in terms of achievement motivation, but serious lack of productivity. According the results which study showed this problem is derived from insufficient staff competence. 
The research suggests basic guidelines for building a new personnel motivation system for this Company, which is planned to be developed in the near future.
Keywords: incentive system, motivation of achievements, motivation system, self-motivation
Procedia PDF Downloads 311
134 Fake News Detection Based on Fusion of Domain Knowledge and Expert Knowledge
Authors: Yulan Wu
Abstract:
The spread of fake news on social media has posed significant societal harm to the public and the nation, with its threats spanning various domains, including politics, economics, health, and more. News on social media often covers multiple domains, and existing models studied by researchers and relevant organizations often perform well on datasets from a single domain. However, when these methods are applied to social platforms with news spanning multiple domains, their performance significantly deteriorates. Existing research has attempted to enhance the detection performance of multi-domain datasets by adding single-domain labels to the data. However, these methods overlook the fact that a news article typically belongs to multiple domains, leading to the loss of domain knowledge information contained within the news text. To address this issue, research has found that news records in different domains often use different vocabularies to describe their content. In this paper, we propose a fake news detection framework that combines domain knowledge and expert knowledge. Firstly, it utilizes an unsupervised domain discovery module to generate a low-dimensional vector for each news article, representing domain embeddings, which can retain multi-domain knowledge of the news content. Then, a feature extraction module uses the domain embeddings discovered through unsupervised domain knowledge to guide multiple experts in extracting news knowledge for the total feature representation. Finally, a classifier is used to determine whether the news is fake or not. Experiments show that this approach can improve multi-domain fake news detection performance while reducing the cost of manually labeling domain labels.
Keywords: fake news, deep learning, natural language processing, multiple domains
Procedia PDF Downloads 75
133 Genetic Diversity of Sugar Beet Pollinators
Authors: Ksenija Taški-Ajdukovic, Nevena Nagl, Živko Ćurčić, Dario Danojević
Abstract:
Information about genetic diversity of sugar beet parental populations is of a great importance for hybrid breeding programs. The aim of this research was to evaluate genetic diversity among and within populations and lines of diploid sugar beet pollinators, by using SSR markers. As plant material were used eight pollinators originating from three USDA-ARS breeding programs and four pollinators from Institute of Field and Vegetable Crops, Novi Sad. Depending on the presence of self-fertility gene, the pollinators were divided into three groups: autofertile (inbred lines), autosterile (open-pollinating populations), and group with partial presence of autofertility gene. A total of 40 SSR primers were screened, out of which 34 were selected for the analysis of genetic diversity. A total of 129 different alleles were obtained with mean value 3.2 alleles per SSR primer. According to the results of genetic variability assessment the number and percentage of polymorphic loci was the maximal in pollinators NS1 and tester cms2 while effective number of alleles, expected heterozygosis and Shannon’s index was highest in pollinator EL0204. Analysis of molecular variance (AMOVA) showed that 77.34% of the total genetic variation was attributed to intra-varietal variance. Correspondence analysis results were very similar to grouping by neighbor-joining algorithm. Number of groups was smaller by one, because correspondence analysis merged IFVCNS pollinators with CZ25 into one group. Pollinators FC220, FC221 and C 51 were in the next group, while self-fertile pollinators CR10 and C930-35 from USDA-Salinas were separated. On another branch were self-sterile pollinators ЕL0204 and ЕL53 from USDA-East Lansing. Sterile testers cms1 and cms2 formed separate group. The presented results confirmed that SSR analysis can be successfully used in estimation of genetic diversity within and among sugar beet populations. 
Since the tested pollinators differed in the presence of the self-fertility gene, their heterozygosity differed as well. It was lower in genotypes with fixed self-fertility genes. Since most of the tested populations were open-pollinated and rarely self-pollinate, high variability within the populations was expected. Cluster analysis grouped populations according to their origin.
Keywords: auto fertility, genetic diversity, pollinator, SSR, sugar beet
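The diversity measures named in this abstract (effective number of alleles, expected heterozygosity, and Shannon's index) are commonly computed per locus from allele frequencies. The sketch below uses the usual textbook definitions, Ne = 1/sum(p^2), He = 1 - sum(p^2) and I = -sum(p ln p); the abstract does not say which software or estimator variant the authors used, so treat these formulas and the function name as assumptions.

```python
import math


def locus_diversity(allele_counts):
    """Diversity statistics for one locus from raw allele counts."""
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("at least one allele observation is required")
    # Allele frequencies; alleles with zero count contribute nothing.
    freqs = [count / total for count in allele_counts if count > 0]
    homozygosity = sum(p * p for p in freqs)
    return {
        "effective_alleles": 1.0 / homozygosity,
        "expected_heterozygosity": 1.0 - homozygosity,
        "shannon_index": -sum(p * math.log(p) for p in freqs),
    }


if __name__ == "__main__":
    print(locus_diversity([10, 10]))
```

Population-level values are then usually obtained by averaging the per-locus statistics over all scored SSR loci.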
Procedia PDF Downloads 461
132 Roof and Road Network Detection through Object Oriented SVM Approach Using Low Density LiDAR and Optical Imagery in Misamis Oriental, Philippines
Authors: Jigg L. Pelayo, Ricardo G. Villar, Einstine M. Opiso
Abstract:
Advances in aerial laser scanning in the Philippines have opened up entire fields of research in remote sensing and machine vision that aspire to provide accurate, timely information to the government and the public. Rapid mapping of polygonal roads and roof boundaries is one such use, with applications in disaster risk reduction, mitigation and development. The study uses low-density LiDAR data and high-resolution aerial imagery in an object-oriented approach, applying a machine learning algorithm to minimize the constraints of feature extraction. Because the task is to separate one class from another in distinct regions of a multi-dimensional feature space, non-trivial computations for fitting distributions were implemented to formulate the learned ideal hyperplane. Customized hybrid features were generated and then used to improve the classifier results. Supplemental algorithms for filtering and reshaping object features were developed in the rule set to enhance the final product. The methodology shows several advantages in terms of simplicity, applicability, and process transferability. The algorithm was tested at different random locations in Misamis Oriental province in the Philippines, demonstrating robust performance with an overall accuracy greater than 89% and potential for semi-automation. The extracted results will become a vital input for decision makers, urban planners and even the commercial sector in various assessment processes.
Keywords: feature extraction, machine learning, OBIA, remote sensing
Procedia PDF Downloads 363
131 Credit Card Fraud Detection with Ensemble Model: A Meta-Heuristic Approach
Authors: Gong Zhilin, Jing Yang, Jian Yin
Abstract:
The purpose of this paper is to develop a novel system for credit card fraud detection based on sequential modeling of data using hybrid deep learning models. The proposed model comprises five major phases: pre-processing, imbalanced-data handling, feature extraction, optimal feature selection, and fraud detection with an ensemble classifier. The collected raw data (input) are pre-processed to enhance data quality by handling missing data, noisy data, and null values. The pre-processed data are class-imbalanced, so they are balanced with the K-means clustering-based SMOTE model. From the class-balanced data, the most relevant features are extracted: improved Principal Component Analysis (PCA) features, statistical features (mean, median, standard deviation) and higher-order statistical features (skewness and kurtosis). Among the extracted features, the optimal ones are selected with the Self-improved Arithmetic Optimization Algorithm (SI-AOA), a conceptual improvement of the standard Arithmetic Optimization Algorithm. The ensemble comprises deep learning models: Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and an optimized Quantum Deep Neural Network (QDNN). The LSTM and CNN are trained with the selected optimal features. The outputs of the LSTM and CNN are fed as input to the optimized QDNN, which provides the final detection outcome. Since the QDNN is the final detector, its weight function is fine-tuned with SI-AOA.
Keywords: credit card, data mining, fraud detection, money transactions
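The statistical and higher-order statistical features listed in this abstract can be computed for one numeric column as sketched below. The abstract does not specify the estimator variants, so this sketch assumes population (biased) moments, with skewness m3/m2^1.5 and non-excess kurtosis m4/m2^2; the function name is illustrative.

```python
import math
import statistics


def statistical_features(values):
    """Mean, median, standard deviation, skewness and kurtosis of a sample."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments of order 2, 3 and 4 (population form, divide by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("constant input has undefined skewness and kurtosis")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # equals 3 for a normal distribution
    }


if __name__ == "__main__":
    print(statistical_features([1, 2, 3, 4, 5]))
```

Fraud amounts are often heavily right-skewed, which is one reason skewness and kurtosis can carry signal that the mean and standard deviation miss.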
Procedia PDF Downloads 131
130 Representation and Reality: Media Influences on Japanese Attitudes towards China
Authors: Shuk Ting Kinnia Yau
Abstract:
As China has become more and more influential in the global and geo-political arena, mutual understanding between Japan and China has also become a topic of paramount importance. There have always been tensions between the two countries, but unfortunately, each country tends to blame the other for fanning emotions. This research will investigate portrayals of China and the Chinese people in Japanese media such as newspapers, TV news, TV drama, and cinema over this period, focusing on media sources that have particularly wide viewership or readership. By doing so, it attempts to detect any general trends in the positive or negative character of such portrayals and to see if they correlate with the results of surveys of attitudes among the general population. To the degree that correlations may be found, the question arises as to whether the media portrayals are a reflection of societal attitudes towards the Chinese, on one hand, or may be playing a role in promoting such attitudes, on the other. The relationship here is, without doubt, more complex than a simple one-way relationship of cause and effect, but indications of some direction of causality may be suggested by trends in one occurring before or after the other. Evidence will also be sought of possible longer-term trends in media portrayals of China and the Chinese people in Japan during the post-2012 period, i.e., Abe Shinzo’s second term as prime minister, in comparison to earlier periods. Perceptions of Japan’s view of China and the Chinese, both inside and outside the scholarly world, tend to be oversimplified and are often incomprehensive. This research calls attention to the role played by the media in promoting or de-promoting Sino-Japanese relations. 
By analyzing the nature and background of images of China and the Chinese people presented in the Japanese media, especially under the new Abe Regime, this research seeks to promote a more balanced and comprehensive understanding of attitudes in Japanese society towards its gigantic neighbor. Scholars have seen the increasingly fragile Sino-Japanese relationship as inseparable from the real-world political conflicts that have become more frequent in recent years and have sought to draw a correlation between the two. The influence of the media, however, remains a mostly under-explored domain in the academic world. Against this background, this research aims to provide an enriched scholarly understanding of Japan’s perception of China by investigating to what extent such perception can be seen to be affected by subjective or selective forms of presentation of China found in the Japanese media, or vice versa.
Keywords: Abe Shinzo, China, Japan, media
Procedia PDF Downloads 311
129 Mild Hypothermia Versus Normothermia in Patients Undergoing Cardiac Surgery: A Propensity Matched Analysis
Authors: Ramanish Ravishankar, Azar Hussain, Mahmoud Loubani, Mubarak Chaudhry
Abstract:
Background and Aims: Currently, there are no strict guidelines for cardiopulmonary bypass temperature management in cardiac surgery not involving the aortic arch. The aim of this study was to compare outcomes between mild hypothermia and normothermia in patients undergoing on-pump cardiac surgery not involving the aortic arch. Methods: This was a retrospective cohort study from January 2015 until May 2023. Patients who underwent cardiac surgery with cardiopulmonary bypass temperatures ≥32°C were included and stratified into mild hypothermia (32°C to 35°C) and normothermia (>35°C) cohorts. Propensity matching was applied in RStudio through the nearest neighbour method (1:1) using the risk factors detailed in the EuroScore. The primary outcome was mortality. Secondary outcomes included post-operative stay, intensive care unit readmission, re-admission, stroke, and renal complications. Patients who had major aortic surgery or off-pump operations were excluded. Results: Each cohort had 1675 patients. Overall mortality was significantly higher in the mild hypothermia cohort (3.59% vs. 2.32%; p=0.04912). There was also a greater stroke incidence (2.09% vs. 1.13%; p=0.0396) and transient ischaemic attack (TIA) risk (3.1% vs. 1.49%; p=0.0027). There was no significant difference in renal complications (9.13% vs. 7.88%; p=0.2155). Conclusions: Patients who underwent mild hypothermia during cardiopulmonary bypass had significantly greater mortality, stroke, and transient ischaemic attack incidence. Mild hypothermia does not appear to provide any benefit over normothermia, including any neuroprotective benefit. These results differ from those of other major studies; further trials and studies are needed to reach a consensus.
Keywords: cardiac surgery, therapeutic hypothermia, neuroprotection, cardiopulmonary bypass
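The 1:1 nearest neighbour propensity matching named in the Methods can be illustrated with a greedy, without-replacement sketch. The study itself was run in RStudio and the abstract gives no further detail, so the Python code below is only an analogy: it assumes propensity scores have already been estimated, and the patient identifiers, the processing order, and the optional caliper parameter are hypothetical.

```python
def match_nearest(treated, controls, caliper=None):
    """Greedy 1:1 nearest-neighbour matching on propensity score.

    treated, controls -- dicts mapping a patient id to a propensity score
    caliper           -- optional maximum allowed score difference
    Returns a list of (treated_id, control_id) pairs; each control is used once.
    """
    available = dict(controls)
    pairs = []
    # Process treated patients in ascending score order (an arbitrary choice).
    for t_id, t_score in sorted(treated.items(), key=lambda item: item[1]):
        if not available:
            break
        # Control whose score is closest to this treated patient's score.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if caliper is None or abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # matching without replacement
    return pairs


if __name__ == "__main__":
    hypothermia = {"t1": 0.30, "t2": 0.70}
    normothermia = {"c1": 0.28, "c2": 0.75, "c3": 0.50}
    print(match_nearest(hypothermia, normothermia))
```

Greedy matching depends on the order in which treated patients are processed; optimal matching, which minimises the total distance over all pairs, is a common alternative when that sensitivity matters.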
Procedia PDF Downloads 68
128 The Impact of Adopting Cross Breed Dairy Cows on Households’ Income and Food Security in the Case of Dejen Woreda, Amhara Region, Ethiopia
Authors: Misganaw Chere Siferih
Abstract:
This study assessed the impact of crossbreed dairy cows on household income and food security. The study area is in Dejen Woreda, East Gojam Zone, Amhara region, Ethiopia. A random sampling technique was used to obtain a sample of 80 crossbreed dairy cow owners and 176 indigenous dairy cow owners. The study employed the food consumption score analytical framework to measure the food security status of households. No statistically significant mean difference was found between crossbreed owners and indigenous owners. Logistic regression was employed to investigate the determinants of crossbreed dairy cow adoption. The results indicate that gender, education, labor number, size of land cultivated, dairy cooperative membership, net income and food security status of the household are statistically significant independent variables explaining the binary dependent variable, crossbreed dairy cow adoption. Propensity score matching (PSM) was employed to analyze the impact of crossbreed dairy cow ownership on farmers’ income and food security. The average net income of crossbreed dairy cow owners was found to be significantly higher than that of indigenous dairy cow owners. Estimates of the average treatment effect on the treated (ATT) indicated that owning crossbreed dairy cows raises households’ net income by 42%, 38.5%, 30.8% and 44.5% under the kernel, radius, nearest neighbor and stratification matching algorithms, respectively, compared with indigenous dairy cow owners. However, the ATT estimates suggest that owning a crossbreed dairy cow does not affect food security significantly. Thus, crossbreed dairy cows enable farmers in the study area to increase their income but not their food security. Finally, the study recommends establishing dairy cooperatives and advising farmers to join them, promoting the benefits of crossbreed dairy cows, and promoting nutrition-focused projects.
Keywords: crossbreed dairy cow, net income, food security, propensity score matching
Procedia PDF Downloads 65
127 Low-Cost Image Processing System for Evaluating Pavement Surface Distress
Authors: Keerti Kembhavi, M. R. Archana, V. Anjaneyappa
Abstract:
Most asphalt pavement condition evaluations use rating frameworks in which asphalt pavement distress is estimated by type, extent, and severity. Rating is carried out using the pavement condition rating (PCR), which is tedious and expensive. This paper presents the development of a low-cost technique for image-based pavement distress analysis that permits the identification of potholes and cracks. The paper explores the application of image processing tools for the detection of potholes and cracks. Longitudinal cracks and potholes are detected using Fuzzy C-Means (FCM) followed by the Spectral Theory algorithm. The framework comprises three phases: image acquisition, processing, and feature extraction. A digital camera (GoPro) with a holder is used to capture pavement distress images from a moving vehicle. The FCM classifier and Spectral Theory algorithms are used to compute features and classify longitudinal cracks and potholes. The MATLAB R2016a Image Processing Toolbox is used for performance analysis to assess the viability of pavement distress detection on selected urban stretches of Bengaluru city, India. Image evaluation with the semi-automated image processing framework identified the features of longitudinal cracks and potholes with an accuracy of about 80%. Further, the detected images are validated against the actual dimensions, and the dimension variability is about 0.46. The linear regression model y=1.171x-0.155 is obtained using the existing and experimental (image-processed) areas. The R² value obtained from the best-fit line is 0.807, which is considered in the linear regression model to indicate a ‘large positive linear association’.
Keywords: crack detection, pothole detection, spectral clustering, fuzzy-c-means
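As a small worked illustration of the reported regression line, the sketch below applies y = 1.171x - 0.155 and computes a coefficient of determination. The abstract does not say which area is x and which is y, nor the units, so the function names and sample values here are hypothetical.

```python
def predict_area(x, slope=1.171, intercept=-0.155):
    """Apply the reported line y = 1.171x - 0.155 to one area value x."""
    return slope * x + intercept


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up area values, only to show how the two helpers fit together.
x_values = [0.5, 1.0, 2.0]
y_values = [0.45, 1.00, 2.20]
fitted = [predict_area(x) for x in x_values]
print(fitted, r_squared(y_values, fitted))
```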
Procedia PDF Downloads 182
126 Analysis of Biomarkers Intractable Epileptogenic Brain Networks with Independent Component Analysis and Deep Learning Algorithms: A Comprehensive Framework for Scalable Seizure Prediction with Unimodal Neuroimaging Data in Pediatric Patients
Authors: Bliss Singhal
Abstract:
Epilepsy is a prevalent neurological disorder affecting approximately 50 million individuals worldwide and 1.2 million Americans. There are millions of pediatric patients with intractable epilepsy, a condition in which seizures fail to come under control. The occurrence of seizures can result in physical injury, disorientation, unconsciousness, and additional symptoms that could impede children's ability to participate in everyday tasks. Predicting seizures can help parents and healthcare providers take precautions, prevent risky situations, and mentally prepare children to minimize the anxiety and nervousness associated with the uncertainty of a seizure. This research proposes a comprehensive framework to predict seizures in pediatric patients by evaluating machine learning algorithms on unimodal neuroimaging data consisting of electroencephalogram signals. Bandpass filtering and independent component analysis proved to be effective in reducing the noise and artifacts in the dataset. The performance of various machine learning algorithms is evaluated on important metrics such as accuracy, precision, specificity, sensitivity, F1 score and MCC. The results show that the deep learning algorithms are more successful in predicting seizures than logistic regression and k-nearest neighbors. The recurrent neural network (RNN) gave the highest precision and F1 score, long short-term memory (LSTM) outperformed RNN in accuracy, and the convolutional neural network (CNN) resulted in the highest specificity. This research has significant implications for healthcare providers in proactively managing seizure occurrence in pediatric patients, potentially transforming clinical practices and improving pediatric care.
Keywords: intractable epilepsy, seizure, deep learning, prediction, electroencephalogram channels
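The evaluation metrics listed in the abstract all follow from a binary confusion matrix. The sketch below uses the standard definitions; the counts in the example are invented and are not results from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Invented counts: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```

The sketch does not guard against empty rows or columns of the confusion matrix, which would cause a division by zero.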
Procedia PDF Downloads 86
125 Risk of Heatstroke Occurring in Indoor Built Environment Determined with Nationwide Sports and Health Database and Meteorological Outdoor Data
Authors: Go Iwashita
Abstract:
The paper describes how the frequencies of heatstroke occurring in the indoor built environment are related to the outdoor thermal environment, using large-scale statistical data. The nationwide heatstroke accident data were obtained from the National Agency for the Advancement of Sports and Health (NAASH). The meteorological database of the Japanese Meteorological Agency supplied data on 1-hour average temperature, humidity, wind speed, solar radiation, and so forth. Each heatstroke data point from the NAASH database was linked to the meteorological data point acquired from the meteorological station nearest to where the heatstroke accident occurred. This analysis was performed for a 10-year period (2005–2014). During the 10-year period, 3,819 cases of heatstroke were reported in the NAASH database for the investigated secondary/high schools of the nine representative Japanese cities. Heatstroke most commonly occurred in the outdoor schoolyard at a wet-bulb globe temperature (WBGT) of 31°C and in the indoor gymnasium during athletic club activities at a WBGT > 31°C. The determined accident ratio (number of accidents during each club activity divided by the club’s population) was highest in the gymnasium during female badminton club activities. Although badminton is played in a gymnasium, these WBGT results show that the risk level during badminton under hot and humid conditions is equal to that of baseball or rugby played in the schoolyard. Apart from sports, a high risk of heatstroke was observed in schoolhouses during cultural activities. Based on the above WBGT results, the risk level for the indoor environment under hot and humid conditions would be equal to that for the outdoor environment. Therefore, control measures against hot and humid indoor conditions, such as installing air conditioners, are needed not only in schools but also in residences.
Keywords: accidents in schools, club activity, gymnasium, heatstroke
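Linking each accident to its nearest meteorological station is a nearest-neighbour lookup on geographic coordinates. The sketch below assumes great-circle (haversine) distance on a spherical Earth; the abstract does not state which distance measure was used, and the station names and coordinates are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth (radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_station(site, stations):
    """Return the name of the station closest to site, a (lat, lon) tuple."""
    return min(stations, key=lambda name: haversine_km(*site, *stations[name]))


# Hypothetical station coordinates (latitude, longitude in degrees).
stations = {"station_a": (35.68, 139.77), "station_b": (34.69, 135.50)}
school = (35.60, 139.70)
print(nearest_station(school, stations))
```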
Procedia PDF Downloads 217
124 Development the Sensor Lock Knee Joint and Evaluation of Its Effect on Walking and Energy Consumption in Subjects With Quadriceps Weakness
Authors: Mokhtar Arazpour
Abstract:
Objectives: Recently, a new kind of stance control knee joint called the 'sensor lock' has been developed. This study aimed to develop and evaluate the 'sensor lock', which could potentially solve the problems of walking parameters and gait symmetry in subjects with quadriceps weakness. Methods: Nine subjects with quadriceps weakness were enrolled in this study. A custom-made knee ankle foot orthosis (KAFO) with the same set of components was constructed for each participant. Testing began after orthotic gait training was completed with each of the KAFOs and subjects demonstrated that they could safely walk with crutches. Subjects rested 30 minutes between each trial. The 10-meter walking test was used to assess walking speed in meters/second (m/s). The total time taken to ambulate 6 meters (m) was recorded to the nearest hundredth of a second; 6 m was then divided by the total time (in seconds) taken to ambulate, and the result was recorded in m/s. The 6-minute walking test was used to assess walking endurance. Participants walked around the perimeter of a set circuit for a total of six minutes. To evaluate the physiological cost index (PCI), the subjects were asked to walk using each type of KAFO along a pre-determined 40 m rectangular walkway at their comfortable self-selected speed. A stopwatch was used to calculate the walking speed from the time between starting and stopping and the distance walked. Results: The use of a KAFO fitted with the 'sensor lock' knee joint resulted in improvements in walking speed, distance walked and physiological cost index when compared with the knee joint in lock mode. Conclusions: This study demonstrated that the use of a KAFO with the 'sensor lock' knee joint could provide significant benefits for subjects with quadriceps weakness when compared to a KAFO with the knee joint in lock mode.
Keywords: stance control knee joint, knee ankle foot orthosis, quadriceps weakness, walking, energy consumption
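The walking-speed arithmetic described in the Methods, together with a PCI calculation, can be sketched as follows. The PCI formula used here (walking heart rate minus resting heart rate, divided by walking speed in m/min) is the commonly used definition and is an assumption; the abstract does not spell out its formula. All input values are invented.

```python
def walking_speed_m_per_s(distance_m, time_s):
    """Walking speed as distance divided by time, e.g. 6 m over the timed interval."""
    return distance_m / time_s


def physiological_cost_index(hr_walking_bpm, hr_resting_bpm, speed_m_per_min):
    """PCI in beats per metre (assumed standard definition, not taken from the abstract)."""
    return (hr_walking_bpm - hr_resting_bpm) / speed_m_per_min


# Invented example: 6 m covered in 7.5 s; heart rate 110 bpm walking, 80 bpm at rest.
speed = walking_speed_m_per_s(6.0, 7.5)
pci = physiological_cost_index(110, 80, speed * 60)
print(speed, pci)
```

A lower PCI indicates a lower energy cost of walking under this definition.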
Procedia PDF Downloads 125
123 Identification and Molecular Profiling of A Family I Cystatin Homologue from Sebastes schlegeli Deciphering Its Putative Role in Host Immunity
Authors: Don Anushka Sandaruwan Elvitigala, P. D. S. U. Wickramasinghe, Jehee Lee
Abstract:
Cystatins are a large superfamily of proteins which act as reversible inhibitors of cysteine proteases. Papain proteases and cysteine cathepsins are the predominant substrates of cystatins. The cystatin superfamily can be further clustered into three groups: stefins, cystatins, and kininogens. Among them, stefins are also known as family 1 cystatins, which include cystatin Bs and cystatin As. In this study, a homologue of family 1 cystatins, closer to cystatin Bs, was identified from Korean black rockfish (Sebastes schlegeli) using a previously constructed cDNA (complementary deoxyribonucleic acid) database and designated as RfCyt1. The full-length cDNA of RfCyt1 consisted of 573 bp, with a coding region of 294 bp. It comprised a 5´-untranslated region (UTR) of 55 bp and a 3´-UTR of 263 bp. The coding sequence encodes a polypeptide consisting of 97 amino acids with a predicted molecular weight of 11 kDa and a theoretical isoelectric point of 6.3. RfCyt1 shared homology with other teleost and vertebrate species and contained conserved features of the cystatin family signature, including a single cystatin-like domain, the cysteine protease inhibitory signature pentapeptide (QXVXG) consensus sequence, and two conserved neighboring N-terminal glycine (⁸GG⁹) residues. As expected, a phylogenetic reconstruction developed using the neighbor-joining method showed that RfCyt1 clustered with the cystatin family 1 members, most closely with its teleostean orthologues. A SYBR Green qPCR (quantitative polymerase chain reaction) assay was performed to quantify the RfCyt1 transcripts in different tissues of healthy and immune-stimulated fish. RfCyt1 was ubiquitously expressed in all tissue types of healthy animals, with the highest expression in gill and spleen. Temporal expression of RfCyt1 displayed significant up-regulation upon infection with Aeromonas salmonicida. Recombinantly expressed RfCyt1 showed concentration-dependent papain inhibitory activity.
Collectively, these findings provide evidence for detectable protease inhibitory and immunity-relevant roles of RfCyt1 in Sebastes schlegeli.
Keywords: Sebastes schlegeli, family 1 cystatin, immune stimulation, expressional modulation
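Two of the sequence facts above lend themselves to a quick check: the QXVXG consensus can be located with a regular expression, and a 294 bp coding region corresponds to 98 codons, that is, 97 amino acids plus a stop codon. The sketch below is illustrative only; the peptide string is made up and is not the RfCyt1 sequence.

```python
import re


def find_qxvxg(protein_seq):
    """Return 0-based start positions of the Q-X-V-X-G pentapeptide motif."""
    # The lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer(r"(?=Q[A-Z]V[A-Z]G)", protein_seq)]


def residues_from_cds(cds_length_bp):
    """Number of amino acids encoded by a coding region that ends in a stop codon."""
    return cds_length_bp // 3 - 1


# Made-up peptide fragment containing one QXVXG motif (QVVAG).
toy_sequence = "MAGGTQVVAGAA"
print(find_qxvxg(toy_sequence), residues_from_cds(294))
```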
Procedia PDF Downloads 136
122 Sentiment Analysis of Fake Health News Using Naive Bayes Classification Models
Authors: Danielle Shackley, Yetunde Folajimi
Abstract:
As more people turn to the internet seeking health-related information, there is more risk of finding false, inaccurate, or dangerous information. Sentiment analysis is a natural language processing technique that assigns polarity scores to text, labeling it as positive, neutral, or negative. In this research, we evaluate the weight of a sentiment analysis feature added to fake health news classification models. The dataset consists of existing, reliably labeled health article headlines that were supplemented with health information collected about COVID-19 from social media sources. We started with data preprocessing and tested various vectorization methods such as Count and TFIDF vectorization. We implemented 3 Naive Bayes classifier models: Bernoulli, Multinomial, and Complement. To test the weight of the sentiment analysis feature on the dataset, we created benchmark Naive Bayes classification models without sentiment analysis, then reproduced those same models with the feature added. We evaluated the models using the precision and accuracy scores. The initial Bernoulli model performed with 90% precision and 75.2% accuracy, while the model supplemented with sentiment labels performed with 90.4% precision and stayed constant at 75.2% accuracy. Our results show that the addition of sentiment analysis did not improve model precision by a wide margin; while there was no evidence of improvement in accuracy, we had a 1.9% improvement margin in the precision score with the Complement model. Future expansion of this work could include replicating the experimental process and replacing the Naive Bayes models with a deep learning neural network model.
Keywords: sentiment analysis, Naive Bayes model, natural language processing, topic analysis, fake health news classification model
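To make the count-vector-plus-Naive-Bayes pipeline concrete, here is a minimal multinomial Naive Bayes with Laplace smoothing written in plain Python. It is a sketch under stated assumptions, not the authors' models: the headlines and labels are invented, and tokens are assumed to be already preprocessed.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on token lists with Laplace smoothing."""
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    model = {}
    for label, n_docs in class_counts.items():
        total = sum(token_counts[label].values())
        log_prior = math.log(n_docs / len(docs))
        log_likelihood = {
            tok: math.log((token_counts[label][tok] + alpha) / (total + alpha * len(vocab)))
            for tok in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, doc):
    """Return the label with the highest log-posterior; unseen tokens are ignored."""
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(log_likelihood[t] for t in doc if t in log_likelihood)
    return max(model, key=score)


# Invented, already-tokenised headlines.
docs = [["miracle", "cure", "instantly"], ["secret", "cure", "doctors", "hate"],
        ["study", "finds", "vaccine", "effective"], ["trial", "results", "published"]]
labels = ["fake", "fake", "real", "real"]
model = train_multinomial_nb(docs, labels)
print(predict(model, ["miracle", "cure"]))
```

One way to add a sentiment feature in this setting would be to append a sentiment label to each document as an extra token; that design is an assumption, since the abstract does not describe how the feature was encoded.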
Procedia PDF Downloads 97
121 Lexical Based Method for Opinion Detection on Tripadvisor Collection
Authors: Faiza Belbachir, Thibault Schienhinski
Abstract:
The massive development of online social networks allows users to post and share their opinions on various topics. With this huge volume of opinions, it is interesting to extract and interpret this information for different domains, e.g., product and service benchmarking, politics, and recommendation systems. This is why opinion detection is one of the most important research tasks. It consists of differentiating between opinion data and factual data. The difficulty of this task is to determine an approach which returns opinionated documents. Generally, there are two approaches used for opinion detection, i.e., lexical-based approaches and machine learning-based approaches. In lexical-based approaches, a dictionary of sentiment words is used, in which words are associated with weights. The opinion score of a document is derived from the occurrence of words from this dictionary. In machine learning approaches, a classifier is usually trained using a set of annotated documents containing sentiment, and features such as n-grams of words, part-of-speech tags, and logical forms. The majority of these works rely on the document text to determine the opinion score but do not take into account whether these texts are really correct. Thus, it is interesting to exploit other information to improve opinion detection. In our work, we develop a new way to consider the opinion score. We introduce the notion of a trust score. We determine not only opinionated documents but also whether these opinions are really trustworthy information in relation to the topics. For that, we use the SentiWordNet lexicon to calculate opinion and trust scores, and we compute different features about users (the number of their comments, the number of their useful comments, and the average useful review). After that, we combine the opinion score and the trust score to obtain a final score. We applied our method to detect trusted opinions in the TRIPADVISOR collection.
Our experimental results show that the combination of opinion score and trust score improves opinion detection.
Keywords: Tripadvisor, opinion detection, SentiWordNet, trust score
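A lexicon-based opinion score and its combination with a trust score might be sketched as below. The abstract does not give the scoring or combination formulas, so both the length-normalised sum and the weighted average are assumptions, and the lexicon weights are invented rather than taken from SentiWordNet.

```python
def opinion_score(tokens, lexicon):
    """Average lexicon weight over the document (assumed scoring rule)."""
    if not tokens:
        return 0.0
    return sum(lexicon.get(tok, 0.0) for tok in tokens) / len(tokens)


def final_score(opinion, trust, alpha=0.5):
    """Weighted average of opinion and trust scores (hypothetical combination)."""
    return alpha * opinion + (1 - alpha) * trust


# Invented lexicon weights and review tokens.
lexicon = {"great": 0.8, "terrible": 0.9}
review = ["great", "hotel", "terrible", "pool"]
score = opinion_score(review, lexicon)
print(score, final_score(score, trust=0.8))
```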
Procedia PDF Downloads 200
120 Development of a Data-Driven Method for Diagnosing the State of Health of Battery Cells, Based on the Use of an Electrochemical Aging Model, with a View to Their Use in Second Life
Authors: Desplanches Maxime
Abstract:
Accurate estimation of the remaining useful life of lithium-ion batteries for electronic devices is crucial. Data-driven methodologies encounter challenges related to data volume and acquisition protocols, particularly in capturing a comprehensive range of aging indicators. To address these limitations, we propose a hybrid approach that integrates an electrochemical model with state-of-the-art data analysis techniques, yielding a comprehensive database. Our methodology involves infusing an aging phenomenon into a Newman model, leading to the creation of an extensive database capturing various aging states based on non-destructive parameters. This database serves as a robust foundation for subsequent analysis. Leveraging advanced data analysis techniques, notably principal component analysis and t-Distributed Stochastic Neighbor Embedding, we extract pivotal information from the data. This information is harnessed to construct a regression function using either random forest or support vector machine algorithms. The resulting predictor demonstrates a 5% error margin in estimating remaining battery life, providing actionable insights for optimizing usage. Furthermore, the database was built from the Newman model calibrated for aging and performance using data from a European project called Teesmat. The model was then initialized numerous times with different aging values, for instance, with varying thicknesses of SEI (Solid Electrolyte Interphase). This comprehensive approach ensures a thorough exploration of battery aging dynamics, enhancing the accuracy and reliability of our predictive model. Of particular importance is our reliance on the database generated through the integration of the electrochemical model. This database serves as a crucial asset in advancing our understanding of aging states. 
Beyond its capability for precise remaining life predictions, this database-driven approach offers valuable insights for optimizing battery usage and adapting the predictor to various scenarios. This underscores the practical significance of our method in facilitating better decision-making regarding lithium-ion battery management.
Keywords: Li-ion battery, aging, diagnostics, data analysis, prediction, machine learning, electrochemical model, regression
Procedia PDF Downloads 70
119 Using Time Series NDVI to Model Land Cover Change: A Case Study in the Berg River Catchment Area, Western Cape, South Africa
Authors: Adesuyi Ayodeji Steve, Zahn Munch
Abstract:
This study investigates the use of MODIS NDVI to identify agricultural land cover change areas on an annual time step (2007 - 2012) and characterize the trend in the study area. An ISODATA classification was performed on the MODIS imagery to select only the agricultural class, producing 3 class groups, namely: agriculture, agriculture/semi-natural, and semi-natural. NDVI signatures were created for the time series to identify areas dominated by cereals and vineyards with the aid of ancillary, pictometry and field sample data. The NDVI signature curve and training samples aided in creating a decision tree model in WEKA 3.6.9. From the training samples, two classification models were built in WEKA using the decision tree classifier (J48) algorithm: Model 1 included the ISODATA classification and Model 2 did not, with accuracies of 90.7% and 88.3% respectively. The two models were used to classify the whole study area, thus producing two land cover maps, with Models 1 and 2 having classification accuracies of 77% and 80% respectively. Model 2 was used to create change detection maps for all the other years. Subtle changes and areas of consistency (unchanged) were observed in the agricultural classes and crop practices over the years, as predicted by the land cover classification. 41% of the catchment comprises cereals, with 35% possibly following a crop rotation system. Vineyards largely remained constant over the years, with some conversion to vineyard (1%) from other land cover classes. Some of the changes might be the result of misclassification and the crop rotation system.
Keywords: change detection, land cover, MODIS, NDVI
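For readers unfamiliar with the index, NDVI is computed per pixel from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The sketch below applies that standard formula to an invented one-pixel time series; it does not reproduce the MODIS product or the WEKA workflow.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index for one pixel."""
    if nir + red == 0:
        return 0.0  # avoid division by zero for empty pixels
    return (nir - red) / (nir + red)


def ndvi_signature(nir_series, red_series):
    """NDVI value at each time step for one pixel."""
    return [ndvi(n, r) for n, r in zip(nir_series, red_series)]


# Invented reflectance values for one pixel over four dates.
signature = ndvi_signature([0.30, 0.45, 0.50, 0.35], [0.20, 0.10, 0.10, 0.25])
print(signature)
```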
Procedia PDF Downloads 403
118 A Machine Learning-Based Model to Screen Antituberculosis Compound Targeted against LprG Lipoprotein of Mycobacterium tuberculosis
Authors: Syed Asif Hassan, Syed Atif Hassan
Abstract:
Multidrug-resistant tuberculosis (MDR-TB) is an infection caused by resistant strains of Mycobacterium tuberculosis (MTB) that do not respond to at least isoniazid and rifampicin, which are the most important anti-TB drugs. The increase in the occurrence of drug-resistant strains of MTB calls for an intensive search for novel target-based therapeutics. In this context, LprG (Rv1411c), a lipoprotein from MTB, plays a pivotal role in the immune evasion of MTB, leading to survival and propagation of the bacterium within the host cell. Therefore, a machine learning method will be developed to generate a computational model that can predict the potential anti-LprG activity of novel antituberculosis compounds. The present study will utilize a dataset from the PubChem database maintained by the National Center for Biotechnology Information (NCBI). The dataset comprises compounds screened against MTB, categorized as active or inactive based upon the PubChem activity score. PowerMV, a molecular descriptor generator and visualization tool, will be used to generate the 2D molecular descriptors for the active and inactive compounds present in the dataset. The 2D molecular descriptors generated by PowerMV will be used as features. We will feed these features into three different classifiers, namely a random forest, a deep neural network, and a recurrent neural network, to build separate predictive models and choose the best-performing model based on the accuracy of predicting novel antituberculosis compounds with anti-LprG activity. Additionally, the predicted active compounds will be screened using a SMARTS filter to choose molecules with drug-like features.
Keywords: antituberculosis drug, classifier, machine learning, molecular descriptors, prediction
Procedia PDF Downloads 392
117 A Bayesian Classification System for Facilitating an Institutional Risk Profile Definition
Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan
Abstract:
This paper presents an approach for easy creation and classification of institutional risk profiles supporting endangerment analysis of file formats. The main contribution of this work is the employment of data mining techniques to support the setup of the most important risk factors. Subsequently, risk profiles employ a risk factor classifier and associated configurations to support digital preservation experts with a semi-automatic estimation of the endangerment group for file format risk profiles. Our goal is to make use of an expert knowledge base, acquired through a digital preservation survey, in order to detect preservation risks for a particular institution. Another contribution is support for the visualisation of risk factors for a required dimension of analysis. Using the naive Bayes method, the decision support system recommends to an expert the matching risk profile group for the previously selected institutional risk profile. The proposed methods improve the visibility of risk factor values and the quality of a digital preservation process. The presented approach is designed to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and values of file format risk profiles. To facilitate decision-making, the aggregated information about the risk factors is presented as a multidimensional vector. The goal is to visualise particular dimensions of this vector for analysis by an expert and to define its profile group. The sample risk profile calculation and the visualisation of some risk factor dimensions are presented in the evaluation section.
Keywords: linked open data, information integration, digital libraries, data mining
Procedia PDF Downloads 428
116 Delineating Floodplain along the Nasia River in Northern Ghana Using HAND Contour
Authors: Benjamin K. Ghansah, Richard K. Appoh, Iliya Nababa, Eric K. Forkuo
Abstract:
The Nasia River is an important source of water for domestic and agricultural purposes for the inhabitants of its catchment. Major farming activities take place within the floodplain of the river and its network of tributaries. The actual inundation extent of the river system is, however, unknown. Reasons for this lack of information include financial constraints and inadequate human resources, as flood modelling is becoming increasingly complex by the day. Knowledge of the inundation extent will help in the assessment of the risk posed by the annual flooding of the river and in the planning of flood recession agricultural activities. This study used a simple terrain-based algorithm, Height Above Nearest Drainage (HAND), to delineate the floodplain of the Nasia River and its tributaries. The HAND model is a drainage-normalized digital elevation model, which has its height reference based on the local drainage systems rather than the average mean sea level (AMSL). The underlying principle guiding the development of the HAND model is that hillslope flow paths behave differently when the reference gradient is to the local drainage network as compared to the seaward gradient. The new terrain model of the catchment was created using NASA’s 30 m SRTM Digital Elevation Model (DEM) as the only data input. Contours (HAND Contours) were then generated from the normalized DEM. Based on a field flood inundation survey, historical information on flooding of the area, as well as satellite images, a HAND Contour of 2 m was found to best correlate with the flood inundation extent of the river and its tributaries. A percentage accuracy of 75% was obtained when the surface area created by the 2 m contour was compared with the surface area of the floodplain computed from a satellite image captured during the peak flooding season in September 2016.
It was estimated that the flooding of the Nasia River and its tributaries created a floodplain area of 1011 km².
Keywords: digital elevation model, floodplain, HAND contour, inundation extent, Nasia River
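Once a HAND raster exists, delineating the area under a chosen contour reduces to thresholding and counting cells. The sketch below assumes square 30 m cells and a 2 m threshold as in the abstract; the grid values are invented, and deriving HAND itself from a DEM (flow directions, drainage network) is outside the scope of this example.

```python
def floodplain_area_km2(hand_grid, threshold_m=2.0, cell_size_m=30.0):
    """Area of cells whose height above nearest drainage is at or below the threshold.

    hand_grid: rows of HAND values in metres, with None marking no-data cells.
    """
    cells = sum(
        1 for row in hand_grid for h in row
        if h is not None and h <= threshold_m
    )
    return cells * cell_size_m ** 2 / 1e6  # square metres to square kilometres


# Invented 2 x 3 HAND grid (metres above nearest drainage).
grid = [[0.5, 1.9, 3.0],
        [2.0, 5.0, None]]
area = floodplain_area_km2(grid)
print(area)
```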
Procedia PDF Downloads 457
115 Local Directional Encoded Derivative Binary Pattern Based Coral Image Classification Using Weighted Distance Gray Wolf Optimization Algorithm
Authors: Annalakshmi G., Sakthivel Murugan S.
Abstract:
This paper presents a local directional encoded derivative binary pattern (LDEDBP) feature extraction method that can be applied to the classification of submarine coral reef images. The classification of coral reef images using texture features is difficult due to the dissimilarities in class samples. In coral reef image classification, texture features are extracted using the proposed LDEDBP method. The proposed approach extracts the complete structural arrangement of the local region using the local binary pattern (LBP) and also extracts the edge information using the local directional pattern (LDP) from the edge response available in a particular region, thereby achieving an extra discriminative feature value. Typically, the LDP extracts the edge details in all eight directions. Integrating the edge responses with the local binary pattern yields a more robust texture descriptor than the other descriptors used in texture feature extraction methods. Finally, the proposed technique is applied to an extreme learning machine (ELM) method with a meta-heuristic algorithm known as the weighted distance grey wolf optimizer (WDGWO) to optimize the input weights and biases of single-hidden-layer feed-forward neural networks (SLFN). In the empirical results, ELM-WDGWO demonstrated better performance in terms of accuracy on all coral datasets, namely RSMAS, EILAT, EILAT2, and MLC, compared with other state-of-the-art algorithms. The proposed method achieves the highest overall classification accuracy, 94%, compared to the other state-of-the-art methods.
Keywords: feature extraction, local directional pattern, ELM classifier, GWO optimization
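The basic local binary pattern that LDEDBP builds on can be shown on a single 3 x 3 patch. This sketch covers only plain LBP, not LDP or the proposed LDEDBP descriptor; the clockwise bit ordering starting at the top-left neighbour is one common convention and is an assumption here, as is the sample patch.

```python
def lbp_code(patch):
    """8-bit local binary pattern of the centre pixel of a 3x3 grey-level patch."""
    center = patch[1][1]
    # Neighbours in clockwise order, starting at the top-left corner.
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2], patch[2][2], patch[2][1],
        patch[2][0], patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:  # neighbour at least as bright as the centre -> bit set
            code |= 1 << bit
    return code


# Invented grey levels.
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))
```

A texture descriptor is then typically formed as a histogram of these codes over an image region.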
Procedia PDF Downloads 164
114 An Unsupervised Domain-Knowledge Discovery Framework for Fake News Detection
Authors: Yulan Wu
Abstract:
With the rapid development of social media, the issue of fake news has gained considerable prominence, drawing the attention of both the public and governments. The widespread dissemination of false information poses a tangible threat across multiple domains of society, including politics, economy, and health. However, much research has concentrated on supervised training models within specific domains, and their effectiveness diminishes when applied to identify fake news across multiple domains. To solve this problem, some approaches based on domain labels have been proposed. By assigning news to its specific domain in advance, judgments made within the corresponding field may identify fake news more accurately. However, these approaches disregard the fact that news records can pertain to multiple domains, resulting in a significant loss of valuable information. In addition, the datasets used for training must all be domain-labeled, which creates unnecessary complexity. To solve these problems, an unsupervised domain knowledge discovery framework for fake news detection is proposed. Firstly, to effectively retain the multidomain knowledge of the text, a low-dimensional vector is generated for each news text to capture domain embeddings. Subsequently, a feature extraction module utilizing the domain embeddings discovered without supervision is used to extract the comprehensive features of the news. Finally, a classifier is employed to determine the authenticity of the news. To verify the proposed framework, a test is conducted on existing widely used datasets, and the experimental results demonstrate that this method is able to improve the detection performance for fake news across multiple domains. Moreover, even on datasets that lack domain labels, this method can still effectively transfer domain knowledge, which can reduce the time consumed by tagging without sacrificing detection accuracy.
Keywords: fake news, deep learning, natural language processing, multiple domains
Procedia PDF Downloads 101
113 Web Data Scraping Technology Using Term Frequency Inverse Document Frequency to Enhance the Big Data Quality on Sentiment Analysis
Authors: Sangita Pokhrel, Nalinda Somasiri, Rebecca Jeyavadhanam, Swathi Ganesan
Abstract:
Tourism is a booming industry with huge future potential for global wealth and employment. Countless data are generated on social media sites every day, creating numerous opportunities to bring more insights to decision-makers. The integration of Big Data technology into the tourism industry will allow companies to determine where their customers have been and what they like. This information can then be used by businesses, such as those in charge of managing visitor centers or hotels, and tourists can get a clear idea of places before visiting. From a technical perspective, natural language is processed by analysing the sentiment features of online reviews from tourists, and we then supply an enhanced long short-term memory (LSTM) framework for sentiment feature extraction from travel reviews. We have constructed a web review database using a crawler and a web scraping technique for experimental validation to evaluate the effectiveness of our methodology. The sentences were first classified through the VADER and RoBERTa models to get the polarity of the reviews. In this paper, we have studied methods for feature extraction, such as Count Vectorization and TFIDF Vectorization, and implemented a Convolutional Neural Network (CNN) classifier algorithm for the sentiment analysis to decide whether the tourist’s attitude towards the destinations is positive, negative, or simply neutral based on the review text that they posted online. The results demonstrated that with the CNN algorithm, after pre-processing and cleaning the dataset, we obtained an accuracy of 96.12% for the positive and negative sentiment analysis.
Keywords: count vectorization, convolutional neural network, crawler, data technology, long short-term memory, web scraping, sentiment analysis
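The TFIDF vectorization mentioned above can be illustrated with a minimal pure-Python version. This sketch uses one common textbook variant (relative term frequency multiplied by log(N / document frequency)); library implementations often use smoothed variants, so their values will differ. The reviews are invented and assumed to be already tokenised.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {token: weight} dict per document using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            tok: (count / len(doc)) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        })
    return weights


# Invented, already-tokenised travel reviews.
reviews = [["great", "beach", "great", "view"], ["crowded", "beach"]]
weights = tf_idf(reviews)
print(weights)
```

A token that occurs in every document, such as "beach" here, receives a weight of zero under this variant, which is why it down-weights uninformative terms.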
Procedia PDF Downloads 88