Search results for: categorical datasets
388 Exploring Syntactic and Semantic Features for Text-Based Authorship Attribution
Authors: Haiyan Wu, Ying Liu, Shaoyun Shi
Abstract:
Authorship attribution extracts features to identify the authors of anonymous documents. Many previous works on authorship attribution focus on statistical style features (e.g., sentence and word length) and content features (e.g., frequent words and n-grams). Modeling these features with regression or other transparent machine learning methods gives a portrait of the authors' writing style. However, these methods do not capture syntactic (e.g., dependency relationships) or semantic (e.g., topic) information. In recent years, some researchers have modeled syntactic trees or latent semantic information with neural networks, but few works combine the two. Moreover, neural network predictions are difficult to explain, whereas explainability is vital in authorship attribution tasks. In this paper, we use not only statistical style and content features but also syntactic and semantic features. Unlike an end-to-end neural model, our method separates feature selection and prediction into two steps: an attentive n-gram network selects useful features, and logistic regression makes predictions and gives an understandable representation of writing style. Experiments show that our extracted features improve on state-of-the-art methods on three benchmark datasets.
Keywords: authorship attribution, attention mechanism, syntactic feature, feature extraction
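The two-step pipeline (select features, then fit a transparent linear classifier) can be sketched in plain Python. This is a minimal illustration under loud assumptions: word-bigram counts stand in for the paper's style, content, syntactic, and semantic features, and a hypothetical frequency filter (`select_frequent_ngrams`) replaces the attentive n-gram network, which is not reproduced here. The four toy documents are invented.

```python
from collections import Counter
import math


def ngram_features(text, n=2):
    """Count word n-grams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def select_frequent_ngrams(samples, min_count=2):
    """Hypothetical stand-in for attention-based selection: keep n-grams seen at least min_count times."""
    total = Counter()
    for feats in samples:
        total.update(feats)
    return [g for g, c in total.items() if c >= min_count]


def predict_proba(feats, weights, bias):
    """Probability that a document belongs to the author labelled 1."""
    z = bias + sum(weights.get(g, 0.0) * c for g, c in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(samples, labels, vocab, epochs=200, lr=0.5):
    """Binary logistic regression on the selected n-gram counts (batch gradient descent)."""
    weights = {g: 0.0 for g in vocab}
    bias = 0.0
    m = len(samples)
    for _ in range(epochs):
        grad_w = {g: 0.0 for g in vocab}
        grad_b = 0.0
        for feats, y in zip(samples, labels):
            err = predict_proba(feats, weights, bias) - y
            for g, c in feats.items():
                if g in grad_w:  # features outside the selected vocabulary are ignored
                    grad_w[g] += err * c
            grad_b += err
        for g in vocab:
            weights[g] -= lr * grad_w[g] / m
        bias -= lr * grad_b / m
    return weights, bias


# Invented toy corpus: label 1 = author A, label 0 = author B.
docs = ["the cat sat on the mat", "the cat ate the fish",
        "a dog ran in a park", "a dog chased a ball"]
labels = [1, 1, 0, 0]
train = [ngram_features(d) for d in docs]
vocab = select_frequent_ngrams(train)
weights, bias = train_logreg(train, labels, vocab)
```

Because the classifier is linear, each entry of `weights` can be read directly: positive weights push a document toward author A and negative ones toward author B, which is the kind of interpretable writing-style portrait the abstract contrasts with end-to-end neural models.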
387 A Study of ZY3 Satellite Digital Elevation Model Verification and Refinement with Shuttle Radar Topography Mission
Authors: Bo Wang
Abstract:
As the first high-resolution civil optical satellite, the ZY-3 satellite can acquire high-resolution multi-view images with three linear array sensors. These images can be used to generate Digital Elevation Models (DEMs) through dense matching of stereo images. However, because clouds, forests, water, and buildings cover parts of the images, the dense matching results suffer from problems such as outliers and areas that fail to match (matching holes). This paper introduces an algorithm that verifies the accuracy of DEMs generated from ZY-3 imagery against the Shuttle Radar Topography Mission (SRTM). Since the accuracy of SRTM (internal accuracy: 5 m; external accuracy: 15 m) is relatively uniform worldwide, it may be used to improve the accuracy of the ZY-3 DEM. Based on an analysis of large volumes of DEM and SRTM data, the processing is divided into two stages. First, the ZY-3 DEM is registered to SRTM using conjugate line and area features matched between the two datasets. Then, the ZY-3 DEM is refined by eliminating matching outliers and filling matching holes: outliers are eliminated based on Local Vector Binning (LVB) statistics, and holes are filled with elevations interpolated from SRTM. Accuracy statistics for the ZY-3 DEM were also computed.
Keywords: ZY-3 satellite imagery, DEM, SRTM, refinement
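The two refinement steps (outlier elimination and hole filling) can be illustrated with a minimal Python sketch. It assumes the ZY-3 DEM and SRTM have already been registered and resampled to a common grid, represents holes as `None`, and uses a simple median/MAD threshold on DEM-minus-SRTM differences as a stand-in for the paper's Local Vector Binning statistics, which are not reproduced here. The grid values are invented.

```python
import statistics


def refine_dem(dem, srtm, k=3.0):
    """Fill holes and replace outliers in a ZY-3 DEM using a co-registered SRTM grid.

    dem:  2-D list of elevations, with None marking matching holes.
    srtm: 2-D list of SRTM elevations already resampled to the same grid.
    A cell is treated as an outlier when its DEM-minus-SRTM difference deviates
    from the median difference by more than k robust standard deviations
    (1.4826 * MAD). This threshold is an assumption, not the LVB method.
    """
    diffs = [d - s for row_d, row_s in zip(dem, srtm)
             for d, s in zip(row_d, row_s) if d is not None]
    med = statistics.median(diffs)
    mad = statistics.median(abs(x - med) for x in diffs)
    tolerance = k * 1.4826 * mad
    refined = []
    for row_d, row_s in zip(dem, srtm):
        refined.append([
            s if d is None or abs((d - s) - med) > tolerance else d
            for d, s in zip(row_d, row_s)
        ])
    return refined


# Invented 2 x 3 tile (metres): one matching hole and one gross matching outlier.
dem = [[101, 102, None], [104, 500, 104]]
srtm = [[100, 100, 101], [101, 102, 102]]
refined = refine_dem(dem, srtm)
```

Using differences from SRTM, rather than raw elevations, keeps genuine terrain relief from being flagged; only cells that disagree unusually strongly with the reference surface are replaced.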
386 Attention-Based ResNet for Breast Cancer Classification
Authors: Abebe Mulugojam Negash, Yongbin Yu, Ekong Favour, Bekalu Nigus Dawit, Molla Woretaw Teshome, Aynalem Birtukan Yirga
Abstract:
Breast cancer remains a significant health concern, necessitating advances in diagnostic methodologies. This paper addresses notable challenges in breast cancer classification, particularly dataset imbalance and the limited accuracy and interpretability of prevailing deep learning approaches. We propose an attention-based residual neural network (ResNet) that combines the robust features of ResNet with an advanced attention mechanism. Enhanced through strategic data augmentation and positive weight adjustment, this approach specifically targets data imbalance. The proposed model was tested on the BreakHis dataset and achieved accuracies of 99.00%, 99.04%, 98.67%, and 98.08% at the 40X, 100X, 200X, and 400X magnifications, respectively. We also evaluated performance using precision, recall, and F1-score, and compared the model with other state-of-the-art methods. Our experiments demonstrate that the proposed model outperforms existing approaches, achieving higher accuracy in breast cancer classification.
Keywords: residual neural network, attention mechanism, positive weight, data augmentation
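Positive weight adjustment is typically implemented by scaling the positive-class term of the binary cross-entropy. The sketch below is an assumption about the general technique, not the authors' exact loss: it uses the common negative/positive count ratio as the weight, mirroring what the `pos_weight` argument of PyTorch's `torch.nn.BCEWithLogitsLoss` does on logits. The class counts are hypothetical.

```python
import math


def pos_weight_from_counts(n_pos, n_neg):
    """One common heuristic: scale the positive-class loss by the negative/positive ratio."""
    return n_neg / n_pos


def weighted_bce(probs, targets, pos_weight):
    """Mean binary cross-entropy with the positive-class term multiplied by pos_weight."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Hypothetical imbalanced split: 100 positives and 300 negatives.
w = pos_weight_from_counts(n_pos=100, n_neg=300)
```

With `w = 3.0`, a misclassified positive costs three times as much as an equally misclassified negative, which counteracts the tendency of an imbalanced dataset to favour the majority class.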
385 Global City Typologies: 300 Cities and Over 100 Datasets
Authors: M. Novak, E. Munoz, A. Jana, M. Nelemans
Abstract:
Cities and local governments the world over are interested in employing circular strategies as a means to bring about food security, create employment, and increase resilience. Selecting and implementing circular strategies is facilitated by modeling their effects locally and by understanding the impacts such strategies have had in other, comparable cities and how those impacts would translate locally. Urban areas are heterogeneous in their geographic, economic, and social characteristics, governance, and culture. To better understand the effect of circular strategies on urban systems, we created a dataset for over 300 cities around the world, designed to facilitate circular strategy scenario modeling. This new dataset integrates data from over 20 prominent global, national, and urban data sources, such as the Global Human Settlement Layer and the International Labour Organisation, and incorporates employment data for over 150 cities, collected bottom-up from local departments and data providers. The dataset is designed to be reproducible. Various clustering techniques are explored in the paper. The result is a set of city clusters that can be used for further research and analysis and to support comparative, regional, and national policy making on circular cities.
Keywords: data integration, urban innovation, cluster analysis, circular economy, city profiles, scenario modelling
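The abstract does not say which clustering techniques were explored; as one common example, the sketch below standardizes city indicators and groups them with plain k-means. The two indicators and six cities are hypothetical, and `standardize` and `kmeans` are illustrative helpers rather than the paper's code.

```python
import math
import random


def standardize(rows):
    """Z-score each indicator so variables on different scales contribute equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


# Hypothetical indicators per city: (population in millions, employment in thousands).
cities = [[1.0, 50], [1.2, 55], [0.9, 48], [8.0, 500], [8.5, 520], [7.9, 480]]
labels = kmeans(standardize(cities), k=2)
```

Standardizing first matters here: without it, the employment column (hundreds) would dominate the Euclidean distance and the population column (single digits) would barely influence the clusters.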
384 Clustering of Association Rules of ISIS & Al-Qaeda Based on Similarity Measures
Authors: Tamanna Goyal, Divya Bansal, Sanjeev Sofat
Abstract:
For world-threatening terrorist attacks, early detection, distinction, and prediction are effective diagnostic techniques, and many data mining and statistical approaches exist to ensure functionally accurate and precise analysis of terrorism data. The computational extraction of derived patterns is a non-trivial task that requires domain-specific discovery through sophisticated algorithm design and analysis. This paper proposes a similarity-extraction approach that obtains useful attributes from available terrorist-attack datasets, applies a feature selection technique based on statistical impurity measures, and then applies clustering techniques based on similarity measures. The associative dependencies between attacks are analyzed according to the degree to which attributes participate in the rules. To compute the similarity among the discovered rules, we apply a weighted similarity measure. Finally, the rules are grouped using hierarchical clustering. We applied the technique to an open-source dataset to determine its usability and efficiency, and a literature search was also conducted to support the efficiency and accuracy of our results.
Keywords: association rules, clustering, similarity measure, statistical approaches
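A weighted rule-similarity measure followed by hierarchical clustering can be sketched as follows. Assumptions, all hypothetical: each rule is a set of (attribute, value) items, the per-attribute weights stand in for the impurity-based feature ranking, similarity is a weighted Jaccard index, and merging uses average linkage with a similarity threshold. The paper's exact similarity measure and linkage are not specified in the abstract.

```python
def weighted_jaccard(rule_a, rule_b, weights):
    """Weighted Jaccard similarity of two rules given as sets of (attribute, value) items."""
    union = rule_a | rule_b
    if not union:
        return 1.0
    shared = sum(weights.get(attr, 1.0) for attr, _ in rule_a & rule_b)
    total = sum(weights.get(attr, 1.0) for attr, _ in union)
    return shared / total


def cluster_rules(rules, weights, threshold):
    """Average-linkage agglomerative clustering; stop when no pair reaches `threshold`."""
    clusters = [[i] for i in range(len(rules))]

    def linkage(c1, c2):
        sims = [weighted_jaccard(rules[i], rules[j], weights) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        a, b = max(pairs, key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        if linkage(clusters[a], clusters[b]) < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical rules and attribute weights (e.g., from an impurity-based ranking).
rules = [
    {("attack_type", "bombing"), ("target", "civilians"), ("region", "A")},
    {("attack_type", "bombing"), ("target", "civilians"), ("region", "B")},
    {("attack_type", "kidnapping"), ("target", "officials"), ("region", "C")},
]
weights = {"attack_type": 2.0, "target": 1.5, "region": 0.5}
groups = cluster_rules(rules, weights, threshold=0.5)
```

The weights let highly discriminative attributes dominate the similarity: the first two rules differ only in the low-weight `region` item, so they merge, while the third shares nothing and stays on its own.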
383 Influence of Pretreatment Magnetic Resonance Imaging on Local Therapy Decisions in Intermediate-Risk Prostate Cancer Patients
Authors: Christian Skowronski, Andrew Shanholtzer, Brent Yelton, Muayad Almahariq, Daniel J. Krauss
Abstract:
Prostate cancer has the third highest incidence rate and is the second leading cause of cancer death for men in the United States. Of the diagnostic tools available for intermediate-risk prostate cancer, magnetic resonance imaging (MRI) provides superior soft-tissue delineation, making it a valuable tool for both diagnosis and treatment planning. Currently, there are minimal data on the practical utility of MRI for evaluating intermediate-risk prostate cancer. As such, the National Comprehensive Cancer Network (NCCN) guidelines list MRI as optional in intermediate-risk prostate cancer evaluation. This project aims to elucidate whether MRI affects radiation treatment decisions for intermediate-risk prostate cancer. This was a retrospective study evaluating 210 patients with intermediate-risk prostate cancer treated with definitive radiotherapy at our institution between 2019 and 2020. NCCN risk stratification criteria were used to define intermediate-risk prostate cancer. Patients were divided into two groups: those with pretreatment prostate MRI and those without. We compared the use of external beam radiotherapy, brachytherapy alone, brachytherapy boost, and androgen deprivation therapy between the two groups. Inverse probability of treatment weighting was used to balance the two groups for age, comorbidity index, American Urological Association symptom index, pretreatment PSA, grade group, and percent core involvement on prostate biopsy. Wilcoxon rank-sum and chi-squared tests were used to compare continuous and categorical variables, respectively. Of the patients who met the study’s eligibility criteria, 133 had a prostate MRI and 77 did not. After weighting, there were no differences in baseline characteristics between the two groups.
There were no statistically significant differences in the treatments pursued between the two groups: 42% vs 47% were treated with brachytherapy alone, 40% vs 42% with external beam radiotherapy alone, 18% vs 12% with external beam radiotherapy with a brachytherapy boost, and 24% vs 17% received androgen deprivation therapy in the non-MRI and MRI groups, respectively. This analysis suggests that pretreatment MRI does not significantly affect radiation therapy or androgen deprivation therapy decisions in patients with intermediate-risk prostate cancer. Pretreatment prostate MRI should be used judiciously and pursued only to answer a specific question whose answer is likely to affect the treatment decision. Further follow-up is needed to correlate MRI findings with specific oncologic outcomes.
Keywords: magnetic resonance imaging, prostate cancer, definitive radiotherapy, Gleason score 7
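Inverse probability of treatment weighting reweights each patient by the inverse of the estimated probability of the group they actually belong to, so that the weighted groups resemble each other on the measured covariates. The minimal sketch below assumes the propensity scores are already estimated (for example, by a logistic regression of MRI receipt on the baseline covariates) and uses four invented patients; `weighted_rates` is a hypothetical helper.

```python
def iptw_weights(treated, propensity):
    """ATE-style inverse probability of treatment weights: 1/e for treated, 1/(1-e) for controls."""
    return [1.0 / e if t else 1.0 / (1.0 - e) for t, e in zip(treated, propensity)]


def weighted_rates(treated, propensity, outcome):
    """Weighted proportion of a binary outcome within each group (1 = treated, 0 = control)."""
    w = iptw_weights(treated, propensity)
    rates = {}
    for group in (1, 0):
        idx = [i for i, t in enumerate(treated) if t == group]
        rates[group] = sum(w[i] * outcome[i] for i in idx) / sum(w[i] for i in idx)
    return rates


# Hypothetical patients: 1 = had pretreatment MRI; outcome 1 = received ADT.
mri = [1, 1, 0, 0]
propensity = [0.5, 0.75, 0.25, 0.5]  # would come from a model of MRI on baseline covariates
adt = [1, 0, 1, 0]
rates = weighted_rates(mri, propensity, adt)
```

Patients who were unlikely to end up in their actual group (for example, a control with a high propensity for MRI) receive large weights, because they are the most informative comparators for the other group.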
382 Burden of Communicable and Non-Communicable Disease in India: A Regional Analysis
Authors: Ajit Kumar Yadav, Priyanka Yadav, F. Ram
Abstract:
The present study analyses the burden of disease in the state. Disability-Adjusted Life Years (DALYs) are estimated for communicable and non-communicable diseases. Data from multiple rounds (52nd, 60th, and 71st) of the National Sample Survey (NSSO), conducted in 1995-96, 2004, and 2014, respectively, and from the Million Deaths Study (MDS) for 2001-03, 2006, and 2013-14 are used. Descriptive and multivariate analyses are carried out to identify the determinants of different types of self-reported morbidity and of DALYs. Across the study period and for all selected morbidities, prevalence was higher among people aged 60 and above, females, illiterate people, and the rich; these results were significant at p < 0.001. The DALY estimates revealed that the burden of communicable diseases was higher during infancy, notably among males compared with females in 2002. However, females aged 1-5 years were more likely than males of the same age to report communicable diseases. The age distribution of DALYs indicates that individuals aged below 5 years and above 60 years were more susceptible to ill health. The growing incidence of non-communicable diseases, especially among older generations, puts an additional burden on the state's health system. The state has to grapple with unresolved preventable infectious diseases on the one hand and a growing burden of non-communicable diseases on the other.
Keywords: disease burden, non-communicable, communicable, India and region
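For readers unfamiliar with the metric, DALYs add years of life lost to premature death (YLL) and years lived with disability (YLD). The sketch below uses one simple undiscounted, non-age-weighted, incidence-based formulation; the abstract does not state which variant the authors used, and all inputs are hypothetical.

```python
def years_of_life_lost(deaths, life_expectancy_at_death):
    """YLL: deaths multiplied by the standard remaining life expectancy at the age of death."""
    return deaths * life_expectancy_at_death


def years_lived_with_disability(cases, disability_weight, duration_years):
    """Incidence-based YLD: new cases x disability weight (0 to 1) x average duration."""
    return cases * disability_weight * duration_years


def daly(deaths, life_expectancy_at_death, cases, disability_weight, duration_years):
    """Undiscounted, non-age-weighted DALY = YLL + YLD."""
    return (years_of_life_lost(deaths, life_expectancy_at_death)
            + years_lived_with_disability(cases, disability_weight, duration_years))


# Hypothetical age-sex group: 10 deaths with 60 years of remaining life expectancy,
# and 1,000 new cases with disability weight 0.25 lasting half a year on average.
burden = daly(10, 60, 1000, 0.25, 0.5)
```

Here mortality contributes 600 DALYs and morbidity 125, which illustrates why deaths in infancy (long remaining life expectancy) weigh so heavily in age-specific DALY profiles.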
381 Regional Anesthesia in Carotid Surgery: A Single Center Experience
Authors: Daniel Thompson, Muhammad Peerbux, Sophie Cerutti, Hansraj Riteesh Bookun
Abstract:
Patients with carotid stenosis, which may be asymptomatic or symptomatic in the form of transient ischaemic attack (TIA), amaurosis fugax, or stroke, often require an endarterectomy to reduce stroke risk. Risks of this procedure include stroke, death, myocardial infarction, and cranial nerve damage. Carotid endarterectomy is most commonly performed under general anaesthetic; however, it can also be undertaken with a regional anaesthetic approach. Our major tertiary hospital mostly performs carotid endarterectomy under regional anaesthesia. We completed a cross-sectional analysis of all cases of carotid endarterectomy performed under regional anaesthesia at our institution over a 10-year period from January 2010 to March 2020. A total of 350 patients were included in this descriptive analysis, and data on patient demographics, indications for surgery, procedural details, length of surgery, and complications were collected. Data were cross-tabulated and presented in frequency tables to describe these categorical variables. Of the 350 patients, 263 were male, and the mean age was 71 ± 9 years. A history of ischaemic heart disease was present in 172 patients, diabetes mellitus in 104, and hypertension in 318, and 17 patients had chronic kidney disease greater than Stage 3. In total, 46 patients (13.1%) were current smokers, and the majority (63%) were ex-smokers. Carotid endarterectomy was most commonly performed conventionally with patch arterioplasty (337 patients, 96%). The most common indications were TIA and stroke (64% of patients); 18.9% were classified as asymptomatic, and 13.7% had amaurosis fugax. There were few general complications: 9 wound complications/infections, 7 postoperative haematomas requiring return to theatre, 3 myocardial infarctions, 3 arrhythmias, 1 exacerbation of congestive heart failure, 1 chest infection, and 1 urinary tract infection.
Complications specific to carotid endarterectomy included 3 strokes, 1 postoperative TIA, and 1 cerebral bleed. There were no deaths in our cohort. This analysis of a large cohort of patients from a major tertiary centre who underwent carotid endarterectomy under regional anaesthesia supports the safety of this approach for these patients. Regional anaesthesia holds the promise of fewer general respiratory and cardiac events than general anaesthesia, and in this vulnerable patient group, this calls for comparative research between local and general anaesthesia in carotid surgery.
Keywords: anaesthesia, carotid endarterectomy, stroke, carotid stenosis
380 Sparse Modelling of Cancer Patients’ Survival Based on Genomic Copy Number Alterations
Authors: Khaled M. Alqahtani
Abstract:
Copy number alterations (CNAs) are variations in the structure of the genome in which certain regions deviate from the typical two chromosomal copies. These alterations are pivotal in understanding tumor progression and are indicative of patients' survival outcomes. However, effectively modeling patients' survival from their genomic CNA profiles while identifying relevant genomic regions remains a statistical challenge. Various methods, such as the Cox proportional hazards (PH) model with ridge, lasso, or elastic net penalties, have been proposed but often overlook the inherent dependencies between genomic regions, leading to results that are hard to interpret. In this study, we enhance the elastic net penalty by incorporating an additional penalty that accounts for these dependencies. This approach yields smooth parameter estimates and performs variable selection, resulting in a sparse solution. Our simulation study shows that this method outperforms other models in predicting survival outcomes. Moreover, it allows a more meaningful interpretation of the genomic regions associated with patients' survival. We demonstrate the efficacy of our approach using both real data from a lung cancer cohort and simulated datasets.
Keywords: copy number alterations, cox proportional hazard, lung cancer, regression, sparse solution
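The abstract does not give the exact form of the added penalty. One plausible reading, assumed here, is an elastic net plus a fused-ridge-style term $\gamma \sum_j (\beta_j - \beta_{j-1})^2$ that pulls the coefficients of adjacent genomic regions toward each other. The sketch evaluates the resulting penalized Cox objective (negative log partial likelihood, assuming no tied event times); actually fitting it would require an optimizer such as coordinate or proximal gradient descent, which is not shown.

```python
import math


def cox_neg_log_partial_likelihood(beta, X, times, events):
    """Negative Cox log partial likelihood; assumes no tied event times."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:  # only observed events (not censored cases) contribute a term
            risk = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
            nll -= eta[i] - math.log(risk)
    return nll


def smoothed_elastic_net(beta, lam, alpha, gamma):
    """Elastic net plus an assumed penalty on differences between adjacent genomic regions."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    smooth = sum((beta[j] - beta[j - 1]) ** 2 for j in range(1, len(beta)))
    return lam * (alpha * l1 + (1 - alpha) / 2 * l2) + gamma * smooth


def objective(beta, X, times, events, lam=0.1, alpha=0.5, gamma=0.1):
    """Penalized objective to be minimized over beta."""
    return (cox_neg_log_partial_likelihood(beta, X, times, events)
            + smoothed_elastic_net(beta, lam, alpha, gamma))
```

The L1 part still drives individual coefficients to zero (sparsity), while the difference term discourages isolated sign flips between neighbouring regions, which is what makes the estimated coefficient profile smooth along the genome.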
379 On the Cluster of the Families of Hybrid Polynomial Kernels in Kernel Density Estimation
Authors: Benson Ade Eniola Afere
Abstract:
Over the years, kernel density estimation has been extensively studied within the context of nonparametric density estimation. The fundamental components of kernel density estimation are the kernel function and the bandwidth. While mathematical exploration of the kernel component has been relatively limited, its selection and development remain crucial. The Mean Integrated Squared Error (MISE), serving as a measure of discrepancy, provides a robust framework for assessing the effectiveness of any kernel function: a kernel with a lower MISE is generally considered to perform better than one with a higher MISE. Hence, the primary aim of this article is to create kernels with significantly lower MISE than existing classical kernels. To this end, this article introduces a cluster of hybrid polynomial kernel families. The proposed kernel functions are constructed heuristically by combining two kernels from the classical polynomial kernel family using probability axioms. We also analyze error propagation within these kernels. Their performance is assessed with simulation experiments and real-life datasets. The results demonstrate that the proposed hybrid kernels outperform their classical counterparts.
Keywords: classical polynomial kernels, cluster of families, global error, hybrid kernels, kernel density estimation, Monte Carlo simulation
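One way to read "combining two kernels using probability axioms" is a convex mixture: a weighted average of two probability densities is itself a density. The sketch below, which assumes that reading, mixes the Epanechnikov and biweight kernels (both classical polynomial kernels), plugs the result into a standard kernel density estimator, and checks numerically that both integrate to 1. The mixing weight, bandwidth, and data are hypothetical, and MISE is not computed.

```python
def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def biweight(u):
    return 15 / 16 * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0


def hybrid_kernel(w, k1=epanechnikov, k2=biweight):
    """Convex mixture of two kernels; remains a valid density for 0 <= w <= 1."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("mixing weight must lie in [0, 1]")
    return lambda u: w * k1(u) + (1 - w) * k2(u)


def kde(x, data, h, kernel):
    """Kernel density estimate at x with bandwidth h."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)


def integrate(f, a, b, n=20000):
    """Midpoint rule, used here only to sanity-check that densities integrate to 1."""
    step = (b - a) / n
    return sum(f(a + (i + 0.5) * step) for i in range(n)) * step


K = hybrid_kernel(0.3)
sample = [0.0, 1.0, 2.5]  # hypothetical observations
```

Non-negativity and unit mass are exactly the properties a mixture inherits from its components, so any weight in [0, 1] gives a legitimate kernel; choosing the weight that minimizes MISE would be the tuning step the abstract describes.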
378 Data Augmentation for Automatic Graphical User Interface Generation Based on Generative Adversarial Network
Authors: Xulu Yao, Moi Hoon Yap, Yanlong Zhang
Abstract:
As a branch of artificial neural networks, deep learning is widely used in image recognition, but a lack of training data leads to imperfect model learning. Analysing the data-scale requirements of deep learning for GUI generation shows that collecting a GUI dataset is a time-consuming and labor-intensive project that struggles to meet the needs of current deep learning networks. To solve this problem, this paper proposes a semi-supervised deep learning model that uses the original small-scale dataset to produce a large amount of reliable data. By combining a recurrent neural network (RNN) with a generative adversarial network (GAN), the RNN learns the sequential relationships and characteristics of the data, enabling the GAN to generate plausible data and thereby expand the Rico dataset. This network structure can analyse the characteristics of the collected data well and generate a large amount of plausible data from them. After processing, these data form a reliable dataset for model training, which alleviates the shortage of data for deep learning.
Keywords: GUI, deep learning, GAN, data augmentation
377 High Injury Prevalence in Adolescent Field Hockey Players: Implications for Future Practice
Authors: J. D. Pillay, D. De Wit, J. F. Ducray
Abstract:
Field hockey is a popular international sport that is played in more than 100 countries. Due to the nature of hockey, players repeatedly combine forward flexion and rotational movements of the spine to strike the ball. These movements have been shown to increase the risk of pain and injury to the lumbar spine. The aim of this study was to determine the prevalence and incidence of low back pain (LBP) in male adolescent field hockey players, the characteristics of LBP in terms of location, chronicity, disability, and treatment sought, and its association with selected risk factors. A survey was conducted on 112 male adolescent field hockey players in the eThekwini Municipality of KwaZulu-Natal, South Africa. The questionnaire contained sections on participants' demographics, general characteristics, health and lifestyle characteristics, low back pain patterns, treatment of low back pain, and the level of disability associated with LBP. The data were analysed using IBM SPSS version 25, with statistical significance set at p < 0.05. Descriptive statistics such as the mean and standard deviation were used to summarise responses to continuous variables, and categorical variables were described using frequency tables. Associations between risk factors and low back pain were tested using Pearson’s chi-square test and t-tests as appropriate. A total of 68 questionnaires were completed for analysis (67% participation rate); the period prevalence of LBP was 63.2% (35.0% at the beginning of the season, 32.4% mid-season, and 22.1% at the end of the season). Incidence was 38.2%. The most common location for LBP was the middle low back region (39.5%), and the most common duration of pain was a few hours (32.6%). Most participants (79.1%) did not classify their pain as a disability, and only 44.2% of participants received medical treatment for their LBP.
An interesting finding was the association between hydration and LBP (p = 0.050): individuals who did not hydrate frequently during matches and training were significantly more likely to experience LBP. The results of this study, although limited to a select group of adolescents, showed a higher prevalence of LBP than previous studies. More importantly, even though most participants did not experience LBP classified as a disability, LBP still had a large impact on participants, as nearly half of them consulted a medical professional for treatment. Further strategies for the prevention and management of LBP in field hockey, such as adequate warm-up and cool-down, stretching exercises, and rest between sessions, are recommended as simple ways to reduce LBP prevalence.
Keywords: adolescents, field hockey players, incidence, low back pain, prevalence, risk factors
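The association tests in this study use Pearson's chi-square test. For a 2 x 2 table such as hydration habit versus LBP, the statistic and its 1-degree-of-freedom p-value can be computed with the standard library alone, as in the sketch below. The tables are synthetic, not the study's data. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 x 2 tables by default (`correction=True`), so its p-value can differ from this uncorrected version.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Rows could be hydration habit (infrequent / frequent) and columns LBP (yes / no).
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column needs at least one observation")
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Synthetic tables (not the study's data): perfect association and no association.
strong_stat, strong_p = chi_square_2x2(20, 0, 0, 20)
null_stat, null_p = chi_square_2x2(10, 10, 10, 10)
```

The erfc identity avoids any dependency on a statistics library: a chi-square variable with one degree of freedom is the square of a standard normal variable, so its tail probability is a two-sided normal tail.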
376 On Enabling Miner Self-Rescue with In-Mine Robots using Real-Time Object Detection with Thermal Images
Authors: Cyrus Addy, Venkata Sriram Siddhardh Nadendla, Kwame Awuah-Offei
Abstract:
Surface robots in modern underground mine rescue operations suffer from several limitations that hinder prompt self-rescue. Designing and deploying in-mine robots to expedite miner self-rescue could therefore have a transformative impact on miner safety. Such in-mine robots can be envisioned carrying out diverse tasks such as object detection, autonomous navigation, and payload delivery. Specifically, this paper investigates the challenges in designing object detection algorithms for in-mine robots using thermal images, especially to detect people in real time. A total of 125 thermal images were collected in the Missouri S&T Experimental Mine with the help of student volunteers using a FLIR TG 297 infrared camera, and were pre-processed into training and validation datasets of 100 and 25 images, respectively. Three state-of-the-art, pre-trained real-time object detection models, namely YOLOv5, YOLO-FIRI, and YOLOv8, were considered and re-trained on the training dataset using transfer learning. On the validation dataset, the re-trained YOLOv8 outperforms the re-trained versions of both YOLOv5 and YOLO-FIRI.
Keywords: miner self-rescue, object detection, underground mine, YOLO
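The abstract does not say how the 125 images were assigned to the 100/25 split. A seeded random split, sketched below with hypothetical file names, is one reproducible option; the detection annotations (bounding boxes) that YOLO training also needs are not shown.

```python
import random


def split_dataset(items, n_train, seed=42):
    """Shuffle reproducibly, then take the first n_train items for training and the rest for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names for the 125 thermal images.
images = [f"thermal_{i:03d}.jpg" for i in range(125)]
train_set, val_set = split_dataset(images, n_train=100)
```

Fixing the seed keeps the split identical across re-training runs, so the three detectors are compared on the same validation images.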
375 Load Forecasting Using Neural Network Integrated with Economic Dispatch Problem
Authors: Mariyam Arif, Ye Liu, Israr Ul Haq, Ahsan Ashfaq
Abstract:
The high cost of fossil fuels and the growing installation of alternative energy generation sources are major challenges in power systems, making accurate load forecasting an important and challenging task for optimal energy planning and management on both the distribution and generation sides. There are many load forecasting techniques, but each has its own limitations and requires data to predict load accurately. An artificial neural network (ANN) is one such technique for forecasting load efficiently. Two different ranges of input data were compared using a dynamic ANN built with the MATLAB Neural Network Toolbox. The selection of input data for training the network was observed to have a significant effect on the forecasts: day-wise input data forecasted the load more accurately than year-wise input data. The forecasted load is then distributed among six generators using linear programming to obtain the optimal generation point. The algorithm is then verified by comparing each generator's output with its generation limits.
Keywords: artificial neural networks, demand-side management, economic dispatch, linear programming, power generation dispatch
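With linear generator costs, a single balance constraint (total output equals the forecasted load), and per-unit output limits, the dispatch LP has a simple optimal solution: run every unit at its minimum, then load the cheapest units first. The sketch below assumes that LP structure and six hypothetical generators; a formulation with more constraints (network limits, ramp rates) would need a general LP solver such as `scipy.optimize.linprog`.

```python
def economic_dispatch(demand, generators):
    """Merit-order dispatch for generators with linear costs and output limits.

    generators: list of (name, cost_per_mwh, p_min, p_max) tuples.
    Every unit first runs at p_min; the remaining demand is then assigned to the
    cheapest units up to their p_max. For this LP (linear costs, one balance
    constraint, box limits) the greedy order reaches the same optimum.
    """
    p_min_total = sum(g[2] for g in generators)
    p_max_total = sum(g[3] for g in generators)
    if not p_min_total <= demand <= p_max_total:
        raise ValueError("demand lies outside the feasible generation range")
    dispatch = {name: p_min for name, _, p_min, _ in generators}
    remaining = demand - p_min_total
    for name, _, p_min, p_max in sorted(generators, key=lambda g: g[1]):
        extra = min(p_max - p_min, remaining)
        dispatch[name] += extra
        remaining -= extra
    return dispatch


# Six hypothetical generators: (name, cost per MWh, min MW, max MW).
units = [("G1", 20, 10, 100), ("G2", 25, 10, 80), ("G3", 30, 5, 60),
         ("G4", 18, 20, 120), ("G5", 40, 0, 50), ("G6", 35, 5, 70)]
plan = economic_dispatch(300, units)  # 300 MW stands in for the ANN load forecast
```

The check that every unit stays within its limits, which the abstract uses to verify the algorithm, holds by construction here, because each unit is only ever raised from p_min toward p_max.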
374 Analysing Techniques for Fusing Multimodal Data in Predictive Scenarios Using Convolutional Neural Networks
Authors: Philipp Ruf, Massiwa Chabbi, Christoph Reich, Djaffar Ould-Abdeslam
Abstract:
In recent years, convolutional neural networks (CNNs) have demonstrated high performance in image analysis, but often only structured data are available for a specific problem. By interpreting structured data as images, CNNs can effectively learn from tabular data and extract valuable insights, leading to improved predictive accuracy and uncovering hidden patterns that may not be apparent in traditional structured data analysis. Applying a single neural network to multimodal data, e.g., both structured and unstructured information, can yield significant advantages in time complexity and energy efficiency. Converting structured data into images and merging them with existing visual material offers a promising way to apply CNNs to multimodal datasets, which often occur in a medical context. Using suitable preprocessing techniques, structured data are transformed into image representations in which the respective features are expressed as different formations of colors and shapes. In an additional step, these representations are fused with existing images to incorporate both types of information. The resulting image is then analyzed using a CNN.
Keywords: CNN, image processing, tabular data, mixed dataset, data transformation, multimodal fusion
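A minimal sketch of the tabular-to-image idea, assuming the simplest possible encoding: each feature is min-max scaled to a grayscale intensity and placed on a square grid, and fusion simply stacks that tile under an existing image. The paper's encoding (features as formations of colors and shapes) and its fusion step are richer; `row_to_image` and `fuse_below` are hypothetical helpers.

```python
import math


def row_to_image(row, mins, maxs):
    """Min-max scale one tabular record to 0-255 and lay it out on the smallest square grid."""
    pixels = []
    for x, lo, hi in zip(row, mins, maxs):
        v = 0.0 if hi == lo else (x - lo) / (hi - lo)
        pixels.append(round(255 * min(max(v, 0.0), 1.0)))
    side = math.ceil(math.sqrt(len(pixels)))
    pixels += [0] * (side * side - len(pixels))  # zero-pad unused cells
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


def fuse_below(image, tile):
    """Naive fusion: append the tile's rows under a grayscale image, zero-padding to its width."""
    width = len(image[0])
    if len(tile[0]) > width:
        raise ValueError("tile is wider than the image")
    return image + [row + [0] * (width - len(row)) for row in tile]


tile = row_to_image([0, 5, 10], mins=[0, 0, 0], maxs=[10, 10, 10])
fused = fuse_below([[1, 2, 3], [4, 5, 6]], tile)
```

The per-feature minima and maxima would normally be computed on the training set only, so that validation records are encoded on the same intensity scale without leaking information.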
373 A Targeted Maximum Likelihood Estimation for a Non-Binary Causal Variable: An Application
Authors: Mohamed Raouf Benmakrelouf, Joseph Rynkiewicz
Abstract:
Targeted maximum likelihood estimation (TMLE) is a well-established method for causal effect estimation with desirable statistical properties. TMLE is a doubly robust, maximum-likelihood-based approach that includes a secondary targeting step that optimizes the target statistical parameter. A causal interpretation of the statistical parameter requires the assumptions of the Rubin causal framework. The causal effect of a binary variable $E$ on an outcome $Y$ is defined as a comparison between two potential outcomes, $\mathbb{E}[Y_{E=1} - Y_{E=0}]$. Our aim in this paper is to present an adaptation of the TMLE methodology that estimates the causal effect of a non-binary categorical variable, and to illustrate it with a large application. We propose recoding the initial data to binarize the variable of interest. For each category, the non-binary variable of interest is transformed into a binary variable that takes the value 1 when the category (or group of categories) is present for an individual and 0 otherwise. Such a dummy variable yields a pair of potential outcomes and contrasts one category (or group of categories) with another. Let $E$ be a non-binary variable of interest. We propose a complete disjunctive coding of $E$: the initial variable is transformed into a set of binary vectors (dummy variables) $E = (E_e : e \in \{1, \dots, |E|\})$, where each $E_e$ takes the value 0 when its category is not present and 1 when it is present. This allows a pairwise TMLE to compare the difference in outcome between one category and all remaining categories. To illustrate our strategy, we first implement TMLE to estimate the causal effect of a non-binary variable on an outcome using simulated data.
Second, we apply our TMLE adaptation to survey data from the French Political Barometer (CEVIPOF) to estimate the causal effect of education level (a five-level variable) on a potential vote for the French extreme-right candidate Jean-Marie Le Pen. Counterfactual reasoning requires us to consider some causal questions (additional causal assumptions), leading to a different coding of $E$ as a set of binary vectors $E = (E_e : e \in \{2, \dots, |E|\})$, where each $E_e$ takes the value 0 when the first (reference) category is present and 1 when category $e$ is present. This allows a pairwise TMLE to compare the difference in outcome between the fixed first level and each remaining level. We confirmed that a higher level of education decreases the rate of voting for the extreme-right party.
Keywords: statistical inference, causal inference, super learning, targeted maximum likelihood estimation
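The two coding schemes can be illustrated on a toy categorical variable. The sketch below builds the complete disjunctive (one-vs-rest) indicators and the reference-category contrasts; each resulting binary treatment vector would then be passed to a TMLE implementation (for example, the R package tmle), which is not shown. The education values are hypothetical.

```python
def one_vs_rest(values):
    """Complete disjunctive (one-hot) coding: one 0/1 indicator per category."""
    return {c: [1 if v == c else 0 for v in values] for c in sorted(set(values))}


def reference_pairs(values, reference):
    """Reference coding: for each other category e, keep only individuals in
    {reference, e} and code them 1 if in e, 0 if in the reference category."""
    pairs = {}
    for e in sorted(set(values) - {reference}):
        idx = [i for i, v in enumerate(values) if v in (reference, e)]
        pairs[e] = (idx, [1 if values[i] == e else 0 for i in idx])
    return pairs


# Hypothetical education levels (1 = lowest) for six respondents.
education = [1, 3, 2, 1, 5, 3]
dummies = one_vs_rest(education)
contrasts = reference_pairs(education, reference=1)
```

The design difference is in who forms the comparison group: one-vs-rest pits a category against everyone else, whereas reference coding drops individuals in other categories so that each contrast compares exactly two levels, matching the "first level versus each remaining level" question in the application.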
372 An Empirical Study on Switching Activation Functions in Shallow and Deep Neural Networks
Authors: Apoorva Vinod, Archana Mathur, Snehanshu Saha
Abstract:
Although a plethora of activation functions (AFs) are used in neural networks (NNs) with single and multiple hidden layers, their behavior, whether used singly or in combination, has always raised curiosity. The popular AFs (Sigmoid, ReLU, and Tanh) have performed prominently well in shallow and deep architectures. Most of the time, AFs are used singly in multi-layered NNs, and, to the best of our knowledge, their performance has never been studied and analyzed in depth when they are used in combination. In this manuscript, we experiment with multi-layered NN architectures (both shallow and deep: a convolutional NN and VGG16) and investigate how well the network responds to two different AFs used alternately (Sigmoid-Tanh, Tanh-ReLU, ReLU-Sigmoid) compared with a traditional single-AF combination (Sigmoid-Sigmoid, Tanh-Tanh, ReLU-ReLU). Our results show that with two different AFs, the network achieves better accuracy, substantially lower loss, and faster convergence on 4 computer vision (CV) and 15 non-CV (NCV) datasets. With different AFs, accuracy was 6-7% higher and convergence was twice as fast. We present a case study investigating the probability that networks suffer vanishing and exploding gradients when using two different AFs. Additionally, we show theoretically that a composition of two or more AFs satisfies the Universal Approximation Theorem (UAT).
Keywords: activation function, universal approximation function, neural networks, convergence
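Alternating activation functions simply means that consecutive layers apply different nonlinearities. The sketch below shows a dense forward pass whose per-layer activation is taken cyclically from a schedule, so `["sigmoid", "tanh"]` gives the Sigmoid-Tanh alternation and `["relu"]` a single-AF baseline. The layer sizes, random weights, and input are hypothetical, and for simplicity the activation is applied to the output layer as well.

```python
import math
import random

ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh": math.tanh,
    "relu": lambda z: max(0.0, z),
}


def make_layer(n_in, n_out, rng):
    """Random dense layer: (weight matrix as rows, zero biases)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers, schedule):
    """Dense forward pass; layer k uses activation schedule[k % len(schedule)]."""
    h = x
    for k, (weights, biases) in enumerate(layers):
        act = ACTIVATIONS[schedule[k % len(schedule)]]
        h = [act(sum(w * v for w, v in zip(row, h)) + b) for row, b in zip(weights, biases)]
    return h


rng = random.Random(0)
layers = [make_layer(3, 4, rng), make_layer(4, 4, rng), make_layer(4, 2, rng)]
x = [0.5, -1.0, 2.0]
alternating = forward(x, layers, ["sigmoid", "tanh"])  # sigmoid, tanh, sigmoid
single = forward(x, layers, ["relu"])                  # relu in every layer
```

Because only the schedule changes, the same weights can be evaluated under both regimes, which is the kind of controlled comparison the study's single-versus-alternating experiments require.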
371 SAMRA: Dataset in Al-Soudani Arabic Maghrebi Script for Recognition of Arabic Ancient Words Handwritten
Authors: Sidi Ahmed Maouloud, Cheikh Ba
Abstract:
Much of West Africa’s cultural heritage is written in the Al-Soudani Arabic script, which was widely used in West Africa before European colonization. The Al-Soudani Arabic script is an African version of the Maghrebi script, in particular of the Al-Mebssout script. However, local African qualities were incorporated into the Al-Soudani script in a way that gave it a distinctly African diversity and character. Although several Arabic datasets exist in Oriental script that support the analysis, layout, and recognition of texts written in these calligraphies, many Arabic scripts and written traditions remain understudied. In this paper, we present a dataset of words from Al-Soudani calligraphic scripts. This dataset consists of 100 images selected from three different manuscripts written in Al-Soudani Arabic script by different copyists. The primary sources for this dataset were the libraries of Boston University and Cambridge University. This dataset highlights the unique characteristics of the Al-Soudani Arabic script as well as the new challenges it presents for automatic word recognition in Arabic manuscripts. An HTR system based on a hybrid ANN (CRNN-CTC) is also proposed to test this dataset. SAMRA is a dataset of annotated Arabic manuscript words in the Al-Soudani script that can help researchers automatically recognize and analyze manuscript words written in this script.
Keywords: dataset, CRNN-CTC, handwritten words recognition, Al-Soudani Arabic script, HTR, manuscripts
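In a CRNN-CTC recognizer, the CTC stage turns per-frame character probabilities into a word. A minimal best-path (greedy) CTC decoder is sketched below. It is the generic textbook decoder, not necessarily the one used with SAMRA, and the placeholder Latin alphabet stands in for the Arabic character classes.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      alphabet: Sequence[str], blank: int = 0) -> str:
    """Best-path CTC decoding.

    Take the most probable class in every frame, merge consecutive
    repeats, then drop blanks. A blank between two identical symbols
    keeps them as separate characters.
    """
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded: List[str] = []
    previous = None
    for idx in best_path:
        if idx != previous and idx != blank:
            decoded.append(alphabet[idx])
        previous = idx
    return "".join(decoded)
```

For example, a frame-wise best path of `a a <blank> a b b <blank>` decodes to `aab`: the first two `a` frames merge, and the blank separates them from the third.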
370 Predictive Analytics in Traffic Flow Management: Integrating Temporal Dynamics and Traffic Characteristics to Estimate Travel Time
Authors: Maria Ezziani, Rabie Zine, Amine Amar, Ilhame Kissani
Abstract:
This paper introduces a predictive model for urban transportation engineering, which is vital for efficient traffic management. Utilizing comprehensive datasets and advanced statistical techniques, the model forecasts travel times accurately by considering temporal variations and traffic dynamics. Machine learning algorithms, including regression trees and neural networks, are employed to capture sequential dependencies. Results indicate that incorporating traffic flow and speed variables significantly improves predictive accuracy, particularly during peak hours and holidays. Future enhancements may integrate weather conditions and traffic incidents. The model's applications range from adaptive traffic management systems to route optimization algorithms, facilitating congestion reduction and enhancing journey reliability. Overall, this research extends beyond travel time estimation, offering insights into broader transportation planning and policy-making and empowering stakeholders to optimize infrastructure utilization and improve network efficiency.
Keywords: predictive analytics, traffic flow, travel time estimation, urban transportation, machine learning, traffic management
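Regression trees, one of the model families named above, predict by splitting the data on feature thresholds. The sketch below fits a single split (a depth-1 regression tree, or stump) that predicts travel time from traffic flow by minimizing squared error. The choice of feature and the toy numbers are illustrative assumptions, not the authors' model or data.

```python
from typing import Sequence, Tuple


def fit_stump(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (threshold, left_mean, right_mean) minimizing the sum of squared errors."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]


def predict(stump: Tuple[float, float, float], value: float) -> float:
    """Mean travel time of the side of the split that `value` falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if value <= threshold else right_mean
```

A full regression tree applies the same split search recursively to each side; adding time-of-day or holiday indicators as extra candidate features is how the temporal effects mentioned above would enter.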
369 Constructing a Physics Guided Machine Learning Neural Network to Predict Tonal Noise Emitted by a Propeller
Authors: Arthur D. Wiedemann, Christopher Fuller, Kyle A. Pascioni
Abstract:
With the introduction of electric motors, designers of small unmanned aerial vehicles have to consider trade-offs between acoustic noise and generated thrust. Currently, few computationally inexpensive tools are available for predicting the acoustic noise a propeller emits into the far field. Artificial neural networks offer a highly non-linear and adaptive model for predicting isolated and interactive tonal noise. However, neural networks require large datasets, larger than is practical when modeling experimental results. A methodology known as physics guided machine learning has been applied in this study to reduce the size of the dataset required to train the network. After several neural networks are built and evaluated, the best model is investigated to determine how it successfully predicts the acoustic waveform. Lastly, a post-network transfer function is developed to remove discontinuities from the predicted waveform. Overall, physics guided machine learning methodologies show a notable improvement in prediction performance, but additional loss functions are necessary for constructing predictive networks on small datasets.
Keywords: aeroacoustics, machine learning, propeller, rotor, neural network, physics guided machine learning
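The abstract does not say which physical relations guide the network. One possible illustration: steady propeller tonal noise is periodic at the blade-passing frequency (BPF = number of blades × shaft rotation rate in Hz). A physics-guided loss could therefore add a penalty for predicted waveforms that do not repeat every blade-passing period. The sketch below is a minimal version of that idea. The function names and the weight `lam` are assumptions, not the study's loss.

```python
from typing import Sequence


def blade_passing_frequency(n_blades: int, rpm: float) -> float:
    """Fundamental tonal frequency in Hz."""
    return n_blades * rpm / 60.0


def periodicity_penalty(pred: Sequence[float], sample_rate: float, bpf: float) -> float:
    """Mean squared difference between the waveform and itself shifted by one BPF period."""
    shift = round(sample_rate / bpf)
    if shift < 1 or shift >= len(pred):
        raise ValueError("waveform must span more than one blade-passing period")
    diffs = [pred[i] - pred[i + shift] for i in range(len(pred) - shift)]
    return sum(d * d for d in diffs) / len(diffs)


def physics_guided_loss(pred: Sequence[float], target: Sequence[float],
                        sample_rate: float, bpf: float, lam: float = 0.1) -> float:
    """Data term (MSE against measurements) plus a weighted physics term."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    return mse + lam * periodicity_penalty(pred, sample_rate, bpf)
```

A two-bladed propeller at 6000 rpm has a BPF of 200 Hz, so at 8 kHz sampling the penalty compares samples 40 apart. Because the physics term needs no measured target, it still constrains the network where experimental data are scarce.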
368 Ontology Mapping with R-GNN for IT Infrastructure: Enhancing Ontology Construction and Knowledge Graph Expansion
Authors: Andrey Khalov
Abstract:
The rapid growth of unstructured data necessitates advanced methods for transforming raw information into structured knowledge, particularly in domain-specific contexts such as IT service management and outsourcing. This paper presents a methodology for automatically constructing domain ontologies using the DOLCE framework as the base ontology. The research focuses on expanding ITIL-based ontologies by integrating concepts from ITSMO, followed by the extraction of entities and relationships from domain-specific texts through transformers and statistical methods like formal concept analysis (FCA). In particular, this work introduces an R-GNN-based approach for ontology mapping, enabling more efficient entity extraction and ontology alignment with existing knowledge bases. Additionally, the research explores transfer learning techniques using pre-trained transformer models (e.g., DeBERTa-v3-large) fine-tuned on synthetic datasets generated via large language models such as LLaMA. The resulting ontology, termed IT Ontology (ITO), is evaluated against existing methodologies, highlighting significant improvements in precision and recall. This study advances the field of ontology engineering by automating the extraction, expansion, and refinement of ontologies tailored to the IT domain, thus bridging the gap between unstructured data and actionable knowledge.
Keywords: ontology mapping, knowledge graphs, R-GNN, ITIL, NER
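Formal concept analysis (FCA), one of the statistical methods mentioned above, derives concepts (pairs of an object set and the attribute set those objects share) from a binary object-attribute context. The naive closure-based enumeration below is a textbook sketch, not the authors' pipeline. The IT-flavored toy context used to exercise it is hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Context = Dict[str, FrozenSet[str]]  # object -> its attributes


def concept_intents(context: Context) -> List[FrozenSet[str]]:
    """All concept intents: every intersection of object intents, plus the full attribute set."""
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        # Intersect this object's attributes with every intent found so far.
        intents |= {attrs & existing for existing in intents}
    return sorted(intents, key=lambda s: (len(s), sorted(s)))


def extent(context: Context, intent: FrozenSet[str]) -> FrozenSet[str]:
    """Objects that have every attribute in the intent."""
    return frozenset(obj for obj, attrs in context.items() if intent <= attrs)


def formal_concepts(context: Context) -> List[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """(extent, intent) pairs of the concept lattice."""
    return [(extent(context, i), i) for i in concept_intents(context)]
```

For instance, with objects `server`, `laptop`, and `license`, the attribute `asset` shared by all three yields the concept ({server, laptop, license}, {asset}). Concepts like this are candidate classes when expanding an ontology.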
367 CompPSA: A Component-Based Pairwise RNA Secondary Structure Alignment Algorithm
Authors: Ghada Badr, Arwa Alturki
Abstract:
The biological function of an RNA molecule depends on its structure. The objective of alignment is to find the homology between two or more RNA secondary structures. Knowing the common functionalities between two RNA structures allows a better understanding of them and the discovery of other relationships between them. In addition, identifying non-coding RNAs (RNAs that are not translated into proteins) is a popular application in which RNA structural alignment is the first step. A few methods for RNA structure-to-structure alignment have been developed. Most of these methods perform partial structure-to-structure, sequence-to-structure, or structure-to-sequence alignment. Less attention has been given in the literature to efficient RNA structure representations, and structure-to-structure alignment methods are lacking. In this paper, we introduce an $O(N^2)$ Component-based Pairwise RNA Structure Alignment (CompPSA) algorithm, where structures are given in a component-based representation and $N$ is the maximum number of components in the two structures. The proposed algorithm compares the two RNA secondary structures based on their weighted component features rather than on their base-pair details. Extensive experiments on different real and simulated datasets illustrate the efficiency of the CompPSA algorithm compared with other approaches. The CompPSA algorithm provides an accurate similarity measure between components. The algorithm gives the user the flexibility to align the two RNA structures based on their weighted features (position, full length, and/or stem length). Moreover, the algorithm demonstrates scalability and efficiency in time and memory performance.
Keywords: alignment, RNA secondary structure, pairwise, component-based, data mining
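Aligning components rather than base pairs can be sketched as a standard global (Needleman-Wunsch-style) dynamic program. Its table has one cell per pair of components, which gives $O(N^2)$ time for $N$ components. The component fields mirror the weighted features named above (position, full length, stem length). The scoring function, weights, and gap penalty are hypothetical and not CompPSA's.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Component:
    position: int      # start index of the component in the sequence
    length: int        # full length of the component
    stem_length: int   # length of its stem


def similarity(a: Component, b: Component, w_pos: float = 0.1,
               w_len: float = 1.0, w_stem: float = 1.0) -> float:
    """Zero for identical components, increasingly negative as weighted features differ."""
    return -(w_pos * abs(a.position - b.position)
             + w_len * abs(a.length - b.length)
             + w_stem * abs(a.stem_length - b.stem_length))


def align_score(s: Sequence[Component], t: Sequence[Component], gap: float = 5.0) -> float:
    """Best global alignment score; dp[i][j] aligns s[:i] with t[:j]."""
    n, m = len(s), len(t)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = -gap * i
    for j in range(1, m + 1):
        dp[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + similarity(s[i - 1], t[j - 1]),
                           dp[i - 1][j] - gap,   # component of s left unmatched
                           dp[i][j - 1] - gap)   # component of t left unmatched
    return dp[n][m]
```

Changing `w_pos`, `w_len`, or `w_stem` is one way to let the user choose which features drive the alignment, in the spirit of the flexibility described above.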
366 Online Pose Estimation and Tracking Approach with Siamese Region Proposal Network
Authors: Cheng Fang, Lingwei Quan, Cunyue Lu
Abstract:
Human pose estimation and tracking aim to accurately identify and locate the positions of human joints in video. This computer vision task is of great significance for human motion recognition, behavior understanding, and scene analysis. There has been remarkable progress in human pose estimation in recent years. However, more research is needed on human pose tracking, especially online tracking. In this paper, a framework called PoseSRPN is proposed for online single-person pose estimation and tracking. We attach a pose estimation branch to a Siamese network to incorporate Single-person Pose Tracking (SPT) and Visual Object Tracking (VOT) into one framework. The pose estimation branch has a simple network structure that replaces the complex upsampling and convolution network structure with deconvolution. By augmenting the loss of the fully convolutional Siamese network with the pose estimation task, pose estimation and tracking can be trained in one stage. Once trained, PoseSRPN relies only on a single bounding-box initialization to produce human joint locations. The experimental results show that, while maintaining good pose estimation accuracy on the COCO and PoseTrack datasets, the proposed method achieves a speed of 59 frames/s, which is superior to other pose tracking frameworks.
Keywords: computer vision, pose estimation, pose tracking, Siamese network
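Siamese trackers find the target by sliding the template (exemplar) features over the search-region features and taking the position with the strongest cross-correlation response. The sketch below does this on plain 2D lists. These stand in for learned feature maps, and PoseSRPN's region proposal and pose branches are not reproduced.

```python
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]


def cross_correlate(search: Grid, template: Grid) -> List[List[float]]:
    """Valid-mode 2D cross-correlation: one response per template placement."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    return [[sum(search[r + i][c + j] * template[i][j]
                 for i in range(th) for j in range(tw))
             for c in range(sw - tw + 1)]
            for r in range(sh - th + 1)]


def locate(search: Grid, template: Grid) -> Tuple[int, int]:
    """(row, col) of the top-left corner with the strongest response."""
    response = cross_correlate(search, template)
    cells = ((r, c) for r in range(len(response)) for c in range(len(response[0])))
    return max(cells, key=lambda rc: response[rc[0]][rc[1]])
```

In a real tracker, the template features are computed once from the initial bounding box and reused every frame. That reuse is what makes single-box initialization and online speed possible.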
365 A Graph-Based Retrieval Model for Passage Search
Authors: Junjie Zhong, Kai Hong, Lei Wang
Abstract:
Passage Retrieval (PR) plays an important role in many Natural Language Processing (NLP) tasks. Traditional efficient retrieval models that rely on exact term matching, such as TF-IDF or BM25, have now been surpassed by pre-trained language models that match on semantics. Although they are more effective, deep language models often incur large memory and time costs. To tackle the trade-off between efficiency and effectiveness in PR, this paper proposes Graph Passage Retriever (GraphPR), a graph-based model inspired by the development of graph learning techniques. Unlike existing works, GraphPR is end-to-end and integrates both term-matching information and semantics. GraphPR constructs a passage-level graph from BM25 retrieval results and trains a GCN-like model on the graph with graph-based objectives. Passages are regarded as nodes in the constructed graph and are embedded as dense vectors. PR can then be implemented using the embeddings and a fast vector-similarity search. Experiments on a variety of real-world retrieval datasets show that the proposed model outperforms related models on several evaluation metrics (e.g., mean reciprocal rank, accuracy, F1-score) while maintaining relatively low query latency and memory usage.
Keywords: efficiency, effectiveness, graph learning, language model, passage retrieval, term-matching model
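GraphPR builds its passage graph from BM25 retrieval results. For reference, a compact BM25 scorer is sketched below. It uses the commonly used defaults k1 = 1.2 and b = 0.75 and the smoothed IDF log(1 + (N - n + 0.5) / (n + 0.5)). Whitespace tokenization and these parameter values are assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(query: str, passages: Sequence[str],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """BM25 score of every passage for the query (higher is more relevant)."""
    docs = [p.lower().split() for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            n = doc_freq[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

The top-scoring passages for each query are the natural input to a passage-level graph. How GraphPR connects them is not described in the abstract and is not shown here.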
364 The Relationship between Self-Injurious Behavior and Manner of Death
Authors: Sait Ozsoy, Hacer Yasar Teke, Mustafa Dalgic, Cetin Ketenci, Ertugrul Gok, Kenan Karbeyaz, Azem Irez, Mesut Akyol
Abstract:
Self-mutilating behavior, or self-injury behavior (SIB), is defined as “intentional harm to one’s body without intent to commit suicide”. SIB cases are commonly seen in psychiatry and forensic medicine practice. Despite the variety of SIB methods, skin cutting is the most common injury (70-97%) in this group of patients. Subjects with SIB have one or more comorbidities, including depression, anxiety, depersonalization, feelings of worthlessness, borderline personality disorder, antisocial behaviors, and histrionic personality. These individuals feel a high level of hostility towards themselves and their surroundings. Research has also revealed a strong relationship between antisocial personality disorder, criminal behavior, and SIB. This study retrospectively evaluated 6,599 autopsies performed in 2013 at the forensic medicine institutes of six major cities in Turkey (Ankara, Izmir, Diyarbakir, Erzurum, Trabzon, Eskisehir). The study group consisted of all cases with SIB findings (psychopathic cuts, cigarette burns, scars, etc.), and the control group consisted of subjects without signs of SIB. The relationship between causes of death in the study group (SIB subjects) and the control group was investigated. The Mann-Whitney U test was used for age and the chi-square test for categorical variables. Multinomial logistic regression was used to analyze group differences with respect to manner of death (natural, accident, homicide, suicide), and the risk factors associated with each group were determined by binomial logistic regression. All statistical analyses were performed with SPSS Statistics 15.0, and statistical significance was set at p < 0.05. There was no significant difference between accidental and natural death between the groups (p = 0.737).
In addition, in the psychopathic-cut group, each one-unit increase in the number of cuts decreased the odds of accidental death by a factor of 0.967 (95% CI: 0.941-0.993; p = 0.015). In contrast, there was a significant difference between suicidal and natural death (p < 0.001) and between homicidal and natural death (p = 0.025). SIB is often seen with borderline and antisocial personality disorders but may be associated with many psychiatric illnesses. Studies have shown relationships between antisocial personality disorder and criminal behavior, and between SIB and suicidal behavior. In our study, the rates of suicide, homicide, and intoxication were higher in the SIB group than in the control group. We conclude that SIB may be used as a predictor of the likelihood that a person will harm themselves or others.
Keywords: autopsy, cause of death, forensic science, self-injury behaviour
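Two of the quantities above can be made concrete. The first function computes the Pearson chi-square statistic for a contingency table (without Yates' continuity correction). For 2×2 tables (one degree of freedom), it also gives the p-value via the identity p = erfc(√(χ²/2)). The second function shows how a per-cut factor of 0.967 compounds over several cuts. This assumes the reported value is a logistic-regression odds ratio, Exp(B), with the log-linear effect such a model imposes. The example table is made up, not the study's data.

```python
import math
from typing import Sequence, Tuple


def chi_square(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_1dof(stat: float) -> float:
    """Upper-tail p-value of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def compounded_odds_ratio(per_unit_or: float, units: int) -> float:
    """Odds multiplier for a `units`-unit increase under a log-linear model."""
    return per_unit_or ** units
```

Under that assumption, ten additional cuts would multiply the odds of accidental death by about 0.967^10 ≈ 0.715.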
363 The Impact of Coronal STIR Imaging in Routine Lumbar MRI: Uncovering Hidden Causes to Enhanced Diagnostic Yield of Back Pain and Sciatica
Authors: Maysoon Nasser Samhan, Somaya Alkiswani, Abdullah Alzibdeh
Abstract:
Background: Routine lumbar MRIs for back pain may yield normal results despite persistent symptoms, which suggests other causes of the pain that are not shown on the routine images. Research suggests including coronal STIR imaging to detect additional pathologies such as sacroiliitis. Objectives: This study aims to enhance diagnostic accuracy and aid in determining treatment for patients with persistent back pain who have a normal routine lumbar MRI (T1 and T2 images) by incorporating coronal STIR into the examination. Methods: In this prospective study of 274 patients (115 males and 159 females; age range 6–92 years), medical records and imaging data were reviewed following lumbar spine MRI. The study included patients with back pain and sciatica as their primary complaints, all of whom underwent lumbar spine MRI at our hospital to identify potential pathologies. Using a GE Signa HD 1.5T MRI system, each patient received a standard MRI protocol that included T1 and T2 sagittal and axial sequences as well as a coronal STIR sequence. We collected relevant MRI findings, including abnormalities and structural variations, from radiology reports. We classified these findings into tables, documented them as counts and percentages, and used Fisher’s exact test to assess associations between categorical variables. Statistical analysis was performed with GraphPad Prism software version 10.1.2. The study adhered to ethical guidelines, institutional review board approvals, and patient confidentiality regulations. Results: When the coronal STIR sequence was excluded, 83 subjects (30.29%) were classified as within normal limits on MRI examination. Thirty-six patients without abnormalities on the T1 and T2 sequences showed abnormalities on the coronal STIR sequence: 26 cases were attributed to spinal pathologies and 10 to non-spinal pathologies.
In addition, Fisher's exact test demonstrated a significant association between sacroiliitis diagnosis and abnormalities identified solely through the coronal STIR sequence (P < 0.0001). Conclusion: Implementing coronal STIR imaging as part of routine lumbar MRI protocols has the potential to improve patient care by facilitating a more comprehensive evaluation and management of persistent back pain.
Keywords: magnetic resonance imaging, lumbar MRI, radiology, neurology
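Fisher's exact test, used above for the categorical comparisons, evaluates a 2×2 table by conditioning on its margins. Under that conditioning, the top-left count follows a hypergeometric distribution. A minimal two-sided version is sketched below. It sums the probabilities of all tables no more likely than the observed one, a common convention. It is not GraphPad Prism's implementation, and the check uses the textbook tea-tasting table rather than study data.

```python
from math import comb
from typing import Sequence


def fisher_exact_two_sided(table: Sequence[Sequence[int]]) -> float:
    """Two-sided Fisher's exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # P(top-left cell = x) with all margins fixed (hypergeometric).
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Relative tolerance so tables tied with the observed one are not lost to rounding.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-7))
```

For the table [[3, 1], [1, 3]], the possible top-left counts 0 to 4 have probabilities 1/70, 16/70, 36/70, 16/70, and 1/70. The two-sided p-value is therefore 34/70 ≈ 0.486.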
362 An Optimized Association Rule Mining Algorithm
Authors: Archana Singh, Jyoti Agarwal, Ajay Rana
Abstract:
Data mining is an efficient technology for discovering patterns in large databases. Association Rule Mining techniques are used to find correlations between the various itemsets in a database, and these correlations are used in decision making and pattern analysis. In recent years, the problem of finding association rules in large datasets has been addressed by many researchers. Various research papers on association rule mining (ARM) were first studied and analyzed to understand the existing algorithms. The Apriori algorithm is the basic ARM algorithm, but it requires many database scans. The DIC (Dynamic Itemset Counting) algorithm needs fewer database scans but uses a complex lattice data structure. The main focus of this paper is to propose a new optimized algorithm (Friendly Algorithm) and compare its performance with the existing algorithms. A dataset is used to find frequent itemsets and association rules with both the existing algorithms and the proposed Friendly Algorithm, and we observed that the proposed algorithm finds all the frequent itemsets and essential association rules with fewer database scans than the existing algorithms. The proposed algorithm uses optimized data structures, namely a graph and an adjacency matrix.
Keywords: association rules, data mining, dynamic item set counting, FP-growth, friendly algorithm, graph
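For reference, the baseline Apriori algorithm mentioned above can be sketched as follows. It makes one pass over the transactions for each itemset size, which is the repeated scanning that DIC and the proposed Friendly Algorithm aim to reduce. This is the textbook algorithm, not the Friendly Algorithm, whose graph and adjacency-matrix structures the abstract does not detail. Here `min_support` is an absolute count.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: Iterable[Iterable[str]], min_support: int) -> Dict[Itemset, int]:
    """Return every frequent itemset with its support count."""
    db = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in db for item in t}
    frequent: Dict[Itemset, int] = {}
    k = 1
    while candidates:
        counts = dict.fromkeys(candidates, 0)
        for t in db:  # a single pass over the database for this itemset size
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step plus prune step: keep size-k unions whose (k-1)-subsets are all frequent.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == k
                      and all(frozenset(s) in level for s in combinations(a | b, k - 1))}
    return frequent


def rules(frequent: Dict[Itemset, int], min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Rules X -> Y with confidence support(X u Y) / support(X) >= min_conf."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = support / frequent[lhs]  # every subset of a frequent set is frequent
                if conf >= min_conf:
                    found.append((lhs, itemset - lhs, conf))
    return found
```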
361 Preferred Left-Handed Conformation of Glycyls at Pathogenic Sites
Authors: Purva Mishra, Rajesh Potlia, Kuljeet Singh Sandhu
Abstract:
The role of glycyl residues in protein structure has been an open question in the research community for several decades. The glycyl residue is the only achiral amino acid residue, owing to its lack of a side chain, and can therefore adopt Ramachandran conformations that are disallowed for L-amino acids. The structural and functional significance of glycyl residues with L-disallowed conformations, however, remains obscure. Through statistical analysis of various datasets, we found that glycyls with L-disallowed conformations are over-represented at disease-associated sites and tend to be evolutionarily conserved. Mutations of L-disallowed glycyls tend to destabilize the native conformation, reduce protein solubility, and promote intermolecular aggregation. We uncovered a structural motif, referred to as the “β-crescent”, formed around the L-disallowed glycyl, which prevents β-sheet aggregation by disrupting the alternating pattern of β-pleats. The L-disallowed conformation of glycyls also has predictive power for inferring pathogenic missense variants. Altogether, our observations highlight that the L-disallowed conformation of glycyls is selected to facilitate native folding and prevent intermolecular aggregation. The findings may also have implications for designing more stable proteins and for prioritizing the genetic lesions implicated in diseases.
Keywords: Ramachandran plot, β-sheet, protein stability, protein aggregation
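Placing a glycyl on the Ramachandran plot starts from its backbone dihedral angles. φ is the dihedral C(i-1), N(i), Cα(i), C(i), and ψ is N(i), Cα(i), C(i), N(i+1). The sketch below computes a signed dihedral from four atom coordinates. Deciding which (φ, ψ) regions count as L-disallowed depends on the Ramachandran region definitions used by the authors, which the abstract does not give, so that step is omitted.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Sequence[float], p1: Sequence[float],
             p2: Sequence[float], p3: Sequence[float]) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    proj0, proj2 = _dot(b0, b1), _dot(b2, b1)
    v = _sub(b0, tuple(proj0 * c for c in b1))
    w = _sub(b2, tuple(proj2 * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Calling `dihedral(C_prev, N, CA, C)` gives φ and `dihedral(N, CA, C, N_next)` gives ψ for residue i. Here `C_prev`, `N`, `CA`, `C`, and `N_next` are hypothetical coordinate tuples read from a structure file.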
360 CTHTC: A Convolution-Backed Transformer Architecture for Temporal Knowledge Graph Embedding with Periodicity Recognition
Authors: Xinyuan Chen, Mohd Nizam Husen, Zhongmei Zhou, Gongde Guo, Wei Gao
Abstract:
Temporal Knowledge Graph Completion (TKGC) has attracted increasing attention for its enormous value. However, existing models lack the capability to capture local interactions and global dependencies simultaneously along with evolutionary dynamics, and the latest advances in convolutions and Transformers have not been employed in this area. Moreover, periodic patterns in TKGs have not been fully explored. To this end, a multi-stage hybrid architecture with convolution-backed Transformers is introduced to TKGC tasks for the first time, combined with a Hawkes process to model evolving event sequences in a continuous-time domain. In addition, seasonal-trend decomposition is adopted to identify periodic patterns. Experiments on six public datasets are conducted to verify model effectiveness against state-of-the-art (SOTA) methods. An extensive ablation study is also carried out to evaluate architecture variants and the contributions of individual components, paving the way for further potential exploitation. Besides a complexity analysis, input sensitivity and safety challenges are thoroughly discussed, together with novel methods.
Keywords: temporal knowledge graph completion, convolution, transformer, Hawkes process, periodicity
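A Hawkes process models how past events raise the rate of future ones. With the common exponential kernel, the conditional intensity is $\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta (t - t_i)}$. The sketch below evaluates that intensity for a single event sequence. The exponential kernel and the parameter names are assumptions for illustration, and CTHTC's learned parameterization is not reproduced.

```python
import math
from typing import Sequence


def hawkes_intensity(t: float, event_times: Sequence[float],
                     mu: float, alpha: float, beta: float) -> float:
    """Conditional intensity at time t given strictly earlier events.

    mu is the base rate; each past event adds alpha, decaying at rate beta.
    """
    excitation = sum(alpha * math.exp(-beta * (t - ti))
                     for ti in event_times if ti < t)
    return mu + excitation
```

Each past event adds α to the intensity at the moment it occurs, and that contribution then decays at rate β. With β > 0, the process is stationary when α/β < 1.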
359 Linguistic Features for Sentence Difficulty Prediction in Aspect-Based Sentiment Analysis
Authors: Adrian-Gabriel Chifu, Sebastien Fournier
Abstract:
One of the challenges of natural language understanding is dealing with the subjectivity of sentences, which may express opinions and emotions that add layers of complexity and nuance. Sentiment analysis is a field that aims to extract and analyze these subjective elements from text, and it can be applied at different levels of granularity, such as document, paragraph, sentence, or aspect. Aspect-based sentiment analysis is a well-studied topic with many available datasets and models. However, there is no clear definition of what makes a sentence difficult for aspect-based sentiment analysis. In this paper, we explore this question by conducting an experiment with three datasets, “Laptops”, “Restaurants”, and “MTSC” (Multi-Target-dependent Sentiment Classification), and a merged version of these three datasets. We study the impact of domain diversity and syntactic diversity on difficulty. We use a combination of classifiers to identify the most difficult sentences and analyze their characteristics. We employ two ways of defining sentence difficulty. The first is binary and labels a sentence as difficult if the classifiers fail to correctly predict the sentiment polarity. The second is a six-level scale based on how many of the top five best-performing classifiers correctly predict the sentiment polarity. We also define nine linguistic features that, combined, aim to estimate difficulty at the sentence level.
Keywords: sentiment analysis, difficulty, classification, machine learning
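The two difficulty definitions can be made concrete as follows. The six-level score is the number of the five best classifiers that predict the gold polarity, from 0 to 5. For the binary label, the abstract does not say whether one failure or all failures make a sentence difficult. This sketch assumes a sentence is difficult only when none of the classifiers is correct. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def difficulty(predictions: Sequence[Sequence[str]],
               gold: Sequence[str]) -> List[Tuple[int, bool]]:
    """For each sentence: (number of correct classifiers, binary 'difficult' flag).

    predictions[k][i] is classifier k's polarity for sentence i.
    """
    if len(predictions) != 5:
        raise ValueError("expected predictions from the top five classifiers")
    result = []
    for i, label in enumerate(gold):
        n_correct = sum(1 for clf in predictions if clf[i] == label)
        result.append((n_correct, n_correct == 0))  # assumed: difficult only if all fail
    return result
```

Switching to the stricter reading ("difficult if any classifier fails") only requires changing the flag to `n_correct < 5`.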