Search results for: dataset development
17108 ARABEX: Automated Dotted Arabic Expiration Date Extraction using Optimized Convolutional Autoencoder and Custom Convolutional Recurrent Neural Network
Authors: Hozaifa Zaki, Ghada Soliman
Abstract:
In this paper, we introduced an approach for Automated Dotted Arabic Expiration Date Extraction using Optimized Convolutional Autoencoder (ARABEX) with bidirectional LSTM. This approach is used for translating the Arabic dot-matrix expiration dates into their corresponding filled-in dates. A custom lightweight Convolutional Recurrent Neural Network (CRNN) model is then employed to extract the expiration dates. Due to the lack of available dataset images for the Arabic dot-matrix expiration date, we generated synthetic images by creating an Arabic dot-matrix True Type Font (TTF) matrix to address this limitation. Our model was trained on a realistic synthetic dataset of 3287 images, covering the period from 2019 to 2027, represented in the format of yyyy/mm/dd. We then trained our custom CRNN model using the generated synthetic images to assess the performance of our model (ARABEX) by extracting expiration dates from the translated images. Our proposed approach achieved an accuracy of 99.4% on the test dataset of 658 images, while also achieving a Structural Similarity Index (SSIM) of 0.46 for image translation on our dataset. The ARABEX approach demonstrates its ability to be applied to various downstream learning tasks, including image translation and reconstruction. Moreover, this pipeline (ARABEX+CRNN) can be seamlessly integrated into automated sorting systems to extract expiry dates and sort products accordingly during the manufacturing stage. By eliminating the need for manual entry of expiration dates, which can be time-consuming and inefficient for merchants, our approach offers significant results in terms of efficiency and accuracy for Arabic dot-matrix expiration date recognition.Keywords: computer vision, deep learning, image processing, character recognition
Procedia PDF Downloads 8217107 Empirical Roughness Progression Models of Heavy Duty Rural Pavements
Authors: Nahla H. Alaswadko, Rayya A. Hassan, Bayar N. Mohammed
Abstract:
Empirical deterministic models have been developed to predict roughness progression of heavy duty spray sealed pavements for a dataset representing rural arterial roads. The dataset provides a good representation of the relevant network and covers a wide range of operating and environmental conditions. A sample with a large size of historical time series data for many pavement sections has been collected and prepared for use in multilevel regression analysis. The modelling parameters include road roughness as performance parameter and traffic loading, time, initial pavement strength, reactivity level of subgrade soil, climate condition, and condition of drainage system as predictor parameters. The purpose of this paper is to report the approaches adopted for models development and validation. The study presents multilevel models that can account for the correlation among time series data of the same section and to capture the effect of unobserved variables. Study results show that the models fit the data very well. The contribution and significance of relevant influencing factors in predicting roughness progression are presented and explained. The paper concludes that the analysis approach used for developing the models confirmed their accuracy and reliability by well-fitting to the validation data.Keywords: roughness progression, empirical model, pavement performance, heavy duty pavement
Procedia PDF Downloads 16817106 Features Reduction Using Bat Algorithm for Identification and Recognition of Parkinson Disease
Authors: P. Shrivastava, A. Shukla, K. Verma, S. Rungta
Abstract:
Parkinson's disease is a chronic neurological disorder that directly affects human gait. It leads to slowness of movement, causes muscle rigidity and tremors. Gait serve as a primary outcome measure for studies aiming at early recognition of disease. Using gait techniques, this paper implements efficient binary bat algorithm for an early detection of Parkinson's disease by selecting optimal features required for classification of affected patients from others. The data of 166 people, both fit and affected is collected and optimal feature selection is done using PSO and Bat algorithm. The reduced dataset is then classified using neural network. The experiments indicate that binary bat algorithm outperforms traditional PSO and genetic algorithm and gives a fairly good recognition rate even with the reduced dataset.Keywords: parkinson, gait, feature selection, bat algorithm
Procedia PDF Downloads 54517105 A Transformer-Based Question Answering Framework for Software Contract Risk Assessment
Authors: Qisheng Hu, Jianglei Han, Yue Yang, My Hoa Ha
Abstract:
When a company is considering purchasing software for commercial use, contract risk assessment is critical to identify risks to mitigate the potential adverse business impact, e.g., security, financial and regulatory risks. Contract risk assessment requires reviewers with specialized knowledge and time to evaluate the legal documents manually. Specifically, validating contracts for a software vendor requires the following steps: manual screening, interpreting legal documents, and extracting risk-prone segments. To automate the process, we proposed a framework to assist legal contract document risk identification, leveraging pre-trained deep learning models and natural language processing techniques. Given a set of pre-defined risk evaluation problems, our framework utilizes the pre-trained transformer-based models for question-answering to identify risk-prone sections in a contract. Furthermore, the question-answering model encodes the concatenated question-contract text and predicts the start and end position for clause extraction. Due to the limited labelled dataset for training, we leveraged transfer learning by fine-tuning the models with the CUAD dataset to enhance the model. On a dataset comprising 287 contract documents and 2000 labelled samples, our best model achieved an F1 score of 0.687.Keywords: contract risk assessment, NLP, transfer learning, question answering
Procedia PDF Downloads 12917104 Problems in Computational Phylogenetics: The Germano-Italo-Celtic Clade
Authors: Laura Mclean
Abstract:
A recurring point of interest in computational phylogenetic analysis of Indo-European family trees is the inference of a Germano-Italo-Celtic clade in some versions of the trees produced. The presence of this clade in the models is intriguing as there is little evidence for innovations shared among Germanic, Italic, and Celtic, the evidence generally used in the traditional method to construct a subgroup. One source of this unexpected outcome could be the input to the models. The datasets in the various models used so far, for the most part, take as their basis the Swadesh list, a list compiled by Morris Swadesh and then revised several times, containing up to 207 words that he believed were resistant to change among languages. The judgments made by Swadesh for this list, however, were subjective and based on his intuition rather than rigorous analysis. Some scholars used the Swadesh 200 list as the basis for their Indo-European dataset and made cognacy judgements for each of the words on the list. Another dataset is largely based on the Swadesh 207 list as well although the authors include additional lexical and non-lexical data, and they implement ‘split coding’ to deal with cases of polymorphic characters. A different team of scholars uses a different dataset, IECoR, which combines several different lists, one of which is the Swadesh 200 list. In fact, the Swadesh list is used in some form in every study surveyed and each dataset has three words that, when they are coded as cognates, seemingly contribute to the inference of a Germano-Italo-Celtic clade which could happen due to these clades sharing three words among only themselves. These three words are ‘fish’, ‘flower’, and ‘man’ (in the case of ‘man’, one dataset includes Lithuanian in the cognacy coding and removes the word ‘man’ from the screened data). This collection of cognates shared among Germanic, Italic, and Celtic that were deemed important enough to be included on the Swadesh list, without the ability to account for possible reasons for shared cognates that are not shared innovations, gives an impression of affinity between the Germanic, Celtic, and Italic branches without adequate methodological support. However, by changing how cognacy is defined (ie. root cognates, borrowings vs inherited cognates etc.), we will be able to identify whether these three cognates are significant enough to infer a clade for Germanic, Celtic, and Italic. This paper examines the question of what definition of cognacy should be used for phylogenetic datasets by examining the Germano-Italo-Celtic clade as a case study and offers insights into the reconstruction of a Germano-Italo-Celtic clade.Keywords: historical, computational, Italo-Celtic, Germanic
Procedia PDF Downloads 5017103 Sustainable Development as a Part of Development and Foreign Trade in Turkey
Authors: Sadife Güngör, Sevilay Konya
Abstract:
Sustainable development is an economic development scope which covers the economic growth included environmental factors. With the help of economic development, the needs of the future generations are going to be met the needs. As it is aimed the environmental conscious, sustainable development focuses on decreasing the damage of natural sources. From this point of view, while sustainable development is environmentally conscious, it also improving the life standards of individuals. The relationship between development and foreign trade on sustainable development is theoretically searched in this study. In the second part, sustainable development at world and EU is searched and in the last part, the sustainability of trade and development in Turkey is stated.Keywords: development, sustainable development, foreign trade, Turkey
Procedia PDF Downloads 46017102 Improving Chest X-Ray Disease Detection with Enhanced Data Augmentation Using Novel Approach of Diverse Conditional Wasserstein Generative Adversarial Networks
Authors: Malik Muhammad Arslan, Muneeb Ullah, Dai Shihan, Daniyal Haider, Xiaodong Yang
Abstract:
Chest X-rays are instrumental in the detection and monitoring of a wide array of diseases, including viral infections such as COVID-19, tuberculosis, pneumonia, lung cancer, and various cardiac and pulmonary conditions. To enhance the accuracy of diagnosis, artificial intelligence (AI) algorithms, particularly deep learning models like Convolutional Neural Networks (CNNs), are employed. However, these deep learning models demand a substantial and varied dataset to attain optimal precision. Generative Adversarial Networks (GANs) can be employed to create new data, thereby supplementing the existing dataset and enhancing the accuracy of deep learning models. Nevertheless, GANs have their limitations, such as issues related to stability, convergence, and the ability to distinguish between authentic and fabricated data. In order to overcome these challenges and advance the detection and classification of CXR normal and abnormal images, this study introduces a distinctive technique known as DCWGAN (Diverse Conditional Wasserstein GAN) for generating synthetic chest X-ray (CXR) images. The study evaluates the effectiveness of this Idiosyncratic DCWGAN technique using the ResNet50 model and compares its results with those obtained using the traditional GAN approach. The findings reveal that the ResNet50 model trained on the DCWGAN-generated dataset outperformed the model trained on the classic GAN-generated dataset. Specifically, the ResNet50 model utilizing DCWGAN synthetic images achieved impressive performance metrics with an accuracy of 0.961, precision of 0.955, recall of 0.970, and F1-Measure of 0.963. These results indicate the promising potential for the early detection of diseases in CXR images using this Inimitable approach.Keywords: CNN, classification, deep learning, GAN, Resnet50
Procedia PDF Downloads 8817101 The Effect of Finding and Development Costs and Gas Price on Basins in the Barnett Shale
Authors: Michael Kenomore, Mohamed Hassan, Amjad Shah, Hom Dhakal
Abstract:
Shale gas reservoirs have been of greater importance compared to shale oil reservoirs since 2009 and with the current nature of the oil market, understanding the technical and economic performance of shale gas reservoirs is of importance. Using the Barnett shale as a case study, an economic model was developed to quantify the effect of finding and development costs and gas prices on the basins in the Barnett shale using net present value as an evaluation parameter. A rate of return of 20% and a payback period of 60 months or less was used as the investment hurdle in the model. The Barnett was split into four basins (Strawn Basin, Ouachita Folded Belt, Forth-worth Syncline and Bend-arch Basin) with analysis conducted on each of the basin to provide a holistic outlook. The dataset consisted of only horizontal wells that started production from 2008 to at most 2015 with 1835 wells coming from the strawn basin, 137 wells from the Ouachita folded belt, 55 wells from the bend-arch basin and 724 wells from the forth-worth syncline. The data was analyzed initially on Microsoft Excel to determine the estimated ultimate recoverable (EUR). The range of EUR from each basin were loaded in the Palisade Risk software and a log normal distribution typical of Barnett shale wells was fitted to the dataset. Monte Carlo simulation was then carried out over a 1000 iterations to obtain a cumulative distribution plot showing the probabilistic distribution of EUR for each basin. From the cumulative distribution plot, the P10, P50 and P90 EUR values for each basin were used in the economic model. Gas production from an individual well with a EUR similar to the calculated EUR was chosen and rescaled to fit the calculated EUR values for each basin at the respective percentiles i.e. P10, P50 and P90. The rescaled production was entered into the economic model to determine the effect of the finding and development cost and gas price on the net present value (10% discount rate/year) as well as also determine the scenario that satisfied the proposed investment hurdle. The finding and development costs used in this paper (assumed to consist only of the drilling and completion costs) were £1 million, £2 million and £4 million while the gas price was varied from $2/MCF-$13/MCF based on Henry Hub spot prices from 2008-2015. One of the major findings in this study was that wells in the bend-arch basin were least economic, higher gas prices are needed in basins containing non-core counties and 90% of the Barnet shale wells were not economic at all finding and development costs irrespective of the gas price in all the basins. This study helps to determine the percentage of wells that are economic at different range of costs and gas prices, determine the basins that are most economic and the wells that satisfy the investment hurdle.Keywords: shale gas, Barnett shale, unconventional gas, estimated ultimate recoverable
Procedia PDF Downloads 30117100 Machine Learning Methods for Network Intrusion Detection
Authors: Mouhammad Alkasassbeh, Mohammad Almseidin
Abstract:
Network security engineers work to keep services available all the time by handling intruder attacks. Intrusion Detection System (IDS) is one of the obtainable mechanisms that is used to sense and classify any abnormal actions. Therefore, the IDS must be always up to date with the latest intruder attacks signatures to preserve confidentiality, integrity, and availability of the services. The speed of the IDS is a very important issue as well learning the new attacks. This research work illustrates how the Knowledge Discovery and Data Mining (or Knowledge Discovery in Databases) KDD dataset is very handy for testing and evaluating different Machine Learning Techniques. It mainly focuses on the KDD preprocess part in order to prepare a decent and fair experimental data set. The J48, MLP, and Bayes Network classifiers have been chosen for this study. It has been proven that the J48 classifier has achieved the highest accuracy rate for detecting and classifying all KDD dataset attacks, which are of type DOS, R2L, U2R, and PROBE. Procedia PDF Downloads 23417099 Impact of Infrastructural Development on Socio-Economic Growth: An Empirical Investigation in India
Authors: Jonardan Koner
Abstract:
The study attempts to find out the impact of infrastructural investment on state economic growth in India. It further tries to determine the magnitude of the impact of infrastructural investment on economic indicator, i.e., per-capita income (PCI) in Indian States. The study uses panel regression technique to measure the impact of infrastructural investment on per-capita income (PCI) in Indian States. Panel regression technique helps incorporate both the cross-section and time-series aspects of the dataset. In order to analyze the difference in impact of the explanatory variables on the explained variables across states, the study uses Fixed Effect Panel Regression Model. The conclusions of the study are that infrastructural investment has a desirable impact on economic development and that the impact is different for different states in India. We analyze time series data (annual frequency) ranging from 1991 to 2010. The study reveals that the infrastructural investment significantly explains the variation of economic indicators.Keywords: infrastructural investment, multiple regression, panel regression techniques, economic development, fixed effect dummy variable model
Procedia PDF Downloads 37117098 Improving Lane Detection for Autonomous Vehicles Using Deep Transfer Learning
Authors: Richard O’Riordan, Saritha Unnikrishnan
Abstract:
Autonomous Vehicles (AVs) are incorporating an increasing number of ADAS features, including automated lane-keeping systems. In recent years, many research papers into lane detection algorithms have been published, varying from computer vision techniques to deep learning methods. The transition from lower levels of autonomy defined in the SAE framework and the progression to higher autonomy levels requires increasingly complex models and algorithms that must be highly reliable in their operation and functionality capacities. Furthermore, these algorithms have no room for error when operating at high levels of autonomy. Although the current research details existing computer vision and deep learning algorithms and their methodologies and individual results, the research also details challenges faced by the algorithms and the resources needed to operate, along with shortcomings experienced during their detection of lanes in certain weather and lighting conditions. This paper will explore these shortcomings and attempt to implement a lane detection algorithm that could be used to achieve improvements in AV lane detection systems. This paper uses a pre-trained LaneNet model to detect lane or non-lane pixels using binary segmentation as the base detection method using an existing dataset BDD100k followed by a custom dataset generated locally. The selected roads will be modern well-laid roads with up-to-date infrastructure and lane markings, while the second road network will be an older road with infrastructure and lane markings reflecting the road network's age. The performance of the proposed method will be evaluated on the custom dataset to compare its performance to the BDD100k dataset. In summary, this paper will use Transfer Learning to provide a fast and robust lane detection algorithm that can handle various road conditions and provide accurate lane detection.Keywords: ADAS, autonomous vehicles, deep learning, LaneNet, lane detection
Procedia PDF Downloads 10417097 Understanding Cognitive Fatigue From FMRI Scans With Self-supervised Learning
Authors: Ashish Jaiswal, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Fillia Makedon, Glenn Wylie
Abstract:
Functional magnetic resonance imaging (fMRI) is a neuroimaging technique that records neural activations in the brain by capturing the blood oxygen level in different regions based on the task performed by a subject. Given fMRI data, the problem of predicting the state of cognitive fatigue in a person has not been investigated to its full extent. This paper proposes tackling this issue as a multi-class classification problem by dividing the state of cognitive fatigue into six different levels, ranging from no-fatigue to extreme fatigue conditions. We built a spatio-temporal model that uses convolutional neural networks (CNN) for spatial feature extraction and a long short-term memory (LSTM) network for temporal modeling of 4D fMRI scans. We also applied a self-supervised method called MoCo (Momentum Contrast) to pre-train our model on a public dataset BOLD5000 and fine-tuned it on our labeled dataset to predict cognitive fatigue. Our novel dataset contains fMRI scans from Traumatic Brain Injury (TBI) patients and healthy controls (HCs) while performing a series of N-back cognitive tasks. This method establishes a state-of-the-art technique to analyze cognitive fatigue from fMRI data and beats previous approaches to solve this problem.Keywords: fMRI, brain imaging, deep learning, self-supervised learning, contrastive learning, cognitive fatigue
Procedia PDF Downloads 18917096 Developing Primary Care Datasets for a National Asthma Audit
Authors: Rachael Andrews, Viktoria McMillan, Shuaib Nasser, Christopher M. Roberts
Abstract:
Background and objective: The National Review of Asthma Deaths (NRAD) found that asthma management and care was inadequate in 26% of cases reviewed. Major shortfalls identified were adherence to national guidelines and standards and, particularly, the organisation of care, including supervision and monitoring in primary care, with 70% of cases reviewed having at least one avoidable factor in this area. 5.4 million people in the UK are diagnosed with and actively treated for asthma, and approximately 60,000 are admitted to hospital with acute exacerbations each year. The majority of people with asthma receive management and treatment solely in primary care. This has therefore created concern that many people within the UK are receiving sub-optimal asthma care resulting in unnecessary morbidity and risk of adverse outcome. NRAD concluded that a national asthma audit programme should be established to measure and improve processes, organisation, and outcomes of asthma care. Objective: To develop a primary care dataset enabling extraction of information from GP practices in Wales and providing robust data by which results and lessons could be drawn and drive service development and improvement. Methods: A multidisciplinary group of experts, including general practitioners, primary care organisation representatives, and asthma patients was formed and used as a source of governance and guidance. A review of asthma literature, guidance, and standards took place and was used to identify areas of asthma care which, if improved, would lead to better patient outcomes. Modified Delphi methodology was used to gain consensus from the expert group on which of the areas identified were to be prioritised, and an asthma patient and carer focus group held to seek views and feedback on areas of asthma care that were important to them. Areas of asthma care identified by both groups were mapped to asthma guidelines and standards to inform and develop primary and secondary care datasets covering both adult and pediatric care. Dataset development consisted of expert review and a targeted consultation process in order to seek broad stakeholder views and feedback. Results: Areas of asthma care identified as requiring prioritisation by the National Asthma Audit were: (i) Prescribing, (ii) Asthma diagnosis (iii) Asthma Reviews (iv) Personalised Asthma Action Plans (PAAPs) (v) Primary care follow-up after discharge from hospital (vi) Methodologies and primary care queries were developed to cover each of the areas of poor and variable asthma care identified and the queries designed to extract information directly from electronic patients’ records. Conclusion: This paper describes the methodological approach followed to develop primary care datasets for a National Asthma Audit. It sets out the principles behind the establishment of a National Asthma Audit programme in response to a national asthma mortality review and describes the development activities undertaken. Key process elements included: (i) mapping identified areas of poor and variable asthma care to national guidelines and standards, (ii) early engagement of experts, including clinicians and patients in the process, and (iii) targeted consultation of the queries to provide further insight into measures that were collectable, reproducible and relevant.Keywords: asthma, primary care, general practice, dataset development
Procedia PDF Downloads 17517095 DCDNet: Lightweight Document Corner Detection Network Based on Attention Mechanism
Authors: Kun Xu, Yuan Xu, Jia Qiao
Abstract:
The document detection plays an important role in optical character recognition and text analysis. Because the traditional detection methods have weak generalization ability, and deep neural network has complex structure and large number of parameters, which cannot be well applied in mobile devices, this paper proposes a lightweight Document Corner Detection Network (DCDNet). DCDNet is a two-stage architecture. The first stage with Encoder-Decoder structure adopts depthwise separable convolution to greatly reduce the network parameters. After introducing the Feature Attention Union (FAU) module, the second stage enhances the feature information of spatial and channel dim and adaptively adjusts the size of receptive field to enhance the feature expression ability of the model. Aiming at solving the problem of the large difference in the number of pixel distribution between corner and non-corner, Weighted Binary Cross Entropy Loss (WBCE Loss) is proposed to define corner detection problem as a classification problem to make the training process more efficient. In order to make up for the lack of Dataset of document corner detection, a Dataset containing 6620 images named Document Corner Detection Dataset (DCDD) is made. Experimental results show that the proposed method can obtain fast, stable and accurate detection results on DCDD.Keywords: document detection, corner detection, attention mechanism, lightweight
Procedia PDF Downloads 35417094 Dynamic Interaction between Renwable Energy Consumption and Sustainable Development: Evidence from Ecowas Region
Authors: Maman Ali M. Moustapha, Qian Yu, Benjamin Adjei Danquah
Abstract:
This paper investigates the dynamic interaction between renewable energy consumption (REC) and economic growth using dataset from the Economic Community of West African States (ECOWAS) from 2002 to 2016. For this study the Autoregressive Distributed Lag- Bounds test approach (ARDL) was used to examine the long run relationship between real gross domestic product and REC, while VECM based on Granger causality has been used to examine the direction of Granger causality. Our empirical findings indicate that REC has significant and positive impact on real gross domestic product. In addition, we found that REC and the percentage of access to electricity had unidirectional Granger causality to economic growth while carbon dioxide emission has bidirectional Granger causality to economic growth. Our findings indicate also that 1 per cent increase in the REC leads to an increase in Real GDP by 0.009 in long run. Thus, REC can be a means to ensure sustainable economic growth in the ECOWAS sub-region. However, it is necessary to increase further support and investments on renewable energy production in order to speed up sustainable economic development throughout the regionKeywords: Economic Growth, Renewable Energy, Sustainable Development, Sustainable Energy
Procedia PDF Downloads 20917093 Intelligent Campus Monitoring: YOLOv8-Based High-Accuracy Activity Recognition
Authors: A. Degale Desta, Tamirat Kebamo
Abstract:
Background: Recent advances in computer vision and pattern recognition have significantly improved activity recognition through video analysis, particularly with the application of Deep Convolutional Neural Networks (CNNs). One-stage detectors now enable efficient video-based recognition by simultaneously predicting object categories and locations. Such advancements are highly relevant in educational settings where CCTV surveillance could automatically monitor academic activities, enhancing security and classroom management. However, current datasets and recognition systems lack the specific focus on campus environments necessary for practical application in these settings.Objective: This study aims to address this gap by developing a dataset and testing an automated activity recognition system specifically tailored for educational campuses. The EthioCAD dataset was created to capture various classroom activities and teacher-student interactions, facilitating reliable recognition of academic activities using deep learning models. Method: EthioCAD, a novel video-based dataset, was created with a design science research approach to encompass teacher-student interactions across three domains and 18 distinct classroom activities. Using the Roboflow AI framework, the data was processed, with 4.224 KB of frames and 33.485 MB of images managed for frame extraction, labeling, and organization. The Ultralytics YOLOv8 model was then implemented within Google Colab to evaluate the dataset’s effectiveness, achieving high mean Average Precision (mAP) scores. Results: The YOLOv8 model demonstrated robust activity recognition within campus-like settings, achieving an mAP50 of 90.2% and an mAP50-95 of 78.6%. These results highlight the potential of EthioCAD, combined with YOLOv8, to provide reliable detection and classification of classroom activities, supporting automated surveillance needs on educational campuses. Discussion: The high performance of YOLOv8 on the EthioCAD dataset suggests that automated activity recognition for surveillance is feasible within educational environments. This system addresses current limitations in campus-specific data and tools, offering a tailored solution for academic monitoring that could enhance the effectiveness of CCTV systems in these settings. Conclusion: The EthioCAD dataset, alongside the YOLOv8 model, provides a promising framework for automated campus activity recognition. This approach lays the groundwork for future advancements in CCTV-based educational surveillance systems, enabling more refined and reliable monitoring of classroom activities.Keywords: deep CNN, EthioCAD, deep learning, YOLOv8, activity recognition
Procedia PDF Downloads 1017092 Music Note Detection and Dictionary Generation from Music Sheet Using Image Processing Techniques
Authors: Muhammad Ammar, Talha Ali, Abdul Basit, Bakhtawar Rajput, Zobia Sohail
Abstract:
Music note detection is an area of study for the past few years and has its own influence in music file generation from sheet music. We proposed a method to detect music notes on sheet music using basic thresholding and blob detection. Subsequently, we created a notes dictionary using a semi-supervised learning approach. After notes detection, for each test image, the new symbols are added to the dictionary. This makes the notes detection semi-automatic. The experiments are done on images from a dataset and also on the captured images. The developed approach showed almost 100% accuracy on the dataset images, whereas varying results have been seen on captured images.Keywords: music note, sheet music, optical music recognition, blob detection, thresholding, dictionary generation
Procedia PDF Downloads 18117091 Parametric Approach for Reserve Liability Estimate in Mortgage Insurance
Authors: Rajinder Singh, Ram Valluru
Abstract:
Chain Ladder (CL) method, Expected Loss Ratio (ELR) method and Bornhuetter-Ferguson (BF) method, in addition to more complex transition-rate modeling, are commonly used actuarial reserving methods in general insurance. There is limited published research about their relative performance in the context of Mortgage Insurance (MI). In our experience, these traditional techniques pose unique challenges and do not provide stable claim estimates for medium to longer term liabilities. The relative strengths and weaknesses among various alternative approaches revolve around: stability in the recent loss development pattern, sufficiency and reliability of loss development data, and agreement/disagreement between reported losses to date and ultimate loss estimate. CL method results in volatile reserve estimates, especially for accident periods with little development experience. The ELR method breaks down especially when ultimate loss ratios are not stable and predictable. While the BF method provides a good tradeoff between the loss development approach (CL) and ELR, the approach generates claim development and ultimate reserves that are disconnected from the ever-to-date (ETD) development experience for some accident years that have more development experience. Further, BF is based on subjective a priori assumption. The fundamental shortcoming of these methods is their inability to model exogenous factors, like the economy, which impact various cohorts at the same chronological time but at staggered points along their life-time development. This paper proposes an alternative approach of parametrizing the loss development curve and using logistic regression to generate the ultimate loss estimate for each homogeneous group (accident year or delinquency period). The methodology was tested on an actual MI claim development dataset where various cohorts followed a sigmoidal trend, but levels varied substantially depending upon the economic and operational conditions during the development period spanning over many years. The proposed approach provides the ability to indirectly incorporate such exogenous factors and produce more stable loss forecasts for reserving purposes as compared to the traditional CL and BF methods.Keywords: actuarial loss reserving techniques, logistic regression, parametric function, volatility
Procedia PDF Downloads 13017090 Unsupervised Learning with Self-Organizing Maps for Named Entity Recognition in the CONLL2003 Dataset
Authors: Assel Jaxylykova, Alexnder Pak
Abstract:
This study utilized a Self-Organizing Map (SOM) for unsupervised learning on the CONLL-2003 dataset for Named Entity Recognition (NER). The process involved encoding words into 300-dimensional vectors using FastText. These vectors were input into a SOM grid, where training adjusted node weights to minimize distances. The SOM provided a topological representation for identifying and clustering named entities, demonstrating its efficacy without labeled examples. Results showed an F1-measure of 0.86, highlighting SOM's viability. Although some methods achieve higher F1 measures, SOM eliminates the need for labeled data, offering a scalable and efficient alternative. The SOM's ability to uncover hidden patterns provides insights that could enhance existing supervised methods. Further investigation into potential limitations and optimization strategies is suggested to maximize benefits.Keywords: named entity recognition, natural language processing, self-organizing map, CONLL-2003, semantics
Procedia PDF Downloads 4517089 Optimized Brain Computer Interface System for Unspoken Speech Recognition: Role of Wernicke Area
Authors: Nassib Abdallah, Pierre Chauvet, Abd El Salam Hajjar, Bassam Daya
Abstract:
In this paper, we propose an optimized brain computer interface (BCI) system for unspoken speech recognition, based on the fact that the constructions of unspoken words rely strongly on the Wernicke area, situated in the temporal lobe. Our BCI system has four modules: (i) the EEG Acquisition module based on a non-invasive headset with 14 electrodes; (ii) the Preprocessing module to remove noise and artifacts, using the Common Average Reference method; (iii) the Features Extraction module, using Wavelet Packet Transform (WPT); (iv) the Classification module based on a one-hidden layer artificial neural network. The present study consists of comparing the recognition accuracy of 5 Arabic words, when using all the headset electrodes or only the 4 electrodes situated near the Wernicke area, as well as the selection effect of the subbands produced by the WPT module. After applying the articial neural network on the produced database, we obtain, on the test dataset, an accuracy of 83.4% with all the electrodes and all the subbands of 8 levels of the WPT decomposition. However, by using only the 4 electrodes near Wernicke Area and the 6 middle subbands of the WPT, we obtain a high reduction of the dataset size, equal to approximately 19% of the total dataset, with 67.5% of accuracy rate. This reduction appears particularly important to improve the design of a low cost and simple to use BCI, trained for several words.Keywords: brain-computer interface, speech recognition, artificial neural network, electroencephalography, EEG, wernicke area
Procedia PDF Downloads 27017088 Application of Artificial Immune Systems Combined with Collaborative Filtering in Movie Recommendation System
Authors: Pei-Chann Chang, Jhen-Fu Liao, Chin-Hung Teng, Meng-Hui Chen
Abstract:
This research combines artificial immune system with user and item based collaborative filtering to create an efficient and accurate recommendation system. By applying the characteristic of antibodies and antigens in the artificial immune system and using Pearson correlation coefficient as the affinity threshold to cluster the data, our collaborative filtering can effectively find useful users and items for rating prediction. This research uses MovieLens dataset as our testing target to evaluate the effectiveness of the algorithm developed in this study. The experimental results show that the algorithm can effectively and accurately predict the movie ratings. Compared to some state of the art collaborative filtering systems, our system outperforms them in terms of the mean absolute error on the MovieLens dataset.Keywords: artificial immune system, collaborative filtering, recommendation system, similarity
Procedia PDF Downloads 53517087 Detection and Classification of Mammogram Images Using Principle Component Analysis and Lazy Classifiers
Authors: Rajkumar Kolangarakandy
Abstract:
Feature extraction and selection is the primary part of any mammogram classification algorithms. The choice of feature, attribute or measurements have an important influence in any classification system. Discrete Wavelet Transformation (DWT) coefficients are one of the prominent features for representing images in frequency domain. The features obtained after the decomposition of the mammogram images using wavelet transformations have higher dimension. Even though the features are higher in dimension, they were highly correlated and redundant in nature. The dimensionality reduction techniques play an important role in selecting the optimum number of features from the higher dimension data, which are highly correlated. PCA is a mathematical tool that reduces the dimensionality of the data while retaining most of the variation in the dataset. In this paper, a multilevel classification of mammogram images using reduced discrete wavelet transformation coefficients and lazy classifiers is proposed. The classification is accomplished in two different levels. In the first level, mammogram ROIs extracted from the dataset is classified as normal and abnormal types. In the second level, all the abnormal mammogram ROIs is classified into benign and malignant too. A further classification is also accomplished based on the variation in structure and intensity distribution of the images in the dataset. The Lazy classifiers called Kstar, IBL and LWL are used for classification. The classification results obtained with the reduced feature set is highly promising and the result is also compared with the performance obtained without dimension reduction.Keywords: PCA, wavelet transformation, lazy classifiers, Kstar, IBL, LWL
Procedia PDF Downloads 33517086 Digital Forensics Showdown: Encase and FTK Head-to-Head
Authors: Rida Nasir, Waseem Iqbal
Abstract:
Due to the constant revolution in technology and the increase in anti-forensic techniques used by attackers to remove their traces, professionals often struggle to choose the best tool to be used in digital forensic investigations. This paper compares two of the most well-known and widely used licensed commercial tools, i.e., Encase & FTK. The comparison was drawn on various parameters and features to provide an authentic evaluation of licensed versions of these well-known commercial tools against various real-world scenarios. In order to discover the popularity of these tools within the digital forensic community, a survey was conducted publicly to determine the preferred choice. The dataset used is the Computer Forensics Reference Dataset (CFReDS). A total of 70 features were selected from various categories. Upon comparison, both FTK and EnCase produce remarkable results. However, each tool has some limitations, and none of the tools is declared best. The comparison drawn is completely unbiased, based on factual data.Keywords: digital forensics, commercial tools, investigation, forensic evaluation
Procedia PDF Downloads 1917085 Fast Short-Term Electrical Load Forecasting under High Meteorological Variability with a Multiple Equation Time Series Approach
Authors: Charline David, Alexandre Blondin Massé, Arnaud Zinflou
Abstract:
In 2016, Clements, Hurn, and Li proposed a multiple equation time series approach for the short-term load forecasting, reporting an average mean absolute percentage error (MAPE) of 1.36% on an 11-years dataset for the Queensland region in Australia. We present an adaptation of their model to the electrical power load consumption for the whole Quebec province in Canada. More precisely, we take into account two additional meteorological variables — cloudiness and wind speed — on top of temperature, as well as the use of multiple meteorological measurements taken at different locations on the territory. We also consider other minor improvements. Our final model shows an average MAPE score of 1:79% over an 8-years dataset.Keywords: short-term load forecasting, special days, time series, multiple equations, parallelization, clustering
Procedia PDF Downloads 10317084 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning
Authors: Walid Cherif
Abstract:
Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.Keywords: data mining, knowledge discovery, machine learning, similarity measurement, supervised classification
Procedia PDF Downloads 46417083 Improving the Performance of Deep Learning in Facial Emotion Recognition with Image Sharpening
Authors: Ksheeraj Sai Vepuri, Nada Attar
Abstract:
We as humans use words with accompanying visual and facial cues to communicate effectively. Classifying facial emotion using computer vision methodologies has been an active research area in the computer vision field. In this paper, we propose a simple method for facial expression recognition that enhances accuracy. We tested our method on the FER-2013 dataset that contains static images. Instead of using Histogram equalization to preprocess the dataset, we used Unsharp Mask to emphasize texture and details and sharpened the edges. We also used ImageDataGenerator from Keras library for data augmentation. Then we used Convolutional Neural Networks (CNN) model to classify the images into 7 different facial expressions, yielding an accuracy of 69.46% on the test set. Our results show that using image preprocessing such as the sharpening technique for a CNN model can improve the performance, even when the CNN model is relatively simple.Keywords: facial expression recognittion, image preprocessing, deep learning, CNN
Procedia PDF Downloads 14317082 How Western Donors Allocate Official Development Assistance: New Evidence From a Natural Language Processing Approach
Authors: Daniel Benson, Yundan Gong, Hannah Kirk
Abstract:
Advancement in national language processing techniques has led to increased data processing speeds, and reduced the need for cumbersome, manual data processing that is often required when processing data from multilateral organizations for specific purposes. As such, using named entity recognition (NER) modeling and the Organisation of Economically Developed Countries (OECD) Creditor Reporting System database, we present the first geotagged dataset of OECD donor Official Development Assistance (ODA) projects on a global, subnational basis. Our resulting data contains 52,086 ODA projects geocoded to subnational locations across 115 countries, worth a combined $87.9bn. This represents the first global, OECD donor ODA project database with geocoded projects. We use this new data to revisit old questions of how ‘well’ donors allocate ODA to the developing world. This understanding is imperative for policymakers seeking to improve ODA effectiveness.Keywords: international aid, geocoding, subnational data, natural language processing, machine learning
Procedia PDF Downloads 7817081 Sourcing and Compiling a Maltese Traffic Dataset MalTra
Authors: Gabriele Borg, Alexei De Bono, Charlie Abela
Abstract:
There on a constant rise in the availability of high volumes of data gathered from multiple sources, resulting in an abundance of unprocessed information that can be used to monitor patterns and trends in user behaviour. Similarly, year after year, Malta is also constantly experiencing ongoing population growth and an increase in mobilization demand. This research takes advantage of data which is continuously being sourced and converting it into useful information related to the traffic problem on the Maltese roads. The scope of this paper is to provide a methodology to create a custom dataset (MalTra - Malta Traffic) compiled from multiple participants from various locations across the island to identify the most common routes taken to expose the main areas of activity. This use of big data is seen being used in various technologies and is referred to as ITSs (Intelligent Transportation Systems), which has been concluded that there is significant potential in utilising such sources of data on a nationwide scale.Keywords: Big Data, vehicular traffic, traffic management, mobile data patterns
Procedia PDF Downloads 10917080 Image Ranking to Assist Object Labeling for Training Detection Models
Authors: Tonislav Ivanov, Oleksii Nedashkivskyi, Denis Babeshko, Vadim Pinskiy, Matthew Putman
Abstract:
Training a machine learning model for object detection that generalizes well is known to benefit from a training dataset with diverse examples. However, training datasets usually contain many repeats of common examples of a class and lack rarely seen examples. This is due to the process commonly used during human annotation where a person would proceed sequentially through a list of images labeling a sufficiently high total number of examples. Instead, the method presented involves an active process where, after the initial labeling of several images is completed, the next subset of images for labeling is selected by an algorithm. This process of algorithmic image selection and manual labeling continues in an iterative fashion. The algorithm used for the image selection is a deep learning algorithm, based on the U-shaped architecture, which quantifies the presence of unseen data in each image in order to find images that contain the most novel examples. Moreover, the location of the unseen data in each image is highlighted, aiding the labeler in spotting these examples. Experiments performed using semiconductor wafer data show that labeling a subset of the data, curated by this algorithm, resulted in a model with a better performance than a model produced from sequentially labeling the same amount of data. Also, similar performance is achieved compared to a model trained on exhaustive labeling of the whole dataset. Overall, the proposed approach results in a dataset that has a diverse set of examples per class as well as more balanced classes, which proves beneficial when training a deep learning model.Keywords: computer vision, deep learning, object detection, semiconductor
Procedia PDF Downloads 13617079 Geographical Information System and Multi-Criteria Based Approach to Locate Suitable Sites for Industries to Minimize Agriculture Land Use Changes in Bangladesh
Authors: Nazia Muhsin, Tofael Ahamed, Ryozo Noguchi, Tomohiro Takigawa
Abstract:
One of the most challenging issues to achieve sustainable development on food security is land use changes. The crisis of lands for agricultural production mainly arises from the unplanned transformation of agricultural lands to infrastructure development i.e. urbanization and industrialization. Land use without sustainability assessment could have impact on the food security and environmental protections. Bangladesh, as the densely populated country with limited arable lands is now facing challenges to meet sustainable food security. Agricultural lands are using for economic growth by establishing industries. The industries are spreading from urban areas to the suburban areas and using the agricultural lands. To minimize the agricultural land losses for unplanned industrialization, compact economic zones should be find out in a scientific approach. Therefore, the purpose of the study was to find out suitable sites for industrial growth by land suitability analysis (LSA) by using Geographical Information System (GIS) and multi-criteria analysis (MCA). The goal of the study was to emphases both agricultural lands and industries for sustainable development in land use. The study also attempted to analysis the agricultural land use changes in a suburban area by statistical data of agricultural lands and primary data of the existing industries of the study place. The criteria were selected as proximity to major roads, and proximity to local roads, distant to rivers, waterbodies, settlements, flood-flow zones, agricultural lands for the LSA. The spatial dataset for the criteria were collected from the respective departments of Bangladesh. In addition, the elevation spatial dataset were used from the SRTM (Shuttle Radar Topography Mission) data source. The criteria were further analyzed with factors and constraints in ArcGIS®. Expert’s opinion were applied for weighting the criteria according to the analytical hierarchy process (AHP), a multi-criteria technique. The decision rule was set by using ‘weighted overlay’ tool to aggregate the factors and constraints with the weights of the criteria. The LSA found only 5% of land was most suitable for industrial sites and few compact lands for industrial zones. The developed LSA are expected to help policy makers of land use and urban developers to ensure the sustainability of land uses and agricultural production.Keywords: AHP (analytical hierarchy process), GIS (geographic information system), LSA (land suitability analysis), MCA (multi-criteria analysis)
Procedia PDF Downloads 263