Search results for: forest cover-type dataset
1688 Sentiment Classification Using Enhanced Contextual Valence Shifters
Authors: Vo Ngoc Phu, Phan Thi Tuoi
Abstract:
We have explored different methods of improving the accuracy of sentiment classification. The sentiment orientation of a document can be positive (+), negative (-), or neutral (0). We combine five dictionaries from [2, 3, 4, 5, 6] into a new one with 21137 entries. The new dictionary contains many verbs, adverbs, phrases, and idioms that are not in the five original dictionaries. The paper shows that our proposed method, which combines the Term-Counting method with the Enhanced Contextual Valence Shifters method, improves the accuracy of sentiment classification. The combined method has an accuracy of 68.984% on the testing dataset and 69.224% on the training dataset. All of these methods are implemented to classify the reviews based on our new dictionary and the Internet Movie data set.
Keywords: sentiment classification, sentiment orientation, valence shifters, contextual valence shifters, term counting
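The combination of term counting with contextual valence shifters described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tiny `LEXICON`, the `NEGATORS` and `INTENSIFIERS` tables, and the rule that a shifter only affects the next sentiment-bearing word are all assumptions made for the example.

```python
# Minimal sketch: term counting with simple valence shifters.
# The lexicon and shifter tables are hypothetical placeholders,
# not the 21137-entry dictionary described in the abstract.
LEXICON = {"good": 1, "great": 1, "enjoyable": 1,
           "bad": -1, "boring": -1, "awful": -1}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 2.0, "really": 2.0, "slightly": 0.5}


def sentiment_score(text):
    """Sum lexicon polarities, letting shifters modify the next sentiment word."""
    score = 0.0
    sign = 1.0    # flipped by a pending negator
    weight = 1.0  # scaled by a pending intensifier
    for token in text.lower().split():
        word = token.strip(".,!?;:")
        if word in NEGATORS:
            sign = -sign
        elif word in INTENSIFIERS:
            weight *= INTENSIFIERS[word]
        elif word in LEXICON:
            score += sign * weight * LEXICON[word]
            sign, weight = 1.0, 1.0  # shifters are consumed by this word
    return score


def classify(text):
    """Map the score to positive (+), negative (-), or neutral (0)."""
    score = sentiment_score(text)
    if score > 0:
        return "+"
    if score < 0:
        return "-"
    return "0"
```

Plain term counting corresponds to the case where both shifter tables are empty. A real system would also need a rule for how far a shifter's scope extends, which this sketch deliberately leaves simple.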
Procedia PDF Downloads 505
1687 Using the Smith-Waterman Algorithm to Extract Features in the Classification of Obesity Status
Authors: Rosa Figueroa, Christopher Flores
Abstract:
Text categorization is the problem of assigning a new document to a set of predetermined categories, on the basis of a training set of free-text data that contains documents whose category membership is known. To train a classification model, it is necessary to extract characteristics in the form of tokens that facilitate the learning and classification process. In text categorization, the feature extraction process involves the use of word sequences also known as N-grams. In general, it is expected that documents belonging to the same category share similar features. The Smith-Waterman (SW) algorithm is a dynamic programming algorithm that performs a local sequence alignment in order to determine similar regions between two strings or protein sequences. This work explores the use of SW algorithm as an alternative to feature extraction in text categorization. The dataset used for this purpose, contains 2,610 annotated documents with the classes Obese/Non-Obese. This dataset was represented in a matrix form using the Bag of Word approach. The score selected to represent the occurrence of the tokens in each document was the term frequency-inverse document frequency (TF-IDF). In order to extract features for classification, four experiments were conducted: the first experiment used SW to extract features, the second one used unigrams (single word), the third one used bigrams (two word sequence) and the last experiment used a combination of unigrams and bigrams to extract features for classification. To test the effectiveness of the extracted feature set for the four experiments, a Support Vector Machine (SVM) classifier was tuned using 20% of the dataset. The remaining 80% of the dataset together with 5-Fold Cross Validation were used to evaluate and compare the performance of the four experiments of feature extraction. Results from the tuning process suggest that SW performs better than the N-gram based feature extraction. 
These results were confirmed by using the remaining 80% of the dataset, where SW performed the best (accuracy = 97.10%, weighted average F-measure = 97.07%). The second best result was obtained by the combination of unigrams and bigrams (accuracy = 96.04%, weighted average F-measure = 95.97%), closely followed by bigrams (accuracy = 94.56%, weighted average F-measure = 94.46%) and finally unigrams (accuracy = 92.96%, weighted average F-measure = 92.90%).
Keywords: comorbidities, machine learning, obesity, Smith-Waterman algorithm
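For readers unfamiliar with the Smith-Waterman recurrence mentioned in this abstract, the following is a minimal sketch of local alignment over token sequences. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the idea of treating the aligned token run as a candidate feature are assumptions for illustration; the abstract does not state the parameters or the exact feature construction the authors used.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    `a` and `b` may be strings or lists of tokens. Gaps are shown as None.
    The default scores are illustrative, not the ones used in the study.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # DP matrix, first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can restart.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    return best, out_a[::-1], out_b[::-1]
```

Applied to two tokenized notes such as "patient has morbid obesity and diabetes" and "history of morbid obesity noted", the sketch recovers the shared run "morbid obesity", which is the kind of locally similar region the abstract describes.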
Procedia PDF Downloads 297
1686 Classification of Contexts for Mentioning Love in Interviews with Victims of the Holocaust
Authors: Marina Yurievna Aleksandrova
Abstract:
Research on the Holocaust retains value not only for history but also for sociology and psychology. One of the most important fields of study is how people coped during and after this traumatic event. The aim of this paper is to identify the main contexts of the topic of love and to determine which contexts are more characteristic of different groups of Holocaust victims (by gender, nationality, and age). In this research, transcripts of interviews with Holocaust victims that were collected during 1946 for the "Voices of the Holocaust" project were used as data. The main contexts were analyzed with methods of network analysis and latent semantic analysis and classified by gender, age, and nationality with random forest. The results show that love is articulated and described significantly differently by male and female informants; classification by nationality and by age yielded lower values of the quality metrics.
Keywords: Holocaust, latent semantic analysis, network analysis, text-mining, random forest
Procedia PDF Downloads 181
1685 Public Participation Best Practices in Environmental Decision-making in Newfoundland and Labrador: Analyzing the Forestry Management Planning Process
Authors: Kimberley K. Whyte-Jones
Abstract:
Public participation may improve the quality of environmental management decisions. However, the quality of such a decision is strongly dependent on the quality of the process that leads to it. In order to ensure an effective and efficient process, key features of best practice in participation should be carefully observed; this would also combat disillusionment of citizens, decision-makers and practitioners. The overarching aim of this study is to determine what constitutes an effective public participation process relevant to the Newfoundland and Labrador, Canada context, and to discover whether the public participation process that led to the 2014-2024 Provincial Sustainable Forest Management Strategy (PSFMS) met best practices criteria. The research design uses an exploratory case study strategy to consider a specific participatory process in environmental decision-making in Newfoundland and Labrador. Data collection methods include formal semi-structured interviews and the review of secondary data sources. The results of this study will determine the validity of a specific public participation best practice framework. The findings will be useful for informing citizen participation processes in general and will deduce best practices in public participation in environmental management in the province. The study is, therefore, meaningful for guiding future policies and practices in the management of forest resources in the province of Newfoundland and Labrador, and will help in filling a noticeable gap in research compiling best practices for environmentally related public participation processes.
Keywords: best practices, environmental decision-making, forest management, public participation
Procedia PDF Downloads 322
1684 An Application-Driven Procedure for Optimal Signal Digitization of Automotive-Grade Ultrasonic Sensors
Authors: Mohamed Shawki Elamir, Heinrich Gotzig, Raoul Zoellner, Patrick Maeder
Abstract:
In this work, a methodology is presented for identifying the optimal digitization parameters for the analog signal of ultrasonic sensors. These digitization parameters are the resolution of the analog-to-digital conversion and the sampling rate. This is accomplished through the derivation of characteristic curves based on Fano's inequality and the calculation of the mutual information content over a given dataset. The mutual information is calculated between the examples in the dataset and the corresponding variation in the feature that needs to be estimated. The optimal parameters are identified in a manner that ensures optimal estimation performance while preventing inefficiency in using unnecessarily powerful analog-to-digital converters.
Keywords: analog to digital conversion, digitization, sampling rate, ultrasonic
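The link between mutual information and Fano's inequality that this abstract relies on can be illustrated with a small sketch. It assumes discrete (already quantized) samples and uses a plug-in entropy estimate together with the weak form of Fano's inequality, P_e >= (H(X|Y) - 1) / log2(|X|). The function names are hypothetical, and the abstract does not say which estimator or which form of the inequality the authors used.

```python
import math
from collections import Counter


def entropy(labels):
    """Plug-in Shannon entropy in bits of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def fano_error_lower_bound(xs, ys):
    """Weak Fano bound on the error of estimating X from Y.

    Here xs is the quantized target feature and ys the digitized signal;
    both roles are assumptions made for this illustration.
    """
    num_classes = len(set(xs))
    if num_classes < 2:
        return 0.0
    h_x_given_y = entropy(xs) - mutual_information(xs, ys)
    return max(0.0, (h_x_given_y - 1.0) / math.log2(num_classes))
```

Under these assumptions, repeating the calculation for several candidate resolutions and sampling rates would show where additional bits or samples stop increasing the mutual information, which is the kind of characteristic curve the abstract refers to.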
Procedia PDF Downloads 207
1683 Comparison of Multivariate Adaptive Regression Splines and Random Forest Regression in Predicting Forced Expiratory Volume in One Second
Authors: P. V. Pramila , V. Mahesh
Abstract:
Pulmonary function tests are important non-invasive diagnostic tests that assess respiratory impairments and provide quantifiable measures of lung function. Spirometry is the most frequently used measure of lung function and plays an essential role in the diagnosis and management of pulmonary diseases. However, the test requires considerable patient effort and cooperation, which is markedly related to the age of patients, resulting in incomplete data sets. This paper presents nonlinear models built using multivariate adaptive regression splines (MARS) and random forest (RF) regression to predict the missing spirometric features. Random forest based feature selection is used to enhance both the generalization capability and the model interpretability. In the present study, flow-volume data are recorded for N = 198 subjects. The ranked order of the feature importance index calculated by the random forest model shows that the spirometric features FVC, FEF 25, PEF, FEF 25-75, FEF 50, and the demographic parameter height are the important descriptors. A comparison of the performance of both models shows that the prediction ability of MARS with the top two ranked features, namely FVC and FEF 25, is higher, yielding a model fit of R² = 0.96 and R² = 0.99 for normal and abnormal subjects. The Root Mean Square Error analysis of the RF model and the MARS model also shows that the latter is capable of predicting the missing values of FEV1 with a notably lower error value of 0.0191 (normal subjects) and 0.0106 (abnormal subjects). It is concluded that combining feature selection with a prediction model provides a minimum subset of predominant features to train the model, yielding better prediction performance. This analysis can provide clinicians with an intelligent support system for medical diagnosis and the improvement of clinical care.
Keywords: FEV, multivariate adaptive regression splines, pulmonary function test, random forest
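The two fit measures quoted in this abstract, R² and root mean square error, can be computed as in the sketch below. It uses the standard textbook definitions; the abstract does not specify how the authors computed them, so this is an illustration rather than a reproduction of their evaluation.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Because RMSE is expressed in the units of the target (here FEV1), it is only comparable between models evaluated on the same subjects, whereas R² is unitless.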
Procedia PDF Downloads 311
1682 Applying Semi-Automatic Digital Aerial Survey Technology and Canopy Characters Classification for Surface Vegetation Interpretation of Archaeological Sites
Authors: Yung-Chung Chuang
Abstract:
The cultural layers of archaeological sites are mainly affected by surface land use, land cover, and root system of surface vegetation. For this reason, continuous monitoring of land use and land cover change is important for archaeological sites protection and management. However, in actual operation, on-site investigation and orthogonal photograph interpretation require a lot of time and manpower. For this reason, it is necessary to perform a good alternative for surface vegetation survey in an automated or semi-automated manner. In this study, we applied semi-automatic digital aerial survey technology and canopy characters classification with very high-resolution aerial photographs for surface vegetation interpretation of archaeological sites. The main idea is based on different landscape or forest type can easily be distinguished with canopy characters (e.g., specific texture distribution, shadow effects and gap characters) extracted by semi-automatic image classification. A novel methodology to classify the shape of canopy characters using landscape indices and multivariate statistics was also proposed. Non-hierarchical cluster analysis was used to assess the optimal number of canopy character clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy character classification (seven categories). Therefore, people could easily predict the forest type and vegetation land cover by corresponding to the specific canopy character category. The results showed that the semi-automatic classification could effectively extract the canopy characters of forest and vegetation land cover. As for forest type and vegetation type prediction, the average prediction accuracy reached 80.3%~91.7% with different sizes of test frame. 
This indicates that the technology is useful for archaeological site surveys and can improve classification efficiency and the data update rate.
Keywords: digital aerial survey, canopy characters classification, archaeological sites, multivariate statistics
Procedia PDF Downloads 143
1681 Multi-Objective Optimal Threshold Selection for Similarity Functions in Siamese Networks for Semantic Textual Similarity Tasks
Authors: Kriuk Boris, Kriuk Fedor
Abstract:
This paper presents a comparative study of fundamental similarity functions for Siamese networks in semantic textual similarity (STS) tasks. We evaluate various similarity functions using the STS Benchmark dataset, analyzing their performance and stability. Additionally, we introduce a multi-objective approach for optimal threshold selection. Our findings provide insights into the effectiveness of different similarity functions and offer a straightforward method for threshold selection optimization, contributing to the advancement of Siamese network architectures in STS applications.
Keywords: siamese networks, semantic textual similarity, similarity functions, STS benchmark dataset, threshold selection
Procedia PDF Downloads 39
1680 The Use of Remotely Sensed Data to Model Habitat Selections of Pileated Woodpeckers (Dryocopus pileatus) in Fragmented Landscapes
Authors: Ruijia Hu, Susanna T.Y. Tong
Abstract:
Light detection and ranging (LiDAR) and four-channel red, green, blue, and near-infrared (RGBI) remote sensed imageries allow an accurate quantification and contiguous measurement of vegetation characteristics and forest structures. This information facilitates the generation of habitat structure variables for forest species distribution modelling. However, applications of remote sensing data, especially the combination of structural and spectral information, to support evidence-based decisions in forest managements and conservation practices at local scale are not widely adopted. In this study, we examined the habitat requirements of pileated woodpecker (Dryocopus pileatus) (PW) in Hamilton County, Ohio, using ecologically relevant forest structural and vegetation characteristics derived from LiDAR and RGBI data. We hypothesized that the habitat of PW is shaped by vegetation characteristics that are directly associated with the availability of food, hiding and nesting resources, the spatial arrangement of habitat patches within home range, as well as proximity to water sources. We used 186 PW presence or absence locations to model their presence and absence in generalized additive model (GAM) at two scales, representing foraging and home range size, respectively. The results confirm PW’s preference for tall and large mature stands with structural complexity, typical of late-successional or old-growth forests. Besides, the crown size of dead trees shows a positive relationship with PW occurrence, therefore indicating the importance of declining living trees or early-stage dead trees within PW home range. These locations are preferred by PW for nest cavity excavation as it attempts to balance the ease of excavation and tree security. In addition, we found that PW can adjust its travel distance to the nearest water resource, suggesting that habitat fragmentation can have certain impacts on PW. 
Based on our findings, we recommend that forest managers should use different priorities to manage nesting, roosting, and feeding habitats. Particularly, when devising forest management and hazard tree removal plans, one needs to consider retaining enough cavity trees within high-quality PW habitat. By mapping PW habitat suitability for the study area, we highlight the importance of riparian corridor in facilitating PW to adjust to the fragmented urban landscape. Indeed, habitat improvement for PW in the study area could be achieved by conserving riparian corridors and promoting riparian forest succession along major rivers in Hamilton County.
Keywords: deadwood detection, generalized additive model, individual tree crown delineation, LiDAR, pileated woodpecker, RGBI aerial imagery, species distribution models
Procedia PDF Downloads 53
1679 Tongue Image Retrieval Based Using Machine Learning
Authors: Ahmad FAROOQ, Xinfeng Zhang, Fahad Sabah, Raheem Sarwar
Abstract:
In Traditional Chinese Medicine (TCM), tongue diagnosis is a vital inspection tool. In this study, we explore the potential of machine learning in tongue diagnosis. The study begins with the cataloguing of the various classifications and characteristics of the human tongue. We infer 24 kinds of tongues from the material and coating of the tongue, and we identify 21 attributes of the tongue. The next step is to apply machine learning methods to the tongue dataset. We use the Weka machine learning platform to conduct the experiment for performance analysis. The 457 instances of the tongue dataset are used to test the performance of five different machine learning methods, including SVM, Random Forests, Decision Trees, and Naive Bayes. Based on accuracy and area under the ROC curve (AUC), the Support Vector Machine algorithm was shown to be the most effective for tongue diagnosis.
Keywords: medical imaging, image retrieval, machine learning, tongue
Procedia PDF Downloads 82
1678 Hybridization of Manually Extracted and Convolutional Features for Classification of Chest X-Ray of COVID-19
Authors: M. Bilal Ishfaq, Adnan N. Qureshi
Abstract:
COVID-19 is one of the most infectious diseases of recent times. It was first reported in Wuhan, the capital city of Hubei, China, and then spread rapidly throughout the world. On 11 March 2020, the World Health Organisation (WHO) declared it a pandemic. Since COVID-19 is highly contagious, it has affected approximately 219M people worldwide and caused 4.55M deaths. It has brought the importance of accurate diagnosis of respiratory diseases such as pneumonia and COVID-19 to the forefront. In this paper, we propose a hybrid approach for the automated detection of COVID-19 using medical imaging, based on the hybridization of manually extracted and convolutional features. Our approach combines Haralick texture features and convolutional features extracted from chest X-rays and CT scans. We also employ a minimum redundancy maximum relevance (MRMR) feature selection algorithm to reduce computational complexity and enhance classification performance. The proposed model is evaluated on four publicly available datasets, including Chest X-ray Pneumonia, COVID-19 Pneumonia, COVID-19 CTMaster, and VinBig data. The results demonstrate high accuracy and effectiveness, with 0.9925 on the Chest X-ray pneumonia dataset, 0.9895 on the COVID-19, Pneumonia and Normal Chest X-ray dataset, 0.9806 on the Covid CTMaster dataset, and 0.9398 on the VinBig dataset. We further evaluate the effectiveness of the proposed model using ROC curves, where the AUC for the best-performing model reaches 0.96. Our proposed model provides a promising tool for the early detection and accurate diagnosis of COVID-19, which can assist healthcare professionals in making informed treatment decisions and improving patient outcomes. The results of the proposed model are quite plausible, and the system can be deployed in a clinical or research setting to assist in the diagnosis of COVID-19.
Keywords: COVID-19, feature engineering, artificial neural networks, radiology images
Procedia PDF Downloads 75
1677 The Material Behavior in Curved Glulam Beam of Jabon Timber
Authors: Erma Desmaliana, Saptahari Sugiri
Abstract:
The limited availability of solid timber in large dimensions has become a problem. The demand for timber in Indonesia is increasing faster than the supply from natural forests. This is associated with the issues of global warming and environmental preservation. The use of timber from HTI (Industrial Planting Forest) and HTR (Society Planting Forest), such as Jabon, is an alternative source that is required to solve these problems. A shorter lifespan is the benefit of HTI/HTR timbers, although they are relatively smaller in dimension and lower in strength. Engineered wood products (EWP), such as glulam (glue-laminated) timber, are required to overcome these drawbacks. Glulam is fabricated by gluing together wooden planks with a thickness of 20 to 45 mm using an adhesive under a certain pressure. One of its advantages is that glulam can be made into a curved beam, which makes its strength greater than that of a straight beam. This paper aims to determine the material behavior of curved glue-laminated beams of Jabon timber. Preliminary tests were carried out to obtain the physical and mechanical properties and the glue spread strength of Jabon timber, following the ASTM D-143 standard test method. The beams were 50 mm wide and 50 mm thick, with a 760 mm span and a 50 mm rise. Each layer of Jabon has a thickness of 5 mm and is glued with polyurethane. A cold press will be applied to the laminated beam specimens for more than 5 hours. The bending behavior of the curved glue-laminated beam specimens will then be tested. The experiments aim to determine the increase in load carrying capacity and stiffness of curved glulam beams.
Keywords: curved glulam beam, HTR&HTI, load carrying, strength
Procedia PDF Downloads 299
1676 ARABEX: Automated Dotted Arabic Expiration Date Extraction using Optimized Convolutional Autoencoder and Custom Convolutional Recurrent Neural Network
Authors: Hozaifa Zaki, Ghada Soliman
Abstract:
In this paper, we introduced an approach for Automated Dotted Arabic Expiration Date Extraction using Optimized Convolutional Autoencoder (ARABEX) with bidirectional LSTM. This approach is used for translating the Arabic dot-matrix expiration dates into their corresponding filled-in dates. A custom lightweight Convolutional Recurrent Neural Network (CRNN) model is then employed to extract the expiration dates. Due to the lack of available dataset images for the Arabic dot-matrix expiration date, we generated synthetic images by creating an Arabic dot-matrix True Type Font (TTF) matrix to address this limitation. Our model was trained on a realistic synthetic dataset of 3287 images, covering the period from 2019 to 2027, represented in the format of yyyy/mm/dd. We then trained our custom CRNN model using the generated synthetic images to assess the performance of our model (ARABEX) by extracting expiration dates from the translated images. Our proposed approach achieved an accuracy of 99.4% on the test dataset of 658 images, while also achieving a Structural Similarity Index (SSIM) of 0.46 for image translation on our dataset. The ARABEX approach demonstrates its ability to be applied to various downstream learning tasks, including image translation and reconstruction. Moreover, this pipeline (ARABEX+CRNN) can be seamlessly integrated into automated sorting systems to extract expiry dates and sort products accordingly during the manufacturing stage. By eliminating the need for manual entry of expiration dates, which can be time-consuming and inefficient for merchants, our approach offers significant results in terms of efficiency and accuracy for Arabic dot-matrix expiration date recognition.
Keywords: computer vision, deep learning, image processing, character recognition
Procedia PDF Downloads 82
1675 Changes in Forest Cover Regulate Streamflow in Central Nigerian Gallery Forests
Authors: Rahila Yilangai, Sonali Saha, Amartya Saha, Augustine Ezealor
Abstract:
Gallery forests in sub-Saharan Africa are drastically disappearing due to intensive anthropogenic activities thus reducing ecosystem services, one of which is water provisioning. The role played by forest cover in regulating streamflow and water yield is not well understood, especially in West Africa. This pioneering 2-year study investigated the interrelationships between plant cover and hydrology in protected and unprotected gallery forests. Rainfall, streamflow, and evapotranspiration (ET) measurements/estimates over 2015-2016 were obtained to form a water balance for both catchments. In addition, transpiration in the protected gallery forest with high vegetation cover was calculated from stomatal conductance readings of selected species chosen from plot level data of plant diversity and abundance. Results showed that annual streamflow was significantly higher in the unprotected site than the protected site, even when normalized by catchment area. However, streamflow commenced earlier and lasted longer in the protected site than the degraded unprotected site, suggesting regulation by the greater tree density in the protected site. Streamflow correlated strongly with rainfall with the highest peak in August. As expected, transpiration measurements were less than potential evapotranspiration estimates, while rainfall exceeded ET in the water cycle. The water balance partitioning suggests that the lower vegetation cover in the unprotected catchment leads to a larger runoff in the rainy season and less infiltration, thereby leading to streams drying up earlier, than in the protected catchment. 
This baseline information is important for understanding the contribution of plants to water cycle regulation, for modeling integrative water management in applied research, and for natural resource management aimed at sustaining water resources under changing land cover and climate uncertainties in this data-poor region.
Keywords: evapotranspiration, gallery forest, rainfall, streamflow, transpiration
Procedia PDF Downloads 174
1674 Economic Development and New Challenges: Biomass Energy and Sustainability
Authors: Fabricia G. F. S. Rossato, Ieda G. Hidalgo, Andres Susseta, Felipe Casale, Leticia H. Nakamiti
Abstract:
This research was conducted to show that forest waste and the black liquor from the pulping process are useful sources of biomass energy. This energy source could help improve the local environment in a sustainable way. The research demonstrates the challenges of producing biomass energy and of establishing the pulp industry in the city of Três Lagoas, MS, Brazil. The potential of planted forests, energy production in the pulp industry, and the resulting impacts on the local environment were also studied and examined. The present study is classified as descriptive in purpose, as it exposes the characteristics of a given population, and as bibliographical and documentary in its means. All the data and information collected and presented in this study were carefully analyzed and obtained from reliable sources such as official government agencies.
Keywords: Brazil, pulp industry, renewable energy, Três Lagoas
Procedia PDF Downloads 328
1673 Features Reduction Using Bat Algorithm for Identification and Recognition of Parkinson Disease
Authors: P. Shrivastava, A. Shukla, K. Verma, S. Rungta
Abstract:
Parkinson's disease is a chronic neurological disorder that directly affects human gait. It leads to slowness of movement and causes muscle rigidity and tremors. Gait serves as a primary outcome measure for studies aiming at early recognition of the disease. Using gait techniques, this paper implements an efficient binary bat algorithm for early detection of Parkinson's disease by selecting the optimal features required to distinguish affected patients from others. Data from 166 people, both fit and affected, are collected, and optimal feature selection is performed using PSO and the bat algorithm. The reduced dataset is then classified using a neural network. The experiments indicate that the binary bat algorithm outperforms traditional PSO and the genetic algorithm and gives a fairly good recognition rate even with the reduced dataset.
Keywords: parkinson, gait, feature selection, bat algorithm
Procedia PDF Downloads 545
1672 Characterization of Forest Fire Fuel in Shivalik Himalayas Using Hyperspectral Remote Sensing
Authors: Neha Devi, P. K. Joshi
Abstract:
A fire fuel map is one of the most critical factors for planning and managing fire hazard and risk. Wildfire is one of the most significant forms of global disturbance, impacting community dynamics, biogeochemical cycles, and local and regional climate across a wide range of ecosystems, from boreal forests to tropical rainforests. Assessment of fire danger is a function of forest type, fuelwood stock volume, moisture content, degree of senescence, and the fire management strategy adopted on the ground. Remote sensing has the potential to reduce the uncertainty in mapping fuels. Hyperspectral remote sensing is emerging as a very promising technology for wildfire fuel characterization. Fine spectral information also facilitates mapping of biophysical and chemical information that is directly related to the quality of forest fire fuels, including above-ground live biomass, canopy moisture, etc. We analysed four fuel characteristics using imagery from the Hyperion sensor on board the EO-1 satellite, acquired in February 2016 over the Shivalik Himalayas and covering the area of Champawat, Uttarakhand state. The main objective of this study was to present an overview of methodologies for mapping fuel properties using hyperspectral remote sensing data. The fuel characteristics analysed include fuel biomass, fuel moisture, fuel condition, and fuel type. Fuel moisture and fuel biomass were assessed through the expression of the liquid water bands. Fuel condition and type were assessed using green vegetation, non-photosynthetic vegetation, and soil as endmembers for spectral mixture analysis. Linear Spectral Unmixing, a partial spectral unmixing algorithm, was used to identify the spectral abundance of green vegetation, non-photosynthetic vegetation, and soil.
Keywords: forest fire fuel, Hyperion, hyperspectral, linear spectral unmixing, spectral mixture analysis
Procedia PDF Downloads 165
1671 Real Estate Trend Prediction with Artificial Intelligence Techniques
Authors: Sophia Liang Zhou
Abstract:
For investors, businesses, consumers, and governments, an accurate assessment of future housing prices is crucial to critical decisions in resource allocation, policy formation, and investment strategies. Previous studies are contradictory about macroeconomic determinants of housing price and largely focused on one or two areas using point prediction. This study aims to develop data-driven models to accurately predict future housing market trends in different markets. This work studied five different metropolitan areas representing different market trends and compared three-time lagging situations: no lag, 6-month lag, and 12-month lag. Linear regression (LR), random forest (RF), and artificial neural network (ANN) were employed to model the real estate price using datasets with S&P/Case-Shiller home price index and 12 demographic and macroeconomic features, such as gross domestic product (GDP), resident population, personal income, etc. in five metropolitan areas: Boston, Dallas, New York, Chicago, and San Francisco. The data from March 2005 to December 2018 were collected from the Federal Reserve Bank, FBI, and Freddie Mac. In the original data, some factors are monthly, some quarterly, and some yearly. Thus, two methods to compensate missing values, backfill or interpolation, were compared. The models were evaluated by accuracy, mean absolute error, and root mean square error. The LR and ANN models outperformed the RF model due to RF’s inherent limitations. Both ANN and LR methods generated predictive models with high accuracy ( > 95%). It was found that personal income, GDP, population, and measures of debt consistently appeared as the most important factors. It also showed that technique to compensate missing values in the dataset and implementation of time lag can have a significant influence on the model performance and require further investigation. 
The best performing models varied for each area, but the backfilled 12-month lag LR models and the interpolated no lag ANN models showed the best stable performance overall, with accuracies > 95% for each city. This study reveals the influence of input variables in different markets. It also provides evidence to support future studies to identify the optimal time lag and data imputation methods for establishing accurate predictive models.
Keywords: linear regression, random forest, artificial neural network, real estate price prediction
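The two imputation strategies compared in this abstract, backfilling and interpolation, differ as sketched below for a series in which quarterly or yearly values leave gaps at the monthly level. The function names and the use of `None` for a missing month are assumptions for illustration; the abstract does not describe the authors' exact procedure.

```python
def backfill(values):
    """Replace each None with the next observed value.

    Trailing Nones (with no later observation) are left as None.
    """
    filled = list(values)
    next_seen = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_seen
        else:
            next_seen = filled[i]
    return filled


def interpolate(values):
    """Linearly interpolate Nones between two observed values.

    Leading and trailing Nones are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled
```

Backfilling produces a step function that repeats each reported value over the preceding gap, while interpolation assumes a smooth change between reports; this difference in shape is one plausible reason the choice can affect model performance.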
Procedia PDF Downloads 103
1670 A Transformer-Based Question Answering Framework for Software Contract Risk Assessment
Authors: Qisheng Hu, Jianglei Han, Yue Yang, My Hoa Ha
Abstract:
When a company is considering purchasing software for commercial use, contract risk assessment is critical to identify risks to mitigate the potential adverse business impact, e.g., security, financial and regulatory risks. Contract risk assessment requires reviewers with specialized knowledge and time to evaluate the legal documents manually. Specifically, validating contracts for a software vendor requires the following steps: manual screening, interpreting legal documents, and extracting risk-prone segments. To automate the process, we proposed a framework to assist legal contract document risk identification, leveraging pre-trained deep learning models and natural language processing techniques. Given a set of pre-defined risk evaluation problems, our framework utilizes the pre-trained transformer-based models for question-answering to identify risk-prone sections in a contract. Furthermore, the question-answering model encodes the concatenated question-contract text and predicts the start and end position for clause extraction. Due to the limited labelled dataset for training, we leveraged transfer learning by fine-tuning the models with the CUAD dataset to enhance the model. On a dataset comprising 287 contract documents and 2000 labelled samples, our best model achieved an F1 score of 0.687.
Keywords: contract risk assessment, NLP, transfer learning, question answering
Procedia PDF Downloads 129
1669 Problems in Computational Phylogenetics: The Germano-Italo-Celtic Clade
Authors: Laura Mclean
Abstract:
A recurring point of interest in computational phylogenetic analysis of Indo-European family trees is the inference of a Germano-Italo-Celtic clade in some versions of the trees produced. The presence of this clade in the models is intriguing as there is little evidence for innovations shared among Germanic, Italic, and Celtic, the evidence generally used in the traditional method to construct a subgroup. One source of this unexpected outcome could be the input to the models. The datasets in the various models used so far, for the most part, take as their basis the Swadesh list, a list compiled by Morris Swadesh and then revised several times, containing up to 207 words that he believed were resistant to change among languages. The judgments made by Swadesh for this list, however, were subjective and based on his intuition rather than rigorous analysis. Some scholars used the Swadesh 200 list as the basis for their Indo-European dataset and made cognacy judgements for each of the words on the list. Another dataset is largely based on the Swadesh 207 list as well although the authors include additional lexical and non-lexical data, and they implement ‘split coding’ to deal with cases of polymorphic characters. A different team of scholars uses a different dataset, IECoR, which combines several different lists, one of which is the Swadesh 200 list. In fact, the Swadesh list is used in some form in every study surveyed and each dataset has three words that, when they are coded as cognates, seemingly contribute to the inference of a Germano-Italo-Celtic clade which could happen due to these clades sharing three words among only themselves. These three words are ‘fish’, ‘flower’, and ‘man’ (in the case of ‘man’, one dataset includes Lithuanian in the cognacy coding and removes the word ‘man’ from the screened data). 
This collection of cognates shared among Germanic, Italic, and Celtic that were deemed important enough to be included on the Swadesh list, without the ability to account for possible reasons for shared cognates that are not shared innovations, gives an impression of affinity between the Germanic, Celtic, and Italic branches without adequate methodological support. However, by changing how cognacy is defined (i.e., root cognates, borrowings vs. inherited cognates, etc.), we will be able to identify whether these three cognates are significant enough to infer a clade for Germanic, Celtic, and Italic. This paper examines the question of what definition of cognacy should be used for phylogenetic datasets by examining the Germano-Italo-Celtic clade as a case study and offers insights into the reconstruction of a Germano-Italo-Celtic clade.
Keywords: historical, computational, Italo-Celtic, Germanic
Procedia PDF Downloads 51
1668 Improving Chest X-Ray Disease Detection with Enhanced Data Augmentation Using Novel Approach of Diverse Conditional Wasserstein Generative Adversarial Networks
Authors: Malik Muhammad Arslan, Muneeb Ullah, Dai Shihan, Daniyal Haider, Xiaodong Yang
Abstract:
Chest X-rays are instrumental in the detection and monitoring of a wide array of diseases, including viral infections such as COVID-19, tuberculosis, pneumonia, lung cancer, and various cardiac and pulmonary conditions. To enhance the accuracy of diagnosis, artificial intelligence (AI) algorithms, particularly deep learning models like Convolutional Neural Networks (CNNs), are employed. However, these deep learning models demand a substantial and varied dataset to attain optimal precision. Generative Adversarial Networks (GANs) can be employed to create new data, thereby supplementing the existing dataset and enhancing the accuracy of deep learning models. Nevertheless, GANs have their limitations, such as issues related to stability, convergence, and the ability to distinguish between authentic and fabricated data. To overcome these challenges and advance the detection and classification of normal and abnormal chest X-ray (CXR) images, this study introduces a technique known as DCWGAN (Diverse Conditional Wasserstein GAN) for generating synthetic CXR images. The study evaluates the effectiveness of the DCWGAN technique using the ResNet50 model and compares its results with those obtained using the traditional GAN approach. The findings reveal that the ResNet50 model trained on the DCWGAN-generated dataset outperformed the model trained on the classic GAN-generated dataset. Specifically, the ResNet50 model utilizing DCWGAN synthetic images achieved an accuracy of 0.961, precision of 0.955, recall of 0.970, and F1-measure of 0.963. These results indicate the promising potential of this approach for the early detection of diseases in CXR images.
Keywords: CNN, classification, deep learning, GAN, Resnet50
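As a quick consistency check on the reported metrics, the F1-measure can be recomputed as the harmonic mean of precision and recall. The sketch below uses only the rounded values quoted in the abstract; the function name `f1_score` is illustrative and is not the authors' evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Rounded values as quoted in the abstract.
reported_precision = 0.955
reported_recall = 0.970

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))
```

The recomputed value is about 0.962, which sits within 0.001 of the reported 0.963; a gap of that size is plausibly explained by rounding of the published precision and recall.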
Procedia PDF Downloads 88
1667 Effect of Acid-Basic Treatments of Lingocellulosic Material Forest Wastes Wild Carob on Ethyl Violet Dye Adsorption
Authors: Abdallah Bouguettoucha, Derradji Chebli, Tariq Yahyaoui, Hichem Attout
Abstract:
The effect of acid-basic treatment of lignocellulosic material (forest wastes of wild carob) on ethyl violet adsorption was investigated. It was found that surface chemistry plays an important role in ethyl violet (EV) adsorption. HCl treatment produces more active acidic surface groups, such as carboxylic and lactone groups, resulting in an increase in the adsorption of EV dye. The adsorption efficiency was higher for the lignocellulosic material treated with HCl than for the material treated with KOH. The maximum biosorption capacities at pH 6 were 170 and 130 mg/g for the HCl-treated and KOH-treated materials, respectively. It was also found that equilibrium was reached in less than 25 min for both treated materials. The adsorption of the basic dye (i.e., ethyl violet, or basic violet 4) was carried out by varying process parameters such as initial concentration, pH, and temperature. The adsorption process can be well described by a pseudo-second-order reaction model, showing that boundary layer resistance was not the rate-limiting step, as confirmed by intraparticle diffusion, since the linear plot of Qt versus t^0.5 did not pass through the origin. In addition, the experimental data were described more accurately by the Sips equation than by the Langmuir and Freundlich isotherms. The values of ΔG° and ΔH° confirmed that the adsorption of EV on acid-basic treated forest waste wild carob was spontaneous and endothermic in nature. The positive values of ΔS° suggested an increase in randomness at the treated lignocellulosic material-solution interface during the adsorption process.
Keywords: adsorption, isotherm models, thermodynamic parameters, wild carob
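For reference, the kinetic, isotherm, and thermodynamic models named in this abstract are commonly written as follows. These are the standard textbook forms, not equations quoted from the paper; the symbols ($q_t$ for the abstract's Qt, $q_e$, $k_2$, $k_{id}$, $C$, $q_m$, $K_S$, $n$, $C_e$) are assumed notation, and the Sips equation in particular appears in several equivalent parameterizations.

```latex
% Pseudo-second-order kinetics (rate form and linearized form)
\frac{dq_t}{dt} = k_2 \,(q_e - q_t)^2
\qquad\Longrightarrow\qquad
\frac{t}{q_t} = \frac{1}{k_2 q_e^{2}} + \frac{t}{q_e}

% Intraparticle diffusion: a nonzero intercept C means the line
% does not pass through the origin
q_t = k_{id}\, t^{0.5} + C

% Sips isotherm (one common parameterization)
q_e = \frac{q_m \,(K_S C_e)^{n}}{1 + (K_S C_e)^{n}}

% Thermodynamics: spontaneous if \Delta G^\circ < 0,
% endothermic if \Delta H^\circ > 0
\Delta G^\circ = \Delta H^\circ - T\,\Delta S^\circ
```

With $\Delta H^\circ > 0$ and $\Delta S^\circ > 0$, the last relation gives a more negative $\Delta G^\circ$ as temperature rises, which is consistent with the spontaneous, endothermic behaviour the abstract reports.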
Procedia PDF Downloads 277
1666 Machine Learning Methods for Network Intrusion Detection
Authors: Mouhammad Alkasassbeh, Mohammad Almseidin
Abstract:
Network security engineers work to keep services available at all times by handling intruder attacks. An Intrusion Detection System (IDS) is one of the available mechanisms used to detect and classify abnormal actions. Therefore, the IDS must always be up to date with the latest intruder attack signatures to preserve the confidentiality, integrity, and availability of the services. The speed of the IDS is a very important issue, as is learning new attacks. This research work illustrates how the Knowledge Discovery and Data Mining (KDD, or Knowledge Discovery in Databases) dataset is very useful for testing and evaluating different machine learning techniques. It mainly focuses on the KDD preprocessing stage in order to prepare a sound and fair experimental dataset. The J48, MLP, and Bayes Network classifiers were chosen for this study. The results show that the J48 classifier achieved the highest accuracy rate for detecting and classifying all KDD dataset attacks, which are of type DOS, R2L, U2R, and PROBE.
Procedia PDF Downloads 235
1665 Multivariate Analysis of Spectroscopic Data for Agriculture Applications
Authors: Asmaa M. Hussein, Amr Wassal, Ahmed Farouk Al-Sadek, A. F. Abd El-Rahman
Abstract:
In this study, a multivariate analysis of potato spectroscopic data is presented to detect the presence or absence of brown rot disease. Near-Infrared (NIR) spectroscopy (1,350-2,500 nm) combined with multivariate analysis was used as a rapid, non-destructive technique for the detection of brown rot disease in potatoes. Spectral measurements were performed on 565 samples, which were chosen randomly at the infection site in the potato slice. Of these, 254 infected and 311 uninfected (brown rot-free) samples were analyzed using different advanced statistical analysis techniques. The discrimination performance of different multivariate analysis techniques, including classification, pre-processing, and dimension reduction, was compared. Applying a random forest classifier with different pre-processing techniques to raw spectra gave the best performance, achieving a total classification accuracy of 98.7% in discriminating infected potatoes from controls.
Keywords: Brown rot disease, NIR spectroscopy, potato, random forest
Procedia PDF Downloads 190
1664 Satellite LiDAR-Based Digital Terrain Model Correction using Gaussian Process Regression
Authors: Keisuke Takahata, Hiroshi Suetsugu
Abstract:
Forest height is an important parameter for forest biomass estimation, and precise elevation data is essential for accurate forest height estimation. There are several globally or nationally available digital elevation models (DEMs), such as SRTM and ASTER. However, their accuracy is reported to be low, particularly in mountainous areas with closed canopies or steep slopes. Recently, space-borne LiDAR, such as the Global Ecosystem Dynamics Investigation (GEDI), has started to provide sparse but accurate ground elevation and canopy height estimates. Several studies have reported a high degree of accuracy in these elevation products at their exact footprints, but it is not clear how this sparse information can be used over a wider area. In this study, we developed a digital terrain model correction algorithm that spatially interpolates the difference between existing DEMs and GEDI elevation products using a Gaussian Process (GP) regression model. The results show that our GP-based methodology can reduce the mean bias of the elevation data from 3.7 m to 0.3 m when airborne LiDAR-derived elevation information is used as ground truth. Our algorithm is also capable of quantifying the elevation data uncertainty, which is a critical requirement for biomass inventory. Upcoming satellite LiDAR missions, such as MOLI (Multi-footprint Observation Lidar and Imager), are expected to contribute to more accurate digital terrain model generation.
Keywords: digital terrain model, satellite LiDAR, gaussian processes, uncertainty quantification
Procedia PDF Downloads 183
1663 Effect of Mangrove Forests in Coastal Flood and Erosion
Authors: Majid Samiee Zenoozian
Abstract:
This paper studies the susceptibility of local settlements in the Gulf of Oman mangrove forest zone to flooding and develops an understanding of perceptions of and responses to historical and present coastal flooding. It is inferred that erosion in coastal zones is produced by changes in mangrove vegetation resulting from the enduring influence of people since the late 19th century. Faced with the increasing impact of climate change on climate-driven hazards such as flooding and biodiversity loss, managing the relationship between mangroves and their environment has become imperative for their protection. Coastal flood risks are increasing quickly. We offer high-resolution estimates of the economic value of mangrove forests for flood risk reduction. We develop a probabilistic, process-based estimate of the effects of mangroves on avoiding harm to people and property. More significantly, the study also establishes how the continued degradation of this significant ecosystem has the potential to adversely influence future cyclone-induced hazards in the area.
Keywords: mangrove forest, coastal, flood, erosion
Procedia PDF Downloads 118
1662 Monitoring Three-Dimensional Models of Tree and Forest by Using Digital Close-Range Photogrammetry
Authors: S. Y. Cicekli
Abstract:
In this study, a three-dimensional model of a tree was created using terrestrial close-range photogrammetry. For this purpose, close-range photos were taken. Photomodeler Pro 5 software was used for camera calibration and to create the three-dimensional models of trees. In the first test, a three-dimensional model of a single tree was created; in the second test, a three-dimensional model of three trees was created. The aim of this study is to create three-dimensional models of trees and to demonstrate the use of close-range photogrammetry in forestry. At the end of the study, three-dimensional models of one tree and of three trees were created. This study showed the usability of close-range photogrammetry for monitoring three-dimensional models of trees and forests.
Keywords: close-range photogrammetry, forest, tree, three-dimensional model
Procedia PDF Downloads 389
1661 Application of Multilayer Perceptron and Markov Chain Analysis Based Hybrid-Approach for Predicting and Monitoring the Pattern of LULC Using Random Forest Classification in Jhelum District, Punjab, Pakistan
Authors: Basit Aftab, Zhichao Wang, Feng Zhongke
Abstract:
Land Use and Land Cover Change (LULCC) is a critical environmental issue that has significant effects on biodiversity, ecosystem services, and climate change. This study examines the spatiotemporal dynamics of land use and land cover (LULC) across a three-decade period (1992–2022) in a district area. The goal is to support sustainable land management and urban planning by combining remote sensing, GIS data, and observations from the Landsat 5 and 8 satellites to provide precise predictions of the trajectory of urban sprawl. To forecast LULCC patterns, this study proposes a hybrid strategy that combines the Random Forest method with Multilayer Perceptron (MLP) and Markov Chain analysis. To predict the dynamics of LULC change for the year 2035, a hybrid technique based on Multilayer Perceptron and Markov Chain Model Analysis (MLP-MCA) was employed. The area of developed land has increased significantly, while the areas of bare land, vegetation, and forest cover have all decreased, because the principal land types have changed due to population growth and economic expansion. The study also found that between 1998 and 2023, the built-up area increased by 468 km² as a result of the replacement of natural resources. It is estimated that urbanization of the study area will increase by 25.04% by 2035. The performance of the model was confirmed with an overall accuracy of 90% and a kappa coefficient of around 0.89. It is important to use advanced predictive models to guide sustainable urban development strategies. This study provides valuable insights for policymakers, land managers, and researchers to support sustainable land use planning, conservation efforts, and climate change mitigation strategies.
Keywords: land use land cover, Markov chain model, multi-layer perceptron, random forest, sustainable land, remote sensing
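The two quantities this abstract relies on, the accuracy/kappa pair used to validate a classification and the Markov-chain step used to project class areas forward, can be sketched as follows. This is a minimal illustration under stated assumptions: the three classes, the confusion matrix, the transition probabilities, and the areas are all hypothetical numbers, and the function names are not from the study.

```python
from typing import List

Matrix = List[List[float]]

def overall_accuracy(confusion: Matrix) -> float:
    """Share of samples on the diagonal (rows = reference, columns = predicted)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def cohen_kappa(confusion: Matrix) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = overall_accuracy(confusion)
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(n)
    ) / (total * total)
    return (observed - expected) / (1 - expected)

def project_areas(areas: List[float], transition: Matrix) -> List[float]:
    """One Markov step: new_area[j] = sum over i of areas[i] * P(i -> j)."""
    return [
        sum(areas[i] * transition[i][j] for i in range(len(areas)))
        for j in range(len(transition[0]))
    ]

# Hypothetical classes: built-up, vegetation, bare land.
confusion = [[45, 3, 2], [4, 40, 6], [1, 5, 44]]
transition = [[0.95, 0.03, 0.02], [0.10, 0.85, 0.05], [0.15, 0.05, 0.80]]
areas_km2 = [500.0, 2000.0, 1000.0]

print(overall_accuracy(confusion), cohen_kappa(confusion))
print(project_areas(areas_km2, transition))
```

Because each row of the transition matrix sums to 1, the projection redistributes area between classes while keeping the total constant. Kappa is lower than overall accuracy in this example because part of the agreement would be expected by chance alone.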
Procedia PDF Downloads 34
1660 Co-management Organizations: A Way to Facilitate Sustainable Management of the Sundarbans Mangrove Forests of Bangladesh
Authors: Md. Wasiul Islam, Md. Jamius Shams Sowrov
Abstract:
The Sundarbans is the largest single tract of mangrove forest in the world. It is located in the southwest corner of Bangladesh. It is a unique ecosystem that serves as a breeding and nursery ground for rich biodiversity. It supports the livelihood of about 3.5 million coastal dwellers and also protects the coastal belt and inland areas from various natural calamities. Historically, the management of the Sundarbans was controlled by the Bangladesh Forest Department following a top-down approach without the involvement of local communities. Such a fence-and-fine-based blueprint approach was not effective in protecting the forest, which caused the Sundarbans to degrade severely in the recent past. Fifty percent of the total tree cover has been lost in the last 30 years. Therefore, a local multi-stakeholder, bottom-up co-management approach was introduced in some parts of the Sundarbans in 2006 to improve the biodiversity status by enhancing the protection level of the forest. Various co-management organizations were introduced under the co-management approach, through which local community members could be actively involved in various activities related to the management and welfare of the Sundarbans, including the decision-making process. Against this backdrop, the objective of the study was to assess the performance of co-management organizations in facilitating sustainable management of the Sundarbans mangrove forests. The qualitative study used face-to-face interviews to collect data with two sets of semi-structured questionnaires. A total of 40 respondents from eight villages under two forest ranges participated in the research. Of these, 32 representatives from the local communities and 8 official representatives involved in the co-management approach were interviewed using the snowball sampling technique.
The study shows that the co-management approach improved the governance system of the Sundarbans through active participation of local community members and their interactions with officials via the platform of co-management organizations. It facilitated accountability and transparency to some extent through formal and informal rules and regulations. It also improved the power structure of the management process by fostering local empowerment, particularly of women. Moreover, people were able to learn from their interactions with and within the co-management organizations, and interventions improved environmental awareness and promoted social learning. The respondents considered good governance the most important factor for achieving the goal of sustainable management and biodiversity conservation of the Sundarbans. The success of the co-management planning process also depends on the active and functional participation of different stakeholders, including the local communities, for which co-management organizations were considered the most functional platform. However, the governance system also faced various challenges that created barriers to the sustainable management of the Sundarbans mangrove forest. Some members were still involved in illegal forest operations and created obstacles to sustainable management of the Sundarbans. Respondents recommended greater patronage from the government and financial and logistic incentives for alternative income generation opportunities, together with an effective participatory monitoring and evaluation system, to improve sustainable management of the Sundarbans.
Keywords: Bangladesh, co-management approach, co-management organizations, governance, Sundarbans, sustainable management
Procedia PDF Downloads 179
1659 Improving Lane Detection for Autonomous Vehicles Using Deep Transfer Learning
Authors: Richard O’Riordan, Saritha Unnikrishnan
Abstract:
Autonomous Vehicles (AVs) are incorporating an increasing number of ADAS features, including automated lane-keeping systems. In recent years, many research papers on lane detection algorithms have been published, ranging from computer vision techniques to deep learning methods. The transition from the lower levels of autonomy defined in the SAE framework to higher autonomy levels requires increasingly complex models and algorithms that must be highly reliable in their operation and functionality. Furthermore, these algorithms have no room for error when operating at high levels of autonomy. Although current research details existing computer vision and deep learning algorithms, their methodologies, and individual results, it also details the challenges faced by the algorithms and the resources needed to operate them, along with shortcomings experienced when detecting lanes in certain weather and lighting conditions. This paper will explore these shortcomings and attempt to implement a lane detection algorithm that could be used to achieve improvements in AV lane detection systems. This paper uses a pre-trained LaneNet model to classify pixels as lane or non-lane using binary segmentation as the base detection method, first on the existing BDD100k dataset and then on a custom dataset generated locally. The first selected road network will consist of modern, well-laid roads with up-to-date infrastructure and lane markings, while the second road network will be an older road with infrastructure and lane markings reflecting the road network's age. The performance of the proposed method on the custom dataset will be compared with its performance on the BDD100k dataset. In summary, this paper will use transfer learning to provide a fast and robust lane detection algorithm that can handle various road conditions and provide accurate lane detection.
Keywords: ADAS, autonomous vehicles, deep learning, LaneNet, lane detection
Procedia PDF Downloads 104