Search results for: forest cover-type dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1935

1665 Polarimetric Synthetic Aperture Radar Data Classification Using Support Vector Machine and Mahalanobis Distance

Authors: Najoua El Hajjaji El Idrissi, Necip Gokhan Kasapoglu

Abstract:

Polarimetric Synthetic Aperture Radar-based imaging is a powerful technique used for earth observation and classification of surfaces. Forest evolution has been one of the vital areas of attention for remote sensing experts. Information about forest areas can be obtained by remote sensing, whether by using active radars or optical instruments. However, due to weather constraints such as cloud cover, only limited information can be recovered from optical data, and for that reason Polarimetric Synthetic Aperture Radar (PolSAR) is used as a powerful tool for forest inventory. In this paper, we applied a support vector machine (SVM) and the Mahalanobis distance to fully polarimetric AIRSAR P-, L-, and C-band data from the Nezer forest area; the classification is based on the separation of different tree ages. The classification results were evaluated, and they show that the SVM performs better than the Mahalanobis distance, achieving approximately 75% accuracy. This result indicates that SVM classification can be a useful method for evaluating fully polarimetric SAR data with sufficient accuracy.
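
As a rough illustration of the Mahalanobis-distance baseline mentioned in the abstract above, the sketch below assigns a sample to the class whose mean is nearest in Mahalanobis distance. It is not the authors' pipeline: the two-feature samples, the class names "young" and "old", and the helper names are all hypothetical, and real PolSAR features would have more dimensions.

```python
import math

def mean_and_cov(samples):
    """Return the mean vector and 2x2 sample covariance of 2-D samples."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of point x from a class with this mean and covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form diff' * inverse(cov) * diff, with the 2x2 inverse written out.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)

def classify(x, class_stats):
    """Pick the class label with the smallest Mahalanobis distance to x."""
    return min(class_stats, key=lambda label: mahalanobis(x, *class_stats[label]))

# Hypothetical training samples (two made-up features per sample).
training = {
    "young": [(1.0, 2.0), (1.5, 1.8), (0.8, 2.4), (1.2, 2.1)],
    "old": [(4.0, 5.0), (4.5, 5.5), (3.8, 4.6), (4.2, 5.2)],
}
stats = {label: mean_and_cov(points) for label, points in training.items()}
print(classify((1.1, 2.0), stats))
```

Unlike a plain Euclidean nearest-mean rule, this distance accounts for the spread and correlation of each class, which is why it is a common baseline against which an SVM is compared.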

Keywords: classification, synthetic aperture radar, SAR polarimetry, support vector machine, mahalanobis distance

Procedia PDF Downloads 102
1664 Dataset Quality Index: Development of Composite Indicator Based on Standard Data Quality Indicators

Authors: Sakda Loetpiparwanich, Preecha Vichitthamaros

Abstract:

Nowadays, poor data quality is considered one of the major costs of a data project. A data project with data quality awareness spends considerable time on data quality processes, while a data project without data quality awareness suffers negative impacts on financial resources, efficiency, productivity, and credibility. One of the processes that takes a long time is defining the expectations and measurements of data quality, because the expectations differ according to the purpose of each data project. This is especially true of big data projects, which may involve many datasets and stakeholders and therefore take a long time to discuss and define quality expectations and measurements. Therefore, this study aimed at developing meaningful indicators that describe the overall data quality of each dataset, for quick comparison and prioritization. The objectives of this study were to: (1) develop practical data quality indicators and measurements, (2) develop data quality dimensions based on statistical characteristics, and (3) develop a composite indicator that can describe the overall data quality of each dataset. The sample consisted of more than 500 datasets from public sources obtained by random sampling. After the datasets were collected, the Dataset Quality Index (SDQI) was developed in five steps. First, we defined standard data quality expectations. Second, we identified indicators that can be measured directly from the data within datasets. Third, the indicators were aggregated into dimensions using factor analysis. Next, the indicators and dimensions were weighted by the effort required for data preparation and by usability. Finally, the dimensions were aggregated into the composite indicator. The results of these analyses showed that: (1) the developed indicators and measurements comprised ten indicators; (2) for the data quality dimensions based on statistical characteristics, the ten indicators can be reduced to four dimensions; 
(3) for the composite indicator, the SDQI can describe the overall quality of each dataset and can separate datasets into three levels: Good Quality, Acceptable Quality, and Poor Quality. In conclusion, the SDQI provides an overall description of data quality within datasets and a meaningful composition. We can use the SDQI to assess all data in a data project and to support effort estimation and prioritization. The SDQI also works well with the Agile method, where it can be used for assessment in the first sprint. After passing the initial evaluation, more specific data quality indicators can be added in the next sprint.

Keywords: data quality, dataset quality, data quality management, composite indicator, factor analysis, principal component analysis

Procedia PDF Downloads 110
1663 Enhancing Cultural Heritage Data Retrieval by Mapping COURAGE to CIDOC Conceptual Reference Model

Authors: Ghazal Faraj, Andras Micsik

Abstract:

The CIDOC Conceptual Reference Model (CRM) is an extensible ontology that provides integrated access to heterogeneous and digital datasets. The CIDOC-CRM offers a “semantic glue” intended to promote accessibility to several diverse and dispersed sources of cultural heritage data. That is achieved by providing a formal structure for the implicit and explicit concepts and their relationships in the cultural heritage field. The COURAGE (“Cultural Opposition – Understanding the CultuRal HeritAGE of Dissent in the Former Socialist Countries”) project aimed to explore methods about socialist-era cultural resistance during 1950-1990 and planned to serve as a basis for further narratives and digital humanities (DH) research. This project highlights the diversity of flourished alternative cultural scenes in Eastern Europe before 1989. Moreover, the dataset of COURAGE is an online RDF-based registry that consists of historical people, organizations, collections, and featured items. For increasing the inter-links between different datasets and retrieving more relevant data from various data silos, a shared federated ontology for reconciled data is needed. As a first step towards these goals, a full understanding of the CIDOC CRM ontology (target ontology), as well as the COURAGE dataset, was required to start the work. Subsequently, the queries toward the ontology were determined, and a table of equivalent properties from COURAGE and CIDOC CRM was created. The structural diagrams that clarify the mapping process and construct queries are on progress to map person, organization, and collection entities to the ontology. Through mapping the COURAGE dataset to CIDOC-CRM ontology, the dataset will have a common ontological foundation with several other datasets. 
Therefore, the expected results are: 1) retrieving more detailed data about existing entities, 2) retrieving new entities’ data, 3) aligning COURAGE dataset to a standard vocabulary, 4) running distributed SPARQL queries over several CIDOC-CRM datasets and testing the potentials of distributed query answering using SPARQL. The next plan is to map CIDOC-CRM to other upper-level ontologies or large datasets (e.g., DBpedia, Wikidata), and address similar questions on a wide variety of knowledge bases.

Keywords: CIDOC CRM, cultural heritage data, COURAGE dataset, ontology alignment

Procedia PDF Downloads 119
1662 Effect of Thinning Practice on Carbon Storage in Soil Forest Northern Tunisia

Authors: Zouhaier Nasr, Mohamed Nouri

Abstract:

The increase in greenhouse gases since the pre-industrial period is a real threat to disrupting the balance of marine and terrestrial ecosystems. Along with the oceans, forest soils are considered to be the planet's second-largest carbon sink. North African forests have been subject to alarming degradation for several decades. The objective of this investigation is to determine and quantify the effect of thinning practiced in pine forests in northern Tunisia on the storage of organic carbon in the trees and in the soil. The plot planted in 1989 underwent thinning in 2005 on to plots; the density is therefore 1600 trees/ha in control and 400 trees/ha in thinning. Direct dendrometric measurements (diameter, height, branches, stem) were taken. In the soil part, six profiles of 1m / 1m / 1m were used for soil and root samples and biomass and organic matter measurements. The measurements obtained were statistically processed by appropriate software. The results clearly indicate that thinning improves tree growth, so the diameter increased from 24.3 cm to 30.1 cm. Carbon storage in the trunks was 35% more and 25% for the whole tree. At ground level, the thinned plot shows a slight increase in soil organic matter and quantity of carbon per tree, exceeding the control by 10 to 25%.

Keywords: forest, soil, carbon, climate change, Tunisia

Procedia PDF Downloads 98
1661 Developing Local Wisdom to Integrate Etnobiology and Biodiversity Conservation in Mount Ungaran, Central Java Indonesia

Authors: Margareta Rahayuningsih, Nur Rahayu Utami, Tsabit A. M., Muh. Abdullah

Abstract:

Mount Ungaran is one of the areas in Central Java, Indonesia, that still has natural forest. It consists of several habitats that support flora, fauna, and microorganism biodiversity, part of which is protected by government law and listed in the IUCN Red List. Mount Ungaran has therefore also been designated an AZE (Alliance for Zero Extinction) site and an IBA (Important Bird Area). Land use for agriculture and plantations reduces the forest-covered area, which is a serious threat to the existence of biodiversity in Mount Ungaran. This research identified community local wisdom that could be integrated into ethno-biological research and biodiversity conservation. The results showed at least four forms of local wisdom that could be integrated in this way: Wit Weh Woh (a ceremony of the life-giving tree), Grebeg Alas Susuk Wangan (a ceremony for forest protection), Iriban (a ceremony for the protection of clean water resources), and tingkep tandur (a ceremony for the protection of harvest-ready plants). Ethno-biological research on the values contained in local wisdom is needed, and these values should be developed as a strategy for biodiversity conservation in Mount Ungaran.

Keywords: Mount Ungaran, local wisdom, biodiversity, fragmentation

Procedia PDF Downloads 256
1660 Diversity and Ecological Analysis of Vascular Epiphytes in Gera Wild Coffee Forest, Jimma Zone of Oromia Regional State, Ethiopia

Authors: Bedilu Tafesse

Abstract:

The diversity and ecology of vascular epiphytes were studied in Gera Forest in southwestern Ethiopia at altitudes between 1600 and 2400 m.a.s.l. A total area of 4.5 ha was surveyed in coffee and non-coffee forest vegetation. Fifty sampling plots, each 30 m x 30 m (900 m²), were used for data collection. A total of 59 species of vascular epiphytes were recorded, of which 34 (59%) were holoepiphytes, two (4%) were hemiepiphytes and 22 (37%) were accidental vascular epiphytes. To study the altitudinal distribution of vascular epiphytes, altitudes were classified as higher (>2000), middle (1800-2000) and lower (1600-1800) m.a.s.l. According to the Shannon-Wiener index (H′ = 3.411) of alpha diversity, the diversity of the epiphyte community in the study area is medium. Epiphyte richness differed significantly among host bark types, as determined by one-way ANOVA (p = 0.001 < 0.05). The post-hoc test shows a significant difference in vascular epiphyte richness between smooth bark and rough, flack and corky bark (p = 0.001 < 0.05), as well as between rough and cork bark (p = 0.43 < 0.05). However, no significant difference in epiphyte abundance was observed between rough and flack bark (p = 0.753 > 0.05) or between flack and corky bark (p = 0.854 > 0.05). Rough bark accounted for 38% of vascular epiphyte abundance, corky bark for 26%, flack bark for 25%, and smooth bark for only 11%. The regression analysis (R² = 0.773, p = 0.0001 < 0.05) showed that the number of vascular epiphyte species and host DBH are positively correlated. The regression analysis (R² = 0.28, p = 0.0001 < 0.05) also showed that the number of species and host tree height are positively correlated. A host tree preference was recorded only for Vittaria volkensii, which was hosted on Syzygium guineense trees. 
The result of similarity analysis indicated that Gera Forest showed the highest vascular epiphytic similarity (0.35) with Yayu Forest and shared the least vascular epiphytic similarity (0.295) with Harenna Forest. It was concluded that horizontal stems and branches, large and rough, flack and corky bark type trees are more suitable for vascular epiphytes seedling attachments and growth. Conservation and protection of these phorophytes are important for the survival of vascular epiphytes and increase their ecological importance.
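
The Shannon-Wiener index reported in the abstract above can be computed from per-species abundance counts as H′ = -Σ p_i ln p_i. The following is a minimal sketch, assuming the natural logarithm is used (the abstract does not state the base). The function names and the example counts are hypothetical and are not the study's data.

```python
import math

def shannon_wiener(counts):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) over species with count > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def evenness(counts):
    """Pielou's evenness: H' divided by its maximum ln(S) for S observed species."""
    species = sum(1 for c in counts if c > 0)
    if species < 2:
        return 0.0
    return shannon_wiener(counts) / math.log(species)

# Hypothetical abundance counts for four species in one plot.
plot_counts = [12, 7, 3, 1]
print(round(shannon_wiener(plot_counts), 3), round(evenness(plot_counts), 3))
```

Under the natural-log assumption, the largest possible value for 59 equally abundant species would be ln(59), about 4.08, which gives a scale for reading the reported H′ = 3.411.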

Keywords: accidental epiphytes, hemiepiphyte, holoepiphyte, phorophyte

Procedia PDF Downloads 301
1659 Plant Identification Using Convolution Neural Network and Vision Transformer-Based Models

Authors: Virender Singh, Mathew Rees, Simon Hampton, Sivaram Annadurai

Abstract:

Plant identification is a challenging task that aims to identify the family, genus, and species according to plant morphological features. Automated deep learning-based computer vision algorithms are widely used for identifying plants and can help users narrow down the possibilities. However, numerous morphological similarities between and within species render correct classification difficult. In this paper, we tested custom convolution neural network (CNN) and vision transformer (ViT) based models using the PyTorch framework to classify plants. We used a large dataset of 88,000 provided by the Royal Horticultural Society (RHS) and a smaller dataset of 16,000 images from the PlantClef 2015 dataset for classifying plants at genus and species levels, respectively. Our results show that for classifying plants at the genus level, ViT models perform better compared to CNN-based models ResNet50 and ResNet-RS-420 and other state-of-the-art CNN-based models suggested in previous studies on a similar dataset. ViT model achieved top accuracy of 83.3% for classifying plants at the genus level. For classifying plants at the species level, ViT models perform better compared to CNN-based models ResNet50 and ResNet-RS-420, with a top accuracy of 92.5%. We show that the correct set of augmentation techniques plays an important role in classification success. In conclusion, these results could help end users, professionals and the general public alike in identifying plants quicker and with improved accuracy.

Keywords: plant identification, CNN, image processing, vision transformer, classification

Procedia PDF Downloads 62
1658 Restoration of a Forest Catchment in Himachal Pradesh, India: An Institutional Analysis

Authors: Sakshi Gupta, Kavita Sardana

Abstract:

Management of a forest catchment involves diverse dimensions, multiple stakeholders, and conflicting interests, primarily due to the wide variety of valuable ecosystem services offered by it. Often, the coordination among different levels of formal institutions governing the catchment, local communities, as well as societal norms, taboos, customs and practices, happens to be amiss, leading to conflicting policy interventions which prove detrimental for such resources. In the case of Ala Catchment, which is a protected forest located at a distance of 9 km North-East of the town of Dalhousie, within district Chamba of Himachal Pradesh, India, and serves as one of the primary sources of public water supply for the downstream town of Dalhousie and nearby areas, several policy measures have been adopted for the restoration of the forest catchment, as well as for the improvement of public water supply. These catchment forest restoration measures include; the installation of a fence along the perimeter of the catchment, plantation of trees in the empty patches of the forest, construction of check dams, contour trenches, contour bunds, issuance of grazing permits, and installation of check posts to keep track of trespassers. While the measures adopted to address the acute shortage of public water supply in the Dalhousie region include; building and maintenance of large capacity water storage tanks, laying of pipelines, expanding public water distribution infrastructure to include water sources other than Ala Catchment Forest and introducing of five new water supply schemes for drinking water as well as irrigation. However, despite these policy measures, the degradation of the Ala catchment and acute shortage of water supply continue to distress the region. This study attempts to conduct an institutional analysis to assess the impact of policy measures for the restoration of the Ala Catchment in the Chamba district of Himachal Pradesh in India. 
For this purpose, the theoretical framework of Ostrom’s Institutional Assessment and Development (IAD) Framework was used. Snowball sampling was used to conduct private interviews and focused group discussions. A semi-structured questionnaire was administered to interview a total of 184 respondents across stakeholders from both formal and informal institutions. The central hypothesis of the study is that the interplay of formal and informal institutions facilitates the implementation of policy measures for ameliorating Ala Catchment, in turn improving the livelihood of people depending on this forest catchment for direct and indirect benefits. The findings of the study suggest that leakages in the successful implementation of policy measures occur at several nodes of decision-making, which adversely impact the catchment and the ecosystem services provided by it. Some of the key reasons diagnosed by the immediate analysis include; ad-hoc assignment of property rights, rise in tourist inflow increasing the pressures on water demand, illegal trespassing by local and nomadic pastoral communities for grazing and unlawful extraction of forest products, and rent-seeking by a few influential formal institutions. Consequently, it is indicated that the interplay of formal and informal institutions may be obscuring the consequentiality of the policy measures on the restoration of the catchment.

Keywords: catchment forest restoration, institutional analysis and development framework, institutional interplay, protected forest, water supply management

Procedia PDF Downloads 52
1657 Interpretation of Time Series Groundwater Monitoring Data Using Analytical Impulse Response Function Method to Understand Groundwater Processes Along the Murray River Floodplain at Gunbower Forest, Victoria, Australia

Authors: Mark Hocking

Abstract:

There is concern about the potential impact environmental flooding may have on groundwater levels and salinity processes in the Murray-Darling Basin. A study was undertaken to determine if environmental flooding of the Gunbower Forest has an impact on groundwater level and salinity which is in Victoria, Australia. To assess the impact, Impulse Response Functions (IRFs) are applied to time series groundwater monitoring well data in the area surrounding Gunbower Forest. It is found that rainfall is the primary driver of seasonal water table fluctuation, and the Murray River water level is a secondary contributor to the water table fluctuations. The dominant process that influenced the long-term water table level and salinity conditions is associated with pressure changes in the deep regional aquifer. The study demonstrates that groundwater level fluctuations in the vicinity of Gunbower Forest do not correlate with flooding (natural or managed). Groundwater recharge is calculated by applying the bore hydrograph method to the rainfall-attributed forcing function fluctuations. Data collected from thirty-three bores between 1990 to 2020 is processed to determine a 30-year average groundwater recharge rate. A 5% specific yield of the unconfined aquifer is assumed based on previously published data. It is found that the rainfall-attributed mean annual groundwater recharge varied between 2 mm/year and 189 mm/year with a median of 33.6 mm/year. Surface water recharge is also calculated by analysing the surface water attributed forcing function fluctuations and found to be as high as 37 mm/year, with most of the high values in the vicinity of rivers or agricultural land. There is a long-term regional aquifer declining trend where most water table bores have an average falling trend of 20 cm/year independent of rainfall over the past 30 years. It is found that the groundwater level beneath the Gunbower Forest is dominated by groundwater evapotranspiration. 
Evapotranspiration lowers the water table by as much as 0.5 m within the forest, thereby causing a relative groundwater level depression under the Gunbower Forest. Historical data shows that groundwater salinity in the area varies and has an electrical conductivity of up to 45 000 µS/cm (comparable to seawater). High groundwater salinity occurs both within and outside the Gunbower Forest as well as adjacent to the Murray River. Available groundwater salinity data suggests trends are generally stable; however, data quality and collection frequency could be improved. This study shows that at the majority of locations analyzed, the groundwater recharge occurred due to both rainfall and water loss from the Murray River. It is found that Deep groundwater pressures determined the base groundwater level, and the fluctuation of the deeper aquifer pressures determined the environmental interaction at the water surface. Local groundwater processes, such as high evapotranspiration rates in Gunbower Forest, have the capacity to lower the water table locally. The rise or fall of the regional aquifer water level has the greatest influence on the groundwater salinity in and around Gunbower Forest.
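
The recharge figures in the abstract above rest on a specific yield of 5%. As a rough illustration, the sketch below assumes the bore hydrograph method reduces to the common water-table-fluctuation relation, recharge = specific yield × attributed water-table rise. The abstract does not spell out this formula, so both the relation and the function names are assumptions made for illustration.

```python
def recharge_mm(specific_yield, water_table_rise_mm):
    """Recharge (mm) implied by a water-table rise, assuming recharge = Sy * rise."""
    return specific_yield * water_table_rise_mm

def implied_rise_mm(recharge, specific_yield):
    """Water-table rise (mm) that would produce the given recharge (mm)."""
    return recharge / specific_yield

SPECIFIC_YIELD = 0.05  # 5% specific yield, as assumed in the abstract

# Rise implied by the reported median recharge of 33.6 mm/year.
print(implied_rise_mm(33.6, SPECIFIC_YIELD))
```

Under this assumption, the reported median recharge of 33.6 mm/year corresponds to roughly 672 mm of rainfall-attributed water-table rise per year.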

Keywords: groundwater data interpretation, groundwater monitoring, hydrogeology, impulse response function

Procedia PDF Downloads 28
1656 Single Imputation for Audiograms

Authors: Sarah Beaver, Renee Bryce

Abstract:

Audiograms detect hearing impairment, but missing values pose problems. This work explores imputation in an attempt to improve accuracy. It implements Linear Regression, Lasso, Linear Support Vector Regression, Bayesian Ridge, K Nearest Neighbors (KNN), and Random Forest machine learning techniques to impute audiogram frequencies ranging from 125 Hz to 8000 Hz. The data contains patients who had or were candidates for cochlear implants. Accuracy is compared across two different Nested Cross-Validation k values. Over 4000 audiograms from 800 unique patients were used. Additionally, training on combined left and right ear audiograms is compared with training on single-ear audiograms. The Root Mean Square Error (RMSE) values for the best Random Forest models range from 4.74 to 6.37, and their R² values range from .91 to .96. The RMSE values for the best KNN models range from 5.00 to 7.72, and their R² values range from .89 to .95. The best imputation models achieved R² between .89 and .96 and RMSE values less than 8 dB. We also show that classification models performed better with our best imputation models than with constant imputation, with a two percent increase in accuracy.
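
The two scores quoted in the abstract above, RMSE and R², can be computed for imputed values as in the following minimal sketch. The threshold values are made up for illustration and are not the study's data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between measured and imputed values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical hearing thresholds in dB: measured values vs. imputed values.
measured = [20, 35, 50, 65]
imputed = [25, 30, 55, 60]
print(rmse(measured, imputed))                  # every error is 5 dB, so this prints 5.0
print(round(r_squared(measured, imputed), 3))
```

RMSE is expressed in the same unit as the thresholds (dB), while R² is unitless, which is why the abstract reports both.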

Keywords: machine learning, audiograms, data imputations, single imputations

Procedia PDF Downloads 53
1655 PatchMix: Learning Transferable Semi-Supervised Representation by Predicting Patches

Authors: Arpit Rai

Abstract:

In this work, we propose PatchMix, a semi-supervised method for pre-training visual representations. PatchMix mixes patches of two images and then solves an auxiliary task of predicting the label of each patch in the mixed image. Our experiments on the CIFAR-10, CIFAR-100 and SVHN datasets show that the representations learned by this method encode useful information for transfer to new tasks and outperform the baseline Residual Network encoders by 12% on CIFAR-10 with ResNet-101 and 2% with ResNet-56, by 4% on CIFAR-100 with ResNet-101, and by 6% on SVHN with the ResNet-101 baseline model.
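
The patch-mixing step described in the abstract above could look roughly like the sketch below. The abstract does not specify the patch layout or mixing ratio, so the regular grid, the 50/50 choice per patch, and the function name `patch_mix` are all assumptions. Images are plain nested lists to keep the example dependency-free.

```python
import random

def patch_mix(img_a, img_b, label_a, label_b, patch, rng):
    """Mix two equally sized 2-D images patch by patch.

    Returns the mixed image and a grid holding, for each patch, the label of
    the image that patch was taken from (the target of the auxiliary task).
    """
    height, width = len(img_a), len(img_a[0])
    mixed = [row[:] for row in img_a]  # start from a copy of image A
    patch_labels = []
    for top in range(0, height, patch):
        label_row = []
        for left in range(0, width, patch):
            take_b = rng.random() < 0.5  # assumed 50/50 choice per patch
            if take_b:
                for r in range(top, min(top + patch, height)):
                    mixed[r][left:left + patch] = img_b[r][left:left + patch]
            label_row.append(label_b if take_b else label_a)
        patch_labels.append(label_row)
    return mixed, patch_labels

# Toy 4x4 "images": all zeros (label 0) and all ones (label 1).
image_a = [[0] * 4 for _ in range(4)]
image_b = [[1] * 4 for _ in range(4)]
mixed, labels = patch_mix(image_a, image_b, 0, 1, patch=2, rng=random.Random(0))
for row in mixed:
    print(row)
```

In a training loop, an encoder would then be asked to predict the `labels` grid from `mixed`; that part is omitted here because the abstract gives no architectural detail.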

Keywords: self-supervised learning, representation learning, computer vision, generalization

Procedia PDF Downloads 59
1654 Rd-PLS Regression: From the Analysis of Two Blocks of Variables to Path Modeling

Authors: E. Tchandao Mangamana, V. Cariou, E. Vigneau, R. Glele Kakai, E. M. Qannari

Abstract:

A new definition of a latent variable associated with a dataset makes it possible to propose variants of the PLS2 regression and the multi-block PLS (MB-PLS). We shall refer to these variants as Rd-PLS regression and Rd-MB-PLS respectively because they are inspired by both Redundancy analysis and PLS regression. Usually, a latent variable t associated with a dataset Z is defined as a linear combination of the variables of Z with the constraint that the length of the loading weights vector equals 1. Formally, t=Zw with ‖w‖=1. Denoting by Z' the transpose of Z, we define herein, a latent variable by t=ZZ’q with the constraint that the auxiliary variable q has a norm equal to 1. This new definition of a latent variable entails that, as previously, t is a linear combination of the variables in Z and, in addition, the loading vector w=Z’q is constrained to be a linear combination of the rows of Z. More importantly, t could be interpreted as a kind of projection of the auxiliary variable q onto the space generated by the variables in Z, since it is collinear to the first PLS1 component of q onto Z. Consider the situation in which we aim to predict a dataset Y from another dataset X. These two datasets relate to the same individuals and are assumed to be centered. Let us consider a latent variable u=YY’q to which we associate the variable t= XX’YY’q. Rd-PLS consists in seeking q (and therefore u and t) so that the covariance between t and u is maximum. The solution to this problem is straightforward and consists in setting q to the eigenvector of YY’XX’YY’ associated with the largest eigenvalue. For the determination of higher order components, we deflate X and Y with respect to the latent variable t. Extending Rd-PLS to the context of multi-block data is relatively easy. Starting from a latent variable u=YY’q, we consider its ‘projection’ on the space generated by the variables of each block Xk (k=1, ..., K) namely, tk= XkXk'YY’q. 
Thereafter, Rd-MB-PLS seeks q in order to maximize the average of the covariances of u with tk (k=1, ..., K). The solution to this problem is given by q, eigenvector of YY’XX’YY’, where X is the dataset obtained by horizontally merging datasets Xk (k=1, ..., K). For the determination of latent variables of order higher than 1, we use a deflation of Y and Xk with respect to the variable t= XX’YY’q. In the same vein, extending Rd-MB-PLS to the path modeling setting is straightforward. Methods are illustrated on the basis of case studies and performance of Rd-PLS and Rd-MB-PLS in terms of prediction is compared to that of PLS2 and MB-PLS.
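
The eigenvector solutions stated in the abstract above follow in one step from the definitions of u and t. The derivation below is a sketch of that missing step, using only the abstract's own symbols. It assumes, as the abstract does, that X and Y are centered, so the covariance of t and u is proportional to t'u.

```latex
\begin{align*}
u &= YY'q, \qquad t = XX'YY'q, \\
\operatorname{cov}(t,u) &\propto t'u = q'\,YY'XX'YY'\,q, \\
\max_{\|q\|=1}\; q'\,YY'XX'YY'\,q
  &\;\Longrightarrow\; YY'XX'YY'\,q = \lambda_{\max}\,q . \\[4pt]
\text{Multi-block case: }\quad
\frac{1}{K}\sum_{k=1}^{K} t_k'u
  &= \frac{1}{K}\, q'\,YY'\Big(\sum_{k=1}^{K} X_kX_k'\Big)YY'\,q
   = \frac{1}{K}\, q'\,YY'XX'YY'\,q ,
\qquad X = [X_1 \mid \dots \mid X_K].
\end{align*}
```

Because YY'XX'YY' is symmetric, maximizing the quadratic form under the unit-norm constraint gives the eigenvector for the largest eigenvalue. The multi-block case reduces to the same problem because XX' equals the sum of the X_kX_k' when the blocks are merged horizontally.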

Keywords: multiblock data analysis, partial least squares regression, path modeling, redundancy analysis

Procedia PDF Downloads 110
1653 Climate Changes in Albania and Their Effect on Cereal Yield

Authors: Lule Basha, Eralda Gjika

Abstract:

This study is focused on analyzing climate change in Albania and its potential effects on cereal yields. Initially, monthly temperature and rainfall in Albania were studied for the period 1960-2021. Climatic variables are important when trying to model cereal yield behavior, especially when significant changes in weather conditions are observed. For this purpose, in the second part of the study, linear and nonlinear models explaining cereal yield are constructed for the same period, 1960-2021. Multiple linear regression analysis and the lasso regression method are applied to the data relating cereal yield to the independent variables: average temperature, average rainfall, fertilizer consumption, arable land, land under cereal production, and nitrous oxide emissions. In our regression model, heteroscedasticity is not observed, the data follow a normal distribution, and there is a low correlation between factors, so we do not have the problem of multicollinearity. Machine-learning methods, such as random forest, are used to predict cereal yield responses to climatic and other variables. Random Forest showed high accuracy compared to the other statistical models in the prediction of cereal yield. We found that changes in average temperature negatively affect cereal yield. The coefficients of fertilizer consumption, arable land, and land under cereal production are positive, indicating a positive effect on production. Our results show that the Random Forest method is an effective and versatile machine-learning method for cereal yield prediction compared to the other two methods.

Keywords: cereal yield, climate change, machine learning, multiple regression model, random forest

Procedia PDF Downloads 58
1652 Automated Evaluation Approach for Time-Dependent Question Answering Pairs on Web Crawler Based Question Answering System

Authors: Shraddha Chaudhary, Raksha Agarwal, Niladri Chatterjee

Abstract:

This work demonstrates a web crawler-based generalized end-to-end open domain Question Answering (QA) system. An efficient QA system requires a significant amount of domain knowledge to answer any question, with the aim of finding an exact and correct answer in the form of a number, a noun, a short phrase, or a brief piece of text for the user's questions. Analysis of the question, searching the relevant document, and choosing an answer are three important steps in a QA system. This work uses a web scraper (Beautiful Soup) to extract K documents from the web. The value of K can be calibrated on the basis of a trade-off between time and accuracy. This is followed by a passage ranking process, using a model trained on 500K queries of the MS-Marco dataset, to extract the most relevant text passage and shorten the lengthy documents. Further, a QA system is used to extract the answers from the shortened documents based on the query and return the top 3 answers. For evaluation of such systems, accuracy is judged by the exact match between predicted answers and gold answers. But automatic evaluation methods fail due to the linguistic ambiguities inherent in the questions. Moreover, reference answers are often not exhaustive or are out of date. Hence, correct answers predicted by the system are often judged incorrect according to the automated metrics. One such scenario arises from the original Google Natural Question (GNQ) dataset, which was collected and made available in the year 2016. Use of any such dataset proves to be inefficient with respect to any questions that have time-varying answers. For illustration, consider the query “Where will the next Olympics be?” The gold answer for this query in the GNQ dataset is “Tokyo”. Since the dataset was collected in 2016 and the next Olympics after 2016 were the 2020 Games in Tokyo, this answer was correct at the time. But if the same question is asked in 2022, the answer is “Paris, 2024”. 
Consequently, any evaluation based on the GNQ dataset will be incorrect. Such erroneous predictions are usually given to human evaluators for further validation which is quite expensive and time-consuming. To address this erroneous evaluation, the present work proposes an automated approach for evaluating time-dependent question-answer pairs. In particular, it proposes a metric using the current timestamp along with top-n predicted answers from a given QA system. To test the proposed approach GNQ dataset has been used and the system achieved an accuracy of 78% for a test dataset comprising 100 QA pairs. This test data was automatically extracted using an analysis-based approach from 10K QA pairs of the GNQ dataset. The results obtained are encouraging. The proposed technique appears to have the possibility of developing into a useful scheme for gathering precise, reliable, and specific information in a real-time and efficient manner. Our subsequent experiments will be guided towards establishing the efficacy of the above system for a larger set of time-dependent QA pairs.
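
The abstract above does not define the proposed metric, so the following is only a hypothetical sketch of how a timestamp and the top-n predictions might be combined. It assumes each question carries gold answers with validity windows, and it counts a hit when the gold answer valid at the query time appears among the top-n predictions. The function names, the data layout, and the window dates are all illustrative and not taken from the paper.

```python
from datetime import date

def normalize(text):
    """Lowercase and collapse whitespace so exact match ignores trivial differences."""
    return " ".join(text.lower().split())

def gold_at(timeline, when):
    """Return the gold answer whose [start, end) window contains `when`, else None."""
    for start, end, answer in timeline:
        if start <= when < end:
            return answer
    return None

def hit_at_n(predictions, timeline, when, n=3):
    """True if the time-appropriate gold answer is among the top-n predictions."""
    gold = gold_at(timeline, when)
    if gold is None:
        return False
    return normalize(gold) in [normalize(p) for p in predictions[:n]]

# Illustrative validity windows for "Where will the next Olympics be?"
olympics_timeline = [
    (date(2016, 9, 1), date(2021, 9, 1), "Tokyo"),
    (date(2021, 9, 1), date(2024, 9, 1), "Paris"),
]
print(hit_at_n(["Paris", "Beijing"], olympics_timeline, date(2022, 6, 1)))
```

With a static gold answer of "Tokyo", the 2022 prediction "Paris" would be scored as wrong. Resolving the gold answer against the timestamp first is one way to avoid that kind of false negative.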

Keywords: web-based information retrieval, open domain question answering system, time-varying QA, QA evaluation

Procedia PDF Downloads 74
1651 Cosmetic Recommendation Approach Using Machine Learning

Authors: Shakila N. Senarath, Dinesh Asanka, Janaka Wijayanayake

Abstract:

The demand for cosmetic products is rising as consumers seek to meet their needs for personal appearance and hygiene. A cosmetic product consists of various chemical ingredients, which may help to keep the skin healthy or may cause damage. Not every chemical ingredient in a cosmetic product works the same way for every person. The most appropriate way to select a healthy cosmetic product is to identify the texture of the body first and then select the most suitable product with safe ingredients. Therefore, the selection process of cosmetic products is complicated. Consumer surveys have shown that, most of the time, consumers select cosmetic products in an improper way. This study proposes a content-based system that recommends cosmetic products according to human factors: skin type, gender, and price range. The proposed system will be implemented using machine learning, with the consumer's skin type, gender, and price range taken as inputs. The skin type of the consumer will be derived using the Baumann Skin Type Questionnaire, a value-based approach that includes a number of questions to assign the user to one of the 16 skin types of the Baumann Skin Type Indicator (BSTI). Two datasets are collected for the research: the user dataset and the cosmetic dataset. The user dataset was collected using a questionnaire given to the public. The cosmetic dataset contains product details for 5 product categories (Moisturizer, Cleanser, Sun protector, Face Mask, Eye Cream). An alternate approach of TF-IDF (Term Frequency – Inverse Document Frequency) is applied to vectorize cosmetic ingredients in the generic cosmetic products dataset and user-preferred dataset. 
Using the IF-IPF vectors, each user-preferred products dataset and generic cosmetic products dataset can be represented as sparse vectors. The similarity between each user-preferred product and generic cosmetic product will be calculated using the cosine similarity method. For the recommendation process, a similarity matrix can be used. The higher the similarity, the better the match for the consumer. Sorting a user column of the similarity matrix in descending order retrieves the recommended products in rank order, from the most to the least similar. Although the results are returned as a list of similar products, the gathered user information, such as gender and price range, allows further optimization by weighting those parameters once a set of recommended products for a user has been retrieved.
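
The vectorize-then-rank step can be sketched in a few lines. This is an illustrative sketch using plain TF-IDF, with ingredients treated as terms and products as documents; it is not the authors' IF-IPF variant, whose exact weighting the abstract does not give. The function names and the reserved "__user__" key are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document name to a sparse {ingredient: tf-idf weight} dict.

    docs: dict of name -> non-empty list of ingredient tokens.
    """
    n_docs = len(docs)
    doc_freq = Counter(tok for toks in docs.values() for tok in set(toks))
    vectors = {}
    for name, toks in docs.items():
        counts = Counter(toks)
        vectors[name] = {
            tok: (c / len(toks)) * math.log(n_docs / doc_freq[tok])
            for tok, c in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(preferred_ingredients, catalog, top_k=3):
    """Rank catalog products by similarity to the user's preferred ingredients."""
    docs = dict(catalog)
    docs["__user__"] = preferred_ingredients  # hypothetical reserved key
    vectors = tfidf_vectors(docs)
    user_vec = vectors.pop("__user__")
    ranked = sorted(catalog, key=lambda name: cosine(user_vec, vectors[name]),
                    reverse=True)
    return ranked[:top_k]
```

Sorting with reverse=True corresponds to reading a user column of the similarity matrix in descending order. An ingredient that appears in every product, such as water, receives a weight of zero and does not influence the ranking.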

Keywords: content-based filtering, cosmetics, machine learning, recommendation system

Procedia PDF Downloads 108
1650 Developing an Intonation Labeled Dataset for Hindi

Authors: Esha Banerjee, Atul Kumar Ojha, Girish Nath Jha

Abstract:

This study aims to develop an intonation labeled database for Hindi. Although no single standard for prosody labeling exists in Hindi, researchers have employed perceptual and statistical methods to draw inferences about the behavior of prosody patterns in Hindi. Based on such existing research and largely agreed upon intonational theories in Hindi, this study attempts to develop a manually annotated prosodic corpus of Hindi speech data, which can be used for training speech models for natural-sounding speech in the future. 100 sentences (500 words) each for declarative and interrogative types have been labeled using Praat.

Keywords: speech dataset, Hindi, intonation, labeled corpus

Procedia PDF Downloads 160
1649 An Enhanced Support Vector Machine Based Approach for Sentiment Classification of Arabic Tweets of Different Dialects

Authors: Gehad S. Kaseb, Mona F. Ahmed

Abstract:

Arabic Sentiment Analysis (SA) is one of the most common research fields with many open areas. Few studies apply SA to Arabic dialects. This paper proposes different pre-processing steps and a modified methodology to improve the accuracy using normal Support Vector Machine (SVM) classification. The paper works on two datasets, Arabic Sentiment Tweets Dataset (ASTD) and Extended Arabic Tweets Sentiment Dataset (Extended-AATSD), which are publicly available for academic use. The results show that the classification accuracy approaches 86%.

Keywords: Arabic, classification, sentiment analysis, tweets

Procedia PDF Downloads 114
1648 Wreathed Hornbill (Rhyticeros undulatus) on Mount Ungaran: Are their Habitat Threatened?

Authors: Margareta Rahayuningsih, Nugroho Edi K., Siti Alimah

Abstract:

The Wreathed Hornbill (Rhyticeros undulatus) is one of the hornbill species (Family: Bucerotidae) found on Mount Ungaran. Preserving the Wreathed Hornbill or planning its in situ conservation requires data on habitat condition. The objective of the research was to determine the land cover change on Mount Ungaran using satellite image data and GIS. Based on land cover data for 1999-2009, the research showed that the primary forest on Mount Ungaran decreased by almost 50%, while the secondary forest, tea and coffee plantations, and settlements increased.

Keywords: GIS, Mount Ungaran, threatened habitat, Wreathed Hornbill (Rhyticeros undulatus)

Procedia PDF Downloads 337
1647 Volume Estimation of Trees: An Exploratory Study on Pterocarpus erinaceus Logging Operations within Forest Transition and Savannah Ecological Zones of Ghana

Authors: Albert Kwabena Osei Konadu

Abstract:

Pterocarpus erinaceus, also known as Rosewood, is a tropical wood endemic to the forest savannah transition zones within the middle and northern portions of Ghana. Its economic viability has made it increasingly popular and in high demand, leading to widespread conservation concerns. Ghana’s forest resource management regime for these ecozones focuses mainly on conservation and very little on resource utilization. Consequently, commercial logging management standards are at a teething stage and not fully developed, leading to deficiencies in the monitoring of logging operations and the quantification of harvested tree volumes. The tree information form (TIF), a volume estimation and tracking regime, has proven to be an effective, sustainable management tool for regulating timber resource extraction in the high forest zones of the country. This work aims to generate TIFs that can track and capture the requisite parameters to accurately estimate the volume of harvested rosewood within forest savannah transition zones. Tree information forms were created for three scenarios: individual billets, stacked billets, and conveying vessels. These TIFs were field-tested to deduce the most viable option for tracking and estimating harvested volumes of rosewood using the Smalian and cubic volume estimation formulae. Overall, four districts were covered, with the individual billet, stacked billet, and conveying vessel scenarios registering mean volumes of 25.83 m³, 45.08 m³, and 32.6 m³, respectively. These volumes were validated by benchmarking against the assigned volumes of the Forestry Commission of Ghana and the known standard volumes of conveying vessels. The results indicated an underestimation of extracted volumes under the quota regime, a situation that could lead to unintended overexploitation of the species. 
The research revealed that the conveying vessel route is the most viable volume estimation and tracking regime for the sustainable management of Pterocarpus erinaceus, as it provided a more practical volume estimate and data extraction protocol.
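
For reference, Smalian's formula estimates the volume of a log or billet as its length multiplied by the mean of its two end cross-sectional areas, V = L (A1 + A2) / 2, where each area is that of a circle with the measured end diameter. The sketch below is an illustrative calculation only; the function names and the choice of metres as units are assumptions, and it does not reproduce the study's TIF protocol or its cubic volume formula.

```python
import math


def cross_section_area(diameter_m):
    """Area (m^2) of a circular cross-section with the given diameter in metres."""
    return math.pi * diameter_m ** 2 / 4.0


def smalian_volume(d_large_m, d_small_m, length_m):
    """Smalian's formula: length times the mean of the two end areas (m^3)."""
    return length_m * (cross_section_area(d_large_m) + cross_section_area(d_small_m)) / 2.0


def total_volume(billets):
    """Sum the Smalian volume over (d_large_m, d_small_m, length_m) tuples."""
    return sum(smalian_volume(d1, d2, length) for d1, d2, length in billets)
```

For a billet with equal end diameters, the formula reduces to the volume of a cylinder, which is a convenient sanity check on field data.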

Keywords: convention on international trade in endangered species, cubic volume formula, forest transition savannah zones, pterocarpus erinaceus, smallian’s volume formula, tree information form

Procedia PDF Downloads 60
1646 Using Machine Learning to Build a Real-Time COVID-19 Mask Safety Monitor

Authors: Yash Jain

Abstract:

The US Centers for Disease Control and Prevention has recommended wearing masks to slow the spread of the virus. The research uses a video feed from a camera to classify in real time whether a person is wearing a mask correctly, wearing a mask incorrectly, or not wearing a mask at all. A mask detection network was trained using two distinct datasets from the open-source website Kaggle. The first dataset was titled 'Face Mask Detection'; the second was titled 'Face Mask Dataset' and provided the data in YOLO format so that the TinyYoloV3 model could be trained. Based on the data from Kaggle, two machine learning models were implemented and trained: a TinyYoloV3 real-time model and a two-stage neural network classifier. The first stage of the two-stage classifier identified distinct faces within the image, and the second stage classified the state of the mask on each face as worn correctly, worn incorrectly, or not worn at all. The TinyYoloV3 model was trained using the darknet neural network framework and was used for the live feed as well as for comparison against the two-stage classifier. The two-stage classifier attained a mean average precision (mAP) of 80%, while the TinyYoloV3 real-time detection model had an mAP of 59%. Overall, both models were able to correctly classify the scenarios of no mask, mask, and incorrectly worn mask.

Keywords: datasets, classifier, mask-detection, real-time, TinyYoloV3, two-stage neural network classifier

Procedia PDF Downloads 127
1645 Loan Repayment Prediction Using Machine Learning: Model Development, Django Web Integration and Cloud Deployment

Authors: Seun Mayowa Sunday

Abstract:

Loan prediction is one of the most significant and recognised fields of research in the banking, insurance, and financial security industries. Some prediction systems on the market are built as static software. However, because static software operates only with strictly regulated rules, it cannot aid customers beyond these limitations. Loan prediction therefore requires the application of several machine learning (ML) techniques. Four separate machine learning models, random forest (RF), decision tree (DT), k-nearest neighbour (KNN), and logistic regression, are used to create the loan prediction model. Using Anaconda Navigator and the required ML libraries, the models are created and evaluated using appropriate metrics. The random forest performed best, with an accuracy of 80.17%, and was subsequently integrated into the Django framework. For real-time testing, the web application is deployed on Alibaba Cloud, which is among the top 4 biggest cloud computing providers. To the best of our knowledge, this research is the first academic paper to combine model development and the Django framework with deployment on the Alibaba Cloud computing platform.

Keywords: k-nearest neighbor, random forest, logistic regression, decision tree, django, cloud computing, alibaba cloud

Procedia PDF Downloads 101
1644 Comparison of Deep Convolutional Neural Networks Models for Plant Disease Identification

Authors: Megha Gupta, Nupur Prakash

Abstract:

Identification of plant diseases has been performed using machine learning and deep learning models on the datasets containing images of healthy and diseased plant leaves. The current study carries out an evaluation of some of the deep learning models based on convolutional neural network (CNN) architectures for identification of plant diseases. For this purpose, the publicly available New Plant Diseases Dataset, an augmented version of PlantVillage dataset, available on Kaggle platform, containing 87,900 images has been used. The dataset contained images of 26 diseases of 14 different plants and images of 12 healthy plants. The CNN models selected for the study presented in this paper are AlexNet, ZFNet, VGGNet (four models), GoogLeNet, and ResNet (three models). The selected models are trained using PyTorch, an open-source machine learning library, on Google Colaboratory. A comparative study has been carried out to analyze the high degree of accuracy achieved using these models. The highest test accuracy and F1-score of 99.59% and 0.996, respectively, were achieved by using GoogLeNet with Mini-batch momentum based gradient descent learning algorithm.

Keywords: comparative analysis, convolutional neural networks, deep learning, plant disease identification

Procedia PDF Downloads 163
1643 K-Means Clustering-Based Infinite Feature Selection Method

Authors: Seyyedeh Faezeh Hassani Ziabari, Sadegh Eskandari, Maziar Salahi

Abstract:

Infinite Feature Selection (IFS) algorithm is an efficient feature selection algorithm that selects a subset of features of all sizes (including infinity). In this paper, we present an improved version of it, called clustering IFS (CIFS), by clustering the dataset in advance. To do so, first, we apply the K-means algorithm to cluster the dataset, then we apply IFS. In the CIFS method, the spatial and temporal complexities are reduced compared to the IFS method. Experimental results on 6 datasets show the superiority of CIFS compared to IFS in terms of accuracy, running time, and memory consumption.

Keywords: feature selection, infinite feature selection, clustering, graph

Procedia PDF Downloads 94
1642 A Study of Classification Models to Predict Drill-Bit Breakage Using Degradation Signals

Authors: Bharatendra Rai

Abstract:

Cutting tools are widely used in manufacturing processes, and drilling is the most commonly used machining process. Although the drill-bits used in drilling may not be expensive, their breakage can damage the expensive workpiece being drilled and, at the same time, has a major impact on productivity. Predicting drill-bit breakage is therefore important in reducing cost and improving productivity. This study uses twenty features extracted from two degradation signals, viz. thrust force and torque. The methodology involves developing and comparing decision tree, random forest, and multinomial logistic regression models for classifying and predicting drill-bit breakage using degradation signals.

Keywords: degradation signal, drill-bit breakage, random forest, multinomial logistic regression

Procedia PDF Downloads 323
1641 Practicing Participatory Approach in Social Forestry to Strengthen Sustainability in a Rural Area of Bangladesh

Authors: A B M Enamol Hassan

Abstract:

The forest storing up in Bangladesh is of deep concern to policy analysts because of increasing encroachment that results in deforestation and degradation of the ecosystem. To address these problems, forest-dependent people, as those responsible for encroachment, could be involved in the co-management process along with other local stakeholders through a participatory approach. On the basis of this premise, this paper conceptualizes and empirically assesses the integration of all stakeholders in the co-management process through two lenses: participation and collaboration. The study also analyzed the issues of sustainability in local communities and examined the constraints that limit the processes of integration. The study used a qualitative research method, which included face-to-face interviews with semi-structured questionnaires and field notes, following the purposive sampling technique and focusing on Comilla Sadar South Upazila (CSSU), Bangladesh. The findings of this paper reveal that beneficiaries, the Bangladesh Forest Department (BFD), and the Union Parishad (UP) come together as leading actors, while NGOs and business entrepreneurs are ignored in the co-management process of social forestry. Integrated management contributes to the strength of community sustainability, although it has some major limitations that are a matter of concern among local communities and policy analysts.

Keywords: integration, participation, collaboration, stakeholders, community sustainability

Procedia PDF Downloads 148
1640 Monitoring of Quantitative and Qualitative Changes in Combustible Material in the Białowieża Forest

Authors: Damian Czubak

Abstract:

The Białowieża Forest is a very valuable natural area, inscribed on the UNESCO World Heritage List, where Norway spruce (Picea abies) stands have deteriorated due to infestation by the bark beetle (Ips typographus). This catastrophic scenario led to an increase in fire danger, owing to the occurrence of large amounts of dead wood and grass cover as light penetrated to the bottom of the stands. In a dry state, these materials favour ignition and the rapid spread of fire. One of the objectives of the study was to monitor the quantitative and qualitative changes of combustible material on the permanent decay plots of spruce stands from 2012 to 2022. In addition, the size of the area with highly flammable vegetation was monitored, and the stands of the Białowieża Forest were classified by flammability class. The key factor that determines the potential fire hazard of a forest is combustible material, primarily its type, quantity, moisture content, size, and spatial structure. Based on the inventory data for the forest districts in the Białowieża Forest, the average fire load and its changes over the years were calculated. The analysis took into account the changes in the health status of the stands and sanitary operations. The quantitative and qualitative assessment of fallen timber and the fire load of ground cover used the results of the 2019 and 2021 inventories. Approximately 9,000 circular plots were used for the study. An assessment was made of the amount of potential fuel, understood as ground cover vegetation and dead wood debris. In addition, areas with vegetation that poses a high fire risk were monitored using data from 2019 and 2021. All sub-areas were inventoried where vegetation posing a specific fire hazard represented at least 10% of the area with species characteristic of that cover. 
In addition to the size of the area with fire-prone vegetation, a very important element is the size of the fire load on the indicated plots. On representative plots, the biomass of the land cover was measured on an area of 10 m2 and then the amount of biomass of each component was determined. The resulting element of variability of ground covers in stands was their flammability classification. The classification developed made it possible to track changes in the flammability classes of stands over the period covered by the measurements.

Keywords: classification, combustible material, flammable vegetation, Norway spruce

Procedia PDF Downloads 63
1639 Evaluation of Pheromone and Tree Trap Efficiency in Orthotomicus erosus (Col: Curculionidae: Scolytinae) Monitoring in Pine Forests of Iran

Authors: Sudabe Amini, Jamasb Nozari, Somaye Rahimi

Abstract:

Bark beetles are one of the most destructive groups of pests in forests and green spaces. The Mediterranean pine engraver, Orthotomicus erosus (Wollaston), is the dominant species in the pine forests of Iran. Pine forests are considered a crucial region in the world and need high protection. Although there is no effective control method, mass trapping is the most common method to suppress the bark beetle population. Therefore, from 2018 to 2020, a survey of bark beetle mass trapping was conducted using two kinds of traps: pheromone traps and tree traps. These traps were evaluated at 10 different sites in pine forests. The statistical results showed significant differences between the two trap types: the pheromone trap attracted more beetles than the tree trap. The results of this study suggest that the most effective and applicable method for bark beetle management in pine forests is the pheromone trap, which suppresses the bark beetle population and maintains it at an economic level, although tree traps attract bark beetles too. In the future, using tree-pheromone traps, which would synergistically attract more bark beetles, is recommended.

Keywords: bark beetle, pines forest, Orthotomicus erosus, pheromone trap, tree trap

Procedia PDF Downloads 128
1638 Global City Typologies: 300 Cities and Over 100 Datasets

Authors: M. Novak, E. Munoz, A. Jana, M. Nelemans

Abstract:

Cities and local governments the world over are interested in employing circular strategies as a means to bring about food security, create employment, and increase resilience. The selection and implementation of circular strategies is facilitated by modeling the effects of strategies locally and by understanding the impacts such strategies have had in other (comparable) cities and how those would translate locally. Urban areas are heterogeneous in their geographic, economic, and social characteristics, governance, and culture. In order to better understand the effect of circular strategies on urban systems, we create a dataset for over 300 cities around the world designed to facilitate circular strategy scenario modeling. This new dataset integrates data from over 20 prominent global, national, and urban data sources, such as the Global Human Settlements layer and the International Labour Organisation, and incorporates employment data from over 150 cities collected bottom-up from local departments and data providers. The dataset is made to be reproducible. Various clustering techniques are explored in the paper. The result is sets of clusters of cities, which can be used for further research and analysis and to support comparative, regional, and national policy making on circular cities.

Keywords: data integration, urban innovation, cluster analysis, circular economy, city profiles, scenario modelling

Procedia PDF Downloads 154
1637 Exploring the Rhinoceros Beetles of a Tropical Forest of Eastern Himalayas

Authors: Subhankar Kumar Sarkar

Abstract:

Beetles of the subfamily Dynastinae under the family Scarabaeidae of the insect order Coleoptera are popularly known as ‘Rhinoceros beetles’ because of the characteristic horn borne by the males on their head. These horns are used in mating battles against other males and have evolved as a result of phenotypic plasticity. Scarabaeidae is the largest of all families under Coleoptera and is composed of 11 subfamilies, of which the subfamily Dynastinae is represented by approximately 300 species. Some of these beetles have been reported to cause considerable damage to agriculture and forestry in both their larval and adult stages, while many of them are beneficial, as they pollinate plants and recycle plant materials. The Eastern Himalayas is regarded as one of the 35 biodiversity hotspots of the world and one of the four in India, as exhibited by its rich and megadiverse tropical forests. However, our knowledge of the faunal diversity of these forests is very limited, particularly for the insect fauna. One such tropical forest of the Eastern Himalayas is the ‘Buxa Tiger Reserve’, located between latitudes 26°30′ and 26°55′ North and longitudes 89°20′ and 89°35′ East in India and occupying an area of about 759.26 square kilometers. Against this background, an attempt has been made to explore the insect fauna of the forest. Insect sampling was carried out in each beat and range of Buxa Tiger Reserve in all three seasons, viz. Premonsoon, Monsoon, and Postmonsoon. Samples were collected with sweep nets, hand picking, and pitfall traps. A UV light trap was used to collect nocturnal insects. Morphological examinations of the collected samples were carried out with Stereozoom Binocular Microscopes (Zeiss SV6 and SV11), and the samples were identified to species level with the aid of relevant literature. The survey of the insect fauna of the forest resulted in the recognition of 76 scarab species, of which 8 belong to the subfamily dealt with herein. 
Each of the 8 species represents a separate genus. The forest is dominated by Xylotrupes gideon (Linnaeus), which is represented by the highest number of individuals. The recorded taxa show about 12% endemism and are mainly Oriental in distribution. Premonsoon is the most favorable season for their occurrence and activity, followed by Monsoon and Postmonsoon.

Keywords: Dynastinae, Scarabaeidae, diversity, Buxa Tiger Reserve

Procedia PDF Downloads 157
1636 Human Wildlife Conflict Outside Protected Areas of Nepal: Causes, Consequences and Mitigation Strategies

Authors: Kedar Baral

Abstract:

This study was carried out in the Mustang, Kaski, Tanahun, Baitadi, and Jhapa districts of Nepal. The study explored the spatial and temporal pattern of human-wildlife conflict (HWC), the socio-economic factors associated with it, the impacts of conflict on the lives and livelihoods of people and the survival of wildlife species, and the impact of climate change and forest fire on HWC. The study also evaluated people’s attitudes towards wildlife conservation and assessed relevant policies and programs. A questionnaire survey was carried out with 250 respondents, and both socio-demographic and HWC-related information was collected. Secondary information was collected from Divisional Forest Offices and the Annapurna Conservation Area Project. HWC events were grouped by season, month, and site (forest type, distance from forest, and settlement), and the coordinates of the events were exported to ArcGIS. Collected data were analyzed using descriptive statistics in Excel and R. A total of 1465 events were recorded in the 5 districts during 2015 and 2019. Of these, livestock killing, crop damage, human attack, and cattle shed damage accounted for 70%, 12%, 11%, and 7% of events, respectively. Among 151 human attack cases, 23 people were killed and 128 were injured. Elephants in the Terai, common leopards and monkeys in the Middle Mountains, and snow leopards in the high mountains were found to be the major problem animals. Common leopard attacks were more frequent in autumn, in the evening, and in human settlement areas, whereas elephant attacks were more frequent in winter, in the daytime, and on farmland. Poor farmers were found to be the most victimized, losing 26% of their income to crop raiding and livestock depredation. On the other hand, people are killing many wild animals in revenge, and this number is increasing every year. Based on people's perceptions, climate change is causing increased temperatures and forest fire events and decreased water sources within the forest. 
Due to the scarcity of food and water within forests, wildlife are compelled to move into human settlement areas, and hence HWC events are increasing. Nevertheless, more than half of the respondents were positive about conserving all wildlife species. Forests outside PAs are under the community forestry (CF) system, which has restored the forest, improved the habitat, and increased wildlife numbers. However, CF policies and programs were found to be focused more on forest management, with the least priority given to wildlife conservation and HWC mitigation. The government's compensation and relief scheme for wildlife damage was found to be somewhat effective in managing HWC, but its lengthy process, its applicability to damage by only a few wildlife species, and the rapidly increasing number of events make it necessary to revisit the scheme. Based on these facts, the study suggests carrying out awareness-raising activities for poor farmers, linking people's property with insurance schemes, conducting habitat management activities within CF, promoting unpalatable crops, improving livestock sheds, simplifying the compensation scheme, establishing a fund at the district level, and incorporating wildlife conservation and HWC mitigation programs in CF. Finally, the study suggests carrying out rigorous research to understand the impacts of current forest management practices on forests, biodiversity, wildlife, and HWC.

Keywords: community forest, conflict mitigation, wildlife conservation, climate change

Procedia PDF Downloads 83