Search results for: tokenization
9 Tokenization of Blue Bonds to Scale Blue Carbon Projects
Authors: Rodrigo Buaiz Boabaid
Abstract:
Tokenization of Blue Bonds is an emerging Green Finance tool that has the potential to scale Blue Carbon Projects to fight climate change. This innovative solution has a huge potential to democratize the green finance market and catalyze innovations in the climate change finance sector. Switzerland has emerged as a leader in the Green Finance space and is well-positioned to drive the adoption of Tokenization of Blue & Green Bonds. This unique approach has the potential to unlock new sources of capital and enable global investors to participate in the financing of sustainable blue carbon projects. By leveraging the power of blockchain technology, Tokenization of Blue Bonds can provide greater transparency, efficiency, and security in the investment process while also reducing transaction costs. Investments are in line with the highest regulations and designed according to the stringent legal framework and compliance standards set by Switzerland. The potential benefits of Tokenization of Blue Bonds are significant and could transform the way that sustainable projects are financed. By unlocking new sources of capital, this approach has the potential to accelerate the deployment of Blue Carbon projects and create new opportunities for investors to participate in the fight against climate change.Keywords: blue bonds, blue carbon, tokenization, green finance
Procedia PDF Downloads 878 Tokenization of Blue Bonds as an Emerging Green Finance Tool
Authors: Rodrigo Buaiz Boabaid
Abstract:
Tokenization of Blue Bonds is an emerging Green Finance tool that has the potential to scale Blue Carbon Projects to fight climate change. This innovative solution has a huge potential to democratize the green finance market and catalyze innovations in the climate change finance sector. Switzerland has emerged as a leader in the Green Finance space and is well-positioned to drive the adoption of Tokenization of Blue & Green Bonds. This unique approach has the potential to unlock new sources of capital and enable global investors to participate in the financing of sustainable blue carbon projects. By leveraging the power of blockchain technology, Tokenization of Blue Bonds can provide greater transparency, efficiency, and security in the investment process, while also reducing transaction costs. Investments are in line with the highest regulations and designed according to the stringent legal framework and compliance standards set by Switzerland. The potential benefits of Tokenization of Blue Bonds are significant and could transform the way that sustainable projects are financed. By unlocking new sources of capital, this approach has the potential to accelerate the deployment of Blue Carbon projects and create new opportunities for investors to participate in the fight against climate change.Keywords: blue carbon, blue bonds, green finance, Tokenization, blockchain solutions
Procedia PDF Downloads 737 Information Retrieval for Kafficho Language
Authors: Mareye Zeleke Mekonen
Abstract:
The Kafficho language has distinct issues in information retrieval because of its restricted resources and dearth of standardized methods. In this endeavor, with the cooperation and support of linguists and native speakers, we investigate the creation of information retrieval systems specifically designed for the Kafficho language. The Kafficho information retrieval system allows Kafficho speakers to access information easily in an efficient and effective way. Our objective is to conduct an information retrieval experiment using 220 Kafficho text files, including fifteen sample questions. Tokenization, normalization, stop word removal, stemming, and other data pre-processing chores, together with additional tasks like term weighting, were prerequisites for the vector space model to represent each page and a particular query. The three well-known measurement metrics we used for our word were Precision, Recall, and and F-measure, with values of 87%, 28%, and 35%, respectively. This demonstrates how well the Kaffiho information retrieval system performed well while utilizing the vector space paradigm.Keywords: Kafficho, information retrieval, stemming, vector space
Procedia PDF Downloads 576 Machine Learning Automatic Detection on Twitter Cyberbullying
Authors: Raghad A. Altowairgi
Abstract:
With the wide spread of social media platforms, young people tend to use them extensively as the first means of communication due to their ease and modernity. But these platforms often create a fertile ground for bullies to practice their aggressive behavior against their victims. Platform usage cannot be reduced, but intelligent mechanisms can be implemented to reduce the abuse. This is where machine learning comes in. Understanding and classifying text can be helpful in order to minimize the act of cyberbullying. Artificial intelligence techniques have expanded to formulate an applied tool to address the phenomenon of cyberbullying. In this research, machine learning models are built to classify text into two classes; cyberbullying and non-cyberbullying. After preprocessing the data in 4 stages; removing characters that do not provide meaningful information to the models, tokenization, removing stop words, and lowering text. BoW and TF-IDF are used as the main features for the five classifiers, which are; logistic regression, Naïve Bayes, Random Forest, XGboost, and Catboost classifiers. Each of them scores 92%, 90%, 92%, 91%, 86% respectively.Keywords: cyberbullying, machine learning, Bag-of-Words, term frequency-inverse document frequency, natural language processing, Catboost
Procedia PDF Downloads 1305 Extraction of Compound Words in Malay Sentences Using Linguistic and Statistical Approaches
Authors: Zamri Abu Bakar Zamri, Normaly Kamal Ismail Normaly, Mohd Izani Mohamed Rawi Izani
Abstract:
Malay noun compound are phrases that consist of two or more nouns. The key characteristic behind noun compounds lies on its frequent occurrences within the text. Therefore, extracting these noun compounds is essential for several domains of research such as Information Retrieval, Sentiment Analysis and Question Answering. Many research efforts have been proposed in terms of extracting Malay noun compounds using linguistic and statistical approaches. Most of the existing methods have concentrated on the extraction of bi-gram noun+noun compound. However, extracting noun+verb, noun+adjective and noun+prepositional is challenging due to the difficulty of selecting an appropriate method with effective results. Thus, there is still room for improvement in terms of enhancing the effectiveness of compound word extraction. Therefore, this study proposed a combination of linguistic approach and statistical measures in order to enhance the extraction of compound words. Several preprocessing steps are involved including normalization, tokenization, and stemming. The linguistic approach that has been used in this study is Part-of-Speech (POS) tagging. In addition, a new linguistic pattern for named entities has been utilized using a list of Malays named entities in order to enhance the linguistic approach in terms of noun compound recognition. The proposed statistical measures consists of NC-value, NTC-value and NLC value.Keywords: Compound Word, Noun Compound, Linguistic Approach, Statistical Approach
Procedia PDF Downloads 3504 Local Interpretable Model-agnostic Explanations (LIME) Approach to Email Spam Detection
Authors: Rohini Hariharan, Yazhini R., Blessy Maria Mathew
Abstract:
The task of detecting email spam is a very important one in the era of digital technology that needs effective ways of curbing unwanted messages. This paper presents an approach aimed at making email spam categorization algorithms transparent, reliable and more trustworthy by incorporating Local Interpretable Model-agnostic Explanations (LIME). Our technique assists in providing interpretable explanations for specific classifications of emails to help users understand the decision-making process by the model. In this study, we developed a complete pipeline that incorporates LIME into the spam classification framework and allows creating simplified, interpretable models tailored to individual emails. LIME identifies influential terms, pointing out key elements that drive classification results, thus reducing opacity inherent in conventional machine learning models. Additionally, we suggest a visualization scheme for displaying keywords that will improve understanding of categorization decisions by users. We test our method on a diverse email dataset and compare its performance with various baseline models, such as Gaussian Naive Bayes, Multinomial Naive Bayes, Bernoulli Naive Bayes, Support Vector Classifier, K-Nearest Neighbors, Decision Tree, and Logistic Regression. Our testing results show that our model surpasses all other models, achieving an accuracy of 96.59% and a precision of 99.12%.Keywords: text classification, LIME (local interpretable model-agnostic explanations), stemming, tokenization, logistic regression.
Procedia PDF Downloads 473 Layer-Level Feature Aggregation Network for Effective Semantic Segmentation of Fine-Resolution Remote Sensing Images
Authors: Wambugu Naftaly, Ruisheng Wang, Zhijun Wang
Abstract:
Models based on convolutional neural networks (CNNs), in conjunction with Transformer, have excelled in semantic segmentation, a fundamental task for intelligent Earth observation using remote sensing (RS) imagery. Nonetheless, tokenization in the Transformer model undermines object structures and neglects inner-patch local information, whereas CNNs are unable to simulate global semantics due to limitations inherent in their convolutional local properties. The integration of the two methodologies facilitates effective global-local feature aggregation and interactions, potentially enhancing segmentation results. Inspired by the merits of CNNs and Transformers, we introduce a layer-level feature aggregation network (LLFA-Net) to address semantic segmentation of fine-resolution remote sensing (FRRS) images for land cover classification. The simple yet efficient system employs a transposed unit that hierarchically utilizes dense high-level semantics and sufficient spatial information from various encoder layers through a layer-level feature aggregation module (LLFAM) and models global contexts using structured Transformer blocks. Furthermore, the decoder aggregates resultant features to generate rich semantic representation. Extensive experiments on two public land cover datasets demonstrate that our proposed framework exhibits competitive performance relative to the most recent frameworks in semantic segmentation.Keywords: land cover mapping, semantic segmentation, remote sensing, vision transformer networks, deep learning
Procedia PDF Downloads 72 Detecting Indigenous Languages: A System for Maya Text Profiling and Machine Learning Classification Techniques
Authors: Alejandro Molina-Villegas, Silvia Fernández-Sabido, Eduardo Mendoza-Vargas, Fátima Miranda-Pestaña
Abstract:
The automatic detection of indigenous languages in digital texts is essential to promote their inclusion in digital media. Underrepresented languages, such as Maya, are often excluded from language detection tools like Google’s language-detection library, LANGDETECT. This study addresses these limitations by developing a hybrid language detection solution that accurately distinguishes Maya (YUA) from Spanish (ES). Two strategies are employed: the first focuses on creating a profile for the Maya language within the LANGDETECT library, while the second involves training a Naive Bayes classification model with two categories, YUA and ES. The process includes comprehensive data preprocessing steps, such as cleaning, normalization, tokenization, and n-gram counting, applied to text samples collected from various sources, including articles from La Jornada Maya, a major newspaper in Mexico and the only media outlet that includes a Maya section. After the training phase, a portion of the data is used to create the YUA profile within LANGDETECT, which achieves an accuracy rate above 95% in identifying the Maya language during testing. Additionally, the Naive Bayes classifier, trained and tested on the same database, achieves an accuracy close to 98% in distinguishing between Maya and Spanish, with further validation through F1 score, recall, and logarithmic scoring, without signs of overfitting. This strategy, which combines the LANGDETECT profile with a Naive Bayes model, highlights an adaptable framework that can be extended to other underrepresented languages in future research. This fills a gap in Natural Language Processing and supports the preservation and revitalization of these languages.Keywords: indigenous languages, language detection, Maya language, Naive Bayes classifier, natural language processing, low-resource languages
Procedia PDF Downloads 181 Unsupervised Part-of-Speech Tagging for Amharic Using K-Means Clustering
Authors: Zelalem Fantahun
Abstract:
Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to each word into naturally occurring text. Part-of-speech tagging is the most fundamental and basic task almost in all natural language processing. In natural language processing, the problem of providing large amount of manually annotated data is a knowledge acquisition bottleneck. Since, Amharic is one of under-resourced language, the availability of tagged corpus is the bottleneck problem for natural language processing especially for POS tagging. A promising direction to tackle this problem is to provide a system that does not require manually tagged data. In unsupervised learning, the learner is not provided with classifications. Unsupervised algorithms seek out similarity between pieces of data in order to determine whether they can be characterized as forming a group. This paper explicates the development of unsupervised part-of-speech tagger using K-Means clustering for Amharic language since large amount of data is produced in day-to-day activities. In the development of the tagger, the following procedures are followed. First, the unlabeled data (raw text) is divided into 10 folds and tokenization phase takes place; at this level, the raw text is chunked at sentence level and then into words. The second phase is feature extraction which includes word frequency, syntactic and morphological features of a word. The third phase is clustering. Among different clustering algorithms, K-means is selected and implemented in this study that brings group of similar words together. The fourth phase is mapping, which deals with looking at each cluster carefully and the most common tag is assigned to a group. This study finds out two features that are capable of distinguishing one part-of-speech from others these are morphological feature and positional information and show that it is possible to use unsupervised learning for Amharic POS tagging. In order to increase performance of the unsupervised part-of-speech tagger, there is a need to incorporate other features that are not included in this study, such as semantic related information. Finally, based on experimental result, the performance of the system achieves a maximum of 81% accuracy.Keywords: POS tagging, Amharic, unsupervised learning, k-means
Procedia PDF Downloads 452