Search results for: documents clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1522

Search results for: documents clustering

1312 The Second Column of Origen’s Hexapla and the Transcription of BGDKPT Consonants: A Confrontation with Transliterated Hebrew Names in Greek Documents

Authors: Isabella Maurizio

Abstract:

This research analyses the pronunciation of Hebrew consonants 'bgdkpt' in II- III C. E. in Palestine, through the confrontation of two kinds of data: the fragments of transliteration of Old Testament in the Greek alphabet, from the second column of Origen’s synopsis, called Hexapla, and Hebrew names transliterated in Greek documents, especially epigraphs. Origen is a very important author, not only for his bgdkpt theological and exegetic works: the Hexapla, synoptic six columns for a critical edition of Septuaginta, has a relevant role in attempting to reconstruct the pronunciation of Hebrew language before Masoretic punctuation. For this reason, at the beginning, it is important to analyze the column in order to study phonetic and linguistic phenomena. Among the most problematic data, there is the evidence from bgdkpt consonants, always represented as Greek aspirated graphemes. This transcription raised the question if their pronunciation was the only spirant, and consequently, the double one, that is, the stop/spirant contrast, was introduced by Masoretes. However, the phonetic and linguistic examination of the column alone is not enough to establish a real pronunciation of language: this paper is significant because a confrontation between the second column’s transliteration and Hebrew names found in Greek documents epigraphic ones mainly, is achieved. Palestine in II - III was a bilingual country: Greek and Aramaic language lived together, the first one like the official language, the second one as the principal mean of communication between people. For this reason, Hebrew names are often found in Greek documents of the same geographical area: a deep examination of bgdkpt’s transliteration can help to understand better which the real pronunciation of these consonants was, or at least it allows to evidence a phonetic tendency. As a consequence, the research considers the contemporary documents to Origen and the previous ones: the first ones testify a specific stadium of pronunciation, the second ones reflect phonemes’ evolution. Alexandrian documents are also examined: Origen was from there, and the influence of Greek language, spoken in his native country, must be considered. The epigraphs have another implication: they are totally free from morphological criteria, probably used by Origen in his column, because of their popular origin. Thus, a confrontation between the hexaplaric transliteration and Hebrew names is absolutely required, in Hexapla’s studies: first of all, it can be the second clue of a pronunciation already noted in the column; then because, for documents’ specific nature, it has more probabilities to be real, reflecting a daily use of language. The examination of data shows a general tendency to employ the aspirated graphemes for bgdkpt consonants’ transliteration. This probably means that they were closer to Greek aspirated consonants rather than to the plosive ones. The exceptions are linked to a particular status of the name, i.e. its history and origin. In this way, this paper gives its contribution to onomastic studies, too: indeed, the research may contribute to verify the diffusion and the treatment of Jewish names in Hellenized world and in the koinè language.

Keywords: bgdkpt consonants, Greek epigraphs, Jewish names, origen's Hexapla

Procedia PDF Downloads 140
1311 Multi-Objective Evolutionary Computation Based Feature Selection Applied to Behaviour Assessment of Children

Authors: F. Jiménez, R. Jódar, M. Martín, G. Sánchez, G. Sciavicco

Abstract:

Abstract—Attribute or feature selection is one of the basic strategies to improve the performances of data classification tasks, and, at the same time, to reduce the complexity of classifiers, and it is a particularly fundamental one when the number of attributes is relatively high. Its application to unsupervised classification is restricted to a limited number of experiments in the literature. Evolutionary computation has already proven itself to be a very effective choice to consistently reduce the number of attributes towards a better classification rate and a simpler semantic interpretation of the inferred classifiers. We present a feature selection wrapper model composed by a multi-objective evolutionary algorithm, the clustering method Expectation-Maximization (EM), and the classifier C4.5 for the unsupervised classification of data extracted from a psychological test named BASC-II (Behavior Assessment System for Children - II ed.) with two objectives: Maximizing the likelihood of the clustering model and maximizing the accuracy of the obtained classifier. We present a methodology to integrate feature selection for unsupervised classification, model evaluation, decision making (to choose the most satisfactory model according to a a posteriori process in a multi-objective context), and testing. We compare the performance of the classifier obtained by the multi-objective evolutionary algorithms ENORA and NSGA-II, and the best solution is then validated by the psychologists that collected the data.

Keywords: evolutionary computation, feature selection, classification, clustering

Procedia PDF Downloads 372
1310 Method of Visual Prosthesis Design Based on Biologically Inspired Design

Authors: Shen Jian, Hu Jie, Zhu Guo Niu, Peng Ying Hong

Abstract:

There are two issues exited in the traditional visual prosthesis: lacking systematic method and the low level of humanization. To tackcle those obstacles, a visual prosthesis design method based on biologically inspired design is proposed. Firstly, a constrained FBS knowledge cell model is applied to construct the functional model of visual prosthesis in biological field. Then the clustering results of engineering domain are ob-tained with the use of the cross-domain knowledge cell clustering algorithm. Finally, a prototype system is designed to support the bio-logically inspired design where the conflict is digested by TRIZ and other tools, and the validity of the method is verified by the solution scheme

Keywords: knowledge-based engineering, visual prosthesis, biologically inspired design, biomedical engineering

Procedia PDF Downloads 193
1309 Neural Network Based Path Loss Prediction for Global System for Mobile Communication in an Urban Environment

Authors: Danladi Ali

Abstract:

In this paper, we measured GSM signal strength in the Dnepropetrovsk city in order to predict path loss in study area using nonlinear autoregressive neural network prediction and we also, used neural network clustering to determine average GSM signal strength receive at the study area. The nonlinear auto-regressive neural network predicted that the GSM signal is attenuated with the mean square error (MSE) of 2.6748dB, this attenuation value is used to modify the COST 231 Hata and the Okumura-Hata models. The neural network clustering revealed that -75dB to -95dB is received more frequently. This means that the signal strength received at the study is mostly weak signal

Keywords: one-dimensional multilevel wavelets, path loss, GSM signal strength, propagation, urban environment and model

Procedia PDF Downloads 382
1308 Hybrid Hierarchical Routing Protocol for WSN Lifetime Maximization

Authors: H. Aoudia, Y. Touati, E. H. Teguig, A. Ali Cherif

Abstract:

Conceiving and developing routing protocols for wireless sensor networks requires considerations on constraints such as network lifetime and energy consumption. In this paper, we propose a hybrid hierarchical routing protocol named HHRP combining both clustering mechanism and multipath optimization taking into account residual energy and RSSI measures. HHRP consists of classifying dynamically nodes into clusters where coordinators nodes with extra privileges are able to manipulate messages, aggregate data and ensure transmission between nodes according to TDMA and CDMA schedules. The reconfiguration of the network is carried out dynamically based on a threshold value which is associated with the number of nodes belonging to the smallest cluster. To show the effectiveness of the proposed approach HHRP, a comparative study with LEACH protocol is illustrated in simulations.

Keywords: routing protocol, optimization, clustering, WSN

Procedia PDF Downloads 470
1307 A Computational Cost-Effective Clustering Algorithm in Multidimensional Space Using the Manhattan Metric: Application to the Global Terrorism Database

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

The increasing amount of collected data has limited the performance of the current analyzing algorithms. Thus, developing new cost-effective algorithms in terms of complexity, scalability, and accuracy raised significant interests. In this paper, a modified effective k-means based algorithm is developed and experimented. The new algorithm aims to reduce the computational load without significantly affecting the quality of the clusterings. The algorithm uses the City Block distance and a new stop criterion to guarantee the convergence. Conducted experiments on a real data set show its high performance when compared with the original k-means version.

Keywords: pattern recognition, global terrorism database, Manhattan distance, k-means clustering, terrorism data analysis

Procedia PDF Downloads 386
1306 Interactive, Topic-Oriented Search Support by a Centroid-Based Text Categorisation

Authors: Mario Kubek, Herwig Unger

Abstract:

Centroid terms are single words that semantically and topically characterise text documents and so may serve as their very compact representation in automatic text processing. In the present paper, centroids are used to measure the relevance of text documents with respect to a given search query. Thus, a new graphbased paradigm for searching texts in large corpora is proposed and evaluated against keyword-based methods. The first, promising experimental results demonstrate the usefulness of the centroid-based search procedure. It is shown that especially the routing of search queries in interactive and decentralised search systems can be greatly improved by applying this approach. A detailed discussion on further fields of its application completes this contribution.

Keywords: search algorithm, centroid, query, keyword, co-occurrence, categorisation

Procedia PDF Downloads 282
1305 Altered Network Organization in Mild Alzheimer's Disease Compared to Mild Cognitive Impairment Using Resting-State EEG

Authors: Chia-Feng Lu, Yuh-Jen Wang, Shin Teng, Yu-Te Wu, Sui-Hing Yan

Abstract:

Brain functional networks based on resting-state EEG data were compared between patients with mild Alzheimer’s disease (mAD) and matched patients with amnestic subtype of mild cognitive impairment (aMCI). We integrated the time–frequency cross mutual information (TFCMI) method to estimate the EEG functional connectivity between cortical regions and the network analysis based on graph theory to further investigate the alterations of functional networks in mAD compared with aMCI group. We aimed at investigating the changes of network integrity, local clustering, information processing efficiency, and fault tolerance in mAD brain networks for different frequency bands based on several topological properties, including degree, strength, clustering coefficient, shortest path length, and efficiency. Results showed that the disruptions of network integrity and reductions of network efficiency in mAD characterized by lower degree, decreased clustering coefficient, higher shortest path length, and reduced global and local efficiencies in the delta, theta, beta2, and gamma bands were evident. The significant changes in network organization can be used in assisting discrimination of mAD from aMCI in clinical.

Keywords: EEG, functional connectivity, graph theory, TFCMI

Procedia PDF Downloads 432
1304 Behavior of Printing Inks on Historical Documents Subjected to Cold RF Plasma Discharges

Authors: Dorina Rusu, Emil Ghiocel Ioanid, Marta Ursescu, Ana Maria Vlad, Mihaela Popescu

Abstract:

During the last decades the cold plasma discharges made the subject of numerous studies concerning the applications in the cultural heritage field, especially concentrated on ecological and non-invasive aspect of these conservation procedures. The conservation treatment using cold plasma is based, on the one hand, on the well-known property of plasma discharges to inactivate the contaminant biological species and, on the other hand, on the surface cleaning effect. Moreover the plasma discharge produces the functionalization of the treated surface, allowing subsequent deposition of protective layers. The paper presents the behavior of printing inks on historical documents treated in cold RF plasma. Two types of printing inks were studied, namely red and black ink, used on a religious book published in 19 century. SEM-EDX analysis results in the identification of the two inks as carbon black ink (C presence in the EDX spectrum) and cinnabar based red ink (Hg and S lines in the spectrum), result confirmed by XRF analysis. The experiments have been performed on paper samples written with laboratory- made inks, of similar composition with the inks identified on historical documents. The samples were subjected to RF plasma discharge, operating in nitrogen gaseous medium, at 1.2 MHz frequency and low-pressure (0.5 mbar), performed in a self-designed equipment for the application of conservation treatments on naturally aged paper supports. The impact of plasma discharge on the inks has been evaluated by SEM, XRD and color analysis. The color analysis revealed a slight discoloration of cinnabar ink on the historical document. SEM and XRD analyses have been carried out in an attempt to elucidate the process responsable for color modification.

Keywords: RF plasma, printing inks, historical documents, surface cleaning effect

Procedia PDF Downloads 440
1303 Popularization of the Communist Manifesto in 19th Century Europe

Authors: Xuanyu Bai

Abstract:

“The Communist Manifesto”, written by Karl Marx and Friedrich Engels, is one of the most significant documents throughout the whole history which covers across different fields including Economic, Politic, Sociology and Philosophy. Instead of discussing the Communist ideas presented in the Communist Manifesto, the essay focuses on exploring the reasons that contributed to the popularization of the document and its influence on political revolutions in 19th century Europe by concentrating on the document itself along with other primary and secondary sources and temporal artwork. Combining the details from the Communist Manifesto and other documents, Marx’s writing style and word choice, his convincible notions about a new society dominated by proletariats, and the revolutionary idea of class destruction has led to the popularization of the Communist Manifesto and influenced the latter political revolutions.

Keywords: communist manifesto, Marx, Engels, capitalism

Procedia PDF Downloads 131
1302 Computer Fraud from the Perspective of Iran's Law and International Documents

Authors: Babak Pourghahramani

Abstract:

One of the modern crimes against property and ownership in the cyber-space is the computer fraud. Despite being modern, the aforementioned crime has its roots in the principles of religious jurisprudence. In some cases, this crime is compatible with the traditional regulations and that is when the computer is considered as a crime commitment device and also some computer frauds that take place in the context of electronic exchanges are considered as crime based on the E-commerce Law (approved in 2003) but the aforementioned regulations are flawed and until recent years there was no comprehensive law in this regard; yet after some years the Computer Crime Act was approved in 2009/26/5 and partly solved the problem of legal vacuum. The present study intends to investigate the computer fraud according to Iran's Computer Crime Act and by taking into consideration the international documents.

Keywords: fraud, cyber fraud, computer fraud, classic fraud, computer crime

Procedia PDF Downloads 332
1301 Automatic Landmark Selection Based on Feature Clustering for Visual Autonomous Unmanned Aerial Vehicle Navigation

Authors: Paulo Fernando Silva Filho, Elcio Hideiti Shiguemori

Abstract:

The selection of specific landmarks for an Unmanned Aerial Vehicles’ Visual Navigation systems based on Automatic Landmark Recognition has significant influence on the precision of the system’s estimated position. At the same time, manual selection of the landmarks does not guarantee a high recognition rate, which would also result on a poor precision. This work aims to develop an automatic landmark selection that will take the image of the flight area and identify the best landmarks to be recognized by the Visual Navigation Landmark Recognition System. The criterion to select a landmark is based on features detected by ORB or AKAZE and edges information on each possible landmark. Results have shown that disposition of possible landmarks is quite different from the human perception.

Keywords: clustering, edges, feature points, landmark selection, X-means

Procedia PDF Downloads 282
1300 Clustering Based and Centralized Routing Table Topology of Control Protocol in Mobile Wireless Sensor Networks

Authors: Mbida Mohamed, Ezzati Abdellah

Abstract:

A strong challenge in the wireless sensor networks (WSN) is to save the energy and have a long life time in the network without having a high rate of loss information. However, topology control (TC) protocols are designed in a way that the network is divided and having a standard system of exchange packets between nodes. In this article, we will propose a clustering based and centralized routing table protocol of TC (CBCRT) which delegates a leader node that will encapsulate a single routing table in every cluster nodes. Hence, if a node wants to send packets to the sink, it requests the information's routing table of the current cluster from the node leader in order to root the packet.

Keywords: mobile wireless sensor networks, routing, topology of control, protocols

Procedia PDF Downloads 275
1299 Evaluation of Security and Performance of Master Node Protocol in the Bitcoin Peer-To-Peer Network

Authors: Muntadher Sallal, Gareth Owenson, Mo Adda, Safa Shubbar

Abstract:

Bitcoin is a digital currency based on a peer-to-peer network to propagate and verify transactions. Bitcoin is gaining wider adoption than any previous crypto-currency. However, the mechanism of peers randomly choosing logical neighbors without any knowledge about underlying physical topology can cause a delay overhead in information propagation, which makes the system vulnerable to double-spend attacks. Aiming at alleviating the propagation delay problem, this paper introduces proximity-aware extensions to the current Bitcoin protocol, named Master Node Based Clustering (MNBC). The ultimate purpose of the proposed protocol, that are based on how clusters are formulated and how nodes can define their membership, is to improve the information propagation delay in the Bitcoin network. In MNBC protocol, physical internet connectivity increases, as well as the number of hops between nodes, decreases through assigning nodes to be responsible for maintaining clusters based on physical internet proximity. We show, through simulations, that the proposed protocol defines better clustering structures that optimize the performance of the transaction propagation over the Bitcoin protocol. The evaluation of partition attacks in the MNBC protocol, as well as the Bitcoin network, was done in this paper. Evaluation results prove that even though the Bitcoin network is more resistant against the partitioning attack than the MNBC protocol, more resources are needed to be spent to split the network in the MNBC protocol, especially with a higher number of nodes.

Keywords: Bitcoin network, propagation delay, clustering, scalability

Procedia PDF Downloads 116
1298 Analysis of State Documents on Environmental Awareness Aspects in Kazakhstan

Authors: Y. A. Kumar

Abstract:

Environmental awareness issues in Kazakhstan are one of the most undermined topics both among the public community and in terms of state rhetoric. In the context of official state documents, so far only two official environmental codes and national programs called Zhasyl Kazakhstan were introduced in the country in 2021. While on the one hand the Environmental Code was introduced with the purpose to modernize, frame and enlist main legislative aspects on various sectors of environmental law in Kazakhstan, on the other hand, the Zhasyl Kazakhstan Program has been implemented as a state program to address with numerous environmental projects various environmental issues ranging from air pollution to waste management as well as aspects related to ecological education and low environmental awareness matters. In this regard, the main goal of this paper is to analyze critically the main content of both of these documents with a particular focus on sections related to environmental awareness-raising aspects. For that, this paper applied a subjective-based content analysis in order to identify interesting insights on regulatory legal aspects, future research streams, and uncovering of improved legislative frameworks in the context of an environmental awareness issue. Apart from that, five open-ended questions were sent out to the Ministry of Ecology, Geology and Natural Resources to obtain primary data on the state’s view in regards to current previous, recent and future aspects of environmental awareness issues in the country.

Keywords: Kazakhstan, environmental awareness, environmental code, Zhasyl Kazakhstan, content analysis

Procedia PDF Downloads 95
1297 Visual Template Detection and Compositional Automatic Regular Expression Generation for Business Invoice Extraction

Authors: Anthony Proschka, Deepak Mishra, Merlyn Ramanan, Zurab Baratashvili

Abstract:

Small and medium-sized businesses receive over 160 billion invoices every year. Since these documents exhibit many subtle differences in layout and text, extracting structured fields such as sender name, amount, and VAT rate from them automatically is an open research question. In this paper, existing work in template-based document extraction is extended, and a system is devised that is able to reliably extract all required fields for up to 70% of all documents in the data set, more than any other previously reported method. The approaches are described for 1) detecting through visual features which template a given document belongs to, 2) automatically generating extraction rules for a given new template by composing regular expressions from multiple components, and 3) computing confidence scores that indicate the accuracy of the automatic extractions. The system can generate templates with as little as one training sample and only requires the ground truth field values instead of detailed annotations such as bounding boxes that are hard to obtain. The system is deployed and used inside a commercial accounting software.

Keywords: data mining, information retrieval, business, feature extraction, layout, business data processing, document handling, end-user trained information extraction, document archiving, scanned business documents, automated document processing, F1-measure, commercial accounting software

Procedia PDF Downloads 131
1296 A Transformer-Based Question Answering Framework for Software Contract Risk Assessment

Authors: Qisheng Hu, Jianglei Han, Yue Yang, My Hoa Ha

Abstract:

When a company is considering purchasing software for commercial use, contract risk assessment is critical to identify risks to mitigate the potential adverse business impact, e.g., security, financial and regulatory risks. Contract risk assessment requires reviewers with specialized knowledge and time to evaluate the legal documents manually. Specifically, validating contracts for a software vendor requires the following steps: manual screening, interpreting legal documents, and extracting risk-prone segments. To automate the process, we proposed a framework to assist legal contract document risk identification, leveraging pre-trained deep learning models and natural language processing techniques. Given a set of pre-defined risk evaluation problems, our framework utilizes the pre-trained transformer-based models for question-answering to identify risk-prone sections in a contract. Furthermore, the question-answering model encodes the concatenated question-contract text and predicts the start and end position for clause extraction. Due to the limited labelled dataset for training, we leveraged transfer learning by fine-tuning the models with the CUAD dataset to enhance the model. On a dataset comprising 287 contract documents and 2000 labelled samples, our best model achieved an F1 score of 0.687.

Keywords: contract risk assessment, NLP, transfer learning, question answering

Procedia PDF Downloads 129
1295 Event Driven Dynamic Clustering and Data Aggregation in Wireless Sensor Network

Authors: Ashok V. Sutagundar, Sunilkumar S. Manvi

Abstract:

Energy, delay and bandwidth are the prime issues of wireless sensor network (WSN). Energy usage optimization and efficient bandwidth utilization are important issues in WSN. Event triggered data aggregation facilitates such optimal tasks for event affected area in WSN. Reliable delivery of the critical information to sink node is also a major challenge of WSN. To tackle these issues, we propose an event driven dynamic clustering and data aggregation scheme for WSN that enhances the life time of the network by minimizing redundant data transmission. The proposed scheme operates as follows: (1) Whenever the event is triggered, event triggered node selects the cluster head. (2) Cluster head gathers data from sensor nodes within the cluster. (3) Cluster head node identifies and classifies the events out of the collected data using Bayesian classifier. (4) Aggregation of data is done using statistical method. (5) Cluster head discovers the paths to the sink node using residual energy, path distance and bandwidth. (6) If the aggregated data is critical, cluster head sends the aggregated data over the multipath for reliable data communication. (7) Otherwise aggregated data is transmitted towards sink node over the single path which is having the more bandwidth and residual energy. The performance of the scheme is validated for various WSN scenarios to evaluate the effectiveness of the proposed approach in terms of aggregation time, cluster formation time and energy consumed for aggregation.

Keywords: wireless sensor network, dynamic clustering, data aggregation, wireless communication

Procedia PDF Downloads 452
1294 Structural Challenges of Social Integration of Immigrants in Iran: Investigating the Status of Providing Citizenship and Social Services

Authors: Iman Shabanzadeh

Abstract:

In terms of its geopolitical position, Iran has been one of the main centers of migration movements in the world in recent decades. However, the policy makers' lack of preparation in completing the cycle of social integration of these immigrants, especially the second and third generation, has caused these people to always be prone to leave the country and immigrate to developed and industrialized countries. In this research, the issue of integration of immigrants in Iran from the perspective of four indicators, "Identity Documents", "Access to Banking Services", "Access to Health and Treatment Services" and "Obtaining a Driver's License" will be analyzed. The research method is descriptive-analytical. To collect information, library and document sources in the field of laws and regulations related to immigrants' rights in Iran, semi-structured interviews with experts have been used. The investigations of this study show that none of the residence documents of immigrants in Iran guarantee the full enjoyment of basic citizenship rights for them. In fact, the function of many of these identity documents, such as the census card, educational support card, etc., is only to prevent crossing the border, and none of them guarantee the basic rights of citizenship. Therefore, for many immigrants, the difference between legality and illegality is only in the risk of crossing the border, and this has led to the spread of the habit of illegal presence for them. Despite this, it seems that there is no clear and coherent policy framework around the issue of foreign immigrants in the country. This policy incoherence can be clearly seen in the diversity and plurality of identity and legal documents of the citizens present in the country and the policy maker's lack of planning to integrate and organize the identity of this huge group. Examining the differences and socioeconomic inequalities between immigrants and the native Iranian population shows that immigrants have been poorly integrated into the structures of Iranian society from an economic and social point of view.

Keywords: immigrants, social integration, citizen services, structural inequality

Procedia PDF Downloads 45
1293 Unsupervised Part-of-Speech Tagging for Amharic Using K-Means Clustering

Authors: Zelalem Fantahun

Abstract:

Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to each word into naturally occurring text. Part-of-speech tagging is the most fundamental and basic task almost in all natural language processing. In natural language processing, the problem of providing large amount of manually annotated data is a knowledge acquisition bottleneck. Since, Amharic is one of under-resourced language, the availability of tagged corpus is the bottleneck problem for natural language processing especially for POS tagging. A promising direction to tackle this problem is to provide a system that does not require manually tagged data. In unsupervised learning, the learner is not provided with classifications. Unsupervised algorithms seek out similarity between pieces of data in order to determine whether they can be characterized as forming a group. This paper explicates the development of unsupervised part-of-speech tagger using K-Means clustering for Amharic language since large amount of data is produced in day-to-day activities. In the development of the tagger, the following procedures are followed. First, the unlabeled data (raw text) is divided into 10 folds and tokenization phase takes place; at this level, the raw text is chunked at sentence level and then into words. The second phase is feature extraction which includes word frequency, syntactic and morphological features of a word. The third phase is clustering. Among different clustering algorithms, K-means is selected and implemented in this study that brings group of similar words together. The fourth phase is mapping, which deals with looking at each cluster carefully and the most common tag is assigned to a group. This study finds out two features that are capable of distinguishing one part-of-speech from others these are morphological feature and positional information and show that it is possible to use unsupervised learning for Amharic POS tagging. In order to increase performance of the unsupervised part-of-speech tagger, there is a need to incorporate other features that are not included in this study, such as semantic related information. Finally, based on experimental result, the performance of the system achieves a maximum of 81% accuracy.

Keywords: POS tagging, Amharic, unsupervised learning, k-means

Procedia PDF Downloads 452
1292 Approach Based on Fuzzy C-Means for Band Selection in Hyperspectral Images

Authors: Diego Saqui, José H. Saito, José R. Campos, Lúcio A. de C. Jorge

Abstract:

Hyperspectral images and remote sensing are important for many applications. A problem in the use of these images is the high volume of data to be processed, stored and transferred. Dimensionality reduction techniques can be used to reduce the volume of data. In this paper, an approach to band selection based on clustering algorithms is presented. This approach allows to reduce the volume of data. The proposed structure is based on Fuzzy C-Means (or K-Means) and NWHFC algorithms. New attributes in relation to other studies in the literature, such as kurtosis and low correlation, are also considered. A comparison of the results of the approach using the Fuzzy C-Means and K-Means with different attributes is performed. The use of both algorithms show similar good results but, particularly when used attributes variance and kurtosis in the clustering process, however applicable in hyperspectral images.

Keywords: band selection, fuzzy c-means, k-means, hyperspectral image

Procedia PDF Downloads 409
1291 Assessing Solid Waste Management Practices and Health Impacts in Port Harcourt City, Nigeria

Authors: Perpetual Onyejelem, Kenichi Matsui

Abstract:

Solid waste management has recently posed urgent challenges to environmental sustainability and public health in emerging Sub-Saharan urban centers. This paper examines solid waste management in Port Harcourt, the rapidly growing city in Nigeria, with a focus on current solid waste management practices and its health implications. To do so we analyzed past academic papers and official documents. The Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) statement and its four-stage inclusion/exclusion criteria were utilized as part of a systematic literature review technique to identify papers related to solid waste management practices (Scopus and Google Scholar). In terms of policy documents, we focused on information about the implementation between 2014 and 2023. We found that the Rivers State Waste Management Policy and the National Policy on Solid Waste Management were the two most important documents to understand Port Harcourt’s practices. Past studies, however, highlighted that residents continued to dump waste in drainages as they were largely unaware of the policies that encourage them to sort waste. The studies tend to blame the city of its lack of political commitment to monitoring waste sites. Another study highlighted inefficient waste collection practices, the absence of community participation and poor resident awareness of 3R practices. Government documents and past studies tend to agree that an increase in disorderly waste management practices and the emergence of vector-borne diseases (e.g., malaria, lassa fever, cholera) co-incided in Port Harcourt. This led to increased spending for healthcare for locals, particularly low-income households. This study concludes by making some remedial recommendations.

Keywords: health effects, solid waste management practices, environmental pollution, Port Harcourt

Procedia PDF Downloads 29
1290 BIM-Based Tool for Sustainability Assessment and Certification Documents Provision

Authors: Taki Eddine Seghier, Mohd Hamdan Ahmad, Yaik-Wah Lim, Samuel Opeyemi Williams

Abstract:

The assessment of building sustainability to achieve a specific green benchmark and the preparation of the required documents in order to receive a green building certification, both are considered as major challenging tasks for green building design team. However, this labor and time-consuming process can take advantage of the available Building Information Modeling (BIM) features such as material take-off and scheduling. Furthermore, the workflow can be automated in order to track potentially achievable credit points and provide rating feedback for several design options by using integrated Visual Programing (VP) to handle the stored parameters within the BIM model. Hence, this study proposes a BIM-based tool that uses Green Building Index (GBI) rating system requirements as a unique input case to evaluate the building sustainability in the design stage of the building project life cycle. The tool covers two key models for data extraction, firstly, a model for data extraction, calculation and the classification of achievable credit points in a green template, secondly, a model for the generation of the required documents for green building certification. The tool was validated on a BIM model of residential building and it serves as proof of concept that building sustainability assessment of GBI certification can be automatically evaluated and documented through BIM.

Keywords: green building rating system, GBRS, building information modeling, BIM, visual programming, VP, sustainability assessment

Procedia PDF Downloads 327
1289 A Conglomerate of Multiple Optical Character Recognition Table Detection and Extraction

Authors: Smita Pallavi, Raj Ratn Pranesh, Sumit Kumar

Abstract:

Information representation as tables is compact and concise method that eases searching, indexing, and storage requirements. Extracting and cloning tables from parsable documents is easier and widely used; however, industry still faces challenges in detecting and extracting tables from OCR (Optical Character Recognition) documents or images. This paper proposes an algorithm that detects and extracts multiple tables from OCR document. The algorithm uses a combination of image processing techniques, text recognition, and procedural coding to identify distinct tables in the same image and map the text to appropriate the corresponding cell in dataframe, which can be stored as comma-separated values, database, excel, and multiple other usable formats.

Keywords: table extraction, optical character recognition, image processing, text extraction, morphological transformation

Procedia PDF Downloads 145
1288 Privacy Preserving Data Publishing Based on Sensitivity in Context of Big Data Using Hive

Authors: P. Srinivasa Rao, K. Venkatesh Sharma, G. Sadhya Devi, V. Nagesh

Abstract:

Privacy Preserving Data Publication is the main concern in present days because the data being published through the internet has been increasing day by day. This huge amount of data was named as Big Data by its size. This project deals the privacy preservation in the context of Big Data using a data warehousing solution called hive. We implemented Nearest Similarity Based Clustering (NSB) with Bottom-up generalization to achieve (v,l)-anonymity. (v,l)-Anonymity deals with the sensitivity vulnerabilities and ensures the individual privacy. We also calculate the sensitivity levels by simple comparison method using the index values, by classifying the different levels of sensitivity. The experiments were carried out on the hive environment to verify the efficiency of algorithms with Big Data. This framework also supports the execution of existing algorithms without any changes. The model in the paper outperforms than existing models.

Keywords: sensitivity, sensitive level, clustering, Privacy Preserving Data Publication (PPDP), bottom-up generalization, Big Data

Procedia PDF Downloads 296
1287 Identification of Nonlinear Systems Using Radial Basis Function Neural Network

Authors: C. Pislaru, A. Shebani

Abstract:

This paper uses the radial basis function neural network (RBFNN) for system identification of nonlinear systems. Five nonlinear systems are used to examine the activity of RBFNN in system modeling of nonlinear systems; the five nonlinear systems are dual tank system, single tank system, DC motor system, and two academic models. The feed forward method is considered in this work for modelling the non-linear dynamic models, where the K-Means clustering algorithm used in this paper to select the centers of radial basis function network, because it is reliable, offers fast convergence and can handle large data sets. The least mean square method is used to adjust the weights to the output layer, and Euclidean distance method used to measure the width of the Gaussian function.

Keywords: system identification, nonlinear systems, neural networks, radial basis function, K-means clustering algorithm

Procedia PDF Downloads 471
1286 Discriminating Between Energy Drinks and Sports Drinks Based on Their Chemical Properties Using Chemometric Methods

Authors: Robert Cazar, Nathaly Maza

Abstract:

Energy drinks and sports drinks are quite popular among young adults and teenagers worldwide. Some concerns regarding their health effects – particularly those of the energy drinks - have been raised based on scientific findings. Differentiating between these two types of drinks by means of their chemical properties seems to be an instructive task. Chemometrics provides the most appropriate strategy to do so. In this study, a discrimination analysis of the energy and sports drinks has been carried out applying chemometric methods. A set of eleven samples of available commercial brands of drinks – seven energy drinks and four sports drinks – were collected. Each sample was characterized by eight chemical variables (carbohydrates, energy, sugar, sodium, pH, degrees Brix, density, and citric acid). The data set was standardized and examined by exploratory chemometric techniques such as clustering and principal component analysis. As a preliminary step, a variable selection was carried out by inspecting the variable correlation matrix. It was detected that some variables are redundant, so they can be safely removed, leaving only five variables that are sufficient for this analysis. They are sugar, sodium, pH, density, and citric acid. Then, a hierarchical clustering `employing the average – linkage criterion and using the Euclidian distance metrics was performed. It perfectly separates the two types of drinks since the resultant dendogram, cut at the 25% similarity level, assorts the samples in two well defined groups, one of them containing the energy drinks and the other one the sports drinks. Further assurance of the complete discrimination is provided by the principal component analysis. The projection of the data set on the first two principal components – which retain the 71% of the data information – permits to visualize the distribution of the samples in the two groups identified in the clustering stage. Since the first principal component is the discriminating one, the inspection of its loadings consents to characterize such groups. The energy drinks group possesses medium to high values of density, citric acid, and sugar. The sports drinks group, on the other hand, exhibits low values of those variables. In conclusion, the application of chemometric methods on a data set that features some chemical properties of a number of energy and sports drinks provides an accurate, dependable way to discriminate between these two types of beverages.

Keywords: chemometrics, clustering, energy drinks, principal component analysis, sports drinks

Procedia PDF Downloads 109
1285 Parallel Genetic Algorithms Clustering for Handling Recruitment Problem

Authors: Walid Moudani, Ahmad Shahin

Abstract:

This research presents a study to handle the recruitment services system. It aims to enhance a business intelligence system by embedding data mining in its core engine and to facilitate the link between job searchers and recruiters companies. The purpose of this study is to present an intelligent management system for supporting recruitment services based on data mining methods. It consists to apply segmentation on the extracted job postings offered by the different recruiters. The details of the job postings are associated to a set of relevant features that are extracted from the web and which are based on critical criterion in order to define consistent clusters. Thereafter, we assign the job searchers to the best cluster while providing a ranking according to the job postings of the selected cluster. The performance of the proposed model used is analyzed, based on a real case study, with the clustered job postings dataset and classified job searchers dataset by using some metrics.

Keywords: job postings, job searchers, clustering, genetic algorithms, business intelligence

Procedia PDF Downloads 329
1284 A Model Based Metaheuristic for Hybrid Hierarchical Community Structure in Social Networks

Authors: Radhia Toujani, Jalel Akaichi

Abstract:

In recent years, the study of community detection in social networks has received great attention. The hierarchical structure of the network leads to the emergence of the convergence to a locally optimal community structure. In this paper, we aim to avoid this local optimum in the introduced hybrid hierarchical method. To achieve this purpose, we present an objective function where we incorporate the value of structural and semantic similarity based modularity and a metaheuristic namely bees colonies algorithm to optimize our objective function on both hierarchical level divisive and agglomerative. In order to assess the efficiency and the accuracy of the introduced hybrid bee colony model, we perform an extensive experimental evaluation on both synthetic and real networks.

Keywords: social network, community detection, agglomerative hierarchical clustering, divisive hierarchical clustering, similarity, modularity, metaheuristic, bee colony

Procedia PDF Downloads 380
1283 Improving the Performance of Requisition Document Online System for Royal Thai Army by Using Time Series Model

Authors: D. Prangchumpol

Abstract:

This research presents a forecasting method of requisition document demands for Military units by using Exponential Smoothing methods to analyze data. The data used in the forecast is an actual data requisition document of The Adjutant General Department. The results of the forecasting model to forecast the requisition of the document found that Holt–Winters’ trend and seasonality method of α=0.1, β=0, γ=0 is appropriate and matches for requisition of documents. In addition, the researcher has developed a requisition online system to improve the performance of requisition documents of The Adjutant General Department, and also ensuring that the operation can be checked.

Keywords: requisition, holt–winters, time series, royal thai army

Procedia PDF Downloads 308