Search results for: document similarity
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1373

Search results for: document similarity

1253 Graph Planning Based Composition for Adaptable Semantic Web Services

Authors: Rihab Ben Lamine, Raoudha Ben Jemaa, Ikram Amous Ben Amor

Abstract:

This paper proposes a graph planning technique for semantic adaptable Web Services composition. First, we use an ontology based context model for extending Web Services descriptions with information about the most suitable context for its use. Then, we transform the composition problem into a semantic context aware graph planning problem to build the optimal service composition based on user's context. The construction of the planning graph is based on semantic context aware Web Service discovery that allows for each step to add most suitable Web Services in terms of semantic compatibility between the services parameters and their context similarity with the user's context. In the backward search step, semantic and contextual similarity scores are used to find best composed Web Services list. Finally, in the ranking step, a score is calculated for each best solution and a set of ranked solutions is returned to the user.

Keywords: semantic web service, web service composition, adaptation, context, graph planning

Procedia PDF Downloads 489
1252 A Model Based Metaheuristic for Hybrid Hierarchical Community Structure in Social Networks

Authors: Radhia Toujani, Jalel Akaichi

Abstract:

In recent years, the study of community detection in social networks has received great attention. The hierarchical structure of the network leads to the emergence of the convergence to a locally optimal community structure. In this paper, we aim to avoid this local optimum in the introduced hybrid hierarchical method. To achieve this purpose, we present an objective function where we incorporate the value of structural and semantic similarity based modularity and a metaheuristic namely bees colonies algorithm to optimize our objective function on both hierarchical level divisive and agglomerative. In order to assess the efficiency and the accuracy of the introduced hybrid bee colony model, we perform an extensive experimental evaluation on both synthetic and real networks.

Keywords: social network, community detection, agglomerative hierarchical clustering, divisive hierarchical clustering, similarity, modularity, metaheuristic, bee colony

Procedia PDF Downloads 352
1251 A Quantitative Evaluation of Text Feature Selection Methods

Authors: B. S. Harish, M. B. Revanasiddappa

Abstract:

Due to rapid growth of text documents in digital form, automated text classification has become an important research in the last two decades. The major challenge of text document representations are high dimension, sparsity, volume and semantics. Since the terms are only features that can be found in documents, selection of good terms (features) plays an very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency and accuracy. In this paper, we present a quantitative analysis of most widely used feature selection (FS) methods, viz. Term Frequency-Inverse Document Frequency (tfidf ), Mutual Information (MI), Information Gain (IG), CHISquare (x2), Term Frequency-Relevance Frequency (tfrf ), Term Strength (TS), Ambiguity Measure (AM) and Symbolic Feature Selection (SFS) to classify text documents. We evaluated all the feature selection methods on standard datasets like 20 Newsgroups, 4 University dataset and Reuters-21578.

Keywords: classifiers, feature selection, text classification

Procedia PDF Downloads 426
1250 Control Configuration System as a Key Element in Distributed Control System

Authors: Goodarz Sabetian, Sajjad Moshfe

Abstract:

Control system for hi-tech industries could be realized generally and deeply by a special document. Vast heavy industries such as power plants with a large number of I/O signals are controlled by a distributed control system (DCS). This system comprises of so many parts from field level to high control level, and junior instrument engineers may be confused by this enormous information. The key document which can solve this problem is “control configuration system diagram” for each type of DCS. This is a road map that covers all of activities respect to control system in each industrial plant and inevitable to be studied by whom corresponded. It plays an important role from designing control system start point until the end; deliver the system to operate. This should be inserted in bid documents, contracts, purchasing specification and used in different periods of project EPC (engineering, procurement, and construction). Separate parts of DCS are categorized here in order of importance and a brief description and some practical plan is offered. This article could be useful for all instrument and control engineers who worked is EPC projects.

Keywords: control, configuration, DCS, power plant, bus

Procedia PDF Downloads 467
1249 Applications of Visual Ethnography in Public Anthropology

Authors: Subramaniam Panneerselvam, Gunanithi Perumal, KP Subin

Abstract:

The Visual Ethnography is used to document the culture of a community through a visual means. It could be either photography or audio-visual documentation. The visual ethnographic techniques are widely used in visual anthropology. The visual anthropologists use the camera to capture the cultural image of the studied community. There is a scope for subjectivity while the culture is documented by an external person. But the upcoming of the public anthropology provides an opportunity for the participants to document their own culture. There is a need to equip the participants with the skill of doing visual ethnography. The mobile phone technology provides visual documentation facility to everyone to capture the moments instantly. The visual ethnography facilitates the multiple-interpretation for the audiences. This study explores the effectiveness of visual ethnography among the tribal youth through public anthropology perspective. The case study was conducted to equip the tribal youth of Nilgiris in visual ethnography and the outcome of the experiment shared in this paper.

Keywords: visual ethnography, visual anthropology, public anthropology, multiple-interpretation, case study

Procedia PDF Downloads 139
1248 Development of Distance Training Packages for Teacher on Education Management for Learners with Special Needs

Authors: Jareeluk Ratanaphan

Abstract:

The purposed of this research were; 1. To survey the teacher’s needs on knowledge about special education management for special needs student 2. Development of distance training packages for teacher on special education management for special needs student 3. to study the effects of using the packages on trainee’s achievement 4. to study the effects of using the packages on trainee’s opinion on the distance training packages. The design of the experiment was research and development. The research sample for survey were 86 teachers, and 22 teachers for study the effects of using the packages on achievement and opinion. The research instrument comprised: 1) training packages on special education management for special needs student 2) achievement test 3) questionnaire. Mean, percentage, standard deviation, t-test and content analysis were used for data analysis. The findings of the research were as follows: 1. The teacher’s needs on knowledge about teaching for a learner with learning disability, mental retardation, autism, physical and health impairment and research in special education. 2. The package composed of special education management for special needs student document and manual of distance training packages. The document consisted by the name of packages, the explanation for the educator, content’s structure, concept, objectives, content and activities. Manual of distance training packages consisted by the explanation about a document, objectives, explanation about using the package, training schedule, and evaluation. The efficiency of packages was established at 79.50/81.35. 3. The results of using the packages were the posttest average scores of trainee’s achievement were higher than the pretest. 4. The trainee’s opinion on the package was at the highest level.

Keywords: distance training package, teacher, learner with special needs

Procedia PDF Downloads 464
1247 Semantic Indexing Improvement for Textual Documents: Contribution of Classification by Fuzzy Association Rules

Authors: Mohsen Maraoui

Abstract:

In the aim of natural language processing applications improvement, such as information retrieval, machine translation, lexical disambiguation, we focus on statistical approach to semantic indexing for multilingual text documents based on conceptual network formalism. We propose to use this formalism as an indexing language to represent the descriptive concepts and their weighting. These concepts represent the content of the document. Our contribution is based on two steps. In the first step, we propose the extraction of index terms using the multilingual lexical resource Euro WordNet (EWN). In the second step, we pass from the representation of index terms to the representation of index concepts through conceptual network formalism. This network is generated using the EWN resource and pass by a classification step based on association rules model (in attempt to discover the non-taxonomic relations or contextual relations between the concepts of a document). These relations are latent relations buried in the text and carried by the semantic context of the co-occurrence of concepts in the document. Our proposed indexing approach can be applied to text documents in various languages because it is based on a linguistic method adapted to the language through a multilingual thesaurus. Next, we apply the same statistical process regardless of the language in order to extract the significant concepts and their associated weights. We prove that the proposed indexing approach provides encouraging results.

Keywords: concept extraction, conceptual network formalism, fuzzy association rules, multilingual thesaurus, semantic indexing

Procedia PDF Downloads 120
1246 Enhancement of Genetic Diversity through Cross Breeding of Two Catfish (Heteropneustes fossilis and Clarias batrachus) in Bangladesh

Authors: M. F. Miah, A. Chakrabarty

Abstract:

Two popular and highly valued fish, Stinging catfish (Heteropneustes fossilis) and Asian catfish (Clarias batrachus) are considered for observing genetic enhancement. Cross breeding was performed considering wild and farmed fish through inducing agent. Five RAPD markers were used to assess genetic diversity among parents and offspring of these two catfish for evaluating genetic enhancement in F1 generation. Considering different genetic data such as banding pattern of DNA, polymorphic loci, polymorphic information content (PIC), inter individual pair wise similarity, Nei genetic similarity, genetic distance, phylogenetic relationships, allele frequency, genotype frequency, intra locus gene diversity and average gene diversity of parents and offspring of these two fish were analyzed and finally in both cases higher genetic diversity was found in F1 generation than the parents.

Keywords: Heteropneustes fossilis, Clarias batrachus, cross breeding, genetic enhancement

Procedia PDF Downloads 204
1245 Large Language Model Powered Chatbots Need End-to-End Benchmarks

Authors: Debarag Banerjee, Pooja Singh, Arjun Avadhanam, Saksham Srivastava

Abstract:

Autonomous conversational agents, i.e., chatbots, are becoming an increasingly common mechanism for enterprises to provide support to customers and partners. In order to rate chatbots, especially ones powered by Generative AI tools like Large Language Models (LLMs), we need to be able to accurately assess their performance. This is where chatbot benchmarking becomes important. In this paper, authors propose the use of a benchmark that they call the E2E (End to End) benchmark and show how the E2E benchmark can be used to evaluate the accuracy and usefulness of the answers provided by chatbots, especially ones powered by LLMs. The authors evaluate an example chatbot at different levels of sophistication based on both our E2E benchmark as well as other available metrics commonly used in the state of the art and observe that the proposed benchmark shows better results compared to others. In addition, while some metrics proved to be unpredictable, the metric associated with the E2E benchmark, which uses cosine similarity, performed well in evaluating chatbots. The performance of our best models shows that there are several benefits of using the cosine similarity score as a metric in the E2E benchmark.

Keywords: chatbot benchmarking, end-to-end (E2E) benchmarking, large language model, user centric evaluation.

Procedia PDF Downloads 41
1244 Understanding Embryology in Promoting Peace Leadership: A Document Review

Authors: Vasudev Das

Abstract:

The specific problem is that many leaders of the 21st century do not understand that the extermination of embryos wreaks havoc on peace leadership. The purpose of the document review is to understand embryology in facilitating peace leadership. Extermination of human embryos generates a requital wave of violence which later falls on human society in the form of disturbances, considering that violence breeds further violence as a consequentiality. The study results reveal that a deep understanding of embryology facilitates peace leadership, given that minimizing embryo extermination enhances non-violence in the global village. Neo-Newtonians subscribe to the idea that every action has an equal and opposite reaction. The US Federal Government recognizes the embryo or fetus as a member of Homo sapiens. The social change implications of this study are that understanding human embryology promotes peace leadership, considering that the consequentiality of embryo extermination can serve as a deterrent for violence on embryos.

Keywords: consequentiality, Homo sapiens, neo-Newtonians, violence

Procedia PDF Downloads 109
1243 Adaptation of Projection Profile Algorithm for Skewed Handwritten Text Line Detection

Authors: Kayode A. Olaniyi, Tola. M. Osifeko, Adeola A. Ogunleye

Abstract:

Text line segmentation is an important step in document image processing. It represents a labeling process that assigns the same label using distance metric probability to spatially aligned units. Text line detection techniques have successfully been implemented mainly in printed documents. However, processing of the handwritten texts especially unconstrained documents has remained a key problem. This is because the unconstrained hand-written text lines are often not uniformly skewed. The spaces between text lines may not be obvious, complicated by the nature of handwriting and, overlapping ascenders and/or descenders of some characters. Hence, text lines detection and segmentation represents a leading challenge in handwritten document image processing. Text line detection methods that rely on the traditional global projection profile of the text document cannot efficiently confront with the problem of variable skew angles between different text lines. Hence, the formulation of a horizontal line as a separator is often not efficient. This paper presents a technique to segment a handwritten document into distinct lines of text. The proposed algorithm starts, by partitioning the initial text image into columns, across its width into chunks of about 5% each. At each vertical strip of 5%, the histogram of horizontal runs is projected. We have worked with the assumption that text appearing in a single strip is almost parallel to each other. The algorithm developed provides a sliding window through the first vertical strip on the left side of the page. It runs through to identify the new minimum corresponding to a valley in the projection profile. Each valley would represent the starting point of the orientation line and the ending point is the minimum point on the projection profile of the next vertical strip. The derived text-lines traverse around any obstructing handwritten vertical strips of connected component by associating it to either the line above or below. A decision of associating such connected component is made by the probability obtained from a distance metric decision. The technique outperforms the global projection profile for text line segmentation and it is robust to handle skewed documents and those with lines running into each other.

Keywords: connected-component, projection-profile, segmentation, text-line

Procedia PDF Downloads 94
1242 Coastal Hydraulic Modelling to Ascertain Stability of Rubble Mound Breakwater

Authors: Safari Mat Desa, Othman A. Karim, Mohd Kamarulhuda Samion, Saiful Bahri Hamzah

Abstract:

Rubble mound breakwater was one of the most popular designs in Malaysia, constructed at the river mouth to dissipate the incoming wave energy from the seaward. Geometrically characteristics in trapezoid, crest width, and bottom width will determine the hypotonus stability, whilst structural height was designed for wave overtopping consideration. Physical hydraulic modelling in two-dimensional facilities was instigated in the flume to test the stability as well as the overtopping rate complied with the method of similarity, namely kinematic, dynamic, and geometric. Scaling effects of wave characteristics were carried out in order to acquire significant interaction of wave height, wave period, and water depth. Results showed two-dimensional physical modelling has proven reliable capability to ascertain breakwater stability significantly.

Keywords: breakwater, geometrical characteristic, wave overtopping, physical hydraulic modelling, method of similarity, wave characteristic

Procedia PDF Downloads 80
1241 Computational Analysis of Potential Inhibitors Selected Based on Structural Similarity for the Src SH2 Domain

Authors: W. P. Hu, J. V. Kumar, Jeffrey J. P. Tsai

Abstract:

The inhibition of SH2 domain regulated protein-protein interactions is an attractive target for developing an effective chemotherapeutic approach in the treatment of disease. Molecular simulation is a useful tool for developing new drugs and for studying molecular recognition. In this study, we searched potential drug compounds for the inhibition of SH2 domain by performing structural similarity search in PubChem Compound Database. A total of 37 compounds were screened from the database, and then we used the LibDock docking program to evaluate the inhibition effect. The best three compounds (AP22408, CID 71463546 and CID 9917321) were chosen for MD simulations after the LibDock docking. Our results show that the compound CID 9917321 can produce a more stable protein-ligand complex compared to other two currently known inhibitors of Src SH2 domain. The compound CID 9917321 may be useful for the inhibition of SH2 domain based on these computational results. Subsequently experiments are needed to verify the effect of compound CID 9917321 on the SH2 domain in the future studies.

Keywords: nonpeptide inhibitor, Src SH2 domain, LibDock, molecular dynamics simulation

Procedia PDF Downloads 241
1240 Garment Industry Development in South East Asia and Competitiveness

Authors: P. Nayak, Shakeel Shaikh

Abstract:

In this paper, we analyse the apparel export performance of Southeast Asian Nations (ASEAN) in the world market. The study covers the 2003-2012 period at the sector as well as product levels (6 digit HS) and analysis is based HS 2002 nomenclature. We measure export similarity among Southeast Asian nations for the apparel sector (two digit HS-61 & 62), besides analysing the products performance in the world through Revealed Comparative Advantage (RCA) technique. Coupled with RCA, the price as a factor of competitiveness was examined from the available Unit Value Realizations (UVR). Further to this, the resource availability or outsourced from the region was considered as an extension to the analysis of competitiveness between the nations. With the help of these methodologies, we examine the degree of competition between the exports of southeast nations in the world market. Our results show that Cambodia, Indonesia, Thailand, and Vietnam are well performing states within ASEAN. The paper further delves into sustainability of the export performing countries within ASEAN.

Keywords: export competitiveness, export similarity index, revealed comparative advantage, unit value realisation

Procedia PDF Downloads 258
1239 Semantic Textual Similarity on Contracts: Exploring Multiple Negative Ranking Losses for Sentence Transformers

Authors: Yogendra Sisodia

Abstract:

Researchers are becoming more interested in extracting useful information from legal documents thanks to the development of large-scale language models in natural language processing (NLP), and deep learning has accelerated the creation of powerful text mining models. Legal fields like contracts benefit greatly from semantic text search since it makes it quick and easy to find related clauses. After collecting sentence embeddings, it is relatively simple to locate sentences with a comparable meaning throughout the entire legal corpus. The author of this research investigated two pre-trained language models for this task: MiniLM and Roberta, and further fine-tuned them on Legal Contracts. The author used Multiple Negative Ranking Loss for the creation of sentence transformers. The fine-tuned language models and sentence transformers showed promising results.

Keywords: legal contracts, multiple negative ranking loss, natural language inference, sentence transformers, semantic textual similarity

Procedia PDF Downloads 70
1238 Influence of Convective Boundary Condition on Chemically Reacting Micropolar Fluid Flow over a Truncated Cone Embedded in Porous Medium

Authors: Pradeepa Teegala, Ramreddy Chitteti

Abstract:

This article analyzes the mixed convection flow of chemically reacting micropolar fluid over a truncated cone embedded in non-Darcy porous medium with convective boundary condition. In addition, heat generation/absorption and Joule heating effects are taken into consideration. The similarity solution does not exist for this complex fluid flow problem, and hence non-similarity transformations are used to convert the governing fluid flow equations along with related boundary conditions into a set of nondimensional partial differential equations. Many authors have been applied the spectral quasi-linearization method to solve the ordinary differential equations, but here the resulting nonlinear partial differential equations are solved for non-similarity solution by using a recently developed method called the spectral quasi-linearization method (SQLM). Comparison with previously published work on special cases of the problem is performed and found to be in excellent agreement. The effect of pertinent parameters namely, Biot number, mixed convection parameter, heat generation/absorption, Joule heating, Forchheimer number, chemical reaction, micropolar and magnetic field on physical quantities of the flow are displayed through graphs and the salient features are explored in detail. Further, the results are analyzed by comparing with two special cases, namely, vertical plate and full cone wherever possible.

Keywords: chemical reaction, convective boundary condition, joule heating, micropolar fluid, mixed convection, spectral quasi-linearization method

Procedia PDF Downloads 255
1237 Hierarchical Piecewise Linear Representation of Time Series Data

Authors: Vineetha Bettaiah, Heggere S. Ranganath

Abstract:

This paper presents a Hierarchical Piecewise Linear Approximation (HPLA) for the representation of time series data in which the time series is treated as a curve in the time-amplitude image space. The curve is partitioned into segments by choosing perceptually important points as break points. Each segment between adjacent break points is recursively partitioned into two segments at the best point or midpoint until the error between the approximating line and the original curve becomes less than a pre-specified threshold. The HPLA representation achieves dimensionality reduction while preserving prominent local features and general shape of time series. The representation permits course-fine processing at different levels of details, allows flexible definition of similarity based on mathematical measures or general time series shape, and supports time series data mining operations including query by content, clustering and classification based on whole or subsequence similarity.

Keywords: data mining, dimensionality reduction, piecewise linear representation, time series representation

Procedia PDF Downloads 246
1236 A Deep Learning Based Approach for Dynamically Selecting Pre-processing Technique for Images

Authors: Revoti Prasad Bora, Nikita Katyal, Saurabh Yadav

Abstract:

Pre-processing plays an important role in various image processing applications. Most of the time due to the similar nature of images, a particular pre-processing or a set of pre-processing steps are sufficient to produce the desired results. However, in the education domain, there is a wide variety of images in various aspects like images with line-based diagrams, chemical formulas, mathematical equations, etc. Hence a single pre-processing or a set of pre-processing steps may not yield good results. Therefore, a Deep Learning based approach for dynamically selecting a relevant pre-processing technique for each image is proposed. The proposed method works as a classifier to detect hidden patterns in the images and predicts the relevant pre-processing technique needed for the image. This approach experimented for an image similarity matching problem but it can be adapted to other use cases too. Experimental results showed significant improvement in average similarity ranking with the proposed method as opposed to static pre-processing techniques.

Keywords: deep-learning, classification, pre-processing, computer vision, image processing, educational data mining

Procedia PDF Downloads 119
1235 Machine Learning Driven Analysis of Kepler Objects of Interest to Identify Exoplanets

Authors: Akshat Kumar, Vidushi

Abstract:

This paper identifies 27 KOIs, 26 of which are currently classified as candidates and one as false positives that have a high probability of being confirmed. For this purpose, 11 machine learning algorithms were implemented on the cumulative kepler dataset sourced from the NASA exoplanet archive; it was observed that the best-performing model was HistGradientBoosting and XGBoost with a test accuracy of 93.5%, and the lowest-performing model was Gaussian NB with a test accuracy of 54%, to test model performance F1, cross-validation score and RUC curve was calculated. Based on the learned models, the significant characteristics for confirm exoplanets were identified, putting emphasis on the object’s transit and stellar properties; these characteristics were namely koi_count, koi_prad, koi_period, koi_dor, koi_ror, and koi_smass, which were later considered to filter out the potential KOIs. The paper also calculates the Earth similarity index based on the planetary radius and equilibrium temperature for each KOI identified to aid in their classification.

Keywords: Kepler objects of interest, exoplanets, space exploration, machine learning, earth similarity index, transit photometry

Procedia PDF Downloads 27
1234 Predicting Success and Failure in Drug Development Using Text Analysis

Authors: Zhi Hao Chow, Cian Mulligan, Jack Walsh, Antonio Garzon Vico, Dimitar Krastev

Abstract:

Drug development is resource-intensive, time-consuming, and increasingly expensive with each developmental stage. The success rates of drug development are also relatively low, and the resources committed are wasted with each failed candidate. As such, a reliable method of predicting the success of drug development is in demand. The hypothesis was that some examples of failed drug candidates are pushed through developmental pipelines based on false confidence and may possess common linguistic features identifiable through sentiment analysis. Here, the concept of using text analysis to discover such features in research publications and investor reports as predictors of success was explored. R studios were used to perform text mining and lexicon-based sentiment analysis to identify affective phrases and determine their frequency in each document, then using SPSS to determine the relationship between our defined variables and the accuracy of predicting outcomes. A total of 161 publications were collected and categorised into 4 groups: (i) Cancer treatment, (ii) Neurodegenerative disease treatment, (iii) Vaccines, and (iv) Others (containing all other drugs that do not fit into the 3 categories). Text analysis was then performed on each document using 2 separate datasets (BING and AFINN) in R within the category of drugs to determine the frequency of positive or negative phrases in each document. A relative positivity and negativity value were then calculated by dividing the frequency of phrases with the word count of each document. Regression analysis was then performed with SPSS statistical software on each dataset (values from using BING or AFINN dataset during text analysis) using a random selection of 61 documents to construct a model. The remaining documents were then used to determine the predictive power of the models. Model constructed from BING predicts the outcome of drug performance in clinical trials with an overall percentage of 65.3%. AFINN model had a lower accuracy at predicting outcomes compared to the BING model at 62.5% but was not effective at predicting the failure of drugs in clinical trials. Overall, the study did not show significant efficacy of the model at predicting outcomes of drugs in development. Many improvements may need to be made to later iterations of the model to sufficiently increase the accuracy.

Keywords: data analysis, drug development, sentiment analysis, text-mining

Procedia PDF Downloads 125
1233 Detecting Paraphrases in Arabic Text

Authors: Amal Alshahrani, Allan Ramsay

Abstract:

Paraphrasing is one of the important tasks in natural language processing; i.e. alternative ways to express the same concept by using different words or phrases. Paraphrases can be used in many natural language applications, such as Information Retrieval, Machine Translation, Question Answering, Text Summarization, or Information Extraction. To obtain pairs of sentences that are paraphrases we create a system that automatically extracts paraphrases from a corpus, which is built from different sources of news article since these are likely to contain paraphrases when they report the same event on the same day. There are existing simple standard approaches (e.g. TF-IDF vector space, cosine similarity) and alignment technique (e.g. Dynamic Time Warping (DTW)) for extracting paraphrase which have been applied to the English. However, the performance of these approaches could be affected when they are applied to another language, for instance Arabic language, due to the presence of phenomena which are not present in English, such as Free Word Order, Zero copula, and Pro-dropping. These phenomena will affect the performance of these algorithms. Thus, if we can analysis how the existing algorithms for English fail for Arabic then we can find a solution for Arabic. The results are promising.

Keywords: natural language processing, TF-IDF, cosine similarity, dynamic time warping (DTW)

Procedia PDF Downloads 355
1232 Efficient Layout-Aware Pretraining for Multimodal Form Understanding

Authors: Armineh Nourbakhsh, Sameena Shah, Carolyn Rose

Abstract:

Layout-aware language models have been used to create multimodal representations for documents that are in image form, achieving relatively high accuracy in document understanding tasks. However, the large number of parameters in the resulting models makes building and using them prohibitive without access to high-performing processing units with large memory capacity. We propose an alternative approach that can create efficient representations without the need for a neural visual backbone. This leads to an 80% reduction in the number of parameters compared to the smallest SOTA model, widely expanding applicability. In addition, our layout embeddings are pre-trained on spatial and visual cues alone and only fused with text embeddings in downstream tasks, which can facilitate applicability to low-resource of multi-lingual domains. Despite using 2.5% of training data, we show competitive performance on two form understanding tasks: semantic labeling and link prediction.

Keywords: layout understanding, form understanding, multimodal document understanding, bias-augmented attention

Procedia PDF Downloads 121
1231 The Gender Dialectic in Mothers and Daughters’ Relationships

Authors: Ronit Even Zahav

Abstract:

Objectives: Mother-daughter relationships are often portrayed as one of the most constitutive ties that shape women's identities throughout their lives. Yet, to the best of author’s knowledge, only few studies examine mother-daughter relationships in adulthood in the context of cross-cultural transition. Most of them focus on the mother-daughter relationship among one origin group. Hence, the existing knowledge about these relationships in adulthood, in the context of intercultural transition and encounters between different cultures, remain limited. Based on a critical feminist approach critical and cultural perspectives the current study focuses on a cross-cultural comparison of adult mother-daughter relationships among three groups of origin: Ethiopia, Russia, and Israel. The study aimed to: Explore the voices of women participating in a mother-daughter discourse in the context of gender and ethnicity; examine the differences in the mother-daughter relationship through number of factors (e.g. expectations of similarity and difference, perceptions of gender roles, gender identity, emotional closeness, sharing and stress) and finally, to develop a gender informed tool for understanding the gender dialectic in mother-daughter relationship in the context of cross cultural transitions. Method: 37 dyads of mothers and adult daughters participated in a qualitative study. A semi-structured interview was conducted that included questions about socio-demographic characteristics, language proficiency, social distance, closeness, emotional stress, and expectations of similarity and difference in mother-daughter relationships. Results: Analysis of the findings yielded three relationship patterns of gender dialectic and expectations of similarity and difference that characterize the groups of origin. Ethiopian mothers reported more sharing their daughters, fewer expectations of similarity, and felt more stress in the relationship compered to women from the two other origin groups. Conclusions: The study highlighted the impact of intercultural transition and social exclusion on mother-daughter relationships in adulthood in the context of the gender dialectic and women’s status in society. The presentation will explore the findings that were brought up by participants. The discussion will focus on the practices related to gender dialectic and intersecting inequalities regarding diverse groups and discuss gender development reducing inequalities and promoting empowerment to transform oppressive conditions.

Keywords: gender informed perspectives, gender dialectic, mother-daughter relationships, multiculturalism

Procedia PDF Downloads 37
1230 Clustering Ethno-Informatics of Naming Village in Java Island Using Data Mining

Authors: Atje Setiawan Abdullah, Budi Nurani Ruchjana, I. Gede Nyoman Mindra Jaya, Eddy Hermawan

Abstract:

Ethnoscience is used to see the culture with a scientific perspective, which may help to understand how people develop various forms of knowledge and belief, initially focusing on the ecology and history of the contributions that have been there. One of the areas studied in ethnoscience is etno-informatics, is the application of informatics in the culture. In this study the science of informatics used is data mining, a process to automatically extract knowledge from large databases, to obtain interesting patterns in order to obtain a knowledge. While the application of culture described by naming database village on the island of Java were obtained from Geographic Indonesia Information Agency (BIG), 2014. The purpose of this study is; first, to classify the naming of the village on the island of Java based on the structure of the word naming the village, including the prefix of the word, syllable contained, and complete word. Second to classify the meaning of naming the village based on specific categories, as well as its role in the community behavioral characteristics. Third, how to visualize the naming of the village to a map location, to see the similarity of naming villages in each province. In this research we have developed two theorems, i.e theorems area as a result of research studies have collected intersection naming villages in each province on the island of Java, and the composition of the wedge theorem sets the provinces in Java is used to view the peculiarities of a location study. The methodology in this study base on the method of Knowledge Discovery in Database (KDD) on data mining, the process includes preprocessing, data mining and post processing. The results showed that the Java community prioritizes merit in running his life, always working hard to achieve a more prosperous life, and love as well as water and environmental sustainment. Naming villages in each location adjacent province has a high degree of similarity, and influence each other. Cultural similarities in the province of Central Java, East Java and West Java-Banten have a high similarity, whereas in Jakarta-Yogyakarta has a low similarity. This research resulted in the cultural character of communities within the meaning of the naming of the village on the island of Java, this character is expected to serve as a guide in the behavior of people's daily life on the island of Java.

Keywords: ethnoscience, ethno-informatics, data mining, clustering, Java island culture

Procedia PDF Downloads 254
1229 Teachers' Beliefs and Practices in Designing Negotiated English Lesson Plans

Authors: Joko Nurkamto

Abstract:

A lesson plan is a part of the planning phase in a learning and teaching system framing the scenario of pedagogical activities in the classroom. It informs a decision on what to teach and how to landscape classroom interaction. Regardless of these benefits, the writer has witnessed the fact that lesson plans are viewed merely as a teaching document. Therefore, this paper will explore teachers’ beliefs and practices in designing lesson plans. It focuses primarily on how both teachers and students negotiate lesson plans in which the students are deemed to be the agents of instructional innovations. Additionally, the paper will talk about how such lesson plans are enacted. To investigate these issues, document analysis, in-depth interviews, participant classroom observation, and focus group discussion will be deployed as data collection methods in this explorative case study. The benefits of the paper are to show different roles of lesson plans and to discover different ways to design and enact such plans from a socio-interactional perspective.

Keywords: instructional innovation, learning and teaching system, lesson plan, pedagogical activities, teachers' beliefs and practices

Procedia PDF Downloads 129
1228 Unsupervised Classification of DNA Barcodes Species Using Multi-Library Wavelet Networks

Authors: Abdesselem Dakhli, Wajdi Bellil, Chokri Ben Amar

Abstract:

DNA Barcode, a short mitochondrial DNA fragment, made up of three subunits; a phosphate group, sugar and nucleic bases (A, T, C, and G). They provide good sources of information needed to classify living species. Such intuition has been confirmed by many experimental results. Species classification with DNA Barcode sequences has been studied by several researchers. The classification problem assigns unknown species to known ones by analyzing their Barcode. This task has to be supported with reliable methods and algorithms. To analyze species regions or entire genomes, it becomes necessary to use similarity sequence methods. A large set of sequences can be simultaneously compared using Multiple Sequence Alignment which is known to be NP-complete. To make this type of analysis feasible, heuristics, like progressive alignment, have been developed. Another tool for similarity search against a database of sequences is BLAST, which outputs shorter regions of high similarity between a query sequence and matched sequences in the database. However, all these methods are still computationally very expensive and require significant computational infrastructure. Our goal is to build predictive models that are highly accurate and interpretable. This method permits to avoid the complex problem of form and structure in different classes of organisms. On empirical data and their classification performances are compared with other methods. Our system consists of three phases. The first is called transformation, which is composed of three steps; Electron-Ion Interaction Pseudopotential (EIIP) for the codification of DNA Barcodes, Fourier Transform and Power Spectrum Signal Processing. The second is called approximation, which is empowered by the use of Multi Llibrary Wavelet Neural Networks (MLWNN).The third is called the classification of DNA Barcodes, which is realized by applying the algorithm of hierarchical classification.

Keywords: DNA barcode, electron-ion interaction pseudopotential, Multi Library Wavelet Neural Networks (MLWNN)

Procedia PDF Downloads 288
1227 DocPro: A Framework for Processing Semantic and Layout Information in Business Documents

Authors: Ming-Jen Huang, Chun-Fang Huang, Chiching Wei

Abstract:

With the recent advance of the deep neural network, we observe new applications of NLP (natural language processing) and CV (computer vision) powered by deep neural networks for processing business documents. However, creating a real-world document processing system needs to integrate several NLP and CV tasks, rather than treating them separately. There is a need to have a unified approach for processing documents containing textual and graphical elements with rich formats, diverse layout arrangement, and distinct semantics. In this paper, a framework that fulfills this unified approach is presented. The framework includes a representation model definition for holding the information generated by various tasks and specifications defining the coordination between these tasks. The framework is a blueprint for building a system that can process documents with rich formats, styles, and multiple types of elements. The flexible and lightweight design of the framework can help build a system for diverse business scenarios, such as contract monitoring and reviewing.

Keywords: document processing, framework, formal definition, machine learning

Procedia PDF Downloads 186
1226 Lexical Based Method for Opinion Detection on Tripadvisor Collection

Authors: Faiza Belbachir, Thibault Schienhinski

Abstract:

The massive development of online social networks allows users to post and share their opinions on various topics. With this huge volume of opinion, it is interesting to extract and interpret these information for different domains, e.g., product and service benchmarking, politic, system of recommendation. This is why opinion detection is one of the most important research tasks. It consists on differentiating between opinion data and factual data. The difficulty of this task is to determine an approach which returns opinionated document. Generally, there are two approaches used for opinion detection i.e. Lexical based approaches and Machine Learning based approaches. In Lexical based approaches, a dictionary of sentimental words is used, words are associated with weights. The opinion score of document is derived by the occurrence of words from this dictionary. In Machine learning approaches, usually a classifier is trained using a set of annotated document containing sentiment, and features such as n-grams of words, part-of-speech tags, and logical forms. Majority of these works are based on documents text to determine opinion score but dont take into account if these texts are really correct. Thus, it is interesting to exploit other information to improve opinion detection. In our work, we will develop a new way to consider the opinion score. We introduce the notion of trust score. We determine opinionated documents but also if these opinions are really trustable information in relation with topics. For that we use lexical SentiWordNet to calculate opinion and trust scores, we compute different features about users like (numbers of their comments, numbers of their useful comments, Average useful review). After that, we combine opinion score and trust score to obtain a final score. We applied our method to detect trust opinions in TRIPADVISOR collection. Our experimental results report that the combination between opinion score and trust score improves opinion detection.

Keywords: Tripadvisor, opinion detection, SentiWordNet, trust score

Procedia PDF Downloads 166
1225 Data Clustering Algorithm Based on Multi-Objective Periodic Bacterial Foraging Optimization with Two Learning Archives

Authors: Chen Guo, Heng Tang, Ben Niu

Abstract:

Clustering splits objects into different groups based on similarity, making the objects have higher similarity in the same group and lower similarity in different groups. Thus, clustering can be treated as an optimization problem to maximize the intra-cluster similarity or inter-cluster dissimilarity. In real-world applications, the datasets often have some complex characteristics: sparse, overlap, high dimensionality, etc. When facing these datasets, simultaneously optimizing two or more objectives can obtain better clustering results than optimizing one objective. However, except for the objectives weighting methods, traditional clustering approaches have difficulty in solving multi-objective data clustering problems. Due to this, evolutionary multi-objective optimization algorithms are investigated by researchers to optimize multiple clustering objectives. In this paper, the Data Clustering algorithm based on Multi-objective Periodic Bacterial Foraging Optimization with two Learning Archives (DC-MPBFOLA) is proposed. Specifically, first, to reduce the high computing complexity of the original BFO, periodic BFO is employed as the basic algorithmic framework. Then transfer the periodic BFO into a multi-objective type. Second, two learning strategies are proposed based on the two learning archives to guide the bacterial swarm to move in a better direction. On the one hand, the global best is selected from the global learning archive according to the convergence index and diversity index. On the other hand, the personal best is selected from the personal learning archive according to the sum of weighted objectives. According to the aforementioned learning strategies, a chemotaxis operation is designed. Third, an elite learning strategy is designed to provide fresh power to the objects in two learning archives. When the objects in these two archives do not change for two consecutive times, randomly initializing one dimension of objects can prevent the proposed algorithm from falling into local optima. Fourth, to validate the performance of the proposed algorithm, DC-MPBFOLA is compared with four state-of-art evolutionary multi-objective optimization algorithms and one classical clustering algorithm on evaluation indexes of datasets. To further verify the effectiveness and feasibility of designed strategies in DC-MPBFOLA, variants of DC-MPBFOLA are also proposed. Experimental results demonstrate that DC-MPBFOLA outperforms its competitors regarding all evaluation indexes and clustering partitions. These results also indicate that the designed strategies positively influence the performance improvement of the original BFO.

Keywords: data clustering, multi-objective optimization, bacterial foraging optimization, learning archives

Procedia PDF Downloads 111
1224 Document-level Sentiment Analysis: An Exploratory Case Study of Low-resource Language Urdu

Authors: Ammarah Irum, Muhammad Ali Tahir

Abstract:

Document-level sentiment analysis in Urdu is a challenging Natural Language Processing (NLP) task due to the difficulty of working with lengthy texts in a language with constrained resources. Deep learning models, which are complex neural network architectures, are well-suited to text-based applications in addition to data formats like audio, image, and video. To investigate the potential of deep learning for Urdu sentiment analysis, we implemented five different deep learning models, including Bidirectional Long Short Term Memory (BiLSTM), Convolutional Neural Network (CNN), Convolutional Neural Network with Bidirectional Long Short Term Memory (CNN-BiLSTM), and Bidirectional Encoder Representation from Transformer (BERT). In this study, we developed a hybrid deep learning model called BiLSTM-Single Layer Multi Filter Convolutional Neural Network (BiLSTM-SLMFCNN) by fusing BiLSTM and CNN architecture. The proposed and baseline techniques are applied on Urdu Customer Support data set and IMDB Urdu movie review data set by using pre-trained Urdu word embedding that are suitable for sentiment analysis at the document level. Results of these techniques are evaluated and our proposed model outperforms all other deep learning techniques for Urdu sentiment analysis. BiLSTM-SLMFCNN outperformed the baseline deep learning models and achieved 83%, 79%, 83% and 94% accuracy on small, medium and large sized IMDB Urdu movie review data set and Urdu Customer Support data set respectively.

Keywords: urdu sentiment analysis, deep learning, natural language processing, opinion mining, low-resource language

Procedia PDF Downloads 37