Search results for: semantic indexing
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 549

429 Modified Active (MA) Algorithm to Generate Semantic Web Related Clustered Hierarchy for Keyword Search

Authors: G. Leena Giri, Archana Mathur, S. H. Manjula, K. R. Venugopal, L. M. Patnaik

Abstract:

Keyword search in XML documents is based on the notion of lowest common ancestors in the labelled tree model of XML documents and has recently attracted considerable research interest in the database community. In this paper, we propose the Modified Active (MA) algorithm, an improvement over the active clustering algorithm that takes the entity aspect of the nodes into consideration to find the level of the node pertaining to a particular keyword entered by the user. A portion of a bibliography database is used to evaluate the modified active algorithm experimentally, and the results show that it performs better than the active algorithm. Our modification improves the response time of the system and thereby its efficiency.

Keywords: keyword matching patterns, MA algorithm, semantic search, knowledge management
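
As an illustration of the lowest-common-ancestor notion the abstract builds on, the following is a minimal sketch and not the MA algorithm itself. It assumes a toy bibliography document, substring matching on element text, and hypothetical helper names (`keyword_paths`, `shared_prefix_len`, `deepest_lca`); none of these come from the paper.

```python
import xml.etree.ElementTree as ET

def keyword_paths(root, keyword):
    """Return the root-to-node path of every element whose text contains keyword."""
    paths = []

    def walk(node, path):
        path = path + [node]
        if node.text and keyword.lower() in node.text.lower():
            paths.append(path)
        for child in node:
            walk(child, path)

    walk(root, [])
    return paths

def shared_prefix_len(path_a, path_b):
    """Number of leading elements that two root-to-node paths have in common."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a is not b:
            break
        depth += 1
    return depth

def deepest_lca(root, kw1, kw2):
    """Among all pairs of matches, return the lowest common ancestor that lies deepest."""
    best, best_depth = None, 0
    for pa in keyword_paths(root, kw1):
        for pb in keyword_paths(root, kw2):
            depth = shared_prefix_len(pa, pb)
            if depth > best_depth:
                best, best_depth = pa[depth - 1], depth
    return best

# Toy document (illustrative data only).
DOC = """
<bib>
  <paper><title>XML keyword search</title><author>Alice</author></paper>
  <paper><title>Semantic clustering</title><author>Bob</author></paper>
</bib>
"""
root = ET.fromstring(DOC)
same_paper = deepest_lca(root, "keyword", "Alice")
different_papers = deepest_lca(root, "keyword", "Bob")
print(same_paper.tag, different_papers.tag)
```

When both keywords occur inside one `paper` element, that element is the answer; when they occur in different papers, the only common ancestor is the document root, which is usually too coarse to be a useful search result.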

Procedia PDF Downloads 371
428 The Oral Production of University EFL Students: An Analysis of Tasks, Format, and Quality in Foreign Language Development

Authors: Vera Lucia Teixeira da Silva, Sandra Regina Buttros Gattolin de Paula

Abstract:

The present study focuses on academic literacy and addresses the impact of semantic-discursive resources on the constitution of genres produced in this context. The research considers the development of writing in the academic context in Portuguese. Studies that address academic literacy and the characteristics of the texts produced in this context are rare, particularly those that focus on the development of writing and consider three variables: the constitution of the writer, the perception of the reader/interlocutor, and the organization of the informational flow of the text. The research aims to map the semantic-discursive resources of the written register in texts of several genres produced by students in the first semester of the undergraduate course in Letters. The hypothesis raised is that writing in the academic environment is not a recurrent literacy practice for these learners and can be explained by the ontogenetic and phylogenetic nature of language development. Qualitative in nature, the present research takes as empirical data texts produced in a semester-long course of Reading and Textual Production; these data result from four different writing proposals, for a total of 600 texts. The corpus is analyzed on the basis of semantic-discursive resources, seeking to cover relevant aspects of language (grammar, discourse and social context) that reveal the choices made in the reader/writer interrelationship and the organizational flow of the text. The analysis covers three semantic-discursive resources: (a) appraisal and negotiation, to understand the attitudes negotiated (roles of the participants of the discourse and their relationship with the other); (b) ideation, to explain the construction of the experience (activities performed and participants); and (c) periodicity, to outline the flow of information in the organization of the text according to the genre it instantiates. The results indicate organizational difficulties in the flow of the text information. Cartography contributes to the understanding of the way writers use language in an effort to present themselves, evaluate someone else’s work, and communicate with readers.

Keywords: academic writing, Portuguese mother tongue, semantic-discursive resources, academic context

Procedia PDF Downloads 89
427 Probing Language Models for Multiple Linguistic Information

Authors: Bowen Ding, Yihao Kuang

Abstract:

In recent years, large-scale pre-trained language models have achieved state-of-the-art performance on a variety of natural language processing tasks. The word vectors produced by these language models can be viewed as dense encoded representations of natural language in text form. However, it is unknown how much linguistic information is encoded and how. In this paper, we construct several corresponding probing tasks for multiple kinds of linguistic information to clarify the encoding capabilities of different language models, and we visualize the results. We first obtain word representations in vector form from different language models, including BERT, ELMo, RoBERTa and GPT. Classifiers with a small number of parameters and unsupervised tasks are then applied to these word vectors to assess their capability to encode the corresponding linguistic information. The constructed probing tasks cover both semantic and syntactic aspects. The semantic aspect includes the ability of the model to understand semantic entities such as numbers, time, and characters, and the syntactic aspect includes the ability of the language model to understand grammatical structures such as dependency relationships and reference relationships. We also compare the encoding capabilities of different layers in the same language model to infer how linguistic information is encoded in the model.

Keywords: language models, probing task, text presentation, linguistic information

Procedia PDF Downloads 63
426 Arabic Text Classification: Review Study

Authors: M. Hijazi, A. Zeki, A. Ismail

Abstract:

An enormous amount of valuable human knowledge is preserved in documents. The rapid growth in the number of machine-readable documents for public or private access requires the use of automatic text classification. Text classification can be defined as assigning or structuring documents into a defined set of classes known in advance. Arabic text classification methods have emerged as a natural result of the existence of a massive amount of varied textual information written in the Arabic language on the web. This paper presents a review of the published research on Arabic text classification using classical data representation, bag of words (BoW), and conceptual data representation based on semantic resources such as Arabic WordNet and Wikipedia.

Keywords: Arabic text classification, Arabic WordNet, bag of words, conceptual representation, semantic relations

Procedia PDF Downloads 395
425 Understanding the Semantic Network of Tourism Studies in Taiwan by Using Bibliometrics Analysis

Authors: Chun-Min Lin, Yuh-Jen Wu, Ching-Ting Chung

Abstract:

The formulation of tourism policies requires objective academic research and evidence as support, especially research from local academia. Taiwan is a small island, and its economic growth relies heavily on tourism revenue. The Taiwanese government has devoted itself to promoting the tourism industry over the past few decades. Scientific research outcomes by Taiwanese scholars can help lay the foundations for the government's drafting of future tourism policy. In this study, a total of 120 full journal articles published between 2008 and 2016 in the Journal of Tourism and Leisure Studies (JTSL) were examined to explore the scientific research trend of tourism studies in Taiwan. JTSL is one of the most important Taiwanese journals in the tourism discipline; it focuses on tourism-related issues and uses traditional Chinese as its language. The method of co-word analysis from bibliometric approaches was employed for semantic analysis in this study. When analyzing Chinese words and phrases, word segmentation is a crucial step. It must be carried out first and precisely in order to obtain meaningful words or word chunks for further frequency calculation. A word segmentation system based on the N-gram algorithm was developed in this study to conduct semantic analysis, and the 100 groups of meaningful phrases with the highest recurrence rates were located. Subsequently, co-word analysis was employed for semantic classification. The results showed that the themes of tourism research in Taiwan in recent years cover tourism education, environmental protection, hotel management, information technology, and senior tourism. The results can give insight into the related issues and serve as a reference for tourism-related policy making and follow-up research.

Keywords: bibliometrics, co-word analysis, word segmentation, tourism research, policy
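
The two steps named in the abstract, N-gram frequency counting and co-word analysis, can be sketched as follows. This is a minimal illustration under stated assumptions, not the segmentation system the authors built: the sample sentences, the n-gram lengths, the `min_count` threshold, and the function names are all hypothetical.

```python
from collections import Counter

def char_ngrams(text, n):
    """All contiguous character sequences of length n in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def frequent_ngrams(sentences, n_values=(2, 3, 4), min_count=2):
    """Count character n-grams over all sentences and keep the recurrent ones."""
    counts = Counter()
    for sentence in sentences:
        for n in n_values:
            counts.update(char_ngrams(sentence, n))
    return {gram: c for gram, c in counts.items() if c >= min_count}

def cooccurrence(sentences, phrases):
    """Co-word counts: how often two phrases appear in the same sentence."""
    pairs = Counter()
    for sentence in sentences:
        present = sorted(p for p in phrases if p in sentence)
        for i, a in enumerate(present):
            for b in present[i + 1:]:
                pairs[(a, b)] += 1
    return pairs

# Illustrative traditional Chinese snippets (not taken from the journal corpus).
sentences = ["觀光教育與環境保護", "觀光教育研究", "環境保護政策"]
candidates = frequent_ngrams(sentences)
pairs = cooccurrence(sentences, ["觀光教育", "環境保護"])
print(sorted(candidates, key=len, reverse=True)[:2], sum(pairs.values()))
```

Raw n-gram counts also surface fragments such as 光教 that straddle word boundaries, so a practical system would need an additional filtering step to keep only meaningful phrases before the co-word stage.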

Procedia PDF Downloads 201
424 On Early Verb Acquisition in Chinese-Speaking Children

Authors: Yating Mu

Abstract:

Young children acquire their native language with amazing rapidity. Having noticed this phenomenon, many linguists, as well as psychologists, have devoted themselves to exploring the best explanations. Thus research on first language acquisition emerged. Early lexical development is an important branch of children’s first language acquisition (FLA). The verb, the most significant class of the lexicon and the most grammatically complex syntactic category or word type, is not only the core of exploring the syntactic structures of language but also plays a key role in analyzing semantic features. Early verb development must therefore have a great impact on children’s early lexical acquisition. Most scholars conclude that verbs, in general, are very difficult to learn because the problem in verb learning might be more about mapping a specific verb onto an action or event than about learning the underlying relational concepts that the verb or relational term encodes. However, previous research on early verb development mainly focuses on the argument about whether there is a noun bias or a verb bias in children’s early productive vocabulary. There is little research on the general characteristics of children’s early verbs concerning both semantic and syntactic aspects, not to mention a general survey of Chinese-speaking children’s verb acquisition. Therefore, the author attempts to examine the general conditions and characteristics of Chinese-speaking children’s early productive verbs, based on data from a longitudinal study of three Chinese-speaking children. In order to present an overall picture of Chinese verb development, the present study focuses on both semantic and syntactic aspects. For the semantic analysis, a classification method is adopted first. The verb category is a sophisticated class in Mandarin, so it is necessary to divide it into small sub-types, which makes the research much easier. By making a reasonable classification of eight verb classes on the basis of semantic features, the research aims to find out whether there are any universal rules in Chinese-speaking children’s verb development. With regard to the syntactic aspect of the verb category, a debate between the nativist account and the usage-based approach has lasted for quite a long time. By analyzing the longitudinal Mandarin data, the author attempts to find out whether the usage-based theory can fully explain the characteristics of Chinese verb development. To sum up, this thesis applies the descriptive research method to investigate the acquisition and usage of Chinese-speaking children’s early verbs, with the purpose of providing a new perspective for investigating the semantic and syntactic features of early verb acquisition.

Keywords: Chinese-speaking children, early verb acquisition, verb classes, verb grammatical structures

Procedia PDF Downloads 327
423 An Approach to Specify Software Requirements in Semantic Form

Authors: Deepa Vijay, Chellammal Surianarayanan, Gopinath Ganapathy

Abstract:

The requirements of a software project serve as a guideline for the entire project team and enable the team to produce the right outcome. As requirements are key in deciding the success of the project, they should be specified in an unambiguous manner. The requirements should also be complete and consistent, and they should be interpreted by the entire software project team in the same way as the customer interprets them. Specifying requirements in textual form is common in software development. This leads to poor understanding of the requirements, which results in more errors and degraded quality. Some of the literature focuses on semantic ways of specifying functional requirements that ensure the consistency and completeness of requirements. Alternatively, in this work, a method is proposed to map the syntactic requirements to corresponding semantics in the form of ontologies. This improves the understanding of requirements, prevents errors and improves quality.

Keywords: functional requirement, ontology, requirements management, semantics

Procedia PDF Downloads 332
422 Automated Adaptions of Semantic User- and Service Profile Representations by Learning the User Context

Authors: Nicole Merkle, Stefan Zander

Abstract:

Ambient Assisted Living (AAL) describes a technological and methodological stack (e.g. formal model-theoretic semantics, rule-based reasoning and machine learning) covering different aspects of the behavior, activities and characteristics of humans. Hence, a semantic representation of the user environment and its relevant elements is required in order to allow assistive agents to recognize situations and deduce appropriate actions. Furthermore, the user and his/her characteristics (e.g. physical, cognitive, preferences) need to be represented with a high degree of expressiveness in order to allow software agents a precise evaluation of the users’ context models. The correct interpretation of these context models depends highly on temporal and spatial circumstances as well as individual user preferences. In most AAL approaches, model representations of real-world situations represent the current state of a universe of discourse at a given point in time and neglect transitions between states. However, the AAL domain currently lacks sufficient approaches that consider the dynamic adaptation of context-related representations. Semantic representations of relevant real-world excerpts (e.g. user activities) help cognitive, rule-based agents to reason and make decisions in order to help users in appropriate tasks and situations. Furthermore, rules and reasoning on semantic models are not sufficient for handling uncertainty and fuzzy situations. A certain situation can require different (re-)actions in order to achieve the best results with respect to the user and his/her needs. But what is the best result? To answer this question, we need to consider that every smart agent has to achieve an objective, but this objective is mostly defined by domain experts, who can also fail in their estimation of what the user does and does not desire. 
Hence, a smart agent has to be able to learn from context history data and estimate or predict what is most likely in certain contexts. Furthermore, different agents with contrary objectives can cause collisions as their actions influence the user’s context and constituting conditions in unintended or uncontrolled ways. We present an approach for dynamically updating a semantic model with respect to the current user context that allows flexibility of the software agents and enhances their conformance in order to improve the user experience. The presented approach adapts rules by learning sensor evidence and user actions using probabilistic reasoning approaches, based on given expert knowledge. The semantic domain model consists basically of device-, service- and user profile representations. In this paper, we present how this semantic domain model can be used in order to compute the probability of matching rules and actions. We apply this probability estimation to compare the current domain model representation with the computed one in order to adapt the formal semantic representation. Our approach aims at minimizing the likelihood of unintended interferences in order to eliminate conflicts and unpredictable side-effects by updating pre-defined expert knowledge according to the most probable context representation. This enables agents to adapt to dynamic changes in the environment which enhances the provision of adequate assistance and affects positively the user satisfaction.

Keywords: ambient intelligence, machine learning, semantic web, software agents

Procedia PDF Downloads 253
421 Emerging Technology for Business Intelligence Applications

Authors: Hsien-Tsen Wang

Abstract:

Business Intelligence (BI) has long helped organizations make informed decisions based on data-driven insights and gain competitive advantages in the marketplace. In the past two decades, businesses witnessed not only the dramatically increasing volume and heterogeneity of business data but also the emergence of new technologies, such as Artificial Intelligence (AI), Semantic Web (SW), Cloud Computing, and Big Data. It is plausible that the convergence of these technologies would bring more value out of business data by establishing linked data frameworks and connecting in ways that enable advanced analytics and improved data utilization. In this paper, we first review and summarize current BI applications and methodology. Emerging technologies that can be integrated into BI applications are then discussed. Finally, we conclude with a proposed synergy framework that aims at achieving a more flexible, scalable, and intelligent BI solution.

Keywords: business intelligence, artificial intelligence, semantic web, big data, cloud computing

Procedia PDF Downloads 62
420 Semantic Differences between Bug Labeling of Different Repositories via Machine Learning

Authors: Pooja Khanal, Huaming Zhang

Abstract:

Labeling of issues/bugs, also known as bug classification, plays a vital role in software engineering. Some known labels/classes of bugs are 'User Interface', 'Security', and 'API'. Most of the time, when a reporter reports a bug, they try to assign some predefined label to it. Those issues are reported for a project, and each project is a repository in GitHub/GitLab, which contains multiple issues. There are many software project repositories, ranging from individual projects to commercial projects. The labels assigned in different repositories may depend on various factors such as human instinct, generalization of labels, and the label assignment policy followed by the reporter. While the reporter of an issue may instinctively give that issue a label, another person reporting the same issue may label it differently. As a result, it is not known mathematically whether a label in one repository is similar to or different from the label in another repository. Hence, the primary goal of this research is to find the semantic differences between the bug labeling of different repositories via machine learning. Independent optimal classifiers for individual repositories are built first using the text features from the reported issues. The optimal classifiers may include a combination of multiple classifiers stacked together. Then, those classifiers are used to cross-test other repositories, which allows the result to be deduced mathematically. The output of this ongoing research includes a formalized open-source GitHub issues database that is used to deduce the similarity of the labels pertaining to the different repositories.

Keywords: bug classification, bug labels, GitHub issues, semantic differences

Procedia PDF Downloads 165
419 Semantic-Based Collaborative Filtering to Improve Visitor Cold Start in Recommender Systems

Authors: Baba Mbaye

Abstract:

In collaborative filtering recommendation systems, a user receives suggested items based on the opinions and evaluations of a community of users. This type of recommendation system uses only the information (ratings as numerical values) contained in a usage matrix as input data. This matrix can be constructed based on users' behaviors or by inviting users to declare their opinions on the items they know. The cold start problem leads to very poor performance for new users. It is a phenomenon that occurs at the beginning of use, when the system lacks data to make recommendations. There are three types of cold start problems: cold start for a new item, a new system, and a new user. In this article, we focus on the cold start for a new user. When the system welcomes a new user, the profile exists but does not have enough data, and its commonalities with other users' profiles are still unknown. This leads to recommendations that are not adapted to the profile of the new user. In this paper, we propose an approach that improves cold start by using the notions of similarity and semantic proximity between users' profiles during cold start. We use the available cold metadata (metadata extracted from the new user's data) that are useful in positioning the new user within a community. The aim is to look for similarities and semantic proximities with the old and current user profiles of the system. Proximity is represented by close concepts considered to belong to the same group, while similarity groups together elements that appear similar. Similarity and proximity are two close but not identical concepts. This closeness leads us to a construction of similarity that is based on: a) the concepts (properties, terms, instances) independent of ontology structure and, b) the simultaneous representation of the two concepts (relations, presence of terms in a document, simultaneous presence of the authorities). 
We propose an ontology, OIVCSRS (Ontology of Improvement Visitor Cold Start in Recommender Systems), in order to structure the terms and concepts representing the meaning of an information field, whether by the metadata of a namespace, or the elements of a knowledge domain. This approach allows us to automatically attach the new user to a user community, partially compensate for the data that was not initially provided and ultimately to associate a better first profile with the cold start. Thus, the aim of this paper is to propose an approach to improving cold start using semantic technologies.

Keywords: visitor cold start, recommender systems, collaborative filtering, semantic filtering
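
The idea of attaching a new user to an existing community from metadata alone can be sketched as below. This is a minimal illustration, not the OIVCSRS approach: plain Jaccard overlap of metadata terms stands in for the ontology-based similarity and proximity measures, and the community names, metadata terms, and function names are assumptions made for the example.

```python
def jaccard(a, b):
    """Overlap between two sets of metadata terms, in [0, 1]."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def community_scores(new_user_meta, communities):
    """Average similarity between the new user and the members of each community."""
    return {
        name: sum(jaccard(new_user_meta, member) for member in members) / len(members)
        for name, members in communities.items()
    }

def nearest_community(new_user_meta, communities):
    """Name of the community whose members are, on average, most similar."""
    scores = community_scores(new_user_meta, communities)
    return max(scores, key=scores.get)

# Hypothetical communities, each a list of member metadata sets.
communities = {
    "science": [{"physics", "astronomy", "books"}, {"physics", "chemistry"}],
    "sports": [{"football", "running"}, {"running", "cycling", "books"}],
}
new_user = {"astronomy", "physics"}
chosen = nearest_community(new_user, communities)
print(chosen)
```

Once a community is chosen, the ratings of its members can serve as a provisional profile for the new user until enough of the user's own ratings accumulate.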

Procedia PDF Downloads 191
418 Treating Voxels as Words: Word-to-Vector Methods for fMRI Meta-Analyses

Authors: Matthew Baucum

Abstract:

With the increasing popularity of fMRI as an experimental method, psychology and neuroscience can greatly benefit from advanced techniques for summarizing and synthesizing large amounts of data from brain imaging studies. One promising avenue is automated meta-analyses, in which natural language processing methods are used to identify the brain regions consistently associated with certain semantic concepts (e.g., “social”, “reward”) across large corpora of studies. This study builds on this approach by demonstrating how, in fMRI meta-analyses, individual voxels can be treated as vectors in a semantic space and evaluated for their “proximity” to terms of interest. In this technique, a low-dimensional semantic space is built from brain imaging study texts, allowing words in each text to be represented as vectors (where words that frequently appear together are near each other in the semantic space). Consequently, each voxel in a brain mask can be represented as a normalized vector sum of all of the words in the studies that showed activation in that voxel. The entire brain mask can then be visualized in terms of each voxel’s proximity to a given term of interest (e.g., “vision”, “decision making”) or collection of terms (e.g., “theory of mind”, “social”, “agent”), as measured by the cosine similarity between the voxel’s vector and the term vector (or the average of multiple term vectors). Analysis can also proceed in the opposite direction, allowing word cloud visualizations of the nearest semantic neighbors for a given brain region. This approach allows for continuous, fine-grained metrics of voxel-term associations, and relies on state-of-the-art “open vocabulary” methods that go beyond mere word-counts. 
An analysis of over 11,000 neuroimaging studies from an existing meta-analytic fMRI database demonstrates that this technique can be used to recover known neural bases for multiple psychological functions, suggesting this method’s utility for efficient, high-level meta-analyses of localized brain function. While automated text analytic methods are no replacement for deliberate, manual meta-analyses, they seem to show promise for the efficient aggregation of large bodies of scientific knowledge, at least on a relatively general level.

Keywords: fMRI, machine learning, meta-analysis, text analysis
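
The voxel representation described in the abstract, a normalized sum of word vectors compared with a term vector by cosine similarity, can be sketched as follows. The two-dimensional word vectors, the study texts, and the function names are toy assumptions; the study itself builds its semantic space from a large corpus.

```python
import math

def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def voxel_vector(study_texts, word_vectors):
    """Normalized sum of the vectors of all known words in the given study texts."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for text in study_texts:
        for word in text.lower().split():
            vec = word_vectors.get(word)
            if vec is not None:
                total = [t + x for t, x in zip(total, vec)]
    return normalize(total)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy semantic space (hypothetical values).
word_vectors = {
    "reward": [1.0, 0.0],
    "money": [0.9, 0.1],
    "vision": [0.0, 1.0],
}
# Texts of the studies that reported activation in one voxel (illustrative).
studies = ["reward money", "money reward task"]
voxel = voxel_vector(studies, word_vectors)
to_reward = cosine(voxel, word_vectors["reward"])
to_vision = cosine(voxel, word_vectors["vision"])
print(to_reward > to_vision)
```

Repeating the comparison for every voxel in a brain mask yields one proximity value per voxel, which is the quantity the abstract proposes to visualize.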

Procedia PDF Downloads 418
417 Method of Cluster Based Cross-Domain Knowledge Acquisition for Biologically Inspired Design

Authors: Shen Jian, Hu Jie, Ma Jin, Peng Ying Hong, Fang Yi, Liu Wen Hai

Abstract:

Biologically inspired design inspires inventions and new technologies in the field of engineering by mimicking functions, principles, and structures in the biological domain. To deal with the obstacles of cross-domain knowledge acquisition in the existing biologically inspired design process, functional semantic clustering based on functional feature semantic correlation and environmental constraint clustering composition based on environmental characteristic constraining adaptability are proposed. A knowledge cell clustering algorithm and the corresponding prototype system are developed. Finally, the effectiveness of the method is verified through the design of a visual prosthetic device.

Keywords: knowledge clustering, knowledge acquisition, knowledge based engineering, knowledge cell, biologically inspired design

Procedia PDF Downloads 401
416 Resource Framework Descriptors for Interestingness in Data

Authors: C. B. Abhilash, Kavi Mahesh

Abstract:

Human beings are the most advanced species on earth, largely because of their ability to communicate and share information via human language. In today's world, a huge amount of data is available on the web in text format. This has also resulted in the generation of big data in structured and unstructured formats. In general, the data is in textual form, which is highly unstructured. To get insights and actionable content from this data, we need to incorporate the concepts of text mining and natural language processing. In our study, we mainly focus on interesting data through which interesting facts are generated for the knowledge base. The approach is to derive analytics from the text via the application of natural language processing. Using the semantic web's Resource Description Framework (RDF), we generate triples from the given data and derive the interesting patterns. The methodology also illustrates data integration using RDF for reliable, interesting patterns.

Keywords: RDF, interestingness, knowledge base, semantic data

Procedia PDF Downloads 123
415 A Study of Mandarin Ba Constructions from the Perspective of Event Structure

Authors: Changyin Zhou

Abstract:

Ba constructions are a special type of construction in Chinese. Their syntactic behaviors are closely related to their event structural properties. Existing analyses that treat the semantic function of Ba as causative have difficulty accounting for the discrepancy between Ba constructions and their corresponding constructions without Ba in expressing causativity. This paper holds that Ba in Ba constructions is a functional category expressing affectedness. The affectedness expressed by Ba can be positive or negative. The functional category Ba expressing negative affectedness has the semantic property of being 'expected'. The precondition of a Ba construction is the boundedness of the event concerned. This paper, holding the parallelism between motion events and change-of-state events, proposes a syntactic model based on the notions of boundedness and affectedness, discusses the transformations between Ba constructions and the related resultative constructions, and derives the various Ba constructions concerned.

Keywords: affectedness, Ba constructions, boundedness, event structure, resultative constructions

Procedia PDF Downloads 397
414 Classification of Contexts for Mentioning Love in Interviews with Victims of the Holocaust

Authors: Marina Yurievna Aleksandrova

Abstract:

Research on the Holocaust retains value not only for history but also for sociology and psychology. One of the most important fields of study is how people coped during and after this traumatic event. The aim of this paper is to identify the main contexts of the topic of love and to determine which contexts are more characteristic of different groups of victims of the Holocaust (by gender, nationality, and age). In this research, transcripts of interviews with Holocaust victims that were collected during 1946 for the "Voices of the Holocaust" project were used as data. The main contexts were analyzed with methods of network analysis and latent semantic analysis and classified by gender, age, and nationality with a random forest. The results show that love is articulated and described significantly differently by male and female informants, whereas classification by nationality and by age yields lower values of the quality metrics.

Keywords: Holocaust, latent semantic analysis, network analysis, text-mining, random forest

Procedia PDF Downloads 152
413 Semantic Network Analysis of the Saudi Women Driving Decree

Authors: Dania Aljouhi

Abstract:

September 26th, 2017, is a historic date for all women in Saudi Arabia. On that day, Saudi Arabia announced the decree allowing Saudi women to drive. With the advent of Vision 2030 and its goal to empower women and increase their participation in Saudi society, we examine how Saudi Twitter users deliberate the 2017 decree in terms of different social, cultural, religious, economic and political factors. This topic bridges social media (Twitter), gender, and socio-cultural studies to offer insights into how Saudis’ tweets reflect a broader discourse on Saudi women in the age of social media. The present study aims to explore the meanings and themes that emerge among Saudi Twitter users in response to the 2017 royal decree on women driving. The sample used in the current study comprises (n = 1000) tweets that were collected from Sep 2017 to March 2019 to account for Saudis’ tweets before and after the implementation of the decree. The paper uses semantic and thematic network analysis methods to examine the Saudi Twitter discourse on the women driving issue. The paper argues that Twitter as a platform has mediated the discourse of women driving among the Saudi community and facilitated social changes. Finally, framing theory (Goffman, 1974) and networked framing (Meraz & Papacharissi 2013) are both used to explain the tweets on the decree allowing Saudi women to drive based on # Saudi women-driving-cars.

Keywords: Saudi Arabia, women, Twitter, semantic network analysis, framing

Procedia PDF Downloads 116
412 Deep Vision: A Robust Dominant Colour Extraction Framework for T-Shirts Based on Semantic Segmentation

Authors: Kishore Kumar R., Kaustav Sengupta, Shalini Sood Sehgal, Poornima Santhanam

Abstract:

Fashion is a human expression that is constantly changing. One of the prime factors that consistently influences fashion is the change in colour preferences. The role of colour in our everyday lives is very significant. It subconsciously reveals a lot about one’s mindset and mood. Analyzing colours by extracting them from outfit images is critical for examining individual and consumer behaviour. Several research works have been carried out on extracting colours from images, but to the best of our knowledge, no studies extract colours from specific apparel and identify colour patterns geographically. This paper proposes a framework for accurately extracting colours from T-shirt images and predicting dominant colours geographically. The proposed method consists of two stages: first, a U-Net deep learning model is adopted to segment the T-shirts from the images. Second, the colours are extracted only from the T-shirt segments. The proposed method employs the iMaterialist (Fashion) 2019 dataset for the semantic segmentation task. The proposed framework also includes a mechanism for gathering data and analyzing India’s general colour preferences. From this research, it was observed that black and grey are the dominant colours in different regions of India. The proposed method can be adapted to study fashion’s evolving colour preferences.

Keywords: colour analysis in t-shirts, convolutional neural network, encoder-decoder, k-means clustering, semantic segmentation, U-Net model

Procedia PDF Downloads 74
411 Automatic Multi-Label Image Annotation System Guided by Firefly Algorithm and Bayesian Method

Authors: Saad M. Darwish, Mohamed A. El-Iskandarani, Guitar M. Shawkat

Abstract:

Nowadays, the amount of available multimedia data is continuously on the rise. Finding a required image is a challenging task for an ordinary user. Content-based image retrieval (CBIR) computes relevance based on the visual similarity of low-level image features such as color, textures, etc. However, there is a gap between low-level visual features and the semantic meanings required by applications. The typical method of bridging the semantic gap is automatic image annotation (AIA), which extracts semantic features using machine learning techniques. In this paper, a multi-label image annotation system guided by the Firefly algorithm and a Bayesian method is proposed. Firstly, images are segmented using the maximum variance intra cluster criterion and the Firefly algorithm, a swarm-based approach with high convergence speed and low computational cost, to search for the optimal multiple thresholds. Feature extraction techniques based on color features and region properties are applied to obtain the representative features. After that, the images are annotated using a translation model based on the Net Bayes system, which is efficient for multi-label learning with high precision and low complexity. Experiments are performed using the Corel database. The results show that the proposed system is better than traditional ones for automatic image annotation and retrieval.

Keywords: feature extraction, feature selection, image annotation, classification

Procedia PDF Downloads 554
410 Graph-Based Semantical Extractive Text Analysis

Authors: Mina Samizadeh

Abstract:

In the past few decades, there has been an explosion in the amount of available data produced from various sources on different topics. The availability of this enormous volume of data requires us to adopt effective computational tools to explore it. This has led to intense and growing interest in the research community in developing computational methods for processing text data. One line of study focuses on condensing text so that we are able to get a higher level of understanding in a shorter time. The two important tasks here are keyword extraction and text summarization. In keyword extraction, we are interested in finding the most important words in a text, which makes us familiar with its general topic. In text summarization, we are interested in producing a short text which includes the important information in the document. The TextRank algorithm, an unsupervised learning method that extends PageRank (the base algorithm of the Google search engine for searching and ranking pages), has shown its efficacy in large-scale text mining, especially for text summarization and keyword extraction. This algorithm can automatically extract the important parts of a text (keywords or sentences) and return them as a result. However, this algorithm neglects the semantic similarity between the different parts. In this work, we improved the results of the TextRank algorithm by incorporating the semantic similarity between parts of the text. Aside from keyword extraction and text summarization, we develop a topic clustering algorithm based on our framework, which can be used individually or as a part of generating the summary to overcome coverage problems.
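The idea of ranking sentences on a graph whose edges are weighted by similarity can be sketched as follows. This is a minimal illustration, not the paper's method: the function names are hypothetical, and plain word-overlap (Jaccard) similarity is swapped in as a placeholder where a real system would plug in a semantic similarity measure, for example one based on embeddings.

```python
def jaccard(a, b):
    """Word-overlap similarity; a placeholder for a semantic measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def weighted_textrank(sentences, similarity=jaccard, damping=0.85, iters=50):
    """Score sentences with PageRank on a similarity-weighted graph."""
    n = len(sentences)
    if n == 0:
        return []

    # Symmetric edge weights; no self-loops.
    w = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out = [sum(row) for row in w]  # total outgoing weight per node

    scores = [1.0 / n] * n
    for _ in range(iters):
        # Each node passes its score to neighbours in proportion to edge
        # weight; nodes without edges contribute nothing.
        scores = [
            (1 - damping) / n
            + damping * sum(
                w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

Passing a different `similarity` callable is the only change needed to move from lexical overlap to a semantic measure; the ranking loop stays the same.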

Keywords: keyword extraction, n-gram extraction, text summarization, topic clustering, semantic analysis

Procedia PDF Downloads 38
409 Ontological Modeling Approach for Statistical Databases Publication in Linked Open Data

Authors: Bourama Mane, Ibrahima Fall, Mamadou Samba Camara, Alassane Bah

Abstract:

National Statistical Institutes hold a large volume of data, generally in a format that constrains how the information it contains can be published. Each household or business data collection project includes its own dissemination platform. As a result, the dissemination methods used so far do not promote rapid access to information and, above all, do not offer the option of linking data for in-depth processing. In this paper, we present an approach to modeling these data to publish them in a format intended for the Semantic Web. Our objective is to be able to publish all this data on a single platform and offer the option to link it with other external data sources. The approach will be applied to data from major national surveys such as those on employment, poverty, and child labor, and the general census of the population of Senegal.
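To make "a format intended for the Semantic Web" concrete, the sketch below serializes one statistical observation as RDF triples in N-Triples syntax. It is an illustration only: the `http://example.org/stats/` namespace, the property names, and the sample row are hypothetical placeholders and are not taken from the paper or from any real survey.

```python
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
BASE = "http://example.org/stats/"  # hypothetical namespace


def observation_to_ntriples(obs_id, row):
    """Turn one table row (a dict) into a list of N-Triples lines."""
    subject = f"<{BASE}observation/{obs_id}>"
    lines = []
    for key, value in row.items():
        predicate = f"<{BASE}property/{key}>"
        if isinstance(value, (int, float)):
            # Typed numeric literal. Very small or large floats print in
            # exponent form, which xsd:decimal does not allow; a fuller
            # version would format them explicitly.
            obj = f'"{value}"^^<{XSD_DECIMAL}>'
        elif isinstance(value, str) and value.startswith("http"):
            # Values that are already IRIs become links to other resources.
            obj = f"<{value}>"
        else:
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"{subject} {predicate} {obj} .")
    return lines
```

Emitting IRIs rather than plain strings for values such as geographic areas is what allows the published data to be linked with external sources.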

Keywords: Semantic Web, linked open data, database, statistic

Procedia PDF Downloads 148
408 'Caucasian Mountaineer / Scottish Highlander': Correlation between Semantics and Culture

Authors: Natalia M. Nepomniashchikh

Abstract:

The research focuses on the Russian and English linguoculturemes Caucasian mountaineer and Scottish Highlander, which are subjected to comparative-contrastive analysis. To this end, the dictionary definitions of the concepts under consideration were analysed, which made it possible to build the lexical-semantic fields of both lexical items in Russian and English. This stage of the research made it possible to proceed to the construction of the linguistic-cultural fields. To build these fields, literary passages containing the concepts under consideration and the items directly related to them were taken from the works of M. Yu. Lermontov about the Caucasus mountains and the mountaineers living there, and from those of W. Scott devoted to the Scottish Highlands and their inhabitants. All collected data was systematized in schemes and tables reflecting the differences and areas of overlap.

Keywords: lexemes, lexical items, lexical-semantic field, linguistic-cultural field, linguoculturemes

Procedia PDF Downloads 201
407 Common Orthodontic Indices and Classification in the United Kingdom

Authors: Ashwini Mohan, Haris Batley

Abstract:

An orthodontic index is used to rate or categorise an individual’s occlusion using a numeric or alphanumeric score. Indexing of malocclusions and their correction is important in epidemiology, diagnosis, communication between clinicians and with their patients, and the assessment of treatment outcomes. Many useful indices have been put forward, but to the best of the authors’ knowledge, no single method to this day appears to be equally suitable for use by epidemiologists, public health program planners and clinicians. This article describes the common clinical orthodontic indices and classifications used in the United Kingdom.

Keywords: classification, indices, orthodontics, validity

Procedia PDF Downloads 111
406 N400 Investigation of Semantic Priming Effect to Symbolic Pictures in Text

Authors: Thomas Ousterhout

Abstract:

The purpose of this study was to investigate whether incorporating meaningful pictures of gestures and facial expressions in short sentences of text could supplement the text with enough semantic information to produce an N400 effect when probe words incongruent with the picture were subsequently presented. Event-related potentials (ERPs) were recorded from a 14-channel commercial-grade EEG headset while subjects performed congruent/incongruent reaction time discrimination tasks. Since pictures of meaningful gestures have been shown to be semantically processed in the brain in a similar manner to words, it is believed that pictures will add supplementary information to text just as the inclusion of an equivalent synonymous word would. The hypothesis is that when subjects read the mixed text/picture sentences, they will process the images and words just as in face-to-face communication, and therefore probe words incongruent with the image will produce an N400.

Keywords: EEG, ERP, N400, semantics, congruency, facilitation, Emotiv

Procedia PDF Downloads 229
405 A Supervised Face Parts Labeling Framework

Authors: Khalil Khan, Ikram Syed, Muhammad Ehsan Mazhar, Iran Uddin, Nasir Ahmad

Abstract:

Face parts labeling is the process of assigning class labels to each face part. A face parts labeling (FPL) method which divides a given image into its constituent parts is proposed in this paper. A database, FaceD, consisting of 564 images is labeled by hand and made publicly available. A supervised learning model is built through extraction of features from the training data. The testing phase is performed with two semantic segmentation methods, i.e., pixel-based and super-pixel-based segmentation. In pixel-based segmentation, a class label is provided to each pixel individually. In the super-pixel-based method, a class label is assigned to the super-pixel only; as a result, the same class label is given to all pixels inside a super-pixel. The pixel labeling accuracy reported with the pixel-based and super-pixel-based methods is 97.68% and 93.45%, respectively.
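The difference between the two labeling schemes, and the pixel labeling accuracy metric, can be shown with a short sketch. The function names, the flat list representation of an image, and the toy labels in the usage note are assumptions for illustration; they do not reproduce the paper's classifier or its reported figures.

```python
def broadcast_superpixel_labels(segments, superpixel_labels):
    """Give every pixel the class label of the super-pixel it belongs to.

    segments:          list of super-pixel ids, one per pixel.
    superpixel_labels: dict mapping super-pixel id -> class label.
    """
    return [superpixel_labels[s] for s in segments]


def pixel_accuracy(predicted, truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)
```

For instance, with `segments = [0, 0, 1, 1, 1]` and one label per super-pixel, a single ground-truth pixel that disagrees with its super-pixel's label is necessarily misclassified. This is the trade-off of labeling at super-pixel granularity: fewer units to classify, but no way to correct individual pixels inside a unit.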

Keywords: face labeling, semantic segmentation, classification, face segmentation

Procedia PDF Downloads 226
404 Linguistic Insights Improve Semantic Technology in Medical Research and Patient Self-Management Contexts

Authors: William Michael Short

Abstract:

‘Semantic Web’ technologies such as the Unified Medical Language System Metathesaurus, SNOMED-CT, and MeSH have been touted as transformational for the way users access online medical and health information, enabling both the automated analysis of natural-language data and the integration of heterogeneous health-related resources distributed across the Internet through the use of standardized terminologies that capture concepts and relationships between concepts that are expressed differently across datasets. However, the approaches that have so far characterized ‘semantic bioinformatics’ have not yet fulfilled the promise of the Semantic Web for medical and health information retrieval applications. This paper argues within the perspective of cognitive linguistics and cognitive anthropology that four features of human meaning-making must be taken into account before the potential of semantic technologies can be realized for this domain. First, many semantic technologies operate exclusively at the level of the word. However, texts convey meanings in ways beyond lexical semantics. For example, transitivity patterns (distributions of active or passive voice) and modality patterns (configurations of modal constituents like may, might, could, would, should) convey experiential and epistemic meanings that are not captured by single words. Language users also naturally associate stretches of text with discrete meanings, so that whole sentences can be ascribed senses similar to the senses of words (so-called ‘discourse topics’). Second, natural language processing systems tend to operate according to the principle of ‘one token, one tag’. For instance, occurrences of the word sound must be disambiguated for part of speech: in context, is sound a noun or a verb or an adjective? In syntactic analysis, deterministic annotation methods may be acceptable. But because natural language utterances are typically characterized by polyvalency and ambiguities of all kinds (including intentional ambiguities), such methods leave the meanings of texts highly impoverished. Third, ontologies tend to be disconnected from everyday language use and so struggle in cases where single concepts are captured through complex lexicalizations that involve profile shifts or other embodied representations. More problematically, concept graphs tend to capture ‘expert’ technical models rather than ‘folk’ models of knowledge and so may not match users’ common-sense intuitions about the organization of concepts in prototypical structures rather than Aristotelian categories. Fourth, and finally, most ontologies do not recognize the pervasively figurative character of human language. However, since the time of Galen the widespread use of metaphor in the linguistic usage of both medical professionals and lay persons has been recognized. In particular, metaphor is a well-documented linguistic tool for communicating experiences of pain. Because semantic medical knowledge-bases are designed to help capture variations within technical vocabularies – rather than the kinds of conventionalized figurative semantics that practitioners as well as patients actually utilize in clinical description and diagnosis – they fail to capture this dimension of linguistic usage. The failure of semantic technologies in these respects degrades the efficiency and efficacy not only of medical research, where information retrieval inefficiencies can lead to direct financial costs to organizations, but also of care provision, especially in contexts of patients’ self-management of complex medical conditions.

Keywords: ambiguity, bioinformatics, language, meaning, metaphor, ontology, semantic web, semantics

Procedia PDF Downloads 95
403 3D-Vehicle Associated Research Fields for Smart City via Semantic Search Approach

Authors: Haluk Eren, Mucahit Karaduman

Abstract:

This paper presents 15-year trends for scientific studies in a scientific database, considering the words 3D and vehicle. These two words are selected to find their associated publications in the IEEE scholar database. Both keywords are entered individually in the database for the years 2002, 2012, and 2016 to identify the preferred subjects of researchers in those years. We have classified closely related research fields after searching and listing. Three years (2002, 2012, and 2016) have been investigated to trace progress over the specified time intervals. The first interval, 2002-2012, is assumed to be the period of initial progress, and the second, 2012-2016, the period of fast development. We have found very interesting and beneficial results for understanding scholars’ research field preferences over these years. This information will be highly desirable for smart city-based research involving 3D and vehicle-related issues.

Keywords: Vehicle, three-dimensional, smart city, scholarly search, semantic

Procedia PDF Downloads 289
402 Real-Time Episodic Memory Construction for Optimal Action Selection in Cognitive Robotics

Authors: Deon de Jager, Yahya Zweiri, Dimitrios Makris

Abstract:

The three most important components in the cognitive architecture for cognitive robotics are memory representation, memory recall, and action selection performed by the executive. In this paper, action selection, performed by the executive, is defined as a memory quantification and optimization process. The methodology describes the real-time construction of episodic memory through semantic memory optimization. The optimization is performed by set-based particle swarm optimization, using an adaptive entropy memory quantification approach for fitness evaluation. The performance of the approach is experimentally evaluated by simulation, where a UAV is tasked with the collection and delivery of a medical package. The experiments show that the UAV dynamically uses the episodic memory to autonomously control its velocity, while successfully completing its mission.

Keywords: cognitive robotics, semantic memory, episodic memory, maximum entropy principle, particle swarm optimization

Procedia PDF Downloads 112
401 On the Framework of Contemporary Intelligent Mathematics Underpinning Intelligent Science, Autonomous AI, and Cognitive Computers

Authors: Yingxu Wang, Jianhua Lu, Jun Peng, Jiawei Zhang

Abstract:

The fundamental demand in contemporary intelligent science towards Autonomous AI (AI*) is the creation of unprecedented formal means of Intelligent Mathematics (IM). It is discovered that natural intelligence is inductively created rather than exhaustively trained. Therefore, IM is a family of algebraic and denotational mathematics encompassing Inference Algebra, Real-Time Process Algebra, Concept Algebra, Semantic Algebra, Visual Frame Algebra, etc., developed in our labs. IM plays indispensable roles in training-free AI* theories and systems beyond traditional empirical data-driven technologies. A set of applications of IM-driven AI* systems will be demonstrated in contemporary intelligence science, AI*, and cognitive computers.

Keywords: intelligence mathematics, foundations of intelligent science, autonomous AI, cognitive computers, inference algebra, real-time process algebra, concept algebra, semantic algebra, applications

Procedia PDF Downloads 16
400 Code Embedding for Software Vulnerability Discovery Based on Semantic Information

Authors: Joseph Gear, Yue Xu, Ernest Foo, Praveen Gauravaran, Zahra Jadidi, Leonie Simpson

Abstract:

Deep learning methods have seen increasing application to the long-standing security research goal of automatic vulnerability detection for source code. Attention, however, must still be paid to the task of producing vector representations for source code (code embeddings) as input for these deep learning models. Graphical representations of code, most predominantly Abstract Syntax Trees and Code Property Graphs, have received some use in this task of late; however, for very large graphs representing very large code snippets, learning becomes prohibitively computationally expensive. This expense may be reduced by intelligently pruning this input to only vulnerability-relevant information; however, little research in this area has been performed. Additionally, most existing work comprehends code based solely on the structure of the graph at the expense of the information contained in the nodes of the graph. This paper proposes Semantic-enhanced Code Embedding for Vulnerability Discovery (SCEVD), a deep learning model which uses semantic-based feature selection for its vulnerability classification model. It uses information from the nodes as well as the structure of the code graph in order to select the features which are most indicative of the presence or absence of vulnerabilities. This model is implemented and experimentally tested using the SARD Juliet vulnerability test suite to determine its efficacy. It is able to improve on existing code graph feature selection methods, as demonstrated by its improved ability to discover vulnerabilities.

Keywords: code representation, deep learning, source code semantics, vulnerability discovery

Procedia PDF Downloads 125