Search results for: semantic features
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4118

Search results for: semantic features

3968 The Oral Production of University EFL Students: An Analysis of Tasks, Format, and Quality in Foreign Language Development

Authors: Vera Lucia Teixeira da Silva, Sandra Regina Buttros Gattolin de Paula

Abstract:

The present study focuses on academic literacy and addresses the impact of semantic-discursive resources on the constitution of genres that are produced in such context. The research considers the development of writing in the academic context in Portuguese. Researches that address academic literacy and the characteristics of the texts produced in this context are rare, mainly with focus on the development of writing, considering three variables: the constitution of the writer, the perception of the reader/interlocutor and the organization of the informational text flow. The research aims to map the semantic-discursive resources of the written register in texts of several genres and produced by students in the first semester of the undergraduate course in Letters. The hypothesis raised is that writing in the academic environment is not a recurrent literacy practice for these learners and can be explained by the ontogenetic and phylogenetic nature of language development. Qualitative in nature, the present research has as empirical data texts produced in a half-yearly course of Reading and Textual Production; these data result from the proposition of four different writing proposals, in a total of 600 texts. The corpus is analyzed based on semantic-discursive resources, seeking to contemplate relevant aspects of language (grammar, discourse and social context) that reveal the choices made in the reader/writer interrelationship and the organizational flow of the Text. Among the semantic-discursive resources, the analysis includes three resources, including (a) appraisal and negotiation to understand the attitudes negotiated (roles of the participants of the discourse and their relationship with the other); (b) ideation to explain the construction of the experience (activities performed and participants); and (c) periodicity to outline the flow of information in the organization of the text according to the genre it instantiates. The results indicate the organizational difficulties of the flow of the text information. Cartography contributes to the understanding of the way writers use language in an effort to present themselves, evaluate someone else’s work, and communicate with readers.

Keywords: academic writing, Portuguese mother tongue, semantic-discursive resources, academic context

Procedia PDF Downloads 91
3967 Probing Language Models for Multiple Linguistic Information

Authors: Bowen Ding, Yihao Kuang

Abstract:

In recent years, large-scale pre-trained language models have achieved state-of-the-art performance on a variety of natural language processing tasks. The word vectors produced by these language models can be viewed as dense encoded presentations of natural language that in text form. However, it is unknown how much linguistic information is encoded and how. In this paper, we construct several corresponding probing tasks for multiple linguistic information to clarify the encoding capabilities of different language models and performed a visual display. We firstly obtain word presentations in vector form from different language models, including BERT, ELMo, RoBERTa and GPT. Classifiers with a small scale of parameters and unsupervised tasks are then applied on these word vectors to discriminate their capability to encode corresponding linguistic information. The constructed probe tasks contain both semantic and syntactic aspects. The semantic aspect includes the ability of the model to understand semantic entities such as numbers, time, and characters, and the grammatical aspect includes the ability of the language model to understand grammatical structures such as dependency relationships and reference relationships. We also compare encoding capabilities of different layers in the same language model to infer how linguistic information is encoded in the model.

Keywords: language models, probing task, text presentation, linguistic information

Procedia PDF Downloads 69
3966 Arabic Text Classification: Review Study

Authors: M. Hijazi, A. Zeki, A. Ismail

Abstract:

An enormous amount of valuable human knowledge is preserved in documents. The rapid growth in the number of machine-readable documents for public or private access requires the use of automatic text classification. Text classification can be defined as assigning or structuring documents into a defined set of classes known in advance. Arabic text classification methods have emerged as a natural result of the existence of a massive amount of varied textual information written in the Arabic language on the web. This paper presents a review on the published researches of Arabic Text Classification using classical data representation, Bag of words (BoW), and using conceptual data representation based on semantic resources such as Arabic WordNet and Wikipedia.

Keywords: Arabic text classification, Arabic WordNet, bag of words, conceptual representation, semantic relations

Procedia PDF Downloads 398
3965 Mood Recognition Using Indian Music

Authors: Vishwa Joshi

Abstract:

The study of mood recognition in the field of music has gained a lot of momentum in the recent years with machine learning and data mining techniques and many audio features contributing considerably to analyze and identify the relation of mood plus music. In this paper we consider the same idea forward and come up with making an effort to build a system for automatic recognition of mood underlying the audio song’s clips by mining their audio features and have evaluated several data classification algorithms in order to learn, train and test the model describing the moods of these audio songs and developed an open source framework. Before classification, Preprocessing and Feature Extraction phase is necessary for removing noise and gathering features respectively.

Keywords: music, mood, features, classification

Procedia PDF Downloads 474
3964 Understanding the Semantic Network of Tourism Studies in Taiwan by Using Bibliometrics Analysis

Authors: Chun-Min Lin, Yuh-Jen Wu, Ching-Ting Chung

Abstract:

The formulation of tourism policies requires objective academic research and evidence as support, especially research from local academia. Taiwan is a small island, and its economic growth relies heavily on tourism revenue. Taiwanese government has been devoting to the promotion of the tourism industry over the past few decades. Scientific research outcomes by Taiwanese scholars may and will help lay the foundations for drafting future tourism policy by the government. In this study, a total of 120 full journal articles published between 2008 and 2016 from the Journal of Tourism and Leisure Studies (JTSL) were examined to explore the scientific research trend of tourism study in Taiwan. JTSL is one of the most important Taiwanese journals in the tourism discipline which focuses on tourism-related issues and uses traditional Chinese as the study language. The method of co-word analysis from bibliometrics approaches was employed for semantic analysis in this study. When analyzing Chinese words and phrases, word segmentation analysis is a crucial step. It must be carried out initially and precisely in order to obtain meaningful word or word chunks for further frequency calculation. A word segmentation system basing on N-gram algorithm was developed in this study to conduct semantic analysis, and 100 groups of meaningful phrases with the highest recurrent rates were located. Subsequently, co-word analysis was employed for semantic classification. The results showed that the themes of tourism research in Taiwan in recent years cover the scope of tourism education, environmental protection, hotel management, information technology, and senior tourism. The results can give insight on the related issues and serve as a reference for tourism-related policy making and follow-up research.

Keywords: bibliometrics, co-word analysis, word segmentation, tourism research, policy

Procedia PDF Downloads 204
3963 An Approach to Specify Software Requirements in Semantic Form

Authors: Deepa Vijay, Chellammal Surianarayanan, Gopinath Ganapathy

Abstract:

Requirements of a software project serve as a guideline for the entire project team which enable the team towards producing the right outcome. As requirements are the key in deciding the success of the project, it should be specified in an unambiguous manner. Also, the requirements should be complete and consistent. It should be interpreted in the same way by the entire software project team as the customer interprets. Specifying requirements in textual manner is common in software development. This leads to poor understanding of the requirements which results in more errors and degraded quality. There are some literatures which focus on semantic way of specifying functional requirement which ensure the consistency and completeness of requirements. Alternately in the work, a method is proposed to map the syntactic requirements with corresponding semantics in the form of ontologies. This improves the understanding of requirements, prevents errors and improves quality.

Keywords: functional requirement, ontology, requirements management, semantics

Procedia PDF Downloads 338
3962 Machine Vision System for Measuring the Quality of Bulk Sun-dried Organic Raisins

Authors: Navab Karimi, Tohid Alizadeh

Abstract:

An intelligent vision-based system was designed to measure the quality and purity of raisins. A machine vision setup was utilized to capture the images of bulk raisins in ranges of 5-50% mixed pure-impure berries. The textural features of bulk raisins were extracted using Grey-level Histograms, Co-occurrence Matrix, and Local Binary Pattern (a total of 108 features). Genetic Algorithm and neural network regression were used for selecting and ranking the best features (21 features). As a result, the GLCM features set was found to have the highest accuracy (92.4%) among the other sets. Followingly, multiple feature combinations of the previous stage were fed into the second regression (linear regression) to increase accuracy, wherein a combination of 16 features was found to be the optimum. Finally, a Support Vector Machine (SVM) classifier was used to differentiate the mixtures, producing the best efficiency and accuracy of 96.2% and 97.35%, respectively.

Keywords: sun-dried organic raisin, genetic algorithm, feature extraction, ann regression, linear regression, support vector machine, south azerbaijan.

Procedia PDF Downloads 46
3961 Automated Adaptions of Semantic User- and Service Profile Representations by Learning the User Context

Authors: Nicole Merkle, Stefan Zander

Abstract:

Ambient Assisted Living (AAL) describes a technological and methodological stack of (e.g. formal model-theoretic semantics, rule-based reasoning and machine learning), different aspects regarding the behavior, activities and characteristics of humans. Hence, a semantic representation of the user environment and its relevant elements are required in order to allow assistive agents to recognize situations and deduce appropriate actions. Furthermore, the user and his/her characteristics (e.g. physical, cognitive, preferences) need to be represented with a high degree of expressiveness in order to allow software agents a precise evaluation of the users’ context models. The correct interpretation of these context models highly depends on temporal, spatial circumstances as well as individual user preferences. In most AAL approaches, model representations of real world situations represent the current state of a universe of discourse at a given point in time by neglecting transitions between a set of states. However, the AAL domain currently lacks sufficient approaches that contemplate on the dynamic adaptions of context-related representations. Semantic representations of relevant real-world excerpts (e.g. user activities) help cognitive, rule-based agents to reason and make decisions in order to help users in appropriate tasks and situations. Furthermore, rules and reasoning on semantic models are not sufficient for handling uncertainty and fuzzy situations. A certain situation can require different (re-)actions in order to achieve the best results with respect to the user and his/her needs. But what is the best result? To answer this question, we need to consider that every smart agent requires to achieve an objective, but this objective is mostly defined by domain experts who can also fail in their estimation of what is desired by the user and what not. Hence, a smart agent has to be able to learn from context history data and estimate or predict what is most likely in certain contexts. Furthermore, different agents with contrary objectives can cause collisions as their actions influence the user’s context and constituting conditions in unintended or uncontrolled ways. We present an approach for dynamically updating a semantic model with respect to the current user context that allows flexibility of the software agents and enhances their conformance in order to improve the user experience. The presented approach adapts rules by learning sensor evidence and user actions using probabilistic reasoning approaches, based on given expert knowledge. The semantic domain model consists basically of device-, service- and user profile representations. In this paper, we present how this semantic domain model can be used in order to compute the probability of matching rules and actions. We apply this probability estimation to compare the current domain model representation with the computed one in order to adapt the formal semantic representation. Our approach aims at minimizing the likelihood of unintended interferences in order to eliminate conflicts and unpredictable side-effects by updating pre-defined expert knowledge according to the most probable context representation. This enables agents to adapt to dynamic changes in the environment which enhances the provision of adequate assistance and affects positively the user satisfaction.

Keywords: ambient intelligence, machine learning, semantic web, software agents

Procedia PDF Downloads 256
3960 Emerging Technology for Business Intelligence Applications

Authors: Hsien-Tsen Wang

Abstract:

Business Intelligence (BI) has long helped organizations make informed decisions based on data-driven insights and gain competitive advantages in the marketplace. In the past two decades, businesses witnessed not only the dramatically increasing volume and heterogeneity of business data but also the emergence of new technologies, such as Artificial Intelligence (AI), Semantic Web (SW), Cloud Computing, and Big Data. It is plausible that the convergence of these technologies would bring more value out of business data by establishing linked data frameworks and connecting in ways that enable advanced analytics and improved data utilization. In this paper, we first review and summarize current BI applications and methodology. Emerging technologies that can be integrated into BI applications are then discussed. Finally, we conclude with a proposed synergy framework that aims at achieving a more flexible, scalable, and intelligent BI solution.

Keywords: business intelligence, artificial intelligence, semantic web, big data, cloud computing

Procedia PDF Downloads 67
3959 Training a Neural Network Using Input Dropout with Aggressive Reweighting (IDAR) on Datasets with Many Useless Features

Authors: Stylianos Kampakis

Abstract:

This paper presents a new algorithm for neural networks called “Input Dropout with Aggressive Re-weighting” (IDAR) aimed specifically at datasets with many useless features. IDAR combines two techniques (dropout of input neurons and aggressive re weighting) in order to eliminate the influence of noisy features. The technique can be seen as a generalization of dropout. The algorithm is tested on two different benchmark data sets: a noisy version of the iris dataset and the MADELON data set. Its performance is compared against three other popular techniques for dealing with useless features: L2 regularization, LASSO and random forests. The results demonstrate that IDAR can be an effective technique for handling data sets with many useless features.

Keywords: neural networks, feature selection, regularization, aggressive reweighting

Procedia PDF Downloads 425
3958 An Automatic Feature Extraction Technique for 2D Punch Shapes

Authors: Awais Ahmad Khan, Emad Abouel Nasr, H. M. A. Hussein, Abdulrahman Al-Ahmari

Abstract:

Sheet-metal parts have been widely applied in electronics, communication and mechanical industries in recent decades; but the advancement in sheet-metal part design and manufacturing is still behind in comparison with the increasing importance of sheet-metal parts in modern industry. This paper presents a methodology for automatic extraction of some common 2D internal sheet metal features. The features used in this study are taken from Unipunch ™ catalogue. The extraction process starts with the data extraction from STEP file using an object oriented approach and with the application of suitable algorithms and rules, all features contained in the catalogue are automatically extracted. Since the extracted features include geometry and engineering information, they will be effective for downstream application such as feature rebuilding and process planning.

Keywords: feature extraction, internal features, punch shapes, sheet metal

Procedia PDF Downloads 587
3957 Robust Features for Impulsive Noisy Speech Recognition Using Relative Spectral Analysis

Authors: Hajer Rahali, Zied Hajaiej, Noureddine Ellouze

Abstract:

The goal of speech parameterization is to extract the relevant information about what is being spoken from the audio signal. In speech recognition systems Mel-Frequency Cepstral Coefficients (MFCC) and Relative Spectral Mel-Frequency Cepstral Coefficients (RASTA-MFCC) are the two main techniques used. It will be shown in this paper that it presents some modifications to the original MFCC method. In our work the effectiveness of proposed changes to MFCC called Modified Function Cepstral Coefficients (MODFCC) were tested and compared against the original MFCC and RASTA-MFCC features. The prosodic features such as jitter and shimmer are added to baseline spectral features. The above-mentioned techniques were tested with impulsive signals under various noisy conditions within AURORA databases.

Keywords: auditory filter, impulsive noise, MFCC, prosodic features, RASTA filter

Procedia PDF Downloads 396
3956 Enterprise Information Portal Features: Results of Content Analysis Literature Review

Authors: Michal Krčál

Abstract:

Since their introduction in 1990’s, Enterprise Information Portals (EIPs) were investigated from different perspectives (e.g. project management, technology acceptance, IS success). However, no systematic literature review was produced to systematize both the research efforts and the technology itself. This paper reports first results of an extent systematic literature review study focused on research of EIPs and its categorization, specifically it reports a conceptual model of EIP features. The previous attempt to categorize EIP features was published in 2002. For the purpose of the literature review, content of 89 articles was analyzed in order to identify and categorize features of EIPs. The methodology of the literature review was as follows. Firstly, search queries in major indexing databases (Web of Science and SCOPUS) were used. The results of queries were analyzed according to their usability for the goal of the study. Then, full-texts were coded in Atlas.ti according to previously established coding scheme. The codes were categorized and the conceptual model of EIP features was created.

Keywords: enterprise information portal, content analysis, features, systematic literature review

Procedia PDF Downloads 269
3955 Content-Based Image Retrieval Using HSV Color Space Features

Authors: Hamed Qazanfari, Hamid Hassanpour, Kazem Qazanfari

Abstract:

In this paper, a method is provided for content-based image retrieval. Content-based image retrieval system searches query an image based on its visual content in an image database to retrieve similar images. In this paper, with the aim of simulating the human visual system sensitivity to image's edges and color features, the concept of color difference histogram (CDH) is used. CDH includes the perceptually color difference between two neighboring pixels with regard to colors and edge orientations. Since the HSV color space is close to the human visual system, the CDH is calculated in this color space. In addition, to improve the color features, the color histogram in HSV color space is also used as a feature. Among the extracted features, efficient features are selected using entropy and correlation criteria. The final features extract the content of images most efficiently. The proposed method has been evaluated on three standard databases Corel 5k, Corel 10k and UKBench. Experimental results show that the accuracy of the proposed image retrieval method is significantly improved compared to the recently developed methods.

Keywords: content-based image retrieval, color difference histogram, efficient features selection, entropy, correlation

Procedia PDF Downloads 221
3954 Semantic-Based Collaborative Filtering to Improve Visitor Cold Start in Recommender Systems

Authors: Baba Mbaye

Abstract:

In collaborative filtering recommendation systems, a user receives suggested items based on the opinions and evaluations of a community of users. This type of recommendation system uses only the information (notes in numerical values) contained in a usage matrix as input data. This matrix can be constructed based on users' behaviors or by offering users to declare their opinions on the items they know. The cold start problem leads to very poor performance for new users. It is a phenomenon that occurs at the beginning of use, in the situation where the system lacks data to make recommendations. There are three types of cold start problems: cold start for a new item, a new system, and a new user. We are interested in this article at the cold start for a new user. When the system welcomes a new user, the profile exists but does not have enough data, and its communities with other users profiles are still unknown. This leads to recommendations not adapted to the profile of the new user. In this paper, we propose an approach that improves cold start by using the notions of similarity and semantic proximity between users profiles during cold start. We will use the cold-metadata available (metadata extracted from the new user's data) useful in positioning the new user within a community. The aim is to look for similarities and semantic proximities with the old and current user profiles of the system. Proximity is represented by close concepts considered to belong to the same group, while similarity groups together elements that appear similar. Similarity and proximity are two close but not similar concepts. This similarity leads us to the construction of similarity which is based on: a) the concepts (properties, terms, instances) independent of ontology structure and, b) the simultaneous representation of the two concepts (relations, presence of terms in a document, simultaneous presence of the authorities). We propose an ontology, OIVCSRS (Ontology of Improvement Visitor Cold Start in Recommender Systems), in order to structure the terms and concepts representing the meaning of an information field, whether by the metadata of a namespace, or the elements of a knowledge domain. This approach allows us to automatically attach the new user to a user community, partially compensate for the data that was not initially provided and ultimately to associate a better first profile with the cold start. Thus, the aim of this paper is to propose an approach to improving cold start using semantic technologies.

Keywords: visitor cold start, recommender systems, collaborative filtering, semantic filtering

Procedia PDF Downloads 195
3953 AS-Geo: Arbitrary-Sized Image Geolocalization with Learnable Geometric Enhancement Resizer

Authors: Huayuan Lu, Chunfang Yang, Ma Zhu, Baojun Qi, Yaqiong Qiao, Jiangqian Xu

Abstract:

Image geolocalization has great application prospects in fields such as autonomous driving and virtual/augmented reality. In practical application scenarios, the size of the image to be located is not fixed; it is impractical to train different networks for all possible sizes. When its size does not match the size of the input of the descriptor extraction model, existing image geolocalization methods usually directly scale or crop the image in some common ways. This will result in the loss of some information important to the geolocalization task, thus affecting the performance of the image geolocalization method. For example, excessive down-sampling can lead to blurred building contour, and inappropriate cropping can lead to the loss of key semantic elements, resulting in incorrect geolocation results. To address this problem, this paper designs a learnable image resizer and proposes an arbitrary-sized image geolocation method. (1) The designed learnable image resizer employs the self-attention mechanism to enhance the geometric features of the resized image. Firstly, it applies bilinear interpolation to the input image and its feature maps to obtain the initial resized image and the resized feature maps. Then, SKNet (selective kernel net) is used to approximate the best receptive field, thus keeping the geometric shapes as the original image. And SENet (squeeze and extraction net) is used to automatically select the feature maps with strong contour information, enhancing the geometric features. Finally, the enhanced geometric features are fused with the initial resized image, to obtain the final resized images. (2) The proposed image geolocalization method embeds the above image resizer as a fronting layer of the descriptor extraction network. It not only enables the network to be compatible with arbitrary-sized input images but also enhances the geometric features that are crucial to the image geolocalization task. Moreover, the triplet attention mechanism is added after the first convolutional layer of the backbone network to optimize the utilization of geometric elements extracted by the first convolutional layer. Finally, the local features extracted by the backbone network are aggregated to form image descriptors for image geolocalization. The proposed method was evaluated on several mainstream datasets, such as Pittsburgh30K, Tokyo24/7, and Places365. The results show that the proposed method has excellent size compatibility and compares favorably to recently mainstream geolocalization methods.

Keywords: image geolocalization, self-attention mechanism, image resizer, geometric feature

Procedia PDF Downloads 178
3952 Investigating the Stylistic Features of Advertising: Ad Design and Creation

Authors: Asma Ben Abdallah

Abstract:

Language has a powerful influence over people and their actions. The language of advertising has a very great impact on the consumer. It makes use of different features from the linguistic continuum. The present paper attempts to apply the theories of stylistics to the analysis of advertising texts. In order to decipher the stylistic features of the advertising discourse, 30 advertising text samples designed by MA Business students have been selected. These samples have been analyzed at the level of design and content. The study brings insights into the use of stylistic devices in advertising, and it reveals that both linguistic and non-linguistic features of advertisements are frequently employed to develop a well-thought-out design and content. The practical significance of the study is to highlight the specificities of the advertising genre so that people interested in the language of advertising (Business students and ESP teachers) will have a better understanding of the nature of the language used and the techniques of writing and designing ads. Similarly, those working in the advertising sphere (ad designers) will appreciate the specificities of the advertising discourse.

Keywords: the language of advertising, advertising discourse, ad design, stylistic features

Procedia PDF Downloads 206
3951 Treating Voxels as Words: Word-to-Vector Methods for fMRI Meta-Analyses

Authors: Matthew Baucum

Abstract:

With the increasing popularity of fMRI as an experimental method, psychology and neuroscience can greatly benefit from advanced techniques for summarizing and synthesizing large amounts of data from brain imaging studies. One promising avenue is automated meta-analyses, in which natural language processing methods are used to identify the brain regions consistently associated with certain semantic concepts (e.g. “social”, “reward’) across large corpora of studies. This study builds on this approach by demonstrating how, in fMRI meta-analyses, individual voxels can be treated as vectors in a semantic space and evaluated for their “proximity” to terms of interest. In this technique, a low-dimensional semantic space is built from brain imaging study texts, allowing words in each text to be represented as vectors (where words that frequently appear together are near each other in the semantic space). Consequently, each voxel in a brain mask can be represented as a normalized vector sum of all of the words in the studies that showed activation in that voxel. The entire brain mask can then be visualized in terms of each voxel’s proximity to a given term of interest (e.g., “vision”, “decision making”) or collection of terms (e.g., “theory of mind”, “social”, “agent”), as measured by the cosine similarity between the voxel’s vector and the term vector (or the average of multiple term vectors). Analysis can also proceed in the opposite direction, allowing word cloud visualizations of the nearest semantic neighbors for a given brain region. This approach allows for continuous, fine-grained metrics of voxel-term associations, and relies on state-of-the-art “open vocabulary” methods that go beyond mere word-counts. An analysis of over 11,000 neuroimaging studies from an existing meta-analytic fMRI database demonstrates that this technique can be used to recover known neural bases for multiple psychological functions, suggesting this method’s utility for efficient, high-level meta-analyses of localized brain function. While automated text analytic methods are no replacement for deliberate, manual meta-analyses, they seem to show promise for the efficient aggregation of large bodies of scientific knowledge, at least on a relatively general level.

Keywords: FMRI, machine learning, meta-analysis, text analysis

Procedia PDF Downloads 421
3950 Method of Cluster Based Cross-Domain Knowledge Acquisition for Biologically Inspired Design

Authors: Shen Jian, Hu Jie, Ma Jin, Peng Ying Hong, Fang Yi, Liu Wen Hai

Abstract:

Biologically inspired design inspires inventions and new technologies in the field of engineering by mimicking functions, principles, and structures in the biological domain. To deal with the obstacles of cross-domain knowledge acquisition in the existing biologically inspired design process, functional semantic clustering based on functional feature semantic correlation and environmental constraint clustering composition based on environmental characteristic constraining adaptability are proposed. A knowledge cell clustering algorithm and the corresponding prototype system is developed. Finally, the effectiveness of the method is verified by the visual prosthetic device design.

Keywords: knowledge clustering, knowledge acquisition, knowledge based engineering, knowledge cell, biologically inspired design

Procedia PDF Downloads 403
3949 TARF: Web Toolkit for Annotating RNA-Related Genomic Features

Authors: Jialin Ma, Jia Meng

Abstract:

Genomic features, the genome-based coordinates, are commonly used for the representation of biological features such as genes, RNA transcripts and transcription factor binding sites. For the analysis of RNA-related genomic features, such as RNA modification sites, a common task is to correlate these features with transcript components (5'UTR, CDS, 3'UTR) to explore their distribution characteristics in terms of transcriptomic coordinates, e.g., to examine whether a specific type of biological feature is enriched near transcription start sites. Existing approaches for performing these tasks involve the manipulation of a gene database, conversion from genome-based coordinate to transcript-based coordinate, and visualization methods that are capable of showing RNA transcript components and distribution of the features. These steps are complicated and time consuming, and this is especially true for researchers who are not familiar with relevant tools. To overcome this obstacle, we develop a dedicated web app TARF, which represents web toolkit for annotating RNA-related genomic features. TARF web tool intends to provide a web-based way to easily annotate and visualize RNA-related genomic features. Once a user has uploaded the features with BED format and specified a built-in transcript database or uploaded a customized gene database with GTF format, the tool could fulfill its three main functions. First, it adds annotation on gene and RNA transcript components. For every features provided by the user, the overlapping with RNA transcript components are identified, and the information is combined in one table which is available for copy and download. Summary statistics about ambiguous belongings are also carried out. Second, the tool provides a convenient visualization method of the features on single gene/transcript level. For the selected gene, the tool shows the features with gene model on genome-based view, and also maps the features to transcript-based coordinate and show the distribution against one single spliced RNA transcript. Third, a global transcriptomic view of the genomic features is generated utilizing the Guitar R/Bioconductor package. The distribution of features on RNA transcripts are normalized with respect to RNA transcript landmarks and the enrichment of the features on different RNA transcript components is demonstrated. We tested the newly developed TARF toolkit with 3 different types of genomics features related to chromatin H3K4me3, RNA N6-methyladenosine (m6A) and RNA 5-methylcytosine (m5C), which are obtained from ChIP-Seq, MeRIP-Seq and RNA BS-Seq data, respectively. TARF successfully revealed their respective distribution characteristics, i.e. H3K4me3, m6A and m5C are enriched near transcription starting sites, stop codons and 5’UTRs, respectively. Overall, TARF is a useful web toolkit for annotation and visualization of RNA-related genomic features, and should help simplify the analysis of various RNA-related genomic features, especially those related RNA modifications.

Keywords: RNA-related genomic features, annotation, visualization, web server

Procedia PDF Downloads 181
3948 Resource Framework Descriptors for Interestingness in Data

Authors: C. B. Abhilash, Kavi Mahesh

Abstract:

Human beings are the most advanced species on earth; it's all because of the ability to communicate and share information via human language. In today's world, a huge amount of data is available on the web in text format. This has also resulted in the generation of big data in structured and unstructured formats. In general, the data is in the textual form, which is highly unstructured. To get insights and actionable content from this data, we need to incorporate the concepts of text mining and natural language processing. In our study, we mainly focus on Interesting data through which interesting facts are generated for the knowledge base. The approach is to derive the analytics from the text via the application of natural language processing. Using semantic web Resource framework descriptors (RDF), we generate the triple from the given data and derive the interesting patterns. The methodology also illustrates data integration using the RDF for reliable, interesting patterns.

Keywords: RDF, interestingness, knowledge base, semantic data

Procedia PDF Downloads 125
3947 Morphological Features Fusion for Identifying INBREAST-Database Masses Using Neural Networks and Support Vector Machines

Authors: Nadia el Atlas, Mohammed el Aroussi, Mohammed Wahbi

Abstract:

In this paper a novel technique of mass characterization based on robust features-fusion is presented. The proposed method consists of mainly four stages: (a) the first phase involves segmenting the masses using edge information’s. (b) The second phase is to calculate and fuse the most relevant morphological features. (c) The last phase is the classification step which allows us to classify the images into benign and malignant masses. In this step we have implemented Support Vectors Machines (SVM) and Artificial Neural Networks (ANN), which were evaluated with the following performance criteria: confusion matrix, accuracy, sensitivity, specificity, receiver operating characteristic ROC, and error histogram. The effectiveness of this new approach was evaluated by a recently developed database: INBREAST database. The fusion of the most appropriate morphological features provided very good results. The SVM gives accuracy to within 64.3%. Whereas the ANN classifier gives better results with an accuracy of 97.5%.

Keywords: breast cancer, mammography, CAD system, features, fusion

Procedia PDF Downloads 567
3946 Using Priority Order of Basic Features for Circumscribed Masses Detection in Mammograms

Authors: Minh Dong Le, Viet Dung Nguyen, Do Huu Viet, Nguyen Huu Tu

Abstract:

In this paper, we present a new method for circumscribed masses detection in mammograms. Our method is evaluated on 23 mammographic images of circumscribed masses and 20 normal mammograms from public Mini-MIAS database. The method is quite sanguine with sensitivity (SE) of 95% with only about 1 false positive per image (FPpI). To achieve above results we carry out a progression following: Firstly, the input images are preprocessed with the aim to enhance key information of circumscribed masses; Next, we calculate and evaluate statistically basic features of abnormal regions on training database; Then, mammograms on testing database are divided into equal blocks which calculated corresponding features. Finally, using priority order of basic features to classify blocks as an abnormal or normal regions.

Keywords: mammograms, circumscribed masses, evaluated statistically, priority order of basic features

Procedia PDF Downloads 304
3945 A Study of Mandarin Ba Constructions from the Perspective of Event Structure

Authors: Changyin Zhou

Abstract:

Ba constructions are a special type of constructions in Chinese. Their syntactic behaviors are closely related to their event structural properties. The existing study which treats the semantic function of Ba as causative meets difficulty in treating the discrepancy between Ba constructions and their corresponding constructions without Ba in expressing causativity. This paper holds that Ba in Ba constructions is a functional category expressing affectedness. The affectedness expressed by Ba can be positive or negative. The functional category Ba expressing negative affectedness has the semantic property of being 'expected'. The precondition of Ba construction is the boundedness of the event concerned. This paper, holding the parallelism between motion events and change-of-state events, proposes a syntactic model based on the notions of boundedness and affectedness, discusses the transformations between Ba constructions and the related resultative constructions, and derivates the various Ba constructions concerned.

Keywords: affectedness, Ba constructions, boundedness, event structure, resultative constructions

Procedia PDF Downloads 398
3944 Text Localization in Fixed-Layout Documents Using Convolutional Networks in a Coarse-to-Fine Manner

Authors: Beier Zhu, Rui Zhang, Qi Song

Abstract:

Text contained within fixed-layout documents can be of great semantic value and so requires a high localization accuracy, such as ID cards, invoices, cheques, and passports. Recently, algorithms based on deep convolutional networks achieve high performance on text detection tasks. However, for text localization in fixed-layout documents, such algorithms detect word bounding boxes individually, which ignores the layout information. This paper presents a novel architecture built on convolutional neural networks (CNNs). A global text localization network and a regional bounding-box regression network are introduced to tackle the problem in a coarse-to-fine manner. The text localization network simultaneously locates word bounding points, which takes the layout information into account. The bounding-box regression network inputs the features pooled from arbitrarily sized RoIs and refine the localizations. These two networks share their convolutional features and are trained jointly. A typical type of fixed-layout documents: ID cards, is selected to evaluate the effectiveness of the proposed system. These networks are trained on data cropped from nature scene images, and synthetic data produced by a synthetic text generation engine. Experiments show that our approach locates high accuracy word bounding boxes and achieves state-of-the-art performance.

Keywords: bounding box regression, convolutional networks, fixed-layout documents, text localization

Procedia PDF Downloads 166
3943 Classification of Contexts for Mentioning Love in Interviews with Victims of the Holocaust

Authors: Marina Yurievna Aleksandrova

Abstract:

Research of the Holocaust retains value not only for history but also for sociology and psychology. One of the most important fields of study is how people were coping during and after this traumatic event. The aim of this paper is to identify the main contexts of the topic of love and to determine which contexts are more characteristic for different groups of victims of the Holocaust (gender, nationality, age). In this research, transcripts of interviews with Holocaust victims that were collected during 1946 for the "Voices of the Holocaust" project were used as data. Main contexts were analyzed with methods of network analysis and latent semantic analysis and classified by gender, age, and nationality with random forest. The results show that love is articulated and described significantly differently for male and female informants, nationality is shown results with lower values of quality metrics, as well as the age.

Keywords: Holocaust, latent semantic analysis, network analysis, text-mining, random forest

Procedia PDF Downloads 157
3942 1D Convolutional Networks to Compute Mel-Spectrogram, Chromagram, and Cochleogram for Audio Networks

Authors: Elias Nemer, Greg Vines

Abstract:

Time-frequency transformation and spectral representations of audio signals are commonly used in various machine learning applications. Training networks on frequency features such as the Mel-Spectrogram or Cochleogram have been proven more effective and convenient than training on-time samples. In practical realizations, these features are created on a different processor and/or pre-computed and stored on disk, requiring additional efforts and making it difficult to experiment with different features. In this paper, we provide a PyTorch framework for creating various spectral features as well as time-frequency transformation and time-domain filter-banks using the built-in trainable conv1d() layer. This allows computing these features on the fly as part of a larger network and enabling easier experimentation with various combinations and parameters. Our work extends the work in the literature developed for that end: First, by adding more of these features and also by allowing the possibility of either starting from initialized kernels or training them from random values. The code is written as a template of classes and scripts that users may integrate into their own PyTorch classes or simply use as is and add more layers for various applications.

Keywords: neural networks Mel-Spectrogram, chromagram, cochleogram, discrete Fourrier transform, PyTorch conv1d()

Procedia PDF Downloads 197
3941 Morphological Properties in Ndre Mjeda's Works

Authors: Shyhrete Morina

Abstract:

This paper deals with morphological features in Mjeda's works. To make such a distinction, these features will be compared to standard Albanian language, considering the linguistic structure in the morphological field, which represent an all-important segment of Albanian language. Therefore, the study will focus mainly on the description and construction of these paradigms, which will give a linguistic insight into the entire work of Mjeda as the author who wrote in the dialect of northwestern Geg. Therefore, we have tried to distinguish different parts of the author's language, as well as the distinctive features or even the similarities of these paradigms that arise in the literary work of Mjeda. By constructing the corpus of this phonetic and grammar segment from the whole of Mjeda's work, we have seen that in these fields has built a variety of grammar structures, which for the history of Albanian are of special importance, that in the full variant of the work, as far as we can investigate, we will point out in all the distinctive features. Therefore, our study aims to highlight the linguistic features, namely the author's deep knowledge toward the language, the authenticity of its use, and its mutual relationship with it.

Keywords: distinctive morpholgy, nouns, adjetives, pronouns, Albanian standard language

Procedia PDF Downloads 133
3940 Semantic Network Analysis of the Saudi Women Driving Decree

Authors: Dania Aljouhi

Abstract:

September 26th, 2017, is a historic date for all women in Saudi Arabia. On that day, Saudi Arabia announced the decree on allowing Saudi women to drive. With the advent of vision 2030 and its goal to empower women and increase their participation in Saudi society, we see how Saudis’ Twitter users deliberate the 2017 decree from different social, cultural, religious, economic and political factors. This topic bridges social media 'Twitter,' gender and social-cultural studies to offer insights into how Saudis’ tweets reflect a broader discourse on Saudi women in the age of social media. The present study aims to explore the meanings and themes that emerge by Saudis’ Twitter users in response to the 2017 royal decree on women driving. The sample used in the current study involves (n= 1000) tweets that were collected from Sep 2017 to March 2019 to account for the Saudis’ tweets before and after implementing the decree. The paper uses semantic and thematic network analysis methods to examine the Saudis’ Twitter discourse on the women driving issue. The paper argues that Twitter as a platform has mediated the discourse of women driving among the Saudi community and facilitated social changes. Finally, framing theory (Goffman, 1974) and Networked framing (Meraz & Papacharissi 2013) are both used to explain the tweets on the decree of allowing Saudi women to drive based on # Saudi women-driving-cars.

Keywords: Saudi Arabia, women, Twitter, semantic network analysis, framing

Procedia PDF Downloads 120
3939 Deep Vision: A Robust Dominant Colour Extraction Framework for T-Shirts Based on Semantic Segmentation

Authors: Kishore Kumar R., Kaustav Sengupta, Shalini Sood Sehgal, Poornima Santhanam

Abstract:

Fashion is a human expression that is constantly changing. One of the prime factors that consistently influences fashion is the change in colour preferences. The role of colour in our everyday lives is very significant. It subconsciously explains a lot about one’s mindset and mood. Analyzing the colours by extracting them from the outfit images is a critical study to examine the individual’s/consumer behaviour. Several research works have been carried out on extracting colours from images, but to the best of our knowledge, there were no studies that extract colours to specific apparel and identify colour patterns geographically. This paper proposes a framework for accurately extracting colours from T-shirt images and predicting dominant colours geographically. The proposed method consists of two stages: first, a U-Net deep learning model is adopted to segment the T-shirts from the images. Second, the colours are extracted only from the T-shirt segments. The proposed method employs the iMaterialist (Fashion) 2019 dataset for the semantic segmentation task. The proposed framework also includes a mechanism for gathering data and analyzing India’s general colour preferences. From this research, it was observed that black and grey are the dominant colour in different regions of India. The proposed method can be adapted to study fashion’s evolving colour preferences.

Keywords: colour analysis in t-shirts, convolutional neural network, encoder-decoder, k-means clustering, semantic segmentation, U-Net model

Procedia PDF Downloads 79