Search results for: Grey-box text attack
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1922

Search results for: Grey-box text attack

1892 Anatomical Survey for Text Pattern Detection

Authors: S. Tehsin, S. Kausar

Abstract:

The ultimate aim of machine intelligence is to explore and materialize the human capabilities, one of which is the ability to detect various text objects within one or more images displayed on any canvas including prints, videos or electronic displays. Multimedia data has increased rapidly in past years. Textual information present in multimedia contains important information about the image/video content. However, it needs to technologically testify the commonly used human intelligence of detecting and differentiating the text within an image, for computers. Hence in this paper feature set based on anatomical study of human text detection system is proposed. Subsequent examination bears testimony to the fact that the features extracted proved instrumental to text detection.

Keywords: biologically inspired vision, content based retrieval, document analysis, text extraction

Procedia PDF Downloads 444
1891 Modeling Intelligent Threats: Case of Continuous Attacks on a Specific Target

Authors: Asma Ben Yaghlane, Mohamed Naceur Azaiez

Abstract:

In this paper, we treat a model that falls in the area of protecting targeted systems from intelligent threats including terrorism. We introduce the concept of system survivability, in the context of continuous attacks, as the probability that a system under attack will continue operation up to some fixed time t. We define a constant attack rate (CAR) process as an attack on a targeted system that follows an exponential distribution. We consider the superposition of several CAR processes. From the attacker side, we determine the optimal attack strategy that minimizes the system survivability. We also determine the optimal strengthening strategy that maximizes the system survivability under limited defensive resources. We use operations research techniques to identify optimal strategies of each antagonist. Our results may be used as interesting starting points to develop realistic protection strategies against intentional attacks.

Keywords: CAR processes, defense/attack strategies, exponential failure, survivability

Procedia PDF Downloads 395
1890 Effect of Waste Foundry Slag and Alccofine on Durability Properties of High Strength Concrete

Authors: Devinder Sharma, Sanjay Sharma, Ajay Goyal, Ashish Kapoor

Abstract:

The present research paper discussed the durability properties of high strength concrete (HSC) using Foundry Slag(FD) as partial substitute for fine aggregates (FA) and Alccofine (AF) in addition to portland pozzolana (PPC) cement. Specimens of Concrete M100 grade with water/binder ratio 0.239, with Foundry Slag (FD) varying from 0 to 50% and with optimum quantity of AF(15%) were casted and tested for durability properties such as Water absorption, water permeability, resistance to sulphate attack, alkali attack and nitrate attack of HSC at the age of 7, 14, 28, 56 and 90 days. Substitution of fine aggregates (FA) with up to 45% of foundry slag(FD) content and cement with 15% substitution and addition of alccofine showed an excellent resistance against durability properties at all ages but showed a decrease in these properties with 50% of FD contents. Loss of weight in concrete samples due to sulphate attack, alkali attack and nitrate attack of HSC at the age of 365 days was compared with loss in compressive strength. Correlation between loss in weight and loss in compressive strength in all the tests was found to be excellent.

Keywords: alccofine, alkali attack, foundry slag, high strength concrete, nitrate attack, water absorption, water permeability

Procedia PDF Downloads 331
1889 Arabic Text Representation and Classification Methods: Current State of the Art

Authors: Rami Ayadi, Mohsen Maraoui, Mounir Zrigui

Abstract:

In this paper, we have presented a brief current state of the art for Arabic text representation and classification methods. We decomposed Arabic Task Classification into four categories. First we describe some algorithms applied to classification on Arabic text. Secondly, we cite all major works when comparing classification algorithms applied on Arabic text, after this, we mention some authors who proposing new classification methods and finally we investigate the impact of preprocessing on Arabic TC.

Keywords: text classification, Arabic, impact of preprocessing, classification algorithms

Procedia PDF Downloads 469
1888 Experimental Studies on the Corrosion Effects of the Concrete Made with Tannery Effluent

Authors: K. Nirmalkumar

Abstract:

An acute water scarcity is prevailing in the dry season in and around Perundurai (Erode district, Tamil Nadu, India) where there are more number of tannery units. Hence an attempt was made to use the effluent from the tannery industry for construction purpose. The mechanical properties such as compressive strength, tensile strength, flexural strength and the special properties such as chloride attack, sulphate attack and chemical attack were studied by casting various concrete specimens in form of cube, cylinders and beams, etc. It was observed that the concrete had some reduction in strength while subjected to chloride attack, sulphate attack and chemical attack. So admixtures were selected and optimized in suitable proportion to counter act the adverse effects and the results were found to be satisfactory. In this research study the corrosion results of specimens prepared by using treated and untreated tannery effluent were compared with the concrete specimens prepared by using potable water. It was observed that by the addition of admixtures, the adverse effects due to the usage of the treated and untreated tannery effluent are counteracted.

Keywords: corrosion, calcium nitrite, concrete, fly ash

Procedia PDF Downloads 269
1887 An Aspiring Solution to the Man in the Middle Bootstrap Vulnerability

Authors: Mouad Zouina, Benaceur Outtaj

Abstract:

The proposed work falls within the context of improving data security for m-commerce systems. In this context we have placed under the light some flaws encountered in HTTPS the most used m-commerce protocol, particularly the man in the middle attack, shortly MITM. The man in the middle attack is an active listening attack. The idea of this attack is to target the handshake phase of the HTTPS protocol which is the transition from a non-secure connection to a secure connection in our case HTTP to HTTPS. This paper proposes a solution to fix those flaws based on the upgrade of HSTS standard handshake sequence using the DNSSEC standard.

Keywords: m-commerce, HTTPS, HSTS, DNSSEC, MITM bootstrap vulnerability

Procedia PDF Downloads 391
1886 A Pattern Recognition Neural Network Model for Detection and Classification of SQL Injection Attacks

Authors: Naghmeh Moradpoor Sheykhkanloo

Abstract:

Structured Query Language Injection (SQLI) attack is a code injection technique in which malicious SQL statements are inserted into a given SQL database by simply using a web browser. Losing data, disclosing confidential information or even changing the value of data are the severe damages that SQLI attack can cause on a given database. SQLI attack has also been rated as the number-one attack among top ten web application threats on Open Web Application Security Project (OWASP). OWASP is an open community dedicated to enabling organisations to consider, develop, obtain, function, and preserve applications that can be trusted. In this paper, we propose an effective pattern recognition neural network model for detection and classification of SQLI attacks. The proposed model is built from three main elements of: a Uniform Resource Locator (URL) generator in order to generate thousands of malicious and benign URLs, a URL classifier in order to: 1) classify each generated URL to either a benign URL or a malicious URL and 2) classify the malicious URLs into different SQLI attack categories, and an NN model in order to: 1) detect either a given URL is a malicious URL or a benign URL and 2) identify the type of SQLI attack for each malicious URL. The model is first trained and then evaluated by employing thousands of benign and malicious URLs. The results of the experiments are presented in order to demonstrate the effectiveness of the proposed approach.

Keywords: neural networks, pattern recognition, SQL injection attacks, SQL injection attack classification, SQL injection attack detection

Procedia PDF Downloads 469
1885 Graph-Based Semantical Extractive Text Analysis

Authors: Mina Samizadeh

Abstract:

In the past few decades, there has been an explosion in the amount of available data produced from various sources with different topics. The availability of this enormous data necessitates us to adopt effective computational tools to explore the data. This leads to an intense growing interest in the research community to develop computational methods focused on processing this text data. A line of study focused on condensing the text so that we are able to get a higher level of understanding in a shorter time. The two important tasks to do this are keyword extraction and text summarization. In keyword extraction, we are interested in finding the key important words from a text. This makes us familiar with the general topic of a text. In text summarization, we are interested in producing a short-length text which includes important information about the document. The TextRank algorithm, an unsupervised learning method that is an extension of the PageRank (algorithm which is the base algorithm of Google search engine for searching pages and ranking them), has shown its efficacy in large-scale text mining, especially for text summarization and keyword extraction. This algorithm can automatically extract the important parts of a text (keywords or sentences) and declare them as a result. However, this algorithm neglects the semantic similarity between the different parts. In this work, we improved the results of the TextRank algorithm by incorporating the semantic similarity between parts of the text. Aside from keyword extraction and text summarization, we develop a topic clustering algorithm based on our framework, which can be used individually or as a part of generating the summary to overcome coverage problems.

Keywords: keyword extraction, n-gram extraction, text summarization, topic clustering, semantic analysis

Procedia PDF Downloads 70
1884 Arabic Text Classification: Review Study

Authors: M. Hijazi, A. Zeki, A. Ismail

Abstract:

An enormous amount of valuable human knowledge is preserved in documents. The rapid growth in the number of machine-readable documents for public or private access requires the use of automatic text classification. Text classification can be defined as assigning or structuring documents into a defined set of classes known in advance. Arabic text classification methods have emerged as a natural result of the existence of a massive amount of varied textual information written in the Arabic language on the web. This paper presents a review on the published researches of Arabic Text Classification using classical data representation, Bag of words (BoW), and using conceptual data representation based on semantic resources such as Arabic WordNet and Wikipedia.

Keywords: Arabic text classification, Arabic WordNet, bag of words, conceptual representation, semantic relations

Procedia PDF Downloads 426
1883 Service Life Modelling of Concrete Deterioration Due to Biogenic Sulphuric Acid (BSA) Attack-State-of-an-Art-Review

Authors: Ankur Bansal, Shashank Bishnoi

Abstract:

Degradation of Sewage pipes, sewage pumping station and Sewage treatment plants(STP) is of major concern due to difficulty in their maintenance and the high cost of replacement. Most of these systems undergo degradation due to Biogenic sulphuric acid (BSA) attack. Since most of Waste water treatment system are underground, detection of this deterioration remains hidden. This paper presents a literature review, outlining the mechanism of this attack focusing on critical parameters of BSA attack, along with available models and software to predict the deterioration due to this attack. This paper critically examines the various steps and equation in various Models of BSA degradation, detail on assumptions and working of different softwares are also highlighted in this paper. The paper also focuses on the service life design technique available through various codes and method to integrate the servile life design with BSA degradation on concrete. In the end, various methods enhancing the resistance of concrete against Biogenic sulphuric acid attack are highlighted. It may be concluded that the effective modelling for degradation phenomena may bring positive economical and environmental impacts. With current computing capabilities integrated degradation models combining the various durability aspects can bring positive change for sustainable society.

Keywords: concrete degradation, modelling, service life, sulphuric acid attack

Procedia PDF Downloads 314
1882 Perceiving Text-Worlds as a Cognitive Mechanism to Understand Surah Al-Kahf

Authors: Awatef Boubakri, Khaled Jebahi

Abstract:

Using Text World Theory (TWT), we attempted to understand how mental representations (text worlds) and perceptions can be construed by readers of Quranic texts. To this end, Surah Al-Kahf was purposefully selected given the fact that while each of its stories is narrated, different levels of discourse intervene, which might result in a confused reader who might find it hard to keep track of which discourse he or she is processing. This surah was studied using specifically-designed text-world diagrams. The findings suggest that TWT can be used to help solve problems of ambiguity at the level of discourse in Quranic texts and to help construct a thinking reader whose cognitive constructs (text worlds / mental representations) are built through reflecting on the various and often changing components of discourse world, text world, and sub-worlds.

Keywords: Al-Kahf, Surah, cognitive, processing, discourse

Procedia PDF Downloads 88
1881 Clicking Based Graphical Password Scheme Resistant to Spyware

Authors: Bandar Alahmadi

Abstract:

The fact that people tend to remember pictures better than texts, motivates researchers to develop graphical passwords as an alternative to textual passwords. Graphical passwords as such were introduced as a possible alternative to traditional text passwords, in which users prove their identity by clicking on pictures rather than typing alphanumerical text. In this paper, we present a scheme for graphical passwords that are resistant to shoulder surfing attacks and spyware attacks. The proposed scheme introduces a clicking technique to chosen images. First, the users choose a set of images, the images are then included in a grid where users can click in the cells around each image, the location of the click and the number of clicks are saved. As a result, the proposed scheme can be safe from shoulder surface and spyware attacks.

Keywords: security, password, authentication, attack, applications

Procedia PDF Downloads 163
1880 A Quantitative Evaluation of Text Feature Selection Methods

Authors: B. S. Harish, M. B. Revanasiddappa

Abstract:

Due to rapid growth of text documents in digital form, automated text classification has become an important research in the last two decades. The major challenge of text document representations are high dimension, sparsity, volume and semantics. Since the terms are only features that can be found in documents, selection of good terms (features) plays an very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency and accuracy. In this paper, we present a quantitative analysis of most widely used feature selection (FS) methods, viz. Term Frequency-Inverse Document Frequency (tfidf ), Mutual Information (MI), Information Gain (IG), CHISquare (x2), Term Frequency-Relevance Frequency (tfrf ), Term Strength (TS), Ambiguity Measure (AM) and Symbolic Feature Selection (SFS) to classify text documents. We evaluated all the feature selection methods on standard datasets like 20 Newsgroups, 4 University dataset and Reuters-21578.

Keywords: classifiers, feature selection, text classification

Procedia PDF Downloads 458
1879 The Journey of a Malicious HTTP Request

Authors: M. Mansouri, P. Jaklitsch, E. Teiniker

Abstract:

SQL injection on web applications is a very popular kind of attack. There are mechanisms such as intrusion detection systems in order to detect this attack. These strategies often rely on techniques implemented at high layers of the application but do not consider the low level of system calls. The problem of only considering the high level perspective is that an attacker can circumvent the detection tools using certain techniques such as URL encoding. One technique currently used for detecting low-level attacks on privileged processes is the tracing of system calls. System calls act as a single gate to the Operating System (OS) kernel; they allow catching the critical data at an appropriate level of detail. Our basic assumption is that any type of application, be it a system service, utility program or Web application, “speaks” the language of system calls when having a conversation with the OS kernel. At this level we can see the actual attack while it is happening. We conduct an experiment in order to demonstrate the suitability of system call analysis for detecting SQL injection. We are able to detect the attack. Therefore we conclude that system calls are not only powerful in detecting low-level attacks but that they also enable us to detect high-level attacks such as SQL injection.

Keywords: Linux system calls, web attack detection, interception, SQL

Procedia PDF Downloads 358
1878 The Untranslatability of the Qur’an

Authors: Mina Elhjouji

Abstract:

The aim of this paper is to raise awareness of the untranslatability of the Qur’an and to suggest some solutions that can help the translator in the process of transferring the meaning from the source text to the target text as much as possible. After the introduction, the miraculous character of the Qur’an shall be illustrated. Then, the difficulty of translating religious texts will be shown in terms of different causes; thematic, cultural, and linguistic. Some examples shall illustrate each type of these difficulties. Finally, some strategies that can help translate the Quran’s meanings will be suggested.

Keywords: translation, religious text, untranslatability, The Qur’an miracle, communicative theory

Procedia PDF Downloads 11
1877 Air Flow Characteristics and Pressure Distributions for Staggered Wing Shaped Tubes Bundle

Authors: Sayed A. Elsayed, Emad Z. Ibrahim, Osama M. Mesalhy, Mohamed A. Abdelatief

Abstract:

An experimental and numerical study has been conducted to clarify fluid flow characteristics and pressure drop distributions of a cross-flow heat exchanger employing staggered wing-shaped tubes at different angels of attack. The water-side Rew and the air-side Rea were at 5 x 102 and at from 1.8 x 103 to 9.7 x 103, respectively. Three cases of the tubes arrangements with various angles of attack, row angles of attack and 90° cone angles were employed at the considered Rea range. Correlation of pressure drop coefficient Pdc in terms of Rea, design parameters for the studied cases were presented. The flow pattern around the staggered wing-shaped tubes bundle were predicted by using commercial CFD FLUENT 6.3.26 software package. Results indicated that the values of Pdc were increased by increasing the angle of attack from 0° to 45°, while the opposite was true for angles of attack from 135° to 180°. Comparisons between the experimental and numerical results of the present study and those, previously, obtained for similar available studies showed good agreements.

Keywords: wing-shaped tubes, cross-flow cooling, staggered arrangement, CFD

Procedia PDF Downloads 376
1876 The Acquisition of Case in Biological Domain Based on Text Mining

Authors: Shen Jian, Hu Jie, Qi Jin, Liu Wei Jie, Chen Ji Yi, Peng Ying Hong

Abstract:

In order to settle the problem of acquiring case in biological related to design problems, a biometrics instance acquisition method based on text mining is presented. Through the construction of corpus text vector space and knowledge mining, the feature selection, similarity measure and case retrieval method of text in the field of biology are studied. First, we establish a vector space model of the corpus in the biological field and complete the preprocessing steps. Then, the corpus is retrieved by using the vector space model combined with the functional keywords to obtain the biological domain examples related to the design problems. Finally, we verify the validity of this method by taking the example of text.

Keywords: text mining, vector space model, feature selection, biologically inspired design

Procedia PDF Downloads 260
1875 Trace Network: A Probabilistic Relevant Pattern Recognition Approach to Attribution Trace Analysis

Authors: Jian Xu, Xiaochun Yun, Yongzheng Zhang, Yafei Sang, Zhenyu Cheng

Abstract:

Network attack prevention is a critical research area of information security. Network attack would be oppressed if attribution techniques are capable to trace back to the attackers after the hacking event. Therefore attributing these attacks to a particular identification becomes one of the important tasks when analysts attempt to differentiate and profile the attacker behind a piece of attack trace. To assist analysts in expose attackers behind the scenes, this paper researches on the connections between attribution traces and proposes probabilistic relevance based attribution patterns. This method facilitates the evaluation of the plausibility relevance between different traceable identifications. Furthermore, through analyzing the connections among traces, it could confirm the existence probability of a certain organization as well as discover its affinitive partners by the means of drawing relevance matrix from attribution traces.

Keywords: attribution trace, probabilistic relevance, network attack, attacker identification

Procedia PDF Downloads 366
1874 Text Similarity in Vector Space Models: A Comparative Study

Authors: Omid Shahmirzadi, Adam Lugowski, Kenneth Younge

Abstract:

Automatic measurement of semantic text similarity is an important task in natural language processing. In this paper, we evaluate the performance of different vector space models to perform this task. We address the real-world problem of modeling patent-to-patent similarity and compare TFIDF (and related extensions), topic models (e.g., latent semantic indexing), and neural models (e.g., paragraph vectors). Contrary to expectations, the added computational cost of text embedding methods is justified only when: 1) the target text is condensed; and 2) the similarity comparison is trivial. Otherwise, TFIDF performs surprisingly well in other cases: in particular for longer and more technical texts or for making finer-grained distinctions between nearest neighbors. Unexpectedly, extensions to the TFIDF method, such as adding noun phrases or calculating term weights incrementally, were not helpful in our context.

Keywords: big data, patent, text embedding, text similarity, vector space model

Procedia PDF Downloads 175
1873 Structural Analysis of Kamaluddin Behzad's Works Based on Roland Barthes' Theory of Communication, 'Text and Image'

Authors: Mahsa Khani Oushani, Mohammad Kazem Hasanvand

Abstract:

Text and image have always been two important components in Iranian layout. The interactive connection between text and image has shaped the art of book design with multiple patterns. In this research, first the structure and visual elements in the research data were analyzed and then the position of the text element and the image element in relation to each other based on Roland Barthes theory on the three theories of text and image, were studied and analyzed and the results were compared, and interpreted. The purpose of this study is to investigate the pattern of text and image in the works of Kamaluddin Behzad based on three Roland Barthes communication theories, 1. Descriptive communication, 2. Reference communication, 3. Matched communication. The questions of this research are what is the relationship between text and image in Behzad's works? And how is it defined according to Roland Barthes theory? The method of this research has been done with a structuralist approach with a descriptive-analytical method in a library collection method. The information has been collected in the form of documents (library) and is a tool for collecting online databases. Findings show that the dominant element in Behzad's drawings is with the image and has created a reference relationship in the layout of the drawings, but in some cases it achieves a different relationship that despite the preference of the image on the page, the text is dispersed proportionally on the page and plays a more active role, played within the image. The text and the image support each other equally on the page; Roland Barthes equates this connection.

Keywords: text, image, Kamaluddin Behzad, Roland Barthes, communication theory

Procedia PDF Downloads 192
1872 A Study on Automotive Attack Database and Data Flow Diagram for Concretization of HEAVENS: A Car Security Model

Authors: Se-Han Lee, Kwang-Woo Go, Gwang-Hyun Ahn, Hee-Sung Park, Cheol-Kyu Han, Jun-Bo Shim, Geun-Chul Kang, Hyun-Jung Lee

Abstract:

In recent years, with the advent of smart cars and the expansion of the market, the announcement of 'Adventures in Automotive Networks and Control Units' at the DEFCON21 conference in 2013 revealed that cars are not safe from hacking. As a result, the HEAVENS model considering not only the functional safety of the vehicle but also the security has been suggested. However, the HEAVENS model only presents a simple process, and there are no detailed procedures and activities for each process, making it difficult to apply it to the actual vehicle security vulnerability check. In this paper, we propose an automated attack database that systematically summarizes attack vectors, attack types, and vulnerable vehicle models to prepare for various car hacking attacks, and data flow diagrams that can detect various vulnerabilities and suggest a way to materialize the HEAVENS model.

Keywords: automotive security, HEAVENS, car hacking, security model, information security

Procedia PDF Downloads 362
1871 Morphological Processing of Punjabi Text for Sentiment Analysis of Farmer Suicides

Authors: Jaspreet Singh, Gurvinder Singh, Prabhsimran Singh, Rajinder Singh, Prithvipal Singh, Karanjeet Singh Kahlon, Ravinder Singh Sawhney

Abstract:

Morphological evaluation of Indian languages is one of the burgeoning fields in the area of Natural Language Processing (NLP). The evaluation of a language is an eminent task in the era of information retrieval and text mining. The extraction and classification of knowledge from text can be exploited for sentiment analysis and morphological evaluation. This study coalesce morphological evaluation and sentiment analysis for the task of classification of farmer suicide cases reported in Punjab state of India. The pre-processing of Punjabi text involves morphological evaluation and normalization of Punjabi word tokens followed by the training of proposed model using deep learning classification on Punjabi language text extracted from online Punjabi news reports. The class-wise accuracies of sentiment prediction for four negatively oriented classes of farmer suicide cases are 93.85%, 88.53%, 83.3%, and 95.45% respectively. The overall accuracy of sentiment classification obtained using proposed framework on 275 Punjabi text documents is found to be 90.29%.

Keywords: deep neural network, farmer suicides, morphological processing, punjabi text, sentiment analysis

Procedia PDF Downloads 326
1870 Intertextuality in Choreography: Investigation of Text and Movements in Making Choreography

Authors: Muhammad Fairul Azreen Mohd Zahid

Abstract:

Speech, text, and movement intensify aspects of creating choreography by connecting with emotional entanglements, tradition, literature, and other texts. This research focuses on the practice as research that will prioritise the choreography process as an inquiry approach. With the driven context, the study intervenes in critical conjunctions of choreographic theory, bringing together new reflections on the moving body, spaces of action, as well as intertextuality between text and movements in making choreography. Throughout the process, the researcher will introduce the level of deliberation from speech through movements and text to express emotion within a narrative context of an “illocutionary act.” This practice as research will produce a different meaning from the “utterance text” to “utterance movements” in the perspective of speech acts theory by J.L Austin based on fragmented text from “pidato adat” which has been used as opening speech in Randai. Looking at the theory of deconstruction by Jacque Derrida also will give a different meaning from the text. Nevertheless, the process of creating the choreography will also help to lay the basic normative structure implicit in “constative” (statement text/movement) and “performative” (command text/movement). Through this process, the researcher will also look at several methods of using text from two works by Joseph Gonzales, “Becoming King-The Pakyung Revisited” and Crystal Pite's “The Statement,” as references to produce different methods in making choreography. The perspective from the semiotic foundation will support how occurrences within dance discourses as texts through a semiotic lens. The method used in this research is qualitative, which includes an interview and simulation of the concept to get an outcome.

Keywords: intertextuality, choreography, speech act, performative, deconstruction

Procedia PDF Downloads 96
1869 Written Argumentative Texts in Elementary School: The Development of Text Structure and Its Relation to Reading Comprehension

Authors: Sara Zadunaisky Ehrlich, Batia Seroussi, Anat Stavans

Abstract:

Text structure is a parameter of text quality. This study investigated the structure of written argumentative texts produced by elementary school age children. We set two objectives: to identify and trace the structural components of the argumentative texts and to investigate whether reading comprehension skills were correlated with text structure. 293 school children from 2nd to 5th grades were asked to write two argumentative texts about informal or everyday life controversial topics and completed two reading tasks that targeted different levels of text comprehension. The findings indicated, on the one hand, significant developmental differences between mature and more novice writers in terms of text length and mean proportion of clauses produced for a better elaboration of the different text components. On the other hand, with certain fluctuations, no meaningful differences were found in terms of presence of text structure: at all grade levels, elementary school children produced the basic and minimal structure that included the writer's argument and reasons or arguments' supports. Counter-arguments were scarce even in the upper grades. While the children captured that essentially an argument must be justified, the more the number of supports produced, the fewer the clauses the children produced. Last, weak to mild relations were found between reading comprehension and argumentative text structure. Nevertheless, children who scored higher on sophisticated questions that require inferential or world knowledge displayed more elaborated structures in terms of text length and size of supports to the writer's argument. These findings indicate how school-age children perceive the basic template of an argument with future implications regarding how to elaborate written arguments.

Keywords: argumentative text, text structure, elementary school children, written argumentations

Procedia PDF Downloads 166
1868 The Morphology of Sri Lankan Text Messages

Authors: Chamindi Dilkushi Senaratne

Abstract:

Communicating via a text or an SMS (Short Message Service) has become an integral part of our daily lives. With the increase in the use of mobile phones, text messaging has become a genre by itself worth researching and studying. It is undoubtedly a major phenomenon revealing language change. This paper attempts to describe the morphological processes of text language of urban bilinguals in Sri Lanka. It will be a typological study based on 500 English text messages collected from urban bilinguals residing in Colombo. The messages are selected by categorizing the deviant forms of language use apparent in text messages. These stylistic deviations are a deliberate skilled performance by the users of the language possessing an in-depth knowledge of linguistic systems to create new words and thereby convey their linguistic identity and individual and group solidarity via the message. The findings of the study solidifies arguments that the manipulation of language in text messages is both creative and appropriate. In addition, code mixing theories will be used to identify how existing morphological processes are adapted by bilingual users in Sri Lanka when texting. The study will reveal processes such as omission, initialism, insertion and alternation in addition to other identified linguistic features in text language. The corpus reveals the most common morphological processes used by Sri Lankan urban bilinguals when sending texts.

Keywords: bilingual, deviations, morphology, texts

Procedia PDF Downloads 269
1867 “Octopub”: Geographical Sentiment Analysis Using Named Entity Recognition from Social Networks for Geo-Targeted Billboard Advertising

Authors: Oussama Hafferssas, Hiba Benyahia, Amina Madani, Nassima Zeriri

Abstract:

Although data nowadays has multiple forms; from text to images, and from audio to videos, yet text is still the most used one at a public level. At an academical and research level, and unlike other forms, text can be considered as the easiest form to process. Therefore, a brunch of Data Mining researches has been always under its shadow, called "Text Mining". Its concept is just like data mining’s, finding valuable patterns in data, from large collections and tremendous volumes of data, in this case: Text. Named entity recognition (NER) is one of Text Mining’s disciplines, it aims to extract and classify references such as proper names, locations, expressions of time and dates, organizations and more in a given text. Our approach "Octopub" does not aim to find new ways to improve named entity recognition process, rather than that it’s about finding a new, and yet smart way, to use NER in a way that we can extract sentiments of millions of people using Social Networks as a limitless information source, and Marketing for product promotion as the main domain of application.

Keywords: textmining, named entity recognition(NER), sentiment analysis, social media networks (SN, SMN), business intelligence(BI), marketing

Procedia PDF Downloads 589
1866 Video Text Information Detection and Localization in Lecture Videos Using Moments

Authors: Belkacem Soundes, Guezouli Larbi

Abstract:

This paper presents a robust and accurate method for text detection and localization over lecture videos. Frame regions are classified into text or background based on visual feature analysis. However, lecture video shows significant degradation mainly related to acquisition conditions, camera motion and environmental changes resulting in low quality videos. Hence, affecting feature extraction and description efficiency. Moreover, traditional text detection methods cannot be directly applied to lecture videos. Therefore, robust feature extraction methods dedicated to this specific video genre are required for robust and accurate text detection and extraction. Method consists of a three-step process: Slide region detection and segmentation; Feature extraction and non-text filtering. For robust and effective features extraction moment functions are used. Two distinct types of moments are used: orthogonal and non-orthogonal. For orthogonal Zernike Moments, both Pseudo Zernike moments are used, whereas for non-orthogonal ones Hu moments are used. Expressivity and description efficiency are given and discussed. Proposed approach shows that in general, orthogonal moments show high accuracy in comparison to the non-orthogonal one. Pseudo Zernike moments are more effective than Zernike with better computation time.

Keywords: text detection, text localization, lecture videos, pseudo zernike moments

Procedia PDF Downloads 151
1865 Intonation Salience as an Underframe to Text Intonation Models

Authors: Tatiana Stanchuliak

Abstract:

It is common knowledge that intonation is not laid over a ready text. On the contrary, intonation forms and accompanies the text on the level of its birth in the speaker’s mind. As a result, intonation plays one of the fundamental roles in the process of transferring a thought into external speech. Intonation structure can highlight the semantic significance of textual elements and become a ranging mark in understanding the information structure of the text. Intonation functions by means of prosodic characteristics, one of which is intonation salience, whose function in texts results in making some textual elements more prominent than others. This function of intonation, therefore, performs as organizing. It helps to form the frame of key elements of the text. The study under consideration made an attempt to look into the inner nature of salience and create a sort of a text intonation model. This general goal brought to some more specific intermediate results. First, there were established degrees of salience on the level of the smallest semantic element - intonation group, as well as prosodic means of creating salience, were examined. Second, the most frequent combinations of prosodic means made it possible to distinguish patterns of salience, which then became constituent elements of a text intonation model. Third, the analysis of the predicate structure allowed to divide the whole text into smaller parts, or units, which performed a specific function in the developing of the general communicative intention. It appeared that such units can be found in any text and they have common characteristics of their intonation arrangement. These findings are certainly very important both for the theory of intonation and their practical application.

Keywords: accentuation , inner speech, intention, intonation, intonation functions, models, patterns, predicate, salience, semantics, sentence stress, text

Procedia PDF Downloads 266
1864 Distorted Document Images Dataset for Text Detection and Recognition

Authors: Ilia Zharikov, Philipp Nikitin, Ilia Vasiliev, Vladimir Dokholyan

Abstract:

With the increasing popularity of document analysis and recognition systems, text detection (TD) and optical character recognition (OCR) in document images become challenging tasks. However, according to our best knowledge, no publicly available datasets for these particular problems exist. In this paper, we introduce a Distorted Document Images dataset (DDI-100) and provide a detailed analysis of the DDI-100 in its current state. To create the dataset we collected 7000 unique document pages, and extend it by applying different types of distortions and geometric transformations. In total, DDI-100 contains more than 100,000 document images together with binary text masks, text and character locations in terms of bounding boxes. We also present an analysis of several state-of-the-art TD and OCR approaches on the presented dataset. Lastly, we demonstrate the usefulness of DDI-100 to improve accuracy and stability of the considered TD and OCR models.

Keywords: document analysis, open dataset, optical character recognition, text detection

Procedia PDF Downloads 172
1863 Text-to-Speech in Azerbaijani Language via Transfer Learning in a Low Resource Environment

Authors: Dzhavidan Zeinalov, Bugra Sen, Firangiz Aslanova

Abstract:

Most text-to-speech models cannot operate well in low-resource languages and require a great amount of high-quality training data to be considered good enough. Yet, with the improvements made in ASR systems, it is now much easier than ever to collect data for the design of custom text-to-speech models. In this work, our work on using the ASR model to collect data to build a viable text-to-speech system for one of the leading financial institutions of Azerbaijan will be outlined. NVIDIA’s implementation of the Tacotron 2 model was utilized along with the HiFiGAN vocoder. As for the training, the model was first trained with high-quality audio data collected from the Internet, then fine-tuned on the bank’s single speaker call center data. The results were then evaluated by 50 different listeners and got a mean opinion score of 4.17, displaying that our method is indeed viable. With this, we have successfully designed the first text-to-speech model in Azerbaijani and publicly shared 12 hours of audiobook data for everyone to use.

Keywords: Azerbaijani language, HiFiGAN, Tacotron 2, text-to-speech, transfer learning, whisper

Procedia PDF Downloads 44