Search results for: Text file format
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 824

Search results for: Text file format

764 Meta-Classification using SVM Classifiers for Text Documents

Authors: Daniel I. Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. In this paper, we investigated three approaches to build a meta-classifier in order to increase the classification accuracy. The basic idea is to learn a metaclassifier to optimally select the best component classifier for each data point. The experimental results show that combining classifiers can significantly improve the accuracy of classification and that our meta-classification strategy gives better results than each individual classifier. For 7083 Reuters text documents we obtained a classification accuracies up to 92.04%.

Keywords: Meta-classification, Learning with Kernels, Support Vector Machine, and Performance Evaluation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1569
763 Clustering Unstructured Text Documents Using Fading Function

Authors: Pallav Roxy, Durga Toshniwal

Abstract:

Clustering unstructured text documents is an important issue in data mining community and has a number of applications such as document archive filtering, document organization and topic detection and subject tracing. In the real world, some of the already clustered documents may not be of importance while new documents of more significance may evolve. Most of the work done so far in clustering unstructured text documents overlooks this aspect of clustering. This paper, addresses this issue by using the Fading Function. The unstructured text documents are clustered. And for each cluster a statistics structure called Cluster Profile (CP) is implemented. The cluster profile incorporates the Fading Function. This Fading Function keeps an account of the time-dependent importance of the cluster. The work proposes a novel algorithm Clustering n-ary Merge Algorithm (CnMA) for unstructured text documents, that uses Cluster Profile and Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.

Keywords: Clustering, Text Mining, Unstructured TextDocuments, Fading Function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1942
762 Grid Based and Random Based Ant Colony Algorithms for Automatic Hose Routing in 3D Space

Authors: Gishantha Thantulage, Tatiana Kalganova, Manissa Wilson

Abstract:

Ant Colony Algorithms have been applied to difficult combinatorial optimization problems such as the travelling salesman problem and the quadratic assignment problem. In this paper gridbased and random-based ant colony algorithms are proposed for automatic 3D hose routing and their pros and cons are discussed. The algorithm uses the tessellated format for the obstacles and the generated hoses in order to detect collisions. The representation of obstacles and hoses in the tessellated format greatly helps the algorithm towards handling free-form objects and speeds up computation. The performance of algorithm has been tested on a number of 3D models.

Keywords: Ant colony algorithm, Automatic hose routing, tessellated format, RAPID.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1527
761 Optimizing Hadoop Block Placement Policy and Cluster Blocks Distribution

Authors: Nchimbi Edward Pius, Liu Qin, Fion Yang, Zhu Hong Ming

Abstract:

The current Hadoop block placement policy do not fairly and evenly distributes replicas of blocks written to datanodes in a Hadoop cluster.

This paper presents a new solution that helps to keep the cluster in a balanced state while an HDFS client is writing data to a file in Hadoop cluster. The solution had been implemented, and test had been conducted to evaluate its contribution to Hadoop distributed file system.

It has been found that, the solution has lowered global execution time taken by Hadoop balancer to 22 percent. It also has been found that, Hadoop balancer respectively over replicate 1.75 and 3.3 percent of all re-distributed blocks in the modified and original Hadoop clusters.

The feature that keeps the cluster in a balanced state works as a core part to Hadoop system and not just as a utility like traditional balancer. This is one of the significant achievements and uniqueness of the solution developed during the course of this research work.

Keywords: Balancer, Datanode, Distributed file system, Hadoop, Replicas.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4899
760 A Content Vector Model for Text Classification

Authors: Eric Jiang

Abstract:

As a popular rank-reduced vector space approach, Latent Semantic Indexing (LSI) has been used in information retrieval and other applications. In this paper, an LSI-based content vector model for text classification is presented, which constructs multiple augmented category LSI spaces and classifies text by their content. The model integrates the class discriminative information from the training data and is equipped with several pertinent feature selection and text classification algorithms. The proposed classifier has been applied to email classification and its experiments on a benchmark spam testing corpus (PU1) have shown that the approach represents a competitive alternative to other email classifiers based on the well-known SVM and naïve Bayes algorithms.

Keywords: Feature Selection, Latent Semantic Indexing, Text Classification, Vector Space Model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1844
759 Narrative and Expository Text Reading Comprehension by Fourth Grade Spanish-Speaking Children

Authors: Mariela V. De Mier, Veronica S. Sanchez Abchi, Ana M. Borzone

Abstract:

This work aims to explore the factors that have an incidence in reading comprehension process, with different type of texts. In a recent study with 2nd, 3rd and 4th grade children, it was observed that reading comprehension of narrative texts was better than comprehension of expository texts. Nevertheless it seems that not only the type of text but also other textual factors would account for comprehension depending on the cognitive processing demands posed by the text. In order to explore this assumption, three narrative and three expository texts were elaborated with different degree of complexity. A group of 40 fourth grade Spanish-speaking children took part in the study. Children were asked to read the texts and answer orally three literal and three inferential questions for each text. The quantitative and qualitative analysis of children responses showed that children had difficulties in both, narrative and expository texts. The problem was to answer those questions that involved establishing complex relationships among information units that were present in the text or that should be activated from children’s previous knowledge to make an inference. Considering the data analysis, it could be concluded that there is some interaction between the type of text and the cognitive processing load of a specific text.

Keywords: comprehension, textual factors, type of text, processing demands.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1350
758 Web Page Watermarking: XML files using Synonyms and Acronyms

Authors: Nighat Mir, Sayed Afaq Hussain

Abstract:

Advent enhancements in the field of computing have increased massive use of web based electronic documents. Current Copyright protection laws are inadequate to prove the ownership for electronic documents and do not provide strong features against copying and manipulating information from the web. This has opened many channels for securing information and significant evolutions have been made in the area of information security. Digital Watermarking has developed into a very dynamic area of research and has addressed challenging issues for digital content. Watermarking can be visible (logos or signatures) and invisible (encoding and decoding). Many visible watermarking techniques have been studied for text documents but there are very few for web based text. XML files are used to trade information on the internet and contain important information. In this paper, two invisible watermarking techniques using Synonyms and Acronyms are proposed for XML files to prove the intellectual ownership and to achieve the security. Analysis is made for different attacks and amount of capacity to be embedded in the XML file is also noticed. A comparative analysis for capacity is also made for both methods. The system has been implemented using C# language and all tests are made practically to get the results.

Keywords: Watermarking, Extensible Markup Language (XML), Synonyms, Acronyms, Copyright protection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2229
757 Accelerating Sparse Matrix Vector Multiplication on Many-Core GPUs

Authors: Weizhi Xu, Zhiyong Liu, Dongrui Fan, Shuai Jiao, Xiaochun Ye, Fenglong Song, Chenggang Yan

Abstract:

Many-core GPUs provide high computing ability and substantial bandwidth; however, optimizing irregular applications like SpMV on GPUs becomes a difficult but meaningful task. In this paper, we propose a novel method to improve the performance of SpMV on GPUs. A new storage format called HYB-R is proposed to exploit GPU architecture more efficiently. The COO portion of the matrix is partitioned recursively into a ELL portion and a COO portion in the process of creating HYB-R format to ensure that there are as many non-zeros as possible in ELL format. The method of partitioning the matrix is an important problem for HYB-R kernel, so we also try to tune the parameters to partition the matrix for higher performance. Experimental results show that our method can get better performance than the fastest kernel (HYB) in NVIDIA-s SpMV library with as high as 17% speedup.

Keywords: GPU, HYB-R, Many-core, Performance Tuning, SpMV

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1949
756 A Talking Head System for Korean Text

Authors: Sang-Wan Kim, Hoon Lee, Kyung-Ho Choi, Soon-Young Park

Abstract:

A talking head system (THS) is presented to animate the face of a speaking 3D avatar in such a way that it realistically pronounces the given Korean text. The proposed system consists of SAPI compliant text-to-speech (TTS) engine and MPEG-4 compliant face animation generator. The input to the THS is a unicode text that is to be spoken with synchronized lip shape. The TTS engine generates a phoneme sequence with their duration and audio data. The TTS applies the coarticulation rules to the phoneme sequence and sends a mouth animation sequence to the face modeler. The proposed THS can make more natural lip sync and facial expression by using the face animation generator than those using the conventional visemes only. The experimental results show that our system has great potential for the implementation of talking head for Korean text.

Keywords: Talking head, Lip sync, TTS, MPEG4.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1446
755 Comparative Survey of Object Serialization Techniques and the Programming Supports

Authors: Kazuaki Maeda

Abstract:

This paper compares six approaches of object serialization from qualitative and quantitative aspects. Those are object serialization in Java, IDL, XStream, Protocol Buffers, Apache Avro, and MessagePack. Using each approach, a common example is serialized to a file and the size of the file is measured. The qualitative comparison works are investigated in the way of checking whether schema definition is required or not, whether schema compiler is required or not, whether serialization is based on ascii or binary, and which programming languages are supported. It is clear that there is no best solution. Each solution makes good in the context it was developed.

Keywords: structured data, serialization, programming

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2449
754 The Morphology of Sri Lankan Text Messages

Authors: Chamindi Dilkushi Senaratne

Abstract:

Communicating via a text or an SMS (Short Message Service) has become an integral part of our daily lives. With the increase in the use of mobile phones, text messaging has become a genre by itself worth researching and studying. It is undoubtedly a major phenomenon revealing language change. This paper attempts to describe the morphological processes of text language of urban bilinguals in Sri Lanka. It will be a typological study based on 500 English text messages collected from urban bilinguals residing in Colombo. The messages are selected by categorizing the deviant forms of language use apparent in text messages. These stylistic deviations are a deliberate skilled performance by the users of the language possessing an in-depth knowledge of linguistic systems to create new words and thereby convey their linguistic identity and individual and group solidarity via the message. The findings of the study solidifies arguments that the manipulation of language in text messages is both creative and appropriate. In addition, code mixing theories will be used to identify how existing morphological processes are adapted by bilingual users in Sri Lanka when texting. The study will reveal processes such as omission, initialism, insertion and alternation in addition to other identified linguistic features in text language. The corpus reveals the most common morphological processes used by Sri Lankan urban bilinguals when sending texts.

Keywords: Bilingual, deviations, morphology, texts.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1920
753 String Searching in Dispersed Files using MDS Convolutional Codes

Authors: A. S. Poornima, R. Aparna, B. B. Amberker, Prashant Koulgi

Abstract:

In this paper, we propose use of convolutional codes for file dispersal. The proposed method is comparable in complexity to the information Dispersal Algorithm proposed by M.Rabin and for particular choices of (non-binary) convolutional codes, is almost as efficient as that algorithm in terms of controlling expansion in the total storage. Further, our proposed dispersal method allows string search.

Keywords: Convolutional codes, File dispersal, Filereconstruction, Information Dispersal Algorithm, String search.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1239
752 Growing Self Organising Map Based Exploratory Analysis of Text Data

Authors: Sumith Matharage, Damminda Alahakoon

Abstract:

Textual data plays an important role in the modern world. The possibilities of applying data mining techniques to uncover hidden information present in large volumes of text collections is immense. The Growing Self Organizing Map (GSOM) is a highly successful member of the Self Organising Map family and has been used as a clustering and visualisation tool across wide range of disciplines to discover hidden patterns present in the data. A comprehensive analysis of the GSOM’s capabilities as a text clustering and visualisation tool has so far not been published. These functionalities, namely map visualisation capabilities, automatic cluster identification and hierarchical clustering capabilities are presented in this paper and are further demonstrated with experiments on a benchmark text corpus.

Keywords: Text Clustering, Growing Self Organizing Map, Automatic Cluster Identification, Hierarchical Clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1903
751 Parallel Text Processing: Alignment of Indonesian to Javanese Language

Authors: Aji P. Wibawa, Andrew Nafalski, Neil Murray, Wayan F. Mahmudy

Abstract:

Parallel text alignment is proposed as a way of aligning bahasa Indonesia to words in Javanese. Since the one-to-one word translator does not have the facility to translate pragmatic aspects of Javanese, the parallel text alignment model described uses a phrase pair combination. The algorithm aligns the parallel text automatically from the beginning to the end of each sentence. Even though the results of the phrase pair combination outperform the previous algorithm, it is still inefficient. Recording all possible combinations consume more space in the database and time consuming. The original algorithm is modified by applying the edit distance coefficient to improve the data-storage efficiency. As a result, the data-storage consumption is 90% reduced as well as its learning period (42s).

Keywords: Parallel text alignment, phrase pair combination, edit distance coefficient, Javanese-Indonesian language.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2436
750 Performance Evaluation of an Online Text-Based Strategy Game

Authors: Nazleeni S. Haron, Mohd K. Zaime , Izzatdin A. Aziz, Mohd H. Hasan

Abstract:

Text-based game is supposed to be a low resource consumption application that delivers good performances when compared to graphical-intensive type of games. But, nowadays, some of the online text-based games are not offering performances that are acceptable to the users. Therefore, an online text-based game called Star_Quest has been developed in order to analyze its behavior under different performance measurements. Performance metrics such as throughput, scalability, response time and page loading time are captured to yield the performance of the game. The techniques in performing the load testing are also disclosed to exhibit the viability of our work. The comparative assessment between the results obtained and the accepted level of performances are conducted as to determine the performance level of the game. The study reveals that the developed game managed to meet all the performance objectives set forth.

Keywords: Online text-based games, performance evaluation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1561
749 Dynamic Inverted Index Maintenance

Authors: Leo Galambos

Abstract:

The majority of today's IR systems base the IR task on two main processes: indexing and searching. There exists a special group of dynamic IR systems where both processes (indexing and searching) happen simultaneously; such a system discards obsolete information, simultaneously dealing with the insertion of new in¬formation, while still answering user queries. In these dynamic, time critical text document databases, it is often important to modify index structures quickly, as documents arrive. This paper presents a method for dynamization which may be used for this task. Experimental results show that the dynamization process is possible and that it guarantees the response time for the query operation and index actualization.

Keywords: Search engine, inverted file, index management.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1338
748 A P2P File Sharing Technique by Indexed-Priority Metric

Authors: Toshinori Takabatake, Yoshikazu Komano

Abstract:

Recently, the improvements in processing performance of a computer and in high speed communication of an optical fiber have been achieved, so that the amount of data which are processed by a computer and flowed on a network has been increasing greatly. However, in a client-server system, since the server receives and processes the amount of data from the clients through the network, a load on the server is increasing. Thus, there are needed to introduce a server with high processing ability and to have a line with high bandwidth. In this paper, concerning to P2P networks to resolve the load on a specific server, a criterion called an Indexed-Priority Metric is proposed and its performance is evaluated. The proposed metric is to allocate some files to each node. As a result, the load on a specific server can distribute them to each node equally well. A P2P file sharing system using the proposed metric is implemented. Simulation results show that the proposed metric can make it distribute files on the specific server.

Keywords: peer-to-peer, file-sharing system, load-balancing, dependability

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1348
747 The Different Ways to Describe Regular Languages by Using Finite Automata and the Changing Algorithm Implementation

Authors: Abdulmajid Mukhtar Afat

Abstract:

This paper aims at introducing finite automata theory, the different ways to describe regular languages and create a program to implement the subset construction algorithms to convert nondeterministic finite automata (NFA) to deterministic finite automata (DFA). This program is written in c++ programming language. The program reads FA 5tuples from text file and then classifies it into either DFA or NFA. For DFA, the program will read the string w and decide whether it is acceptable or not. If accepted, the program will save the tracking path and point it out. On the other hand, when the automation is NFA, the program will change the Automation to DFA so that it is easy to track and it can decide whether the w exists in the regular language or not.

Keywords: Finite Automata, subset construction DFA, NFA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1946
746 Using Suffix Tree Document Representation in Hierarchical Agglomerative Clustering

Authors: Daniel I. Morariu, Radu G. Cretulescu, Lucian N. Vintan

Abstract:

In text categorization problem the most used method for documents representation is based on words frequency vectors called VSM (Vector Space Model). This representation is based only on words from documents and in this case loses any “word context" information found in the document. In this article we make a comparison between the classical method of document representation and a method called Suffix Tree Document Model (STDM) that is based on representing documents in the Suffix Tree format. For the STDM model we proposed a new approach for documents representation and a new formula for computing the similarity between two documents. Thus we propose to build the suffix tree only for any two documents at a time. This approach is faster, it has lower memory consumption and use entire document representation without using methods for disposing nodes. Also for this method is proposed a formula for computing the similarity between documents, which improves substantially the clustering quality. This representation method was validated using HAC - Hierarchical Agglomerative Clustering. In this context we experiment also the stemming influence in the document preprocessing step and highlight the difference between similarity or dissimilarity measures to find “closer" documents.

Keywords: Text Clustering, Suffix tree documentrepresentation, Hierarchical Agglomerative Clustering

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1866
745 Web Application to Profiling Scientific Institutions through Citation Mining

Authors: Hector D. Cortes, Jesus A. del Rio, Esther O. Garcia, Miguel Robles

Abstract:

Recently the use of data mining to scientific bibliographic data bases has been implemented to analyze the pathways of the knowledge or the core scientific relevances of a laureated novel or a country. This specific case of data mining has been named citation mining, and it is the integration of citation bibliometrics and text mining. In this paper we present an improved WEB implementation of statistical physics algorithms to perform the text mining component of citation mining. In particular we use an entropic like distance between the compression of text as an indicator of the similarity between them. Finally, we have included the recently proposed index h to characterize the scientific production. We have used this web implementation to identify users, applications and impact of the Mexican scientific institutions located in the State of Morelos.

Keywords: Citation Mining, Text Mining, Science Impact

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1698
744 Interactive, Topic-Oriented Search Support by a Centroid-Based Text Categorisation

Authors: Mario Kubek, Herwig Unger

Abstract:

Centroid terms are single words that semantically and topically characterise text documents and so may serve as their very compact representation in automatic text processing. In the present paper, centroids are used to measure the relevance of text documents with respect to a given search query. Thus, a new graphbased paradigm for searching texts in large corpora is proposed and evaluated against keyword-based methods. The first, promising experimental results demonstrate the usefulness of the centroid-based search procedure. It is shown that especially the routing of search queries in interactive and decentralised search systems can be greatly improved by applying this approach. A detailed discussion on further fields of its application completes this contribution.

Keywords: Search algorithm, centroid, query, keyword, cooccurrence, categorisation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 581
743 Connectionist Approach to Generic Text Summarization

Authors: Rajesh S.Prasad, U. V. Kulkarni, Jayashree.R.Prasad

Abstract:

As the enormous amount of on-line text grows on the World-Wide Web, the development of methods for automatically summarizing this text becomes more important. The primary goal of this research is to create an efficient tool that is able to summarize large documents automatically. We propose an Evolving connectionist System that is adaptive, incremental learning and knowledge representation system that evolves its structure and functionality. In this paper, we propose a novel approach for Part of Speech disambiguation using a recurrent neural network, a paradigm capable of dealing with sequential data. We observed that connectionist approach to text summarization has a natural way of learning grammatical structures through experience. Experimental results show that our approach achieves acceptable performance.

Keywords: Artificial Neural Networks (ANN); Computational Intelligence (CI); Connectionist Text Summarizer ECTS (ECTS); Evolving Connectionist systems; Evolving systems; Fuzzy systems (FS); Part of Speech (POS) disambiguation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1544
742 An Architecture for High Performance File SystemI/O

Authors: Mikulas Patocka

Abstract:

This paper presents an architecture of current filesystem implementations as well as our new filesystem SpadFS and operating system Spad with rewritten VFS layer targeted at high performance I/O applications. The paper presents microbenchmarks and real-world benchmarks of different filesystems on the same kernel as well as benchmarks of the same filesystem on different kernels – enabling the reader to make conclusion how much is the performance of various tasks affected by operating system and how much by physical layout of data on disk. The paper describes our novel features–most notably continuous allocation of directories and cross-file readahead – and shows their impact on performance.

Keywords: Filesystem, operating system, VFS, performance, readahead

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1407
741 Hybrid Machine Learning Approach for Text Categorization

Authors: Nerijus Remeikis, Ignas Skucas, Vida Melninkaite

Abstract:

Text categorization - the assignment of natural language documents to one or more predefined categories based on their semantic content - is an important component in many information organization and management tasks. Performance of neural networks learning is known to be sensitive to the initial weights and architecture. This paper discusses the use multilayer neural network initialization with decision tree classifier for improving text categorization accuracy. An adaptation of the algorithm is proposed in which a decision tree from root node until a final leave is used for initialization of multilayer neural network. The experimental evaluation demonstrates this approach provides better classification accuracy with Reuters-21578 corpus, one of the standard benchmarks for text categorization tasks. We present results comparing the accuracy of this approach with multilayer neural network initialized with traditional random method and decision tree classifiers.

Keywords: Text categorization, decision trees, neural networks, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1766
740 Speech Encryption and Decryption Using Linear Feedback Shift Register (LFSR)

Authors: Tin Lai Win, Nant Christina Kyaw

Abstract:

This paper is taken into consideration the problem of cryptanalysis of stream ciphers. There is some attempts need to improve the existing attacks on stream cipher and to make an attempt to distinguish the portions of cipher text obtained by the encryption of plain text in which some parts of the text are random and the rest are non-random. This paper presents a tutorial introduction to symmetric cryptography. The basic information theoretic and computational properties of classic and modern cryptographic systems are presented, followed by an examination of the application of cryptography to the security of VoIP system in computer networks using LFSR algorithm. The implementation program will be developed Java 2. LFSR algorithm is appropriate for the encryption and decryption of online streaming data, e.g. VoIP (voice chatting over IP). This paper is implemented the encryption module of speech signals to cipher text and decryption module of cipher text to speech signals.

Keywords: Linear Feedback Shift Register.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3067
739 Improved Dynamic Bayesian Networks Applied to Arabic on Line Characters Recognition

Authors: Redouane Tlemsani, Abdelkader Benyettou

Abstract:

Work is in on line Arabic character recognition and the principal motivation is to study the Arab manuscript with on line technology.

This system is a Markovian system, which one can see as like a Dynamic Bayesian Network (DBN). One of the major interests of these systems resides in the complete models training (topology and parameters) starting from training data.

Our approach is based on the dynamic Bayesian Networks formalism. The DBNs theory is a Bayesians networks generalization to the dynamic processes. Among our objective, amounts finding better parameters, which represent the links (dependences) between dynamic network variables.

In applications in pattern recognition, one will carry out the fixing of the structure, which obliges us to admit some strong assumptions (for example independence between some variables). Our application will relate to the Arabic isolated characters on line recognition using our laboratory database: NOUN. A neural tester proposed for DBN external optimization.

The DBN scores and DBN mixed are respectively 70.24% and 62.50%, which lets predict their further development; other approaches taking account time were considered and implemented until obtaining a significant recognition rate 94.79%.

Keywords: Arabic on line character recognition, dynamic Bayesian network, pattern recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1690
738 Hybrid Authentication System Using QR Code with OTP

Authors: Salim Istyaq

Abstract:

As we know, number of Internet users are increasing drastically. Now, people are using different online services provided by banks, colleges/schools, hospitals, online utility, bill payment and online shopping sites. To access online services, text-based authentication system is in use. The text-based authentication scheme faces some drawbacks with usability and security issues that bring troubles to users. The core element of computational trust is identity. The aim of the paper is to make the system more compliable for the imposters and more reliable for the users, by using the graphical authentication approach. In this paper, we are using the more powerful tool of encoding the options in graphical QR format and also there will be the acknowledgment which will send to the user’s mobile for final verification. The main methodology depends upon the encryption option and final verification by confirming a set of pass phrase on the legal users, the outcome of the result is very powerful as it only gives the result at once when the process is successfully done. All processes are cross linked serially as the output of the 1st process, is the input of the 2nd and so on. The system is a combination of recognition and pure recall based technique. Presented scheme is useful for devices like PDAs, iPod, phone etc. which are more handy and convenient to use than traditional desktop computer systems.

Keywords: Graphical Password, OTP, QR Codes, Recognition based graphical user authentication, usability and security.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1620
737 Text Mining of Twitter Data Using a Latent Dirichlet Allocation Topic Model and Sentiment Analysis

Authors: Sidi Yang, Haiyi Zhang

Abstract:

Twitter is a microblogging platform, where millions of users daily share their attitudes, views, and opinions. Using a probabilistic Latent Dirichlet Allocation (LDA) topic model to discern the most popular topics in the Twitter data is an effective way to analyze a large set of tweets to find a set of topics in a computationally efficient manner. Sentiment analysis provides an effective method to show the emotions and sentiments found in each tweet and an efficient way to summarize the results in a manner that is clearly understood. The primary goal of this paper is to explore text mining, extract and analyze useful information from unstructured text using two approaches: LDA topic modelling and sentiment analysis by examining Twitter plain text data in English. These two methods allow people to dig data more effectively and efficiently. LDA topic model and sentiment analysis can also be applied to provide insight views in business and scientific fields.

Keywords: Text mining, Twitter, topic model, sentiment analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1733
736 A Novel Arabic Text Steganography Method Using Letter Points and Extensions

Authors: Adnan Abdul-Aziz Gutub, Manal Mohammad Fattani

Abstract:

This paper presents a new steganography approach suitable for Arabic texts. It can be classified under steganography feature coding methods. The approach hides secret information bits within the letters benefiting from their inherited points. To note the specific letters holding secret bits, the scheme considers the two features, the existence of the points in the letters and the redundant Arabic extension character. We use the pointed letters with extension to hold the secret bit 'one' and the un-pointed letters with extension to hold 'zero'. This steganography technique is found attractive to other languages having similar texts to Arabic such as Persian and Urdu.

Keywords: Arabic text, Cryptography, Feature coding, Information security, Text steganography, Text watermarking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3460
735 Improved Zero Text Watermarking Algorithm against Meaning Preserving Attacks

Authors: Jalil Z., Farooq M., Zafar H., Sabir M., Ashraf E.

Abstract:

Internet is largely composed of textual contents and a huge volume of digital contents gets floated over the Internet daily. The ease of information sharing and re-production has made it difficult to preserve author-s copyright. Digital watermarking came up as a solution for copyright protection of plain text problem after 1993. In this paper, we propose a zero text watermarking algorithm based on occurrence frequency of non-vowel ASCII characters and words for copyright protection of plain text. The embedding algorithm makes use of frequency non-vowel ASCII characters and words to generate a specialized author key. The extraction algorithm uses this key to extract watermark, hence identify the original copyright owner. Experimental results illustrate the effectiveness of the proposed algorithm on text encountering meaning preserving attacks performed by five independent attackers.

Keywords: Copyright protection, Digital watermarking, Document authentication, Information security, Watermark.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2118