Search results for: unstructured text
595 Delaunay Triangulations Efficiency for Conduction-Convection Problems
Authors: Bashar Albaalbaki, Roger E. Khayat
Abstract:
This work is a comparative study on the effect of Delaunay triangulation algorithms on discretization error for conduction-convection conservation problems. A structured triangulation and many unstructured Delaunay triangulations using three popular algorithms for node placement strategies are used. The numerical method employed is the vertex-centered finite volume method. It is found that when the computational domain can be meshed using a structured triangulation, the discretization error is lower for structured triangulations compared to unstructured ones for only low Peclet number values, i.e. when conduction is dominant. However, as the Peclet number is increased and convection becomes more significant, the unstructured triangulations reduce the discretization error. Also, no statistical correlation between triangulation angle extremums and the discretization error is found using 200 samples of randomly generated Delaunay and non-Delaunay triangulations. Thus, the angle extremums cannot be an indicator of the discretization error on their own and need to be combined with other triangulation quality measures, which is the subject of further studies.
Keywords: Conduction-convection problems, Delaunay triangulation, discretization error, finite volume method.
Downloads 153
594 Journals Subheadlines Text Extraction Using Wavelet Thresholding and New Projection Profile
Authors: Davod Zaravi, Habib Rostami, Alireza Malahzaheh, S. S. Mortazavi
Abstract:
This paper proposes a new robust and efficient algorithm for automatic text extraction from colored book and journal cover sheets. First, we perform a wavelet transform. Next, we apply a dynamic threshold to the detail wavelet coefficients to detect edges. Effective edges are obtained by blurring the approximation coefficients with an alternative heuristic thresholding. Afterward, a binary image is obtained with a region of interest (ROI) technique. Finally, text boxes are extracted with a new projection profile.
Keywords: Text extraction, colored cover sheet, wavelet threshold, region of interest.
Downloads 1649
593 Text-Mining Approach for Evaluation of Affective Management Practices
Authors: Masaaki Saito, Qin Tang, Hiroyuki Umemuro
Abstract:
The purpose of this paper is to propose a text mining approach to evaluate companies' practices on affective management. Affective management argues that it is critical to take stakeholders' affects into consideration during the decision-making process, along with the traditional numerical and rational indices. CSR reports published by companies were collected as source information. Indices were proposed based on the frequency and collocation of words relevant to the affective management concept, using a text mining approach to analyze the text information of CSR reports. In addition, the relationships between the results obtained using the proposed indices and traditional indicators of business performance were investigated using correlation analysis. Those correlations were also compared between manufacturing and non-manufacturing companies. The results of this study revealed the possibility of evaluating affective management practices of companies based on publicly available text documents.
Keywords: Affective management, Affect, Stakeholder, Text mining.
Downloads 1844
592 Meta-Classification using SVM Classifiers for Text Documents
Authors: Daniel I. Morariu, Lucian N. Vintan, Volker Tresp
Abstract:
Text categorization is the problem of classifying text documents into a set of predefined classes. In this paper, we investigated three approaches to build a meta-classifier in order to increase the classification accuracy. The basic idea is to learn a meta-classifier to optimally select the best component classifier for each data point. The experimental results show that combining classifiers can significantly improve the accuracy of classification and that our meta-classification strategy gives better results than each individual classifier. For 7083 Reuters text documents we obtained classification accuracies of up to 92.04%.
Keywords: Meta-classification, Learning with Kernels, Support Vector Machine, and Performance Evaluation.
Downloads 1614
591 An Automatic Bayesian Classification System for File Format Selection
Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan
Abstract:
This paper presents an approach for the classification of an unstructured format description for identification of file formats. The main contribution of this work is the employment of data mining techniques to support file format selection with just the unstructured text description that comprises the most important format features for a particular organisation. Subsequently, the file format identification method employs a file format classifier and associated configurations to support digital preservation experts with an estimate of the required file format. Our goal is to make use of a format specification knowledge base aggregated from different Web sources in order to select a file format for a particular institution. Using the naive Bayes method, the decision support system recommends to an expert the file format for their institution. The proposed methods facilitate the selection of file format and the quality of a digital preservation process. The presented approach is meant to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and specifications of file formats. To facilitate decision-making, the aggregated information about the file formats is presented as a file format vocabulary that comprises the most common terms that are characteristic for all researched formats. The goal is to suggest a particular file format based on this vocabulary for analysis by an expert. The sample file format calculation and the calculation results including probabilities are presented in the evaluation section.
Keywords: Data mining, digital libraries, digital preservation, file format.
Downloads 1658
590 A Content Vector Model for Text Classification
Authors: Eric Jiang
Abstract:
As a popular rank-reduced vector space approach, Latent Semantic Indexing (LSI) has been used in information retrieval and other applications. In this paper, an LSI-based content vector model for text classification is presented, which constructs multiple augmented category LSI spaces and classifies text by their content. The model integrates the class discriminative information from the training data and is equipped with several pertinent feature selection and text classification algorithms. The proposed classifier has been applied to email classification and its experiments on a benchmark spam testing corpus (PU1) have shown that the approach represents a competitive alternative to other email classifiers based on the well-known SVM and naïve Bayes algorithms.
Keywords: Feature Selection, Latent Semantic Indexing, Text Classification, Vector Space Model.
Downloads 1884
589 Narrative and Expository Text Reading Comprehension by Fourth Grade Spanish-Speaking Children
Authors: Mariela V. De Mier, Veronica S. Sanchez Abchi, Ana M. Borzone
Abstract:
This work aims to explore the factors that influence the reading comprehension process across different types of text. In a recent study with 2nd, 3rd and 4th grade children, it was observed that reading comprehension of narrative texts was better than comprehension of expository texts. Nevertheless, it seems that not only the type of text but also other textual factors would account for comprehension, depending on the cognitive processing demands posed by the text. In order to explore this assumption, three narrative and three expository texts were elaborated with different degrees of complexity. A group of 40 fourth grade Spanish-speaking children took part in the study. Children were asked to read the texts and answer orally three literal and three inferential questions for each text. The quantitative and qualitative analysis of the children's responses showed that children had difficulties with both narrative and expository texts. The problem was to answer those questions that involved establishing complex relationships among information units that were present in the text or that should be activated from children’s previous knowledge to make an inference. Considering the data analysis, it could be concluded that there is some interaction between the type of text and the cognitive processing load of a specific text.
Keywords: comprehension, textual factors, type of text, processing demands.
Downloads 1407
588 Kinetic Studies on Microbial Production of Tannase Using Redgram Husk
Authors: S. K. Mohan, T. Viruthagiri, C. Arunkumar
Abstract:
Tannase (tannin acyl hydrolase, E.C.3.1.1.20) is an important hydrolysable enzyme with innumerable applications and industrial potential. In the present study, a kinetic model has been developed for the batch fermentation used for the production of tannase by A. flavus MTCC 3783. Maximum tannase activity of 143.30 U/ml was obtained at 96 hours under optimum operating conditions at 35 °C, an initial pH of 5.5 and with an inducer tannic acid concentration of 3% (w/v) for a fermentation period of 120 hours. The biomass concentration reached a maximum of 6.62 g/l at 96 hours, and there was no further increase in biomass concentration until the end of the fermentation. Various unstructured kinetic models were analyzed to simulate the experimental values of microbial growth, tannase activity and substrate concentration. The logistic model for microbial growth, the Luedeking-Piret model for production of tannase and the substrate utilization kinetic model for utilization of substrate were capable of predicting the fermentation profile with high coefficient of determination (R²) values of 0.980, 0.942 and 0.983 respectively. The results indicated that the unstructured models were able to describe the fermentation kinetics more effectively.
Keywords: Aspergillus flavus, Batch fermentation, Kinetic model, Tannase, Unstructured models.
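For orientation, the logistic and Luedeking-Piret models named in this abstract are commonly written in textbooks as dX/dt = mu * X * (1 - X / X_max) and dP/dt = alpha * dX/dt + beta * X. The sketch below evaluates those textbook forms only. The abstract does not give the fitted parameters, so x0, mu, alpha and beta are hypothetical placeholders; only the 6.62 g/l biomass maximum comes from the abstract.

```python
import math


def logistic_biomass(t, x0, x_max, mu):
    """Closed-form solution X(t) of dX/dt = mu * X * (1 - X / x_max), X(0) = x0."""
    growth = math.exp(mu * t)
    return x0 * x_max * growth / (x_max - x0 + x0 * growth)


def luedeking_piret_rate(dx_dt, x, alpha, beta):
    """Product formation rate dP/dt = alpha * dX/dt + beta * X."""
    return alpha * dx_dt + beta * x


# Hypothetical parameters for illustration; only x_max is taken from the abstract.
X_MAX = 6.62   # g/l, maximum biomass reported in the abstract
X0 = 0.5       # g/l, assumed inoculum concentration
MU = 0.2       # 1/h, assumed specific growth rate

for hour in (0, 24, 48, 96):
    print(hour, round(logistic_biomass(hour, X0, X_MAX, MU), 3))
```

With these placeholder values the curve starts at x0 and levels off near x_max, which is the plateau behaviour the abstract describes after 96 hours.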
Downloads 1563
587 A Talking Head System for Korean Text
Authors: Sang-Wan Kim, Hoon Lee, Kyung-Ho Choi, Soon-Young Park
Abstract:
A talking head system (THS) is presented to animate the face of a speaking 3D avatar in such a way that it realistically pronounces the given Korean text. The proposed system consists of a SAPI-compliant text-to-speech (TTS) engine and an MPEG-4-compliant face animation generator. The input to the THS is a Unicode text that is to be spoken with synchronized lip shape. The TTS engine generates a phoneme sequence with their duration and audio data. The TTS applies the coarticulation rules to the phoneme sequence and sends a mouth animation sequence to the face modeler. By using the face animation generator, the proposed THS can produce more natural lip sync and facial expression than systems using conventional visemes only. The experimental results show that our system has great potential for the implementation of a talking head for Korean text.
Keywords: Talking head, Lip sync, TTS, MPEG4.
Downloads 1490
586 The Morphology of Sri Lankan Text Messages
Authors: Chamindi Dilkushi Senaratne
Abstract:
Communicating via a text or an SMS (Short Message Service) has become an integral part of our daily lives. With the increase in the use of mobile phones, text messaging has become a genre by itself worth researching and studying. It is undoubtedly a major phenomenon revealing language change. This paper attempts to describe the morphological processes of text language of urban bilinguals in Sri Lanka. It will be a typological study based on 500 English text messages collected from urban bilinguals residing in Colombo. The messages are selected by categorizing the deviant forms of language use apparent in text messages. These stylistic deviations are a deliberate skilled performance by the users of the language possessing an in-depth knowledge of linguistic systems to create new words and thereby convey their linguistic identity and individual and group solidarity via the message. The findings of the study solidify arguments that the manipulation of language in text messages is both creative and appropriate. In addition, code mixing theories will be used to identify how existing morphological processes are adapted by bilingual users in Sri Lanka when texting. The study will reveal processes such as omission, initialism, insertion and alternation in addition to other identified linguistic features in text language. The corpus reveals the most common morphological processes used by Sri Lankan urban bilinguals when sending texts.
Keywords: Bilingual, deviations, morphology, texts.
Downloads 1976
585 Growing Self Organising Map Based Exploratory Analysis of Text Data
Authors: Sumith Matharage, Damminda Alahakoon
Abstract:
Textual data plays an important role in the modern world. The possibilities of applying data mining techniques to uncover hidden information present in large volumes of text collections are immense. The Growing Self Organizing Map (GSOM) is a highly successful member of the Self Organising Map family and has been used as a clustering and visualisation tool across a wide range of disciplines to discover hidden patterns present in the data. A comprehensive analysis of the GSOM’s capabilities as a text clustering and visualisation tool has so far not been published. These functionalities, namely map visualisation capabilities, automatic cluster identification and hierarchical clustering capabilities, are presented in this paper and are further demonstrated with experiments on a benchmark text corpus.
Keywords: Text Clustering, Growing Self Organizing Map, Automatic Cluster Identification, Hierarchical Clustering.
Downloads 1995
584 A Methodology for Automatic Diversification of Document Categories
Authors: Dasom Kim, Chen Liu, Myungsu Lim, Soo-Hyeon Jeon, Byeoung Kug Jeon, Kee-Young Kwahk, Namgyu Kim
Abstract:
Recently, numerous documents including large volumes of unstructured data and text have been created because of the rapid increase in the use of social media and the Internet. Usually, these documents are categorized for the convenience of users. Because the accuracy of manual categorization is not guaranteed, and such categorization requires a large amount of time and incurs huge costs, many studies on automatic categorization have been conducted to help mitigate the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorize complex documents with multiple topics because they work on the assumption that individual documents can be categorized into single categories only. Therefore, to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, the learning process employed in these studies involves training using a multi-categorized document set. These methods therefore cannot be applied to the multi-categorization of most documents unless multi-categorized training sets using traditional multi-categorization algorithms are provided. To overcome this limitation, in this study, we review our novel methodology for extending the category of a single-categorized document to multiple categories, and then introduce a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.
Keywords: Big Data Analysis, Document Classification, Text Mining, Topic Analysis.
Downloads 1744
583 Parallel Text Processing: Alignment of Indonesian to Javanese Language
Authors: Aji P. Wibawa, Andrew Nafalski, Neil Murray, Wayan F. Mahmudy
Abstract:
Parallel text alignment is proposed as a way of aligning bahasa Indonesia to words in Javanese. Since the one-to-one word translator does not have the facility to translate pragmatic aspects of Javanese, the parallel text alignment model described uses a phrase pair combination. The algorithm aligns the parallel text automatically from the beginning to the end of each sentence. Even though the results of the phrase pair combination outperform the previous algorithm, it is still inefficient: recording all possible combinations consumes more space in the database and is time consuming. The original algorithm is modified by applying the edit distance coefficient to improve the data-storage efficiency. As a result, data-storage consumption is reduced by 90%, and the learning period is shortened as well (42 s).
Keywords: Parallel text alignment, phrase pair combination, edit distance coefficient, Javanese-Indonesian language.
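The abstract above does not define its "edit distance coefficient". As a minimal sketch, assuming it is derived from the standard Levenshtein distance, the code below computes that distance and a length-normalised similarity. The `similarity` normalisation is an illustrative assumption, not the authors' formula.

```python
def levenshtein(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalisation: 1.0 for identical sequences, 0.0 for maximally different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))   # 3
print(similarity(["w1", "w2", "w3"], ["w1", "x", "w3"]))
```

Because the function works on any sequence, it can compare phrases token by token as well as words character by character. Near-duplicate phrase pairs could then be merged rather than stored separately, which is one plausible way such a coefficient saves storage.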
Downloads 2481
582 Performance Evaluation of an Online Text-Based Strategy Game
Authors: Nazleeni S. Haron, Mohd K. Zaime, Izzatdin A. Aziz, Mohd H. Hasan
Abstract:
A text-based game is supposed to be a low-resource application that delivers good performance compared to graphics-intensive games. Nowadays, however, some online text-based games do not offer performance that is acceptable to users. Therefore, an online text-based game called Star_Quest has been developed in order to analyze its behavior under different performance measurements. Performance metrics such as throughput, scalability, response time and page loading time are captured to characterize the performance of the game. The load-testing techniques are also disclosed to demonstrate the viability of our work. The results obtained are compared with the accepted performance levels to determine the performance level of the game. The study reveals that the developed game met all the performance objectives set forth.
Keywords: Online text-based games, performance evaluation
Downloads 1608
581 Web Application to Profiling Scientific Institutions through Citation Mining
Authors: Hector D. Cortes, Jesus A. del Rio, Esther O. Garcia, Miguel Robles
Abstract:
Recently, data mining has been applied to scientific bibliographic databases to analyze the pathways of knowledge or the core scientific relevances of a laureated novel or a country. This specific case of data mining has been named citation mining, and it is the integration of citation bibliometrics and text mining. In this paper we present an improved Web implementation of statistical physics algorithms to perform the text mining component of citation mining. In particular, we use an entropic-like distance between the compressions of texts as an indicator of the similarity between them. Finally, we have included the recently proposed h index to characterize scientific production. We have used this Web implementation to identify users, applications and impact of the Mexican scientific institutions located in the State of Morelos.
Keywords: Citation Mining, Text Mining, Science Impact
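The h index mentioned in this abstract has a standard definition: the largest h such that h publications have at least h citations each. A minimal sketch of that definition follows. The citation counts in the usage line are made-up illustration data, not results from the paper.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, 1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


print(h_index([3, 0, 6, 1, 5]))   # 3
```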
Downloads 1754
580 Chinese Event Detection Technique Based on Dependency Parsing and Rule Matching
Authors: Weitao Lin
Abstract:
To quickly extract adequate information from large-scale unstructured text data, this paper studies the representation of events in Chinese scenarios and performs the regularized abstraction. It proposes a Chinese event detection technique based on dependency parsing and rule matching. The method first performs dependency parsing on the original utterance, then performs pattern matching at the word or phrase granularity based on the results of dependent syntactic analysis, filters out the utterances with prominent non-event characteristics, and obtains the final results. The experimental results show the effectiveness of the method.
Keywords: Natural Language Processing, Chinese event detection, rules matching, dependency parsing.
Downloads 172
579 Interactive, Topic-Oriented Search Support by a Centroid-Based Text Categorisation
Authors: Mario Kubek, Herwig Unger
Abstract:
Centroid terms are single words that semantically and topically characterise text documents and so may serve as their very compact representation in automatic text processing. In the present paper, centroids are used to measure the relevance of text documents with respect to a given search query. Thus, a new graph-based paradigm for searching texts in large corpora is proposed and evaluated against keyword-based methods. The first, promising experimental results demonstrate the usefulness of the centroid-based search procedure. It is shown that especially the routing of search queries in interactive and decentralised search systems can be greatly improved by applying this approach. A detailed discussion on further fields of its application completes this contribution.
Keywords: Search algorithm, centroid, query, keyword, co-occurrence, categorisation.
Downloads 622
578 Association Rules Mining and NOSQL Oriented Document in Big Data
Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub
Abstract:
Big Data represents the recent technology of manipulating voluminous and unstructured data sets over multiple sources. NOSQL has therefore emerged to handle the problem of unstructured data. Association rules mining is one of the popular data mining techniques for extracting hidden relationships from transactional databases. The algorithm for finding association dependencies is well solved with Map Reduce. The goal of our work is to reduce the time needed to generate frequent itemsets by using Map Reduce and a document-oriented NOSQL database. A comparative study is given to evaluate the performance of our algorithm against the classical Apriori algorithm.
Keywords: Apriori, Association rules mining, Big Data, data mining, Hadoop, Map Reduce, MongoDB, NoSQL.
Downloads 693
577 Dynamic Variational Multiscale LES of Bluff Body Flows on Unstructured Grids
Authors: Carine Moussaed, Stephen Wornom, Bruno Koobus, Maria Vittoria Salvetti, Alain Dervieux
Abstract:
The effects of dynamic subgrid scale (SGS) models are investigated in variational multiscale (VMS) LES simulations of bluff body flows. The spatial discretization is based on a mixed finite element/finite volume formulation on unstructured grids. In the VMS approach used in this work, the separation between the largest and the smallest resolved scales is obtained through a variational projection operator and a finite volume cell agglomeration. The dynamic versions of the Smagorinsky and WALE SGS models are used to account for the effects of the unresolved scales. In the VMS approach, these effects are only modeled in the smallest resolved scales. The dynamic VMS-LES approach is applied to the simulation of the flow around a circular cylinder at Reynolds numbers 3900 and 20000 and to the flow around a square cylinder at Reynolds numbers 22000 and 175000. As in previous studies, it is observed that the dynamic SGS procedure has a smaller impact on the results within the VMS approach than in LES. However, improvements are demonstrated for important features such as the recirculating part of the flow. The global prediction is improved at a small extra computational cost.
Keywords: variational multiscale LES, dynamic SGS model, unstructured grids, circular cylinder, square cylinder.
Downloads 1823
576 Connectionist Approach to Generic Text Summarization
Authors: Rajesh S. Prasad, U. V. Kulkarni, Jayashree R. Prasad
Abstract:
As the enormous amount of on-line text grows on the World-Wide Web, the development of methods for automatically summarizing this text becomes more important. The primary goal of this research is to create an efficient tool that is able to summarize large documents automatically. We propose an evolving connectionist system: an adaptive, incremental learning and knowledge representation system that evolves its structure and functionality. In this paper, we propose a novel approach for part-of-speech disambiguation using a recurrent neural network, a paradigm capable of dealing with sequential data. We observed that the connectionist approach to text summarization has a natural way of learning grammatical structures through experience. Experimental results show that our approach achieves acceptable performance.
Downloads 1589
575 A Finite Volume Procedure on Unstructured Meshes for Fluid-Structure Interaction Problems
Authors: P I Jagad, B P Puranik, A W Date
Abstract:
Flow through micro and mini channels requires relatively high driving pressure due to the large fluid pressure drop through these channels. Consequently, the forces acting on the walls of the channel due to the fluid pressure are also large. These forces set up displacement fields in the solid substrate containing the channels. If the movement of the substrate is constrained at some points, then stress fields are established in the substrate. On the other hand, if the deformation of the channel shape is sufficiently large, then its effect on the fluid flow must be calculated. Such coupled fluid-solid systems form a class of problems known as fluid-structure interactions. In the present work, a co-located finite volume discretization procedure on unstructured meshes is described for solving fluid-structure interaction problems. A linear elastic solid is assumed, for which the effect of the channel deformation on the flow is neglected. Thus the governing equations for the fluid and the solid are decoupled and are solved separately. The procedure is validated by solving two benchmark problems, one from fluid mechanics and another from solid mechanics. A fluid-structure interaction problem of flow through a U-shaped channel embedded in a plate is solved.
Keywords: Finite volume method, flow induced stresses, fluid-structure interaction, unstructured meshes.
Downloads 1888
574 Hybrid Machine Learning Approach for Text Categorization
Authors: Nerijus Remeikis, Ignas Skucas, Vida Melninkaite
Abstract:
Text categorization - the assignment of natural language documents to one or more predefined categories based on their semantic content - is an important component in many information organization and management tasks. The performance of neural network learning is known to be sensitive to the initial weights and architecture. This paper discusses the use of multilayer neural network initialization with a decision tree classifier for improving text categorization accuracy. An adaptation of the algorithm is proposed in which a decision tree, from the root node to a final leaf, is used for initialization of the multilayer neural network. The experimental evaluation demonstrates that this approach provides better classification accuracy with the Reuters-21578 corpus, one of the standard benchmarks for text categorization tasks. We present results comparing the accuracy of this approach with a multilayer neural network initialized with the traditional random method and with decision tree classifiers.
Keywords: Text categorization, decision trees, neural networks, machine learning.
Downloads 1805
573 Speech Encryption and Decryption Using Linear Feedback Shift Register (LFSR)
Authors: Tin Lai Win, Nant Christina Kyaw
Abstract:
This paper considers the problem of cryptanalysis of stream ciphers. There is a need to improve the existing attacks on stream ciphers and to distinguish the portions of cipher text obtained by the encryption of plain text in which some parts of the text are random and the rest are non-random. This paper presents a tutorial introduction to symmetric cryptography. The basic information theoretic and computational properties of classic and modern cryptographic systems are presented, followed by an examination of the application of cryptography to the security of a VoIP system in computer networks using the LFSR algorithm. The implementation program will be developed in Java 2. The LFSR algorithm is appropriate for the encryption and decryption of online streaming data, e.g. VoIP (voice chatting over IP). This paper implements the encryption module of speech signals to cipher text and the decryption module of cipher text to speech signals.
Keywords: Linear Feedback Shift Register.
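To illustrate the general idea of LFSR-based stream encryption described in this abstract, here is a minimal sketch in Python (the paper's own implementation is stated to be in Java 2 and is not reproduced here). The 16-bit register width, the tap positions (16, 14, 13, 11) and the seed 0xACE1 are assumptions chosen for illustration. A single LFSR is linear and is not secure on its own, so this only demonstrates the mechanism.

```python
def lfsr_keystream(seed, taps=(16, 14, 13, 11), nbits=16):
    """Yield keystream bytes from a Fibonacci LFSR (assumed width and taps)."""
    if not 0 < seed < (1 << nbits):
        raise ValueError("seed must be non-zero and fit in nbits")
    state = seed
    while True:
        byte = 0
        for _ in range(8):
            byte = (byte << 1) | (state & 1)        # output bit is the LSB
            feedback = 0
            for tap in taps:                        # XOR of the tapped bits
                feedback ^= (state >> (nbits - tap)) & 1
            state = (state >> 1) | (feedback << (nbits - 1))
        yield byte


def xor_stream(data, seed):
    """Encrypt or decrypt bytes by XOR with the LFSR keystream."""
    return bytes(b ^ k for b, k in zip(data, lfsr_keystream(seed)))


samples = bytes([12, 200, 33, 97, 5])       # stand-in for speech sample bytes
cipher = xor_stream(samples, seed=0xACE1)
assert xor_stream(cipher, seed=0xACE1) == samples
```

Because XOR is its own inverse, the same function serves as both the encryption and the decryption module as long as both sides share the seed.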
Downloads 3110
572 Improved Dynamic Bayesian Networks Applied to Arabic on Line Characters Recognition
Authors: Redouane Tlemsani, Abdelkader Benyettou
Abstract:
This work concerns on-line Arabic character recognition; the principal motivation is to study Arabic manuscripts with on-line technology.
The system is a Markovian system, which can be viewed as a Dynamic Bayesian Network (DBN). One of the major interests of these systems lies in training the complete model (topology and parameters) from training data.
Our approach is based on the dynamic Bayesian network formalism. DBN theory is a generalization of Bayesian networks to dynamic processes. One of our objectives is to find better parameters, which represent the links (dependences) between dynamic network variables.
In pattern recognition applications, the structure is fixed in advance, which obliges us to accept some strong assumptions (for example, independence between some variables). Our application relates to the on-line recognition of isolated Arabic characters using our laboratory database, NOUN. A neural tester is proposed for DBN external optimization.
The DBN scores and DBN mixed results are 70.24% and 62.50% respectively, which suggests room for further development; other approaches that take time into account were considered and implemented, eventually reaching a significant recognition rate of 94.79%.
Keywords: Arabic on line character recognition, dynamic Bayesian network, pattern recognition.
Downloads 1780
571 A File Splitting Technique for Reducing the Entropy of Text Files
Authors: Abdel-Rahman M. Jaradat, Mansour I. Irshid, Talha T. Nassar
Abstract:
A novel file splitting technique for the reduction of the nth-order entropy of text files is proposed. The technique is based on mapping the original text file into a non-ASCII binary file using a new codeword assignment method; the resulting binary file is then split into several subfiles, each containing one or more bits from each codeword of the mapped binary file. The statistical properties of the subfiles are studied and it is found that they reflect the statistical properties of the original text file, which is not the case when the ASCII code is used as a mapper. The nth-order entropies of these subfiles are determined and it is found that the sum of their entropies is less than that of the original text file for the same values of extensions. These interesting statistical properties of the resulting subfiles can be used to achieve better compression ratios when conventional compression techniques are applied to these subfiles individually and on a bit-wise basis rather than on a character-wise basis.
Keywords: Bit-wise compression, entropy, file splitting, source mapping.
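The splitting step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify the codeword assignment, so the frequency-ranked, fixed-length codebook in `build_codebook` is hypothetical, as are all function names. The sketch shows the mechanics of splitting and lossless merging plus a first-order entropy helper; it does not attempt to reproduce the paper's nth-order entropy comparison.

```python
import math
from collections import Counter


def entropy(symbols):
    """First-order entropy, in bits per symbol, of a sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def build_codebook(text):
    """Hypothetical assignment: rank characters by frequency and give
    each rank a fixed-length binary codeword (not the paper's method)."""
    ranked = [ch for ch, _ in Counter(text).most_common()]
    width = max(1, math.ceil(math.log2(len(ranked))))
    codebook = {ch: format(i, f"0{width}b") for i, ch in enumerate(ranked)}
    return codebook, width


def split(text, codebook, width):
    """Build one subfile per bit position of the codewords."""
    codewords = [codebook[ch] for ch in text]
    return ["".join(cw[k] for cw in codewords) for k in range(width)]


def merge(subfiles, codebook):
    """Reassemble the original text from the bit-position subfiles."""
    inverse = {cw: ch for ch, cw in codebook.items()}
    return "".join(inverse["".join(bits)] for bits in zip(*subfiles))


text = "abracadabra"
codebook, width = build_codebook(text)
subfiles = split(text, codebook, width)
restored = merge(subfiles, codebook)
```

Each subfile here holds exactly one bit per codeword; the abstract also allows subfiles that hold several bits per codeword, which would amount to grouping adjacent bit positions.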
Downloads 1442
570 A Novel Arabic Text Steganography Method Using Letter Points and Extensions
Authors: Adnan Abdul-Aziz Gutub, Manal Mohammad Fattani
Abstract:
This paper presents a new steganography approach suitable for Arabic texts. It can be classified as a feature coding steganography method. The approach hides secret information bits within the letters, taking advantage of their inherent points. To mark the specific letters holding secret bits, the scheme considers two features: the existence of points in the letters and the redundant Arabic extension character. We use pointed letters with an extension to hold the secret bit 'one' and un-pointed letters with an extension to hold 'zero'. This steganography technique is also attractive for other languages with scripts similar to Arabic, such as Persian and Urdu.
Keywords: Arabic text, Cryptography, Feature coding, Information security, Text steganography, Text watermarking.
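The encoding rule in this abstract (a pointed letter followed by the extension character carries 'one', an un-pointed letter followed by it carries 'zero') can be illustrated with a minimal sketch. The letter sets, the function names, and the rule of inserting an extension only before another Arabic letter are simplifying assumptions made for this example, not details taken from the paper; the cover text is assumed to contain no extension characters of its own.

```python
KASHIDA = "\u0640"  # Arabic extension character (tatweel)

# Assumed letter sets: only letters that join to a following letter,
# grouped by whether they carry points.
POINTED = set("بتثجخشضظغفقني")
UNPOINTED = set("حسصطعكلمه")
ARABIC_LETTERS = {chr(c) for c in range(0x0621, 0x064B)} - {KASHIDA}


def embed(cover, bits):
    """Insert an extension after letters whose pointedness matches the
    next secret bit; letters that do not match are left untouched."""
    out, k = [], 0
    for i, ch in enumerate(cover):
        out.append(ch)
        nxt = cover[i + 1] if i + 1 < len(cover) else ""
        if k < len(bits) and nxt in ARABIC_LETTERS:
            carries_one = bits[k] == 1 and ch in POINTED
            carries_zero = bits[k] == 0 and ch in UNPOINTED
            if carries_one or carries_zero:
                out.append(KASHIDA)
                k += 1
    if k < len(bits):
        raise ValueError("cover text too short for the secret bits")
    return "".join(out)


def extract(stego):
    """Read one bit from every letter that is followed by an extension."""
    bits = []
    for ch, nxt in zip(stego, stego[1:]):
        if nxt == KASHIDA:
            if ch in POINTED:
                bits.append(1)
            elif ch in UNPOINTED:
                bits.append(0)
    return bits


cover = "بسم الله الرحمن الرحيم"
secret = [1, 0, 1]
stego = embed(cover, secret)
recovered = extract(stego)
```

Removing every extension character from `stego` gives back the cover text unchanged, which is why the extension is described as redundant.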
Downloads 3503
569 Improved Zero Text Watermarking Algorithm against Meaning Preserving Attacks
Authors: Jalil Z., Farooq M., Zafar H., Sabir M., Ashraf E.
Abstract:
The Internet is largely composed of textual content, and a huge volume of digital content is circulated over it daily. The ease of information sharing and reproduction has made it difficult to preserve the author's copyright. Digital watermarking emerged after 1993 as a solution to the problem of copyright protection of plain text. In this paper, we propose a zero text watermarking algorithm based on the occurrence frequency of non-vowel ASCII characters and words for copyright protection of plain text. The embedding algorithm uses the frequency of non-vowel ASCII characters and words to generate a specialized author key. The extraction algorithm uses this key to extract the watermark and hence identify the original copyright owner. Experimental results illustrate the effectiveness of the proposed algorithm on text subjected to meaning-preserving attacks performed by five independent attackers.
Keywords: Copyright protection, Digital watermarking, Document authentication, Information security, Watermark.
Downloads 2158
568 Experimental Study of Hyperparameter Tuning a Deep Learning Convolutional Recurrent Network for Text Classification
Authors: Bharatendra Rai
Abstract:
Sequences of words in text data have long-term dependencies and are known to suffer from the vanishing gradient problem when deep learning models are developed. Although recurrent networks such as long short-term memory networks help overcome this problem, achieving high text classification performance remains challenging. Convolutional recurrent networks, which combine the advantages of long short-term memory networks and convolutional neural networks, can be useful for improving text classification performance. However, arriving at suitable hyperparameter values for convolutional recurrent networks is still a challenging task, because fitting a model requires significant computing resources. This paper illustrates the advantages of using convolutional recurrent networks for text classification with the help of statistically planned computer experiments for hyperparameter tuning.
Keywords: Convolutional recurrent networks, hyperparameter tuning, long short-term memory networks, Tukey honest significant differences
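As a rough illustration of what a statistically planned experiment for hyperparameter tuning might look like, the sketch below builds a replicated, randomized full factorial design. The abstract does not state which design was used, so the full factorial layout, the factor names, and their levels are all assumptions made for this example; training and scoring each run are left out.

```python
import itertools
import random

# Hypothetical factors and levels for a convolutional recurrent network.
FACTORS = {
    "filters": [32, 64],
    "kernel_size": [3, 5],
    "lstm_units": [50, 100],
}


def full_factorial(factors, replicates=2, seed=0):
    """Return every combination of factor levels, repeated `replicates`
    times, in a reproducible random run order."""
    names = list(factors)
    runs = [
        dict(zip(names, combo), replicate=r)
        for combo in itertools.product(*(factors[n] for n in names))
        for r in range(replicates)
    ]
    random.Random(seed).shuffle(runs)
    return runs


design = full_factorial(FACTORS)
```

Replicating each combination gives an estimate of run-to-run variation, which is what a follow-up comparison such as the Tukey honest significant differences test named in the keywords would rely on.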
Downloads 113
567 Finite Volume Method for Flow Prediction Using Unstructured Meshes
Authors: Juhee Lee, Yongjun Lee
Abstract:
In designing low-energy buildings, heat transfer through large glass panes or walls becomes critical. Multiple layers of window glass and wall are employed for high insulation. The gravity-driven air flow between window panes or wall layers is a natural convection phenomenon that is key to the heat transfer. As a first step toward natural convection heat transfer analysis, this study presents the development and application of a finite volume method for the numerical computation of viscous incompressible flows. In the future it will become part of a natural convection analysis with a high-order scheme, a multigrid method, and dual time stepping. A fully implicit, second-order finite volume method is used to discretize and solve the fluid flow on unstructured grids composed of arbitrarily shaped cells. The integrals of the governing equations are discretized in the finite volume manner using a collocated arrangement of variables. The convergence of the SIMPLE segregated algorithm for the solution of the coupled nonlinear algebraic equations is accelerated by using a sparse matrix solver such as BiCGSTAB. The method is verified by applying it to flows for which either the numerical solution is known or the solution can be obtained using another numerical technique from other studies. The accuracy of the method is assessed through grid refinement.
Keywords: Finite volume method, fluid flow, laminar flow, unstructured grid.
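Assessing accuracy through grid refinement can be illustrated on a much simpler problem than the one in this abstract. The sketch below is a toy, not the authors' solver: it applies a cell-centered finite volume discretization to the 1D diffusion problem -u'' = π² sin(πx) on [0, 1] with u(0) = u(1) = 0, and it solves the linear system with a direct tridiagonal (Thomas) solver rather than SIMPLE with BiCGSTAB. The test problem and all function names are assumptions chosen for illustration.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal linear system."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def max_error(n):
    """Solve on n equal cells and return the largest cell-center error
    against the exact solution u(x) = sin(pi x)."""
    h = 1.0 / n
    centers = [(i + 0.5) * h for i in range(n)]
    # Flux balance per cell: interior faces use (u_right - u_left) / h;
    # wall faces use the half-cell distance h / 2 to the boundary value 0.
    lower, upper, diag = [-1.0 / h] * n, [-1.0 / h] * n, [2.0 / h] * n
    diag[0] = diag[-1] = 3.0 / h
    lower[0] = upper[-1] = 0.0
    # Source term integrated over each cell with the midpoint rule.
    rhs = [h * math.pi**2 * math.sin(math.pi * x) for x in centers]
    u = solve_tridiagonal(lower, diag, upper, rhs)
    return max(abs(ui - math.sin(math.pi * x)) for ui, x in zip(u, centers))


def observed_order(n):
    """Estimate the convergence order from grids with n and 2n cells."""
    return math.log2(max_error(n) / max_error(2 * n))


order = observed_order(20)
```

For a second-order scheme, halving the cell size should cut the error by roughly a factor of four, so `order` is expected to come out close to 2.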
Downloads 1843
566 Development of Multimodal e-Slide Presentation to Support Self-Learning for the Visually Impaired
Authors: Rustam Asnawi, Wan Fatimah Wan Ahmad
Abstract:
Electronic slides (e-slides) are currently one of the most common formats for educational presentations. Unfortunately, e-slides are rarely used by the visually impaired, since they are unable to see the content of such slides, which are usually composed of text, images, and animation. This paper proposes a model for presenting e-slides in a multimodal way, i.e. using conventional slides concurrently with voicing, in both Malay and English. At the design level, the live multimedia presentation concept is used, while at the implementation level several components are used. The text content of each slide is extracted using a COM component; English text is voiced using the Microsoft Speech API, and Malay text is voiced using a dictionary approach. To support accessibility, an auditory user interface is provided as an additional feature. A prototype of the model, named VSlide, has been developed and introduced.
Keywords: presentation, self-learning, slide, visually impaired
Downloads 1568