Search results for: Handwritten document verification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 556

526 Combining Color and Layout Features for the Identification of Low-resolution Documents

Authors: Ardhendu Behera, Denis Lalanne, Rolf Ingold

Abstract:

This paper proposes a method, combining color and layout features, for identifying documents captured from low-resolution handheld devices. On one hand, the document image color density surface is estimated and represented with an equivalent ellipse; on the other hand, the document's shallow layout structure is computed and hierarchically represented. The combined color and layout features are arranged in a symbolic file, which is unique for each document and is called the document's visual signature. Our identification method first uses the color information in the signatures to focus the search space on documents having a similar color distribution, and then selects the document having the most similar layout structure in the remaining search space. Finally, our experiment considers slide documents, which are often captured using handheld devices.

Keywords: Document color modeling, document visual signature, kernel density estimation, document identification.

Downloads: 1340
525 Color and Layout-based Identification of Documents Captured from Handheld Devices

Authors: Ardhendu Behera, Denis Lalanne, Rolf Ingold

Abstract:

This paper proposes a method, combining color and layout features, for identifying documents captured from low-resolution handheld devices. On one hand, the document image color density surface is estimated and represented with an equivalent ellipse; on the other hand, the document's shallow layout structure is computed and hierarchically represented. Our identification method first uses the color information in the documents to focus the search space on documents having a similar color distribution, and then selects the document having the most similar layout structure in the remainder of the search space.

Keywords: Document color modeling, document visual signature, kernel density estimation, document identification.

Downloads: 1529
524 Human Verification in a Video Surveillance System Using Statistical Features

Authors: Sanpachai Huvanandana

Abstract:

A human verification system is presented in this paper. The system consists of several steps: background subtraction, thresholding, line connection, region growing, morphology, star skeletonization, feature extraction, feature matching, and decision making. The proposed system combines the advantages of star skeletonization and simple statistical features. Correlation matching and probability voting are used for verification, followed by a logical operation in the decision-making stage. The proposed system uses a small number of features and the system reliability is convincing.

Keywords: Human verification, object recognition, video understanding, segmentation.

Downloads: 1471
523 Highlighting Document's Structure

Authors: Sylvie Ratté, Wilfried Njomgue, Pierre-André Ménard

Abstract:

In this paper, we present symbolic recognition models to extract knowledge characterized by document structures. Focussing on the extraction and the meticulous exploitation of the semantic structure of documents, we obtain a meaningful contextual tagging corresponding to different unit types (title, chapter, section, enumeration, etc.).

Keywords: Information retrieval, document structures, symbolic grammars.

Downloads: 1200
522 Localizing and Recognizing Integral Pitches of Cheque Document Images

Authors: Bremananth R., Veerabadran C. S., Andy W. H. Khong

Abstract:

Automatic reading of handwritten cheques is a computationally complex process, and it plays an important role in financial risk management. Machine vision and learning provide a viable solution to this problem. Research effort has mostly been focused on recognizing diverse pitches of cheques and demand drafts with an identical outline. However, most of these methods employ template matching to localize the pitches, and such schemes could potentially fail when applied to the different types of outline maintained by the bank. In this paper, the so-called outline problem is resolved by a cheque information tree (CIT), which generalizes the localizing method to extract active-region-of-entities. In addition, the weight based density plot (WBDP) is performed to isolate text entities and read complete pitches. Recognition is based on texture features using neural classifiers. The legal amount is subsequently recognized by both texture and perceptual features. A post-processing phase is invoked to detect incorrect readings by Type-2 grammar using the Turing machine. The performance of the proposed system was evaluated using cheques and demand drafts of 22 different banks. The test data consists of a collection of 1540 leaves obtained from 10 different account holders from each bank. Results show that this approach can easily be deployed without significant design amendments.

Keywords: Cheque reading, Connectivity checking, Text localization, Texture analysis, Turing machine, Signature verification.

Downloads: 1620
521 Collaborative Document Evaluation: An Alternative Approach to Classic Peer Review

Authors: J. Beel, B. Gipp

Abstract:

Research papers are usually evaluated via peer review. However, peer review has limitations in evaluating research papers. In this paper, Scienstein and the new idea of 'collaborative document evaluation' are presented. Scienstein is a project to evaluate scientific papers collaboratively based on ratings, links, annotations and classifications by the scientific community using the internet. In this paper, critical success factors of collaborative document evaluation are analyzed. These are the scientists' motivation to participate as reviewers, the reviewers' competence and the reviewers' trustworthiness. It is shown that if these factors are ensured, collaborative document evaluation may prove to be a more objective, faster and less resource-intensive approach to scientific document evaluation in comparison to the classical peer review process. It is shown that additional advantages exist, as collaborative document evaluation supports interdisciplinary work, allows continuous post-publishing quality assessments and enables the implementation of academic recommendation engines. In the long term, it seems possible that collaborative document evaluation will successively substitute peer review and decrease the need for journals.

Keywords: Peer Review, Alternative, Collaboration, Document Evaluation, Rating, Annotations.

Downloads: 1450
520 Data Extraction of XML Files using Searching and Indexing Techniques

Authors: Sushma Satpute, Vaishali Katkar, Nilesh Sahare

Abstract:

XML files contain data in a well-formatted manner. Studying the format or semantics of the grammar helps in fast retrieval of the data. Many algorithms describe how to search for data in XML files. A number of approaches use the data structure or are related to the contents of the document. In these cases the user must know the structure of the document, and information retrieval techniques using NLP are related to the content of the document. Hence the results may be irrelevant or not very successful, and the search may take more time. This paper presents fast XML retrieval techniques using a new indexing technique and the concept of RXML. When indexing an XML document, the system takes into account both the document content and the document structure and assigns a value to each tag in the file. To query the system, the user is not constrained to a fixed query format.

Keywords: XML Retrieval, Indexed Search, Information Retrieval.

Downloads: 1751
519 Skew Detection Technique for Binary Document Images based on Hough Transform

Authors: Manjunath Aradhya V N, Hemantha Kumar G, Shivakumara P

Abstract:

Document image processing has become an increasingly important technology in the automation of office documentation tasks. During document scanning, skew is inevitably introduced into the incoming document image. Since the algorithms for layout analysis and character recognition are generally very sensitive to page skew, skew detection and correction in document images are critical steps before layout analysis. In this paper, a novel skew detection method is presented for binary document images. The method considers some selected characters of the text, which may be subjected to thinning and the Hough transform to estimate the skew angle accurately. Several experiments have been conducted on various types of documents, such as English documents, journals, textbooks, documents in different languages, documents with different fonts, and documents with different resolutions, to reveal the robustness of the proposed method. The experimental results revealed that the proposed method is accurate compared to the results of well-known existing methods.
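
As a rough illustration of the Hough-voting idea named in this abstract, the Python sketch below estimates a skew angle from a set of foreground pixel coordinates. It is not the authors' method: thinning and character selection are omitted, and the function name `estimate_skew`, the candidate angle grid, the bin width `rho_step` and the scoring rule (sum of squared votes) are all assumptions made here for illustration.

```python
import math
from collections import Counter


def estimate_skew(points, angles_deg, rho_step=1.0):
    """Return the candidate angle (in degrees) whose Hough votes are most concentrated.

    points: iterable of (x, y) foreground pixel coordinates (hypothetical input format).
    angles_deg: candidate skew angles to test.
    """
    points = list(points)
    best_angle, best_score = None, -1
    for angle in angles_deg:
        t = math.radians(angle)
        c, s = math.cos(t), math.sin(t)
        # Offset of each point from a line through the origin tilted by `angle`.
        # Points on one text line tilted by `angle` share the same offset bin.
        votes = Counter(round((y * c - x * s) / rho_step) for x, y in points)
        # Assumed sharpness measure: a few tall bins score higher than many flat ones.
        score = sum(v * v for v in votes.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Synthetic "text lines": three parallel rows of points tilted by 5 degrees.
slope = math.tan(math.radians(5))
sample_points = [(x, y0 + x * slope) for y0 in (20, 60, 100) for x in range(200)]
estimated = estimate_skew(sample_points, range(-10, 11))
```

The sign convention for the angle depends on whether the y axis points up or down in the image, so a real pipeline would need to fix that convention before rotating the page.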

Keywords: Optical Character Recognition, Skew angle, Thinning, Hough transform, Document processing

Downloads: 2068
518 Ottoman Script Recognition Using Hidden Markov Model

Authors: Ayşe Onat, Ferruh Yildiz, Mesut Gündüz

Abstract:

In this study, an OCR system for segmentation, feature extraction and recognition of Ottoman scripts has been developed using handwritten characters. Detection of handwritten characters is a difficult process. The segmentation and feature extraction stages are based on geometrical feature analysis, followed by the chain code transformation of the main strokes of each character. The output of segmentation is well-defined segments that can be fed into any classification approach. The classes of main strokes are identified through a left-right Hidden Markov Model (HMM).
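
The chain code transformation mentioned in this abstract can be sketched with the standard 8-direction Freeman chain code. The direction numbering below (0 = +x, counted counter-clockwise with y pointing up) and the helper name `chain_code` are assumptions for illustration; the abstract does not say which convention the authors use.

```python
# Assumed Freeman numbering: 0 = east, increasing counter-clockwise, y axis pointing up.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(path):
    """Encode a stroke, given as consecutive 8-connected (x, y) points, as direction codes."""
    return [
        DIRECTIONS[(x1 - x0, y1 - y0)]
        for (x0, y0), (x1, y1) in zip(path, path[1:])
    ]


stroke = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 2)]
codes = chain_code(stroke)
```

A sequence of such codes is a natural observation sequence for a left-right HMM, since the codes follow the order in which the stroke is traced.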

Keywords: Chain Code, HMM, Ottoman Script Recognition, OCR

Downloads: 2263
517 Physical Verification Flow on Multiple Foundries

Authors: R. Abdul Wahab, R. Mohd Fuad Tengku Aziz, N. Othman, S. Saleh, N. Razali, M. Al Baqir Zinal Abidin, M. Hanif Md Nasir

Abstract:

This paper discusses how we optimize our physical verification flow in our IC Design Department, which has various rule decks from multiple foundries. Our ultimate goal is to achieve faster time to tape-out and avoid schedule delay. Currently, physical verification runtimes and memory usage have drastically increased with the increasing number of design rules, design complexity, and the size of the chips to be verified. To manage design violations, we use a number of solutions to reduce the number of violations that need to be checked by physical verification engineers. The most important functions in physical verification are DRC (design rule check), LVS (layout vs. schematic), and XRC (extraction). Since we use multiple foundries for our design tape-outs, we need a flow that improves the overall turnaround time and ease of use of the physical verification process. The demand for fast turnaround time is even more critical since the physical design is the last stage before sending the layout to the foundries.

Keywords: Physical verification, DRC, LVS, XRC, flow, foundry, runset.

Downloads: 3180
516 Persian/Arabic Document Segmentation Based On Pyramidal Image Structure

Authors: Seyyed Yasser Hashemi, Khalil Monfaredi

Abstract:

Automatic transformation of paper documents into electronic documents requires document segmentation at the first stage. However, parameter restrictions such as variations in character font sizes, different text line spacing, and non-uniform document layout structures have for many years made it difficult to design a general-purpose document layout analysis algorithm. Thus, most previously reported methods inevitably include these parameters. This problem becomes especially acute in Persian/Arabic documents. Since the Persian/Arabic scripts differ considerably from the English scripts, most of the methods proposed for English scripts do not render good results for Persian scripts. In this paper, we present a novel parameter-free method for segmenting Persian/Arabic document images which also works well for English scripts. This method segments the document image into maximal homogeneous regions and identifies them as text and non-text based on a pyramidal image structure. In other words, the proposed method is capable of document segmentation without considering the character font sizes, text line spacing, and document layout structures. The algorithm was tested on 150 Arabic/Persian and English documents, and segmentation succeeded for 96 percent of them.

Keywords: Persian/Arabic document, document segmentation, Pyramidal Image Structure, skew detection and correction.

Downloads: 1736
515 A New Pattern for Handwritten Persian/Arabic Digit Recognition

Authors: A. Harifi, A. Aghagolzadeh

Abstract:

The main problem in recognizing handwritten Persian digits using a Neural Network is extracting an appropriate feature vector from the image matrix. In this research, an asymmetrical segmentation pattern is proposed to obtain the feature vector. This pattern can be adjusted as an optimum model thanks to its one degree of freedom as a control point. Since any chosen algorithm depends on digit identity, a Neural Network is used to overcome this dependence. The inputs of this network are the moment of inertia and the center of gravity, which do not depend on digit identity. Recognizing the digit is carried out using another Neural Network. Simulation results indicate a high recognition rate of 97.6% for the newly introduced pattern in comparison to previous models for digit recognition.

Keywords: Pattern recognition, Persian digits, Neural Network.

Downloads: 1646
514 Online Signature Verification Using Angular Transformation for e-Commerce Services

Authors: Peerapong Uthansakul, Monthippa Uthansakul

Abstract:

E-Commerce services have grown rapidly in the past decade. However, methods for verifying authenticated users still depend widely on numeric approaches. Finding other verification methods suitable for online e-Commerce is therefore an interesting issue. In this paper, a new online signature-verification method using angular transformation is presented. Delay shifts existing in online signatures are estimated by an estimation method relying on angle representation. In the proposed signature-verification algorithm, all components of the input signature are extracted by considering the discontinuous break points in the stream of angular values. The estimated delay shift is then captured by comparison with the selected reference signature, and the matching error is computed as the main feature used in the verification process. The threshold offsets are calculated from two types of error characteristic of the signature verification problem, the False Rejection Rate (FRR) and the False Acceptance Rate (FAR). The level of these two error rates depends on the chosen decision threshold, whose value is set so as to realize the Equal Error Rate (EER; FAR = FRR). The experimental results show that, with simple programming deployed on the Internet to demonstrate e-Commerce services, the proposed method provides 95.39% correct verifications, 7% better than the DP-matching-based signature-verification method. In addition, signature verification with extracted components provides more reliable results than whole-signature decision making.
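
The relationship between the decision threshold, FAR, FRR and the EER described in this abstract can be sketched as follows. This is a generic illustration, not the authors' procedure: it assumes each signature comparison yields a single matching error, that a signature is accepted when its error is at or below the threshold, and the function names and sample values are invented for the example.

```python
def far_frr(genuine_errors, impostor_errors, threshold):
    """Return (FAR, FRR) when signatures with error <= threshold are accepted."""
    frr = sum(e > threshold for e in genuine_errors) / len(genuine_errors)
    far = sum(e <= threshold for e in impostor_errors) / len(impostor_errors)
    return far, frr


def equal_error_threshold(genuine_errors, impostor_errors):
    """Pick the observed error value where FAR and FRR are closest (approximate EER point)."""
    candidates = sorted(set(genuine_errors) | set(impostor_errors))

    def gap(t):
        far, frr = far_frr(genuine_errors, impostor_errors, t)
        return abs(far - frr)

    return min(candidates, key=gap)


# Made-up matching errors, purely for illustration.
genuine = [0.1, 0.2, 0.3, 0.6]
impostor = [0.4, 0.7, 0.8, 0.9]
threshold = equal_error_threshold(genuine, impostor)
far, frr = far_frr(genuine, impostor, threshold)
```

Raising the threshold lowers FRR at the cost of a higher FAR, which is why a single operating point such as the EER is commonly reported.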

Keywords: Online signature verification, e-Commerce services, Angular transformation.

Downloads: 1543
513 Hand Written Digit Recognition by Multiple Classifier Fusion based on Decision Templates Approach

Authors: Reza Ebrahimpour, Samaneh Hamedi

Abstract:

Classifier fusion may generate more accurate classification than each of the basic classifiers. Fusion is often based on fixed combination rules such as the product or the average. This paper presents decision templates as a classifier fusion method for the recognition of handwritten English and Farsi numerals (1-9). The process involves extracting a feature vector on well-known image databases. The extracted feature vector is fed to multiple classifier fusion. A set of experiments was conducted to compare decision templates (DTs) with some combination rules. The decision templates yield 97.99% and 97.28% for Farsi and English handwritten digits, respectively.
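
A minimal sketch of decision-template fusion, under the usual formulation: each sample's decision profile is a matrix with one row of class scores per base classifier, the template of a class is the mean profile of its training samples, and a new sample is assigned to the class with the nearest template. The squared Euclidean distance, the function names and the toy numbers are assumptions made here; the abstract does not state which similarity measure the authors use.

```python
def mean_profile(profiles):
    """Element-wise mean of a list of equally sized 2-D decision profiles."""
    n = len(profiles)
    rows, cols = len(profiles[0]), len(profiles[0][0])
    return [[sum(p[i][j] for p in profiles) / n for j in range(cols)]
            for i in range(rows)]


def fit_templates(profiles, labels):
    """Build one decision template per class from training decision profiles."""
    by_class = {}
    for profile, label in zip(profiles, labels):
        by_class.setdefault(label, []).append(profile)
    return {label: mean_profile(ps) for label, ps in by_class.items()}


def classify(profile, templates):
    """Return the class whose template is closest in squared Euclidean distance (assumed)."""
    def dist(a, b):
        return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return min(templates, key=lambda label: dist(profile, templates[label]))


# Toy data: two base classifiers (rows), two classes (columns).
train_profiles = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.7, 0.3], [0.6, 0.4]],
    [[0.2, 0.8], [0.3, 0.7]],
]
train_labels = [0, 0, 1]
templates = fit_templates(train_profiles, train_labels)
predicted = classify([[0.6, 0.4], [0.4, 0.6]], templates)
```

Unlike a fixed rule such as the product or average, the templates are learned from data, so systematic biases of individual classifiers are absorbed into the template itself.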

Keywords: Decision templates, multi-layer perceptron, characteristic loci, principal component analysis (PCA).

Downloads: 1922
512 Computer Verification in Cryptography

Authors: Markus Kaiser, Johannes Buchmann

Abstract:

In this paper we explore the application of a formal proof system to verification problems in cryptography. Cryptographic properties concerning correctness or security of some cryptographic algorithms are of great interest. Besides some basic lemmata, we explore an implementation of a complex function that is used in cryptography. More precisely, we describe formal properties of this implementation that we computer prove. We describe formalized probability distributions (σ-algebras, probability spaces and conditional probabilities). These are given in the formal language of the formal proof system Isabelle/HOL. Moreover, we computer prove Bayes' Formula. Besides, we describe an application of the presented formalized probability distributions to cryptography. Furthermore, this paper shows that computer proofs of complex cryptographic functions are possible by presenting an implementation of the Miller-Rabin primality test that admits formal verification. Our achievements are a step towards computer verification of cryptographic primitives. They describe a basis for computer verification in cryptography. Computer verification can be applied to further problems in cryptographic research, if the corresponding basic mathematical knowledge is available in a database.
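
For readers unfamiliar with the Miller-Rabin test named in this abstract, the following Python sketch shows the standard algorithm. It is an ordinary executable version, not the Isabelle/HOL formalization the paper describes. The fixed witness set (the first twelve primes) is a choice made here; it is known to give a deterministic answer for inputs below 2^64, and for larger inputs the result should be read as "probably prime".

```python
SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n, bases=SMALL_PRIMES):
    """Miller-Rabin primality test with a fixed set of witness bases."""
    if n < 2:
        return False
    # Trial division by the bases also handles n equal to one of them.
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            # No square reached n - 1, so `a` witnesses that n is composite.
            return False
    return True
```

For example, `is_probable_prime(2**61 - 1)` accepts a Mersenne prime, while `is_probable_prime(3215031751)` rejects a composite that passes the test for the bases 2, 3, 5 and 7 alone.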

Keywords: prime numbers, primality tests, (conditional) probability distributions, formal proof system, higher-order logic, formal verification, Bayes' Formula, Miller-Rabin primality test.

Downloads: 2150
511 Evolving Neural Networks using Moment Method for Handwritten Digit Recognition

Authors: H. El Fadili, K. Zenkouar, H. Qjidaa

Abstract:

This paper proposes neural network weight and topology optimization using genetic evolution and the backpropagation training algorithm. The proposed crossover and mutation operators aim to adapt the network architectures and weights during the evolution process. Through a specific inheritance procedure, the weights are transmitted from the parents to their offspring, which allows re-exploitation of the already trained networks and hence accelerates the global convergence of the algorithm. In the preprocessing phase, a new feature extraction method is proposed based on Legendre moments, with the Maximum Entropy Principle (MEP) as a selection criterion. This allows a global search space reduction in the design of the networks. The proposed method has been applied and tested on the well-known MNIST database of handwritten digits.

Keywords: Genetic algorithm, Legendre Moments, MEP, Neural Network.

Downloads: 1636
510 A Scheme of Model Verification of the Concurrent Discrete Wavelet Transform (DWT) for Image Compression

Authors: Kamrul Hasan Talukder, Koichi Harada

Abstract:

The scientific community has invested a great deal of effort in the field of the discrete wavelet transform in the last few decades. The discrete wavelet transform (DWT) combined with vector quantization has proved to be a very useful tool for image compression. However, the DWT is a very computationally intensive process, requiring innovative and computationally efficient methods to obtain the image compression. Concurrent transformation of the image can be an important solution to this problem. This paper proposes a model of concurrent DWT for image compression. Additionally, formal verification of the model has been performed. Here the Symbolic Model Verifier (SMV) has been used as the formal verification tool. The system has been modeled in SMV and some properties have been verified formally.

Keywords: Computation Tree Logic, Discrete Wavelet Transform, Formal Verification, Image Compression, Symbolic Model Verifier.

Downloads: 1718
509 Automatic Verification Technology of Virtual Machine Software Patch on IaaS Cloud

Authors: Yoji Yamato

Abstract:

In this paper, we propose an automatic verification technology for software patches in user virtual environments on IaaS Cloud, to decrease the verification costs of patches. These days, IaaS services have spread, and many users can customize virtual machines on IaaS Cloud like their own private servers. Regarding software patches for the OS or middleware installed on virtual machines, users need to adopt and verify these patches by themselves. This task increases users' operation costs. Our proposed method replicates user virtual environments, extracts verification test cases for user virtual environments from a test case DB, distributes patches to virtual machines in the replicated environments and conducts those test cases automatically in the replicated environments. We have implemented the proposed method on OpenStack using Jenkins and confirmed its feasibility. Using the implementation, we confirmed the effectiveness of our proposed idea of 2-tier abstraction of software functions and test cases in reducing test case creation effort. We also evaluated the automatic verification performance of environment replication, test case extraction and test case execution.

Keywords: OpenStack, Cloud Computing, Automatic verification, Jenkins.

Downloads: 2126
508 Identity Verification Using k-NN Classifiers and Autistic Genetic Data

Authors: Fuad M. Alkoot

Abstract:

DNA data have been used in forensics for decades. However, current research looks at using DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim to use gene data that was initially used for autism detection to find whether, and how accurately, this data can be used for identification applications. Mainly, our goal is to find whether our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that the optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and a standard deviation of 1. The classification rate is close to optimal at higher noise standard deviations, up to 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).

Keywords: Biometrics, identity verification, genetic data, k-nearest neighbor.

Downloads: 1085
507 OHASD: The First On-Line Arabic Sentence Database Handwritten on Tablet PC

Authors: Randa I. M. Elanwar, Mohsen A. Rashwan, Samia A. Mashali

Abstract:

In this paper we present the first Arabic sentence dataset for on-line handwriting recognition written on a tablet PC. The dataset is natural, simple and clear. Texts are sampled from daily newspapers. To collect naturally written handwriting, forms are dictated to writers. The current version of our dataset includes 154 paragraphs written by 48 writers. It contains more than 3800 words and more than 19,400 characters. Handwritten texts are mainly written by researchers from different research centers. In order to use this dataset in a recognition system, word extraction is needed. In this paper, a new word extraction technique based on the cursive nature of Arabic handwriting is also presented. The technique is applied to this dataset and good results are obtained. The results can be considered a benchmark against which future research can be compared.

Keywords: Arabic, Handwriting recognition, on-line dataset.

Downloads: 2026
506 Multilevel Classifiers in Recognition of Handwritten Kannada Numerals

Authors: Dinesh Acharya U., N. V. Subba Reddy, Krishnamoorthi Makkithaya

Abstract:

The recognition of handwritten numerals is an important area of research for its applications in post offices, banks and other organizations. This paper presents automatic recognition of handwritten Kannada numerals based on structural features. Five different types of features, namely profile-based 10-segment string, water reservoir, vertical and horizontal strokes, end points, and average boundary length from the minimal bounding box, are used in the recognition of numerals. The effect of each feature and their combination on the numeral classification is analyzed using nearest neighbor classifiers. It is common to combine multiple categories of features into a single feature vector for the classification. Instead, separate classifiers can be used to classify based on each visual feature individually, and the final classification can be obtained by combining the separate base classification results. One popular approach is to combine the classifier results into a feature vector and leave the decision to a next-level classifier. This method is extended to extract better information, a possibility distribution, from the base classifiers to resolve conflicts among the classification results. Here, we use fuzzy k Nearest Neighbor (fuzzy k-NN) as the base classifier for individual feature sets, the results of which together form the feature vector for the final k Nearest Neighbor (k-NN) classifier. Testing is done using different features, individually and in combination, on a database containing 1600 samples of different numerals, and the results are compared with the results of different existing methods.

Keywords: Fuzzy k Nearest Neighbor, Multiple Classifiers, Numeral Recognition, Structural features.

Downloads: 1715
505 A Study on Finding Similar Document with Multiple Categories

Authors: R. Saraçoğlu, N. Allahverdi

Abstract:

Searching for similar documents and document management are important topics in text mining. One of the most important parts of similar-document research is the process of classifying or clustering the documents. In this study, a similar-document search approach that addresses the case of documents belonging to multiple categories (the multiple categories problem) has been carried out. The proposed method, which is based on Fuzzy Similarity Classification (FSC), has been compared with the Rocchio algorithm and the naive Bayes method, which are widely used in text mining. Empirical results show that the proposed method is quite successful and can be applied effectively. For the second stage, a multiple-categories vector method based on information about how frequently categories are seen together has been used. Empirical results show that performance almost doubles when the proposed method is compared with the classical approach.

Keywords: Document similarity, Fuzzy classification, Multiple categories, Text mining.

Downloads: 1671
504 Research on Applying the Continuity Care Document to Generate a Medical Record with Entry Level

Authors: Hsing-Yi Kao, Der-Ming Liou

Abstract:

Transferring patient information between medical care sites is necessary to deliver better patient care and to reduce medical costs. The development of electronic medical records is therefore an important trend worldwide. The Continuity of Care Document (CCD) is a product of collaboration between the CDA and CCR standards. In this study, we develop a system to generate medical records with entry level based on the CCD template module.

Keywords: Continuity Care Document, medical record, entry level

Downloads: 1954
503 Handwritten Character Recognition Using Multiscale Neural Network Training Technique

Authors: Velappa Ganapathy, Kok Leong Liew

Abstract:

Advancement in Artificial Intelligence has led to the development of various "smart" devices. A character recognition device is one such smart device that acquires partial human intelligence, with the ability to capture and recognize various characters in different languages. Firstly, multiscale neural training with modifications in the input training vectors is adopted in this paper for its advantage in training on higher-resolution character images. Secondly, selective thresholding using the minimum distance technique is proposed to increase the accuracy of character recognition. A simulator program (a GUI) is designed in such a way that the characters can be located at any spot on the blank paper on which they are written. The results show that such methods, with a moderate number of training epochs, can produce accuracies of at least 85% for handwritten upper case English characters and numerals.

Keywords: Character recognition, multiscale, backpropagation, neural network, minimum distance technique.

Downloads: 1883
502 A Proposed Hybrid Approach for Feature Selection in Text Document Categorization

Authors: M. F. Zaiyadi, B. Baharudin

Abstract:

Text document categorization involves a large amount of data or features. The high dimensionality of the features is troublesome and can affect the performance of the classification. Therefore, feature selection is considered one of the crucial parts of text document categorization. Selecting the best features to represent documents can reduce the dimensionality of the feature space and hence increase performance. Many approaches have been implemented by various researchers to overcome this problem. This paper proposes a novel hybrid approach for feature selection in text document categorization based on Ant Colony Optimization (ACO) and Information Gain (IG). We also present state-of-the-art algorithms by several other researchers.
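
The Information Gain half of the hybrid named in this abstract can be illustrated with the textbook definition: the reduction in class entropy obtained by knowing whether a term occurs in a document. The sketch below covers only that scoring step; the Ant Colony Optimization search is not shown, and the data layout (a list of `(set_of_terms, label)` pairs) and function names are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, term):
    """IG of `term`: class entropy minus entropy conditioned on term presence."""
    labels = [label for _, label in docs]
    present = [label for terms, label in docs if term in terms]
    absent = [label for terms, label in docs if term not in terms]
    n = len(docs)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def top_terms(docs, k):
    """Rank all terms by information gain and keep the k best."""
    vocabulary = sorted({t for terms, _ in docs for t in terms})
    return sorted(vocabulary, key=lambda t: information_gain(docs, t), reverse=True)[:k]


# Toy corpus: "goal" and "vote" separate the classes, "the" carries no information.
corpus = [
    ({"goal", "the"}, "sport"),
    ({"goal", "the"}, "sport"),
    ({"vote", "the"}, "politics"),
    ({"vote", "the"}, "politics"),
]
```

In this toy corpus `information_gain(corpus, "goal")` is one bit, the full class entropy, while a term that appears in every document scores zero and would be dropped first.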

Keywords: Ant colony optimization, feature selection, information gain, text categorization, text representation.

Downloads: 2029
501 Information Filtering using Index Word Selection based on the Topics

Authors: Takeru YOKOI, Hidekazu YANAGIMOTO, Sigeru OMATU

Abstract:

We have proposed an information filtering system that selects index words from a document set based on the topics the set contains. The method narrows the index words down to those particularly characteristic of the document set, and the topics are obtained by Sparse Non-negative Matrix Factorization. In information filtering, a document is often represented by a vector whose elements correspond to the weights of the index words, and the dimension of the vector grows as the number of documents increases. As a result, words that are useless as index words for filtering may be included. To address this problem, the dimension needs to be reduced. Our proposal reduces the dimension by selecting index words based on the topics included in a document set. We applied Sparse Non-negative Matrix Factorization to the document set to obtain these topics. Filtering is carried out using the centroid of the learning document set, which is regarded as the user's interest. The centroid is represented as a document vector whose elements are the weights of the selected index words. We confirm the effectiveness of our proposal using the English test collection MEDLINE: when an appropriate number of index words is selected, the proposed selection improves recommendation accuracy over previous methods. We also examined the index words selected by our proposal and found that it was able to select index words covering some minor topics in the document set.
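
The centroid-based filtering step can be sketched as follows. The abstract does not name the similarity measure, so cosine similarity is an assumption here, as are the three-word vocabulary and all variable names; the topic-based word selection via Sparse NMF is not reproduced.

```python
import math

def centroid(vectors):
    """Component-wise mean of equally long document vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical weights over three selected index words.
learning_docs = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
profile = centroid(learning_docs)  # stands in for the user's interest
score_relevant = cosine(profile, [1.0, 0.0, 0.5])
score_offtopic = cosine(profile, [0.0, 1.0, 0.0])
```

A new document would be recommended when its score against `profile` exceeds some chosen cutoff.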

Keywords: Information Filtering, Sparse NMF, Index Word Selection, User Profile, Chi-squared Measure

Downloads 1415
500 A Proposed Approach for Emotion Lexicon Enrichment

Authors: Amr Mansour Mohsen, Hesham Ahmed Hassan, Amira M. Idrees

Abstract:

Document analysis is an important research field that aims to gather information by analyzing the data in documents. Because understanding what people actually want is an important goal in many fields, sentimental analysis has become a vital field closely related to document analysis. This research focuses on analyzing text documents to classify each document according to the opinion it expresses. Its aim is to detect emotions in text documents by enriching the lexicon, adapting its content through the extraction of semantic patterns. The proposed approach is presented, and experiments conducted from different perspectives reveal its positive impact on the classification results.

Keywords: Document analysis, sentimental analysis, emotion detection, WEKA tool, NRC Lexicon.

Downloads 1414
499 Verification of Protocol Design using UML - SMV

Authors: Prashanth C.M., K. Chandrashekar Shet

Abstract:

In the recent past, the Unified Modeling Language (UML) has become the de facto industry standard for object-oriented modeling of software systems. The rich syntax and semantics of UML have encouraged industry to develop several supporting tools, including tools capable of generating a deployable product (code) from UML models. As a consequence, ensuring the correctness of the model/design has become a challenging and extremely important task. In this paper, we present an approach for automatic verification of a protocol model/design. As a case study, a Session Initiation Protocol (SIP) design is verified for the property “the CALLER will not converse with the CALLEE before the connection is established between them”. SIP is modeled using UML statechart diagrams and the desired properties are expressed in temporal logic. Our prototype verifier, “UML-SMV”, is used to carry out the verification. When we subjected an erroneous SIP model to UML-SMV, the verifier successfully detected the error (in 76.26 ms) and generated the error trace.
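
To make the checked property concrete, the sketch below runs a small explicit-state search over a toy call model and reports a trace if a conversing state can be reached before any connected state. This is only an illustration under assumed state names and transitions; it is not the authors' UML-SMV tool, which works from statecharts and temporal-logic formulas.

```python
from collections import deque

def find_violation(transitions, initial, is_connected, is_conversing):
    """Return a path to a state that converses before connecting, or None."""
    queue = deque([[initial]])
    seen = {initial}
    while queue:
        path = queue.popleft()
        state = path[-1]
        if is_connected(state):
            continue  # connection established on this path; nothing left to check
        if is_conversing(state):
            return path  # counterexample trace
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical toy models; state names are illustrative, not taken from SIP.
good_model = {"idle": ["inviting"], "inviting": ["connected", "idle"],
              "connected": ["talking"], "talking": ["idle"]}
bad_model = dict(good_model, inviting=["connected", "talking", "idle"])

def connected(state):
    return state == "connected"

def conversing(state):
    return state == "talking"

good_trace = find_violation(good_model, "idle", connected, conversing)
bad_trace = find_violation(bad_model, "idle", connected, conversing)
```

For the faulty model the search returns the offending path, which plays the same role as the error trace mentioned in the abstract.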

Keywords: Unified Modeling Language, Statechart, Verification, Protocol Design, Model Checking.

Downloads 1825
498 Verification and Validation for Java Classes using Design by Contract. The Modular External Approach

Authors: Dario Ramirez de Leon, Oscar Chavez Bosquez, Julian J. Francisco Leon

Abstract:

Since the conception of JML, many tools, applications and implementations have been developed. Users or developers who want to use JML can find themselves surrounded by a multitude of them. Looking for a common infrastructure and an independent language to bridge these tools and JML, we developed XJML, an approach to embedding contracts for Java in XML. This approach lets us separate preconditions, postconditions and class invariants using JML and XML, so we built a front-end that can perform Runtime Assertion Checking, Extended Static Checking and Full Static Program Verification. In addition, the capabilities of this front-end can be extended and easily implemented thanks to XML. We believe that XJML is an easy starting point for building a graphical user interface, thereby offering a friendly, IDE-independent environment to the community of developers who want to work with JML.
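
The idea of keeping contracts outside the code they constrain and checking them at run time can be sketched in a few lines. The example is in Python rather than Java, uses a plain dictionary where XJML would use an XML file, and every name in it is hypothetical; it shows the general pattern only, not XJML's format or tooling.

```python
import functools

def with_contract(pre, post):
    """Wrap a function so its precondition and postcondition are checked at run time."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            if not pre(*args):
                raise ValueError(f"precondition of {func.__name__} violated")
            result = func(*args)
            if not post(result, *args):
                raise ValueError(f"postcondition of {func.__name__} violated")
            return result
        return wrapper
    return decorate

# Contracts stored apart from the implementation (stand-in for an external spec file).
CONTRACTS = {
    "withdraw": (
        lambda balance, amount: 0 < amount <= balance,             # precondition
        lambda result, balance, amount: result == balance - amount,  # postcondition
    ),
}

def withdraw(balance, amount):
    return balance - amount

checked_withdraw = with_contract(*CONTRACTS["withdraw"])(withdraw)
remaining = checked_withdraw(100, 30)
```

Calling `checked_withdraw(100, 200)` raises `ValueError`, since the amount exceeds the balance and the precondition fails.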

Keywords: Model checking, verification and validation, JML, XML, java, runtime assertion checking, extended static checking, full static program verification.

Downloads 1545
497 Off-Line Hand Written Thai Character Recognition using Ant-Miner Algorithm

Authors: P. Phokharatkul, K. Sankhuangaw, S. Somkuarnpanit, S. Phaiboon, C. Kimpan

Abstract:

Many approaches to handwritten Thai character recognition have been proposed, such as comparing character heads, fuzzy logic and structure trees. This paper presents a handwritten Thai character recognition system based on the Ant-Miner algorithm (data mining based on ant colony optimization). Zoning is first used to determine each character. Then three distinct features (also called attributes) of each character in each zone are extracted: Head zone, End point, and Feature code. All attributes are used to construct the classification rules with the Ant-Miner algorithm in order to classify 112 Thai characters. For this experiment, the Ant-Miner algorithm is adapted with a small change to increase the recognition rate. The experiment yields a 97% recognition rate on the training set (11200 characters) and an 82.7% recognition rate on unseen test data (22400 characters).
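
Ant-Miner produces an ordered list of IF-THEN rules, and classifying a character then amounts to returning the class of the first rule whose conditions all match. The sketch below shows only that final lookup. The attribute values, rule contents and class labels are invented placeholders, and the rule discovery itself (the ant colony search) is not shown.

```python
def classify(sample, rules, default):
    """Return the class of the first rule whose conditions all hold, else `default`."""
    for conditions, label in rules:
        if all(sample.get(attr) == value for attr, value in conditions.items()):
            return label
    return default

# Hypothetical rules over the three attributes named in the abstract.
rules = [
    ({"head_zone": "top-left", "end_point": 1}, "char_01"),
    ({"head_zone": "top-left", "feature_code": "F7"}, "char_02"),
    ({"end_point": 2}, "char_03"),
]

first = classify({"head_zone": "top-left", "end_point": 1, "feature_code": "F7"},
                 rules, default="unknown")
third = classify({"head_zone": "bottom", "end_point": 2, "feature_code": "F1"},
                 rules, default="unknown")
fallback = classify({"head_zone": "bottom", "end_point": 5, "feature_code": "F1"},
                    rules, default="unknown")
```

Because the list is ordered, the first sample matches both of the first two rules but is assigned the class of the earlier one.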

Keywords: Hand written, Thai character recognition, Ant-Miner algorithm, distinct feature.

Downloads 1897