Search results for: Text Features.
2010 Classification of Political Affiliations by Reduced Number of Features
Authors: Vesile Evrim, Aliyu Awwal
Abstract:
By the evolvement in technology, the way of expressing opinions switched direction to the digital world. The domain of politics, as one of the hottest topics of opinion mining research, merged together with the behavior analysis for affiliation determination in texts, which constitutes the subject of this paper. This study aims to classify the text in news/blogs either as Republican or Democrat with the minimum number of features. As an initial set, 68 features which 64 were constituted by Linguistic Inquiry and Word Count (LIWC) features were tested against 14 benchmark classification algorithms. In the later experiments, the dimensions of the feature vector reduced based on the 7 feature selection algorithms. The results show that the “Decision Tree”, “Rule Induction” and “M5 Rule” classifiers when used with “SVM” and “IGR” feature selection algorithms performed the best up to 82.5% accuracy on a given dataset. Further tests on a single feature and the linguistic based feature sets showed the similar results. The feature “Function”, as an aggregate feature of the linguistic category, was found as the most differentiating feature among the 68 features with the accuracy of 81% in classifying articles either as Republican or Democrat.Keywords: Politics, machine learning, feature selection, LIWC.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23122009 Unsupervised Feature Selection Using Feature Density Functions
Authors: Mina Alibeigi, Sattar Hashemi, Ali Hamzeh
Abstract:
Since dealing with high dimensional data is computationally complex and sometimes even intractable, recently several feature reductions methods have been developed to reduce the dimensionality of the data in order to simplify the calculation analysis in various applications such as text categorization, signal processing, image retrieval, gene expressions and etc. Among feature reduction techniques, feature selection is one the most popular methods due to the preservation of the original features. In this paper, we propose a new unsupervised feature selection method which will remove redundant features from the original feature space by the use of probability density functions of various features. To show the effectiveness of the proposed method, popular feature selection methods have been implemented and compared. Experimental results on the several datasets derived from UCI repository database, illustrate the effectiveness of our proposed methods in comparison with the other compared methods in terms of both classification accuracy and the number of selected features.Keywords: Feature, Feature Selection, Filter, Probability Density Function
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20262008 Dynamic Decompression for Text Files
Authors: Ananth Kamath, Ankit Kant, Aravind Srivatsa, Harisha J.A
Abstract:
Compression algorithms reduce the redundancy in data representation to decrease the storage required for that data. Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv (LZ) family, Dynamic Markov Compression (DMC), Prediction by Partial Matching (PPM), and Burrows-Wheeler Transform (BWT) based algorithms. Decompression is also required to retrieve the original data by lossless means. A compression scheme for text files coupled with the principle of dynamic decompression, which decompresses only the section of the compressed text file required by the user instead of decompressing the entire text file. Dynamic decompressed files offer better disk space utilization due to higher compression ratios compared to most of the currently available text file formats.Keywords: Compression, Dynamic Decompression, Text file format, Portable Document Format, Compression Ratio.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17132007 Journals Subheadlines Text Extraction Using Wavelet Thresholding and New Projection Profile
Authors: Davod Zaravi, Habib Rostami, Alireza Malahzaheh, S. S. Mortazavi
Abstract:
In this paper a new robust and efficient algorithm to automatic text extraction from colored book and journal cover sheets is proposed. First, we perform wavelet transform. Next for edge detecting from detail wavelet coefficient, we use dynamic threshold. By blurring approximate coefficients with alternative heuristic thresholding, achieve effective edge,. Afterward, with ROI technique get binary image. Finally text boxes would be extracted with new projection profile.
Keywords: Text extraction, colored cover sheet, wavelet threshold, region of interest.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15872006 Text-Mining Approach for Evaluation of Affective Management Practices
Authors: Masaaki Saito, Qin Tang, Hiroyuki Umemuro
Abstract:
The purpose of this paper is to propose a text mining approach to evaluate companies- practices on affective management. Affective management argues that it is critical to take stakeholders- affects into consideration during decision-making process, along with the traditional numerical and rational indices. CSR reports published by companies were collected as source information. Indices were proposed based on the frequency and collocation of words relevant to affective management concept using text mining approach to analyze the text information of CSR reports. In addition, the relationships between the results obtained using proposed indices and traditional indicators of business performance were investigated using correlation analysis. Those correlations were also compared between manufacturing and non-manufacturing companies. The results of this study revealed the possibility to evaluate affective management practices of companies based on publicly available text documents.Keywords: Affective management, Affect, Stakeholder, Text mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18042005 Meta-Classification using SVM Classifiers for Text Documents
Authors: Daniel I. Morariu, Lucian N. Vintan, Volker Tresp
Abstract:
Text categorization is the problem of classifying text documents into a set of predefined classes. In this paper, we investigated three approaches to build a meta-classifier in order to increase the classification accuracy. The basic idea is to learn a metaclassifier to optimally select the best component classifier for each data point. The experimental results show that combining classifiers can significantly improve the accuracy of classification and that our meta-classification strategy gives better results than each individual classifier. For 7083 Reuters text documents we obtained a classification accuracies up to 92.04%.Keywords: Meta-classification, Learning with Kernels, Support Vector Machine, and Performance Evaluation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15652004 Clustering Unstructured Text Documents Using Fading Function
Authors: Pallav Roxy, Durga Toshniwal
Abstract:
Clustering unstructured text documents is an important issue in data mining community and has a number of applications such as document archive filtering, document organization and topic detection and subject tracing. In the real world, some of the already clustered documents may not be of importance while new documents of more significance may evolve. Most of the work done so far in clustering unstructured text documents overlooks this aspect of clustering. This paper, addresses this issue by using the Fading Function. The unstructured text documents are clustered. And for each cluster a statistics structure called Cluster Profile (CP) is implemented. The cluster profile incorporates the Fading Function. This Fading Function keeps an account of the time-dependent importance of the cluster. The work proposes a novel algorithm Clustering n-ary Merge Algorithm (CnMA) for unstructured text documents, that uses Cluster Profile and Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.Keywords: Clustering, Text Mining, Unstructured TextDocuments, Fading Function.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19392003 A Content Vector Model for Text Classification
Authors: Eric Jiang
Abstract:
As a popular rank-reduced vector space approach, Latent Semantic Indexing (LSI) has been used in information retrieval and other applications. In this paper, an LSI-based content vector model for text classification is presented, which constructs multiple augmented category LSI spaces and classifies text by their content. The model integrates the class discriminative information from the training data and is equipped with several pertinent feature selection and text classification algorithms. The proposed classifier has been applied to email classification and its experiments on a benchmark spam testing corpus (PU1) have shown that the approach represents a competitive alternative to other email classifiers based on the well-known SVM and naïve Bayes algorithms.Keywords: Feature Selection, Latent Semantic Indexing, Text Classification, Vector Space Model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18432002 Narrative and Expository Text Reading Comprehension by Fourth Grade Spanish-Speaking Children
Authors: Mariela V. De Mier, Veronica S. Sanchez Abchi, Ana M. Borzone
Abstract:
This work aims to explore the factors that have an incidence in reading comprehension process, with different type of texts. In a recent study with 2nd, 3rd and 4th grade children, it was observed that reading comprehension of narrative texts was better than comprehension of expository texts. Nevertheless it seems that not only the type of text but also other textual factors would account for comprehension depending on the cognitive processing demands posed by the text. In order to explore this assumption, three narrative and three expository texts were elaborated with different degree of complexity. A group of 40 fourth grade Spanish-speaking children took part in the study. Children were asked to read the texts and answer orally three literal and three inferential questions for each text. The quantitative and qualitative analysis of children responses showed that children had difficulties in both, narrative and expository texts. The problem was to answer those questions that involved establishing complex relationships among information units that were present in the text or that should be activated from children’s previous knowledge to make an inference. Considering the data analysis, it could be concluded that there is some interaction between the type of text and the cognitive processing load of a specific text.
Keywords: comprehension, textual factors, type of text, processing demands.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13482001 Feature Selection for Web Page Classification Using Swarm Optimization
Authors: B. Leela Devi, A. Sankar
Abstract:
The web’s increased popularity has included a huge amount of information, due to which automated web page classification systems are essential to improve search engines’ performance. Web pages have many features like HTML or XML tags, hyperlinks, URLs and text contents which can be considered during an automated classification process. It is known that Webpage classification is enhanced by hyperlinks as it reflects Web page linkages. The aim of this study is to reduce the number of features to be used to improve the accuracy of the classification of web pages. In this paper, a novel feature selection method using an improved Particle Swarm Optimization (PSO) using principle of evolution is proposed. The extracted features were tested on the WebKB dataset using a parallel Neural Network to reduce the computational cost.
Keywords: Web page classification, WebKB Dataset, Term Frequency-Inverse Document Frequency (TF-IDF), Particle Swarm Optimization (PSO).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 32112000 A Talking Head System for Korean Text
Authors: Sang-Wan Kim, Hoon Lee, Kyung-Ho Choi, Soon-Young Park
Abstract:
A talking head system (THS) is presented to animate the face of a speaking 3D avatar in such a way that it realistically pronounces the given Korean text. The proposed system consists of SAPI compliant text-to-speech (TTS) engine and MPEG-4 compliant face animation generator. The input to the THS is a unicode text that is to be spoken with synchronized lip shape. The TTS engine generates a phoneme sequence with their duration and audio data. The TTS applies the coarticulation rules to the phoneme sequence and sends a mouth animation sequence to the face modeler. The proposed THS can make more natural lip sync and facial expression by using the face animation generator than those using the conventional visemes only. The experimental results show that our system has great potential for the implementation of talking head for Korean text.Keywords: Talking head, Lip sync, TTS, MPEG4.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14451999 Word Base Line Detection in Handwritten Text Recognition Systems
Authors: Kamil R. Aida-zade, Jamaladdin Z. Hasanov
Abstract:
An approach is offered for more precise definition of base lines- borders in handwritten cursive text and general problems of handwritten text segmentation have also been analyzed. An offered method tries to solve problems arose in handwritten recognition with specific slant or in other words, where the letters of the words are not on the same vertical line. As an informative features, some recognition systems use ascending and descending parts of the letters, found after the word-s baseline detection. In such recognition systems, problems in baseline detection, impacts the quality of the recognition and decreases the rate of the recognition. Despite other methods, here borders are found by small pieces containing segmentation elements and defined as a set of linear functions. In this method, separate borders for top and bottom border lines are found. At the end of the paper, as a result, azerbaijani cursive handwritten texts written in Latin alphabet by different authors has been analyzed.
Keywords: Azeri, azerbaijani, latin, segmentation, cursive, HWR, handwritten, recognition, baseline, ascender, descender, symbols.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24311998 Techniques with Statistics for Web Page Watermarking
Authors: Mohamed Lahcen BenSaad, Sun XingMing
Abstract:
Information hiding, especially watermarking is a promising technique for the protection of intellectual property rights. This technology is mainly advanced for multimedia but the same has not been done for text. Web pages, like other documents, need a protection against piracy. In this paper, some techniques are proposed to show how to hide information in web pages using some features of the markup language used to describe these pages. Most of the techniques proposed here use the white space to hide information or some varieties of the language in representing elements. Experiments on a very small page and analysis of five thousands web pages show that these techniques have a wide bandwidth available for information hiding, and they might form a solid base to develop a robust algorithm for web page watermarking.Keywords: Digital Watermarking, Information Hiding, Markup Language, Text watermarking, Software Watermarking.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17491997 Growing Self Organising Map Based Exploratory Analysis of Text Data
Authors: Sumith Matharage, Damminda Alahakoon
Abstract:
Textual data plays an important role in the modern world. The possibilities of applying data mining techniques to uncover hidden information present in large volumes of text collections is immense. The Growing Self Organizing Map (GSOM) is a highly successful member of the Self Organising Map family and has been used as a clustering and visualisation tool across wide range of disciplines to discover hidden patterns present in the data. A comprehensive analysis of the GSOM’s capabilities as a text clustering and visualisation tool has so far not been published. These functionalities, namely map visualisation capabilities, automatic cluster identification and hierarchical clustering capabilities are presented in this paper and are further demonstrated with experiments on a benchmark text corpus.
Keywords: Text Clustering, Growing Self Organizing Map, Automatic Cluster Identification, Hierarchical Clustering.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19001996 Ontology-based Concept Weighting for Text Documents
Authors: Hmway Hmway Tar, Thi Thi Soe Nyaunt
Abstract:
Documents clustering become an essential technology with the popularity of the Internet. That also means that fast and high-quality document clustering technique play core topics. Text clustering or shortly clustering is about discovering semantically related groups in an unstructured collection of documents. Clustering has been very popular for a long time because it provides unique ways of digesting and generalizing large amounts of information. One of the issues of clustering is to extract proper feature (concept) of a problem domain. The existing clustering technology mainly focuses on term weight calculation. To achieve more accurate document clustering, more informative features including concept weight are important. Feature Selection is important for clustering process because some of the irrelevant or redundant feature may misguide the clustering results. To counteract this issue, the proposed system presents the concept weight for text clustering system developed based on a k-means algorithm in accordance with the principles of ontology so that the important of words of a cluster can be identified by the weight values. To a certain extent, it has resolved the semantic problem in specific areas.Keywords: Clustering, Concept Weight, Document clustering, Feature Selection, Ontology
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23591995 Parallel Text Processing: Alignment of Indonesian to Javanese Language
Authors: Aji P. Wibawa, Andrew Nafalski, Neil Murray, Wayan F. Mahmudy
Abstract:
Parallel text alignment is proposed as a way of aligning bahasa Indonesia to words in Javanese. Since the one-to-one word translator does not have the facility to translate pragmatic aspects of Javanese, the parallel text alignment model described uses a phrase pair combination. The algorithm aligns the parallel text automatically from the beginning to the end of each sentence. Even though the results of the phrase pair combination outperform the previous algorithm, it is still inefficient. Recording all possible combinations consume more space in the database and time consuming. The original algorithm is modified by applying the edit distance coefficient to improve the data-storage efficiency. As a result, the data-storage consumption is 90% reduced as well as its learning period (42s).
Keywords: Parallel text alignment, phrase pair combination, edit distance coefficient, Javanese-Indonesian language.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24341994 Performance Evaluation of an Online Text-Based Strategy Game
Authors: Nazleeni S. Haron, Mohd K. Zaime , Izzatdin A. Aziz, Mohd H. Hasan
Abstract:
Text-based game is supposed to be a low resource consumption application that delivers good performances when compared to graphical-intensive type of games. But, nowadays, some of the online text-based games are not offering performances that are acceptable to the users. Therefore, an online text-based game called Star_Quest has been developed in order to analyze its behavior under different performance measurements. Performance metrics such as throughput, scalability, response time and page loading time are captured to yield the performance of the game. The techniques in performing the load testing are also disclosed to exhibit the viability of our work. The comparative assessment between the results obtained and the accepted level of performances are conducted as to determine the performance level of the game. The study reveals that the developed game managed to meet all the performance objectives set forth.Keywords: Online text-based games, performance evaluation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15601993 Speaker Identification using Neural Networks
Authors: R.V Pawar, P.P.Kajave, S.N.Mali
Abstract:
The speech signal conveys information about the identity of the speaker. The area of speaker identification is concerned with extracting the identity of the person speaking the utterance. As speech interaction with computers becomes more pervasive in activities such as the telephone, financial transactions and information retrieval from speech databases, the utility of automatically identifying a speaker is based solely on vocal characteristic. This paper emphasizes on text dependent speaker identification, which deals with detecting a particular speaker from a known population. The system prompts the user to provide speech utterance. System identifies the user by comparing the codebook of speech utterance with those of the stored in the database and lists, which contain the most likely speakers, could have given that speech utterance. The speech signal is recorded for N speakers further the features are extracted. Feature extraction is done by means of LPC coefficients, calculating AMDF, and DFT. The neural network is trained by applying these features as input parameters. The features are stored in templates for further comparison. The features for the speaker who has to be identified are extracted and compared with the stored templates using Back Propogation Algorithm. Here, the trained network corresponds to the output; the input is the extracted features of the speaker to be identified. The network does the weight adjustment and the best match is found to identify the speaker. The number of epochs required to get the target decides the network performance.Keywords: Average Mean Distance function, Backpropogation, Linear Predictive Coding, MultilayeredPerceptron,
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18531992 Web Application to Profiling Scientific Institutions through Citation Mining
Authors: Hector D. Cortes, Jesus A. del Rio, Esther O. Garcia, Miguel Robles
Abstract:
Recently the use of data mining to scientific bibliographic data bases has been implemented to analyze the pathways of the knowledge or the core scientific relevances of a laureated novel or a country. This specific case of data mining has been named citation mining, and it is the integration of citation bibliometrics and text mining. In this paper we present an improved WEB implementation of statistical physics algorithms to perform the text mining component of citation mining. In particular we use an entropic like distance between the compression of text as an indicator of the similarity between them. Finally, we have included the recently proposed index h to characterize the scientific production. We have used this web implementation to identify users, applications and impact of the Mexican scientific institutions located in the State of Morelos.
Keywords: Citation Mining, Text Mining, Science Impact
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16901991 Interactive, Topic-Oriented Search Support by a Centroid-Based Text Categorisation
Authors: Mario Kubek, Herwig Unger
Abstract:
Centroid terms are single words that semantically and topically characterise text documents and so may serve as their very compact representation in automatic text processing. In the present paper, centroids are used to measure the relevance of text documents with respect to a given search query. Thus, a new graphbased paradigm for searching texts in large corpora is proposed and evaluated against keyword-based methods. The first, promising experimental results demonstrate the usefulness of the centroid-based search procedure. It is shown that especially the routing of search queries in interactive and decentralised search systems can be greatly improved by applying this approach. A detailed discussion on further fields of its application completes this contribution.Keywords: Search algorithm, centroid, query, keyword, cooccurrence, categorisation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5791990 Connectionist Approach to Generic Text Summarization
Authors: Rajesh S.Prasad, U. V. Kulkarni, Jayashree.R.Prasad
Abstract:
As the enormous amount of on-line text grows on the World-Wide Web, the development of methods for automatically summarizing this text becomes more important. The primary goal of this research is to create an efficient tool that is able to summarize large documents automatically. We propose an Evolving connectionist System that is adaptive, incremental learning and knowledge representation system that evolves its structure and functionality. In this paper, we propose a novel approach for Part of Speech disambiguation using a recurrent neural network, a paradigm capable of dealing with sequential data. We observed that connectionist approach to text summarization has a natural way of learning grammatical structures through experience. Experimental results show that our approach achieves acceptable performance. Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15431989 Hybrid Machine Learning Approach for Text Categorization
Authors: Nerijus Remeikis, Ignas Skucas, Vida Melninkaite
Abstract:
Text categorization - the assignment of natural language documents to one or more predefined categories based on their semantic content - is an important component in many information organization and management tasks. Performance of neural networks learning is known to be sensitive to the initial weights and architecture. This paper discusses the use multilayer neural network initialization with decision tree classifier for improving text categorization accuracy. An adaptation of the algorithm is proposed in which a decision tree from root node until a final leave is used for initialization of multilayer neural network. The experimental evaluation demonstrates this approach provides better classification accuracy with Reuters-21578 corpus, one of the standard benchmarks for text categorization tasks. We present results comparing the accuracy of this approach with multilayer neural network initialized with traditional random method and decision tree classifiers.
Keywords: Text categorization, decision trees, neural networks, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17651988 Speech Encryption and Decryption Using Linear Feedback Shift Register (LFSR)
Authors: Tin Lai Win, Nant Christina Kyaw
Abstract:
This paper is taken into consideration the problem of cryptanalysis of stream ciphers. There is some attempts need to improve the existing attacks on stream cipher and to make an attempt to distinguish the portions of cipher text obtained by the encryption of plain text in which some parts of the text are random and the rest are non-random. This paper presents a tutorial introduction to symmetric cryptography. The basic information theoretic and computational properties of classic and modern cryptographic systems are presented, followed by an examination of the application of cryptography to the security of VoIP system in computer networks using LFSR algorithm. The implementation program will be developed Java 2. LFSR algorithm is appropriate for the encryption and decryption of online streaming data, e.g. VoIP (voice chatting over IP). This paper is implemented the encryption module of speech signals to cipher text and decryption module of cipher text to speech signals.
Keywords: Linear Feedback Shift Register.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 30661987 Improved Dynamic Bayesian Networks Applied to Arabic on Line Characters Recognition
Authors: Redouane Tlemsani, Abdelkader Benyettou
Abstract:
Work is in on line Arabic character recognition and the principal motivation is to study the Arab manuscript with on line technology.
This system is a Markovian system, which one can see as like a Dynamic Bayesian Network (DBN). One of the major interests of these systems resides in the complete models training (topology and parameters) starting from training data.
Our approach is based on the dynamic Bayesian Networks formalism. The DBNs theory is a Bayesians networks generalization to the dynamic processes. Among our objective, amounts finding better parameters, which represent the links (dependences) between dynamic network variables.
In applications in pattern recognition, one will carry out the fixing of the structure, which obliges us to admit some strong assumptions (for example independence between some variables). Our application will relate to the Arabic isolated characters on line recognition using our laboratory database: NOUN. A neural tester proposed for DBN external optimization.
The DBN scores and DBN mixed are respectively 70.24% and 62.50%, which lets predict their further development; other approaches taking account time were considered and implemented until obtaining a significant recognition rate 94.79%.
Keywords: Arabic on line character recognition, dynamic Bayesian network, pattern recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16891986 Relevant LMA Features for Human Motion Recognition
Authors: Insaf Ajili, Malik Mallem, Jean-Yves Didier
Abstract:
Motion recognition from videos is actually a very complex task due to the high variability of motions. This paper describes the challenges of human motion recognition, especially motion representation step with relevant features. Our descriptor vector is inspired from Laban Movement Analysis method. We propose discriminative features using the Random Forest algorithm in order to remove redundant features and make learning algorithms operate faster and more effectively. We validate our method on MSRC-12 and UTKinect datasets.Keywords: Human motion recognition, Discriminative LMA features, random forest, features reduction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7151985 Object Identification with Color, Texture, and Object-Correlation in CBIR System
Authors: Awais Adnan, Muhammad Nawaz, Sajid Anwar, Tamleek Ali, Muhammad Ali
Abstract:
Needs of an efficient information retrieval in recent years in increased more then ever because of the frequent use of digital information in our life. We see a lot of work in the area of textual information but in multimedia information, we cannot find much progress. In text based information, new technology of data mining and data marts are now in working that were started from the basic concept of database some where in 1960. In image search and especially in image identification, computerized system at very initial stages. Even in the area of image search we cannot see much progress as in the case of text based search techniques. One main reason for this is the wide spread roots of image search where many area like artificial intelligence, statistics, image processing, pattern recognition play their role. Even human psychology and perception and cultural diversity also have their share for the design of a good and efficient image recognition and retrieval system. A new object based search technique is presented in this paper where object in the image are identified on the basis of their geometrical shapes and other features like color and texture where object-co-relation augments this search process. To be more focused on objects identification, simple images are selected for the work to reduce the role of segmentation in overall process however same technique can also be applied for other images.Keywords: Object correlation, Geometrical shape, Color, texture, features, contents.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19781984 Text Mining of Twitter Data Using a Latent Dirichlet Allocation Topic Model and Sentiment Analysis
Authors: Sidi Yang, Haiyi Zhang
Abstract:
Twitter is a microblogging platform, where millions of users daily share their attitudes, views, and opinions. Using a probabilistic Latent Dirichlet Allocation (LDA) topic model to discern the most popular topics in the Twitter data is an effective way to analyze a large set of tweets to find a set of topics in a computationally efficient manner. Sentiment analysis provides an effective method to show the emotions and sentiments found in each tweet and an efficient way to summarize the results in a manner that is clearly understood. The primary goal of this paper is to explore text mining, extract and analyze useful information from unstructured text using two approaches: LDA topic modelling and sentiment analysis by examining Twitter plain text data in English. These two methods allow people to dig data more effectively and efficiently. LDA topic model and sentiment analysis can also be applied to provide insight views in business and scientific fields.
Keywords: Text mining, Twitter, topic model, sentiment analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17311983 A File Splitting Technique for Reducing the Entropy of Text Files
Authors: Abdel-Rahman M. Jaradat, , Mansour I. Irshid, Talha T. Nassar
Abstract:
A novel file splitting technique for the reduction of the nth-order entropy of text files is proposed. The technique is based on mapping the original text file into a non-ASCII binary file using a new codeword assignment method and then the resulting binary file is split into several subfiles each contains one or more bits from each codeword of the mapped binary file. The statistical properties of the subfiles are studied and it is found that they reflect the statistical properties of the original text file which is not the case when the ASCII code is used as a mapper. The nth-order entropy of these subfiles are determined and it is found that the sum of their entropies is less than that of the original text file for the same values of extensions. These interesting statistical properties of the resulting subfiles can be used to achieve better compression ratios when conventional compression techniques are applied to these subfiles individually and on a bit-wise basis rather than on character-wise basis.
Keywords: Bit-wise compression, entropy, file splitting, source mapping.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13941982 Improved Zero Text Watermarking Algorithm against Meaning Preserving Attacks
Authors: Jalil Z., Farooq M., Zafar H., Sabir M., Ashraf E.
Abstract:
Internet is largely composed of textual contents and a huge volume of digital contents gets floated over the Internet daily. The ease of information sharing and re-production has made it difficult to preserve author-s copyright. Digital watermarking came up as a solution for copyright protection of plain text problem after 1993. In this paper, we propose a zero text watermarking algorithm based on occurrence frequency of non-vowel ASCII characters and words for copyright protection of plain text. The embedding algorithm makes use of frequency non-vowel ASCII characters and words to generate a specialized author key. The extraction algorithm uses this key to extract watermark, hence identify the original copyright owner. Experimental results illustrate the effectiveness of the proposed algorithm on text encountering meaning preserving attacks performed by five independent attackers.Keywords: Copyright protection, Digital watermarking, Document authentication, Information security, Watermark.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21151981 Experimental Study of Hyperparameter Tuning a Deep Learning Convolutional Recurrent Network for Text Classification
Authors: Bharatendra Rai
Abstract:
Sequences of words in text data have long-term dependencies and are known to suffer from vanishing gradient problem when developing deep learning models. Although recurrent networks such as long short-term memory networks help overcome this problem, achieving high text classification performance is a challenging problem. Convolutional recurrent networks that combine advantages of long short-term memory networks and convolutional neural networks, can be useful for text classification performance improvements. However, arriving at suitable hyperparameter values for convolutional recurrent networks is still a challenging task where fitting of a model requires significant computing resources. This paper illustrates the advantages of using convolutional recurrent networks for text classification with the help of statistically planned computer experiments for hyperparameter tuning.
Keywords: Convolutional recurrent networks, hyperparameter tuning, long short-term memory networks, Tukey honest significant differences
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 25