Search results for: visual text analytics tools
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2241

Search results for: visual text analytics tools

2151 Interactive, Topic-Oriented Search Support by a Centroid-Based Text Categorisation

Authors: Mario Kubek, Herwig Unger

Abstract:

Centroid terms are single words that semantically and topically characterise text documents and so may serve as their very compact representation in automatic text processing. In the present paper, centroids are used to measure the relevance of text documents with respect to a given search query. Thus, a new graphbased paradigm for searching texts in large corpora is proposed and evaluated against keyword-based methods. The first, promising experimental results demonstrate the usefulness of the centroid-based search procedure. It is shown that especially the routing of search queries in interactive and decentralised search systems can be greatly improved by applying this approach. A detailed discussion on further fields of its application completes this contribution.

Keywords: Search algorithm, centroid, query, keyword, cooccurrence, categorisation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 576
2150 Digital Watermarking Based on Visual Cryptography and Histogram

Authors: R. Rama Kishore, Sunesh

Abstract:

Nowadays, robust and secure watermarking algorithm and its optimization have been need of the hour. A watermarking algorithm is presented to achieve the copy right protection of the owner based on visual cryptography, histogram shape property and entropy. In this, both host image and watermark are preprocessed. Host image is preprocessed by using Butterworth filter, and watermark is with visual cryptography. Applying visual cryptography on water mark generates two shares. One share is used for embedding the watermark, and the other one is used for solving any dispute with the aid of trusted authority. Usage of histogram shape makes the process more robust against geometric and signal processing attacks. The combination of visual cryptography, Butterworth filter, histogram, and entropy can make the algorithm more robust, imperceptible, and copy right protection of the owner.

Keywords: Butterworth filter, digital watermarking, histogram, visual cryptography.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1613
2149 The Effects of Visual Elements and Cognitive Styles on Students Learning in Hypermedia Environment

Authors: Rishi Ruttun

Abstract:

One of the major features of hypermedia learning is its non-linear structure, allowing learners, the opportunity of flexible navigation to accommodate their own needs. Nevertheless, such flexibility can also cause problems such as insufficient navigation and disorientation for some learners, especially those with Field Dependent cognitive styles. As a result students learning performance can be deteriorated and in turn, they can have negative attitudes with hypermedia learning systems. It was suggested that visual elements can be used to compensate dilemmas. However, it is unclear whether these visual elements improve their learning or whether problems still exist. The aim of this study is to investigate the effect of students cognitive styles and visual elements on students learning performance and attitudes in hypermedia learning environment. Cognitive Style Analysis (CSA), Learning outcome in terms of pre and post-test, practical task, and Attitude Questionnaire (AQ) were administered to a sample of 60 university students. The findings revealed that FD students preformed equally to those of FI. Also, FD students experienced more disorientation in the hypermedia learning system where they depend a lot on the visual elements for navigation and orientation purposes. Furthermore, they had more positive attitudes towards the visual elements which escape them from experiencing navigation and disorientation dilemmas. In contrast, FI students were more comfortable, did not get disturbed or did not need some of the visual elements in the hypermedia learning system.

Keywords: Hypermedia learning, cognitive styles, visual elements, support, learning performance, attitudes and perceptions

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1627
2148 Connectionist Approach to Generic Text Summarization

Authors: Rajesh S.Prasad, U. V. Kulkarni, Jayashree.R.Prasad

Abstract:

As the enormous amount of on-line text grows on the World-Wide Web, the development of methods for automatically summarizing this text becomes more important. The primary goal of this research is to create an efficient tool that is able to summarize large documents automatically. We propose an Evolving connectionist System that is adaptive, incremental learning and knowledge representation system that evolves its structure and functionality. In this paper, we propose a novel approach for Part of Speech disambiguation using a recurrent neural network, a paradigm capable of dealing with sequential data. We observed that connectionist approach to text summarization has a natural way of learning grammatical structures through experience. Experimental results show that our approach achieves acceptable performance.

Keywords: Artificial Neural Networks (ANN); Computational Intelligence (CI); Connectionist Text Summarizer ECTS (ECTS); Evolving Connectionist systems; Evolving systems; Fuzzy systems (FS); Part of Speech (POS) disambiguation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1542
2147 A Case of Study for 3D Stereoscopic Conversion in Visual Effects Industry

Authors: Jin Zhi

Abstract:

This paper covered a series of key points in terms of 2D to 3D stereoscopic conversion. A successfully applied stereoscopic conversion approach in current visual effects industry was presented. The purpose of this paper is to cover a detailed workflow and concept, which has been successfully used in 3D stereoscopic conversion for feature films in visual effects industry, and therefore to clarify the process in stereoscopic conversion production and provide a clear idea for those entry-level artists to improve an overall understanding of 3D stereoscopic in digital compositing field as well as to the higher education factor of visual effects and hopefully inspire further collaboration and participants particularly between academia and industry.

Keywords: Clean plates, Mattes, Stereoscopic conversion, 3Dprojection, Z-depth.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2188
2146 Hybrid Machine Learning Approach for Text Categorization

Authors: Nerijus Remeikis, Ignas Skucas, Vida Melninkaite

Abstract:

Text categorization - the assignment of natural language documents to one or more predefined categories based on their semantic content - is an important component in many information organization and management tasks. Performance of neural networks learning is known to be sensitive to the initial weights and architecture. This paper discusses the use multilayer neural network initialization with decision tree classifier for improving text categorization accuracy. An adaptation of the algorithm is proposed in which a decision tree from root node until a final leave is used for initialization of multilayer neural network. The experimental evaluation demonstrates this approach provides better classification accuracy with Reuters-21578 corpus, one of the standard benchmarks for text categorization tasks. We present results comparing the accuracy of this approach with multilayer neural network initialized with traditional random method and decision tree classifiers.

Keywords: Text categorization, decision trees, neural networks, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1761
2145 Speech Encryption and Decryption Using Linear Feedback Shift Register (LFSR)

Authors: Tin Lai Win, Nant Christina Kyaw

Abstract:

This paper is taken into consideration the problem of cryptanalysis of stream ciphers. There is some attempts need to improve the existing attacks on stream cipher and to make an attempt to distinguish the portions of cipher text obtained by the encryption of plain text in which some parts of the text are random and the rest are non-random. This paper presents a tutorial introduction to symmetric cryptography. The basic information theoretic and computational properties of classic and modern cryptographic systems are presented, followed by an examination of the application of cryptography to the security of VoIP system in computer networks using LFSR algorithm. The implementation program will be developed Java 2. LFSR algorithm is appropriate for the encryption and decryption of online streaming data, e.g. VoIP (voice chatting over IP). This paper is implemented the encryption module of speech signals to cipher text and decryption module of cipher text to speech signals.

Keywords: Linear Feedback Shift Register.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3065
2144 Laser Welded Ni-Cr Dental Alloys Inspection

Authors: Porojan S., Sandu L., Topală F.

Abstract:

Minor problems arising from optimizations by welding of fixed prostheses frameworks can be identified by macroscopic and microscopic visual inspection. The purpose of this study was to highlight the visible discontinuities present in the laser welds of dental Ni-Cr alloys. Ni-Cr base metal alloys designated for fixed prostheses manufacture were selected for the experiments. Using cast plates, preliminary tests were conducted by laser welding. Macroscopic visual inspection was done carefully to assess the defects of the welding rib. Electron microscopy images allowed visualization of small discontinuities, which escapes visual inspection. Making comparison to Ni-Cr alloys taken in the experiment and laser welded, after visual analysis, the best welds appear for Heraenium NA alloy.

Keywords: macroscopic visual inspection, electron microscopyimages, Ni-Cr dental alloys, laser welding.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1506
2143 Improved Dynamic Bayesian Networks Applied to Arabic on Line Characters Recognition

Authors: Redouane Tlemsani, Abdelkader Benyettou

Abstract:

Work is in on line Arabic character recognition and the principal motivation is to study the Arab manuscript with on line technology.

This system is a Markovian system, which one can see as like a Dynamic Bayesian Network (DBN). One of the major interests of these systems resides in the complete models training (topology and parameters) starting from training data.

Our approach is based on the dynamic Bayesian Networks formalism. The DBNs theory is a Bayesians networks generalization to the dynamic processes. Among our objective, amounts finding better parameters, which represent the links (dependences) between dynamic network variables.

In applications in pattern recognition, one will carry out the fixing of the structure, which obliges us to admit some strong assumptions (for example independence between some variables). Our application will relate to the Arabic isolated characters on line recognition using our laboratory database: NOUN. A neural tester proposed for DBN external optimization.

The DBN scores and DBN mixed are respectively 70.24% and 62.50%, which lets predict their further development; other approaches taking account time were considered and implemented until obtaining a significant recognition rate 94.79%.

Keywords: Arabic on line character recognition, dynamic Bayesian network, pattern recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1688
2142 An Improved Algorithm of SPIHT based on the Human Visual Characteristics

Authors: Meng Wang, Qi-rui Han

Abstract:

Because of excellent properties, people has paid more attention to SPIHI algorithm, which is based on the traditional wavelet transformation theory, but it also has its shortcomings. Combined the progress in the present wavelet domain and the human's visual characteristics, we propose an improved algorithm based on human visual characteristics of SPIHT in the base of analysis of SPIHI algorithm. The experiment indicated that the coding speed and quality has been enhanced well compared to the original SPIHT algorithm, moreover improved the quality of the transmission cut off.

Keywords: Lifted wavelet transform, SPIHT, Human Visual Characteristics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1496
2141 Visual and Clinical Outcome in Patients with Corneal Lacerations

Authors: Avantika Verma

Abstract:

In industrialized nations, corneal lacerations are one of the most common reason for hospitalization. This study was designed to study visual and clinical outcome in patients presenting with full thickness corneal lacerations in Indian population and to ascertain the impact of various preoperative and operative factors influencing prognosis after repair of corneal lacerations. Males in third decade with injuries at work with metallic objects were common. Lens damage, hyphema, vitreous hemorrhage, retinal detachment and endophthalmitis were seen. All the patients underwent primary repair within first 24 hours of presentation. At 3 months, 74.3% had a good visual outcome. About 5.7% of patients had no perception of light.In conclusion, various demographic and preoperative factors like age, time of presentation, vision at presentation, length of corneal wound, involvement of visual axis, associated ocular features like hyphaema, lenticular changes, vitreous haemorrhage and retinal detachment are significant prognostic indicators for final visual outcome.

Keywords: Cornea, laceration, visual outcome, wound repair.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1046
2140 Architecture, Implementation and Application of Tools for Experimental Analysis

Authors: Tom Dowling, Adam Duffy

Abstract:

This paper presents an architecture to assist in the development of tools to perform experimental analysis. Existing implementations of tools based on this architecture are also described in this paper. These tools are applied to the real world problem of fault attack emulation and detection in cryptographic algorithms.

Keywords: Software Architectures and Design, Software Componentsand Reuse, Engineering Secure Software.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1344
2139 Text Mining of Twitter Data Using a Latent Dirichlet Allocation Topic Model and Sentiment Analysis

Authors: Sidi Yang, Haiyi Zhang

Abstract:

Twitter is a microblogging platform, where millions of users daily share their attitudes, views, and opinions. Using a probabilistic Latent Dirichlet Allocation (LDA) topic model to discern the most popular topics in the Twitter data is an effective way to analyze a large set of tweets to find a set of topics in a computationally efficient manner. Sentiment analysis provides an effective method to show the emotions and sentiments found in each tweet and an efficient way to summarize the results in a manner that is clearly understood. The primary goal of this paper is to explore text mining, extract and analyze useful information from unstructured text using two approaches: LDA topic modelling and sentiment analysis by examining Twitter plain text data in English. These two methods allow people to dig data more effectively and efficiently. LDA topic model and sentiment analysis can also be applied to provide insight views in business and scientific fields.

Keywords: Text mining, Twitter, topic model, sentiment analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1728
2138 A File Splitting Technique for Reducing the Entropy of Text Files

Authors: Abdel-Rahman M. Jaradat, , Mansour I. Irshid, Talha T. Nassar

Abstract:

A novel file splitting technique for the reduction of the nth-order entropy of text files is proposed. The technique is based on mapping the original text file into a non-ASCII binary file using a new codeword assignment method and then the resulting binary file is split into several subfiles each contains one or more bits from each codeword of the mapped binary file. The statistical properties of the subfiles are studied and it is found that they reflect the statistical properties of the original text file which is not the case when the ASCII code is used as a mapper. The nth-order entropy of these subfiles are determined and it is found that the sum of their entropies is less than that of the original text file for the same values of extensions. These interesting statistical properties of the resulting subfiles can be used to achieve better compression ratios when conventional compression techniques are applied to these subfiles individually and on a bit-wise basis rather than on character-wise basis.

Keywords: Bit-wise compression, entropy, file splitting, source mapping.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1392
2137 A Novel Arabic Text Steganography Method Using Letter Points and Extensions

Authors: Adnan Abdul-Aziz Gutub, Manal Mohammad Fattani

Abstract:

This paper presents a new steganography approach suitable for Arabic texts. It can be classified under steganography feature coding methods. The approach hides secret information bits within the letters benefiting from their inherited points. To note the specific letters holding secret bits, the scheme considers the two features, the existence of the points in the letters and the redundant Arabic extension character. We use the pointed letters with extension to hold the secret bit 'one' and the un-pointed letters with extension to hold 'zero'. This steganography technique is found attractive to other languages having similar texts to Arabic such as Persian and Urdu.

Keywords: Arabic text, Cryptography, Feature coding, Information security, Text steganography, Text watermarking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3458
2136 Exploring Pisa Monuments Using Mobile Augmented Reality

Authors: Mihai Duguleana, Florin Girbacia, Cristian Postelnicu, Raffaello Brodi, Marcello Carrozzino

Abstract:

Augmented Reality (AR) has taken a big leap with the introduction of mobile applications which co-locate bi-dimensional (e.g. photo, video, text) and tridimensional information with the location of the user enriching his/her experience. This study presents the advantages of using Mobile Augmented Reality (MAR) technologies in traveling applications, improving cultural heritage exploration. We propose a location-based AR application which combines co-location with the augmented visual information about Pisa monuments to establish a friendly navigation in this historic city. AR was used to render contextual visual information in the outdoor environment. The developed Android-based application offers two different options: it provides the ability to identify the monuments positioned close to the user’s position and it offers location information for getting near the key touristic objectives. We present the process of creating the monuments’ 3D map database and the navigation algorithm.

Keywords: Augmented reality, electronic compass, GPS, location-based service.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1648
2135 An Improved Method to Watermark Images Sensitive to Blocking Artifacts

Authors: Afzel Noore

Abstract:

A new digital watermarking technique for images that are sensitive to blocking artifacts is presented. Experimental results show that the proposed MDCT based approach produces highly imperceptible watermarked images and is robust to attacks such as compression, noise, filtering and geometric transformations. The proposed MDCT watermarking technique is applied to fingerprints for ensuring security. The face image and demographic text data of an individual are used as multiple watermarks. An AFIS system was used to quantitatively evaluate the matching performance of the MDCT-based watermarked fingerprint. The high fingerprint matching scores show that the MDCT approach is resilient to blocking artifacts. The quality of the extracted face and extracted text images was computed using two human visual system metrics and the results show that the image quality was high.

Keywords: Digital watermarking, data hiding, modified discretecosine transformation (MDCT).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1563
2134 Improved Zero Text Watermarking Algorithm against Meaning Preserving Attacks

Authors: Jalil Z., Farooq M., Zafar H., Sabir M., Ashraf E.

Abstract:

Internet is largely composed of textual contents and a huge volume of digital contents gets floated over the Internet daily. The ease of information sharing and re-production has made it difficult to preserve author-s copyright. Digital watermarking came up as a solution for copyright protection of plain text problem after 1993. In this paper, we propose a zero text watermarking algorithm based on occurrence frequency of non-vowel ASCII characters and words for copyright protection of plain text. The embedding algorithm makes use of frequency non-vowel ASCII characters and words to generate a specialized author key. The extraction algorithm uses this key to extract watermark, hence identify the original copyright owner. Experimental results illustrate the effectiveness of the proposed algorithm on text encountering meaning preserving attacks performed by five independent attackers.

Keywords: Copyright protection, Digital watermarking, Document authentication, Information security, Watermark.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2110
2133 A Collaborative Framework for Visual Modeling on Web 2.0

Authors: Song Meng, Dianfu Ma, Yongwang Zhao, Jianxin Li

Abstract:

Cooperative visual modeling is more and more necessary in our complicated world. A collaborative environment which supports interactive operation and communication is required to increase work efficiency. We present a collaborative visual modeling framework which collaborative platform could be built on. On this platform, cooperation and communication is available for designers from different regions. This framework, which is different from other collaborative frameworks, contains a uniform message format, a message handling mechanism and other functions such as message pretreatment and Role-Communication-Token Access Control (RCTAC). We also show our implementation of this framework called Orchestra Designer, which support BPLE workflow modeling cooperatively online.

Keywords: colllaborative framework; visual modeling; message handling mechanism

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1507
2132 Experimental Study of Hyperparameter Tuning a Deep Learning Convolutional Recurrent Network for Text Classification

Authors: Bharatendra Rai

Abstract:

Sequences of words in text data have long-term dependencies and are known to suffer from vanishing gradient problem when developing deep learning models. Although recurrent networks such as long short-term memory networks help overcome this problem, achieving high text classification performance is a challenging problem. Convolutional recurrent networks that combine advantages of long short-term memory networks and convolutional neural networks, can be useful for text classification performance improvements. However, arriving at suitable hyperparameter values for convolutional recurrent networks is still a challenging task where fitting of a model requires significant computing resources. This paper illustrates the advantages of using convolutional recurrent networks for text classification with the help of statistically planned computer experiments for hyperparameter tuning. 

Keywords: Convolutional recurrent networks, hyperparameter tuning, long short-term memory networks, Tukey honest significant differences

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18
2131 Multiplayer RC-Car Driving System in a Collaborative Augmented Reality Environment

Authors: Kikuo Asai, Yuji Sugimoto

Abstract:

We developed a prototype system for multiplayer RC-car driving in a collaborative augmented reality (AR) environment. The tele-existence environment is constructed by superimposing digital data onto images captured by a camera on an RC-car, enabling players to experience an augmented coexistence of the digital content and the real world. Marker-based tracking was used for estimating position and orientation of the camera. The plural RC-cars can be operated in a field where square markers are arranged. The video images captured by the camera are transmitted to a PC for visual tracking. The RC-cars are also tracked by using an infrared camera attached to the ceiling, so that the instability is reduced in the visual tracking. Multimedia data such as texts and graphics are visualized to be overlaid onto the video images in the geometrically correct manner. The prototype system allows a tele-existence sensation to be augmented in a collaborative AR environment.

Keywords: Multiplayer, RC-car, Collaborative Environment, Augmented Reality.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2014
2130 Efficient Tools for Managing Uncertainties in Design and Operation of Engineering Structures

Authors: J. Menčík

Abstract:

Actual load, material characteristics and other quantities often differ from the design values. This can cause worse function, shorter life or failure of a civil engineering structure, a machine, vehicle or another appliance. The paper shows main causes of the uncertainties and deviations and presents a systematic approach and efficient tools for their elimination or mitigation of consequences. Emphasis is put on the design stage, which is most important for reliability ensuring. Principles of robust design and important tools are explained, including FMEA, sensitivity analysis and probabilistic simulation methods. The lifetime prediction of long-life objects can be improved by long-term monitoring of the load response and damage accumulation in operation. The condition evaluation of engineering structures, such as bridges, is often based on visual inspection and verbal description. Here, methods based on fuzzy logic can reduce the subjective influences.

Keywords: Design, fuzzy methods, Monte Carlo, reliability, robust design, sensitivity analysis, simulation, uncertainties.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1767
2129 HSV Image Watermarking Scheme Based on Visual Cryptography

Authors: Rawan I. Zaghloul, Enas F. Al-Rawashdeh

Abstract:

In this paper a simple watermarking method for color images is proposed. The proposed method is based on watermark embedding for the histograms of the HSV planes using visual cryptography watermarking. The method has been proved to be robust for various image processing operations such as filtering, compression, additive noise, and various geometrical attacks such as rotation, scaling, cropping, flipping, and shearing.

Keywords: Histogram, HSV image, Visual Cryptography, Watermark.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1925
2128 Enhance Construction Visual As-Built Schedule Management Using BIM Technology

Authors: Shu-Hui Jan, Hui-Ping Tserng, Shih-Ping Ho

Abstract:

Construction project control attempts to obtain real-time as-built schedule information and to eliminate project delays by effectively enhancing dynamic schedule control and management. Suitable platforms for enhancing an as-built schedule visually during the construction phase are necessary and important for general contractors. As the application of building information modeling (BIM) becomes more common, schedule management integrated with the BIM approach becomes essential to enhance visual construction management implementation for the general contractor during the construction phase. To enhance visualization of the updated as-built schedule for the general contractor, this study presents a novel system called the Construction BIM-assisted Schedule Management (ConBIM-SM) system for general contractors in Taiwan. The primary purpose of this study is to develop a web ConBIM-SM system for the general contractor to enhance visual as-built schedule information sharing and efficiency in tracking construction as-built schedule. Finally, the ConBIM-SM system is applied to a case study of a commerce building project in Taiwan to verify its efficacy and demonstrate its effectiveness during the construction phase. The advantages of the ConBIM-SM system lie in improved project control and management efficiency for general contractors, and in providing BIM-assisted as-built schedule tracking and management, to access the most current as-built schedule information through a web browser. The case study results show that the ConBIM-SM system is an effective visual as-built schedule management platform integrated with the BIM approach for general contractors in a construction project.

Keywords: BIM, Building information modeling, construction schedule management, as-built schedule management, BIM schedule updating mechanism.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3353
2127 Mining Association Rules from Unstructured Documents

Authors: Hany Mahgoub

Abstract:

This paper presents a system for discovering association rules from collections of unstructured documents called EART (Extract Association Rules from Text). The EART system treats texts only not images or figures. EART discovers association rules amongst keywords labeling the collection of textual documents. The main characteristic of EART is that the system integrates XML technology (to transform unstructured documents into structured documents) with Information Retrieval scheme (TF-IDF) and Data Mining technique for association rules extraction. EART depends on word feature to extract association rules. It consists of four phases: structure phase, index phase, text mining phase and visualization phase. Our work depends on the analysis of the keywords in the extracted association rules through the co-occurrence of the keywords in one sentence in the original text and the existing of the keywords in one sentence without co-occurrence. Experiments applied on a collection of scientific documents selected from MEDLINE that are related to the outbreak of H5N1 avian influenza virus.

Keywords: Association rules, information retrieval, knowledgediscovery in text, text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2392
2126 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni

Abstract:

The problem of Entity relation discovery in structured data, a well covered topic in literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary, or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations - a cooccurrence graph. Indeed each cooccurrence highlights some grade of semantic correlation between the words because it is more common to have related words close each other than having them in the opposite sides of the text. Some authors have used sliding windows for such problem: they count all the occurrences within a sliding windows running over the whole text. In this paper we generalise such technique, coming up to a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is accounted with a weight depending on the distance between items: a closer distance implies a stronger evidence of a relationship. We develop an experiment in order to support this intuition, by applying this technique to a data set consisting in the text of the Bible, split into verses.

Keywords: Cooccurrence graph, entity relation graph, unstructured text, weighted distance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 627
2125 Development of Multimodal e-Slide Presentation to Support Self-Learning for the Visually Impaired

Authors: Rustam Asnawi, Wan Fatimah Wan Ahmad

Abstract:

Currently electronic slide (e-slide) is one of the most common styles in educational presentation. Unfortunately, the utilization of e-slide for the visually impaired is uncommon since they are unable to see the content of such e-slides which are usually composed of text, images and animation. This paper proposes a model for presenting e-slide in multimodal presentation i.e. using conventional slide concurrent with voicing, in both languages Malay and English. At the design level, live multimedia presentation concept is used, while at the implementation level several components are used. The text content of each slide is extracted using COM component, Microsoft Speech API for voicing the text in English language and the text in Malay language is voiced using dictionary approach. To support the accessibility, an auditory user interface is provided as an additional feature. A prototype of such model named as VSlide has been developed and introduced.

Keywords: presentation, self-learning, slide, visually impaired

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1514
2124 The Influence of Preprocessing Parameters on Text Categorization

Authors: Jan Pomikalek, Radim Rehurek

Abstract:

Text categorization (the assignment of texts in natural language into predefined categories) is an important and extensively studied problem in Machine Learning. Currently, popular techniques developed to deal with this task include many preprocessing and learning algorithms, many of which in turn require tuning nontrivial internal parameters. Although partial studies are available, many authors fail to report values of the parameters they use in their experiments, or reasons why these values were used instead of others. The goal of this work then is to create a more thorough comparison of preprocessing parameters and their mutual influence, and report interesting observations and results.

Keywords: Text categorization, machine learning, electronic documents, classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1534
2123 Slovenian Text-to-Speech Synthesis for Speech User Interfaces

Authors: Jerneja Žganec Gros, Aleš Mihelič, Nikola Pavešić, Mario Žganec, Stanislav Gruden

Abstract:

The paper presents the design concept of a unitselection text-to-speech synthesis system for the Slovenian language. Due to its modular and upgradable architecture, the system can be used in a variety of speech user interface applications, ranging from server carrier-grade voice portal applications, desktop user interfaces to specialized embedded devices. Since memory and processing power requirements are important factors for a possible implementation in embedded devices, lexica and speech corpora need to be reduced. We describe a simple and efficient implementation of a greedy subset selection algorithm that extracts a compact subset of high coverage text sentences. The experiment on a reference text corpus showed that the subset selection algorithm produced a compact sentence subset with a small redundancy. The adequacy of the spoken output was evaluated by several subjective tests as they are recommended by the International Telecommunication Union ITU.

Keywords: text-to-speech synthesis, prosody modeling, speech user interface.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1396
2122 Genetic Mining: Using Genetic Algorithm for Topic based on Concept Distribution

Authors: S. M. Khalessizadeh, R. Zaefarian, S.H. Nasseri, E. Ardil

Abstract:

Today, Genetic Algorithm has been used to solve wide range of optimization problems. Some researches conduct on applying Genetic Algorithm to text classification, summarization and information retrieval system in text mining process. This researches show a better performance due to the nature of Genetic Algorithm. In this paper a new algorithm for using Genetic Algorithm in concept weighting and topic identification, based on concept standard deviation will be explored.

Keywords: Genetic Algorithm, Text Mining, Term Weighting, Concept Extraction, Concept Distribution.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3643