Search results for: Text Processing
1934 Signal Driven Sampling and Filtering a Promising Approach for Time Varying Signals Processing
Authors: Saeed Mian Qaisar, Laurent Fesquet, Marc Renaudin
Abstract:
The mobile systems are powered by batteries. Reducing the system power consumption is a key to increase its autonomy. It is known that mostly the systems are dealing with time varying signals. Thus, we aim to achieve power efficiency by smartly adapting the system processing activity in accordance with the input signal local characteristics. It is done by completely rethinking the processing chain, by adopting signal driven sampling and processing. In this context, a signal driven filtering technique, based on the level crossing sampling is devised. It adapts the sampling frequency and the filter order by analysing the input signal local variations. Thus, it correlates the processing activity with the signal variations. It leads towards a drastic computational gain of the proposed technique compared to the classical one.Keywords: Level Crossing Sampling, Activity Selection, Adaptive Rate Filtering, Computational Complexity.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13611933 A New Recognition Scheme for Machine- Printed Arabic Texts based on Neural Networks
Authors: Z. Shaaban
Abstract:
This paper presents a new approach to tackle the problem of recognizing machine-printed Arabic texts. Because of the difficulty of recognizing cursive Arabic words, the text has to be normalized and segmented to be ready for the recognition stage. The new scheme for recognizing Arabic characters depends on multiple parallel neural networks classifier. The classifier has two phases. The first phase categories the input character into one of eight groups. The second phase classifies the character into one of the Arabic character classes in the group. The system achieved high recognition rate.
Keywords: Neural Networks, character recognition, feature extraction, multiple networks, Arabic text.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14771932 Adaptive Kernel Filtering Used in Video Processing
Authors: Rasmus Engholm, Eva B. Vedel Jensen, Henrik Karstoft
Abstract:
In this paper we present a noise reduction filter for video processing. It is based on the recently proposed two dimensional steering kernel, extended to three dimensions and further augmented to suit the spatial-temporal domain of video processing. Two alternative filters are proposed - the time symmetric kernel and the time asymmetric kernel. The first reduces the noise on single sequences, but to handle the problems at scene shift the asymmetric kernel is introduced. The performance of both are tested on simulated data and on a real video sequence together with the existing steering kernel. The proposed kernels improves the Rooted Mean Squared Error (RMSE) compared to the original steering kernel method on video material.
Keywords: Adaptive image filtering, noise reduction, kernel methods, video processing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14701931 The Challenges of Hyper-Textual Learning Approach for Religious Education
Authors: Elham Shirvani–Ghadikolaei, Seyed Mahdi Sajjadi
Abstract:
State of the art technology has the tremendous impact on our life, in this situation education system have been influenced as well as. In this paper, tried to compare two space of learning text and hypertext with each other, and some challenges of using hypertext in religious education. Regarding the fact that, hypertext is an undeniable part of learning in this world and it has highly beneficial for the education process from class to office and home. In this paper tried to solve this question: the consequences and challenges of applying hypertext in religious education. Also, the consequences of this survey demonstrate the role of curriculum designer and planner of education to solve this problem.
Keywords: Hyper-textual, education, religious text, religious education.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14181930 Thematic Role Extraction Using Shallow Parsing
Authors: Mehrnoush Shamsfard, Maryam Sadr Mousavi
Abstract:
Extracting thematic (semantic) roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a rule-based approach to extract semantic roles from Persian sentences. The system exploits a twophase architecture to (1) identify the arguments and (2) label them for each predicate. For the first phase we developed a rule based shallow parser to chunk Persian sentences and for the second phase we developed a knowledge-based system to assign 16 selected thematic roles to the chunks. The experimental results of testing each phase are shown at the end of the paper.Keywords: Natural Language Processing, Semantic RoleLabeling, Shallow parsing, Thematic Roles.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20271929 Experimental Design and Performance Analysis in Plasma Arc Surface Hardening
Authors: M.I.S. Ismail, Z. Taha
Abstract:
In this paper, the experimental design of using the Taguchi method is employed to optimize the processing parameters in the plasma arc surface hardening process. The processing parameters evaluated are arc current, scanning velocity and carbon content of steel. In addition, other significant effects such as the relation between processing parameters are also investigated. An orthogonal array, signal-to-noise (S/N) ratio and analysis of variance (ANOVA) are employed to investigate the effects of these processing parameters. Through this study, not only the hardened depth increased and surface roughness improved, but also the parameters that significantly affect the hardening performance are identified. Experimental results are provided to verify the effectiveness of this approach.Keywords: Plasma arc, hardened depth, surface roughness, Taguchi method, optimization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23601928 A Study of the Variability of Very Low Resolution Characters and the Feasibility of Their Discrimination Using Geometrical Features
Authors: Farshideh Einsele, Rolf Ingold
Abstract:
Current OCR technology does not allow to accurately recognizing small text images, such as those found in web images. Our goal is to investigate new approaches to recognize very low resolution text images containing antialiased character shapes. This paper presents a preliminary study on the variability of such characters and the feasibility to discriminate them by using geometrical features. In a first stage we analyze the distribution of these features. In a second stage we present a study on the discriminative power for recognizing isolated characters, using various rendering methods and font properties. Finally we present interesting results of our evaluation tests leading to our conclusion and future focus.Keywords: World Wide Web, document analysis, pattern recognition, Optical Character Recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13711927 Thermo-Mechanical Treatment of Chromium Alloyed Low Carbon Steel
Authors: L. Kučerová, M. Bystrianský, V. Kotěšovec
Abstract:
Thermo-mechanical processing with various processing parameters was applied to 0.2%C-0.6%Mn-2S%i-0.8%Cr low alloyed high strength steel. The aim of the processing was to achieve the microstructures typical for transformation induced plasticity (TRIP) steels. Thermo-mechanical processing used in this work incorporated two or three deformation steps. The deformations were in all the cases carried out during the cooling from soaking temperatures to various bainite hold temperatures. In this way, 4-10% of retained austenite were retained in the final microstructures, consisting further of ferrite, bainite, martensite and pearlite. The complex character of TRIP steel microstructure is responsible for its good strength and ductility. The strengths achieved in this work were in the range of 740 MPa – 836 MPa with ductility A5mm of 31-41%.Keywords: Pearlite, retained austenite, thermo-mechanical treatment, TRIP steel.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8491926 Component-based Segmentation of Words from Handwritten Arabic Text
Authors: Jawad H AlKhateeb, Jianmin Jiang, Jinchang Ren, Stan S Ipson
Abstract:
Efficient preprocessing is very essential for automatic recognition of handwritten documents. In this paper, techniques on segmenting words in handwritten Arabic text are presented. Firstly, connected components (ccs) are extracted, and distances among different components are analyzed. The statistical distribution of this distance is then obtained to determine an optimal threshold for words segmentation. Meanwhile, an improved projection based method is also employed for baseline detection. The proposed method has been successfully tested on IFN/ENIT database consisting of 26459 Arabic words handwritten by 411 different writers, and the results were promising and very encouraging in more accurate detection of the baseline and segmentation of words for further recognition.Keywords: Arabic OCR, off-line recognition, Baseline estimation, Word segmentation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22061925 Characterization of 3D-MRP for Analyzing of Brain Balancing Index (BBI) Pattern
Authors: N. Fuad, M. N. Taib, R. Jailani, M. E. Marwan
Abstract:
This paper discusses on power spectral density (PSD) characteristics which are extracted from three-dimensional (3D) electroencephalogram (EEG) models. The EEG signal recording was conducted on 150 healthy subjects. Development of 3D EEG models involves pre-processing of raw EEG signals and construction of spectrogram images. Then, the values of maximum PSD were extracted as features from the model. These features are analyzed using mean relative power (MRP) and different mean relative power (DMRP) technique to observe the pattern among different brain balancing indexes. The results showed that by implementing these techniques, the pattern of brain balancing indexes can be clearly observed. Some patterns are indicates between index 1 to index 5 for left frontal (LF) and right frontal (RF).
Keywords: Power spectral density, 3D EEG model, brain balancing, mean relative power, different mean relative power.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19151924 Emotions in Health Tweets: Analysis of American Government Official Accounts
Authors: García López
Abstract:
The Government Departments of Health have the task of informing and educating citizens about public health issues. For this, they use channels like Twitter, key in the search for health information and the propagation of content. The tweets, important in the virality of the content, may contain emotions that influence the contagion and exchange of knowledge. The goal of this study is to perform an analysis of the emotional projection of health information shared on Twitter by official American accounts: the disease control account CDCgov, National Institutes of Health, NIH, the government agency HHSGov, and the professional organization PublicHealth. For this, we used Tone Analyzer, an International Business Machines Corporation (IBM) tool specialized in emotion detection in text, corresponding to the categorical model of emotion representation. For 15 days, all tweets from these accounts were analyzed with the emotional analysis tool in text. The results showed that their tweets contain an important emotional load, a determining factor in the success of their communications. This exposes that official accounts also use subjective language and contain emotions. The predominance of emotion joy over sadness and the strong presence of emotions in their tweets stimulate the virality of content, a key in the work of informing that government health departments have.
Keywords: Emotions in tweets emotion detection in text, health information on Twitter, American health official accounts, emotions on Twitter, emotions and content.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6971923 Techniques with Statistics for Web Page Watermarking
Authors: Mohamed Lahcen BenSaad, Sun XingMing
Abstract:
Information hiding, especially watermarking is a promising technique for the protection of intellectual property rights. This technology is mainly advanced for multimedia but the same has not been done for text. Web pages, like other documents, need a protection against piracy. In this paper, some techniques are proposed to show how to hide information in web pages using some features of the markup language used to describe these pages. Most of the techniques proposed here use the white space to hide information or some varieties of the language in representing elements. Experiments on a very small page and analysis of five thousands web pages show that these techniques have a wide bandwidth available for information hiding, and they might form a solid base to develop a robust algorithm for web page watermarking.Keywords: Digital Watermarking, Information Hiding, Markup Language, Text watermarking, Software Watermarking.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17941922 Compression of Semistructured Documents
Authors: Leo Galambos, Jan Lansky, Katsiaryna Chernik
Abstract:
EGOTHOR is a search engine that indexes the Web and allows us to search the Web documents. Its hit list contains URL and title of the hits, and also some snippet which tries to shortly show a match. The snippet can be almost always assembled by an algorithm that has a full knowledge of the original document (mostly HTML page). It implies that the search engine is required to store the full text of the documents as a part of the index. Such a requirement leads us to pick up an appropriate compression algorithm which would reduce the space demand. One of the solutions could be to use common compression methods, for instance gzip or bzip2, but it might be preferable if we develop a new method which would take advantage of the document structure, or rather, the textual character of the documents. There already exist a special compression text algorithms and methods for a compression of XML documents. The aim of this paper is an integration of the two approaches to achieve an optimal level of the compression ratioKeywords: Compression, search engine, HTML, XML.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15771921 A Combined Cipher Text Policy Attribute-Based Encryption and Timed-Release Encryption Method for Securing Medical Data in Cloud
Authors: G. Shruthi, Purohit Shrinivasacharya
Abstract:
The biggest problem in cloud is securing an outsourcing data. A cloud environment cannot be considered to be trusted. It becomes more challenging when outsourced data sources are managed by multiple outsourcers with different access rights. Several methods have been proposed to protect data confidentiality against the cloud service provider to support fine-grained data access control. We propose a method with combined Cipher Text Policy Attribute-based Encryption (CP-ABE) and Timed-release encryption (TRE) secure method to control medical data storage in public cloud.Keywords: Attribute, encryption, security, trapdoor.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7591920 Event Information Extraction System (EIEE): FSM vs HMM
Authors: Shaukat Wasi, Zubair A. Shaikh, Sajid Qasmi, Hussain Sachwani, Rehman Lalani, Aamir Chagani
Abstract:
Automatic Extraction of Event information from social text stream (emails, social network sites, blogs etc) is a vital requirement for many applications like Event Planning and Management systems and security applications. The key information components needed from Event related text are Event title, location, participants, date and time. Emails have very unique distinctions over other social text streams from the perspective of layout and format and conversation style and are the most commonly used communication channel for broadcasting and planning events. Therefore we have chosen emails as our dataset. In our work, we have employed two statistical NLP methods, named as Finite State Machines (FSM) and Hidden Markov Model (HMM) for the extraction of event related contextual information. An application has been developed providing a comparison among the two methods over the event extraction task. It comprises of two modules, one for each method, and works for both bulk as well as direct user input. The results are evaluated using Precision, Recall and F-Score. Experiments show that both methods produce high performance and accuracy, however HMM was good enough over Title extraction and FSM proved to be better for Venue, Date, and time.Keywords: Emails, Event Extraction, Event Detection, Finite state machines, Hidden Markov Model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23171919 Designing Ontology-Based Knowledge Integration for Preprocessing of Medical Data in Enhancing a Machine Learning System for Coding Assignment of a Multi-Label Medical Text
Authors: Phanu Waraporn
Abstract:
This paper discusses the designing of knowledge integration of clinical information extracted from distributed medical ontologies in order to ameliorate a machine learning-based multilabel coding assignment system. The proposed approach is implemented using a decision tree technique of the machine learning on the university hospital data for patients with Coronary Heart Disease (CHD). The preliminary results obtained show a satisfactory finding that the use of medical ontologies improves the overall system performance.
Keywords: Medical Ontology, Knowledge Integration, Machine Learning, Medical Coding, Text Assignment.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18501918 The Effects of RCA Clean Variables on Particle Removal Efficiency
Authors: Siti Kudnie Sahari, Jane Chai Hai Sing, Khairuddin Ab. Hamid
Abstract:
Shrunken patterning for integrated device manufacturing requires surface cleanliness and surface smoothness in wet chemical processing [1]. It is necessary to control all process parameters perfectly especially for the common cleaning technique RCA clean (SC-1 and SC-2) [2]. In this paper the characteristic and effect of surface preparation parameters are discussed. The properties of RCA wet chemical processing in silicon technology is based on processing time, temperature, concentration and megasonic power of SC-1 and QDR. An improvement of wafer surface preparation by the enhanced variables of the wet cleaning chemical process is proposed.Keywords: RCA, SC-1, SC-2, QDR
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 32421917 Rough Neural Networks in Adapting Cellular Automata Rule for Reducing Image Noise
Authors: Yasser F. Hassan
Abstract:
The reduction or removal of noise in a color image is an essential part of image processing, whether the final information is used for human perception or for an automatic inspection and analysis. This paper describes the modeling system based on the rough neural network model to adaptive cellular automata for various image processing tasks and noise remover. In this paper, we consider the problem of object processing in colored image using rough neural networks to help deriving the rules which will be used in cellular automata for noise image. The proposed method is compared with some classical and recent methods. The results demonstrate that the new model is capable of being trained to perform many different tasks, and that the quality of these results is comparable or better than established specialized algorithms.
Keywords: Rough Sets, Rough Neural Networks, Cellular Automata, Image Processing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19481916 Evaluation of Sensitometric Properties of Radiographic Films at Different Processing Solutions
Authors: Mojiri M, Ghazi Khanloo Sani K, Moghim Beigi A
Abstract:
The aim of this study was to compare the sensitometric properties of commonly used radiographic films processed with chemical solutions in different workload hospitals. The effect of different processing conditions on induced densities on radiologic films was investigated. Two accessible double emulsions Fuji and Kodak films were exposed with 11-step wedge and processed with Champion and CPAC processing solutions. The mentioned films provided in both workloads centers, high and low. Our findings displays that the speed and contrast of Kodak filmscreen in both work load (high and low) is higher than Fuji filmscreen for both processing solutions. However there was significant differences in films contrast for both workloads when CPAC solution had been used (p=0.000 and 0.028). The results showed base plus fog density for Kodak film was lower than Fuji. Generally Champion processing solution caused more speed and contrast for investigated films in different conditions and there was significant differences in 95% confidence level between two used processing solutions (p=0.01). Low base plus fog density for Kodak films provide more visibility and accuracy and higher contrast results in using lower exposure factors to obtain better quality in resulting radiographs. In this study we found an economic advantages since Champion solution and Kodak film are used while it makes lower patient dose. Thus, in a radiologic facility any change in film processor/processing cycle or chemistry should be carefully investigated before radiological procedures of patients are acquired.Keywords: Sensitometry, densitometry, Radiographic film, processing solution
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16271915 Expert System for Chose Material used Gears
Authors: E.V. Butilă, F. Gîrbacia
Abstract:
In order to give high expertise the computer aided design of mechanical systems involves specific activities focused on processing two type of information: knowledge and data. Expert rule based knowledge is generally processing qualitative information and involves searching for proper solutions and their combination into synthetic variant. Data processing is based on computational models and it is supposed to be inter-related with reasoning in the knowledge processing. In this paper an Intelligent Integrated System is proposed, for the objective of choosing the adequate material. The software is developed in Prolog – Flex software and takes into account various constraints that appear in the accurate operation of gears.Keywords: Expert System, computer aided design, gear boxdesign, chose material, Prolog, Flex
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16951914 Urdu Nastaleeq Optical Character Recognition
Authors: Zaheer Ahmad, Jehanzeb Khan Orakzai, Inam Shamsher, Awais Adnan
Abstract:
This paper discusses the Urdu script characteristics, Urdu Nastaleeq and a simple but a novel and robust technique to recognize the printed Urdu script without a lexicon. Urdu being a family of Arabic script is cursive and complex script in its nature, the main complexity of Urdu compound/connected text is not its connections but the forms/shapes the characters change when it is placed at initial, middle or at the end of a word. The characters recognition technique presented here is using the inherited complexity of Urdu script to solve the problem. A word is scanned and analyzed for the level of its complexity, the point where the level of complexity changes is marked for a character, segmented and feeded to Neural Networks. A prototype of the system has been tested on Urdu text and currently achieves 93.4% accuracy on the average.Keywords: Cursive Script, OCR, Urdu.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 27761913 Inefficiency of Data Storing in Physical Memory
Authors: Kamaruddin Malik Mohamad, Sapiee Haji Jamel, Mustafa Mat Deris
Abstract:
Memory forensic is important in digital investigation. The forensic is based on the data stored in physical memory that involve memory management and processing time. However, the current forensic tools do not consider the efficiency in terms of storage management and the processing time. This paper shows the high redundancy of data found in the physical memory that cause inefficiency in processing time and memory management. The experiment is done using Borland C compiler on Windows XP with 512 MB of physical memory.Keywords: Digital Evidence, Memory Forensics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20191912 A Stereo Image Processing System for Visually Impaired
Authors: G. Balakrishnan, G. Sainarayanan, R. Nagarajan, Sazali Yaacob
Abstract:
This paper presents a review on vision aided systems and proposes an approach for visual rehabilitation using stereo vision technology. The proposed system utilizes stereo vision, image processing methodology and a sonification procedure to support blind navigation. The developed system includes a wearable computer, stereo cameras as vision sensor and stereo earphones, all moulded in a helmet. The image of the scene infront of visually handicapped is captured by the vision sensors. The captured images are processed to enhance the important features in the scene in front, for navigation assistance. The image processing is designed as model of human vision by identifying the obstacles and their depth information. The processed image is mapped on to musical stereo sound for the blind-s understanding of the scene infront. The developed method has been tested in the indoor and outdoor environments and the proposed image processing methodology is found to be effective for object identification.Keywords: Blind navigation, stereo vision, image processing, object preference, music tones.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 41151911 Feature Selection Methods for an Improved SVM Classifier
Authors: Daniel Morariu, Lucian N. Vintan, Volker Tresp
Abstract:
Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, three feature selection methods are evaluated: Random Selection, Information Gain (IG) and Support Vector Machine feature selection (called SVM_FS). We show that the best results were obtained with SVM_FS method for a relatively small dimension of the feature vector. Also we present a novel method to better correlate SVM kernel-s parameters (Polynomial or Gaussian kernel).Keywords: Feature Selection, Learning with Kernels, SupportVector Machine, and Classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18291910 Automatic Music Score Recognition System Using Digital Image Processing
Authors: Yuan-Hsiang Chang, Zhong-Xian Peng, Li-Der Jeng
Abstract:
Music has always been an integral part of human’s daily lives. But, for the most people, reading musical score and turning it into melody is not easy. This study aims to develop an Automatic music score recognition system using digital image processing, which can be used to read and analyze musical score images automatically. The technical approaches included: (1) staff region segmentation; (2) image preprocessing; (3) note recognition; and (4) accidental and rest recognition. Digital image processing techniques (e.g., horizontal /vertical projections, connected component labeling, morphological processing, template matching, etc.) were applied according to musical notes, accidents, and rests in staff notations. Preliminary results showed that our system could achieve detection and recognition rates of 96.3% and 91.7%, respectively. In conclusion, we presented an effective automated musical score recognition system that could be integrated in a system with a media player to play music/songs given input images of musical score. Ultimately, this system could also be incorporated in applications for mobile devices as a learning tool, such that a music player could learn to play music/songs.
Keywords: Connected component labeling, image processing, morphological processing, optical musical recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19311909 Motion Parameter Estimation via Dopplerlet-Transform-Based Matched Field Processing
Authors: Hongyan Dai
Abstract:
This work presents a matched field processing (MFP) algorithm based on Dopplerlet transform for estimating the motion parameters of a sound source moving along a straight line and with a constant speed by using a piecewise strategy, which can significantly reduce the computational burden. Monte Carlo simulation results and an experimental result are presented to verify the effectiveness of the algorithm advocated.Keywords: matched field processing; Dopplerlet transform; motion parameter estimation; piecewise strategy
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12261908 A Similarity Measure for Clustering and its Applications
Authors: Guadalupe J. Torres, Ram B. Basnet, Andrew H. Sung, Srinivas Mukkamala, Bernardete M. Ribeiro
Abstract:
This paper introduces a measure of similarity between two clusterings of the same dataset produced by two different algorithms, or even the same algorithm (K-means, for instance, with different initializations usually produce different results in clustering the same dataset). We then apply the measure to calculate the similarity between pairs of clusterings, with special interest directed at comparing the similarity between various machine clusterings and human clustering of datasets. The similarity measure thus can be used to identify the best (in terms of most similar to human) clustering algorithm for a specific problem at hand. Experimental results pertaining to the text categorization problem of a Portuguese corpus (wherein a translation-into-English approach is used) are presented, as well as results on the well-known benchmark IRIS dataset. The significance and other potential applications of the proposed measure are discussed.Keywords: Clustering Algorithms, Clustering Applications, Similarity Measures, Text Clustering
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15711907 Alignment of e-Government Policy Formulation with Practical Implementation: The Case of Sub-Saharan Africa
Authors: W. Munyoka, F. M. Manzira
Abstract:
The purpose of this study is to analyze how varying alignment of e-Government policies in four countries in Sub-Saharan Africa Region, namely South Africa, Seychelles, Mauritius and Cape Verde lead to the success or failure of e-Government; and what should be done to ensure positive alignment that lead to e-Government project growth. In addition, the study aims to understand how various governments’ efforts in e-Government awareness campaign strategies, international cooperation, functional literacy and anticipated organizational change can influence implementation.
This study extensively explores contemporary research undertaken in the field of e-Government and explores the actual respective national ICT policies, strategies and implemented e-Government projects for in-depth comprehension of the status core. Data is analyzed qualitatively and quantitatively to reach a conclusion.
The study found that resounding successes in strategic e-Government alignment was achieved in Seychelles, Mauritius, South Africa and Cape Verde - (Ranked number 1 to 4 respectively).
The implications of the study is that policy makers in developing countries should put mechanisms in place for constant monitoring and evaluation of project implementation in line with ICT policies to ensure that e-Government projects reach maturity levels and do not die mid-way implementation as often noticed in many countries. The study recommends that countries within the region should make consented collaborative efforts and synergies with the private sector players and international donor agencies to achieve the implementation part of the set ICT policies.
Keywords: E-Government, ICT-Policy Alignment, Implementation, Sub-Saharan Africa.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23401906 A Proposed Approach for Emotion Lexicon Enrichment
Authors: Amr Mansour Mohsen, Hesham Ahmed Hassan, Amira M. Idrees
Abstract:
Document Analysis is an important research field that aims to gather the information by analyzing the data in documents. As one of the important targets for many fields is to understand what people actually want, sentimental analysis field has been one of the vital fields that are tightly related to the document analysis. This research focuses on analyzing text documents to classify each document according to its opinion. The aim of this research is to detect the emotions from text documents based on enriching the lexicon with adapting their content based on semantic patterns extraction. The proposed approach has been presented, and different experiments are applied by different perspectives to reveal the positive impact of the proposed approach on the classification results.Keywords: Document analysis, sentimental analysis, emotion detection, WEKA tool, NRC Lexicon.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14561905 Web Data Scraping Technology Using Term Frequency Inverse Document Frequency to Enhance the Big Data Quality on Sentiment Analysis
Authors: Sangita Pokhrel, Nalinda Somasiri, Rebecca Jeyavadhanam, Swathi Ganesan
Abstract:
Tourism is a booming industry with huge future potential for global wealth and employment. There are countless data generated over social media sites every day, creating numerous opportunities to bring more insights to decision-makers. The integration of big data technology into the tourism industry will allow companies to conclude where their customers have been and what they like. This information can then be used by businesses, such as those in charge of managing visitor centres or hotels, etc., and the tourist can get a clear idea of places before visiting. The technical perspective of natural language is processed by analysing the sentiment features of online reviews from tourists, and we then supply an enhanced long short-term memory (LSTM) framework for sentiment feature extraction of travel reviews. We have constructed a web review database using a crawler and web scraping technique for experimental validation to evaluate the effectiveness of our methodology. The text form of sentences was first classified through VADER and RoBERTa model to get the polarity of the reviews. In this paper, we have conducted study methods for feature extraction, such as Count Vectorization and Term Frequency – Inverse Document Frequency (TFIDF) Vectorization and implemented Convolutional Neural Network (CNN) classifier algorithm for the sentiment analysis to decide if the tourist’s attitude towards the destinations is positive, negative, or simply neutral based on the review text that they posted online. The results demonstrated that from the CNN algorithm, after pre-processing and cleaning the dataset, we received an accuracy of 96.12% for the positive and negative sentiment analysis.
Keywords: Counter vectorization, Convolutional Neural Network, Crawler, data technology, Long Short-Term Memory, LSTM, Web Scraping, sentiment analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 175