Search results for: Emotions in tweets emotion detection in text
2069 Connectionist Approach to Generic Text Summarization
Authors: Rajesh S.Prasad, U. V. Kulkarni, Jayashree.R.Prasad
Abstract:
As the enormous amount of on-line text grows on the World-Wide Web, the development of methods for automatically summarizing this text becomes more important. The primary goal of this research is to create an efficient tool that is able to summarize large documents automatically. We propose an Evolving connectionist System that is adaptive, incremental learning and knowledge representation system that evolves its structure and functionality. In this paper, we propose a novel approach for Part of Speech disambiguation using a recurrent neural network, a paradigm capable of dealing with sequential data. We observed that connectionist approach to text summarization has a natural way of learning grammatical structures through experience. Experimental results show that our approach achieves acceptable performance. Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15902068 Accuracy of Divergence Measures for Detection of Abrupt Changes
Authors: P. Bergl
Abstract:
Numerous divergence measures (spectral distance, cepstral distance, difference of the cepstral coefficients, Kullback-Leibler divergence, distance given by the General Likelihood Ratio, distance defined by the Recursive Bayesian Changepoint Detector and the Mahalanobis measure) are compared in this study. The measures are used for detection of abrupt spectral changes in synthetic AR signals via the sliding window algorithm. Two experiments are performed; the first is focused on detection of single boundary while the second concentrates on detection of a couple of boundaries. Accuracy of detection is judged for each method; the measures are compared according to results of both experiments.Keywords: Abrupt changes detection, autoregressive model, divergence measure.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14482067 Image Spam Detection Using Color Features and K-Nearest Neighbor Classification
Authors: T. Kumaresan, S. Sanjushree, C. Palanisamy
Abstract:
Image spam is a kind of email spam where the spam text is embedded with an image. It is a new spamming technique being used by spammers to send their messages to bulk of internet users. Spam email has become a big problem in the lives of internet users, causing time consumption and economic losses. The main objective of this paper is to detect the image spam by using histogram properties of an image. Though there are many techniques to automatically detect and avoid this problem, spammers employing new tricks to bypass those techniques, as a result those techniques are inefficient to detect the spam mails. In this paper we have proposed a new method to detect the image spam. Here the image features are extracted by using RGB histogram, HSV histogram and combination of both RGB and HSV histogram. Based on the optimized image feature set classification is done by using k- Nearest Neighbor(k-NN) algorithm. Experimental result shows that our method has achieved better accuracy. From the result it is known that combination of RGB and HSV histogram with k-NN algorithm gives the best accuracy in spam detection.
Keywords: File Type, HSV Histogram, k-NN, RGB Histogram, Spam Detection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21422066 Hybrid Machine Learning Approach for Text Categorization
Authors: Nerijus Remeikis, Ignas Skucas, Vida Melninkaite
Abstract:
Text categorization - the assignment of natural language documents to one or more predefined categories based on their semantic content - is an important component in many information organization and management tasks. Performance of neural networks learning is known to be sensitive to the initial weights and architecture. This paper discusses the use multilayer neural network initialization with decision tree classifier for improving text categorization accuracy. An adaptation of the algorithm is proposed in which a decision tree from root node until a final leave is used for initialization of multilayer neural network. The experimental evaluation demonstrates this approach provides better classification accuracy with Reuters-21578 corpus, one of the standard benchmarks for text categorization tasks. We present results comparing the accuracy of this approach with multilayer neural network initialized with traditional random method and decision tree classifiers.
Keywords: Text categorization, decision trees, neural networks, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18052065 Speech Encryption and Decryption Using Linear Feedback Shift Register (LFSR)
Authors: Tin Lai Win, Nant Christina Kyaw
Abstract:
This paper is taken into consideration the problem of cryptanalysis of stream ciphers. There is some attempts need to improve the existing attacks on stream cipher and to make an attempt to distinguish the portions of cipher text obtained by the encryption of plain text in which some parts of the text are random and the rest are non-random. This paper presents a tutorial introduction to symmetric cryptography. The basic information theoretic and computational properties of classic and modern cryptographic systems are presented, followed by an examination of the application of cryptography to the security of VoIP system in computer networks using LFSR algorithm. The implementation program will be developed Java 2. LFSR algorithm is appropriate for the encryption and decryption of online streaming data, e.g. VoIP (voice chatting over IP). This paper is implemented the encryption module of speech signals to cipher text and decryption module of cipher text to speech signals.
Keywords: Linear Feedback Shift Register.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 31102064 Improved Dynamic Bayesian Networks Applied to Arabic on Line Characters Recognition
Authors: Redouane Tlemsani, Abdelkader Benyettou
Abstract:
Work is in on line Arabic character recognition and the principal motivation is to study the Arab manuscript with on line technology.
This system is a Markovian system, which one can see as like a Dynamic Bayesian Network (DBN). One of the major interests of these systems resides in the complete models training (topology and parameters) starting from training data.
Our approach is based on the dynamic Bayesian Networks formalism. The DBNs theory is a Bayesians networks generalization to the dynamic processes. Among our objective, amounts finding better parameters, which represent the links (dependences) between dynamic network variables.
In applications in pattern recognition, one will carry out the fixing of the structure, which obliges us to admit some strong assumptions (for example independence between some variables). Our application will relate to the Arabic isolated characters on line recognition using our laboratory database: NOUN. A neural tester proposed for DBN external optimization.
The DBN scores and DBN mixed are respectively 70.24% and 62.50%, which lets predict their further development; other approaches taking account time were considered and implemented until obtaining a significant recognition rate 94.79%.
Keywords: Arabic on line character recognition, dynamic Bayesian network, pattern recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17812063 Objective Evaluation of Mathematical Morphology Edge Detection on Computed Tomography (CT) Images
Authors: Emhimed Saffor, Abdelkader Salama
Abstract:
In this paper problem of edge detection in digital images is considered. Edge detection based on morphological operators was applied on two sets (brain & chest) ct images. Three methods of edge detection by applying line morphological filters with multi structures in different directions have been used. 3x3 filter for first method, 5x5 filter for second method, and 7x7 filter for third method. We had applied this algorithm on (13 images) under MATLAB program environment. In order to evaluate the performance of the above mentioned edge detection algorithms, standard deviation (SD) and peak signal to noise ratio (PSNR) were used for justification for all different ct images. The objective method and the comparison of different methods of edge detection, shows that high values of both standard deviation and PSNR values of edge detection images were obtained.
Keywords: Medical images, Matlab, Edge detection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 26382062 A New Implementation of PCA for Fast Face Detection
Authors: Hazem M. El-Bakry
Abstract:
Principal Component Analysis (PCA) has many different important applications especially in pattern detection such as face detection / recognition. Therefore, for real time applications, the response time is required to be as small as possible. In this paper, new implementation of PCA for fast face detection is presented. Such new implementation is designed based on cross correlation in the frequency domain between the input image and eigenvectors (weights). Simulation results show that the proposed implementation of PCA is faster than conventional one.Keywords: Fast Face Detection, PCA, Cross Correlation, Frequency Domain
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17962061 Image Segmentation and Contour Recognition Based on Mathematical Morphology
Authors: Pinaki Pratim Acharjya, Esha Dutta
Abstract:
In image segmentation contour detection is one of the important pre-processing steps in recent days. Contours characterize boundaries and contour detection is one of the most difficult tasks in image processing. Hence it is a problem of fundamental importance in image processing. Contour detection of an image decreases the volume of data considerably and useless information is removed, but the structural properties of the image remain same. In this research, a robust and effective contour detection technique has been proposed using mathematical morphology. Three different contour detection results are obtained by using morphological dilation and erosion. The comparative analyses of three different results also have been done.Keywords: Image segmentation, contour detection, mathematical morphology.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14272060 T-Wave Detection Based on an Adjusted Wavelet Transform Modulus Maxima
Authors: Samar Krimi, Kaïs Ouni, Noureddine Ellouze
Abstract:
The method described in this paper deals with the problems of T-wave detection in an ECG. Determining the position of a T-wave is complicated due to the low amplitude, the ambiguous and changing form of the complex. A wavelet transform approach handles these complications therefore a method based on this concept was developed. In this way we developed a detection method that is able to detect T-waves with a sensitivity of 93% and a correct-detection ratio of 93% even with a serious amount of baseline drift and noise.Keywords: ECG, Modulus Maxima Wavelet Transform, Performance, T-wave detection
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18522059 Segmentation of Korean Words on Korean Road Signs
Authors: Lae-Jeong Park, Kyusoo Chung, Jungho Moon
Abstract:
This paper introduces an effective method of segmenting Korean text (place names in Korean) from a Korean road sign image. A Korean advanced directional road sign is composed of several types of visual information such as arrows, place names in Korean and English, and route numbers. Automatic classification of the visual information and extraction of Korean place names from the road sign images make it possible to avoid a lot of manual inputs to a database system for management of road signs nationwide. We propose a series of problem-specific heuristics that correctly segments Korean place names, which is the most crucial information, from the other information by leaving out non-text information effectively. The experimental results with a dataset of 368 road sign images show 96% of the detection rate per Korean place name and 84% per road sign image.Keywords: Segmentation, road signs, characters, classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 27502058 Tonal Pitch Structure as a Tool of Social Consolidation
Authors: Piotr Podlipniak
Abstract:
This paper proposes that in the course of evolution pitch structure became a human specific tool of communication the function of which is to induce emotional states such as uncertainty and cohesion. By the means of eliciting these emotions during collective music performance people are able to unconsciously give cues concerning social acceptance. This is probably one of the reasons why in all cultures people collectively perform tonal music. It is also suggested that tonal pitch structure had been invented socially before it became an evolutionary innovation of hominines. It means that a predisposition to tonally organize pitches evolved by the means of ‘Baldwin effect’ – a process in which natural selection transforms the learned response of an organism into the instinctive response. In the proposed, hypothetical evolutionary scenario of the emergence of tonal pitch structure social forces such as a need for closer cooperation play the crucial role.Keywords: Emotion, evolution, tonality, social consolidation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14072057 Words Reordering based on Statistical Language Model
Authors: Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou
Abstract:
There are multiple reasons to expect that detecting the word order errors in a text will be a difficult problem, and detection rates reported in the literature are in fact low. Although grammatical rules constructed by computer linguists improve the performance of grammar checker in word order diagnosis, the repairing task is still very difficult. This paper presents an approach for repairing word order errors in English text by reordering words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. The novelty of this method concerns the use of an efficient confusion matrix technique for reordering the words. The comparative advantage of this method is that works with a large set of words, and avoids the laborious and costly process of collecting word order errors for creating error patterns.Keywords: Permutations filtering, Statistical languagemodel N-grams, Word order errors
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15852056 Improvements in Edge Detection Based on Mathematical Morphology and Wavelet Transform using Fuzzy Rules
Authors: Masrour Dowlatabadi, Jalil Shirazi
Abstract:
In this paper, an improved edge detection algorithm based on fuzzy combination of mathematical morphology and wavelet transform is proposed. The combined method is proposed to overcome the limitation of wavelet based edge detection and mathematical morphology based edge detection in noisy images. Experimental results show superiority of the proposed method, as compared to the traditional Prewitt, wavelet based and morphology based edge detection methods. The proposed method is an effective edge detection method for noisy image and keeps clear and continuous edges.Keywords: Edge detection, Wavelet transform, Mathematical morphology, Fuzzy logic.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24002055 Defect Prevention and Detection of DSP-software
Authors: Deng Shiwei
Abstract:
The users are now expecting higher level of DSP(Digital Signal Processing) software quality than ever before. Prevention and detection of defect are critical elements of software quality assurance. In this paper, principles and rules for prevention and detection of defect are suggested, which are not universal guidelines, but are useful for both novice and experienced DSP software developers.Keywords: defect detection, defect prevention, DSP-software, software development, software testing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18102054 Event Information Extraction System (EIEE): FSM vs HMM
Authors: Shaukat Wasi, Zubair A. Shaikh, Sajid Qasmi, Hussain Sachwani, Rehman Lalani, Aamir Chagani
Abstract:
Automatic Extraction of Event information from social text stream (emails, social network sites, blogs etc) is a vital requirement for many applications like Event Planning and Management systems and security applications. The key information components needed from Event related text are Event title, location, participants, date and time. Emails have very unique distinctions over other social text streams from the perspective of layout and format and conversation style and are the most commonly used communication channel for broadcasting and planning events. Therefore we have chosen emails as our dataset. In our work, we have employed two statistical NLP methods, named as Finite State Machines (FSM) and Hidden Markov Model (HMM) for the extraction of event related contextual information. An application has been developed providing a comparison among the two methods over the event extraction task. It comprises of two modules, one for each method, and works for both bulk as well as direct user input. The results are evaluated using Precision, Recall and F-Score. Experiments show that both methods produce high performance and accuracy, however HMM was good enough over Title extraction and FSM proved to be better for Venue, Date, and time.Keywords: Emails, Event Extraction, Event Detection, Finite state machines, Hidden Markov Model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23172053 Real-time Detection of Space Manipulator Self-collision
Authors: Zhang Xiaodong, Tang Zixin, Liu Xin
Abstract:
In order to avoid self-collision of space manipulators during operation process, a real-time detection method is proposed in this paper. The manipulator is fitted into a cylinder-enveloping surface, and then, a kind of detection algorithm of collision between cylinders is analyzed. The collision model of space manipulator self-links can be detected by using this algorithm in real-time detection during the operation process. To ensure security of the operation, a safety threshold is designed. The simulation and experiment results verify the effectiveness of the proposed algorithm for a 7-DOF space manipulator.Keywords: Space manipulator, Collision detection, Self-collision, the real-time collision detection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20342052 A File Splitting Technique for Reducing the Entropy of Text Files
Authors: Abdel-Rahman M. Jaradat, , Mansour I. Irshid, Talha T. Nassar
Abstract:
A novel file splitting technique for the reduction of the nth-order entropy of text files is proposed. The technique is based on mapping the original text file into a non-ASCII binary file using a new codeword assignment method and then the resulting binary file is split into several subfiles each contains one or more bits from each codeword of the mapped binary file. The statistical properties of the subfiles are studied and it is found that they reflect the statistical properties of the original text file which is not the case when the ASCII code is used as a mapper. The nth-order entropy of these subfiles are determined and it is found that the sum of their entropies is less than that of the original text file for the same values of extensions. These interesting statistical properties of the resulting subfiles can be used to achieve better compression ratios when conventional compression techniques are applied to these subfiles individually and on a bit-wise basis rather than on character-wise basis.
Keywords: Bit-wise compression, entropy, file splitting, source mapping.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14422051 A Novel Arabic Text Steganography Method Using Letter Points and Extensions
Authors: Adnan Abdul-Aziz Gutub, Manal Mohammad Fattani
Abstract:
This paper presents a new steganography approach suitable for Arabic texts. It can be classified under steganography feature coding methods. The approach hides secret information bits within the letters benefiting from their inherited points. To note the specific letters holding secret bits, the scheme considers the two features, the existence of the points in the letters and the redundant Arabic extension character. We use the pointed letters with extension to hold the secret bit 'one' and the un-pointed letters with extension to hold 'zero'. This steganography technique is found attractive to other languages having similar texts to Arabic such as Persian and Urdu.Keywords: Arabic text, Cryptography, Feature coding, Information security, Text steganography, Text watermarking.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 35032050 Early Depression Detection for Young Adults with a Psychiatric and AI Interdisciplinary Multimodal Framework
Authors: Raymond Xu, Ashley Hua, Andrew Wang, Yuru Lin
Abstract:
During COVID-19, the depression rate has increased dramatically. Young adults are most vulnerable to the mental health effects of the pandemic. Lower-income families have a higher ratio to be diagnosed with depression than the general population, but less access to clinics. This research aims to achieve early depression detection at low cost, large scale, and high accuracy with an interdisciplinary approach by incorporating clinical practices defined by American Psychiatric Association (APA) as well as multimodal AI framework. The proposed approach detected the nine depression symptoms with Natural Language Processing sentiment analysis and a symptom-based Lexicon uniquely designed for young adults. The experiments were conducted on the multimedia survey results from adolescents and young adults and unbiased Twitter communications. The result was further aggregated with the facial emotional cues analyzed by the Convolutional Neural Network on the multimedia survey videos. Five experiments each conducted on 10k data entries reached consistent results with an average accuracy of 88.31%, higher than the existing natural language analysis models. This approach can reach 300+ million daily active Twitter users and is highly accessible by low-income populations to promote early depression detection to raise awareness in adolescents and young adults and reveal complementary cues to assist clinical depression diagnosis.
Keywords: Artificial intelligence, depression detection, facial emotion recognition, natural language processing, mental disorder.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11762049 Topic Modeling Using Latent Dirichlet Allocation and Latent Semantic Indexing on South African Telco Twitter Data
Authors: Phumelele P. Kubheka, Pius A. Owolawi, Gbolahan Aiyetoro
Abstract:
Twitter is one of the most popular social media platforms where users share their opinions on different subjects. Twitter can be considered a great source for mining text due to the high volumes of data generated through the platform daily. Many industries such as telecommunication companies can leverage the availability of Twitter data to better understand their markets and make an appropriate business decision. This study performs topic modeling on Twitter data using Latent Dirichlet Allocation (LDA). The obtained results are benchmarked with another topic modeling technique, Latent Semantic Indexing (LSI). The study aims to retrieve topics on a Twitter dataset containing user tweets on South African Telcos. Results from this study show that LSI is much faster than LDA. However, LDA yields better results with higher topic coherence by 8% for the best-performing model in this experiment. A higher topic coherence score indicates better performance of the model.
Keywords: Big data, latent Dirichlet allocation, latent semantic indexing, Telco, topic modeling, Twitter.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4592048 Improved Skin Detection Using Colour Space and Texture
Authors: Medjram Sofiane, Babahenini Mohamed Chaouki, Mohamed Benali Yamina
Abstract:
Skin detection is an important task for computer vision systems. A good method of skin detection means a good and successful result of the system. The colour is a good descriptor for image segmentation and classification; it allows detecting skin colour in the images. The lighting changes and the objects that have a colour similar than skin colour make the operation of skin detection difficult. In this paper, we proposed a method using the YCbCr colour space for skin detection and lighting effects elimination, then we use the information of texture to eliminate the false regions detected by the YCbCr skin model.
Keywords: Skin detection, YCbCr, GLCM, Texture, Human skin.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24492047 Improved Zero Text Watermarking Algorithm against Meaning Preserving Attacks
Authors: Jalil Z., Farooq M., Zafar H., Sabir M., Ashraf E.
Abstract:
Internet is largely composed of textual contents and a huge volume of digital contents gets floated over the Internet daily. The ease of information sharing and re-production has made it difficult to preserve author-s copyright. Digital watermarking came up as a solution for copyright protection of plain text problem after 1993. In this paper, we propose a zero text watermarking algorithm based on occurrence frequency of non-vowel ASCII characters and words for copyright protection of plain text. The embedding algorithm makes use of frequency non-vowel ASCII characters and words to generate a specialized author key. The extraction algorithm uses this key to extract watermark, hence identify the original copyright owner. Experimental results illustrate the effectiveness of the proposed algorithm on text encountering meaning preserving attacks performed by five independent attackers.Keywords: Copyright protection, Digital watermarking, Document authentication, Information security, Watermark.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21592046 The Greek Version of the Southampton Nostalgia Scale: Psychometric Properties in Young Adults and Associations with Life Satisfaction, Positive and Negative Emotions, Time Perspective and Wellbeing
Authors: Eirini Petratou, Pezirkianidis Christos, Anastassios Stalikas
Abstract:
Nostalgia is characterized as a mental state of human’s emotional longing for the past that activates both positive and negative emotions. The bittersweet emotions that are activated by nostalgia aid psychological functions to humans and are depended on the type of stimuli that evoke nostalgia but also on the nostalgia activation context. In general, despite that nostalgia can be activated and experienced by all people; however, it differs both in terms of nostalgia experience but also nostalgia frequency. As a matter of fact, nostalgia experience along with nostalgia frequency differs according to the level of the nostalgia proneness. People with high nostalgia proneness tend to experience nostalgia more intensely and frequently than people with low nostalgia proneness. Nostalgia proneness is considered as a basic individual difference that affects the experience of nostalgia, and it can be measured by the Southampton Nostalgia Scale (SNS); a psychometric instrument that measures human’s nostalgia proneness consisting of seven questions that assess a person’s attitude towards nostalgia, the degree of experience or tendency to nostalgic feelings and the nostalgia frequency. In the current study, we translated, validated and calibrated the SNS in Greek population (N = 267). For the calibration process, we used several scales relevant to positive dimensions, such as life satisfaction, positive and negative emotions, time perspective and wellbeing. A confirmatory factor analysis revealed the factors that provide a good Southampton Nostalgia Proneness model fit for young adult Greek population.
Keywords: Nostalgia proneness, nostalgia, psychometric instruments, positive emotions.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13542045 Experimental Study of Hyperparameter Tuning a Deep Learning Convolutional Recurrent Network for Text Classification
Authors: Bharatendra Rai
Abstract:
Sequences of words in text data have long-term dependencies and are known to suffer from vanishing gradient problem when developing deep learning models. Although recurrent networks such as long short-term memory networks help overcome this problem, achieving high text classification performance is a challenging problem. Convolutional recurrent networks that combine advantages of long short-term memory networks and convolutional neural networks, can be useful for text classification performance improvements. However, arriving at suitable hyperparameter values for convolutional recurrent networks is still a challenging task where fitting of a model requires significant computing resources. This paper illustrates the advantages of using convolutional recurrent networks for text classification with the help of statistically planned computer experiments for hyperparameter tuning.
Keywords: Convolutional recurrent networks, hyperparameter tuning, long short-term memory networks, Tukey honest significant differences
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1152044 Lexical Based Method for Opinion Detection on Tripadvisor Collection
Authors: Faiza Belbachir, Thibault Schienhinski
Abstract:
The massive development of online social networks allows users to post and share their opinions on various topics. With this huge volume of opinion, it is interesting to extract and interpret these information for different domains, e.g., product and service benchmarking, politic, system of recommendation. This is why opinion detection is one of the most important research tasks. It consists on differentiating between opinion data and factual data. The difficulty of this task is to determine an approach which returns opinionated document. Generally, there are two approaches used for opinion detection i.e. Lexical based approaches and Machine Learning based approaches. In Lexical based approaches, a dictionary of sentimental words is used, words are associated with weights. The opinion score of document is derived by the occurrence of words from this dictionary. In Machine learning approaches, usually a classifier is trained using a set of annotated document containing sentiment, and features such as n-grams of words, part-of-speech tags, and logical forms. Majority of these works are based on documents text to determine opinion score but dont take into account if these texts are really correct. Thus, it is interesting to exploit other information to improve opinion detection. In our work, we will develop a new way to consider the opinion score. We introduce the notion of trust score. We determine opinionated documents but also if these opinions are really trustable information in relation with topics. For that we use lexical SentiWordNet to calculate opinion and trust scores, we compute different features about users like (numbers of their comments, numbers of their useful comments, Average useful review). After that, we combine opinion score and trust score to obtain a final score. We applied our method to detect trust opinions in TRIPADVISOR collection. Our experimental results report that the combination between opinion score and trust score improves opinion detection.Keywords: Tripadvisor, Opinion detection, SentiWordNet, trust score.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7502043 Mining Association Rules from Unstructured Documents
Authors: Hany Mahgoub
Abstract:
This paper presents a system for discovering association rules from collections of unstructured documents called EART (Extract Association Rules from Text). The EART system treats texts only not images or figures. EART discovers association rules amongst keywords labeling the collection of textual documents. The main characteristic of EART is that the system integrates XML technology (to transform unstructured documents into structured documents) with Information Retrieval scheme (TF-IDF) and Data Mining technique for association rules extraction. EART depends on word feature to extract association rules. It consists of four phases: structure phase, index phase, text mining phase and visualization phase. Our work depends on the analysis of the keywords in the extracted association rules through the co-occurrence of the keywords in one sentence in the original text and the existing of the keywords in one sentence without co-occurrence. Experiments applied on a collection of scientific documents selected from MEDLINE that are related to the outbreak of H5N1 avian influenza virus.Keywords: Association rules, information retrieval, knowledgediscovery in text, text mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24412042 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text
Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni
Abstract:
The problem of Entity relation discovery in structured data, a well covered topic in literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary, or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations - a cooccurrence graph. Indeed each cooccurrence highlights some grade of semantic correlation between the words because it is more common to have related words close each other than having them in the opposite sides of the text. Some authors have used sliding windows for such problem: they count all the occurrences within a sliding windows running over the whole text. In this paper we generalise such technique, coming up to a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is accounted with a weight depending on the distance between items: a closer distance implies a stronger evidence of a relationship. We develop an experiment in order to support this intuition, by applying this technique to a data set consisting in the text of the Bible, split into verses.Keywords: Cooccurrence graph, entity relation graph, unstructured text, weighted distance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6842041 A Comparative Study of Virus Detection Techniques
Authors: Sulaiman Al Amro, Ali Alkhalifah
Abstract:
The growing number of computer viruses and the detection of zero day malware have been the concern for security researchers for a large period of time. Existing antivirus products (AVs) rely on detecting virus signatures which do not provide a full solution to the problems associated with these viruses. The use of logic formulae to model the behaviour of viruses is one of the most encouraging recent developments in virus research, which provides alternatives to classic virus detection methods. In this paper, we proposed a comparative study about different virus detection techniques. This paper provides the advantages and drawbacks of different detection techniques. Different techniques will be used in this paper to provide a discussion about what technique is more effective to detect computer viruses.Keywords: Computer viruses, virus detection, signature-based, behaviour-based, heuristic-based.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 45972040 Development of Multimodal e-Slide Presentation to Support Self-Learning for the Visually Impaired
Authors: Rustam Asnawi, Wan Fatimah Wan Ahmad
Abstract:
Currently electronic slide (e-slide) is one of the most common styles in educational presentation. Unfortunately, the utilization of e-slide for the visually impaired is uncommon since they are unable to see the content of such e-slides which are usually composed of text, images and animation. This paper proposes a model for presenting e-slide in multimodal presentation i.e. using conventional slide concurrent with voicing, in both languages Malay and English. At the design level, live multimedia presentation concept is used, while at the implementation level several components are used. The text content of each slide is extracted using COM component, Microsoft Speech API for voicing the text in English language and the text in Malay language is voiced using dictionary approach. To support the accessibility, an auditory user interface is provided as an additional feature. A prototype of such model named as VSlide has been developed and introduced.
Keywords: presentation, self-learning, slide, visually impaired
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1569