Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 43

Text Mining Related Abstracts

43 Development of Terrorist Threat Prediction Model in Indonesia by Using Bayesian Network

Authors: Hilya Mudrika Arini, Nur Aini Masruroh, Budi Hartono

Abstract:

There were more than 20 terrorist threats in Indonesia from 2002 to 2012. Despite this, preventive solutions based on studies in the field of national security in Indonesia have not been pursued comprehensively. This study aims to provide a preventive solution by developing a prediction model of the terrorist threat in Indonesia using a Bayesian network. The model is built in eight stages, from literature review, through building and verifying the Bayesian belief network, to what-if scenario analysis. Four experts from different perspectives were consulted to build the model. The study has several significant findings. First, news and the readiness of terrorist groups are the most influential factors. Second, according to several scenarios of the news portion, the higher the proportion of positive news, the higher the probability that a terrorist threat will occur. Therefore, the preventive solution to reduce the terrorist threat in Indonesia, based on the model, is to keep the positive news portion to a maximum of 38%.
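
The what-if step can be illustrated with a very small belief network. The sketch below is not the authors' model: it assumes only two binary parent nodes (positive-news portion and group readiness) feeding one threat node, and every probability in it is invented for illustration.

```python
from itertools import product

# Hypothetical priors (illustrative only, not taken from the study).
P_NEWS_HIGH = 0.4   # P(positive-news portion is high)
P_READY = 0.3       # P(terrorist group readiness is high)

# Hypothetical conditional probability table: P(threat | news_high, ready).
P_THREAT = {
    (True, True): 0.9,
    (True, False): 0.4,
    (False, True): 0.6,
    (False, False): 0.1,
}


def threat_probability(evidence=None):
    """Return P(threat | evidence) by enumerating the parent states.

    `evidence` may fix "news" and/or "ready" to True or False,
    which is how a what-if scenario is expressed here.
    """
    evidence = evidence or {}
    weight = 0.0   # probability mass of parent states consistent with evidence
    total = 0.0    # mass of those states in which the threat occurs
    for news, ready in product([True, False], repeat=2):
        if evidence.get("news", news) != news:
            continue
        if evidence.get("ready", ready) != ready:
            continue
        p_parents = (P_NEWS_HIGH if news else 1 - P_NEWS_HIGH) * (
            P_READY if ready else 1 - P_READY
        )
        weight += p_parents
        total += p_parents * P_THREAT[(news, ready)]
    return total / weight


baseline = threat_probability()
high_news = threat_probability({"news": True})
low_news = threat_probability({"news": False})
print(baseline, high_news, low_news)
```

Comparing the scenario values against the baseline shows how fixing one factor shifts the predicted threat probability, which is the kind of comparison a what-if analysis relies on.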

Keywords: Decision Analysis, Text Mining, Bayesian network, national security system

Procedia PDF Downloads 283
42 Enhance the Power of Sentiment Analysis

Authors: Yu Zhang, Pedro Desouza

Abstract:

Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performance of five classifiers on three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduce a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modelling and testing work was done in R and Greenplum in-database analytic tools.

Keywords: Social Media, Data Mining, Machine Learning, Text Mining, sentiment analysis, Twitter, Amazon

Procedia PDF Downloads 206
41 Issue Reorganization Using the Measure of Relevance

Authors: William Wong Xiu Shun, Yoonjin Hyun, Mingyu Kim, Seongi Choi, Namgyu Kim

Abstract:

Recently, the demand for extracting R&D keywords from issues and using them to retrieve R&D information has been increasing rapidly. However, it is hard to identify related issues or to distinguish between them. Although the similarity between issues cannot be identified directly, issues that consistently share the same R&D keywords can be determined with an R&D lexicon. In detail, the R&D keywords associated with a particular issue imply the key technology elements needed to solve the problem behind that issue. Furthermore, related issues that share the same R&D keywords can be shown in a more systematic way through issue clustering constructed from the perspective of R&D. Thus, the sharing of R&D results and the reuse of R&D technology can be facilitated. Indirectly, redundant investment in the same R&D can be reduced, because R&D information can be shared between the corresponding issues and the reusability of the related R&D can be improved. Therefore, a methodology for constructing issue clusters from the perspective of common R&D keywords is proposed to satisfy these demands.

Keywords: Clustering, Text Mining, Social Network Analysis, topic analysis

Procedia PDF Downloads 415
40 What the Future Holds for Social Media Data Analysis

Authors: P. Wlodarczak, J. Soar, M. Ally

Abstract:

The dramatic rise in the use of Social Media (SM) platforms such as Facebook and Twitter provides access to an unprecedented amount of user data. Users may post reviews on products and services they bought, write about their interests, share ideas or give their opinions and views on political issues. There is a growing interest among organisations in analysing SM data to detect new trends, obtain user opinions on their products and services, or find out about their online reputations. A recent research trend in SM analysis is making predictions based on sentiment analysis of SM. Often, indicators derived from historic SM data are represented as time series and correlated with a variety of real-world phenomena such as the outcome of elections, the development of financial indicators, box office revenue and disease outbreaks. This paper examines the current state of research in the area of SM mining and predictive analysis and gives an overview of the analysis methods using opinion mining and machine learning techniques.
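
Correlating a social media indicator with a real-world series can be sketched with a plain Pearson coefficient. The abstract does not say which correlation measure the surveyed studies use, so Pearson is an assumption here, and both series below are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly data: mean sentiment score and box office revenue.
weekly_sentiment = [0.10, 0.25, 0.20, 0.40, 0.55, 0.50]
weekly_revenue = [1.2, 1.9, 1.7, 2.8, 3.5, 3.1]
print(round(pearson(weekly_sentiment, weekly_revenue), 3))
```

A coefficient near 1 or -1 suggests the indicator moves with the phenomenon; it does not by itself establish predictive value.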

Keywords: Social Media, Knowledge Discovery, Machine Learning, Text Mining, predictive analysis

Procedia PDF Downloads 316
39 Optimal Classifying and Extracting Fuzzy Relationship from Query Using Text Mining Techniques

Authors: Faisal Alshuwaier, Ali Areshey

Abstract:

Text mining techniques are generally applied to classify text and to find fuzzy relations and structures in data sets. This research provides a range of text mining capabilities. One common application is text classification and event extraction, which involve deducing specific knowledge about the incidents referred to in texts. The main contribution of this paper is the clarification of a concept graph generation mechanism based on text classification and optimal fuzzy relationship extraction. Furthermore, the work presented in this paper explains the application of fuzzy relationship extraction and the branch and bound method to simplify texts.
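
The keyword list for this paper names max-prod. As a minimal sketch, assuming this refers to the standard max-product composition of two fuzzy relations, the composed membership of a pair (x, z) is the maximum over y of R(x, y) * S(y, z). The matrices and their interpretation below are hypothetical.

```python
def max_prod(r, s):
    """Max-product composition of fuzzy relations r (X x Y) and s (Y x Z).

    Both relations are lists of rows with membership degrees in [0, 1].
    """
    inner = len(s)
    if inner == 0 or any(len(row) != inner for row in r):
        raise ValueError("columns of r must match rows of s")
    cols = len(s[0])
    return [
        [max(row[k] * s[k][j] for k in range(inner)) for j in range(cols)]
        for row in r
    ]


# Hypothetical memberships: query terms -> concepts, concepts -> classes.
term_to_concept = [[0.8, 0.3],
                   [0.2, 0.9]]
concept_to_class = [[0.5, 1.0],
                    [0.7, 0.4]]
print(max_prod(term_to_concept, concept_to_class))
```

Each entry of the result is the strength of the strongest single path from a term to a class through one intermediate concept.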

Keywords: Text Mining, classification, Extraction, max-prod, fuzzy relations, memberships

Procedia PDF Downloads 424
38 Text Mining Techniques for Prioritizing Pathogenic Mutations in Protein Families Known to Misfold or Aggregate

Authors: Khaleel Saleh Al-Rababah

Abstract:

Amyloid fibril-forming regions, which are known as protein aggregates, in the sequences of some protein families are associated with a number of diseases known as amyloidosis. Mutations play a role in forming fibrils by accelerating the fibril formation process. In this paper, we aim to extract the diseases that are caused by those mutations as a result of their impact on the structural and functional properties of the aggregated protein. We propose a text mining system to automatically extract mutations, diseases, and the relations between them. We present a finite-state-based algorithm to cluster mutations found in the same sentence, since a single sentence may contain different mutations that cause different diseases. We also present a co-reference algorithm that enables cross-linking of sentences.

Keywords: Text Mining, Protein, amyloidosis, amyloid, co reference

Procedia PDF Downloads 381
37 Feature-Based Summarizing and Ranking from Customer Reviews

Authors: Dim En Nyaung, Thin Lai Lai Thein

Abstract:

With the rapid growth of the Internet, web opinion sources are emerging dynamically, and they are useful to both potential customers and product manufacturers for prediction and decision purposes. These sources consist of user-generated content written in natural language as unstructured free text. Opinion mining techniques have therefore become popular for automatically processing customer reviews to extract product features and the user opinions expressed about them. Since customer reviews may contain both opinionated and factual sentences, a supervised machine learning technique is applied for subjectivity classification to improve mining performance. In this paper, we dedicate our work to the task of opinion summarization. Product feature and opinion extraction is critical to opinion summarization, because its effectiveness significantly affects the identification of semantic relationships. The polarity and numeric score of all the features are determined using the SentiWordNet lexicon. The problem of opinion summarization concerns how to relate the opinion words to a certain feature. A probabilistic supervised learning model is expected to improve the result, making it more flexible and effective.

Keywords: Text Mining, Opinion mining, sentiment analysis, opinion summarization

Procedia PDF Downloads 197
36 Exploring Social Impact of Emerging Technologies from Futuristic Data

Authors: Yongtae Park, Heeyeul Kwon

Abstract:

Despite their highly touted benefits, emerging technologies have unleashed pervasive concerns regarding unintended and unforeseen social impacts. Thus, those wishing to create safe and socially acceptable products need to identify such side effects and mitigate them prior to market proliferation. Various methodologies in the field of technology assessment (TA), namely Delphi, impact assessment, and scenario planning, have been widely used in such circumstances. However, the literature faces a major limitation in its sole reliance on participatory workshop activities. It has unfortunately overlooked the availability of a massive untapped source of futuristic information flooding through the Internet. This research thus seeks to gain insights into the use of futuristic data, that is, future-oriented documents from the Internet, as a supplementary method to generate social impact scenarios whilst capturing the perspectives of experts from a wide variety of disciplines. To this end, network analysis is conducted based on the social keywords extracted from the futuristic documents by text mining, which is then used as a guide to produce a comprehensive set of detailed scenarios. Our proposed approach facilitates harmonized depictions of possible hazardous consequences of emerging technologies and thereby makes decision makers more aware of, and responsive to, broad qualitative uncertainties.

Keywords: Text Mining, Emerging Technologies, scenario, futuristic data

Procedia PDF Downloads 373
35 Spontaneous Message Detection of Annoying Situation in Community Networks Using Mining Algorithm

Authors: P. Senthil Kumari

Abstract:

The main concerns in data mining research are the social controls of data mining for handling ambiguity, noise, or incompleteness in text data. We describe an innovative approach for detecting spontaneous text data in community networks, achieved by a classification mechanism. It is presented in a concrete application domain, with the modest privacy settings provided by community networks, for avoiding annoying content in the user's message partition. To avoid such content, the mining methodology provides the capability to control the messages directly and likewise to improve the quality of ordering. Here we use learning-based mining approaches with a pre-processing technique to accomplish this. Our contribution deals with rule-based personalization for automatic text categorization, which is applicable in many different frameworks and offers a tolerance value that permits the setting of comments according to a variety of conditions associated with the policy or rule arrangements processed by the learning algorithm. Remarkably, we find that the chosen classifier predicts the class labels for controlling inadequate documents on the community network with great effect.

Keywords: Text Mining, data classification, community network, learning algorithm

Procedia PDF Downloads 368
34 Investigating Dynamic Transition Process of Issues Using Unstructured Text Analysis

Authors: Chen Liu, Yoonjin Hyun, Seongi Choi, Namgyu Kim, Dasom Kim, Myungsu Lim, William Xiu Shun Wong

Abstract:

The amount of real-time data generated through various mass media has been increasing rapidly. In this study, we performed topic analysis on unstructured text data distributed through news articles. As one of the most prevalent applications of topic analysis, the issue tracking technique investigates changes in the social issues that are identified through topic analysis. Currently, traditional issue tracking is conducted by identifying the main topics of documents that cover an entire period at the same time and analyzing the occurrence of each topic by the period of occurrence. However, this traditional issue tracking approach has the limitation that it cannot discover the dynamic mutation process of complex social issues. The purpose of this study is to overcome the limitations of the existing issue tracking method. We first derive the core issues of each period, and then discover the dynamic mutation process of various issues. We further analyze the mutation process from the perspective of issue categories in order to figure out the pattern of issue flow, including the frequency and reliability of the pattern. In other words, this study allows us to understand the components of complex issues by tracking the dynamic history of issues. This methodology can facilitate a clearer understanding of complex social phenomena by providing the mutation history and related category information of the phenomena.

Keywords: Data Mining, Text Mining, topic analysis, Issue Tracking, topic Detection, Trend Detection

Procedia PDF Downloads 188
33 Uplift Modeling Approach to Optimizing Content Quality in Social Q/A Platforms

Authors: Igor A. Podgorny

Abstract:

TurboTax AnswerXchange is a social Q/A system supporting users working on federal and state tax returns. Content quality and popularity in the AnswerXchange can be predicted with propensity models using attributes of the question and answer. Using uplift modeling, we identify features of questions and answers that can be modified during the question-asking and question-answering experience in order to optimize the AnswerXchange content quality. We demonstrate that adding details to the questions always results in increased question popularity that can be used to promote good quality content. Responding to closed-ended questions assertively improves content quality in the AnswerXchange in 90% of cases. Answering knowledge questions with web links increases the likelihood of receiving a negative vote from 60% of the askers. Our findings provide a rationale for employing the uplift modeling approach for AnswerXchange operations.
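
The core quantity in uplift modeling is the difference in outcome rate between content that received a modification and content that did not. The sketch below computes that difference for a single hypothetical treatment (adding details to a question); the records are invented, and the two-group estimate is a simplification rather than the paper's propensity-based models.

```python
def outcome_rate(records, treated):
    """Share of positive outcomes among records with the given treatment flag."""
    outcomes = [outcome for flag, outcome in records if flag == treated]
    if not outcomes:
        raise ValueError("no records for this group")
    return sum(outcomes) / len(outcomes)


def uplift(records):
    """Estimate uplift as P(outcome | treated) - P(outcome | control).

    `records` is a list of (treated, outcome) pairs of booleans.
    """
    return outcome_rate(records, True) - outcome_rate(records, False)


# Hypothetical data: (question had details added, question got an up-vote).
records = (
    [(True, True)] * 3 + [(True, False)] * 1
    + [(False, True)] * 1 + [(False, False)] * 3
)
print(uplift(records))
```

A positive value suggests the modification helps, a negative one that it hurts; a real analysis would also need to control for differences between the two groups.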

Keywords: Text Mining, Human-machine interaction, Customer Relationship Management, uplift modeling

Procedia PDF Downloads 126
32 A Framework of Product Information Service System Using Mobile Image Retrieval and Text Mining Techniques

Authors: Mei-Yi Wu, Shang-Ming Huang

Abstract:

Online shoppers nowadays often search for product information on the Internet using product keywords. To use this kind of search model, shoppers need a preliminary understanding of the products they are interested in and must choose the correct keywords. However, if a product is encountered for the first time (for example, the clothes worn or the backpack carried by other passengers, whose brands you do not know), it cannot be retrieved because of insufficient information. In this paper, we discuss and study applications in e-commerce using image retrieval and text mining techniques. We design an e-commerce application system with a three-layer architecture to provide users with product information. The system can automatically search for and retrieve similar images and the corresponding web pages on the Internet according to target pictures taken by users. Text mining techniques are then applied to extract important keywords from the retrieved web pages, and a web crawler uses these keywords to search for prices in different online stores. Finally, users obtain product information, including photos and prices, for their favorite products. The experiments show the efficiency of the proposed system.

Keywords: Text Mining, Online Marketing, mobile image retrieval, product information service system

Procedia PDF Downloads 244
31 Visual Text Analytics Technologies for Real-Time Big Data: Chronological Evolution and Issues

Authors: Siti Azrina B. A. Aziz, Siti Hafizah A. Hamid

Abstract:

New approaches to analyzing and visualizing data streams in real time are important for prompt decision making. Financial market trading and surveillance, large-scale emergency response, and crowd control are example scenarios that require real-time analytics and data visualization. This situation has led to the development of techniques and tools that support humans in analyzing the source data. With the emergence of Big Data and social media, new techniques and tools are required to process streaming data. Today, a range of tools that implement some of these functionalities is available. In this paper, we present a chronological evaluation of the evolution of technologies that support real-time analytics and visualization of data streams. Based on research papers published from 2002 to 2014, we gathered the general information, main techniques, challenges, and open issues. The techniques for streaming text visualization are identified in chronological order based on the Text Visualization Browser. This paper aims to review the evolution of streaming text visualization techniques and tools and to discuss the problems and challenges of each identified tool.

Keywords: Text Mining, visual analytics, information visualization, big data visualization, visual text analytics tools

Procedia PDF Downloads 253
30 A Methodology for Automatic Diversification of Document Categories

Authors: Chen Liu, Kee-Young Kwahk, Namgyu Kim, Dasom Kim, Myungsu Lim, Su-Hyeon Jeon, ByeoungKug Jeon

Abstract:

Recently, numerous documents including unstructured data and text have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually provided with a specific category for the convenience of the users. In the past, the categorization was performed manually. However, with manual categorization, not only can the accuracy of the categorization not be guaranteed, but the categorization also requires a large amount of time and huge costs. Many studies have been conducted towards the automatic creation of categories to overcome the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorizing complex documents with multiple topics because the methods work by assuming that one document can be categorized into one category only. In order to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, they are also limited in that their learning process involves training using a multi-categorized document set. These methods therefore cannot be applied to multi-categorization of most documents unless multi-categorized training sets are provided. To overcome the requirement of a multi-categorized training set imposed by traditional multi-categorization algorithms, we previously proposed a new methodology that can extend the category of a single-categorized document to multiple categories by analyzing relationships among categories, topics, and documents. In this paper, we design a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.

Keywords: Text Mining, Big Data Analysis, Document Classification, topic analysis, multi-category

Procedia PDF Downloads 143
29 A Methodology for Investigating Public Opinion Using Multilevel Text Analysis

Authors: Chen Liu, Kee-Young Kwahk, Yoonjin Hyun, Seongi Choi, Namgyu Kim, Dasom Kim, Myungsu Lim, William Xiu Shun Wong

Abstract:

Recently, many users have begun to frequently share their opinions on diverse issues using various social media. Therefore, numerous governments have attempted to establish or improve national policies according to the public opinions captured from various social media. In this paper, we point out several limitations of the traditional approaches to analyzing public opinion on science and technology and provide an alternative methodology to overcome these limitations. First, we distinguish between the science and technology analysis phase and the social issue analysis phase to reflect the fact that public opinion can be formed only when a certain science and technology is applied to a specific social issue. Next, we successively apply a start list and a stop list to acquire clarified and interesting results. Finally, to identify the most appropriate documents that fit a given subject, we develop a new logical filter concept that consists not only of mere keywords but also of a logical relationship among the keywords. This study then analyzes the possibilities for the practical use of the proposed methodology through its application to discover core issues and public opinions from 1,700,886 documents comprising SNS, blogs, news, and discussions.
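
The start list, stop list, and logical filter can be sketched as follows. The abstract does not define their exact form, so this example assumes a start list keeps only listed terms, a stop list then removes listed terms, and a logical filter is a nested and/or/not rule over keywords. All names, lists, and documents are hypothetical.

```python
import re


def tokenize(text):
    """Lower-case a document and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def apply_lists(tokens, start_list, stop_list):
    """Keep only start-list terms, then drop any stop-list terms."""
    kept = [t for t in tokens if t in start_list]
    return [t for t in kept if t not in stop_list]


def matches(tokens, rule):
    """Evaluate a logical filter: a keyword string or ("and"|"or"|"not", ...)."""
    if isinstance(rule, str):
        return rule in tokens
    op, *args = rule
    if op == "and":
        return all(matches(tokens, a) for a in args)
    if op == "or":
        return any(matches(tokens, a) for a in args)
    if op == "not":
        return not matches(tokens, args[0])
    raise ValueError(f"unknown operator: {op}")


# Hypothetical filter: drones discussed with privacy or safety, but not toys.
rule = ("and", "drone", ("or", "privacy", "safety"), ("not", "toy"))
docs = [
    "Drone delivery raises privacy concerns",
    "Toy drone safety recall",
    "Privacy law update",
]
selected = [d for d in docs if matches(tokenize(d), rule)]
print(selected)
```

Expressing the filter as data rather than code makes it easy to store one rule per subject and reuse the same matcher across a large document set.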

Keywords: Text Mining, Big Data, Social Network Analysis, topic modeling

Procedia PDF Downloads 163
28 Facilitating Written Biology Assessment in Large-Enrollment Courses Using Machine Learning

Authors: Luanna B. Prevost, Kelli Carter, Margaurete Romero, Kirsti Martinez

Abstract:

Writing is an essential scientific practice, yet, in several countries, increasing university science class sizes limit the use of written assessments. Written assessments allow students to demonstrate their learning in their own words and permit the faculty to evaluate students’ understanding. However, the time and resources required to grade written assessments prohibit their use in large-enrollment science courses. This study examined the use of machine learning algorithms to automatically analyze student writing and provide timely feedback to the faculty about students’ writing in biology. Written responses to questions about matter and energy transformation were collected from large-enrollment undergraduate introductory biology classrooms. Responses were analyzed using the LightSide text mining and classification software. Cohen’s Kappa was used to measure agreement between the LightSide models and human raters. Predictive models achieved agreement with human coding of 0.7 Cohen’s Kappa or greater. Models captured that when writing about matter-energy transformation at the ecosystem level, students focused primarily on the concepts of heat loss, recycling of matter, and conservation of matter and energy. Models were also produced to capture writing about processes such as decomposition and biochemical cycling. The models created in this study can be used to provide automatic feedback about students’ understanding of these concepts to biology faculty who desire to use formative written assessments in larger enrollment biology classes, but do not have the time or personnel for manual grading.
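
Cohen's Kappa corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement derived from each rater's label frequencies. The sketch below computes it for two label sequences; the labels are invented and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label throughout; kappa is
        # formally undefined, and returning 1.0 is a convention chosen here.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes: 1 = response mentions the concept, 0 = it does not.
human = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
model = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
print(round(cohens_kappa(human, model), 2))  # 0.6
```

Here the raters agree on 8 of 10 items (p_o = 0.8) while chance agreement is 0.5, giving a kappa of 0.6, which would fall below the 0.7 level reported in the abstract.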

Keywords: Machine Learning, Text Mining, Biology Education, written assessment

Procedia PDF Downloads 151
27 Analyzing Semantic Feature Using Multiple Information Sources for Reviews Summarization

Authors: Yu Hung Chiang, Hei Chia Wang

Abstract:

Nowadays, tourism has become a part of life. Before reserving hotels, customers need information about them to help make decisions, and the most important source of this information is online reviews. Due to the dramatic growth of online reviews, it is impossible for tourists to read all reviews manually. Therefore, an automatic review analysis system that summarizes reviews is necessary for them. The main purpose of the system is to understand the opinion expressed in reviews, which may be positive or negative. In other words, the system analyzes whether the customers who visited the hotel liked it or not. Sentiment analysis methods help the system achieve this purpose. In sentiment analysis methods, the targets of opinion (here called features) should be recognized to clarify the polarity of the opinion, because the polarity may otherwise be ambiguous. Hence, the study proposes an unsupervised method using Part-Of-Speech patterns and multi-lexicon sentiment analysis to summarize all reviews. We expect that this method can help customers find the information they want and make decisions efficiently.

Keywords: Text Mining, sentiment analysis, product feature extraction, multi-lexicons

Procedia PDF Downloads 194
26 A Social Decision Support Mechanism for Group Purchasing

Authors: Lien-Fa Lin, Yung-Ming Li, Fu-Shun Hsieh

Abstract:

With the advancement of information technology and the development of group commerce, people's lifestyles have clearly changed. However, group commerce faces some challenging problems. The products or services provided by vendors do not satisfactorily reflect customers’ opinions, so the sales and revenue of group commerce gradually decline. In addition, the process by which a formed customer group reaches a group-purchasing consensus is time-consuming, and the final decision is not the best choice for every group member. In this paper, we design a social decision support mechanism that uses group discussion messages to recommend suitable options for group members, taking social influence and personal preference into account to generate an option ranking list. The proposed mechanism can make group purchasing decisions more efficient and effective, and vendors can provide group products or services according to the group option ranking list.

Keywords: Social Network, Text Mining, Group Decision, group commerce

Procedia PDF Downloads 257
25 Investigation of Topic Modeling-Based Semi-Supervised Interpretable Document Classifier

Authors: Yoonjin Hyun, Namgyu Kim, Dasom Kim, William Xiu Shun Wong, Donghoon Lee, Minji Paek, Sungho Byun

Abstract:

There has been much research on document classification for classifying voluminous documents automatically. Through document classification, we can assign a specific category to each unlabeled document on the basis of various machine learning algorithms. However, providing labeled documents manually requires considerable time and effort. To overcome this limitation, semi-supervised learning, which uses unlabeled documents as well as labeled documents, has been invented. However, traditional document classifiers, whether supervised or semi-supervised, cannot sufficiently explain the reason for or the process of the classification. Thus, in this paper, we propose a methodology to visualize the major topics and class components of each document. We believe that our methodology for visualizing the topics and classes of each document can enhance the reliability and explanatory power of document classifiers.

Keywords: Data Mining, Text Mining, topic modeling, document classifier

Procedia PDF Downloads 206
24 Mining User-Generated Contents to Detect Service Failures with Topic Model

Authors: Sung Ho Ha, Kyung Bae Park

Abstract:

Online user-generated content (UGC) significantly changes the way customers behave (e.g., shop, travel), and the pressing need to handle the overwhelming amount of varied UGC is one of the paramount issues for management. However, current approaches (e.g., sentiment analysis) are often ineffective at leveraging textual information to detect the problems or issues that a given management team suffers from. In this paper, we apply text mining with Latent Dirichlet Allocation (LDA) to a popular online review site dedicated to user complaints. We find that the employed LDA efficiently detects customer complaints, and that further inspection with a visualization technique is effective for categorizing the problems or issues. As such, management can identify the issues at stake and prioritize them accordingly in a timely manner given the limited amount of resources. The findings provide managerial insights into how analytics on social media can help maintain and improve reputation management. Our interdisciplinary approach also highlights several insights gained by applying machine learning techniques in the marketing research domain. On a broader technical note, this paper illustrates the details of how to implement LDA in R from beginning (data collection in R) to end (LDA analysis in R), since this procedure is still largely undocumented. In this regard, it will help lower the barrier for interdisciplinary researchers to conduct related research.

Keywords: Text Mining, Visualization, latent dirichlet allocation, topic model, R program, user generated contents

Procedia PDF Downloads 80
23 Searching Linguistic Synonyms through Parts of Speech Tagging

Authors: Usman Qamar, Faiza Hussain

Abstract:

Synonym-based searching is recognized to be a complicated problem, as text mining from unstructured web data is challenging. Finding useful information that matches the user's need in the bulk of web pages is a cumbersome task. In this paper, a novel and practical synonym retrieval technique is proposed to address this problem. For the replacement of semantics, user intent is taken into consideration to realize the technique. Parts-of-speech tagging is applied for pattern generation of the query, and a thesaurus was built and used for this experiment. Compared with non-context-based searching, context-based searching proved to be a more efficient approach for dealing with linguistic semantics. This approach is very beneficial for intent-based searching. Finally, results and future dimensions are presented.

Keywords: Semantics, Information Retrieval, Text Mining, natural language processing, Grammar, parts-of-speech tagging

Procedia PDF Downloads 155
22 A Recommender System Fusing Collaborative Filtering and User’s Review Mining

Authors: Hyunchul Ahn, Seulbi Choi

Abstract:

The collaborative filtering (CF) algorithm has been widely used for recommender systems in both academic and practical applications. It basically generates recommendation results using users’ numeric ratings. However, the additional use of information other than user ratings may lead to better accuracy of CF. Considering that many people are likely to share their honest opinions on the items they purchased recently due to the advent of Web 2.0, user reviews can be regarded as a new informative source for identifying user preferences with accuracy. Against this background, this study presents a hybrid recommender system that fuses CF and user review mining. Our system adopts conventional memory-based CF, but it is designed to use both users’ numeric ratings and their text reviews of the items when calculating similarities between users.
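
One way to use both ratings and review text in a user-to-user similarity is a weighted blend of two cosine similarities. The abstract does not state how the two signals are combined, so the linear blend, the `alpha` weight, and the bag-of-words review representation below are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(vec_a, vec_b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(vec_a[k] * vec_b[k] for k in set(vec_a) & set(vec_b))
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rating_similarity(ratings_a, ratings_b):
    """Cosine similarity over the items both users have rated."""
    common = set(ratings_a) & set(ratings_b)
    return cosine({i: ratings_a[i] for i in common},
                  {i: ratings_b[i] for i in common})


def review_similarity(text_a, text_b):
    """Cosine similarity of bag-of-words counts of the users' review text."""
    return cosine(Counter(text_a.lower().split()),
                  Counter(text_b.lower().split()))


def hybrid_similarity(ratings_a, ratings_b, text_a, text_b, alpha=0.5):
    """Blend rating and review similarity; alpha weights the rating part."""
    return (alpha * rating_similarity(ratings_a, ratings_b)
            + (1 - alpha) * review_similarity(text_a, text_b))


# Hypothetical users: item -> rating, plus their concatenated review text.
alice = ({"camera": 5, "phone": 3}, "sharp lens great battery")
bob = ({"camera": 4, "phone": 2, "laptop": 5}, "great lens weak battery")
print(round(hybrid_similarity(alice[0], bob[0], alice[1], bob[1]), 3))
```

The resulting similarity can replace the rating-only similarity in a memory-based CF neighbour search; tuning `alpha` on held-out ratings would be one way to choose the balance.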

Keywords: Text Mining, Recommender System, Collaborative Filtering, Review mining

Procedia PDF Downloads 141
21 Evaluation of the Urban Regeneration Project: Land Use Transformation and SNS Big Data Analysis

Authors: Tae-Heon Moon, Ju-Young Kim, Jung-Hun Cho

Abstract:

Urban regeneration projects have been actively promoted in Korea. In particular, Jeonju Hanok Village is regarded as one of the representative cases of utilizing local cultural heritage sites in an urban regeneration project. Recently, however, there has been growing concern in this area about the ‘gentrification’ caused by excessive commercialization and surging numbers of tourists. This trend has been changing land and building use and has resulted in the loss of the region's identity. In this regard, this study analyzed the land use transformation between 2010 and 2016 to identify the commercialization trend in Jeonju Hanok Village. In addition, it conducted SNS big data analysis on Jeonju Hanok Village from February 14th, 2016 to March 31st, 2016 to identify visitors’ awareness of the village. The study results demonstrate that rapid commercialization was underway, unlike the initial intention, so planners and officials in the city government should reconsider the project direction and rebuild deliberate management strategies. This study is meaningful in that it analyzed land use transformation and SNS big data to identify the current situation in an urban regeneration area. Furthermore, it is expected that the study results will contribute to the vitalization of the regeneration area.

Keywords: Text Mining, Urban Regeneration, Land Use, SNS

Procedia PDF Downloads 132
20 Incremental Learning of Independent Topic Analysis

Authors: Takahiro Nishigaki, Katsumi Nitta, Takashi Onoda

Abstract:

In this paper, we present a method for applying Independent Topic Analysis (ITA) to a growing collection of document data. The amount of document data has been increasing since the spread of the Internet. ITA was proposed as one method for analyzing document data: it extracts independent topics from the documents by using Independent Component Analysis (ICA), a technique from signal processing. However, it is difficult to apply ITA to a growing collection, because ITA must use all of the document data, so its time and space costs are very high. Therefore, we present Incremental ITA, which extracts independent topics from a growing collection of document data. Incremental ITA updates the independent topics when document data is added, starting from the topics extracted from the immediately preceding data. We also show the results of applying Incremental ITA to benchmark datasets.

Keywords: Text Mining, independent component analysis, independent, incremental, topic extraction

Procedia PDF Downloads 154
19 Evolving Knowledge Extraction from Online Resources

Authors: Zhibo Xiao, Tharini Nayanika de Silva, Kezhi Mao

Abstract:

In this paper, we present an evolving knowledge extraction system named AKEOS (Automatic Knowledge Extraction from Online Sources). AKEOS consists of two modules: a one-time learning module and an evolving learning module. The one-time learning module takes a user's input query and automatically harvests knowledge from unstructured online resources in an unsupervised way. The output of the one-time learning is a structured vector representing the harvested knowledge. The evolving learning module automatically schedules and performs repeated one-time learning to extract the newest information and track the development of an event. In addition, the evolving learning module summarizes the knowledge learned at different time points to produce a final knowledge vector about the event. With evolving learning, we are able to visualize the key information of the event, discover trends, and track the event's development.

Keywords: Text Mining, knowledge extraction, evolving learning, knowledge graph

Procedia PDF Downloads 314
18 A Case Study of Ontology-Based Sentiment Analysis for Fan Pages

Authors: C. -L. Huang, J. -H. Ho

Abstract:

Social media has become more and more important in our lives. Many enterprises promote their services and products to fans via social media. The positive or negative sentiment of feedback from fans is very important for enterprises seeking to improve their products, services, and promotional activities. The purpose of this paper is to understand the sentiment of fans' responses by analyzing the responses they post on Facebook. The entities and aspects of fans' responses were analyzed based on a predefined ontology. The ontology for cell phone sentiment analysis consists of the following top-level aspect categories: overall, shape, hardware, brand, price, and service. Each category consists of several sub-categories. All aspects in a fan's response were identified based on the ontology, and their corresponding sentiment terms were found using a lexicon-based approach. The sentiment scores for aspects of fan responses were obtained by summarizing the sentiment terms in the responses. The frequency of 'like' was also weighted in the sentiment score calculation. Three famous cell phone fan pages on Facebook were selected as demonstration cases to evaluate the performance of the proposed methodology. Human judgments by several domain experts were also collected for performance comparison. The performance of the proposed approach was as good as that of human judgment in terms of precision, recall, and F1-measure.
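The scoring step described above can be sketched roughly as follows. This is an illustrative simplification, not the authors' implementation: the aspect terms in `ONTOLOGY`, the polarity values in `LEXICON`, and the logarithmic 'like' weight are all assumptions, and the whole response's polarity is attributed to every aspect it mentions.

```python
import math

# Hypothetical aspect terms for three of the top-level categories.
ONTOLOGY = {
    "hardware": {"battery", "screen", "camera"},
    "price": {"price", "cost"},
    "service": {"service", "support"},
}

# Hypothetical sentiment lexicon: +1 for positive terms, -1 for negative terms.
LEXICON = {"great": 1, "good": 1, "love": 1, "bad": -1, "poor": -1, "terrible": -1}


def score_response(text, likes=0):
    """Return {aspect: score} for one fan response."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    polarity = sum(LEXICON.get(token, 0) for token in tokens)
    # Assumed weighting: responses with more likes count more, with damping.
    weight = 1.0 + math.log1p(likes)
    token_set = set(tokens)
    aspects = {name for name, terms in ONTOLOGY.items() if terms & token_set}
    return {aspect: polarity * weight for aspect in aspects}


def score_page(responses):
    """Sum aspect scores over (text, likes) pairs from one fan page."""
    totals = {}
    for text, likes in responses:
        for aspect, score in score_response(text, likes).items():
            totals[aspect] = totals.get(aspect, 0.0) + score
    return totals
```

A fuller version would match sentiment terms to the nearest aspect term rather than to the whole response, and would cover the sub-categories the abstract mentions.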

Keywords: Ontology, Text Mining, Opinion mining, sentiment analysis

Procedia PDF Downloads 118
17 Developing an Exhaustive and Objective Definition of Social Enterprise through Computer Aided Text Analysis

Authors: Deepika Verma, Runa Sarkar

Abstract:

One of the prominent debates in the social entrepreneurship literature has been whether entrepreneurial work for social well-being by for-profit organizations can be classified as social entrepreneurship. Of late, the scholarship has reached a consensus. It concludes that there seems little sense in confining social entrepreneurship to just non-profit organizations. Boosted by this research, a growing number of businesses engaged in filling social infrastructure gaps in developing countries are calling themselves social enterprises. These organizations are diverse in their ownership, size, objectives, operations and business models. The lack of a comprehensive definition of social enterprise leads to three issues. Firstly, researchers may face difficulty in creating a database of social enterprises, because the choice of an entity as a social enterprise becomes subjective or is based on parameters pre-defined by the researcher, which is not replicable. Secondly, practitioners who use 'social enterprise' in their vision or mission statements may find it difficult to adjust their business models accordingly, especially when they face the dilemma of choosing social well-being over business viability. Thirdly, social enterprise and social entrepreneurship attract a lot of donor funding and venture capital. In the absence of a comprehensive definitional guide, donors or investors may find it difficult to assign grants and investments. It therefore becomes necessary to develop an exhaustive and objective definition of social enterprise and to examine whether the understanding of academicians and practitioners about social enterprise matches. This paper develops a dictionary of words often associated with social enterprise and/or social entrepreneurship. It further compares two lexicographic definitions of social enterprise imputed from the abstracts of academic journal papers and trade publications extracted from the EBSCO database using the 'tm' package in R.

Keywords: Text Mining, Social Enterprise, EBSCO database, lexicographic definition

Procedia PDF Downloads 238
16 Recognizing Customer Preferences Using Review Documents: A Hybrid Text and Data Mining Approach

Authors: Oshin Anand, Atanu Rakshit

Abstract:

The vast growth in e-commerce ventures makes this area a prominent research stream. Besides several quantified parameters, the textual content of reviews is a storehouse of information that can educate companies and help them earn profit. This study is an attempt in this direction. The article categorizes data based on a computed metric that quantifies the influencing capacity of reviews, yielding two categories: high-influence and low-influence reviews. Further, each of these documents is studied to derive several product feature categories. Each of these categories, along with the computed metric, is converted to linguistic identifiers and used in an association mining model. The article makes a novel attempt to combine feature attraction with a quantified metric to categorize review text and finally provide frequent patterns that depict customer preferences. Frequent mentions in highly influential reviews depict customer likes or preferred features of the product, whereas prominent patterns in low-influence reviews highlight what is not important to customers. This is achieved using a hybrid approach of text mining for feature and term extraction, sentiment analysis, a multicriteria decision-making technique and an association mining model.
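To illustrate the final association mining step, the sketch below counts frequent itemsets over reviews that have already been converted to linguistic identifiers. It is a brute-force enumeration rather than the authors' model, and the identifier names (such as `high_influence`) and the support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support=0.5, max_size=2):
    """Return {itemset: support} for itemsets meeting the support threshold.

    Each transaction is the set of identifiers attached to one review.
    Every itemset up to max_size is enumerated, which is adequate for small
    examples; Apriori-style pruning would be needed at scale.
    """
    total = len(transactions)
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    return {
        itemset: count / total
        for itemset, count in counts.items()
        if count / total >= min_support
    }


# Hypothetical reviews, each reduced to feature and influence identifiers.
reviews = [
    {"battery", "high_influence"},
    {"battery", "price", "high_influence"},
    {"price", "low_influence"},
    {"battery", "high_influence"},
]
patterns = frequent_itemsets(reviews, min_support=0.5)
```

With this toy data, the pair `("battery", "high_influence")` has support 0.75, which is the kind of pattern the abstract reads as a preferred product feature.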

Keywords: Text Mining, association mining, customer preference, frequent pattern, online reviews

Procedia PDF Downloads 252
15 Twitter's Impact on Print Media with Respect to Real World Events

Authors: Basit Shahzad, Abdullatif M. Abdullatif

Abstract:

Recent advancements in Information and Communication Technologies (ICT) and easy access to the Internet have made social media the first choice for sharing information related to important events or news. On Twitter, a trend is a common feature that quantifies the popularity of a news item or event. In this work, we examine the impact of Twitter trends on real world events by hypothesizing that Twitter trends have an influence on print media in Pakistan. For this, Twitter is used as a platform and Twitter trends as a baseline. We first collect data from two sources (Twitter trends and print media) for the period May to August 2016. The data obtained from the two sources is analyzed, and it is observed that social media significantly influences print media and that the majority of the news printed in newspapers is posted on Twitter earlier.

Keywords: Text Mining, print media, twitter trends, effectiveness of trends

Procedia PDF Downloads 111
14 The Crisis of Turkey's Downing the Russian Warplane within the Concept of Country Branding: The Examples of BBC World, and Al Jazeera English

Authors: Derya Gül Ünlü, Oguz Kuş

Abstract:

Branding a country means giving it a distinct position relative to other countries in its region so that it is perceived more specifically. This is made possible by the country's branding efforts: presenting the uniqueness of its national structures in a specific way, creating the desired image, and attracting tourists and foreign investors. Establishing a national brand involves, in a sense, managing the perceptions that citizens of other countries hold about the target country by structuring its image permanently and holistically. In this way, countries are less easily affected by crises in their international relations. From this starting point, this research aims to show how the warplane downing crisis between Turkey and Russia was perceived on social media. Turkey downed the Russian warplane on November 24, 2015, on the grounds that it had violated Turkish airspace on the Syrian border. Relations between the two countries then became tense, and Russia called on its citizens not to travel to Turkey and on its citizens in Turkey to return home. Moreover, relations between the two countries weakened: for example, tours organized from Russia to Turkey and visa-free travel were canceled, and all military dialogue was cut off. After the event, various news sites published many news items on social media, and readers made various comments about the event and Turkey. In this context, the perception of Turkey's national brand before and after the warplane downing crisis was investigated through comments on news posted by BBC World and Al Jazeera English on their Facebook accounts. To carry out the study, user comments were fetched from jet-downing-related news published on the Facebook fan pages of BBC World Service and Al Jazeera English. All news items published between 24.10.2015 and 24.12.2015 that contained the keyword 'Turk' or 'Turkey' in the title made up the data set of our study. The comments on these news items were then analyzed with text mining techniques. Furthermore, sentiment analysis was used to reveal readers' emotions before and after the crisis.

Keywords: Social Media, Text Mining, Al Jazeera English, BBC World, country branding

Procedia PDF Downloads 106