Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 14238

Search results for: topic model

14238 Text Mining of Twitter Data Using a Latent Dirichlet Allocation Topic Model and Sentiment Analysis

Authors: Sidi Yang, Haiyi Zhang


Twitter is a microblogging platform, where millions of users daily share their attitudes, views, and opinions. Using a probabilistic Latent Dirichlet Allocation (LDA) topic model to discern the most popular topics in the Twitter data is an effective way to analyze a large set of tweets to find a set of topics in a computationally efficient manner. Sentiment analysis provides an effective method to show the emotions and sentiments found in each tweet and an efficient way to summarize the results in a manner that is clearly understood. The primary goal of this paper is to explore text mining, extract and analyze useful information from unstructured text using two approaches: LDA topic modelling and sentiment analysis by examining Twitter plain text data in English. These two methods allow people to dig data more effectively and efficiently. LDA topic model and sentiment analysis can also be applied to provide insight views in business and scientific fields.

Keywords: text mining, Twitter, topic model, sentiment analysis

Procedia PDF Downloads 90
14237 Optimized Text Summarization Model on Mobile Screens for Sight-Interpreters: An Empirical Study

Authors: Jianhua Wang


To obtain key information quickly from long texts on small screens of mobile devices, sight-interpreters need to establish optimized summarization model for fast information retrieval. Four summarization models based on previous studies were studied including title+key words (TKW), title+topic sentences (TTS), key words+topic sentences (KWTS) and title+key words+topic sentences (TKWTS). Psychological experiments were conducted on the four models for three different genres of interpreting texts to establish the optimized summarization model for sight-interpreters. This empirical study shows that the optimized summarization model for sight-interpreters to quickly grasp the key information of the texts they interpret is title+key words (TKW) for cultural texts, title+key words+topic sentences (TKWTS) for economic texts and topic sentences+key words (TSKW) for political texts.

Keywords: different genres, mobile screens, optimized summarization models, sight-interpreters

Procedia PDF Downloads 236
14236 Topic Modelling Using Latent Dirichlet Allocation and Latent Semantic Indexing on SA Telco Twitter Data

Authors: Phumelele Kubheka, Pius Owolawi, Gbolahan Aiyetoro


Twitter is one of the most popular social media platforms where users can share their opinions on different subjects. As of 2010, The Twitter platform generates more than 12 Terabytes of data daily, ~ 4.3 petabytes in a single year. For this reason, Twitter is a great source for big mining data. Many industries such as Telecommunication companies can leverage the availability of Twitter data to better understand their markets and make an appropriate business decision. This study performs topic modeling on Twitter data using Latent Dirichlet Allocation (LDA). The obtained results are benchmarked with another topic modeling technique, Latent Semantic Indexing (LSI). The study aims to retrieve topics on a Twitter dataset containing user tweets on South African Telcos. Results from this study show that LSI is much faster than LDA. However, LDA yields better results with higher topic coherence by 8% for the best-performing model represented in Table 1. A higher topic coherence score indicates better performance of the model.

Keywords: big data, latent Dirichlet allocation, latent semantic indexing, telco, topic modeling, twitter

Procedia PDF Downloads 44
14235 AIPM:An Integrator and Pull Request Matching Model in Github

Authors: Zhifang Liao, Yanbing Li, Li Xu, Yan Zhang, Xiaoping Fan, Jinsong Wu


Pull Request (PR) is the primary method for code contributions from the external contributors in Github. PR review is an essential part of open source software developments for maintaining the quality of software. Matching a new PR of an appropriate integrator will make the PR review more effective. However, PR and integrator matching are now organized manually in Github. To reduce this cost, we presented an AIPM model to predict highly relevant integrator of incoming PRs. AIPM uses topic model to extract topics from the PRs, and builds a one-to-one correspondence between topics and integrators. Then, AIPM finds the most suitable integrator according to the maximum entry of the topic-document distribution. On average, AIPM can reach a precision of 60%, and even in some projects, can reach a precision of 80%.

Keywords: pull Request, integrator matching, Github, open source project, topic model

Procedia PDF Downloads 198
14234 Visualization and Performance Measure to Determine Number of Topics in Twitter Data Clustering Using Hybrid Topic Modeling

Authors: Moulana Mohammed


Topic models are widely used in building clusters of documents for more than a decade, yet problems occurring in choosing optimal number of topics. The main problem is the lack of a stable metric of the quality of topics obtained during the construction of topic models. The authors analyzed from previous works, most of the models used in determining the number of topics are non-parametric and quality of topics determined by using perplexity and coherence measures and concluded that they are not applicable in solving this problem. In this paper, we used the parametric method, which is an extension of the traditional topic model with visual access tendency for visualization of the number of topics (clusters) to complement clustering and to choose optimal number of topics based on results of cluster validity indices. Developed hybrid topic models are demonstrated with different Twitter datasets on various topics in obtaining the optimal number of topics and in measuring the quality of clusters. The experimental results showed that the Visual Non-negative Matrix Factorization (VNMF) topic model performs well in determining the optimal number of topics with interactive visualization and in performance measure of the quality of clusters with validity indices.

Keywords: interactive visualization, visual mon-negative matrix factorization model, optimal number of topics, cluster validity indices, Twitter data clustering

Procedia PDF Downloads 55
14233 Web Search Engine Based Naming Procedure for Independent Topic

Authors: Takahiro Nishigaki, Takashi Onoda


In recent years, the number of document data has been increasing since the spread of the Internet. Many methods have been studied for extracting topics from large document data. We proposed Independent Topic Analysis (ITA) to extract topics independent of each other from large document data such as newspaper data. ITA is a method for extracting the independent topics from the document data by using the Independent Component Analysis. The topic represented by ITA is represented by a set of words. However, the set of words is quite different from the topics the user imagines. For example, the top five words with high independence of a topic are as follows. Topic1 = {"scor", "game", "lead", "quarter", "rebound"}. This Topic 1 is considered to represent the topic of "SPORTS". This topic name "SPORTS" has to be attached by the user. ITA cannot name topics. Therefore, in this research, we propose a method to obtain topics easy for people to understand by using the web search engine, topics given by the set of words given by independent topic analysis. In particular, we search a set of topical words, and the title of the homepage of the search result is taken as the topic name. And we also use the proposed method for some data and verify its effectiveness.

Keywords: independent topic analysis, topic extraction, topic naming, web search engine

Procedia PDF Downloads 48
14232 Using Bidirectional Encoder Representations from Transformers to Extract Topic-Independent Sentiment Features for Social Media Bot Detection

Authors: Maryam Heidari, James H. Jones Jr.


Millions of online posts about different topics and products are shared on popular social media platforms. One use of this content is to provide crowd-sourced information about a specific topic, event or product. However, this use raises an important question: what percentage of information available through these services is trustworthy? In particular, might some of this information be generated by a machine, i.e., a bot, instead of a human? Bots can be, and often are, purposely designed to generate enough volume to skew an apparent trend or position on a topic, yet the consumer of such content cannot easily distinguish a bot post from a human post. In this paper, we introduce a model for social media bot detection which uses Bidirectional Encoder Representations from Transformers (Google Bert) for sentiment classification of tweets to identify topic-independent features. Our use of a Natural Language Processing approach to derive topic-independent features for our new bot detection model distinguishes this work from previous bot detection models. We achieve 94\% accuracy classifying the contents of data as generated by a bot or a human, where the most accurate prior work achieved accuracy of 92\%.

Keywords: bot detection, natural language processing, neural network, social media

Procedia PDF Downloads 45
14231 Stock Market Prediction by Regression Model with Social Moods

Authors: Masahiro Ohmura, Koh Kakusho, Takeshi Okadome


This paper presents a regression model with autocorrelated errors in which the inputs are social moods obtained by analyzing the adjectives in Twitter posts using a document topic model. The regression model predicts Dow Jones Industrial Average (DJIA) more precisely than autoregressive moving-average models.

Keywords: stock market prediction, social moods, regression model, DJIA

Procedia PDF Downloads 434
14230 Lecture Video Indexing and Retrieval Using Topic Keywords

Authors: B. J. Sandesh, Saurabha Jirgi, S. Vidya, Prakash Eljer, Gowri Srinivasa


In this paper, we propose a framework to help users to search and retrieve the portions in the lecture video of their interest. This is achieved by temporally segmenting and indexing the lecture video using the topic keywords. We use transcribed text from the video and documents relevant to the video topic extracted from the web for this purpose. The keywords for indexing are found by applying the non-negative matrix factorization (NMF) topic modeling techniques on the web documents. Our proposed technique first creates indices on the transcribed documents using the topic keywords, and these are mapped to the video to find the start and end time of the portions of the video for a particular topic. This time information is stored in the index table along with the topic keyword which is used to retrieve the specific portions of the video for the query provided by the users.

Keywords: video indexing and retrieval, lecture videos, content based video search, multimodal indexing

Procedia PDF Downloads 149
14229 Emotion Oriented Students' Opinioned Topic Detection for Course Reviews in Massive Open Online Course

Authors: Zhi Liu, Xian Peng, Monika Domanska, Lingyun Kang, Sannyuya Liu


Massive Open education has become increasingly popular among worldwide learners. An increasing number of course reviews are being generated in Massive Open Online Course (MOOC) platform, which offers an interactive feedback channel for learners to express opinions and feelings in learning. These reviews typically contain subjective emotion and topic information towards the courses. However, it is time-consuming to artificially detect these opinions. In this paper, we propose an emotion-oriented topic detection model to automatically detect the students’ opinioned aspects in course reviews. The known overall emotion orientation and emotional words in each review are used to guide the joint probabilistic modeling of emotion and aspects in reviews. Through the experiment on real-life review data, it is verified that the distribution of course-emotion-aspect can be calculated to capture the most significant opinioned topics in each course unit. This proposed technique helps in conducting intelligent learning analytics for teachers to improve pedagogies and for developers to promote user experiences.

Keywords: Massive Open Online Course (MOOC), course reviews, topic model, emotion recognition, topical aspects

Procedia PDF Downloads 161
14228 Trend Detection Using Community Rank and Hawkes Process

Authors: Shashank Bhatnagar, W. Wilfred Godfrey


We develop in this paper, an approach to find the trendy topic, which not only considers the user-topic interaction but also considers the community, in which user belongs. This method modifies the previous approach of user-topic interaction to user-community-topic interaction with better speed-up in the range of [1.1-3]. We assume that trend detection in a social network is dependent on two things. The one is, broadcast of messages in social network governed by self-exciting point process, namely called Hawkes process and the second is, Community Rank. The influencer node links to others in the community and decides the community rank based on its PageRank and the number of users links to that community. The community rank decides the influence of one community over the other. Hence, the Hawkes process with the kernel of user-community-topic decides the trendy topic disseminated into the social network.

Keywords: community detection, community rank, Hawkes process, influencer node, pagerank, trend detection

Procedia PDF Downloads 291
14227 Improving Topic Quality of Scripts by Using Scene Similarity Based Word Co-Occurrence

Authors: Yunseok Noh, Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park


Scripts are one of the basic text resources to understand broadcasting contents. Since broadcast media wields lots of influence over the public, tools for understanding broadcasting contents are more required. Topic modeling is the method to get the summary of the broadcasting contents from its scripts. Generally, scripts represent contents descriptively with directions and speeches. Scripts also provide scene segments that can be seen as semantic units. Therefore, a script can be topic modeled by treating a scene segment as a document. Because scripts consist of speeches mainly, however, relatively small co-occurrences among words in the scene segments are observed. This causes inevitably the bad quality of topics based on statistical learning method. To tackle this problem, we propose a method of learning with additional word co-occurrence information obtained using scene similarities. The main idea of improving topic quality is that the information that two or more texts are topically related can be useful to learn high quality of topics. In addition, by using high quality of topics, we can get information more accurate whether two texts are related or not. In this paper, we regard two scene segments are related if their topical similarity is high enough. We also consider that words are co-occurred if they are in topically related scene segments together. In the experiments, we showed the proposed method generates a higher quality of topics from Korean drama scripts than the baselines.

Keywords: broadcasting contents, scripts, text similarity, topic model

Procedia PDF Downloads 242
14226 Topic Sentiments toward the COVID-19 Vaccine on Twitter

Authors: Melissa Vang, Raheyma Khan, Haihua Chen


The coronavirus disease 2019 (COVID‐19) pandemic has changed people's lives from all over the world. More people have turned to Twitter to engage online and discuss the COVID-19 vaccine. This study aims to present a text mining approach to identify people's attitudes towards the COVID-19 vaccine on Twitter. To achieve this purpose, we collected 54,268 COVID-19 vaccine tweets from September 01, 2020, to November 01, 2020, then the BERT model is used for the sentiment and topic analysis. The results show that people had more negative than positive attitudes about the vaccine, and countries with an increasing number of confirmed cases had a higher percentage of negative attitudes. Additionally, the topics discussed in positive and negative tweets are different. The tweet datasets can be helpful to information professionals to inform the public about vaccine-related informational resources. Our findings may have implications for understanding people's cognitions and feelings about the vaccine.

Keywords: BERT, COVID-19 vaccine, sentiment analysis, topic modeling

Procedia PDF Downloads 57
14225 Priming through Open Book MCQ Test: A Tool for Enhancing Learning in Medical Undergraduates

Authors: Bharti Bhandari, Bharati Mehta, Sabyasachi Sircar


Medical education is advancing in India, with its advancement newer innovations are being incorporated in teaching and assessment methodology. Our study focusses on a teaching innovation that is more student-centric than teacher-centric and is the need of the day. The teaching innovation was carried out in 1st year MBBS students of our institute. Students were assigned control and test groups. Priming was done for the students in the test group with an open-book MCQ based test in a particular topic before delivering formal didactic lecture on that topic. The control group was not assigned any such exercise. This was followed by formal didactic lecture on the same topic. Thereafter, both groups were assessed on the same topic. The marks were compiled and analysed using appropriate statistical tests. Students were also given questionnaire to elicit their views on the benefits of “self-priming”. The mean marks scored in theory assessment by the test group were statistically higher than the marks scored by the controls. According to students’ feedback, the ‘self-priming “process was interesting, helped in better orientation during class-room lectures and better understanding of the topic. They want it to be repeated for other topics with moderate difficulty level. Better performance of the students in the primed group validates the combination of student-centric priming model and didactic lecture as superior to the conventional, teacher-centric methods alone. If this system is successfully followed, the present teacher-centric pedagogy should increasingly give way to student-centric activities where the teacher is only a facilitator.

Keywords: medical education, open-book test, pedagogy, priming

Procedia PDF Downloads 348
14224 Restricted Boltzmann Machines and Deep Belief Nets for Market Basket Analysis: Statistical Performance and Managerial Implications

Authors: H. Hruschka


This paper presents the first comparison of the performance of the restricted Boltzmann machine and the deep belief net on binary market basket data relative to binary factor analysis and the two best-known topic models, namely Dirichlet allocation and the correlated topic model. This comparison shows that the restricted Boltzmann machine and the deep belief net are superior to both binary factor analysis and topic models. Managerial implications that differ between the investigated models are treated as well. The restricted Boltzmann machine is defined as joint Boltzmann distribution of hidden variables and observed variables (purchases). It comprises one layer of observed variables and one layer of hidden variables. Note that variables of the same layer are not connected. The comparison also includes deep belief nets with three layers. The first layer is a restricted Boltzmann machine based on category purchases. Hidden variables of the first layer are used as input variables by the second-layer restricted Boltzmann machine which then generates second-layer hidden variables. Finally, in the third layer hidden variables are related to purchases. A public data set is analyzed which contains one month of real-world point-of-sale transactions in a typical local grocery outlet. It consists of 9,835 market baskets referring to 169 product categories. This data set is randomly split into two halves. One half is used for estimation, the other serves as holdout data. Each model is evaluated by the log likelihood for the holdout data. Performance of the topic models is disappointing as the holdout log likelihood of the correlated topic model – which is better than Dirichlet allocation - is lower by more than 25,000 compared to the best binary factor analysis model. On the other hand, binary factor analysis on its own is clearly surpassed by both the restricted Boltzmann machine and the deep belief net whose holdout log likelihoods are higher by more than 23,000. Overall, the deep belief net performs best. We also interpret hidden variables discovered by binary factor analysis, the restricted Boltzmann machine and the deep belief net. Hidden variables characterized by the product categories to which they are related differ strongly between these three models. To derive managerial implications we assess the effect of promoting each category on total basket size, i.e., the number of purchased product categories, due to each category's interdependence with all the other categories. The investigated models lead to very different implications as they disagree about which categories are associated with higher basket size increases due to a promotion. Of course, recommendations based on better performing models should be preferred. The impressive performance advantages of the restricted Boltzmann machine and the deep belief net suggest continuing research by appropriate extensions. To include predictors, especially marketing variables such as price, seems to be an obvious next step. It might also be feasible to take a more detailed perspective by considering purchases of brands instead of purchases of product categories.

Keywords: binary factor analysis, deep belief net, market basket analysis, restricted Boltzmann machine, topic models

Procedia PDF Downloads 114
14223 Recognizing an Individual, Their Topic of Conversation and Cultural Background from 3D Body Movement

Authors: Gheida J. Shahrour, Martin J. Russell


The 3D body movement signals captured during human-human conversation include clues not only to the content of people’s communication but also to their culture and personality. This paper is concerned with automatic extraction of this information from body movement signals. For the purpose of this research, we collected a novel corpus from 27 subjects, arranged them into groups according to their culture. We arranged each group into pairs and each pair communicated with each other about different topics. A state-of-art recognition system is applied to the problems of person, culture, and topic recognition. We borrowed modeling, classification, and normalization techniques from speech recognition. We used Gaussian Mixture Modeling (GMM) as the main technique for building our three systems, obtaining 77.78%, 55.47%, and 39.06% from the person, culture, and topic recognition systems respectively. In addition, we combined the above GMM systems with Support Vector Machines (SVM) to obtain 85.42%, 62.50%, and 40.63% accuracy for person, culture, and topic recognition respectively. Although direct comparison among these three recognition systems is difficult, it seems that our person recognition system performs best for both GMM and GMM-SVM, suggesting that inter-subject differences (i.e. subject’s personality traits) are a major source of variation. When removing these traits from culture and topic recognition systems using the Nuisance Attribute Projection (NAP) and the Intersession Variability Compensation (ISVC) techniques, we obtained 73.44% and 46.09% accuracy from culture and topic recognition systems respectively.

Keywords: person recognition, topic recognition, culture recognition, 3D body movement signals, variability compensation

Procedia PDF Downloads 456
14222 Two Quasiparticle Rotor Model for Deformed Nuclei

Authors: Alpana Goel, Kawalpreet Kalra


The study of level structures of deformed nuclei is the most complex topic in nuclear physics. For the description of level structure, a simple model is good enough to bring out the basic features which may then be further refined. The low lying level structures of these nuclei can, therefore, be understood in terms of Two Quasiparticle plus axially symmetric Rotor Model (TQPRM). The formulation of TQPRM for deformed nuclei has been presented. The analysis of available experimental data on two quasiparticle rotational bands of deformed nuclei present unusual features like signature dependence, odd-even staggering, signature inversion and signature reversal in two quasiparticle rotational bands of deformed nuclei. These signature effects are well discussed within the framework of TQPRM. The model is well efficient in reproducing the large odd-even staggering and anomalous features observed in even-even and odd-odd deformed nuclei. The effect of particle-particle and the Coriolis coupling is well established from the model. Detailed description of the model with implications to deformed nuclei is presented in the paper.

Keywords: deformed nuclei, signature effects, signature inversion, signature reversal

Procedia PDF Downloads 83
14221 Focus-Latent Dirichlet Allocation for Aspect-Level Opinion Mining

Authors: Mohsen Farhadloo, Majid Farhadloo


Aspect-level opinion mining that aims at discovering aspects (aspect identification) and their corresponding ratings (sentiment identification) from customer reviews have increasingly attracted attention of researchers and practitioners as it provides valuable insights about products/services from customer's points of view. Instead of addressing aspect identification and sentiment identification in two separate steps, it is possible to simultaneously identify both aspects and sentiments. In recent years many graphical models based on Latent Dirichlet Allocation (LDA) have been proposed to solve both aspect and sentiment identifications in a single step. Although LDA models have been effective tools for the statistical analysis of document collections, they also have shortcomings in addressing some unique characteristics of opinion mining. Our goal in this paper is to address one of the limitations of topic models to date; that is, they fail to directly model the associations among topics. Indeed in many text corpora, it is natural to expect that subsets of the latent topics have higher probabilities. We propose a probabilistic graphical model called focus-LDA, to better capture the associations among topics when applied to aspect-level opinion mining. Our experiments on real-life data sets demonstrate the improved effectiveness of the focus-LDA model in terms of the accuracy of the predictive distributions over held out documents. Furthermore, we demonstrate qualitatively that the focus-LDA topic model provides a natural way of visualizing and exploring unstructured collection of textual data.

Keywords: aspect-level opinion mining, document modeling, Latent Dirichlet Allocation, LDA, sentiment analysis

Procedia PDF Downloads 32
14220 A New Nonlinear State-Space Model and Its Application

Authors: Abdullah Eqal Al Mazrooei


In this work, a new nonlinear model will be introduced. The model is in the state-space form. The nonlinearity of this model is in the state equation where the state vector is multiplied by its self. This technique makes our model generalizes many famous models as Lotka-Volterra model and Lorenz model which have many applications in the real life. We will apply our new model to estimate the wind speed by using a new nonlinear estimator which suitable to work with our model.

Keywords: nonlinear systems, state-space model, Kronecker product, nonlinear estimator

Procedia PDF Downloads 558
14219 Analysis of Trends in Environmental Health Research Using Topic Modeling

Authors: Hayoung Cho, Gabi Cho


In response to the continuing increase of demands for living environment safety, the Korean government has established and implemented various environmental health policies and set a high priority to the related R&D. However, the level of related technologies such as environmental risk assessment are still relatively low, and there is a need for detailed investment strategies in the field of environmental health research. As scientific research papers can give valuable implications on the development of a certain field, this study analyzed the global research trends in the field of environmental health over the past 10 years (2005~2015). Research topics were extracted from abstracts of the collected SCI papers using topic modeling to study the changes in research trends and discover emerging technologies. The method of topic modeling can improve the traditional bibliometric approach and provide a more comprehensive review of the global research development. The results of this study are expected to help provide insights for effective policy making and R&D investment direction.

Keywords: environmental health, paper analysis, research trends, topic modeling

Procedia PDF Downloads 206
14218 Efficient Frequent Itemset Mining Methods over Real-Time Spatial Big Data

Authors: Hamdi Sana, Emna Bouazizi, Sami Faiz


In recent years, there is a huge increase in the use of spatio-temporal applications where data and queries are continuously moving. As a result, the need to process real-time spatio-temporal data seems clear and real-time stream data management becomes a hot topic. Sliding window model and frequent itemset mining over dynamic data are the most important problems in the context of data mining. Thus, sliding window model for frequent itemset mining is a widely used model for data stream mining due to its emphasis on recent data and its bounded memory requirement. These methods use the traditional transaction-based sliding window model where the window size is based on a fixed number of transactions. Actually, this model supposes that all transactions have a constant rate which is not suited for real-time applications. And the use of this model in such applications endangers their performance. Based on these observations, this paper relaxes the notion of window size and proposes the use of a timestamp-based sliding window model. In our proposed frequent itemset mining algorithm, support conditions are used to differentiate frequents and infrequent patterns. Thereafter, a tree is developed to incrementally maintain the essential information. We evaluate our contribution. The preliminary results are quite promising.

Keywords: real-time spatial big data, frequent itemset, transaction-based sliding window model, timestamp-based sliding window model, weighted frequent patterns, tree, stream query

Procedia PDF Downloads 74
14217 Artificial Intelligence Assisted Sentiment Analysis of Hotel Reviews Using Topic Modeling

Authors: Sushma Ghogale


With a surge in user-generated content or feedback or reviews on the internet, it has become possible and important to know consumers' opinions about products and services. This data is important for both potential customers and businesses providing the services. Data from social media is attracting significant attention and has become the most prominent channel of expressing an unregulated opinion. Prospective customers look for reviews from experienced customers before deciding to buy a product or service. Several websites provide a platform for users to post their feedback for the provider and potential customers. However, the biggest challenge in analyzing such data is in extracting latent features and providing term-level analysis of the data. This paper proposes an approach to use topic modeling to classify the reviews into topics and conduct sentiment analysis to mine the opinions. This approach can analyse and classify latent topics mentioned by reviewers on business sites or review sites, or social media using topic modeling to identify the importance of each topic. It is followed by sentiment analysis to assess the satisfaction level of each topic. This approach provides a classification of hotel reviews using multiple machine learning techniques and comparing different classifiers to mine the opinions of user reviews through sentiment analysis. This experiment concludes that Multinomial Naïve Bayes classifier produces higher accuracy than other classifiers.

Keywords: latent Dirichlet allocation, topic modeling, text classification, sentiment analysis

Procedia PDF Downloads 29
14216 Academic Staff Recruitment in Islamic University: A Proposed Holistic Model

Authors: Syahruddin Sumardi, Indra Fajar Alamsyah, Junaidah Hashim


This study attempts to explore and presents a proposed recruitment model in Islamic university which aligned with holistic role. It is a conceptual paper in nature. In turn, this study is designed to utilize exploratory approach. Literature and document review that related to this topic are used as the methods to analyse the content found. Recruitment for any organization is fundamental to achieve its goal effectively. Staffing in universities is vital due to the importance role of lecturers. Currently, Islamic universities still adopt the common process of recruitment for their academic staffs. Whereas, they have own characteristics which are embedded in their institutions. Furthermore, the FCWC (Foundation, Capability, Worldview and Commitment) model of recruitment proposes to suit the holistic character of Islamic university. Further studies are required to empirically validate the concept through systematic investigations. Additionally, measuring this model by a designed means is appreciated. The model provides the map and alternative tool of recruitment for Islamic universities to determine the process of recruitment which can appropriate their institutions. In addition, it also allows stakeholders and policy makers to consider regarding Islamic values that should inculcate in the Islamic higher learning institutions. This study initiates a foundational contribution for an early sequence of research.

Keywords: academic staff, Islamic values, recruitment model, university

Procedia PDF Downloads 85
14215 Investigating Dynamic Transition Process of Issues Using Unstructured Text Analysis

Authors: Myungsu Lim, William Xiu Shun Wong, Yoonjin Hyun, Chen Liu, Seongi Choi, Dasom Kim, Namgyu Kim


The amount of real-time data generated through various mass media has been increasing rapidly. In this study, we had performed topic analysis by using the unstructured text data that is distributed through news article. As one of the most prevalent applications of topic analysis, the issue tracking technique investigates the changes of the social issues that identified through topic analysis. Currently, traditional issue tracking is conducted by identifying the main topics of documents that cover an entire period at the same time and analyzing the occurrence of each topic by the period of occurrence. However, this traditional issue tracking approach has limitation that it cannot discover dynamic mutation process of complex social issues. The purpose of this study is to overcome the limitations of the existing issue tracking method. We first derived core issues of each period, and then discover the dynamic mutation process of various issues. In this study, we further analyze the mutation process from the perspective of the issues categories, in order to figure out the pattern of issue flow, including the frequency and reliability of the pattern. In other words, this study allows us to understand the components of the complex issues by tracking the dynamic history of issues. This methodology can facilitate a clearer understanding of complex social phenomena by providing mutation history and related category information of the phenomena.

Keywords: Data Mining, Issue Tracking, Text Mining, topic Analysis, topic Detection, Trend Detection

Procedia PDF Downloads 317
14214 Diversity in Finance Literature Revealed through the Lens of Machine Learning: A Topic Modeling Approach on Academic Papers

Authors: Oumaima Lahmar


This paper aims to define a structured topography for finance researchers seeking to navigate the body of knowledge in their extrapolation of finance phenomena. To make sense of the body of knowledge in finance, a probabilistic topic modeling approach is applied on 6000 abstracts of academic articles published in three top journals in finance between 1976 and 2020. This approach combines both machine learning techniques and natural language processing to statistically identify the conjunctions between research articles and their shared topics described each by relevant keywords. The topic modeling analysis reveals 35 coherent topics that can well depict finance literature and provide a comprehensive structure for the ongoing research themes. Comparing the extracted topics to the Journal of Economic Literature (JEL) classification system, a significant similarity was highlighted between the characterizing keywords. On the other hand, we identify other topics that do not match the JEL classification despite being relevant in the finance literature.

Keywords: finance literature, textual analysis, topic modeling, perplexity

Procedia PDF Downloads 29
14213 Interoperability Maturity Models for Consideration When Using School Management Systems in South Africa: A Scoping Review

Authors: Keneilwe Maremi, Marlien Herselman, Adele Botha


The main purpose and focus of this paper are to determine the Interoperability Maturity Models to consider when using School Management Systems (SMS). The importance of this is to inform and help schools with knowing which Interoperability Maturity Model is best suited for their SMS. To address the purpose, this paper will apply a scoping review to ensure that all aspects are provided. The scoping review will include papers written from 2012-2019 and a comparison of the different types of Interoperability Maturity Models will be discussed in detail, which includes the background information, the levels of interoperability, and area for consideration in each Maturity Model. The literature was obtained from the following databases: IEEE Xplore and Scopus, the following search engines were used: Harzings, and Google Scholar. The topic of the paper was used as a search term for the literature and the term ‘Interoperability Maturity Models’ was used as a keyword. The data were analyzed in terms of the definition of Interoperability, Interoperability Maturity Models, and levels of interoperability. The results provide a table that shows the focus area of concern for each Maturity Model (based on the scoping review where only 24 papers were found to be best suited for the paper out of 740 publications initially identified in the field). This resulted in the most discussed Interoperability Maturity Model for consideration (Information Systems Interoperability Maturity Model (ISIMM) and Organizational Interoperability Maturity Model for C2 (OIM)).

Keywords: interoperability, interoperability maturity model, school management system, scoping review

Procedia PDF Downloads 69
14212 Collision Theory Based Sentiment Detection Using Discourse Analysis in Hadoop

Authors: Anuta Mukherjee, Saswati Mukherjee


Data is growing everyday. Social networking sites such as Twitter are becoming an integral part of our daily lives, contributing a large increase in the growth of data. It is a rich source especially for sentiment detection or mining since people often express honest opinion through tweets. However, although sentiment analysis is a well-researched topic in text, this analysis using Twitter data poses additional challenges since these are unstructured data with abbreviations and without a strict grammatical correctness. We have employed collision theory to achieve sentiment analysis in Twitter data. We have also incorporated discourse analysis in the collision theory based model to detect accurate sentiment from tweets. We have also used the retweet field to assign weights to certain tweets and obtained the overall weightage of a topic provided in the form of a query. Hadoop has been exploited for speed. Our experiments show effective results.

Keywords: sentiment analysis, twitter, collision theory, discourse analysis

Procedia PDF Downloads 409
14211 Topic Prominence and Temporal Encoding in Mandarin Chinese

Authors: Tzu-I Chiang


A central question for finite-nonfinite distinction in Mandarin Chinese is how does Mandarin encode temporal information without the grammatical contrast between past and present tense. Moreover, how do L2 learners of Mandarin whose native language is English and whose L1 system has tense morphology, acquire the temporal encoding system in L2 Mandarin? The current study reports preliminary findings on the relationship between topic prominence and the temporal encoding in L1 and L2 Chinese. Oral narratives data from 30 natives and learners of Mandarin Chinese were collected via a film-retell task. In terms of coding, predicates collected from the narratives were transcribed and then coded based on four major verb types: n-degree Statives (quality-STA), point-scale Statives (status-STA), n-atom EVENT (ACT), and point EVENT (resultative-ACT). How native speakers and non-native speakers started retelling the story was calculated. Results of the study show that native speakers of Chinese tend to express Topic Time (TT) syntactically at the topic position; whereas L2 learners of Chinese across levels rely mainly on the default time encoded in the event types. Moreover, as the proficiency level of the learner increases, learners’ appropriate use of the event predicates increased, which supports the argument that L2 development of temporal encoding is affected by lexical aspect.

Keywords: topic prominence, temporal encoding, lexical aspect, L2 acquisition

Procedia PDF Downloads 117
14210 Logistic Regression Model versus Additive Model for Recurrent Event Data

Authors: Entisar A. Elgmati


Recurrent infant diarrhea is studied using daily data collected in Salvador, Brazil over one year and three months. A logistic regression model is fitted instead of Aalen's additive model using the same covariates that were used in the analysis with the additive model. The model gives reasonably similar results to that using additive regression model. In addition, the problem with the estimated conditional probabilities not being constrained between zero and one in additive model is solved here. Also martingale residuals that have been used to judge the goodness of fit for the additive model are shown to be useful for judging the goodness of fit of the logistic model.

Keywords: additive model, cumulative probabilities, infant diarrhoea, recurrent event

Procedia PDF Downloads 495
14209 Solar Energy Technology Adoption; A Vignette Study for the Up-Scale Residential Sector in Egypt

Authors: Mazen Zaki, Sherwat E. Ibrahim


Renewable energy has become a very important and critical topic all around the world due to the limited resources that led to shifting to the trend of renewable energy and its integration with the conventional ones. This paper investigates the adoption of the solar energy technology for up-scale residential sector in Cairo, Egypt. The technology acceptance model uses several stakeholder points’ of views to develop vignettes to be used in examining the intention and attitude of the householders to adopt the solar energy technology.

Keywords: solar energy, technology acceptance model, TAM, stakeholder analysis, vignette, residential sector

Procedia PDF Downloads 63