Search results for: text mining analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 28144

Search results for: text mining analysis

27994 Multimodal Sentiment Analysis With Web Based Application

Authors: Shreyansh Singh, Afroz Ahmed

Abstract:

Sentiment Analysis intends to naturally reveal the hidden mentality that we hold towards an entity. The total of this assumption over a populace addresses sentiment surveying and has various applications. Current text-based sentiment analysis depends on the development of word embeddings and Machine Learning models that take in conclusion from enormous text corpora. Sentiment Analysis from text is presently generally utilized for consumer loyalty appraisal and brand insight investigation. With the expansion of online media, multimodal assessment investigation is set to carry new freedoms with the appearance of integral information streams for improving and going past text-based feeling examination using the new transforms methods. Since supposition can be distinguished through compelling follows it leaves, like facial and vocal presentations, multimodal opinion investigation offers good roads for examining facial and vocal articulations notwithstanding the record or printed content. These methodologies use the Recurrent Neural Networks (RNNs) with the LSTM modes to increase their performance. In this study, we characterize feeling and the issue of multimodal assessment investigation and audit ongoing advancements in multimodal notion examination in various spaces, including spoken surveys, pictures, video websites, human-machine, and human-human connections. Difficulties and chances of this arising field are additionally examined, promoting our theory that multimodal feeling investigation holds critical undiscovered potential.

Keywords: sentiment analysis, RNN, LSTM, word embeddings

Procedia PDF Downloads 94
27993 Mining Diagnostic Investigation Process

Authors: Sohail Imran, Tariq Mahmood

Abstract:

In complex healthcare diagnostic investigation process, medical practitioners have to focus on ways to standardize their processes to perform high quality care and optimize the time and costs. Process mining techniques can be applied to extract process related knowledge from data without considering causal and dynamic dependencies in business domain and processes. The application of process mining is effective in diagnostic investigation. It is very helpful where a treatment gives no dispositive evidence favoring it. In this paper, we applied process mining to discover important process flow of diagnostic investigation for hepatitis patients. This approach has some benefits which can enhance the quality and efficiency of diagnostic investigation processes.

Keywords: process mining, healthcare, diagnostic investigation process, process flow

Procedia PDF Downloads 495
27992 Analyzing Semantic Feature Using Multiple Information Sources for Reviews Summarization

Authors: Yu Hung Chiang, Hei Chia Wang

Abstract:

Nowadays, tourism has become a part of life. Before reserving hotels, customers need some information, which the most important source is online reviews, about hotels to help them make decisions. Due to the dramatic growing of online reviews, it is impossible for tourists to read all reviews manually. Therefore, designing an automatic review analysis system, which summarizes reviews, is necessary for them. The main purpose of the system is to understand the opinion of reviews, which may be positive or negative. In other words, the system would analyze whether the customers who visited the hotel like it or not. Using sentiment analysis methods will help the system achieve the purpose. In sentiment analysis methods, the targets of opinion (here they are called the feature) should be recognized to clarify the polarity of the opinion because polarity of the opinion may be ambiguous. Hence, the study proposes an unsupervised method using Part-Of-Speech pattern and multi-lexicons sentiment analysis to summarize all reviews. We expect this method can help customers search what they want information as well as make decisions efficiently.

Keywords: text mining, sentiment analysis, product feature extraction, multi-lexicons

Procedia PDF Downloads 307
27991 An Adaptive Distributed Incremental Association Rule Mining System

Authors: Adewale O. Ogunde, Olusegun Folorunso, Adesina S. Sodiya

Abstract:

Most existing Distributed Association Rule Mining (DARM) systems are still facing several challenges. One of such challenges that have not received the attention of many researchers is the inability of existing systems to adapt to constantly changing databases and mining environments. In this work, an Adaptive Incremental Mining Algorithm (AIMA) is therefore proposed to address these problems. AIMA employed multiple mobile agents for the entire mining process. AIMA was designed to adapt to changes in the distributed databases by mining only the incremental database updates and using this to update the existing rules in order to improve the overall response time of the DARM system. In AIMA, global association rules were integrated incrementally from one data site to another through Results Integration Coordinating Agents. The mining agents in AIMA were made adaptive by defining mining goals with reasoning and behavioral capabilities and protocols that enabled them to either maintain or change their goals. AIMA employed Java Agent Development Environment Extension for designing the internal agents’ architecture. Results from experiments conducted on real datasets showed that the adaptive system, AIMA performed better than the non-adaptive systems with lower communication costs and higher task completion rates.

Keywords: adaptivity, data mining, distributed association rule mining, incremental mining, mobile agents

Procedia PDF Downloads 369
27990 Value Analysis of Islamic Banking and Conventional Banking to Measure Value Co-Creation

Authors: Amna Javed, Hisashi Masuda, Youji Kohda

Abstract:

This study examines the value analysis in Islamic and conventional banking services in Pakistan. Many scholars have focused on co-creation of values in services but mainly economic values not non-economic. As Islamic banking is based on Islamic principles that are more concerned with non-economic values (well-being, partnership, fairness, trust worthy, and justice) than economic values as money in terms of interest. This study is important to know the providers point of view about the co-created values, because, it may be more sustainable and appropriate for today’s unpredictable socioeconomic environment. Data were collected from 4 banks (2 Islamic and 2 conventional banks). Text mining technique is applied for data analysis, and values with 100% occurrences in Islamic banking are chosen. The results reflect that Islamic banking is more centric towards non-economic values than economic values and it promotes team work and partnership concept by applying Islamic spirit and trust worthiness concept.

Keywords: economic values, Islamic banking, non-economic values, value system

Procedia PDF Downloads 434
27989 Performance Evaluation of Production Schedules Based on Process Mining

Authors: Kwan Hee Han

Abstract:

External environment of enterprise is rapidly changing majorly by global competition, cost reduction pressures, and new technology. In these situations, production scheduling function plays a critical role to meet customer requirements and to attain the goal of operational efficiency. It deals with short-term decision making in the production process of the whole supply chain. The major task of production scheduling is to seek a balance between customer orders and limited resources. In manufacturing companies, this task is so difficult because it should efficiently utilize resource capacity under the careful consideration of many interacting constraints. At present, many computerized software solutions have been utilized in many enterprises to generate a realistic production schedule to overcome the complexity of schedule generation. However, most production scheduling systems do not provide sufficient information about the validity of the generated schedule except limited statistics. Process mining only recently emerged as a sub-discipline of both data mining and business process management. Process mining techniques enable the useful analysis of a wide variety of processes such as process discovery, conformance checking, and bottleneck analysis. In this study, the performance of generated production schedule is evaluated by mining event log data of production scheduling software system by using the process mining techniques since every software system generates event logs for the further use such as security investigation, auditing and error bugging. An application of process mining approach is proposed for the validation of the goodness of production schedule generated by scheduling software systems in this study. By using process mining techniques, major evaluation criteria such as utilization of workstation, existence of bottleneck workstations, critical process route patterns, and work load balance of each machine over time are measured, and finally, the goodness of production schedule is evaluated. By using the proposed process mining approach for evaluating the performance of generated production schedule, the quality of production schedule of manufacturing enterprises can be improved.

Keywords: data mining, event log, process mining, production scheduling

Procedia PDF Downloads 248
27988 Text Analysis to Support Structuring and Modelling a Public Policy Problem-Outline of an Algorithm to Extract Inferences from Textual Data

Authors: Claudia Ehrentraut, Osama Ibrahim, Hercules Dalianis

Abstract:

Policy making situations are real-world problems that exhibit complexity in that they are composed of many interrelated problems and issues. To be effective, policies must holistically address the complexity of the situation rather than propose solutions to single problems. Formulating and understanding the situation and its complex dynamics, therefore, is a key to finding holistic solutions. Analysis of text based information on the policy problem, using Natural Language Processing (NLP) and Text analysis techniques, can support modelling of public policy problem situations in a more objective way based on domain experts knowledge and scientific evidence. The objective behind this study is to support modelling of public policy problem situations, using text analysis of verbal descriptions of the problem. We propose a formal methodology for analysis of qualitative data from multiple information sources on a policy problem to construct a causal diagram of the problem. The analysis process aims at identifying key variables, linking them by cause-effect relationships and mapping that structure into a graphical representation that is adequate for designing action alternatives, i.e., policy options. This study describes the outline of an algorithm used to automate the initial step of a larger methodological approach, which is so far done manually. In this initial step, inferences about key variables and their interrelationships are extracted from textual data to support a better problem structuring. A small prototype for this step is also presented.

Keywords: public policy, problem structuring, qualitative analysis, natural language processing, algorithm, inference extraction

Procedia PDF Downloads 562
27987 On an Approach for Rule Generation in Association Rule Mining

Authors: B. Chandra

Abstract:

In Association Rule Mining, much attention has been paid for developing algorithms for large (frequent/closed/maximal) itemsets but very little attention has been paid to improve the performance of rule generation algorithms. Rule generation is an important part of Association Rule Mining. In this paper, a novel approach named NARG (Association Rule using Antecedent Support) has been proposed for rule generation that uses memory resident data structure named FCET (Frequent Closed Enumeration Tree) to find frequent/closed itemsets. In addition, the computational speed of NARG is enhanced by giving importance to the rules that have lower antecedent support. Comparative performance evaluation of NARG with fast association rule mining algorithm for rule generation has been done on synthetic datasets and real life datasets (taken from UCI Machine Learning Repository). Performance analysis shows that NARG is computationally faster in comparison to the existing algorithms for rule generation.

Keywords: knowledge discovery, association rule mining, antecedent support, rule generation

Procedia PDF Downloads 296
27986 Intertextuality in Choreography: Investigation of Text and Movements in Making Choreography

Authors: Muhammad Fairul Azreen Mohd Zahid

Abstract:

Speech, text, and movement intensify aspects of creating choreography by connecting with emotional entanglements, tradition, literature, and other texts. This research focuses on the practice as research that will prioritise the choreography process as an inquiry approach. With the driven context, the study intervenes in critical conjunctions of choreographic theory, bringing together new reflections on the moving body, spaces of action, as well as intertextuality between text and movements in making choreography. Throughout the process, the researcher will introduce the level of deliberation from speech through movements and text to express emotion within a narrative context of an “illocutionary act.” This practice as research will produce a different meaning from the “utterance text” to “utterance movements” in the perspective of speech acts theory by J.L Austin based on fragmented text from “pidato adat” which has been used as opening speech in Randai. Looking at the theory of deconstruction by Jacque Derrida also will give a different meaning from the text. Nevertheless, the process of creating the choreography will also help to lay the basic normative structure implicit in “constative” (statement text/movement) and “performative” (command text/movement). Through this process, the researcher will also look at several methods of using text from two works by Joseph Gonzales, “Becoming King-The Pakyung Revisited” and Crystal Pite's “The Statement,” as references to produce different methods in making choreography. The perspective from the semiotic foundation will support how occurrences within dance discourses as texts through a semiotic lens. The method used in this research is qualitative, which includes an interview and simulation of the concept to get an outcome.

Keywords: intertextuality, choreography, speech act, performative, deconstruction

Procedia PDF Downloads 70
27985 Written Argumentative Texts in Elementary School: The Development of Text Structure and Its Relation to Reading Comprehension

Authors: Sara Zadunaisky Ehrlich, Batia Seroussi, Anat Stavans

Abstract:

Text structure is a parameter of text quality. This study investigated the structure of written argumentative texts produced by elementary school age children. We set two objectives: to identify and trace the structural components of the argumentative texts and to investigate whether reading comprehension skills were correlated with text structure. 293 school children from 2nd to 5th grades were asked to write two argumentative texts about informal or everyday life controversial topics and completed two reading tasks that targeted different levels of text comprehension. The findings indicated, on the one hand, significant developmental differences between mature and more novice writers in terms of text length and mean proportion of clauses produced for a better elaboration of the different text components. On the other hand, with certain fluctuations, no meaningful differences were found in terms of presence of text structure: at all grade levels, elementary school children produced the basic and minimal structure that included the writer's argument and reasons or arguments' supports. Counter-arguments were scarce even in the upper grades. While the children captured that essentially an argument must be justified, the more the number of supports produced, the fewer the clauses the children produced. Last, weak to mild relations were found between reading comprehension and argumentative text structure. Nevertheless, children who scored higher on sophisticated questions that require inferential or world knowledge displayed more elaborated structures in terms of text length and size of supports to the writer's argument. These findings indicate how school-age children perceive the basic template of an argument with future implications regarding how to elaborate written arguments.

Keywords: argumentative text, text structure, elementary school children, written argumentations

Procedia PDF Downloads 132
27984 The Morphology of Sri Lankan Text Messages

Authors: Chamindi Dilkushi Senaratne

Abstract:

Communicating via a text or an SMS (Short Message Service) has become an integral part of our daily lives. With the increase in the use of mobile phones, text messaging has become a genre by itself worth researching and studying. It is undoubtedly a major phenomenon revealing language change. This paper attempts to describe the morphological processes of text language of urban bilinguals in Sri Lanka. It will be a typological study based on 500 English text messages collected from urban bilinguals residing in Colombo. The messages are selected by categorizing the deviant forms of language use apparent in text messages. These stylistic deviations are a deliberate skilled performance by the users of the language possessing an in-depth knowledge of linguistic systems to create new words and thereby convey their linguistic identity and individual and group solidarity via the message. The findings of the study solidifies arguments that the manipulation of language in text messages is both creative and appropriate. In addition, code mixing theories will be used to identify how existing morphological processes are adapted by bilingual users in Sri Lanka when texting. The study will reveal processes such as omission, initialism, insertion and alternation in addition to other identified linguistic features in text language. The corpus reveals the most common morphological processes used by Sri Lankan urban bilinguals when sending texts.

Keywords: bilingual, deviations, morphology, texts

Procedia PDF Downloads 247
27983 Understanding the Qualitative Nature of Product Reviews by Integrating Text Processing Algorithm and Usability Feature Extraction

Authors: Cherry Yieng Siang Ling, Joong Hee Lee, Myung Hwan Yun

Abstract:

The quality of a product to be usable has become the basic requirement in consumer’s perspective while failing the requirement ends up the customer from not using the product. Identifying usability issues from analyzing quantitative and qualitative data collected from usability testing and evaluation activities aids in the process of product design, yet the lack of studies and researches regarding analysis methodologies in qualitative text data of usability field inhibits the potential of these data for more useful applications. While the possibility of analyzing qualitative text data found with the rapid development of data analysis studies such as natural language processing field in understanding human language in computer, and machine learning field in providing predictive model and clustering tool. Therefore, this research aims to study the application capability of text processing algorithm in analysis of qualitative text data collected from usability activities. This research utilized datasets collected from LG neckband headset usability experiment in which the datasets consist of headset survey text data, subject’s data and product physical data. In the analysis procedure, which integrated with the text-processing algorithm, the process includes training of comments onto vector space, labeling them with the subject and product physical feature data, and clustering to validate the result of comment vector clustering. The result shows 'volume and music control button' as the usability feature that matches best with the cluster of comment vectors where centroid comments of a cluster emphasized more on button positions, while centroid comments of the other cluster emphasized more on button interface issues. When volume and music control buttons are designed separately, the participant experienced less confusion, and thus, the comments mentioned only about the buttons' positions. While in the situation where the volume and music control buttons are designed as a single button, the participants experienced interface issues regarding the buttons such as operating methods of functions and confusion of functions' buttons. The relevance of the cluster centroid comments with the extracted feature explained the capability of text processing algorithms in analyzing qualitative text data from usability testing and evaluations.

Keywords: usability, qualitative data, text-processing algorithm, natural language processing

Procedia PDF Downloads 255
27982 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi

Abstract:

The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data to be ready for data mining exploration take up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data have been collected from an actual harmonic monitoring system in a distribution system in Australia for three years. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes to better understanding of the data have been presented. Underlying classes in this data has then been identified using clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by the engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

Procedia PDF Downloads 222
27981 Identification of Text Domains and Register Variation through the Analysis of Lexical Distribution in a Bangla Mass Media Text Corpus

Authors: Mahul Bhattacharyya, Niladri Sekhar Dash

Abstract:

The present research paper is an experimental attempt to investigate the nature of variation in the register in three major text domains, namely, social, cultural, and political texts collected from the corpus of Bangla printed mass media texts. This present study uses a corpus of a moderate amount of Bangla mass media text that contains nearly one million words collected from different media sources like newspapers, magazines, advertisements, periodicals, etc. The analysis of corpus data reveals that each text has certain lexical properties that not only control their identity but also mark their uniqueness across the domains. At first, the subject domains of the texts are classified into two parameters namely, ‘Genre' and 'Text Type'. Next, some empirical investigations are made to understand how the domains vary from each other in terms of lexical properties like both function and content words. Here the method of comparative-cum-contrastive matching of lexical load across domains is invoked through word frequency count to track how domain-specific words and terms may be marked as decisive indicators in the act of specifying the textual contexts and subject domains. The study shows that the common lexical stock that percolates across all text domains are quite dicey in nature as their lexicological identity does not have any bearing in the act of specifying subject domains. Therefore, it becomes necessary for language users to anchor upon certain domain-specific lexical items to recognize a text that belongs to a specific text domain. The eventual findings of this study confirm that texts belonging to different subject domains in Bangla news text corpus clearly differ on the parameters of lexical load, lexical choice, lexical clustering, lexical collocation. In fact, based on these parameters, along with some statistical calculations, it is possible to classify mass media texts into different types to mark their relation with regard to the domains they should actually belong. The advantage of this analysis lies in the proper identification of the linguistic factors which will give language users a better insight into the method they employ in text comprehension, as well as construct a systemic frame for designing text identification strategy for language learners. The availability of huge amount of Bangla media text data is useful for achieving accurate conclusions with a certain amount of reliability and authenticity. This kind of corpus-based analysis is quite relevant for a resource-poor language like Bangla, as no attempt has ever been made to understand how the structure and texture of Bangla mass media texts vary due to certain linguistic and extra-linguistic constraints that are actively operational to specific text domains. Since mass media language is assumed to be the most 'recent representation' of the actual use of the language, this study is expected to show how the Bangla news texts reflect the thoughts of the society and how they leave a strong impact on the thought process of the speech community.

Keywords: Bangla, corpus, discourse, domains, lexical choice, mass media, register, variation

Procedia PDF Downloads 155
27980 Issue Reorganization Using the Measure of Relevance

Authors: William Wong Xiu Shun, Yoonjin Hyun, Mingyu Kim, Seongi Choi, Namgyu Kim

Abstract:

Recently, the demand of extracting the R&D keywords from the issues and using them in retrieving R&D information is increasing rapidly. But it is hard to identify the related issues or to distinguish them. Although the similarity between the issues cannot be identified, but with the R&D lexicon, the issues that always shared the same R&D keywords can be determined. In details, the R&D keywords that associated with particular issue is implied the key technology elements that needed to solve the problem of the particular issue. Furthermore, the related issues that sharing the same R&D keywords can be showed in a more systematic way through the issue clustering constructed from the perspective of R&D. Thus, sharing of the R&D result and reusable of the R&D technology can be facilitated. Indirectly, the redundancy of investment on the same R&D can be reduce as the R&D information can be shared between those corresponding issues and reusability of the related R&D can be improved. Therefore, a methodology of constructing an issue clustering from the perspective of common R&D keywords is proposed to satisfy the demands mentioned.

Keywords: clustering, social network analysis, text mining, topic analysis

Procedia PDF Downloads 551
27979 Data Mining As A Tool For Knowledge Management: A Review

Authors: Maram Saleh

Abstract:

Knowledge has become an essential resource in today’s economy and become the most important asset of maintaining competition advantage in organizations. The importance of knowledge has made organizations to manage their knowledge assets and resources through all multiple knowledge management stages such as: Knowledge Creation, knowledge storage, knowledge sharing and knowledge use. Researches on data mining are continues growing over recent years on both business and educational fields. Data mining is one of the most important steps of the knowledge discovery in databases process aiming to extract implicit, unknown but useful knowledge and it is considered as significant subfield in knowledge management. Data miming have the great potential to help organizations to focus on extracting the most important information on their data warehouses. Data mining tools and techniques can predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. This review paper explores the applications of data mining techniques in supporting knowledge management process as an effective knowledge discovery technique. In this paper, we identify the relationship between data mining and knowledge management, and then focus on introducing some application of date mining techniques in knowledge management for some real life domains.

Keywords: Data Mining, Knowledge management, Knowledge discovery, Knowledge creation.

Procedia PDF Downloads 182
27978 Modelling of Powered Roof Supports Work

Authors: Marcin Michalak

Abstract:

Due to the increasing efforts on saving our natural environment a change in the structure of energy resources can be observed - an increasing fraction of a renewable energy sources. In many countries traditional underground coal mining loses its significance but there are still countries, like Poland or Germany, in which the coal based technologies have the greatest fraction in a total energy production. This necessitates to make an effort to limit the costs and negative effects of underground coal mining. The longwall complex is as essential part of the underground coal mining. The safety and the effectiveness of the work is strongly dependent of the diagnostic state of powered roof supports. The building of a useful and reliable diagnostic system requires a lot of data. As the acquisition of a data of any possible operating conditions it is important to have a possibility to generate a demanded artificial working characteristics. In this paper a new approach of modelling a leg pressure in the single unit of powered roof support. The model is a result of the analysis of a typical working cycles.

Keywords: machine modelling, underground mining, coal mining, structure

Procedia PDF Downloads 339
27977 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad

Abstract:

Nowadays considering human involved in increasing data development some methods such as data mining to extract science are unavoidable. One of the discussions of data mining is inherent distribution of the data usually the bases creating or receiving such data belong to corporate or non-corporate persons and do not give their information freely to others. Yet there is no guarantee to enable someone to mine special data without entering in the owner’s privacy. Sending data and then gathering them by each vertical or horizontal software depends on the type of their preserving type and also executed to improve data privacy. In this study it was attempted to compare comprehensively preserving data methods; also general methods such as random data, coding and strong and weak points of each one are examined.

Keywords: data mining, distributed data mining, privacy protection, privacy preserving

Procedia PDF Downloads 492
27976 Design and Development of Data Mining Application for Medical Centers in Remote Areas

Authors: Grace Omowunmi Soyebi

Abstract:

Data Mining is the extraction of information from a large database which helps in predicting a trend or behavior, thereby helping management make knowledge-driven decisions. One principal problem of most hospitals in rural areas is making use of the file management system for keeping records. A lot of time is wasted when a patient visits the hospital, probably in an emergency, and the nurse or attendant has to search through voluminous files before the patient's file can be retrieved; this may cause an unexpected to happen to the patient. This Data Mining application is to be designed using a Structured System Analysis and design method, which will help in a well-articulated analysis of the existing file management system, feasibility study, and proper documentation of the Design and Implementation of a Computerized medical record system. This Computerized system will replace the file management system and help to easily retrieve a patient's record with increased data security, access clinical records for decision-making, and reduce the time range at which a patient gets attended to.

Keywords: data mining, medical record system, systems programming, computing

Procedia PDF Downloads 184
27975 Review and Comparison of Associative Classification Data Mining Approaches

Authors: Suzan Wedyan

Abstract:

Data mining is one of the main phases in the Knowledge Discovery Database (KDD) which is responsible of finding hidden and useful knowledge from databases. There are many different tasks for data mining including regression, pattern recognition, clustering, classification, and association rule. In recent years a promising data mining approach called associative classification (AC) has been proposed, AC integrates classification and association rule discovery to build classification models (classifiers). This paper surveys and critically compares several AC algorithms with reference of the different procedures are used in each algorithm, such as rule learning, rule sorting, rule pruning, classifier building, and class allocation for test cases.

Keywords: associative classification, classification, data mining, learning, rule ranking, rule pruning, prediction

Procedia PDF Downloads 508
27974 Short Text Classification Using Part of Speech Feature to Analyze Students' Feedback of Assessment Components

Authors: Zainab Mutlaq Ibrahim, Mohamed Bader-El-Den, Mihaela Cocea

Abstract:

Students' textual feedback can hold unique patterns and useful information about learning process, it can hold information about advantages and disadvantages of teaching methods, assessment components, facilities, and other aspects of teaching. The results of analysing such a feedback can form a key point for institutions’ decision makers to advance and update their systems accordingly. This paper proposes a data mining framework for analysing end of unit general textual feedback using part of speech feature (PoS) with four machine learning algorithms: support vector machines, decision tree, random forest, and naive bays. The proposed framework has two tasks: first, to use the above algorithms to build an optimal model that automatically classifies the whole data set into two subsets, one subset is tailored to assessment practices (assessment related), and the other one is the non-assessment related data. Second task to use the same algorithms to build an optimal model for whole data set, and the new data subsets to automatically detect their sentiment. The significance of this paper is to compare the performance of the above four algorithms using part of speech feature to the performance of the same algorithms using n-grams feature. The paper follows Knowledge Discovery and Data Mining (KDDM) framework to construct the classification and sentiment analysis models, which is understanding the assessment domain, cleaning and pre-processing the data set, selecting and running the data mining algorithm, interpreting mined patterns, and consolidating the discovered knowledge. The results of this paper experiments show that both models which used both features performed very well regarding first task. But regarding the second task, models that used part of speech feature has underperformed in comparison with models that used unigrams and bigrams.

Keywords: assessment, part of speech, sentiment analysis, student feedback

Procedia PDF Downloads 111
27973 Data Mining Techniques for Anti-Money Laundering

Authors: M. Sai Veerendra

Abstract:

Today, money laundering (ML) poses a serious threat not only to financial institutions but also to the nation. This criminal activity is becoming more and more sophisticated and seems to have moved from the cliché of drug trafficking to financing terrorism and surely not forgetting personal gain. Most of the financial institutions internationally have been implementing anti-money laundering solutions (AML) to fight investment fraud activities. However, traditional investigative techniques consume numerous man-hours. Recently, data mining approaches have been developed and are considered as well-suited techniques for detecting ML activities. Within the scope of a collaboration project on developing a new data mining solution for AML Units in an international investment bank in Ireland, we survey recent data mining approaches for AML. In this paper, we present not only these approaches but also give an overview on the important factors in building data mining solutions for AML activities.

Keywords: data mining, clustering, money laundering, anti-money laundering solutions

Procedia PDF Downloads 514
27972 Mining in Nigeria and Development Effort of Metallurgical Technologies at National Metallurgical Development Center Jos, Plateau State-Nigeria

Authors: Linus O. Asuquo

Abstract:

Mining in Nigeria and development effort of metallurgical technologies at National Metallurgical Development Centre Jos has been addressed in this paper. The paper has looked at the history of mining in Nigeria, the impact of mining on social and industrial development, and the contribution of the mining sector to Nigeria’s Gross Domestic Product (GDP). The paper clearly stated that Nigeria’s mining sector only contributes 0.5% to the nation’s GDP unlike Botswana that the mining sector contributes 38% to the nation’s GDP. Nigeria Bureau of Statistics has it on record that Nigeria has about 44 solid minerals awaiting to be exploited. Clearly highlighted by this paper is the abundant potentials that exist in the mining sector for investment. The paper made an exposition on the extensive efforts made at National Metallurgical Development Center (NMDC) to develop metallurgical technologies in various areas of the metals sector; like mineral processing, foundry development, nonferrous metals extraction, materials testing, lime calcination, ANO (Trade name for powder lubricant) wire drawing lubricant, refractories and many others. The paper went ahead to draw a conclusion that there is a need to develop the mining sector in Nigeria and to give a sustainable support to the efforts currently made at NMDC to develop metallurgical technologies which are capable of transforming the metals sector in Nigeria, which will lead to industrialization. Finally the paper made some recommendations which traverse the topic for the best expectation.

Keywords: mining, minerals, technologies, value addition

Procedia PDF Downloads 70
27971 Extracting Opinions from Big Data of Indonesian Customer Reviews Using Hadoop MapReduce

Authors: Veronica S. Moertini, Vinsensius Kevin, Gede Karya

Abstract:

Customer reviews have been collected by many kinds of e-commerce websites selling products, services, hotel rooms, tickets and so on. Each website collects its own customer reviews. The reviews can be crawled, collected from those websites and stored as big data. Text analysis techniques can be used to analyze that data to produce summarized information, such as customer opinions. Then, these opinions can be published by independent service provider websites and used to help customers in choosing the most suitable products or services. As the opinions are analyzed from big data of reviews originated from many websites, it is expected that the results are more trusted and accurate. Indonesian customers write reviews in Indonesian language, which comes with its own structures and uniqueness. We found that most of the reviews are expressed with “daily language”, which is informal, do not follow the correct grammar, have many abbreviations and slangs or non-formal words. Hadoop is an emerging platform aimed for storing and analyzing big data in distributed systems. A Hadoop cluster consists of master and slave nodes/computers operated in a network. Hadoop comes with distributed file system (HDFS) and MapReduce framework for supporting parallel computation. However, MapReduce has weakness (i.e. inefficient) for iterative computations, specifically, the cost of reading/writing data (I/O cost) is high. Given this fact, we conclude that MapReduce function is best adapted for “one-pass” computation. In this research, we develop an efficient technique for extracting or mining opinions from big data of Indonesian reviews, which is based on MapReduce with one-pass computation. In designing the algorithm, we avoid iterative computation and instead adopt a “look up table” technique. The stages of the proposed technique are: (1) Crawling the data reviews from websites; (2) cleaning and finding root words from the raw reviews; (3) computing the frequency of the meaningful opinion words; (4) analyzing customers sentiments towards defined objects. The experiments for evaluating the performance of the technique were conducted on a Hadoop cluster with 14 slave nodes. The results show that the proposed technique (stage 2 to 4) discovers useful opinions, is capable of processing big data efficiently and scalable.

Keywords: big data analysis, Hadoop MapReduce, analyzing text data, mining Indonesian reviews

Procedia PDF Downloads 183
27970 Experimental Study of Hyperparameter Tuning a Deep Learning Convolutional Recurrent Network for Text Classification

Authors: Bharatendra Rai

Abstract:

The sequence of words in text data has long-term dependencies and is known to suffer from vanishing gradient problems when developing deep learning models. Although recurrent networks such as long short-term memory networks help to overcome this problem, achieving high text classification performance is a challenging problem. Convolutional recurrent networks that combine the advantages of long short-term memory networks and convolutional neural networks can be useful for text classification performance improvements. However, arriving at suitable hyperparameter values for convolutional recurrent networks is still a challenging task where fitting a model requires significant computing resources. This paper illustrates the advantages of using convolutional recurrent networks for text classification with the help of statistically planned computer experiments for hyperparameter tuning.

Keywords: long short-term memory networks, convolutional recurrent networks, text classification, hyperparameter tuning, Tukey honest significant differences

Procedia PDF Downloads 94
27969 Association Rules Mining and NOSQL Oriented Document in Big Data

Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub

Abstract:

Big Data represents the recent technology of manipulating voluminous and unstructured data sets over multiple sources. Therefore, NOSQL appears to handle the problem of unstructured data. Association rules mining is one of the popular techniques of data mining to extract hidden relationship from transactional databases. The algorithm for finding association dependencies is well-solved with Map Reduce. The goal of our work is to reduce the time of generating of frequent itemsets by using Map Reduce and NOSQL database oriented document. A comparative study is given to evaluate the performances of our algorithm with the classical algorithm Apriori.

Keywords: Apriori, Association rules mining, Big Data, Data Mining, Hadoop, MapReduce, MongoDB, NoSQL

Procedia PDF Downloads 136
27968 Off-Topic Text Detection System Using a Hybrid Model

Authors: Usama Shahid

Abstract:

Be it written documents, news columns, or students' essays, verifying the content can be a time-consuming task. Apart from the spelling and grammar mistakes, the proofreader is also supposed to verify whether the content included in the essay or document is relevant or not. The irrelevant content in any document or essay is referred to as off-topic text and in this paper, we will address the problem of off-topic text detection from a document using machine learning techniques. Our study aims to identify the off-topic content from a document using Echo state network model and we will also compare data with other models. The previous study uses Convolutional Neural Networks and TFIDF to detect off-topic text. We will rearrange the existing datasets and take new classifiers along with new word embeddings and implement them on existing and new datasets in order to compare the results with the previously existing CNN model.

Keywords: off topic, text detection, eco state network, machine learning

Procedia PDF Downloads 57
27967 A Methodology for Automatic Diversification of Document Categories

Authors: Dasom Kim, Chen Liu, Myungsu Lim, Su-Hyeon Jeon, ByeoungKug Jeon, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, numerous documents including unstructured data and text have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually provided with a specific category for the convenience of the users. In the past, the categorization was performed manually. However, in the case of manual categorization, not only can the accuracy of the categorization be not guaranteed but the categorization also requires a large amount of time and huge costs. Many studies have been conducted towards the automatic creation of categories to solve the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorizing complex documents with multiple topics because the methods work by assuming that one document can be categorized into one category only. In order to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, they are also limited in that their learning process involves training using a multi-categorized document set. These methods therefore cannot be applied to multi-categorization of most documents unless multi-categorized training sets are provided. To overcome the limitation of the requirement of a multi-categorized training set by traditional multi-categorization algorithms, we previously proposed a new methodology that can extend a category of a single-categorized document to multiple categorizes by analyzing relationships among categories, topics, and documents. In this paper, we design a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.

Keywords: big data analysis, document classification, multi-category, text mining, topic analysis

Procedia PDF Downloads 246
27966 Forecasting Unusual Infection of Patient Used by Irregular Weighted Point Set

Authors: Seema Vaidya

Abstract:

Mining association rule is a key issue in data mining. In any case, the standard models ignore the distinction among the exchanges, and the weighted association rule mining does not transform on databases with just binary attributes. This paper proposes a novel continuous example and executes a tree (FP-tree) structure, which is an increased prefix-tree structure for securing compacted, discriminating data about examples, and makes a fit FP-tree-based mining system, FP enhanced capacity algorithm is used, for mining the complete game plan of examples by illustration incessant development. Here, this paper handles the motivation behind making remarkable and weighted item sets, i.e. rare weighted item set mining issue. The two novel brightness measures are proposed for figuring the infrequent weighted item set mining issue. Also, the algorithm are handled which perform IWI which is more insignificant IWI mining. Moreover we utilized the rare item set for choice based structure. The general issue of the start of reliable definite rules is troublesome for the grounds that hypothetically no inciting technique with no other person can promise the rightness of influenced theories. In this way, this framework expects the disorder with the uncommon signs. Usage study demonstrates that proposed algorithm upgrades the structure which is successful and versatile for mining both long and short diagnostics rules. Structure upgrades aftereffects of foreseeing rare diseases of patient.

Keywords: association rule, data mining, IWI mining, infrequent item set, frequent pattern growth

Procedia PDF Downloads 380
27965 Visualization of Taiwan's Religious Social Networking Sites

Authors: Jia-Jane Shuai

Abstract:

Purpose of this research aims to improve understanding of the nature of online religion by examining the religious social websites. What motivates individual users to use the online religious social websites, and which factors affect those motivations. We survey various online religious social websites provided by different religions, especially the Taiwanese folk religion. Based on the theory of the Content Analysis and Social Network Analysis, religious social websites and religious web activities are examined. This research examined the folk religion websites’ presentation and contents that promote the religious use of the Internet in Taiwan. The difference among different religions and religious websites also be compared. First, this study used keywords to examine what types of messages gained the most clicks of “Like”, “Share” and comments on Facebook. Dividing the messages into four media types, namely, text, link, video, and photo, reveal which category receive more likes and comments than the others. Meanwhile, this study analyzed the five dialogic principles of religious websites accessed from mobile phones and also assessed their mobile readiness. Using the five principles of dialogic theory as a basis, do a general survey on the websites with elements of online religion. Second, the project analyzed the characteristics of Taiwanese participants for online religious activities. Grounded by social network analysis and text mining, this study comparatively explores the network structure, interaction pattern, and geographic distribution of users involved in communication networks of the folk religion in social websites and mobile sites. We studied the linkage preference of different religious groups. The difference among different religions and religious websites also be compared. We examined the reasons for the success of these websites, as well as reasons why young users accept new religious media. The outcome of the research will be useful for online religious service providers and non-profit organizations to manage social websites and internet marketing.

Keywords: content analysis, online religion, social network analysis, social websites

Procedia PDF Downloads 143