Search results for: Predictive Data Mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7739

Search results for: Predictive Data Mining

7139 Using the Technology Acceptance Model to Examine Seniors’ Attitudes toward Facebook

Authors: Chien-Jen Liu, Shu Ching Yang

Abstract:

Using the technology acceptance model (TAM), this study examined the external variables of technological complexity (TC) to acquire a better understanding of the factors that influence the acceptance of computer application courses by learners at Active Aging Universities. After the learners in this study had completed a 27-hour Facebook course, 44 learners responded to a modified TAM survey. Data were collected to examine the path relationships among the variables that influence the acceptance of Facebook-mediated community learning. The partial least squares (PLS) method was used to test the measurement and the structural model. The study results demonstrated that attitudes toward Facebook use directly influence behavioral intentions (BI) with respect to Facebook use, evincing a high prediction rate of 58.3%. In addition to the perceived usefulness (PU) and perceived ease of use (PEOU) measures that are proposed in the TAM, other external variables, such as TC, also indirectly influence BI. These four variables can explain 88% of the variance in BI and demonstrate a high level of predictive ability. Finally, limitations of this investigation and implications for further research are discussed.

Keywords: Technology acceptance model (TAM), technological complexity, partial least squares (PLS), perceived usefulness.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3195
7138 Modeling Stress-Induced Regulatory Cascades with Artificial Neural Networks

Authors: Maria E. Manioudaki, Panayiota Poirazi

Abstract:

Yeast cells live in a constantly changing environment that requires the continuous adaptation of their genomic program in order to sustain their homeostasis, survive and proliferate. Due to the advancement of high throughput technologies, there is currently a large amount of data such as gene expression, gene deletion and protein-protein interactions for S. Cerevisiae under various environmental conditions. Mining these datasets requires efficient computational methods capable of integrating different types of data, identifying inter-relations between different components and inferring functional groups or 'modules' that shape intracellular processes. This study uses computational methods to delineate some of the mechanisms used by yeast cells to respond to environmental changes. The GRAM algorithm is first used to integrate gene expression data and ChIP-chip data in order to find modules of coexpressed and co-regulated genes as well as the transcription factors (TFs) that regulate these modules. Since transcription factors are themselves transcriptionally regulated, a three-layer regulatory cascade consisting of the TF-regulators, the TFs and the regulated modules is subsequently considered. This three-layer cascade is then modeled quantitatively using artificial neural networks (ANNs) where the input layer corresponds to the expression of the up-stream transcription factors (TF-regulators) and the output layer corresponds to the expression of genes within each module. This work shows that (a) the expression of at least 33 genes over time and for different stress conditions is well predicted by the expression of the top layer transcription factors, including cases in which the effect of up-stream regulators is shifted in time and (b) identifies at least 6 novel regulatory interactions that were not previously associated with stress-induced changes in gene expression. These findings suggest that the combination of gene expression and protein-DNA interaction data with artificial neural networks can successfully model biological pathways and capture quantitative dependencies between distant regulators and downstream genes.

Keywords: gene modules, artificial neural networks, yeast, stress

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1465
7137 Development of Prediction Models of Day-Ahead Hourly Building Electricity Consumption and Peak Power Demand Using the Machine Learning Method

Authors: Dalin Si, Azizan Aziz, Bertrand Lasternas

Abstract:

To encourage building owners to purchase electricity at the wholesale market and reduce building peak demand, this study aims to develop models that predict day-ahead hourly electricity consumption and demand using artificial neural network (ANN) and support vector machine (SVM). All prediction models are built in Python, with tool Scikit-learn and Pybrain. The input data for both consumption and demand prediction are time stamp, outdoor dry bulb temperature, relative humidity, air handling unit (AHU), supply air temperature and solar radiation. Solar radiation, which is unavailable a day-ahead, is predicted at first, and then this estimation is used as an input to predict consumption and demand. Models to predict consumption and demand are trained in both SVM and ANN, and depend on cooling or heating, weekdays or weekends. The results show that ANN is the better option for both consumption and demand prediction. It can achieve 15.50% to 20.03% coefficient of variance of root mean square error (CVRMSE) for consumption prediction and 22.89% to 32.42% CVRMSE for demand prediction, respectively. To conclude, the presented models have potential to help building owners to purchase electricity at the wholesale market, but they are not robust when used in demand response control.

Keywords: Building energy prediction, data mining, demand response, electricity market.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2205
7136 Using Pattern Search Methods for Minimizing Clustering Problems

Authors: Parvaneh Shabanzadeh, Malik Hj Abu Hassan, Leong Wah June, Maryam Mohagheghtabar

Abstract:

Clustering is one of an interesting data mining topics that can be applied in many fields. Recently, the problem of cluster analysis is formulated as a problem of nonsmooth, nonconvex optimization, and an algorithm for solving the cluster analysis problem based on nonsmooth optimization techniques is developed. This optimization problem has a number of characteristics that make it challenging: it has many local minimum, the optimization variables can be either continuous or categorical, and there are no exact analytical derivatives. In this study we show how to apply a particular class of optimization methods known as pattern search methods to address these challenges. These methods do not explicitly use derivatives, an important feature that has not been addressed in previous studies. Results of numerical experiments are presented which demonstrate the effectiveness of the proposed method.

Keywords: Clustering functions, Non-smooth Optimization, Nonconvex Optimization, Pattern Search Method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1640
7135 DMC with Adaptive Weighted Output

Authors: Ahmed Abbas, M.R. M Rizk, Mohamed El-Sayed

Abstract:

This paper presents a new adaptive DMC controller that improves the controller performance in case of plant-model mismatch. The new controller monitors the plant measured output, compares it with the model output and calculates weights applied to the controller move. Simulations show that the new controller can help improve control performance and avoid instability in case of severe model mismatches.

Keywords: Adaptive control, dynamic matrix control, DMC, model predictive control

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2225
7134 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, veracity, and come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, provision of insights to users, and for good governance. Government (Big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services like data storage, hosting services to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationship in the government (big) data ecosystem. We also discuss our research findings. We did not find too much published research articles about the government (big) data ecosystem, including its definition and classification of actors and their roles. Therefore, we lent ideas for the government (big) data ecosystem from numerous areas that include scientific research data, humanitarian data, open government data, industry data, in the literature.

Keywords: Big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2143
7133 GIS-Based Spatial Distribution and Evaluation of Selected Heavy Metals Contamination in Topsoil around Ecton Mining Area, Derbyshire, UK

Authors: Zahid O. Alibrahim, Craig D. Williams, Clive L. Roberts

Abstract:

The study area (Ecton mining area) is located in the southern part of the Peak District in Derbyshire, England. It is bounded by the River Manifold from the west. This area has been mined for a long period. As a result, huge amounts of potentially toxic metals were released into the surrounding area and are most likely to be a significant source of heavy metal contamination to the local soil, water and vegetation. In order to appraise the potential heavy metal pollution in this area, 37 topsoil samples (5-20 cm depth) were collected and analysed for their total content of Cu, Pb, Zn, Mn, Cr, Ni and V using ICP (Inductively Coupled Plasma) optical emission spectroscopy. Multivariate Geospatial analyses using the GIS technique were utilised to draw geochemical maps of the metals of interest over the study area. A few hotspot points, areas of elevated concentrations of metals, were specified, which are presumed to be the results of anthropogenic activities. In addition, the soil’s environmental quality was evaluated by calculating the Mullers’ Geoaccumulation index (I geo), which suggests that the degree of contamination of the investigated heavy metals has the following trend: Pb > Zn > Cu > Mn > Ni = Cr = V. Furthermore, the potential ecological risk, using the enrichment factor (EF), was also specified. On the basis of the calculated amount or the EF, the levels of pollution for the studied metals in the study area have the following order: Pb>Zn>Cu>Cr>V>Ni>Mn.

Keywords: Heavy metals, GIS, multivariate analysis, geoaccumulation index, enrichment factor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1241
7132 An Improvement of Multi-Label Image Classification Method Based on Histogram of Oriented Gradient

Authors: Ziad Abdallah, Mohamad Oueidat, Ali El-Zaart

Abstract:

Image Multi-label Classification (IMC) assigns a label or a set of labels to an image. The big demand for image annotation and archiving in the web attracts the researchers to develop many algorithms for this application domain. The existing techniques for IMC have two drawbacks: The description of the elementary characteristics from the image and the correlation between labels are not taken into account. In this paper, we present an algorithm (MIML-HOGLPP), which simultaneously handles these limitations. The algorithm uses the histogram of gradients as feature descriptor. It applies the Label Priority Power-set as multi-label transformation to solve the problem of label correlation. The experiment shows that the results of MIML-HOGLPP are better in terms of some of the evaluation metrics comparing with the two existing techniques.

Keywords: Data mining, information retrieval system, multi-label, problem transformation, histogram of gradients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1315
7131 Development of a Technology Assessment Model by Patents and Customers' Review Data

Authors: Kisik Song, Sungjoo Lee

Abstract:

Recent years have seen an increasing number of patent disputes due to excessive competition in the global market and a reduced technology life-cycle; this has increased the risk of investment in technology development. While many global companies have started developing a methodology to identify promising technologies and assess for decisions, the existing methodology still has some limitations. Post hoc assessments of the new technology are not being performed, especially to determine whether the suggested technologies turned out to be promising. For example, in existing quantitative patent analysis, a patent’s citation information has served as an important metric for quality assessment, but this analysis cannot be applied to recently registered patents because such information accumulates over time. Therefore, we propose a new technology assessment model that can replace citation information and positively affect technological development based on post hoc analysis of the patents for promising technologies. Additionally, we collect customer reviews on a target technology to extract keywords that show the customers’ needs, and we determine how many keywords are covered in the new technology. Finally, we construct a portfolio (based on a technology assessment from patent information) and a customer-based marketability assessment (based on review data), and we use them to visualize the characteristics of the new technologies.

Keywords: Technology assessment, patents, citation information, opinion mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 992
7130 A Case-Based Reasoning-Decision Tree Hybrid System for Stock Selection

Authors: Yaojun Wang, Yaoqing Wang

Abstract:

Stock selection is an important decision-making problem. Many machine learning and data mining technologies are employed to build automatic stock-selection system. A profitable stock-selection system should consider the stock’s investment value and the market timing. In this paper, we present a hybrid system including both engage for stock selection. This system uses a case-based reasoning (CBR) model to execute the stock classification, uses a decision-tree model to help with market timing and stock selection. The experiments show that the performance of this hybrid system is better than that of other techniques regarding to the classification accuracy, the average return and the Sharpe ratio.

Keywords: Case-based reasoning, decision tree, stock selection, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1705
7129 A Methodology for Automatic Diversification of Document Categories

Authors: Dasom Kim, Chen Liu, Myungsu Lim, Soo-Hyeon Jeon, Byeoung Kug Jeon, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, numerous documents including large volumes of unstructured data and text have been created because of the rapid increase in the use of social media and the Internet. Usually, these documents are categorized for the convenience of users. Because the accuracy of manual categorization is not guaranteed, and such categorization requires a large amount of time and incurs huge costs. Many studies on automatic categorization have been conducted to help mitigate the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorize complex documents with multiple topics because they work on the assumption that individual documents can be categorized into single categories only. Therefore, to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, the learning process employed in these studies involves training using a multi-categorized document set. These methods therefore cannot be applied to the multi-categorization of most documents unless multi-categorized training sets using traditional multi-categorization algorithms are provided. To overcome this limitation, in this study, we review our novel methodology for extending the category of a single-categorized document to multiple categorizes, and then introduce a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.

Keywords: Big Data Analysis, Document Classification, Text Mining, Topic Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1745
7128 A Study on the Nostalgia Contents Analysis of Hometown Alumni in the Online Community

Authors: Heejin Yun, Juanjuan Zang

Abstract:

This study aims to analyze the text terms posted on an online community of people from the same hometown and to understand the topic and trend of nostalgia composed online. For this purpose, this study collected 144 writings which the natives of Yeongjong Island, Incheon, South-Korea have posted on an online community. And it analyzed association relations. As a result, online community texts means that just defining nostalgia as ‘a mind longing for hometown’ is not an enough explanation. Second, texts composed online have abstractness rather than persons’ individual stories. This study figured out the relationship that had the most critical and closest mutual association among the terms that constituted nostalgia through literature research and association rule concerning nostalgia. The result of this study has a characteristic that it summed up the core terms and emotions related to nostalgia.

Keywords: Nostalgia, cultural memory, data mining, online community.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1044
7127 Fuzzy Clustering Analysis in Real Estate Companies in China

Authors: Jianfeng Li, Feng Jin, Xiaoyu Yang

Abstract:

This paper applies fuzzy clustering algorithm in classifying real estate companies in China according to some general financial indexes, such as income per share, share accumulation fund, net profit margins, weighted net assets yield and shareholders' equity. By constructing and normalizing initial partition matrix, getting fuzzy similar matrix with Minkowski metric and gaining the transitive closure, the dynamic fuzzy clustering analysis for real estate companies is shown clearly that different clustered result change gradually with the threshold reducing, and then, it-s shown there is the similar relationship with the prices of those companies in stock market. In this way, it-s great valuable in contrasting the real estate companies- financial condition in order to grasp some good chances of investment, and so on.

Keywords: Fuzzy clustering algorithm, data mining, real estate company, financial analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1917
7126 Recommender Systems Using Ensemble Techniques

Authors: Yeonjeong Lee, Kyoung-jae Kim, Youngtae Kim

Abstract:

This study proposes a novel recommender system that uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user’s preference. The proposed model consists of two steps. In the first step, this study uses logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. Then, this study combines the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. In the second step, this study uses the market basket analysis to extract association rules for co-purchased products. Finally, the system selects customers who have high likelihood to purchase products in each product group and recommends proper products from same or different product groups to them through above two steps. We test the usability of the proposed system by using prototype and real-world transaction and profile data. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The results also show that the proposed system may be useful in real-world online shopping store.

Keywords: Product recommender system, Ensemble technique, Association rules, Decision tree, Artificial neural networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4222
7125 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analyzing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2791
7124 Models of State Organization and Influence over Collective Identity and Nationalism in Spain

Authors: Muñoz-Sanchez, Victor Manuel, Perez-Flores, Antonio Manuel

Abstract:

The main objective of this paper is to establish the relationship between models of state organization and the various types of collective identity expressed by the Spanish. The question of nationalism and identity ascription in Spain has always been a topic of special importance due to the presence in that country of territories where the population emits very different opinions of nationalist sentiment than the rest of Spain. The current situation of sovereignty challenge of Catalonia to the central government exemplifies the importance of the subject matter. In order to analyze this process of interrelation, we use a secondary data mining by applying the multiple correspondence analysis technique (MCA). As a main result a typology of four types of expression of collective identity based on models of State organization are shown, which are connected with the party position on this issue.

Keywords: Models of organization of the state, nationalism, collective identity, Spain, political parties.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1688
7123 Automatic Real-Patient Medical Data De-Identification for Research Purposes

Authors: Petr Vcelak, Jana Kleckova

Abstract:

Our Medicine-oriented research is based on a medical data set of real patients. It is a security problem to share patient private data with peoples other than clinician or hospital staff. We have to remove person identification information from medical data. The medical data without private data are available after a de-identification process for any research purposes. In this paper, we introduce an universal automatic rule-based de-identification application to do all this stuff on an heterogeneous medical data. A patient private identification is replaced by an unique identification number, even in burnedin annotation in pixel data. The identical identification is used for all patient medical data, so it keeps relationships in a data. Hospital can take an advantage of a research feedback based on results.

Keywords: DASTA, De-identification, DICOM, Health Level Seven, Medical data, OCR, Personal data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1642
7122 Fuzzy Wavelet Packet based Feature Extraction Method for Multifunction Myoelectric Control

Authors: Rami N. Khushaba, Adel Al-Jumaily

Abstract:

The myoelectric signal (MES) is one of the Biosignals utilized in helping humans to control equipments. Recent approaches in MES classification to control prosthetic devices employing pattern recognition techniques revealed two problems, first, the classification performance of the system starts degrading when the number of motion classes to be classified increases, second, in order to solve the first problem, additional complicated methods were utilized which increase the computational cost of a multifunction myoelectric control system. In an effort to solve these problems and to achieve a feasible design for real time implementation with high overall accuracy, this paper presents a new method for feature extraction in MES recognition systems. The method works by extracting features using Wavelet Packet Transform (WPT) applied on the MES from multiple channels, and then employs Fuzzy c-means (FCM) algorithm to generate a measure that judges on features suitability for classification. Finally, Principle Component Analysis (PCA) is utilized to reduce the size of the data before computing the classification accuracy with a multilayer perceptron neural network. The proposed system produces powerful classification results (99% accuracy) by using only a small portion of the original feature set.

Keywords: Biomedical Signal Processing, Data mining andInformation Extraction, Machine Learning, Rehabilitation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1737
7121 Analyzing Multi-Labeled Data Based on the Roll of a Concept against a Semantic Range

Authors: Masahiro Kuzunishi, Tetsuya Furukawa, Ke Lu

Abstract:

Classifying data hierarchically is an efficient approach to analyze data. Data is usually classified into multiple categories, or annotated with a set of labels. To analyze multi-labeled data, such data must be specified by giving a set of labels as a semantic range. There are some certain purposes to analyze data. This paper shows which multi-labeled data should be the target to be analyzed for those purposes, and discusses the role of a label against a set of labels by investigating the change when a label is added to the set of labels. These discussions give the methods for the advanced analysis of multi-labeled data, which are based on the role of a label against a semantic range.

Keywords: Classification Hierarchies, Data Analysis, Multilabeled Data, Orders of Sets of Labels

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1208
7120 Fuzzy Relatives of the CLARANS Algorithm With Application to Text Clustering

Authors: Mohamed A. Mahfouz, M. A. Ismail

Abstract:

This paper introduces new algorithms (Fuzzy relative of the CLARANS algorithm FCLARANS and Fuzzy c Medoids based on randomized search FCMRANS) for fuzzy clustering of relational data. Unlike existing fuzzy c-medoids algorithm (FCMdd) in which the within cluster dissimilarity of each cluster is minimized in each iteration by recomputing new medoids given current memberships, FCLARANS minimizes the same objective function minimized by FCMdd by changing current medoids in such away that that the sum of the within cluster dissimilarities is minimized. Computing new medoids may be effected by noise because outliers may join the computation of medoids while the choice of medoids in FCLARANS is dictated by the location of a predominant fraction of points inside a cluster and, therefore, it is less sensitive to the presence of outliers. In FCMRANS the step of computing new medoids in FCMdd is modified to be based on randomized search. Furthermore, a new initialization procedure is developed that add randomness to the initialization procedure used with FCMdd. Both FCLARANS and FCMRANS are compared with the robust and linearized version of fuzzy c-medoids (RFCMdd). Experimental results with different samples of the Reuter-21578, Newsgroups (20NG) and generated datasets with noise show that FCLARANS is more robust than both RFCMdd and FCMRANS. Finally, both FCMRANS and FCLARANS are more efficient and their outputs are almost the same as that of RFCMdd in terms of classification rate.

Keywords: Data Mining, Fuzzy Clustering, Relational Clustering, Medoid-Based Clustering, Cluster Analysis, Unsupervised Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2402
7119 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling

Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal

Abstract:

Datasets or collections are becoming important assets by themselves and now they can be accepted as a primary intellectual output of a research. The quality and usage of the datasets depend mainly on the context under which they have been collected, processed, analyzed, validated, and interpreted. This paper aims to present a collection of program educational objectives mapped to student’s outcomes collected from self-study reports prepared by 32 engineering programs accredited by ABET. The manual mapping (classification) of this data is a notoriously tedious, time consuming process. In addition, it requires experts in the area, which are mostly not available. It has been shown the operational settings under which the collection has been produced. The collection has been cleansed, preprocessed, some features have been selected and preliminary exploratory data analysis has been performed so as to illustrate the properties and usefulness of the collection. At the end, the collection has been benchmarked using nine of the most widely used supervised multiclass classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques have been compared to each other using five well-known measurements (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and Ensemble of Pruned Sets have achieved encouraging performance compared to other experimented multi-label classification methods. The Classifier Chains method has shown the worst performance. To recap, the benchmark has achieved promising results by utilizing preliminary exploratory data analysis performed on the collection, proposing new trends for research and providing a baseline for future studies.

Keywords: Benchmark collection, program educational objectives, student outcomes, ABET, Accreditation, machine learning, supervised multiclass classification, text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 837
7118 Leveraging xAPI in a Corporate e-Learning Environment to Facilitate the Tracking, Modelling, and Predictive Analysis of Learner Behaviour

Authors: Libor Zachoval, Daire O Broin, Oisin Cawley

Abstract:

E-learning platforms, such as Blackboard have two major shortcomings: limited data capture as a result of the limitations of SCORM (Shareable Content Object Reference Model), and lack of incorporation of Artificial Intelligence (AI) and machine learning algorithms which could lead to better course adaptations. With the recent development of Experience Application Programming Interface (xAPI), a large amount of additional types of data can be captured and that opens a window of possibilities from which online education can benefit. In a corporate setting, where companies invest billions on the learning and development of their employees, some learner behaviours can be troublesome for they can hinder the knowledge development of a learner. Behaviours that hinder the knowledge development also raise ambiguity about learner’s knowledge mastery, specifically those related to gaming the system. Furthermore, a company receives little benefit from their investment if employees are passing courses without possessing the required knowledge and potential compliance risks may arise. Using xAPI and rules derived from a state-of-the-art review, we identified three learner behaviours, primarily related to guessing, in a corporate compliance course. The identified behaviours are: trying each option for a question, specifically for multiple-choice questions; selecting a single option for all the questions on the test; and continuously repeating tests upon failing as opposed to going over the learning material. These behaviours were detected on learners who repeated the test at least 4 times before passing the course. These findings suggest that gauging the mastery of a learner from multiple-choice questions test scores alone is a naive approach. Thus, next steps will consider the incorporation of additional data points, knowledge estimation models to model knowledge mastery of a learner more accurately, and analysis of the data for correlations between knowledge development and identified learner behaviours. Additional work could explore how learner behaviours could be utilised to make changes to a course. For example, course content may require modifications (certain sections of learning material may be shown to not be helpful to many learners to master the learning outcomes aimed at) or course design (such as the type and duration of feedback).

Keywords: Compliance Course, Corporate Training, Learner Behaviours, xAPI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 561
7117 Learning Classifier Systems Approach for Automated Discovery of Crisp and Fuzzy Hierarchical Production Rules

Authors: Suraiya Jabin, Kamal K. Bharadwaj

Abstract:

This research presents a system for post processing of data that takes mined flat rules as input and discovers crisp as well as fuzzy hierarchical structures using Learning Classifier System approach. Learning Classifier System (LCS) is basically a machine learning technique that combines evolutionary computing, reinforcement learning, supervised or unsupervised learning and heuristics to produce adaptive systems. A LCS learns by interacting with an environment from which it receives feedback in the form of numerical reward. Learning is achieved by trying to maximize the amount of reward received. Crisp description for a concept usually cannot represent human knowledge completely and practically. In the proposed Learning Classifier System initial population is constructed as a random collection of HPR–trees (related production rules) and crisp / fuzzy hierarchies are evolved. A fuzzy subsumption relation is suggested for the proposed system and based on Subsumption Matrix (SM), a suitable fitness function is proposed. Suitable genetic operators are proposed for the chosen chromosome representation method. For implementing reinforcement a suitable reward and punishment scheme is also proposed. Experimental results are presented to demonstrate the performance of the proposed system.

Keywords: Hierarchical Production Rule, Data Mining, Learning Classifier System, Fuzzy Subsumption Relation, Subsumption matrix, Reinforcement Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1456
7116 A Rough Sets Approach for Relevant Internet/Web Online Searching

Authors: Erika Martinez Ramirez, Rene V. Mayorga

Abstract:

The internet is constantly expanding. Identifying web links of interest from web browsers requires users to visit each of the links listed, individually until a satisfactory link is found, therefore those users need to evaluate a considerable amount of links before finding their link of interest; this can be tedious and even unproductive. By incorporating web assistance, web users could be benefited from reduced time searching on relevant websites. In this paper, a rough set approach is presented, which facilitates classification of unlimited available e-vocabulary, to assist web users in reducing search times looking for relevant web sites. This approach includes two methods for identifying relevance data on web links based on the priority and percentage of relevance. As a result of these methods, a list of web sites is generated in priority sequence with an emphasis of the search criteria.

Keywords: Web search, Web Mining, Rough Sets, Web Intelligence, Intelligent Portals, Relevance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1550
7115 Perception of Predictive Confounders for the Prevalence of Hypertension among Iraqi Population: A Pilot Study

Authors: Zahraa Albasry, Hadeel D. Najim, Anmar Al-Taie

Abstract:

Background: Hypertension is considered as one of the most important causes of cardiovascular complications and one of the leading causes of worldwide mortality. Identifying the potential risk factors associated with this medical health problem plays an important role in minimizing its incidence and related complications. The objective of this study is to explore the prevalence of receptor sensitivity regarding assess and understand the perception of specific predictive confounding factors on the prevalence of hypertension (HT) among a sample of Iraqi population in Baghdad, Iraq. Materials and Methods: A randomized cross sectional study was carried out on 100 adult subjects during their visit to the outpatient clinic at a certain sector of Baghdad Province, Iraq. Demographic, clinical and health records alongside specific screening and laboratory tests of the participants were collected and analyzed to detect the potential of confounding factors on the prevalence of HT. Results: 63% of the study participants suffered from HT, most of them were female patients (P < 0.005). Patients aged between 41-50 years old significantly suffered from HT than other age groups (63.5%, P < 0.001). 88.9% of the participants were obese (P < 0.001) and 47.6% had diabetes with HT. Positive family history and sedentary lifestyle were significantly higher among all hypertensive groups (P < 0.05). High salt and fatty food intake was significantly found among patients suffered from isolated systolic hypertension (ISHT) (P < 0.05). A significant positive correlation between packed cell volume (PCV) and systolic blood pressure (SBP) (r = 0.353, P = 0.048) found among normotensive participants. Among hypertensive patients, a positive significant correlation found between triglycerides (TG) and both SBP (r = 0.484, P = 0.031) and diastolic blood pressure (DBP) (r = 0.463, P = 0.040), while low density lipoprotein-cholesterol (LDL-c) showed a positive significant correlation with DBP (r = 0.443, P = 0.021). Conclusion: The prevalence of HT among Iraqi populations is of major concern. Further consideration is required to detect the impact of potential risk factors and to minimize blood pressure (BP) elevation and reduce the risk of other cardiovascular complications later in life.

Keywords: Correlation, hypertension, Iraq, risk factors.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 924
7114 Using Multi-Arm Bandits to Optimize Game Play Metrics and Effective Game Design

Authors: Kenny Raharjo, Ramon Lawrence

Abstract:

Game designers have the challenging task of building games that engage players to spend their time and money on the game. There are an infinite number of game variations and design choices, and it is hard to systematically determine game design choices that will have positive experiences for players. In this work, we demonstrate how multi-arm bandits can be used to automatically explore game design variations to achieve improved player metrics. The advantage of multi-arm bandits is that they allow for continuous experimentation and variation, intrinsically converge to the best solution, and require no special infrastructure to use beyond allowing minor game variations to be deployed to users for evaluation. A user study confirms that applying multi-arm bandits was successful in determining the preferred game variation with highest play time metrics and can be a useful technique in a game designer's toolkit.

Keywords: Game design, multi-arm bandit, design exploration and data mining, player metric optimization and analytics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1535
7113 Upon Further Reflection: More on the History, Tripartite Role, and Challenges of the Professoriate

Authors: Jeffrey R. Mueller

Abstract:

This paper expands on the role of the professor by detailing the origins of the profession, adding some of the unique contributions of North American universities as well as some of the best practice recommendations to the unique tripartite role of the professor. It describes current challenges to the profession including the ever-controversial student rating of professors. It continues with the significance of empowerment to the role of the professor. It concludes with a predictive prescription for the future of the professoriate and the role of the university-level educational administrator toward that end.

Keywords: Professoriate history, tripartite role, challenges, empowerment, shared governance, administratization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1069
7112 Object Identification with Color, Texture, and Object-Correlation in CBIR System

Authors: Awais Adnan, Muhammad Nawaz, Sajid Anwar, Tamleek Ali, Muhammad Ali

Abstract:

Needs of an efficient information retrieval in recent years in increased more then ever because of the frequent use of digital information in our life. We see a lot of work in the area of textual information but in multimedia information, we cannot find much progress. In text based information, new technology of data mining and data marts are now in working that were started from the basic concept of database some where in 1960. In image search and especially in image identification, computerized system at very initial stages. Even in the area of image search we cannot see much progress as in the case of text based search techniques. One main reason for this is the wide spread roots of image search where many area like artificial intelligence, statistics, image processing, pattern recognition play their role. Even human psychology and perception and cultural diversity also have their share for the design of a good and efficient image recognition and retrieval system. A new object based search technique is presented in this paper where object in the image are identified on the basis of their geometrical shapes and other features like color and texture where object-co-relation augments this search process. To be more focused on objects identification, simple images are selected for the work to reduce the role of segmentation in overall process however same technique can also be applied for other images.

Keywords: Object correlation, Geometrical shape, Color, texture, features, contents.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2028
7111 Customer Segmentation in Foreign Trade based on Clustering Algorithms Case Study: Trade Promotion Organization of Iran

Authors: Samira Malekmohammadi Golsefid, Mehdi Ghazanfari, Somayeh Alizadeh

Abstract:

The goal of this paper is to segment the countries based on the value of export from Iran during 14 years ending at 2005. To measure the dissimilarity among export baskets of different countries, we define Dissimilarity Export Basket (DEB) function and use this distance function in K-means algorithm. The DEB function is defined based on the concepts of the association rules and the value of export group-commodities. In this paper, clustering quality function and clusters intraclass inertia are defined to, respectively, calculate the optimum number of clusters and to compare the functionality of DEB versus Euclidean distance. We have also study the effects of importance weight in DEB function to improve clustering quality. Lastly when segmentation is completed, a designated RFM model is used to analyze the relative profitability of each cluster.

Keywords: Customers segmentation, Customer relationship management, Clustering, Data Mining

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2287
7110 Post Occupancy Life Cycle Analysis of a Green Building Energy Consumption at the University of Western Ontario in London - Canada

Authors: M. Bittencourt, E. K. Yanful, D. Velasquez, A. E. Jungles

Abstract:

The CMLP building was developed to be a model for sustainability with strategies to reduce water, energy and pollution, and to provide a healthy environment for the building occupants. The aim of this paper is to investigate the environmental effects of energy used by this building. A LCA (life cycle analysis) was led to measure the real environmental effects produced by the use of energy. The impact categories most affected by the energy use were found to be the human health effects, as well as ecotoxicity. Natural gas extraction, uranium milling for nuclear energy production, and the blasting for mining and infrastructure construction are the processes contributing the most to emissions in the human health effect. Data comparing LCA results of CMLP building with a conventional building results showed that energy used by the CMLP building has less damage for the environment and human health than a conventional building.

Keywords: Environmental Impacts, Green buildings, Life CycleAnalysis, Sustainability

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1774