Search results for: Educational data visualization
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7928

7028 Long-Range Dependence of Financial Time Series Data

Authors: Chatchai Pesee

Abstract:

This paper examines long-range dependence, or long memory, in financial time series, using exchange rate data modeled by fractional Brownian motion (fBm). The spectral density function, reviewed in Section II, is used to determine the range of the Hurst parameter (H) of the fBm. If 0 < H < 1/2, the process exhibits short-range dependence (SRD); if 1/2 < H < 1, it exhibits long memory, or long-range dependence (LRD). The exchange rate curve is treated as an fBm because of the characteristic value of its Hurst parameter (H). Definitions of fBm, long-range dependence and self-similarity are also reviewed in Section II. Our results in Section III indicate that the exchange rate data exhibit long memory, or long-range dependence (LRD). Long-range dependence of the exchange rate data and estimation of the Hurst parameter (H) are discussed in Section IV, and conclusions are given in Section V.
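
As an illustration only (not the author's method), the spectral idea can be sketched as a log-periodogram regression: for a stationary increment series, such as exchange-rate log returns (an assumption), the spectral density near zero frequency behaves like λ^(1−2H), so the fitted log-log slope b gives H = (1 − b)/2. The function names, the default bandwidth m ≈ √n and the use of NumPy are assumptions of this sketch.

```python
import numpy as np


def hurst_periodogram(x, m=None):
    """Estimate the Hurst parameter H of a stationary increment series.

    Log-periodogram regression: for fractional Gaussian noise the spectral
    density behaves like lambda**(1 - 2H) as lambda -> 0, so the fitted
    slope b of log I(lambda_j) on log lambda_j gives H = (1 - b) / 2.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if m is None:
        m = int(np.sqrt(n))  # bandwidth: number of low Fourier frequencies (assumption)
    x = x - x.mean()
    j = np.arange(1, m + 1)
    freqs = 2 * np.pi * j / n  # Fourier frequencies lambda_j
    periodogram = np.abs(np.fft.fft(x)[j]) ** 2 / (2 * np.pi * n)
    slope, _intercept = np.polyfit(np.log(freqs), np.log(periodogram), 1)
    return (1 - slope) / 2


def classify_dependence(h):
    """Map H to the dependence regimes described in the abstract."""
    if 0 < h < 0.5:
        return "short-range"
    if 0.5 < h < 1:
        return "long-range"
    return "independent increments" if h == 0.5 else "outside (0, 1)"
```

For white noise, e.g. `hurst_periodogram(np.random.default_rng(0).standard_normal(4096), m=2047)`, the estimate should fall close to 0.5. The choice of m trades bias (too many frequencies) against variance (too few).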

Keywords: Fractional Brownian motion, long-range dependence, memory, short-range dependence.

PDF Downloads: 1862
7027 Meta Random Forests

Authors: Praveen Boinee, Alessandro De Angelis, Gian Luca Foresti

Abstract:

Leo Breiman's Random Forests (RF) is a recent development in tree-based classifiers and has quickly proven to be one of the most important algorithms in the machine learning literature. It has shown robust and improved classification results on standard data sets. Ensemble learning algorithms such as AdaBoost and Bagging have been actively researched and have improved classification results on several benchmark data sets, mainly with decision trees as their base classifiers. In this paper, we apply these meta-learning techniques to random forests. We evaluate ensembles of random forests on standard data sets from the UCI repository, compare the original random forest algorithm with its ensemble counterparts, and discuss the results.
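
The meta-learning idea of bagging a base classifier can be sketched generically. This is a hypothetical minimal wrapper, not the authors' code: `make_model` is any factory returning an object with `fit`/`predict`, and in the paper's setting it would build a random forest. A tiny nearest-centroid learner stands in only so the sketch runs without extra libraries; AdaBoost is not shown.

```python
import random
from collections import Counter


class BaggingEnsemble:
    """Bootstrap-aggregate any classifier built by make_model()."""

    def __init__(self, make_model, n_models=10, seed=0):
        self.make_model = make_model
        self.n_models = n_models
        self.rng = random.Random(seed)
        self.models = []

    def fit(self, X, y):
        n = len(X)
        self.models = []
        for _ in range(self.n_models):
            # Bootstrap sample: draw n indices with replacement
            idx = [self.rng.randrange(n) for _ in range(n)]
            model = self.make_model()
            model.fit([X[i] for i in idx], [y[i] for i in idx])
            self.models.append(model)
        return self

    def predict(self, X):
        # Every member votes on every sample; the most common label wins
        votes = [m.predict(X) for m in self.models]
        return [Counter(col).most_common(1)[0][0] for col in zip(*votes)]


class NearestCentroid:
    """Stand-in base learner used only to make the sketch runnable."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for k, v in enumerate(xi):
                acc[k] += v
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(self.centroids, key=lambda c: dist2(x, self.centroids[c])) for x in X]
```

Swapping `NearestCentroid` for a random forest factory gives the "ensemble of random forests" setting; each member then sees a different bootstrap sample of the training set.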

Keywords: Random Forests [RF], ensembles, UCI.

PDF Downloads: 2675
7026 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. Clustering was performed to obtain subgroups of normally distributed time series data from wastewater treatment plant inflow data, which are composed of several groups differing in mean value. Two simple algorithms, K-means and EM, were chosen as clustering methods. The Rand index was used to measure similarity. After simple meta-clustering, a regression model was fitted for each subgroup. The final model was the sum of the subgroup models. The quality of the resulting model was compared with that of a regression model built with the same explanatory variables but without clustering. Results were compared using the coefficient of determination (R²), the mean absolute percentage error (MAPE) as a measure of prediction accuracy, and a comparison on a line chart. Preliminary results indicate the potential of the presented technique.
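
A minimal sketch of the meta-cluster idea, under assumptions not stated in the abstract: clusters are formed by plain 1-D K-means on the response (the EM variant and the Rand index are omitted), one ordinary least-squares model is fitted per cluster, and each observation's cluster membership is treated as known when predicting (in-sample evaluation). R² and MAPE use their standard definitions.

```python
import numpy as np


def kmeans_1d(values, k, n_iter=50):
    """Plain Lloyd's k-means on a 1-D array; returns labels and centers."""
    values = np.asarray(values, dtype=float)
    centers = np.quantile(values, np.linspace(0, 1, k + 2)[1:-1])  # spread initial centers
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new = np.array([values[labels == c].mean() if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def fit_ols(X, y):
    """Least-squares coefficients [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return coef[0] + np.asarray(X) @ coef[1:]


def r2(y, y_hat):
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


def mape(y, y_hat):
    return 100 * np.mean(np.abs((y - y_hat) / y))


def clustered_regression(X, y, k):
    """One OLS model per cluster of y; the overall model is their indicator-weighted sum."""
    labels, _ = kmeans_1d(y, k)
    y_hat = np.empty_like(y, dtype=float)
    for c in range(k):
        mask = labels == c
        if np.any(mask):
            y_hat[mask] = predict_ols(fit_ols(X[mask], y[mask]), X[mask])
    return y_hat, labels
```

Comparing `r2`/`mape` of `clustered_regression(X, y, k)` against a single `fit_ols(X, y)` reproduces the kind of with/without-clustering comparison described above.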

Keywords: Clustering, Data analysis, Data mining, Predictive models.

PDF Downloads: 1928
7025 An Exploratory Study of Reliability of Ranking vs. Rating in Peer Assessment

Authors: Yang Song, Yifan Guo, Edward F. Gehringer

Abstract:

Fifty years of research have found great potential in peer assessment as a pedagogical approach. With peer assessment, students not only receive more copious assessments; they also learn to become assessors. In recent decades, more and more educational peer assessments have been facilitated by online systems. These systems are designed differently to suit different class settings and student groups, but they fall into two basic categories: rating-based and ranking-based. Rating-based systems ask assessors to rate artifacts one by one according to review rubrics. Ranking-based systems allow assessors to review a set of artifacts and rank them. Although there are different systems and many users in each category, there is no comprehensive comparison of which design leads to higher reliability. In this paper, we designed algorithms to evaluate assessors' reliability based on their ratings or rankings compared with the global ranks of the artifacts they reviewed. These algorithms are suitable for data from both rating-based and ranking-based peer assessment systems. The experiments used more than 15,000 peer assessments from multiple peer assessment systems. We found that assessors in ranking-based peer assessments are at least 10% more reliable than assessors in rating-based peer assessments. Further analysis also showed that assessors in ranking-based assessments tend to assess the more differentiable artifacts correctly, whereas no such pattern exists for rating-based assessors.
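
The authors' reliability algorithms are not spelled out in the abstract. As a hypothetical illustration of comparing an assessor against global ranks, the sketch below scores the fraction of artifact pairs the assessor orders the same way as the global ranking. It works for both ratings and rankings (converted so that a higher score means better); counting tied scores as half-agreement is an assumption.

```python
from itertools import combinations


def assessor_agreement(assessor_scores, global_rank):
    """Fraction of artifact pairs ordered the same way as the global ranking.

    assessor_scores: {artifact: score}, higher score = better (a rating, or a
        rank converted so that higher is better).
    global_rank: {artifact: rank}, 1 = best; ranks assumed distinct.
    Tied assessor scores count as half-agreement (an assumption).
    """
    pairs = list(combinations(sorted(assessor_scores), 2))
    if not pairs:
        return None
    agree = 0.0
    for a, b in pairs:
        assessor_diff = assessor_scores[a] - assessor_scores[b]
        global_diff = global_rank[b] - global_rank[a]  # positive if a is globally better
        if assessor_diff == 0:
            agree += 0.5
        elif (assessor_diff > 0) == (global_diff > 0):
            agree += 1
    return agree / len(pairs)
```

Under this toy metric, a rater who gives many identical scores is pulled toward 0.5, which is one way a rating scale can blur differences that a forced ranking would expose.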

Keywords: Peer assessment, peer rating, peer ranking, reliability.

PDF Downloads: 1083
7024 Studies on Determination of the Optimum Distance Between the Tmotes for Optimum Data Transfer in a Network with WLL Capability

Authors: N C Santhosh Kumar, N K Kishore

Abstract:

Using mini modules of Tmotes, it is possible to automate a small personal area network. This idea can be extended to large networks by implementing multi-hop routing. By linking Tmotes that have transmitter and receiver sections using programming languages such as nesC and Java, a network can be monitored. It is anticipated that, depending on the application, a long range at a low data transfer rate or average throughput may be an acceptable trade-off. To reduce overall costs, an optimum number of Tmotes must be determined for various conditions (indoor/outdoor). By analyzing the data rates or throughputs at various Tmote locations, an optimal number of Tmotes for a specific network can be deduced. This paper determines optimum distances that reduce cost and increase the reliability of the entire sensor network with Wireless Local Loop (WLL) capability.

Keywords: Average throughput, data rate, multi-hop routing, optimum data transfer, throughput, Tmotes, wireless local loop.

PDF Downloads: 1340
7023 Unified Structured Process for Health Analytics

Authors: Supunmali Ahangama, Danny Chiang Choon Poo

Abstract:

Health analytics (HA) is used in healthcare systems for effective decision making, management and planning of healthcare and related activities. However, user resistance, the unique content and structure of medical data (including heterogeneous and unstructured data) and ad hoc HA projects have held up progress in HA applications. Notably, the accuracy of outcomes depends on the skills and domain knowledge of the data analyst working on the healthcare data. The success of HA depends on a sound process model, effective project management and the availability of supporting tools. Thus, to overcome these challenges through an effective process model, we propose an HA process model that combines features of the rational unified process (RUP) model and agile methodology.

Keywords: Agile methodology, health analytics, unified process model, UML.

PDF Downloads: 2309
7022 The Classification Model for Hard Disk Drive Functional Tests under Sparse Data Conditions

Authors: S. Pattanapairoj, D. Chetchotsak

Abstract:

This paper proposes classification models to serve as a proxy for hard disk drive (HDD) functional test equipment, which requires more than approximately two weeks to classify HDD status as either “Pass” or “Fail”. The models were constructed using a committee network consisting of a number of single neural networks. This paper also presents a method, called the “enforce learning method”, for solving the problem of sparse data for failed parts. Our results reveal that the classification models built with the proposed method perform well under sparse data conditions; thus the models, which take only a few seconds to classify an HDD, could substitute for the HDD functional tests.

Keywords: Sparse data, Classifications, Committee network

PDF Downloads: 1714
7021 Accurate Position Electromagnetic Sensor Using Data Acquisition System

Authors: Z. Ezzouine, A. Nakheli

Abstract:

This paper presents a high position electromagnetic sensor system (HPESS) applicable to moving object detection. The authors have developed a high-performance position sensor prototype for a student laboratory. The challenge was to obtain a highly accurate, real-time sensor able to measure position, length or displacement. An electromagnetic solution based on a two-coil induction principle was adopted. The HPESS converts mechanical motion into electric energy with direct contact, and the output signal can then be fed to an electronic circuit. The change in the sensor's output voltage is captured by a data acquisition system using LabVIEW software, and the displacement of the moving object is determined. The measured data are transmitted to a PC in real time via a DAQ device (NI USB-6281). This paper also describes the data acquisition analysis and the conditioning card developed specifically for monitoring the sensor signal. The data are then recorded and viewed through a user interface written in National Instruments LabVIEW. Online displays of the sensor signal's time and voltage provide a user-friendly data acquisition interface. The sensor is an uncomplicated, accurate, reliable and inexpensive transducer for highly sophisticated control systems.

Keywords: Electromagnetic sensor, data acquisition, accurately, position measurement.

PDF Downloads: 938
7020 Calculus Logarithmic Function for Image Encryption

Authors: Adil AL-Rammahi

Abstract:

To protect data from various attacks and to ensure its integrity, data must be encrypted before it is transmitted or stored. This paper introduces a new, effective and lossless image encryption algorithm based on the natural logarithmic function. The algorithm encrypts an image in a three-stage process. In the first stage, a reference natural logarithmic function is generated as the foundation for the encrypted image. The image's numeric matrix is then decomposed into five integers, and the positions of these numbers are transformed into matrices. The method can efficiently encrypt a variety of digital images, such as binary, grayscale and RGB images, without any quality loss. The principles of the presented scheme could be applied to provide complexity, and hence security, for a variety of data systems, including images.

Keywords: Linear Systems, Image Encryption, Calculus.

PDF Downloads: 2373
7019 A Methodology to Virtualize Technical Engineering Laboratories: MastrLAB-VR

Authors: Ivana Scidà, Francesco Alotto, Anna Osello

Abstract:

Due to the importance given today to innovation, the education sector is evolving thanks to digital technologies. Virtual Reality (VR) can be a valuable teaching tool offering many advantages in training and education, as it allows learners to acquire theoretical knowledge and practical skills through an immersive experience in less time than the traditional educational process. These assumptions lay the foundations for a new educational environment that is engaging and stimulating for students. With the objective of strengthening the innovative teaching offer and the learning processes, the case study of this research concerns the digitalization of MastrLAB, a High Quality Laboratory (HQL) of the Department of Structural, Building and Geotechnical Engineering (DISEG) of the Polytechnic of Turin, a center specialized in experimental mechanical tests on traditional and innovative building materials and on the structures made with them. MastrLAB-VR has been developed as an innovative training tool designed to teach the class, in total safety, how to use the machinery, thus reducing the dangers arising from potentially hazardous activities. The virtual laboratory, dedicated to students of the Building and Civil Engineering courses of the Polytechnic of Turin, has been designed to simulate realistically the experimental approach to the structural tests included in their courses of study: from tensile tests to relaxation tests, and from steel qualification tests to resilience tests on elements at ambient conditions or at characterizing temperatures. The research proposes a methodology for virtualizing technical laboratories through Building Information Modelling (BIM), starting from the creation of a digital model.
The process includes the creation of a standalone application that, with Oculus Rift technology, allows the user to explore the environment and interact with objects using joypads. The application was tested as a prototype on volunteers, whose acquisition of the educational notions presented in the experience was assessed through a virtual multiple-choice quiz, producing an overall evaluation report. The results show that MastrLAB-VR is suitable for both beginners and experts, and it will be adopted experimentally for other laboratories in the University's departments.

Keywords: Building Information Modelling, digital learning, education, virtual laboratory, virtual reality.

PDF Downloads: 800
7018 Design of Mobile Teaching for Students Collaborative Learning in Distance Higher Education

Authors: Lisbeth Amhag

Abstract:

The aim of the study is to describe and analyze the design of mobile teaching for students' collaborative learning in distance higher education, with a focus on mobile technologies such as online webinars (web-based seminars or conferencing) using laptops, smartphones or tablets. These multimedia tools can provide face-to-face interactions, recorded flipped-classroom videos and parallel chat communication. The data consist of interviews with 22 students, observations of online face-to-face webinars, and two surveys. Theoretically, the study joins the research tradition of Computer Supported Collaborative Learning (CSCL) as well as Computer Self-Efficacy (CSE), which concerns individuals' media and information literacy. Important conclusions from the study show that mobile interactions increased student-centered learning. As the students appreciated the working methods, they became more engaged and motivated. Students' use of mobile technology also contributes to increased flexibility in space and place, as well as to media and information literacy.

Keywords: Computer self-efficacy, computer supported collaborative learning, distance and open learning, educational design and technologies, media and information literacy, mobile learning.

PDF Downloads: 1894
7017 Intelligent BRT in Tehran

Authors: P. Parvizi, S. Mohammadi

Abstract:

An intelligent BRT system is necessary when communities are looking for new ways to provide high-capacity rapid transit at reduced cost. This paper describes an intelligent control system that works with a data center. With the help of GPS, the data center can monitor the status of each bus and bus station. Through RFID technology, bus stations and traffic lights can exchange data with buses, and WiMAX communication technology lets all the parts communicate with each other; the data center learns the location of each bus, its arrival at each station, and the number of passengers at stations and on buses. Finally, the paper presents a case study applying these ideas to the Tehran BRT.

Keywords: Tehran BRT, RFID, Intelligent Transportation

PDF Downloads: 2425
7016 Spread Spectrum Image Watermarking for Secured Multimedia Data Communication

Authors: Tirtha S. Das, Ayan K. Sau, Subir K. Sarkar

Abstract:

Digital watermarking provides secure multimedia data communication in addition to copyright protection. The spread spectrum (SS) modulation principle is widely used in digital watermarking to make multimedia signals robust against various signal-processing operations. Several SS watermarking algorithms have been proposed for multimedia signals, but very few works have discussed the issues responsible for secure data communication and robustness improvement. This paper critically analyzes several such factors, namely the properties of spreading codes, signal decomposition suitable for data embedding, the security provided by the key, the successive bit cancellation method applied at the decoder (which strongly affects detection reliability), and secure communication of a significant signal camouflaged by insignificant signals. Based on this analysis, a robust SS watermarking scheme for secure data communication is proposed in the wavelet domain, and experimental results show improved secure communication and robustness. The results also show improved visual and statistical invisibility of the hidden data.
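
Basic SS embedding and correlation detection can be sketched as follows. This is a simplified illustration, not the proposed scheme: it operates on a generic vector of coefficients (in the paper, wavelet-domain coefficients), derives bipolar spreading codes from a secret key used as a random seed (an assumption), and applies a simple successive cancellation at the decoder by subtracting each decoded bit's contribution before detecting the next. The strength `alpha` is a hypothetical parameter.

```python
import numpy as np


def pn_codes(n_bits, length, key):
    """Bipolar (+1/-1) pseudo-noise spreading codes generated from a secret key."""
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=(n_bits, length))


def embed(coeffs, bits, key, alpha=0.5):
    """Spread every bit over all coefficients: c' = c + alpha * sum_i s_i * p_i."""
    coeffs = np.asarray(coeffs, dtype=float)
    codes = pn_codes(len(bits), coeffs.size, key)
    symbols = 2 * np.asarray(bits) - 1  # map {0, 1} -> {-1, +1}
    return coeffs + alpha * (symbols @ codes)


def extract(marked, n_bits, key, alpha=0.5, cancel=True):
    """Correlation detector with optional successive cancellation of decoded bits."""
    residual = np.asarray(marked, dtype=float).copy()
    codes = pn_codes(n_bits, residual.size, key)
    bits = []
    for p in codes:
        s = 1.0 if residual @ p > 0 else -1.0  # sign of correlation decides the bit
        bits.append(int(s > 0))
        if cancel:
            residual = residual - alpha * s * p  # remove this bit's contribution
    return bits
```

Detection relies on the spreading gain: the embedded term correlates with its own code roughly `alpha * length` times, while the host and the other codes contribute terms growing only like the square root of the length. Without the correct key, the decoder cannot regenerate the codes.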

Keywords: Spread spectrum modulation, spreading code, signal decomposition, security, successive bit cancellation

PDF Downloads: 2763
7015 Comparison of Hough Transform and Mean Shift Algorithm for Estimation of the Orientation Angle of Industrial Data Matrix Codes

Authors: Ion-Cosmin Dita, Vasile Gui, Franz Quint, Marius Otesteanu

Abstract:

In the automatic manufacturing and assembly of mechanical, electrical and electronic parts, one needs to reliably identify the position of components and extract their information. Data Matrix Codes (DMC) are now established in many areas of industrial manufacturing thanks to their high information density in small spaces. In today's largely order-driven industry, where tracing requirements are increasing, they offer further advantages over other identification systems. This underlines the need for a robust code reading system for detecting DMC on components in factories. This paper compares two methods for estimating the orientation angle of Data Matrix Codes: one based on the Hough Transform and the other based on the Mean Shift Algorithm. We concentrate on Data Matrix Codes in industrial environments, punched, milled, lasered or etched on different materials in arbitrary orientations.
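
Because a square code looks the same after a 90° rotation, orientation can be estimated from edge-direction angles taken modulo 90°. The sketch below illustrates only the mean shift side of the comparison, under its own assumptions: edge angles are already extracted, a Gaussian kernel with a hypothetical 5° bandwidth is used, and several starting points are tried with the densest mode kept. The Hough Transform alternative is not shown.

```python
import numpy as np

PERIOD = 90.0  # a square code looks the same after a 90-degree rotation


def wrap(d):
    """Wrap angle differences into [-45, 45)."""
    return (d + PERIOD / 2) % PERIOD - PERIOD / 2


def mean_shift_angle(angles_deg, bandwidth=5.0, n_starts=18, n_iter=100, tol=1e-6):
    """Return the densest orientation (degrees, in [0, 90)) found by mean shift."""
    a = np.mod(np.asarray(angles_deg, dtype=float), PERIOD)
    best_mode, best_density = None, -np.inf
    for start in np.linspace(0, PERIOD, n_starts, endpoint=False):
        x = start
        for _ in range(n_iter):
            d = wrap(a - x)
            w = np.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel weights
            step = np.sum(w * d) / np.sum(w)  # mean shift vector on the circle
            x = np.mod(x + step, PERIOD)
            if abs(step) < tol:
                break
        density = np.sum(np.exp(-0.5 * (wrap(a - x) / bandwidth) ** 2))
        if density > best_density:
            best_mode, best_density = x, density
    return best_mode
```

Wrapping the differences keeps angles near 0° and near 90° in the same cluster, and the kernel weighting makes the estimate largely insensitive to scattered edge directions from background texture.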

Keywords: Industrial data matrix code, Hough transform, mean shift.

PDF Downloads: 1321
7014 Implementation of Security Algorithms for u-Health Monitoring System

Authors: Jiho Park, Yong-Gyu Lee, Gilwon Yoon

Abstract:

Data security in a u-Health system can be an important issue because wireless networks are vulnerable to hacking. However, it is not easy to implement a proper security algorithm in an embedded u-Health monitoring system because of hardware constraints such as low performance, power consumption and limited memory size. To secure data containing personal and biosignal information, we implemented several security algorithms, namely Blowfish, the Data Encryption Standard (DES), the Advanced Encryption Standard (AES) and Rivest Cipher 4 (RC4), in our u-Health monitoring system, and the implementations were successful. We compared these algorithms under the same experimental conditions. RC4 had the fastest execution time, and DES used memory most efficiently. However, considering both performance and security, we concluded that AES was the most appropriate algorithm for a personal u-Health monitoring system.

Keywords: biosignal, data encryption, security measures, u-health

PDF Downloads: 2107
7013 A Symbol by Symbol Clustering Based Blind Equalizer

Authors: Kristina Georgoulakis

Abstract:

A new blind symbol-by-symbol equalizer is proposed. Its operation is based on the geometric properties of the two-dimensional data constellation. An unsupervised clustering technique is used to locate the clusters formed by the received data. The symmetry properties of the cluster labels are then used to label the clusters. Following this step, the received data are compared to the clusters and decisions are made on a symbol-by-symbol basis, by assigning to each received sample the label of the nearest cluster. The equalizer is investigated in both linear and nonlinear channels, and its performance is compared with that of a CMA-based blind equalizer.
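
A simplified sketch of the cluster-then-label idea for QPSK, assuming a memoryless channel that scales the constellation and rotates it by less than 45° (so each cluster stays in its own quadrant). Channels with memory generally produce more clusters and need the more general symmetry-based labeling described above. Cluster centers are found with K-means, labeled by quadrant, and each received sample takes the label of its nearest center; the function names are hypothetical.

```python
import numpy as np

QPSK = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)


def kmeans_complex(r, centers, n_iter=50):
    """Lloyd's k-means on complex samples, starting from the given centers."""
    for _ in range(n_iter):
        labels = np.argmin(np.abs(r[:, None] - centers[None, :]), axis=1)
        new = np.array([r[labels == k].mean() if np.any(labels == k) else centers[k]
                        for k in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return centers


def blind_qpsk_decide(r):
    """Cluster the received samples, label clusters by quadrant, decide symbol by symbol."""
    r = np.asarray(r, dtype=complex)
    init = np.mean(np.abs(r)) * QPSK  # initial guesses scaled to the data
    centers = kmeans_complex(r, init)
    # Quadrant symmetry assigns each cluster its transmitted-symbol label
    labels = (np.sign(centers.real) + 1j * np.sign(centers.imag)) / np.sqrt(2)
    nearest = np.argmin(np.abs(r[:, None] - centers[None, :]), axis=1)
    return labels[nearest]
```

Deciding against the learned cluster centers, rather than the ideal constellation points, is what absorbs the unknown channel gain and rotation without a training sequence.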

Keywords: Blind equalization, channel equalization, cluster based equalisers

PDF Downloads: 1410
7012 Zero Inflated Models for Overdispersed Count Data

Authors: Y. N. Phang, E. F. Loh

Abstract:

Zero-inflated models are commonly used to model count data with excess zeros, where the excess zeros may be structural zeros or zeros that occur by chance. Such data are common in disciplines such as finance, insurance, biomedicine, econometrics, ecology and the health sciences, including sexual health and dental epidemiology. The most popular zero-inflated models are the zero-inflated Poisson and zero-inflated negative binomial models. Zero-inflated generalized Poisson and zero-inflated double Poisson models are also discussed in the literature. Recently, zero-inflated inverse trinomial and zero-inflated strict arcsine models have been advocated and shown to serve as alternative models for overdispersed count data caused by excess zeros and unobserved heterogeneity. The purpose of this paper is to review the related literature and provide a variety of examples of zero-inflated models from different disciplines. Different model selection methods used in model comparison are also discussed.
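
For reference, the zero-inflated Poisson (ZIP) model mixes a point mass at zero (probability π) with a Poisson(λ) count: P(Y = 0) = π + (1 − π)e^(−λ) and P(Y = k) = (1 − π)e^(−λ)λ^k/k! for k ≥ 1, giving mean (1 − π)λ and variance (1 − π)λ(1 + πλ), which exceeds the mean whenever π > 0. The sketch below fits a ZIP by the EM algorithm and compares it with a plain Poisson fit via AIC (2p − 2 log L) and BIC (p ln n − 2 log L); the function names are illustrative, not from the paper.

```python
import math


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson probability of count k."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi + (1 - pi) * poisson if k == 0 else (1 - pi) * poisson


def zip_loglik(counts, lam, pi):
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)


def zip_fit_em(counts, n_iter=200):
    """Maximum-likelihood (pi, lam) for a ZIP model via the EM algorithm."""
    n = len(counts)
    pi, lam = 0.5, max(sum(counts) / n, 1e-6)
    for _ in range(n_iter):
        # E-step: probability that each observed zero is a structural zero
        p0 = pi / (pi + (1 - pi) * math.exp(-lam))
        tau = [p0 if k == 0 else 0.0 for k in counts]
        # M-step: update the mixing weight and the Poisson mean
        pi = sum(tau) / n
        lam = sum((1 - t) * k for t, k in zip(tau, counts)) / sum(1 - t for t in tau)
    return pi, lam


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * loglik
```

A plain Poisson model is the special case π = 0 with λ equal to the sample mean, so `zip_loglik(counts, mean, 0.0)` gives its log-likelihood for the comparison. The lower AIC or BIC indicates the preferred model.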

Keywords: Overdispersed count data, model selection methods, likelihood ratio, AIC, BIC.

PDF Downloads: 4496
7011 Formalizing a Procedure for Generating Uncertain Resource Availability Assumptions Based On Real Time Logistic Data Capturing with Auto-ID Systems for Reactive Scheduling

Authors: Lars Laußat, Manfred Helmus, Kamil Szczesny, Markus König

Abstract:

As one result of the project “Reactive Construction Project Scheduling using Real Time Construction Logistic Data and Simulation”, a procedure has been developed for using uncertain resource availability assumptions in reactive scheduling processes. Prediction data about resource availability are generated in a formalized way from real-time monitoring data, e.g. from auto-ID systems on the construction site and in the supply chains. The paper focuses on formalizing the procedure for monitoring construction logistics processes, detecting disturbances, and generating new, uncertain scheduling assumptions for the reactive resource-constrained simulation procedure, which is and will be further described in other papers.

Keywords: Auto-ID, Construction Logistic, Fuzzy, Monitoring, RFID, Scheduling.

PDF Downloads: 1757
7010 Nuclear Data Evaluation for 217Po

Authors: Sherif S. Nafee, Amir K. Al-Ramady, Salem S. Shaheen

Abstract:

Evaluated nuclear decay data for the 217Po nuclide are presented in this work. These data include recommended values for the half-life T1/2 and for α-, β⁻- and γ-ray emission energies and probabilities. Decay data from the 221Rn α decay and the 217Bi β⁻ decay are presented. Q(α) has been updated based on the recently published Atomic Mass Evaluation AME2012. In addition, the log ft values were calculated using the LOGFT program from the ENSDF evaluation package. Moreover, the total internal conversion electrons and the K-shell to L-shell, L-shell to M-shell and L-shell to N-shell conversion electron ratios (K/L, L/M and L/N) have been calculated using the BrIcc program. Meanwhile, recommended multipolarities have been assigned based on recent measurements, which yield a better intensity balance at the 254 keV and 264 keV gamma transitions.

Keywords: Atomic Mass Evaluation, Nuclear Data Evaluation, Total Electron Conversion Electrons.

PDF Downloads: 2239
7009 Detection of Keypoint in Press-Fit Curve Based on Convolutional Neural Network

Authors: Shoujia Fang, Guoqing Ding, Xin Chen

Abstract:

The quality of press-fit assembly is closely related to product reliability and safety. This paper proposes a keypoint detection method based on a convolutional neural network to improve the accuracy of keypoint detection in press-fit curves, providing an auxiliary basis for judging press-fit assembly quality. The press-fit curve plots press-fit force against displacement. Both force and displacement data are time series, so a one-dimensional convolutional neural network is used to process the curve. After the acquired press-fit data are filtered, a multi-layer one-dimensional convolutional neural network automatically learns press-fit curve features, which are then fed to a multi-layer perceptron that outputs the keypoint of the curve. We trained the CNN model on data from press-fit assembly equipment in an actual production process, and evaluated detection performance on different data from the same equipment. Compared with existing research results, detection performance was significantly improved. This method can provide a reliable basis for judging press-fit quality.
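
A forward pass of such a network can be sketched in NumPy. The layer sizes, kernel width and the sigmoid output read as the keypoint position as a fraction of the curve length are all assumptions: the abstract does not give the architecture, and training is not shown. The input is a 2 × T array of filtered force and displacement samples.

```python
import numpy as np


def conv1d(x, w, b):
    """Valid 1-D convolution (cross-correlation, as in CNN libraries).

    x: (in_channels, length); w: (out_channels, in_channels, kernel); b: (out_channels,)
    returns (out_channels, length - kernel + 1)
    """
    _, _, k = w.shape
    length = x.shape[1] - k + 1
    # windows[i, t, j] = x[i, t + j]
    windows = np.stack([x[:, t:t + k] for t in range(length)], axis=1)
    return np.einsum("itj,oij->ot", windows, w) + b[:, None]


def relu(z):
    return np.maximum(z, 0.0)


def keypoint_forward(curve, params):
    """Two conv layers, then a small MLP producing one value in (0, 1)."""
    h = relu(conv1d(curve, params["w1"], params["b1"]))
    h = relu(conv1d(h, params["w2"], params["b2"]))
    h = h.reshape(-1)  # flatten channels x time
    h = relu(params["w3"] @ h + params["b3"])
    z = params["w4"] @ h + params["b4"]
    return 1 / (1 + np.exp(-z[0]))  # assumed encoding: keypoint position / curve length


def init_params(T, rng, c1=8, c2=8, k=5, hidden=16):
    """Random (untrained) weights for a curve of T samples."""
    flat = c2 * (T - 2 * (k - 1))  # length after two valid convolutions
    return {
        "w1": rng.normal(0, 0.1, (c1, 2, k)), "b1": np.zeros(c1),
        "w2": rng.normal(0, 0.1, (c2, c1, k)), "b2": np.zeros(c2),
        "w3": rng.normal(0, 0.1, (hidden, flat)), "b3": np.zeros(hidden),
        "w4": rng.normal(0, 0.1, (1, hidden)), "b4": np.zeros(1),
    }
```

Treating force and displacement as two input channels lets each convolution kernel look at both signals over the same short window of the curve.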

Keywords: Keypoint detection, curve feature, convolutional neural network, press-fit assembly.

PDF Downloads: 904
7008 Proposal to Increase the Efficiency, Reliability and Safety of the Centre of Data Collection Management and Their Evaluation Using Cluster Solutions

Authors: Martin Juhas, Bohuslava Juhasova, Igor Halenar, Andrej Elias

Abstract:

This article deals with the possibility of increasing the efficiency, reliability and safety of the system for teledosimetric data collection management and evaluation, as part of a complex study for the activity “Research of data collection, their measurement and evaluation with mobile and autonomous units” within the project “Research of monitoring and evaluation of non-standard conditions in the area of nuclear power plants”. Possible weaknesses in the existing system are identified. A study of available cluster solutions and the possibility of deploying them in the analysed system is presented.

Keywords: Teledosimetric data, efficiency, reliability, safety, cluster solution.

PDF Downloads: 1538
7007 Application of Data Mining Techniques for Tourism Knowledge Discovery

Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee

Abstract:

Five implementations of three data mining classification techniques were applied experimentally to extract important insights from tourism data. The aim was to find the best-performing algorithm among those compared for tourism knowledge discovery. The knowledge discovery from data process was used as the process model, and 10-fold cross-validation was used for testing. Various data preprocessing activities were performed to obtain the final dataset for model building. Classification models of the selected algorithms were built under different scenarios on the preprocessed dataset. The best-performing algorithm on the tourism dataset was Random Forest (76%) before information-gain-based attribute selection, and J48 (C4.5) (75%) after selecting the attributes most relevant to the class (target) attribute. In terms of model-building time, attribute selection improved the efficiency of all algorithms, with the Artificial Neural Network (multilayer perceptron) showing the highest improvement (90%). The rules extracted from the decision tree model are presented; they reveal intricate, non-trivial knowledge that simple statistical analysis would not otherwise uncover, despite the mediocre accuracy of the classification algorithms.
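
Information-gain attribute selection and 10-fold cross-validation, both mentioned above, can be sketched in plain Python. Information gain is IG(Y; A) = H(Y) − Σ_v P(A = v) H(Y | A = v), with entropy in bits. The row format (dicts of categorical attributes) and the interleaved, unshuffled fold assignment are assumptions of this sketch, not the study's setup.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """IG(target; attr) = H(target) - sum_v P(attr = v) * H(target | attr = v)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    n = len(rows)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - conditional


def top_attributes(rows, attrs, target, k):
    """Rank candidate attributes by information gain and keep the k best."""
    return sorted(attrs, key=lambda a: information_gain(rows, a, target), reverse=True)[:k]


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation (no shuffling)."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

In practice the attribute ranking should be computed inside each training fold only, so that the held-out fold does not influence which attributes are kept.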

Keywords: Classification algorithms; data mining; tourism; knowledge discovery.

PDF Downloads: 2520
7006 Clustering Categorical Data Using Hierarchies (CLUCDUH)

Authors: Gökhan Silahtaroğlu

Abstract:

Clustering large populations is an important problem when the data contain noise and clusters of different shapes. A good clustering algorithm should be efficient enough to detect clusters precisely. Besides space complexity, time complexity also becomes important as the data size grows. Using hierarchies, we developed a new algorithm that splits attributes according to their values and chooses the splitting dimension so as to divide the database into roughly equal parts as far as possible. At each node, certain descriptive statistical features of the resident data are calculated, and pruning then generates the natural clusters with a complexity of O(n).

Keywords: Clustering, tree, split, pruning, entropy, gini.

PDF Downloads: 1530
7005 Analysis of Users’ Behavior on Book Loan Log Based On Association Rule Mining

Authors: Kanyarat Bussaban, Kunyanuth Kularbphettong

Abstract:

This research aims to create a model for analyzing students' use of library resources with a data mining technique, using Suan Sunandha Rajabhat University as a case study. The model was created with association rules using the Apriori algorithm. Fourteen rules were found, and testing them on a test data set showed a classification accuracy of 79.24 percent and an MSE of 22.91. The results show that the user behavior model based on association rules can be used to manage library resources.
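
A compact Apriori sketch for mining loan logs. Each transaction is assumed to be the set of items (for example, book categories) borrowed by one user, which is an assumption about how the log is grouped. Frequent itemsets are grown level by level with the usual join and prune steps, and a rule A → B is kept when confidence = support(A ∪ B) / support(A) meets a threshold. Thresholds and names are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for itemsets meeting min_support (a fraction)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items
        keys = list(level)
        current = {a | b for a in keys for b in keys if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        current = {c for c in current
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) rules from frequent itemsets."""
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, confidence
```

Because every subset of a frequent itemset is itself frequent, `frequent[antecedent]` is always available when rules are generated.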

Keywords: Behavior, data mining technique, Apriori algorithm.

PDF Downloads: 2286
7004 Data Integrity: Challenges in Health Information Systems in South Africa

Authors: T. Thulare, M. Herselman, A. Botha

Abstract:

Poor system use, including inappropriate design of health information systems, causes difficulties in communicating with patients and increases the time healthcare professionals spend recording the necessary health information in medical records. System features such as pop-up reminders, complex menus and poor user interfaces can make medical records far more time-consuming than paper cards and can also affect decision-making. Although errors associated with health information and their real and likely effects on quality of care and patient safety have been documented for many years, more research is needed to measure how often these errors occur and to determine their causes so that solutions can be implemented. Therefore, the purpose of this paper is to identify data integrity challenges in hospital information systems through a scoping review and, based on the results, to recommend how to manage them. Only 34 of the 297 publications initially identified in the field were found to be suitable. The results indicate that human and computerized systems are the most common sources of data integrity challenges, and that factors such as policy, environment, health workforce and lack of awareness contribute to these challenges; if appropriate measures are taken, however, these challenges can be managed.

Keywords: Data integrity, data integrity challenges, hospital information systems, South Africa.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1308
7003 Self-Supervised Pretraining on Paired Sequences of fMRI Data for Transfer Learning to Brain Decoding Tasks

Authors: Sean Paulsen, Michael Casey

Abstract:

In this work, we present a self-supervised pretraining framework for transformers on functional Magnetic Resonance Imaging (fMRI) data. First, we pretrain our architecture on two self-supervised tasks simultaneously to teach the model a general understanding of the temporal and spatial dynamics of human auditory cortex during music listening. Our pretraining results are the first to suggest a synergistic effect of multitask training on fMRI data. Second, we finetune the pretrained models and train additional fresh models on a supervised fMRI classification task. We observe significantly improved accuracy on held-out runs with the finetuned models, which demonstrates the ability of our pretraining tasks to facilitate transfer learning. This work contributes to the growing body of literature on transformer architectures for pretraining and transfer learning with fMRI data, and serves as a proof of concept for our pretraining tasks and multitask pretraining on fMRI data.

Keywords: Transfer learning, fMRI, self-supervised, brain decoding, transformer, multitask training.
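The abstract does not specify the two self-supervised tasks or how the paired sequences are constructed. As a rough illustration only, the sketch below builds one hypothetical kind of paired-sequence example (label 1 if window B immediately follows window A in the same run, 0 otherwise) and combines two task losses as a weighted sum, which is one common way to train on several objectives simultaneously. The pairing rule, function names, and weights are assumptions, not the authors' method; frames are represented as plain list elements so the sketch stays self-contained.

```python
import random

def make_paired_examples(run, window, n_pairs, seed=0):
    """Build (seq_a, seq_b, label) triples from one run of fMRI frames.

    Hypothetical pairing task: label 1 if seq_b is the window that immediately
    follows seq_a in time, label 0 if seq_b is any other window of the same run
    (excluding seq_a itself).
    """
    if len(run) < 2 * window + 1:
        raise ValueError("run too short to form both positive and negative pairs")
    n_starts = len(run) - window + 1  # all valid window start positions
    rng = random.Random(seed)
    examples = []
    for i in range(n_pairs):
        # Choose a start that leaves room for the following window
        start = rng.randrange(len(run) - 2 * window + 1)
        seq_a = run[start:start + window]
        if i % 2 == 0:  # alternate labels to keep the two classes balanced
            nxt = start + window
            examples.append((seq_a, run[nxt:nxt + window], 1))
        else:
            other = rng.choice([s for s in range(n_starts)
                                if s not in (start, start + window)])
            examples.append((seq_a, run[other:other + window], 0))
    return examples

def multitask_loss(losses, weights):
    """Weighted sum of per-task losses: the single scalar optimized in multitask training."""
    return sum(loss * weight for loss, weight in zip(losses, weights))

# Usage with a toy "run" whose frames are just time indices
pairs = make_paired_examples(list(range(20)), window=4, n_pairs=10)
total = multitask_loss([1.0, 2.0], [0.5, 0.25])
```

In a real pipeline each frame would be a voxel vector from auditory cortex, and the two losses would come from the transformer's task heads; the weights are tunable hyperparameters.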

7002 On Speeding Up Support Vector Machines: Proximity Graphs Versus Random Sampling for Pre-Selection Condensation

Authors: Xiaohua Liu, Juan F. Beltran, Nishant Mohanchandra, Godfried T. Toussaint

Abstract:

Support vector machines (SVMs) are considered to be among the best machine learning algorithms for minimizing the predictive probability of misclassification. Their drawback, however, is that for large data sets the computation of the optimal decision boundary is a time-consuming function of the size of the training set. Hence, several methods have been proposed to speed up the SVM algorithm. Here, three methods for speeding up the computation of SVM classifiers are compared experimentally on a musical genre classification problem. The simplest method pre-selects a random sample of the data before the SVM algorithm is applied. The two other methods use proximity graphs to pre-select data that lie near the decision boundary: one uses k-Nearest Neighbor graphs, and the other uses Relative Neighborhood Graphs.

Keywords: Machine learning, data mining, support vector machines, proximity graphs, relative-neighborhood graphs, k-nearest-neighbor graphs, random sampling, training data condensation.
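The three pre-selection strategies can be sketched as follows. The abstract does not state the exact selection rule, so this assumes a common heuristic: keep a point if it is connected in the proximity graph to a point of a different class, since such points lie near the decision boundary. The Relative Neighborhood Graph test (p and q are neighbors if no third point r satisfies max(d(p, r), d(q, r)) < d(p, q)) is brute-force O(n^3) here. Function names are hypothetical; the kept indices would then be passed to an SVM trainer.

```python
import math
import random

def random_sample(n, size, seed=0):
    """Baseline: keep a uniform random subset of training indices."""
    return sorted(random.Random(seed).sample(range(n), size))

def knn_boundary_points(X, y, k):
    """Keep indices whose k nearest neighbors include a point of another class."""
    keep = []
    for i, xi in enumerate(X):
        # Sort by (distance, index) so ties are broken deterministically
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        if any(y[j] != y[i] for _, j in dists[:k]):
            keep.append(i)
    return keep

def rng_boundary_points(X, y):
    """Keep indices joined by a Relative Neighborhood Graph edge to another class."""
    n = len(X)
    keep = set()
    for p in range(n):
        for q in range(p + 1, n):
            if y[p] == y[q]:
                continue  # only between-class edges matter for condensation
            dpq = math.dist(X[p], X[q])
            # (p, q) is an RNG edge unless some r lies in their "lune"
            blocked = any(max(math.dist(X[p], X[r]), math.dist(X[q], X[r])) < dpq
                          for r in range(n) if r not in (p, q))
            if not blocked:
                keep.update((p, q))
    return sorted(keep)

# Toy 2-class data on a line: class 0 at x = 0..2, class 1 at x = 3..5
X = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
y = [0, 0, 0, 1, 1, 1]
print(rng_boundary_points(X, y), knn_boundary_points(X, y, k=2), random_sample(6, 3))
```

On this toy data both graph methods keep only the two points that straddle the class boundary, whereas random sampling ignores geometry; that difference is what the paper's comparison is about.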

7001 Investigating the Individual Difference Antecedents of Perceived Enjoyment in the Acceptance of Blogging

Authors: Yi-Shun Wang, Hsin-Hui Lin, Yi-Wen Liao

Abstract:

With the proliferating use of weblogs (blogs) in educational contexts, gaining a better understanding of why students are willing to use blog systems has become an important topic for practitioners and academics. While perceived enjoyment has been found to have a significant influence on behavioral intentions to use blogs or other hedonic systems, few studies have investigated the antecedents of perceived enjoyment in the acceptance of blogging. The main purpose of the present study is to explore the individual difference antecedents of perceived enjoyment and to examine how they influence behavioral intention to blog through the mediation of perceived enjoyment. Based on the previous literature, the Big Five personality traits (i.e., extraversion, agreeableness, conscientiousness, neuroticism, and openness to experience), as well as computer self-efficacy and personal innovativeness in information technology (PIIT), are hypothesized as potential antecedents of perceived enjoyment in the acceptance of blogging. Data collected from 358 respondents in Taiwan were tested against the research model using the structural equation modeling approach. The results indicate that extraversion, agreeableness, conscientiousness, and PIIT have a significant influence on perceived enjoyment, which in turn significantly influences the behavioral intention to blog. These findings have several important implications for future research.

Keywords: Individual difference, Big Five personality traits, perceived enjoyment, blogging.

7000 Wavelet and K-L Separability Based Feature Extraction Method for Functional Data Classification

Authors: Jun Wan, Zehua Chen, Yingwu Chen, Zhidong Bai

Abstract:

This paper proposes a novel feature extraction method, based on the Discrete Wavelet Transform (DWT) and K-L Separability (KLS), for the classification of Functional Data (FD). The method combines the decorrelation and reduction properties of the DWT with the additive independence property of KLS, which helps in extracting classification features from FD. It extends the popular wavelet-based shrinkage method for functional data reduction and classification. A theoretical analysis in the paper proves the consistent convergence property, and a simulation study compares the proposed method with earlier shrinkage methods. The experimental results show that the method improves classification efficiency, precision, and robustness.

Keywords: classification, functional data, feature extraction, K-L separability, wavelet.
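The abstract does not define KLS precisely. The sketch below assumes K-L denotes the Kullback-Leibler divergence, used in its symmetric (Jeffreys) form between Gaussian class-conditional distributions of each wavelet coefficient, and uses the Haar wavelet for simplicity. Under independence, per-coefficient divergences add, which is the "additive" property the abstract refers to; the total separability of a selected subset is therefore the sum of its scores. For Gaussians with means $m_1, m_2$ and variances $v_1, v_2$, the symmetric divergence is $J = \frac{(v_1 - v_2)^2 + (v_1 + v_2)(m_1 - m_2)^2}{2 v_1 v_2}$. All function names and the two-class restriction are hypothetical.

```python
import math

def haar_dwt(signal):
    """Full orthonormal Haar DWT; length must be a power of two.

    Returns [approximation, coarsest details, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx, details = list(signal), []
    while len(approx) > 1:
        idx = range(0, len(approx), 2)
        level = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in idx]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in idx]
        details = level + details  # finer details stay at the end
    return approx + details

def gaussian_jeffreys(a, b, eps=1e-12):
    """Symmetric KL divergence between Gaussians fitted to samples a and b."""
    def mean_var(x):
        m = sum(x) / len(x)
        return m, sum((v - m) ** 2 for v in x) / len(x) + eps  # eps avoids zero variance
    m1, v1 = mean_var(a)
    m2, v2 = mean_var(b)
    return ((v1 - v2) ** 2 + (v1 + v2) * (m1 - m2) ** 2) / (2 * v1 * v2)

def select_features(curves, labels, n_features):
    """Rank wavelet coefficients by two-class KL separability and keep the top ones."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    coeffs = [haar_dwt(c) for c in curves]
    scores = []
    for j in range(len(coeffs[0])):
        a = [c[j] for c, lab in zip(coeffs, labels) if lab == classes[0]]
        b = [c[j] for c, lab in zip(coeffs, labels) if lab == classes[1]]
        scores.append((gaussian_jeffreys(a, b), j))
    scores.sort(reverse=True)
    top = scores[:n_features]
    # Under independence the subset's separability is the sum of its scores
    return [j for _, j in top], sum(s for s, _ in top)

# Toy curves: the classes differ only in the level of the second half
curves = [[1, 1, 1, 1], [1.1, 0.9, 1, 1], [0.9, 1.1, 1, 1],
          [1, 1, 3, 3], [1.1, 0.9, 3, 3], [0.9, 1.1, 3, 3]]
labels = [0, 0, 0, 1, 1, 1]
chosen, total = select_features(curves, labels, n_features=2)
```

The within-class wiggle in the first half lands in a fine-scale detail coefficient that is identical across classes, so it scores zero, while the coarse coefficients that capture the level shift are selected.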

6999 A Cuckoo Search with Differential Evolution for Clustering Microarray Gene Expression Data

Authors: M. Pandi, K. Premalatha

Abstract:

A DNA microarray is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenge of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. This complexity can be addressed by clustering, which reveals natural structures and identifies interesting patterns in the underlying data. In this paper, gene-based clustering of gene expression data is proposed using Cuckoo Search with Differential Evolution (CS-DE). The method is evaluated on benchmark gene expression datasets, and the results show that CS-DE outperforms CS on these datasets. To validate the clustering results, the method is assessed with one internal and one external cluster validation index.

Keywords: DNA, Microarray, genomics, Cuckoo Search, Differential Evolution, Gene expression data, Clustering.
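The abstract does not describe how CS and DE are combined. The sketch below assumes one plausible hybrid: each nest encodes k cluster centroids as a flat vector, fitness is the within-cluster sum of squared errors (SSE), the cuckoo phase perturbs nests with Lévy-flight steps (Mantegna's algorithm), and the nest-abandonment phase (probability pa) builds replacement candidates with DE/rand/1/bin instead of plain CS's random walk. Both phases use greedy replacement, so the best fitness never increases. All names and default parameters are hypothetical, not the authors' settings.

```python
import math
import random

def sse(centroids, data):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in data)

def levy_step(rng, beta=1.5):
    """One Lévy-distributed step length via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    return u / max(abs(v), 1e-12) ** (1 / beta)

def cs_de_cluster(data, k, n_nests=20, iters=200, pa=0.25,
                  alpha=0.01, f=0.5, cr=0.9, seed=0):
    """Hypothetical CS-DE hybrid; returns (centroids, best SSE, best-SSE history)."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dim)]
    hi = [max(p[d] for p in data) for d in range(dim)]
    size = k * dim

    def clip(x):  # keep every coordinate inside the data's bounding box
        return [min(max(v, lo[i % dim]), hi[i % dim]) for i, v in enumerate(x)]

    def fitness(x):  # decode the flat vector into k centroids and score it
        return sse([x[i * dim:(i + 1) * dim] for i in range(k)], data)

    nests = [[rng.uniform(lo[i % dim], hi[i % dim]) for i in range(size)]
             for _ in range(n_nests)]
    fit = [fitness(x) for x in nests]
    best = min(range(n_nests), key=fit.__getitem__)
    history = [fit[best]]

    for _ in range(iters):
        # Cuckoo phase: Lévy-flight perturbation scaled by distance to the best nest
        for i in range(n_nests):
            b = nests[best]
            trial = clip([v + alpha * levy_step(rng) * (v - bv) * rng.gauss(0, 1)
                          for v, bv in zip(nests[i], b)])
            t_fit = fitness(trial)
            if t_fit < fit[i]:  # greedy replacement
                nests[i], fit[i] = trial, t_fit
        # Abandonment phase: a fraction pa of nests gets a DE/rand/1/bin candidate
        for i in range(n_nests):
            if rng.random() >= pa:
                continue
            r1, r2, r3 = rng.sample([j for j in range(n_nests) if j != i], 3)
            j_rand = rng.randrange(size)  # guarantees at least one mutated coordinate
            trial = clip([nests[r1][d] + f * (nests[r2][d] - nests[r3][d])
                          if (rng.random() < cr or d == j_rand) else nests[i][d]
                          for d in range(size)])
            t_fit = fitness(trial)
            if t_fit < fit[i]:
                nests[i], fit[i] = trial, t_fit
        best = min(range(n_nests), key=fit.__getitem__)
        history.append(fit[best])

    x = nests[best]
    return [x[i * dim:(i + 1) * dim] for i in range(k)], fit[best], history

# Two well-separated toy clusters standing in for gene expression profiles
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, best_sse, history = cs_de_cluster(data, k=2, seed=1)
```

For real microarray data, each point would be one gene's expression profile across samples, and the result could then be scored with an internal and an external validation index, as the abstract describes.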
