Search results for: Web Usage Mining.
856 Analysis of Physicochemical Properties on Prediction of R5, X4 and R5X4 HIV-1 Coreceptor Usage
Authors: Kai-Ti Hsu, Hui-Ling Huang, Chun-Wei Tung, Yi-Hsiung Chen, Shinn-Ying Ho
Abstract:
Bioinformatics methods for predicting the T cell coreceptor usage from the array of membrane protein of HIV-1 are investigated. In this study, we aim to propose an effective prediction method for dealing with the three-class classification problem of CXCR4 (X4), CCR5 (R5) and CCR5/CXCR4 (R5X4). We made efforts in investigating the coreceptor prediction problem as follows: 1) proposing a feature set of informative physicochemical properties which is cooperated with SVM to achieve high prediction test accuracy of 81.48%, compared with the existing method with accuracy of 70.00%; 2) establishing a large up-to-date data set by increasing the size from 159 to 1225 sequences to verify the proposed prediction method where the mean test accuracy is 88.59%, and 3) analyzing the set of 14 informative physicochemical properties to further understand the characteristics of HIV-1coreceptors.Keywords: Coreceptor, genetic algorithm, HIV-1, SVM, physicochemical properties, prediction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2387855 The Conceptualization of Integrated Consumer Health Informatics Utilization Framework
Authors: Norfadzila, S.W.A., Balakrishnan, V., A. Abrizah
Abstract:
The purpose of this paper is to propose an integrated consumer health informatics utilization framework that can be used to gauge the online health information needs and usage patterns among Malaysian women. The proposed framework was developed based on four different theories/models: Use and Gratification Theory, Technology Acceptance 3 Model, Health Belief Model, and Multi-level Model of Information Seeking. The relevant constructs and research hypotheses are also presented in this paper. The framework will be tested in order for it to be used successfully to identify Malaysian women-s preferences of online health information resources and health information seeking activities.Keywords: Consumer Health Informatics, Consumer Preferences, Information Needs and Usage Patterns, Online Health Information, Women Studies
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1727854 Feature Based Unsupervised Intrusion Detection
Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein
Abstract:
The goal of a network-based intrusion detection system is to classify activities of network traffics into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning plays an important role in many sciences; including intrusion detection system (IDS) using both supervised and unsupervised techniques. However, one of the essential steps of data mining is feature selection that helps in improving the efficiency, performance and prediction rate of proposed approach. This paper applies unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we have used the new NSL-KDD dataset, which is a modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a 2 class classifications have been implemented (Normal, Attack). Weka framework which is a java based open source software consists of a collection of machine learning algorithms for data mining tasks has been used in the testing process. The experimental results show that the proposed approach is very accurate with low false positive rate and high true positive rate and it takes less learning time in comparison with using the full features of the dataset with the same algorithm.
Keywords: Information Gain (IG), Intrusion Detection System (IDS), K-means Clustering, Weka.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2778853 Proposal for Cost Calculation of Warehouse Processes and Its Usage for Setting Standards for Performance Evaluation
Authors: Tomas Cechura, Michal Simon
Abstract:
This paper describes a proposal for cost calculation of warehouse processes and its usage for setting standards for performance evaluation. One of the most common options of monitoring process performance is benchmarking. The typical outcome is whether the monitored object is better or worse than an average or standard. Traditional approaches, however, cannot find any specific opportunities to improve performance or eliminate inefficiencies in processes. Higher process efficiency can be achieved for example by cost reduction assuming that the same output is generated. However, costs can be reduced only if we know their structure and we are able to calculate them accurately. In the warehouse process area it is rather difficult because in most cases we have available only aggregated values with low explanatory ability. The aim of this paper is to create a suitable method for calculating the storage costs. At the end is shown a practical example of process calculation.
Keywords: Calculation, Costs, Performance, Process, Warehouse.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9326852 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data
Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad
Abstract:
Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars, and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.Keywords: Remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2055851 Post Mining- Discovering Valid Rules from Different Sized Data Sources
Authors: R. Nedunchezhian, K. Anbumani
Abstract:
A big organization may have multiple branches spread across different locations. Processing of data from these branches becomes a huge task when innumerable transactions take place. Also, branches may be reluctant to forward their data for centralized processing but are ready to pass their association rules. Local mining may also generate a large amount of rules. Further, it is not practically possible for all local data sources to be of the same size. A model is proposed for discovering valid rules from different sized data sources where the valid rules are high weighted rules. These rules can be obtained from the high frequency rules generated from each of the data sources. A data source selection procedure is considered in order to efficiently synthesize rules. Support Equalization is another method proposed which focuses on eliminating low frequency rules at the local sites itself thus reducing the rules by a significant amount.
Keywords: Association rules, multiple data stores, synthesizing, valid rules.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1404850 The Study of Tourists’ Behavior in Water Usage in Hotel Business: Case Study of Phuket Province, Thailand
Authors: A. Pensiri, K. Nantaporn, P. Parichut
Abstract:
Tourism is very important to the economy of many countries due to the large contribution in the areas of employment and income generation. However, the rapid growth of tourism can also be considered as one of the major uses of water user, and therefore also have a significant and detrimental impact on the environment. Guest behavior in water usage can be used to manage water in hotels for sustainable water resources management. This research presents a study of hotel guest water usage behavior at two hotels, namely Hotel A (located in Kathu district) and Hotel B (located in Muang district) in Phuket Province, Thailand, as case studies. Primary and secondary data were collected from the hotel manager through interview and questionnaires. The water flow rate was measured in-situ from each water supply device in the standard room type at each hotel, including hand washing faucets, bathroom faucets, shower and toilet flush. For the interview, the majority of respondents (n = 204 for Hotel A and n = 244 for Hotel B) were aged between 21 years and 30 years (53% for Hotel A and 65% for Hotel B) and the majority were foreign (78% in Hotel A, and 92% in Hotel B) from American, France and Austria for purposes of tourism (63% in Hotel A, and 55% in Hotel B). The data showed that water consumption ranged from 188 litres to 507 liters, and 383 litres to 415 litres per overnight guest in Hotel A and Hotel B (n = 244), respectively. These figures exceed the water efficiency benchmark set for Tropical regions by the International Tourism Partnership (ITP). It is recommended that guest water saving initiatives should be implemented at hotels. Moreover, the results showed that guests have high satisfaction for the hotels, the front office service reveal the top rates of average score of 4.35 in Hotel A and 4.20 in Hotel B, respectively, while the luxury decoration and room cleanliness exhibited the second satisfaction scored by the guests in Hotel A and B, respectively. On the basis of this information, the findings can be very useful to improve customer service satisfaction and pay attention to this particular aspect for better hotel management.
Keywords: Hotel, tourism, Phuket, water usage.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2299849 An Algorithm Proposed for FIR Filter Coefficients Representation
Authors: Mohamed Al Mahdi Eshtawie, Masuri Bin Othman
Abstract:
Finite impulse response (FIR) filters have the advantage of linear phase, guaranteed stability, fewer finite precision errors, and efficient implementation. In contrast, they have a major disadvantage of high order need (more coefficients) than IIR counterpart with comparable performance. The high order demand imposes more hardware requirements, arithmetic operations, area usage, and power consumption when designing and fabricating the filter. Therefore, minimizing or reducing these parameters, is a major goal or target in digital filter design task. This paper presents an algorithm proposed for modifying values and the number of non-zero coefficients used to represent the FIR digital pulse shaping filter response. With this algorithm, the FIR filter frequency and phase response can be represented with a minimum number of non-zero coefficients. Therefore, reducing the arithmetic complexity needed to get the filter output. Consequently, the system characteristic i.e. power consumption, area usage, and processing time are also reduced. The proposed algorithm is more powerful when integrated with multiplierless algorithms such as distributed arithmetic (DA) in designing high order digital FIR filters. Here the DA usage eliminates the need for multipliers when implementing the multiply and accumulate unit (MAC) and the proposed algorithm will reduce the number of adders and addition operations needed through the minimization of the non-zero values coefficients to get the filter output.
Keywords: Pulse shaping Filter, Distributed Arithmetic, Optimization algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3177848 Monitoring of Spectrum Usage and Signal Identification Using Cognitive Radio
Authors: O. S. Omorogiuwa, E. J. Omozusi
Abstract:
The monitoring of spectrum usage and signal identification, using cognitive radio, is done to identify frequencies that are vacant for reuse. It has been established that ‘internet of things’ device uses secondary frequency which is free, thereby facing the challenge of interference from other users, where some primary frequencies are not being utilised. The design was done by analysing a specific frequency spectrum, checking if all the frequency stations that range from 87.5-108 MHz are presently being used in Benin City, Edo State, Nigeria. From the results, it was noticed that by using Software Defined Radio/Simulink, we were able to identify vacant frequencies in the range of frequency under consideration. Also, we were able to use the significance of energy detection threshold to reuse this vacant frequency spectrum, when the cognitive radio displays a zero output (that is decision H0), meaning that the channel is unoccupied. Hence, the analysis was able to find the spectrum hole and identify how it can be reused.
Keywords: Spectrum, interference, telecommunication, cognitive radio, frequency.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 870847 Content Based Sampling over Transactional Data Streams
Authors: Mansour Tarafdar, Mohammad Saniee Abade
Abstract:
This paper investigates the problem of sampling from transactional data streams. We introduce CFISDS as a content based sampling algorithm that works on a landmark window model of data streams and preserve more informed sample in sample space. This algorithm that work based on closed frequent itemset mining tasks, first initiate a concept lattice using initial data, then update lattice structure using an incremental mechanism.Incremental mechanism insert, update and delete nodes in/from concept lattice in batch manner. Presented algorithm extracts the final samples on demand of user. Experimental results show the accuracy of CFISDS on synthetic and real datasets, despite on CFISDS algorithm is not faster than exist sampling algorithms such as Z and DSS.
Keywords: Sampling, data streams, closed frequent item set mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1710846 An Enhanced Slicing Algorithm Using Nearest Distance Analysis for Layer Manufacturing
Authors: M. Vatani, A. R. Rahimi, F. Brazandeh, A. Sanati nezhad
Abstract:
Although the STL (stereo lithography) file format is widely used as a de facto industry standard in the rapid prototyping industry due to its simplicity and ability to tessellation of almost all surfaces, but there are always some defects and shortcoming in their usage, which many of them are difficult to correct manually. In processing the complex models, size of the file and its defects grow extremely, therefore, correcting STL files become difficult. In this paper through optimizing the exiting algorithms, size of the files and memory usage of computers to process them will be reduced. In spite of type and extent of the errors in STL files, the tail-to-head searching method and analysis of the nearest distance between tails and heads techniques were used. As a result STL models sliced rapidly, and fully closed contours produced effectively and errorless.Keywords: Layer manufacturing, STL files, slicing algorithm, nearest distance analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4158845 Mining Implicit Knowledge to Predict Political Risk by Providing Novel Framework with Using Bayesian Network
Authors: Siavash Asadi Ghajarloo
Abstract:
Nowadays predicting political risk level of country has become a critical issue for investors who intend to achieve accurate information concerning stability of the business environments. Since, most of the times investors are layman and nonprofessional IT personnel; this paper aims to propose a framework named GECR in order to help nonexpert persons to discover political risk stability across time based on the political news and events. To achieve this goal, the Bayesian Networks approach was utilized for 186 political news of Pakistan as sample dataset. Bayesian Networks as an artificial intelligence approach has been employed in presented framework, since this is a powerful technique that can be applied to model uncertain domains. The results showed that our framework along with Bayesian Networks as decision support tool, predicted the political risk level with a high degree of accuracy.Keywords: Bayesian Networks, Data mining, GECRframework, Predicting political risk.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2175844 Appraisal of Methods for Identifying, Mapping, and Modelling of Fluvial Erosion in a Mining Environment
Authors: F. F. Howard, I. Yakubu, C. B. Boye, J. S. Y. Kuma
Abstract:
Natural and human activities, such as mining operations, expose the natural soil to adverse environmental conditions, leading to contamination of soil, groundwater, and surface water, which has negative effects on humans, flora, and fauna. Bare or partly exposed soil is most liable to fluvial erosion. This paper enumerates various methods used to identify, map, and model fluvial erosion in a mining environment. Classical, Artificial Intelligence (AI), and GIS methods have been reviewed. One of the many classical methods used to estimate river erosion is the Revised Universal Soil Loss Equation (RUSLE) model. The RUSLE model is easy to use. Its reliance on empirical relationships that may not always be applicable to specific circumstances or locations is a flaw. Other classical models for estimating fluvial erosion are the Soil and Water Assessment Tool (SWAT) and the Universal Soil Loss Equation (USLE). These models offer a more complete understanding of the underlying physical processes and encompass a wider range of situations. Although more difficult to utilise, they depend on the availability and dependability of input data for correctness. AI can help deal with multivariate and complex difficulties and predict soil loss with higher accuracy than traditional methods, and also be used to build unique models for identifying degraded areas. AI techniques have become popular as an alternative predictor for degraded environments. However, this research proposed a hybrid of classical, AI, and GIS methods for efficient and effective modelling of fluvial erosion.
Keywords: Fluvial erosion, classical methods, Artificial Intelligence, Geographic Information System.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 189843 The Use of Classifiers in Image Analysis of Oil Wells Profiling Process and the Automatic Identification of Events
Authors: Jaqueline M. R. Vieira
Abstract:
Different strategies and tools are available at the oil and gas industry for detecting and analyzing tension and possible fractures in borehole walls. Most of these techniques are based on manual observation of the captured borehole images. While this strategy may be possible and convenient with small images and few data, it may become difficult and suitable to errors when big databases of images must be treated. While the patterns may differ among the image area, depending on many characteristics (drilling strategy, rock components, rock strength, etc.). In this work we propose the inclusion of data-mining classification strategies in order to create a knowledge database of the segmented curves. These classifiers allow that, after some time using and manually pointing parts of borehole images that correspond to tension regions and breakout areas, the system will indicate and suggest automatically new candidate regions, with higher accuracy. We suggest the use of different classifiers methods, in order to achieve different knowledge dataset configurations.
Keywords: Brazil, classifiers, data-mining, Image Segmentation, oil well visualization, classifiers.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2544842 Q-Map: Clinical Concept Mining from Clinical Documents
Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala
Abstract:
Over the past decade, there has been a steep rise in the data-driven analysis in major areas of medicine, such as clinical decision support system, survival analysis, patient similarity analysis, image analytics etc. Most of the data in the field are well-structured and available in numerical or categorical formats which can be used for experiments directly. But on the opposite end of the spectrum, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature which can be found in the form of discharge summaries, clinical notes, procedural notes which are in human written narrative format and neither have any relational model nor any standard grammatical structure. An important step in the utilization of these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map in this paper, which is a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique which is based on a string matching algorithm that is indexed on curated knowledge sources, that is both fast and configurable. The authors also briefly examine its comparative performance with MetaMap, one of the most reputed tools for medical concepts retrieval and present the advantages the former displays over the latter.Keywords: Information retrieval (IR), unified medical language system (UMLS), Syntax Based Analysis, natural language processing (NLP), medical informatics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 780841 Iran’s Gas Flare Recovery Options Using MCDM
Authors: Halle Bakhteeyar, Azadeh Maroufmashat, Abbas Maleki, Sourena Sattari Khavas
Abstract:
In this paper, five options of Iran’s gas flare recovery have been compared via MCDM method. For developing the model, the weighing factor of each indicator an AHP method is used via the Expert-choice software. Several cases were considered in this analysis. They are defined where the priorities were defined always keeping one criterion in first position, while the priorities of the other criteria were defined by ordinal information defining the mutual relations of the criteria and the respective indicators. The results, show that amongst these cases, priority is obtained for CHP usage where availability indicator is highly weighted while the pipeline usage is obtained where environmental indicator highly weighted and the injection priority is obtained where economic indicator is highly weighted and also when the weighing factor of all the criteria are the same the Injection priority is obtained.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3449840 Genetic-based Anomaly Detection in Logs of Process Aware Systems
Authors: Hanieh Jalali, Ahmad Baraani
Abstract:
Nowaday-s, many organizations use systems that support business process as a whole or partially. However, in some application domains, like software development and health care processes, a normative Process Aware System (PAS) is not suitable, because a flexible support is needed to respond rapidly to new process models. On the other hand, a flexible Process Aware System may be vulnerable to undesirable and fraudulent executions, which imposes a tradeoff between flexibility and security. In order to make this tradeoff available, a genetic-based anomaly detection model for logs of Process Aware Systems is presented in this paper. The detection of an anomalous trace is based on discovering an appropriate process model by using genetic process mining and detecting traces that do not fit the appropriate model as anomalous trace; therefore, when used in PAS, this model is an automated solution that can support coexistence of flexibility and security.Keywords: Anomaly Detection, Genetic Algorithm, ProcessAware Systems, Process Mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1926839 Clustering Unstructured Text Documents Using Fading Function
Authors: Pallav Roxy, Durga Toshniwal
Abstract:
Clustering unstructured text documents is an important issue in data mining community and has a number of applications such as document archive filtering, document organization and topic detection and subject tracing. In the real world, some of the already clustered documents may not be of importance while new documents of more significance may evolve. Most of the work done so far in clustering unstructured text documents overlooks this aspect of clustering. This paper, addresses this issue by using the Fading Function. The unstructured text documents are clustered. And for each cluster a statistics structure called Cluster Profile (CP) is implemented. The cluster profile incorporates the Fading Function. This Fading Function keeps an account of the time-dependent importance of the cluster. The work proposes a novel algorithm Clustering n-ary Merge Algorithm (CnMA) for unstructured text documents, that uses Cluster Profile and Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.Keywords: Clustering, Text Mining, Unstructured TextDocuments, Fading Function.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1985838 Discovery of Time Series Event Patterns based on Time Constraints from Textual Data
Authors: Shigeaki Sakurai, Ken Ueno, Ryohei Orihara
Abstract:
This paper proposes a method that discovers time series event patterns from textual data with time information. The patterns are composed of sequences of events and each event is extracted from the textual data, where an event is characteristic content included in the textual data such as a company name, an action, and an impression of a customer. The method introduces 7 types of time constraints based on the analysis of the textual data. The method also evaluates these constraints when the frequency of a time series event pattern is calculated. We can flexibly define the time constraints for interesting combinations of events and can discover valid time series event patterns which satisfy these conditions. The paper applies the method to daily business reports collected by a sales force automation system and verifies its effectiveness through numerical experiments.
Keywords: Text mining, sequential mining, time constraints, daily business reports.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1488837 Knowledge Mining in Web-based Learning Environments
Authors: Nittaya Kerdprasop, Kittisak Kerdprasop
Abstract:
The state of the art in instructional design for computer-assisted learning has been strongly influenced by advances in information technology, Internet and Web-based systems. The emphasis of educational systems has shifted from training to learning. The course delivered has also been changed from large inflexible content to sequential small chunks of learning objects. The concepts of learning objects together with the advanced technologies of Web and communications support the reusability, interoperability, and accessibility design criteria currently exploited by most learning systems. These concepts enable just-in-time learning. We propose to extend theses design criteria further to include the learnability concept that will help adapting content to the needs of learners. The learnability concept offers a better personalization leading to the creation and delivery of course content more appropriate to performance and interest of each learner. In this paper we present a new framework of learning environments containing knowledge discovery as a tool to automatically learn patterns of learning behavior from learners' profiles and history.Keywords: Knowledge mining, Web-based learning, Learning environments.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1787836 Customers’ Priority to Implement SSTs Using AHP Analysis
Authors: Mohammad Jafariahangari, Marjan Habibi, Miresmaeil Mirnabibaboli, Mirza Hassan Hosseini
Abstract:
Self-service technologies (SSTs) make an important contribution to the daily life of people nowadays. However, the introduction of SST does not lead to its usage. Thereby, this paper was an attempt on discovery of the most preferred SST in the customers’ point of view. To fulfill this aim, the Analytical Hierarchy Process (AHP) was applied based on Saaty’s questionnaire which was administered to the customers of e-banking services located in Golestan providence, northern Iran. This study used qualitative factors in association with the intention of consumers’ usage of SSTs to rank three SSTs: ATM, mobile banking and internet banking. The results showed that mobile banking get the highest weight in consumers’ point of view. This research can be useful both for managers and service providers and also for customers who intend to use e-banking.
Keywords: Analytical Hierarchy Process, Decision-making, Ebanking, Iran, Self-service technologies.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2211835 Lipschitz Classifiers Ensembles: Usage for Classification of Target Events in C-OTDR Monitoring Systems
Authors: Andrey V. Timofeev
Abstract:
This paper introduces an original method for guaranteed estimation of the accuracy for an ensemble of Lipschitz classifiers. The solution was obtained as a finite closed set of alternative hypotheses, which contains an object of classification with probability of not less than the specified value. Thus, the classification is represented by a set of hypothetical classes. In this case, the smaller the cardinality of the discrete set of hypothetical classes is, the higher is the classification accuracy. Experiments have shown that if cardinality of the classifiers ensemble is increased then the cardinality of this set of hypothetical classes is reduced. The problem of the guaranteed estimation of the accuracy for an ensemble of Lipschitz classifiers is relevant in multichannel classification of target events in C-OTDR monitoring systems. Results of suggested approach practical usage to accuracy control in C-OTDR monitoring systems are present.
Keywords: Lipschitz classifiers, confidence set, C-OTDR monitoring, classifiers accuracy, classifiers ensemble.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1954834 A Hybrid Recommendation System Based On Association Rules
Authors: Ahmed Mohammed K. Alsalama
Abstract:
Recommendation systems are widely used in e-commerce applications. The engine of a current recommendation system recommends items to a particular user based on user preferences and previous high ratings. Various recommendation schemes such as collaborative filtering and content-based approaches are used to build a recommendation system. Most of current recommendation systems were developed to fit a certain domain such as books, articles, and movies. We propose1 a hybrid framework recommendation system to be applied on two dimensional spaces (User × Item) with a large number of Users and a small number of Items. Moreover, our proposed framework makes use of both favorite and non-favorite items of a particular user. The proposed framework is built upon the integration of association rules mining and the content-based approach. The results of experiments show that our proposed framework can provide accurate recommendations to users.
Keywords: Data Mining, Association Rules, Recommendation Systems, Hybrid Systems.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3990833 CoP-Networks: Virtual Spaces for New Faculty’s Professional Development in the 21st Higher Education
Authors: Eman AbuKhousa, Marwan Z. Bataineh
Abstract:
The 21st century higher education and globalization challenge new faculty members to build effective professional networks and partnership with industry in order to accelerate their growth and success. This creates the need for community of practice (CoP)-oriented development approaches that focus on cognitive apprenticeship while considering individual predisposition and future career needs. This work adopts data mining, clustering analysis, and social networking technologies to present the CoP-Network as a virtual space that connects together similar career-aspiration individuals who are socially influenced to join and engage in a process for domain-related knowledge and practice acquisitions. The CoP-Network model can be integrated into higher education to extend traditional graduate and professional development programs.Keywords: Clustering analysis, community of practice, data mining, higher education, new faculty challenges, social networks, social influence, professional development.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 974832 Anomaly Based On Frequent-Outlier for Outbreak Detection in Public Health Surveillance
Authors: Zalizah Awang Long, Abdul Razak Hamdan, Azuraliza Abu Bakar
Abstract:
Public health surveillance system focuses on outbreak detection and data sources used. Variation or aberration in the frequency distribution of health data, compared to historical data is often used to detect outbreaks. It is important that new techniques be developed to improve the detection rate, thereby reducing wastage of resources in public health. Thus, the objective is to developed technique by applying frequent mining and outlier mining techniques in outbreak detection. 14 datasets from the UCI were tested on the proposed technique. The performance of the effectiveness for each technique was measured by t-test. The overall performance shows that DTK can be used to detect outlier within frequent dataset. In conclusion the outbreak detection technique using anomaly-based on frequent-outlier technique can be used to identify the outlier within frequent dataset.
Keywords: Outlier detection, frequent-outlier, outbreak, anomaly, surveillance, public health
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2275831 Determination of the Bank's Customer Risk Profile: Data Mining Applications
Authors: Taner Ersoz, Filiz Ersoz, Seyma Ozbilge
Abstract:
In this study, the clients who applied to a bank branch for loan were analyzed through data mining. The study was composed of the information such as amounts of loans received by personal and SME clients working with the bank branch, installment numbers, number of delays in loan installments, payments available in other banks and number of banks to which they are in debt between 2010 and 2013. The client risk profile was examined through Classification and Regression Tree (CART) analysis, one of the decision tree classification methods. At the end of the study, 5 different types of customers have been determined on the decision tree. The classification of these types of customers has been created with the rating of those posing a risk for the bank branch and the customers have been classified according to the risk ratings.
Keywords: Client classification, loan suitability, risk rating, CART analysis, decision tree.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1076830 How to Win Passengers and Influence Motorists? Lessons Learned from a Comparative Study of Global Transit Systems
Authors: Oliver F. Shyr, Yu-Hsuan Hsiao, David E. Andersson
Abstract:
Due to the call of global warming effects, city planners aim at actions for reducing carbon emission. One of the approaches is to promote the usage of public transportation system toward the transit-oriented-development. For example, rapid transit system in Taipei city and Kaohsiung city are opening. However, until November 2008 the average daily patronage counted only 113,774 passengers at Kaohsiung MRT systems, much less than which was expected. Now the crucial questions: how the public transport competes with private transport? And more importantly, what factors would enhance the use of public transport? To give the answers to those questions, our study first applied regression to analyze the factors attracting people to use public transport around cities in the world. It is shown in our study that the number of MRT stations, city population, cost of living, transit fare, density, gasoline price, and scooter being a major mode of transport are the major factors. Subsequently, our study identified successful and unsuccessful cities in regard of the public transport usage based on the diagnosis of regression residuals. Finally, by comparing transportation strategies adopted by those successful cities, our conclusion stated that Kaohsiung City could apply strategies such as increasing parking fees, reducing parking spaces in downtown area, and reducing transfer time by providing more bus services and public bikes to promote the usage of public transport.
Keywords: Public Transit System, Comparative Study, Transport Demand Management, Regression
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2094829 Usage of Internet Technology in Financial Education and Financial Inclusion by Students of Economics Universities
Authors: B. Frączek
Abstract:
The paper analyses the usage of the Internet by university students in Visegrad Countries (4V Countries) who study economic fields in their formal and informal financial education and captures the areas of untapped potential of Internet in educational processes. Higher education and training, technological readiness, and the financial market development are in the group of pillars, that are key for efficiency driven economies. These three pillars have become an inspiration to the research on using the Internet in the financial education among economic university students as the group of the best educated people in finance. The financial education is a process that allows for improving the level of financial literacy. In turn, the financial literacy it is the set of financial knowledge, skills, awareness and patterns influencing the financial decisions. The level of financial literacy influences the level of financial well-being of individuals, determines the scale of saving of households and at the same time gives the greater chance for sustainable and more predictable development of the financial market with the positive impact on economy. The financial literacy is necessary for each group of society but its appropriate level is desirable especially in respect of economics students as future participants of financial markets as well as the experts and advisors in financial decision making. The low level of financial literacy is the great problem of many target groups in both developing and developed countries and the financial education is seen as the best way of improving this situation. Also the financial inclusion plays the special role in enhancing the level of financial literacy in the aspect of education by practice as well as due to interrelation between level of financial literacy and degree of financial inclusion. Despite many initiatives under financial education, the level of financial literacy is still very low. Scientists still search for new ways of solving this problem. One of the proposal is more effective usage of the new technology in financial education, especially the Internet, because of the growing popularity of e-learning and the increasing number of Internet users, especially among young people who are called the Generation Net. Due to special role of the university students studying the economics fields for the future financial markets, students of four universities from Visegrad Countries (Czech Republic, Hungary, Poland and Slovakia) were invited to participate in the survey. The aim of the article is to present the level and ways of using the Internet technology in financial education and indicating the so far unused or underused opportunities.
Keywords: Financial education, financial inclusion, financial literacy, usage of Internet in education.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1543828 Improving University Operations with Data Mining: Predicting Student Performance
Authors: Mladen Dragičević, Mirjana Pejić Bach, Vanja Šimičević
Abstract:
The purpose of this paper is to develop models that would enable predicting student success. These models could improve allocation of students among colleges and optimize the newly introduced model of government subsidies for higher education. For the purpose of collecting data, an anonymous survey was carried out in the last year of undergraduate degree student population using random sampling method. Decision trees were created of which two have been chosen that were most successful in predicting student success based on two criteria: Grade Point Average (GPA) and time that a student needs to finish the undergraduate program (time-to-degree). Decision trees have been shown as a good method of classification student success and they could be even more improved by increasing survey sample and developing specialized decision trees for each type of college. These types of methods have a big potential for use in decision support systems.
Keywords: Data mining, knowledge discovery in databases, prediction models, student success.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2541827 Data Mining on the Router Logs for Statistical Application Classification
Authors: M. Rahmati, S.M. Mirzababaei
Abstract:
With the advance of information technology in the new era the applications of Internet to access data resources has steadily increased and huge amount of data have become accessible in various forms. Obviously, the network providers and agencies, look after to prevent electronic attacks that may be harmful or may be related to terrorist applications. Thus, these have facilitated the authorities to under take a variety of methods to protect the special regions from harmful data. One of the most important approaches is to use firewall in the network facilities. The main objectives of firewalls are to stop the transfer of suspicious packets in several ways. However because of its blind packet stopping, high process power requirements and expensive prices some of the providers are reluctant to use the firewall. In this paper we proposed a method to find a discriminate function to distinguish between usual packets and harmful ones by the statistical processing on the network router logs. By discriminating these data, an administrator may take an approach action against the user. This method is very fast and can be used simply in adjacent with the Internet routers.Keywords: Data Mining, Firewall, Optimization, Packetclassification, Statistical Pattern Recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1659