Search results for: Distributed Data Mining
7403 A Case-Based Reasoning-Decision Tree Hybrid System for Stock Selection
Authors: Yaojun Wang, Yaoqing Wang
Abstract:
Stock selection is an important decision-making problem. Many machine learning and data mining technologies are employed to build automatic stock-selection systems. A profitable stock-selection system should consider both the stock’s investment value and the market timing. In this paper, we present a hybrid system that takes both into account for stock selection. The system uses a case-based reasoning (CBR) model to perform stock classification and a decision-tree model to help with market timing and stock selection. The experiments show that this hybrid system outperforms other techniques with respect to classification accuracy, average return and Sharpe ratio.
Keywords: Case-based reasoning, decision tree, stock selection, machine learning.
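The CBR component is described only at a high level. The sketch below shows one way to do case retrieval. It assumes each past case is a numeric feature vector paired with a class label; the two features shown are hypothetical, normalized scores. The k most similar cases under Euclidean distance then vote on the class of a new stock. The paper's actual similarity measure and case features are not given here, and the decision-tree timing component is not shown.

```python
import math
from collections import Counter


def retrieve(case_base, query, k=3):
    """Retrieve step: the k stored cases closest to the query (Euclidean distance)."""
    return sorted(case_base, key=lambda case: math.dist(case[0], query))[:k]


def classify(case_base, query, k=3):
    """Reuse step: majority vote over the labels of the retrieved cases."""
    votes = Counter(label for _, label in retrieve(case_base, query, k))
    return votes.most_common(1)[0][0]


# Hypothetical case base: (normalized feature vector, label).
cases = [
    ((0.10, 0.20), "hold"),
    ((0.15, 0.25), "hold"),
    ((0.90, 0.80), "buy"),
    ((0.85, 0.90), "buy"),
    ((0.80, 0.85), "buy"),
]
print(classify(cases, (0.88, 0.86)))  # buy
```

Features are assumed to be pre-scaled to comparable ranges. Without that, one large-valued ratio would dominate the distance.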
7402 A Methodology for Automatic Diversification of Document Categories
Authors: Dasom Kim, Chen Liu, Myungsu Lim, Soo-Hyeon Jeon, Byeoung Kug Jeon, Kee-Young Kwahk, Namgyu Kim
Abstract:
Recently, numerous documents containing large volumes of unstructured data and text have been created because of the rapid increase in the use of social media and the Internet. These documents are usually categorized for the convenience of users. However, the accuracy of manual categorization is not guaranteed, and such categorization requires a large amount of time and incurs huge costs. Many studies on automatic categorization have therefore been conducted to help mitigate the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to complex documents with multiple topics, because they assume that each document can be assigned to a single category only. To overcome this limitation, some studies have attempted to assign each document to multiple categories. However, the learning process in these studies requires training on a multi-categorized document set. These methods therefore cannot be applied to the multi-categorization of most documents unless multi-categorized training sets, produced with traditional multi-categorization algorithms, are provided. To overcome this limitation, in this study we review our novel methodology for extending the category of a single-categorized document to multiple categories. We then introduce a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.
Keywords: Big Data Analysis, Document Classification, Text Mining, Topic Analysis.
7401 A Study on the Nostalgia Contents Analysis of Hometown Alumni in the Online Community
Authors: Heejin Yun, Juanjuan Zang
Abstract:
This study aims to analyze the terms in texts posted to an online community of people from the same hometown, and to understand the topics and trends of nostalgia expressed online. For this purpose, the study collected 144 posts that natives of Yeongjong Island, Incheon, South Korea, had written in an online community, and analyzed the association relations among their terms. First, the online community texts show that simply defining nostalgia as ‘a mind longing for hometown’ is not a sufficient explanation. Second, texts composed online tend to be abstract rather than recounting individuals’ personal stories. Through a literature review and association rules concerning nostalgia, this study identified the most critical and closest mutual associations among the terms that constitute nostalgia. The results summarize the core terms and emotions related to nostalgia.
Keywords: Nostalgia, cultural memory, data mining, online community.
7400 Adaptive Shape Parameter (ASP) Technique for Local Radial Basis Functions (RBFs) and Their Application for Solution of Navier Strokes Equations
Authors: A. Javed, K. Djidjeli, J. T. Xing
Abstract:
The concept of adaptive shape parameters (ASP) is presented for the solution of the incompressible Navier-Stokes equations using mesh-free local radial basis functions (RBFs). The aim is to avoid ill-conditioning of the coefficient matrices of RBF weights, and inaccuracies in RBF interpolation, that result from non-optimized basis-function shapes when data points (or nodes) are not distributed uniformly throughout the domain. Conventional approaches assume globally similar values of the RBF shape parameter. The ASP technique instead calculates the shape parameter individually for each data point (or node), based on the distribution of data points within its own influence domain. This is intended to ensure interpolation accuracy while maintaining a well-conditioned system of equations for the RBF weights. The performance and accuracy of the ASP technique have been tested by evaluating the derivatives and Laplacian of a known function using RBFs in finite-difference mode (RBF-FD), with and without adaptive shape parameters. ASP is then applied to the incompressible Navier-Stokes equations by solving the lid-driven cavity flow problem on a mesh-free domain using RBF-FD, and the results are compared for fixed and adaptive shape parameters. Using ASP in RBF-FD improved accuracy, particularly in regions with larger gradients of the field variables.
Keywords: CFD, Meshless Particle Method, Radial Basis Functions, Shape Parameters
7399 Overcoming the Obstacles to Green Campus Implementation in Indonesia
Authors: Mia Wimala, Emma Akmalah, Ira Irawati, M. Rangga Sururi
Abstract:
One approach currently being pursued aggressively to create a sustainable environment is the green building concept. To ensure its successful implementation, support and initiative from educational institutions, especially higher education institutions, are indispensable. This research was conducted to identify the obstacles hindering green campus implementation in Indonesia and to propose strategies to overcome them. The data presented in this paper are mainly derived from interviews and from questionnaires distributed randomly to staff and students at 10 (ten) major institutions in the Jakarta and West Java area. The data were further analyzed using ANOVA and SWOT analysis. The 182 respondents identified the following obstacles to the green campus movement in Indonesia:
- resistance to change;
- inadequate knowledge, information and understanding;
- no penalty for environmental violations;
- lack of reward for green campus practices;
- lack of stringent regulations/laws;
- lack of management commitment;
- insufficient funds.

In addition, of the 6 criteria considered in the UI GreenMetric World Ranking, education was the only criterion that showed no significant difference between public and private universities in green campus performance. The work concludes with recommended strategies to improve green campus implementation in the future.
Keywords: Green campus, obstacles, sustainable, higher education institutions.
7398 Fuzzy Clustering Analysis in Real Estate Companies in China
Authors: Jianfeng Li, Feng Jin, Xiaoyu Yang
Abstract:
This paper applies a fuzzy clustering algorithm to classify real estate companies in China according to general financial indexes such as income per share, share accumulation fund, net profit margin, weighted net assets yield and shareholders' equity. The analysis proceeds in three steps: constructing and normalizing the initial partition matrix, obtaining a fuzzy similarity matrix with the Minkowski metric, and computing its transitive closure. The resulting dynamic fuzzy clustering clearly shows that the clustering results change gradually as the threshold decreases. It also shows a similar relationship with the stock market prices of those companies. The approach is therefore valuable for comparing the financial condition of real estate companies in order to identify good investment opportunities.
Keywords: Fuzzy clustering algorithm, data mining, real estate company, financial analysis.
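The pipeline described above can be sketched as follows:
1. Normalize the indexes.
2. Build a fuzzy similarity matrix from a Minkowski distance.
3. Take the max-min transitive closure.
4. Cut it at a threshold λ.

This is a minimal illustration, not the authors' code. Min-max normalization, the similarity transform r_ij = 1 - d_ij / max(d) and the sample index values are all assumptions. Lowering λ merges clusters step by step, which is the gradual change the abstract describes.

```python
def normalize(rows):
    """Min-max normalize each column (financial index) to [0, 1]."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi)]
            for row in rows]


def minkowski(a, b, p=2):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def similarity_matrix(rows, p=2):
    """Fuzzy similarity r_ij = 1 - d_ij / max(d): reflexive and symmetric."""
    n = len(rows)
    d = [[minkowski(rows[i], rows[j], p) for j in range(n)] for i in range(n)]
    d_max = max(max(row) for row in d) or 1.0
    return [[1.0 - d[i][j] / d_max for j in range(n)] for i in range(n)]


def transitive_closure(r):
    """Repeat the max-min self-composition R := R o R until R stops changing."""
    n = len(r)
    while True:
        r2 = [[max(min(r[i][k], r[k][j]) for k in range(n)) for j in range(n)]
              for i in range(n)]
        if r2 == r:
            return r
        r = r2


def lambda_cut(r, lam):
    """Group objects whose closure similarity is >= lam (an equivalence relation)."""
    clusters, assigned = [], set()
    for i in range(len(r)):
        if i not in assigned:
            members = [j for j in range(len(r)) if r[i][j] >= lam]
            assigned.update(members)
            clusters.append(members)
    return clusters


companies = [[1.0, 1.0], [1.1, 1.0], [5.0, 5.0], [5.0, 5.2]]  # hypothetical indexes
closure = transitive_closure(similarity_matrix(normalize(companies)))
for lam in (1.0, 0.9, 0.0):
    print(lam, lambda_cut(closure, lam))
# 1.0 [[0], [1], [2], [3]]
# 0.9 [[0, 1], [2, 3]]
# 0.0 [[0, 1, 2, 3]]
```

Because the closure of a reflexive, symmetric fuzzy relation is a fuzzy equivalence, each λ-cut is a crisp equivalence relation. The simple grouping loop in `lambda_cut` is valid only for that reason.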
7397 Recommender Systems Using Ensemble Techniques
Authors: Yeonjeong Lee, Kyoung-jae Kim, Youngtae Kim
Abstract:
This study proposes a novel recommender system that uses data mining and multi-model ensemble techniques to improve recommendation performance by reflecting users' preferences more precisely. The proposed model consists of two steps. In the first step, the study uses logistic regression, decision trees and artificial neural networks to predict which customers are highly likely to purchase products in each product group. It then combines the results of the predictors using multi-model ensemble techniques such as bagging and bumping. In the second step, it uses market basket analysis to extract association rules for co-purchased products. Through these two steps, the system selects customers who are highly likely to purchase products in each product group and recommends suitable products to them from the same or different product groups. We test the usability of the proposed system using a prototype with real-world transaction and profile data. We also survey user satisfaction with the product lists recommended by the proposed system and with randomly selected product lists. The results suggest that the proposed system may be useful in real-world online shopping stores.
Keywords: Product recommender system, Ensemble technique, Association rules, Decision tree, Artificial neural networks.
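The market basket step can be illustrated with a minimal sketch. It is restricted to single-item rules ("customers who bought A also bought B") and assumes baskets are lists of product names. A full Apriori-style miner would also handle larger itemsets, and the thresholds shown are illustrative only. The ensemble scoring step is not shown.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support, min_confidence):
    """Mine one-item => one-item association rules from co-purchase baskets."""
    n = len(baskets)
    item_counts, pair_counts = Counter(), Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


baskets = [["bread", "milk"], ["bread", "butter"],
           ["bread", "milk", "butter"], ["milk"]]
for rule in pair_rules(baskets, min_support=0.5, min_confidence=0.7):
    print(rule)  # ('butter', 'bread', 0.5, 1.0)
```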
7396 Imputation Technique for Feature Selection in Microarray Data Set
Authors: Younies Mahmoud, Mai Mabrouk, Elsayed Sallam
Abstract:
Analyzing DNA microarray data sets is a great challenge for bioinformaticians because of the complexity of applying statistical and machine learning techniques. The challenge is compounded when the data sets contain missing values. This happens regularly, and many of these techniques cannot handle missing data. One of the most important data analysis processes on a microarray data set is feature selection, which finds the genes that most affect a certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.
Keywords: DNA microarray, feature selection, missing data, bioinformatics.
7395 Models of State Organization and Influence over Collective Identity and Nationalism in Spain
Authors: Victor Manuel Muñoz-Sanchez, Antonio Manuel Perez-Flores
Abstract:
The main objective of this paper is to establish the relationship between models of state organization and the various types of collective identity expressed by Spaniards. Nationalism and identity ascription have always been topics of special importance in Spain. Some of its territories have populations whose nationalist sentiments differ greatly from those in the rest of the country. Catalonia's current sovereignty challenge to the central government exemplifies the importance of the subject. To analyze this interrelation, we mine secondary data using multiple correspondence analysis (MCA). The main result is a typology of four types of collective identity expression based on models of state organization, which are connected with the political parties' positions on this issue.
Keywords: Models of organization of the state, nationalism, collective identity, Spain, political parties.
7394 Automatic Real-Patient Medical Data De-Identification for Research Purposes
Authors: Petr Vcelak, Jana Kleckova
Abstract:
Our medicine-oriented research is based on a medical data set of real patients. Sharing patients' private data with anyone other than clinicians or hospital staff is a security problem, so personal identification information must be removed from the medical data. After a de-identification process, the medical data, stripped of private information, are available for any research purpose. In this paper, we introduce a universal, automatic, rule-based de-identification application that handles all of this for heterogeneous medical data. A patient's private identification is replaced by a unique identification number, even in burned-in annotations in the pixel data. The same identifier is used for all of a patient's medical data, so the relationships in the data are preserved. The hospital can then benefit from research feedback based on the results.
Keywords: DASTA, De-identification, DICOM, Health Level Seven, Medical data, OCR, Personal data
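The key property described above is that one patient always receives the same replacement identifier, so links between records survive de-identification. A minimal rule-based sketch of that idea over simple dictionary records follows. The field names (`patient_id`, `name`, `birth_date`) and the `P000001` pseudonym format are hypothetical. The sketch does not cover DICOM headers, HL7/DASTA messages or OCR of burned-in pixel annotations, which the paper's application handles.

```python
import re


class Pseudonymizer:
    """Rule-based de-identification sketch with stable per-patient pseudonyms."""

    def __init__(self, identifying_fields=("name", "birth_date")):
        self.identifying_fields = identifying_fields  # hypothetical field names
        self._ids = {}  # real patient ID -> pseudonym

    def pseudonym(self, patient_id):
        # The same real ID always maps to the same pseudonym, which keeps
        # the links between one patient's records intact.
        if patient_id not in self._ids:
            self._ids[patient_id] = f"P{len(self._ids) + 1:06d}"
        return self._ids[patient_id]

    def deidentify(self, record):
        """Drop direct identifiers and mask them inside free-text fields.
        Assumes a non-empty patient_id."""
        pid = self.pseudonym(record["patient_id"])
        identifiers = [record["patient_id"]] + [
            str(record[f]) for f in self.identifying_fields if f in record]
        # Longest values first, so a full identifier wins over a shorter substring.
        pattern = re.compile("|".join(
            re.escape(s) for s in sorted(identifiers, key=len, reverse=True) if s))
        out = {"patient_id": pid}
        for key, value in record.items():
            if key == "patient_id" or key in self.identifying_fields:
                continue  # direct identifiers are not copied
            out[key] = pattern.sub(pid, value) if isinstance(value, str) else value
        return out


p = Pseudonymizer()
print(p.deidentify({"patient_id": "8001011234", "name": "Jan Novak",
                    "note": "Jan Novak (8001011234): CT follow-up"}))
# {'patient_id': 'P000001', 'note': 'P000001 (P000001): CT follow-up'}
```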
7393 Distributed Cost-Based Scheduling in Cloud Computing Environment
Authors: Rupali, Anil Kumar Jaiswal
Abstract:
Cloud computing is one of the prominent technologies that let users change, configure and access services online. It is a computing paradigm that helps users save cost and time, and it is used in practice in fields such as education, health and banking. Because cloud computing depends on the Internet, Cloud Service Providers (CSPs) bear the major responsibility for the data that users store in data centers. Scheduling plays a vital role in the cloud computing environment: to achieve maximum utilization and user satisfaction, cloud providers need to schedule resources effectively. This work analyzes job scheduling for cloud computing, using CloudSim 3.0.3 to simulate task computation and distributed scheduling methods. It discusses job scheduling for a distributed processing environment and finds that the approach works with minimum time and lower cost. Two load balancing techniques are employed, ‘Throttled stack adjustment policy’ and ‘Active VM load balancing policy’. They are combined with two brokerage services, ‘Advanced Response Time’ and ‘Reconfigure Dynamically’, to evaluate VM_Cost, DC_Cost, response time and data processing time. The proposed techniques are compared with the Round Robin scheduling policy.
Keywords: Physical machines, virtual machines, support for repetition, self-healing, highly scalable programming model.
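The contrast between a throttled policy and the Round Robin baseline can be sketched in plain Python, outside any simulator. This is not the CloudSim or CloudAnalyst API. The per-VM limit of one concurrent request is an assumption. The ‘Active VM load balancing’ policy and the brokerage services are not modeled.

```python
class RoundRobinBalancer:
    """Baseline: hand out VMs in a fixed cyclic order, ignoring their load."""

    def __init__(self, num_vms):
        self.num_vms = num_vms
        self._next = 0

    def allocate(self):
        vm = self._next
        self._next = (self._next + 1) % self.num_vms
        return vm

    def release(self, vm):
        pass  # round robin keeps no load state


class ThrottledBalancer:
    """Throttled-style policy: a VM accepts at most `limit` concurrent requests."""

    def __init__(self, num_vms, limit=1):
        self.active = [0] * num_vms
        self.limit = limit

    def allocate(self):
        for vm, load in enumerate(self.active):
            if load < self.limit:
                self.active[vm] += 1
                return vm
        return None  # every VM is saturated: the caller must queue the request

    def release(self, vm):
        self.active[vm] -= 1


rr = RoundRobinBalancer(3)
print([rr.allocate() for _ in range(5)])  # [0, 1, 2, 0, 1]
th = ThrottledBalancer(2)
print([th.allocate() for _ in range(3)])  # [0, 1, None]
```

Round robin keeps assigning work to a VM even when it is still busy. The throttled policy instead refuses new work to saturated VMs and signals that the request must wait.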
7392 Fuzzy Wavelet Packet based Feature Extraction Method for Multifunction Myoelectric Control
Authors: Rami N. Khushaba, Adel Al-Jumaily
Abstract:
The myoelectric signal (MES) is one of the biosignals used to help humans control equipment. Recent approaches to MES classification for controlling prosthetic devices with pattern recognition techniques have revealed two problems:
1. Classification performance degrades as the number of motion classes to be classified increases.
2. The additional, more complicated methods used to solve the first problem increase the computational cost of a multifunction myoelectric control system.

To address these problems and achieve a design feasible for real-time implementation with high overall accuracy, this paper presents a new feature extraction method for MES recognition systems. The method extracts features by applying the Wavelet Packet Transform (WPT) to the MES from multiple channels. It then uses the fuzzy c-means (FCM) algorithm to generate a measure of each feature's suitability for classification. Finally, Principal Component Analysis (PCA) reduces the size of the data before the classification accuracy is computed with a multilayer perceptron neural network. The proposed system achieves high classification accuracy (99%) using only a small portion of the original feature set.
Keywords: Biomedical Signal Processing, Data Mining and Information Extraction, Machine Learning, Rehabilitation.
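The first stage, WPT feature extraction, can be sketched with an orthonormal Haar wavelet, taking the energy of each terminal packet node as a feature. Both choices are assumptions: the paper's mother wavelet, decomposition depth and exact features are not stated here. The FCM-based feature ranking and the PCA reduction are not shown. Unlike the ordinary wavelet transform, the packet transform splits the detail branch as well as the approximation at every level.

```python
import math


def haar_step(signal):
    """One orthonormal Haar analysis step -> (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    return ([(signal[i] + signal[i + 1]) * s for i in pairs],
            [(signal[i] - signal[i + 1]) * s for i in pairs])


def wavelet_packet(signal, level):
    """Full packet tree: split every node (not only the approximation) each level."""
    nodes = [list(signal)]
    for _ in range(level):
        nodes = [half for node in nodes for half in haar_step(node)]
    return nodes  # 2**level terminal nodes, in natural (not frequency) order


def packet_energy_features(channels, level):
    """Concatenate the terminal-node energies of all MES channels."""
    return [sum(x * x for x in node)
            for channel in channels for node in wavelet_packet(channel, level)]


print(packet_energy_features([[1.0] * 8], level=2))
```

Because the Haar filters are orthonormal, the node energies sum to the signal energy when the signal length is a multiple of 2**level. A constant signal places all its energy in the first node.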
7391 Improvement of Voltage Profile of Grid Integrated Wind Distributed Generation by SVC
Authors: Fariba Shavakhi Zavareh, Hadi Fotoohabadi, Reza Sedaghati
Abstract:
Because load demand keeps increasing, large, complex, interconnected power systems face growing voltage stability problems. Identifying weaker buses and improving the voltage profile and power losses have become major concerns. The objective of this paper is to review the impact of Flexible AC Transmission System (FACTS) controllers on maintaining voltage stability in electrical networks with connected wind generators. Wind energy is a growing renewable energy source because of its several advantages. However, the influence of wind generators on power quality is a significant issue: non-uniform power production causes variations in system voltage and frequency. Wind farms therefore require substantial reactive power compensation, and advances in high-power semiconductor devices have led to the development of FACTS. FACTS devices such as the static VAR compensator (SVC) inject reactive power into the system, which helps maintain a better voltage profile. Performance is evaluated on the IEEE 14-bus system. Two wind generators are connected at low-voltage buses to meet the increased load demand, and SVC devices are installed at the wind generator buses to maintain voltage stability. The power flows and the nodal voltage magnitudes and angles of the network are obtained by iterative solution using MIPOWER.
Keywords: Voltage Profile, FACTS Device, SVC, Distributed Generation.
7390 Analyzing Multi-Labeled Data Based on the Roll of a Concept against a Semantic Range
Authors: Masahiro Kuzunishi, Tetsuya Furukawa, Ke Lu
Abstract:
Classifying data hierarchically is an efficient approach to analyzing data. Data is usually classified into multiple categories, or annotated with a set of labels. To analyze multi-labeled data, the data must be specified by giving a set of labels as a semantic range. Data may be analyzed for several specific purposes. This paper shows which multi-labeled data should be targeted for analysis for each of those purposes. It also discusses the role of a label against a set of labels by investigating the change that occurs when a label is added to the set. These discussions yield methods for the advanced analysis of multi-labeled data, based on the role of a label against a semantic range.
Keywords: Classification Hierarchies, Data Analysis, Multilabeled Data, Orders of Sets of Labels
7389 Fuzzy Relatives of the CLARANS Algorithm With Application to Text Clustering
Authors: Mohamed A. Mahfouz, M. A. Ismail
Abstract:
This paper introduces two new algorithms for fuzzy clustering of relational data: FCLARANS, a fuzzy relative of the CLARANS algorithm, and FCMRANS, fuzzy c-medoids based on randomized search. In the existing fuzzy c-medoids algorithm (FCMdd), each cluster's within-cluster dissimilarity is minimized in each iteration by recomputing new medoids from the current memberships. FCLARANS minimizes the same objective function differently: it changes the current medoids in a way that minimizes the sum of the within-cluster dissimilarities. Computing new medoids can be affected by noise, because outliers may take part in the computation. In FCLARANS, by contrast, the choice of medoids is dictated by where most of a cluster's points lie, so it is less sensitive to outliers. In FCMRANS, the medoid-computation step of FCMdd is modified to use randomized search. A new initialization procedure is also developed that adds randomness to the initialization used with FCMdd. Both FCLARANS and FCMRANS are compared with the robust and linearized version of fuzzy c-medoids (RFCMdd). Experiments were run on different samples of Reuters-21578 and Newsgroups (20NG), and on generated datasets with noise. They show that FCLARANS is more robust than both RFCMdd and FCMRANS. Finally, FCMRANS and FCLARANS are both more efficient than RFCMdd, and their classification rates are almost the same.
Keywords: Data Mining, Fuzzy Clustering, Relational Clustering, Medoid-Based Clustering, Cluster Analysis, Unsupervised Learning.
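The shared ingredients can be sketched from a relational dissimilarity matrix: FCMdd-style fuzzy memberships, the objective J = Σ_j Σ_i u_ij^m r(x_j, v_i), and a CLARANS-style search that tries random single-medoid swaps and keeps only improvements. This is a simplified illustration, not the authors' FCLARANS or FCMRANS. The fuzzifier m = 2, the `max_neighbor` stopping rule and the single random start are assumptions; CLARANS proper also restarts from several random starts.

```python
import random


def memberships(dist, medoids, m=2.0):
    """FCMdd-style memberships u[j][i] of object j in the cluster of medoids[i]."""
    u = []
    for j in range(len(dist)):
        d = [dist[j][v] for v in medoids]
        if 0.0 in d:  # object sits on a medoid: give it crisp membership there
            k = d.index(0.0)
            u.append([1.0 if i == k else 0.0 for i in range(len(medoids))])
            continue
        w = [(1.0 / x) ** (1.0 / (m - 1.0)) for x in d]
        total = sum(w)
        u.append([x / total for x in w])
    return u


def objective(dist, medoids, m=2.0):
    """J = sum_j sum_i u_ij^m * r(x_j, v_i), the objective FCMdd minimizes."""
    u = memberships(dist, medoids, m)
    return sum(u[j][i] ** m * dist[j][v]
               for j in range(len(dist)) for i, v in enumerate(medoids))


def randomized_medoid_search(dist, c, m=2.0, max_neighbor=50, seed=0):
    """Try random single-medoid swaps, keep improvements, and stop after
    max_neighbor consecutive failures. Requires c < number of objects."""
    rng = random.Random(seed)
    n = len(dist)
    medoids = rng.sample(range(n), c)
    best = objective(dist, medoids, m)
    failures = 0
    while failures < max_neighbor:
        i = rng.randrange(c)
        candidate = rng.choice([x for x in range(n) if x not in medoids])
        trial = medoids[:i] + [candidate] + medoids[i + 1:]
        cost = objective(dist, trial, m)
        if cost < best:
            medoids, best, failures = trial, cost, 0
        else:
            failures += 1
    return sorted(medoids), best


points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]  # two well-separated 1-D groups
dist = [[abs(a - b) for b in points] for a in points]
print(randomized_medoid_search(dist, c=2))
```

Medoids are always actual objects, so the search needs only pairwise dissimilarities. That is what makes this family suitable for relational data such as document-similarity matrices.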
7388 Vibration Control of Building Using Multiple Tuned Mass Dampers Considering Real Earthquake Time History
Authors: Rama Debbarma, Debanjan Das
Abstract:
This paper investigates how well multiple tuned mass dampers (MTMDs) mitigate the seismic vibration of structures, using real earthquake time-history data. Three real earthquake records are used: Kobe, Imperial Valley and Mammoth Lake. The MTMDs are distributed over every storey. For comparison, a single tuned mass damper (STMD) is installed at the top of the same structure. The study is conducted for a fixed mass ratio (5%) and a fixed damping ratio (5%). A numerical study evaluates the effectiveness of the MTMDs and the overall system performance. The displacement, acceleration, base shear and storey drift are obtained under all earthquakes for both combined systems (structure with MTMDs and structure with STMD). The same responses are also obtained for the structure without dampers. The results indicate that the MTMD configuration controls the seismic response of the primary system more effectively than the STMD configuration.
Keywords: Earthquake, multiple tuned mass dampers, single tuned mass damper, time history.
7387 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling
Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal
Abstract:
Datasets, or collections, are becoming important assets in their own right and can now be accepted as a primary intellectual output of research. The quality and usefulness of a dataset depend mainly on the context in which it was collected, processed, analyzed, validated and interpreted. This paper presents a collection of program educational objectives mapped to student outcomes, taken from self-study reports prepared by 32 engineering programs accredited by ABET. Manual mapping (classification) of this data is a notoriously tedious, time-consuming process that also requires experts in the area, who are mostly unavailable. The operational settings under which the collection was produced are described. The collection has been cleansed and preprocessed, some features have been selected, and a preliminary exploratory data analysis has been performed to illustrate its properties and usefulness. Finally, the collection has been benchmarked using nine of the most widely used supervised multiclass classification techniques:
- Binary Relevance;
- Label Powerset;
- Classifier Chains;
- Pruned Sets;
- Random k-label sets;
- Ensemble of Classifier Chains;
- Ensemble of Pruned Sets;
- Multi-Label k-Nearest Neighbors;
- Back-Propagation Multi-Label Learning.

The techniques have been compared with each other using five well-known measurements (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). Ensemble of Classifier Chains and Ensemble of Pruned Sets achieved encouraging performance compared with the other multi-label classification methods tested, while Classifier Chains showed the worst performance. In summary, the benchmark achieved promising results by building on the preliminary exploratory data analysis of the collection. It also suggests new research directions and provides a baseline for future studies.
Keywords: Benchmark collection, program educational objectives, student outcomes, ABET, Accreditation, machine learning, supervised multiclass classification, text mining.
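The multi-label measures named above can be made concrete with a short sketch over binary label-indicator matrices. Each row is one objective and each column one outcome, with 1 meaning "mapped". One assumption is made: "Accuracy" is taken to be the example-based Jaccard accuracy that is common in the multi-label literature, and the paper may define it differently.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (example, label) slots predicted wrongly."""
    wrong = sum(t != p for yt, yp in zip(y_true, y_pred) for t, p in zip(yt, yp))
    return wrong / (len(y_true) * len(y_true[0]))


def example_accuracy(y_true, y_pred):
    """Jaccard accuracy |true & pred| / |true | pred|, averaged over examples."""
    scores = []
    for yt, yp in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(yt, yp) if t and p)
        union = sum(1 for t, p in zip(yt, yp) if t or p)
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)


def _per_label_counts(y_true, y_pred):
    counts = []
    for j in range(len(y_true[0])):
        pairs = [(yt[j], yp[j]) for yt, yp in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t and p)
        fp = sum(1 for t, p in pairs if p and not t)
        fn = sum(1 for t, p in pairs if t and not p)
        counts.append((tp, fp, fn))
    return counts


def _f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true, y_pred):
    """Pool TP/FP/FN over all labels, then compute a single F1."""
    counts = _per_label_counts(y_true, y_pred)
    return _f1(*(sum(c[k] for c in counts) for k in range(3)))


def macro_f1(y_true, y_pred):
    """Compute F1 per label, then average the labels equally."""
    counts = _per_label_counts(y_true, y_pred)
    return sum(_f1(*c) for c in counts) / len(counts)


y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
print(hamming_loss(y_true, y_pred))  # 0.3333333333333333
```

Micro-F is dominated by frequent labels, whereas Macro-F weights rare outcomes as heavily as common ones. Reporting both is therefore informative on an imbalanced mapping dataset.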
7386 Designing of Virtual Laboratories Based on Extended Event Driving Simulation Method
Abstract:
Because of their special features, there are many methods for designing and implementing virtual laboratories. The best-known architectural designs are event-based, and this architectural model is very efficient for virtual laboratories implemented on a local network. Later, service-oriented architecture gave them remote access capability, and peer-to-peer architecture was adopted to exchange data with higher quality and speed. Other methods, such as agent-based architecting, try to solve the problems of distributed processing in a complex laboratory system. This study first reviews the general principles of designing a virtual laboratory. It then compares methods based on EDA, SOA and agent-based architecting to present the weaknesses and strengths of each. Finally, we make the best design choice based on existing conditions and requirements.
Keywords: Virtual Laboratory, Software Engineering, Simulation, EDA, SOA, Agent-Based Architecting.
7385 Learning Classifier Systems Approach for Automated Discovery of Crisp and Fuzzy Hierarchical Production Rules
Authors: Suraiya Jabin, Kamal K. Bharadwaj
Abstract:
This research presents a post-processing system that takes mined flat rules as input and discovers both crisp and fuzzy hierarchical structures using a Learning Classifier System approach. A Learning Classifier System (LCS) is a machine learning technique that combines evolutionary computing, reinforcement learning, supervised or unsupervised learning, and heuristics to produce adaptive systems. An LCS learns by interacting with an environment that gives it feedback as a numerical reward, and learning consists of trying to maximize the reward received. A crisp description of a concept usually cannot represent human knowledge completely and practically. In the proposed LCS, the initial population is a random collection of HPR-trees (related production rules), from which crisp and fuzzy hierarchies are evolved. A fuzzy subsumption relation is suggested for the system, and a fitness function based on the Subsumption Matrix (SM) is proposed. Genetic operators suited to the chosen chromosome representation are proposed. A reward-and-punishment scheme is also proposed to implement reinforcement. Experimental results demonstrate the performance of the proposed system.
Keywords: Hierarchical Production Rule, Data Mining, Learning Classifier System, Fuzzy Subsumption Relation, Subsumption matrix, Reinforcement Learning.
7384 An Efficient Run Time Interface for Heterogeneous Architecture of Large Scale Supercomputing System
Authors: Prabu D., Andrew Aaron James, Vanamala V., Vineeth Simon, Sanjeeb Kumar Deka, Sridharan R., Prahlada Rao B.B., Mohanram N.
Abstract:
In this paper, we propose a novel Run Time Interface (RTI) technique that provides an efficient environment for MPI jobs on the heterogeneous architecture of PARAM Padma. It offers an innovative, unified framework for the job management interface system in parallel and distributed computing, and the approach employs a proxy scheme. The implementation shows that the proposed RTI is highly scalable and stable. RTI also provides storage access for MPI jobs on various operating system platforms. It improves data access performance through the high-performance C-DAC Parallel File System (C-PFS). RTI performance is evaluated with standard HPC benchmark suites, and the simulation results show that the proposed RTI performs well on a large-scale supercomputing system.
Keywords: RTI, C-MPI, C-PFS, Scheduler Interface.
7383 A Rough Sets Approach for Relevant Internet/Web Online Searching
Authors: Erika Martinez Ramirez, Rene V. Mayorga
Abstract:
The Internet is constantly expanding. To find web links of interest in a browser, users must visit each listed link individually until a satisfactory one is found. They therefore evaluate a considerable number of links before reaching the one they want, which can be tedious and even unproductive. With web assistance, users could spend less time searching for relevant websites. This paper presents a rough set approach that facilitates classification of the unlimited available e-vocabulary, helping web users reduce the time spent looking for relevant websites. The approach includes two methods for identifying relevance data on web links, based on the priority and the percentage of relevance. These methods generate a list of websites in priority order, with emphasis on the search criteria.
Keywords: Web search, Web Mining, Rough Sets, Web Intelligence, Intelligent Portals, Relevance.
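The rough set idea underlying this approach can be sketched with the standard lower and upper approximations. Links that are indiscernible on the chosen attributes fall into the same block. The lower approximation contains links that are certainly relevant, and the upper approximation contains links that are possibly relevant. The link attributes and the relevance judgements below are hypothetical, and the paper's priority and percentage-of-relevance methods are not reproduced.

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Partition objects that are indiscernible on the chosen attributes."""
    classes = defaultdict(set)
    for name, values in objects.items():
        classes[tuple(values[a] for a in attributes)].add(name)
    return list(classes.values())


def approximations(objects, attributes, target):
    """Rough-set lower and upper approximations of the target set."""
    lower, upper = set(), set()
    for block in equivalence_classes(objects, attributes):
        if block <= target:  # every member is in the target: certainly relevant
            lower |= block
        if block & target:  # some member is in the target: possibly relevant
            upper |= block
    return lower, upper


# Hypothetical links described by coarse attributes.
links = {
    "l1": {"keyword_match": "high", "domain": "edu"},
    "l2": {"keyword_match": "high", "domain": "edu"},
    "l3": {"keyword_match": "low", "domain": "com"},
    "l4": {"keyword_match": "high", "domain": "com"},
}
relevant = {"l1", "l4"}  # links a user judged relevant
lower, upper = approximations(links, ["keyword_match", "domain"], relevant)
print(sorted(lower), sorted(upper))  # ['l4'] ['l1', 'l2', 'l4']
```

A natural ranking follows from this: lower-approximation links first, then the boundary region (upper minus lower), then everything else.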
7382 A Survey of Job Scheduling and Resource Management in Grid Computing
Authors: Raksha Sharma, Vishnu Kant Soni, Manoj Kumar Mishra, Prachet Bhuyan
Abstract:
Grid computing is a form of distributed computing that coordinates and shares computational power, data storage and network resources across dynamic, geographically dispersed organizations. Scheduling onto the Grid is NP-complete, so no single scheduling algorithm is best for all grid computing systems. The alternative is to select an appropriate scheduling algorithm for a given grid environment based on the characteristics of its tasks, machines and network connectivity. Job and resource scheduling is one of the key research areas in grid computing. The goal of scheduling is to achieve the highest possible system throughput and to match application needs with the available computing resources. The survey aims to help new researchers in grid computing understand the concept of scheduling easily, so that they can contribute to developing more efficient scheduling algorithms. This should help interested researchers carry out further work in this important research area.
Keywords: Grid Computing, Job Scheduling, Resource Scheduling.
7381 Using Multi-Arm Bandits to Optimize Game Play Metrics and Effective Game Design
Authors: Kenny Raharjo, Ramon Lawrence
Abstract:
Game designers have the challenging task of building games that encourage players to spend their time and money on the game. There are an infinite number of game variations and design choices, and it is hard to systematically determine which design choices will give players positive experiences. In this work, we demonstrate how multi-arm bandits can be used to automatically explore game design variations to improve player metrics. The advantage of multi-arm bandits is that they allow continuous experimentation and variation, intrinsically converge to the best solution, and require no special infrastructure beyond the ability to deploy minor game variations to users for evaluation. A user study confirms that multi-arm bandits successfully identified the preferred game variation, the one with the highest play-time metrics, and can be a useful technique in a game designer's toolkit.
Keywords: Game design, multi-arm bandit, design exploration and data mining, player metric optimization and analytics.
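The abstract does not say which bandit policy was used. The sketch below uses UCB1, one standard policy, to show how traffic shifts toward the best game variation while the others are still occasionally explored. The three variations, their success probabilities and the binary "long session" reward are hypothetical stand-ins for a real play-time metric.

```python
import math
import random


def ucb1(reward_fns, rounds, seed=0):
    """Run the UCB1 policy and return how many times each arm was played.

    reward_fns[i](rng) returns a reward in [0, 1] for game variation i.
    """
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(rounds):  # t == number of plays made so far
        if t < n_arms:
            arm = t  # play each variation once to initialise its estimate
        else:
            # UCB1 index: mean reward plus an exploration bonus that shrinks
            # as a variation is played more often.
            arm = max(
                range(n_arms),
                key=lambda i: totals[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        totals[arm] += reward
    return counts


# Hypothetical variations: a session counts as a success (reward 1.0), e.g. a
# player staying past some time threshold, with a fixed but unknown probability.
variations = [lambda rng, p=p: 1.0 if rng.random() < p else 0.0 for p in (0.2, 0.5, 0.8)]
counts = ucb1(variations, rounds=2000)
print(counts)  # most plays go to the third variation
```

UCB1 was chosen here because it has no exploration-rate parameter to tune, unlike epsilon-greedy; either would fit the continuous-experimentation workflow the abstract describes.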
7380 Object Identification with Color, Texture, and Object-Correlation in CBIR System
Authors: Awais Adnan, Muhammad Nawaz, Sajid Anwar, Tamleek Ali, Muhammad Ali
Abstract:
The need for efficient information retrieval has increased more than ever in recent years because of the frequent use of digital information in our lives. Much work has been done on textual information, but far less progress has been made with multimedia information. For text-based information, newer technologies such as data mining and data marts are now in use; these grew out of the basic database concept introduced around 1960. In image search, and especially in image identification, computerized systems are still at a very early stage. Even image search has not progressed as much as text-based search techniques. One main reason for this is that image search has widespread roots: many areas, such as artificial intelligence, statistics, image processing and pattern recognition, play a role. Human psychology, perception and cultural diversity also bear on the design of a good and efficient image recognition and retrieval system. This paper presents a new object-based search technique in which objects in an image are identified on the basis of their geometrical shapes and other features, such as color and texture, and object correlation augments the search process. To keep the focus on object identification, simple images were selected to reduce the role of segmentation in the overall process; however, the same technique can also be applied to other images.
Keywords: Object correlation, Geometrical shape, Color, texture, features, contents.
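The abstract lists color as one of the features but does not give the color descriptor. A common choice in CBIR systems is a quantized color histogram compared with histogram intersection; the sketch below shows that pairing purely as an illustration, with hypothetical pixel lists standing in for images.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram with `bins_per_channel` bins per channel."""
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3

    def bin_of(value):
        # Clamp so that 255 stays in the last bin even when 256 is not divisible.
        return min(value // step, bins_per_channel - 1)

    for r, g, b in pixels:
        idx = (bin_of(r) * bins_per_channel + bin_of(g)) * bins_per_channel + bin_of(b)
        hist[idx] += 1
    return [count / len(pixels) for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical "images": an object colour on a white background.
red_square = [(250, 10, 10)] * 50 + [(255, 255, 255)] * 50
red_circle = [(240, 20, 5)] * 30 + [(255, 255, 255)] * 70
blue_square = [(10, 10, 250)] * 50 + [(255, 255, 255)] * 50

query = color_histogram(red_square)
print(round(histogram_intersection(query, color_histogram(red_circle)), 2))   # 0.8
print(round(histogram_intersection(query, color_histogram(blue_square)), 2))  # 0.5
```

A color histogram ignores shape entirely (the red square and red circle score as similar), which is why the paper combines color with geometrical shape, texture and object correlation.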
7379 Customer Segmentation in Foreign Trade based on Clustering Algorithms Case Study: Trade Promotion Organization of Iran
Authors: Samira Malekmohammadi Golsefid, Mehdi Ghazanfari, Somayeh Alizadeh
Abstract:
The goal of this paper is to segment countries based on the value of Iran's exports to them during the 14 years ending in 2005. To measure the dissimilarity among the export baskets of different countries, we define a Dissimilarity Export Basket (DEB) function and use it as the distance function in the K-means algorithm. The DEB function is defined using the concepts of association rules and the export value of commodity groups. A clustering quality function and the intra-class inertia of clusters are defined, respectively, to calculate the optimum number of clusters and to compare the performance of DEB against Euclidean distance. We have also studied how importance weights in the DEB function can improve clustering quality. Lastly, once segmentation is complete, a designated RFM model is used to analyze the relative profitability of each cluster.
Keywords: Customer segmentation, Customer relationship management, Clustering, Data Mining.
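The abstract does not give the formula for DEB, so the sketch below shows only the surrounding machinery: K-means with a pluggable `distance` parameter (Euclidean as a stand-in, where a DEB implementation would be passed) and intra-class inertia computed as the sum of squared distances to each cluster centroid, a common definition that may differ from the paper's. The country vectors are hypothetical.

```python
import random


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def kmeans(points, k, distance=euclidean, iters=100, seed=0):
    """K-means with a pluggable distance for the assignment step.

    Centroids are still updated as coordinate means, which is exact for
    Euclidean distance; a non-Euclidean dissimilarity may need a different
    centroid rule. Returns (labels, centroids).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: distance(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def intra_class_inertia(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its centroid."""
    return sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical export values of two commodity groups for six countries.
countries = [(1.0, 0.9), (0.9, 1.1), (1.1, 1.0), (8.0, 8.2), (7.9, 8.1), (8.2, 7.8)]
labels, centroids = kmeans(countries, k=2)
print(labels, round(intra_class_inertia(countries, labels, centroids), 3))
```

Comparing two distance functions as the abstract describes would mean running `kmeans` once per function and comparing the resulting inertia (or another quality measure) on the same data.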
7378 Post Occupancy Life Cycle Analysis of a Green Building Energy Consumption at the University of Western Ontario in London - Canada
Authors: M. Bittencourt, E. K. Yanful, D. Velasquez, A. E. Jungles
Abstract:
The CMLP building was developed as a model of sustainability, with strategies to reduce water use, energy use and pollution, and to provide a healthy environment for its occupants. The aim of this paper is to investigate the environmental effects of the energy used by this building. A life cycle analysis (LCA) was conducted to measure the actual environmental effects of this energy use. The impact categories most affected by energy use were found to be human health effects and ecotoxicity. Natural gas extraction, uranium milling for nuclear energy production, and blasting for mining and infrastructure construction are the processes that contribute most to emissions in the human health category. Comparing the LCA results for the CMLP building with those for a conventional building showed that the energy used by the CMLP building causes less damage to the environment and human health than that used by a conventional building.
Keywords: Environmental Impacts, Green buildings, Life Cycle Analysis, Sustainability
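The abstract does not name the impact assessment method used. In life cycle impact assessment generally, each category indicator is obtained in the characterization step by weighting inventory flows with characterization factors; a generic form of that step, with notation introduced here only for illustration, is:

```latex
I_c = \sum_{i} m_i \, \mathrm{CF}_{c,i}
```

Here m_i is the inventory amount of elementary flow i attributable to the building's energy use (for example, kilograms of a particular emission), CF_{c,i} is the characterization factor of that flow for impact category c (such as human health or ecotoxicity), and I_c is the resulting category indicator. Comparing the CMLP building with a conventional building then amounts to comparing I_c for the two inventories, category by category.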
7377 Optimal Simultaneous Sizing and Siting of DGs and Smart Meters Considering Voltage Profile Improvement in Active Distribution Networks
Authors: T. Sattarpour, D. Nazarpour
Abstract:
This paper investigates the effect of the simultaneous placement of distributed generations (DGs) and smart meters (SMs) on voltage profile improvement in active distribution networks (ADNs). Responsive loads have recently received substantial attention in power system studies, including problems involving DGs. The presence of responsive loads in ADNs has an undeniable effect on the sizing and siting of DGs. For this reason, an optimization framework is proposed for sizing and siting DGs and SMs in ADNs. SMs are considered because they enable the successful implementation of demand response programs (DRPs), such as direct load control (DLC), with end consumers. With voltage profile improvement as the goal, the optimization problem is solved with a genetic algorithm (GA) and tested on the IEEE 33-bus distribution test system. Several scenarios are established, varying the number of DG units, individual or simultaneous placement of DGs and SMs, and the use of an adaptive power factor (APF) mode in which DGs support reactive power. The results confirm that DRPs and the APF mode have a significant effect on the optimal size and site of the DGs connected to the ADN, and that they also improve the voltage profile.
Keywords: Active distribution network (ADN), distributed generations (DGs), smart meters (SMs), demand response programs (DRPs), adaptive power factor (APF).
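The abstract says the siting and sizing problem is solved with a genetic algorithm but gives neither the encoding nor the objective. The sketch below is a minimal elitist GA over (bus, DG size) pairs for a single DG. The objective `voltage_deviation` is a HYPOTHETICAL surrogate with an arbitrary optimum; a real study would run a load flow on the IEEE 33-bus system there and would also encode SMs and DRPs, which are not modelled here. The size limits are assumptions.

```python
import random

BUSES = range(2, 34)            # candidate buses of a 33-bus feeder (bus 1 is the substation)
SIZE_MIN, SIZE_MAX = 0.1, 3.0   # hypothetical DG size limits in MW


def voltage_deviation(bus, size_mw):
    """HYPOTHETICAL surrogate objective (lower is better).

    A real implementation would run a load flow and sum the squared
    per-unit voltage deviations; this toy formula only gives the GA a target.
    """
    return 0.001 * (bus - 18) ** 2 + (size_mw - 1.2) ** 2


def genetic_search(fitness, pop_size=30, generations=60, mutation_rate=0.2, seed=1):
    rng = random.Random(seed)
    buses = list(BUSES)

    def random_individual():
        return (rng.choice(buses), rng.uniform(SIZE_MIN, SIZE_MAX))

    def cost(ind):
        return fitness(*ind)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        elite = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child_bus = rng.choice((a[0], b[0]))  # crossover: inherit one parent's bus
            child_size = (a[1] + b[1]) / 2        # crossover: blend the parents' sizes
            if rng.random() < mutation_rate:
                child_bus = rng.choice(buses)
            if rng.random() < mutation_rate:
                child_size = min(SIZE_MAX, max(SIZE_MIN, child_size + rng.gauss(0, 0.2)))
            children.append((child_bus, child_size))
        population = elite + children
    return min(population, key=cost)


best_bus, best_size = genetic_search(voltage_deviation)
print(best_bus, round(best_size, 2))
```

Keeping the better half unchanged (elitism) guarantees the best solution never gets worse between generations; the scenario comparisons in the abstract would correspond to re-running the search with different encodings and objectives.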
7376 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine
Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li
Abstract:
Machine learning and data mining are two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a widely used technique for predicting qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries, especially China, air quality classification has become one of the most important topics in air quality research and modelling. This study introduces a hybrid classification model based on information theory and the Support Vector Machine (SVM), using air quality data from four Chinese cities, namely Beijing, Guangzhou, Shanghai and Tianjin, from January 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified daily air quality into six levels, namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent, based on the respective Air Quality Index (AQI) values. Using information theory, information gain (IG) is calculated and feature selection is performed for both categorical and continuous numeric features. An SVM is then trained on the selected features with cross-validation. The final evaluation shows that the IG-SVM hybrid model performs better than SVM alone, Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of both accuracy and complexity.
Keywords: Machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation.
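The information gain used for feature selection is IG(Y; X) = H(Y) - sum over v of P(X = v) H(Y | X = v). The sketch below computes it for a categorical feature and, for continuous features, takes the best gain over binary thresholds, which is one common approach; the abstract does not say how the paper discretized continuous features. The feature values and AQI labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a categorical feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


def best_split_gain(values, labels):
    """Best IG over binary thresholds for a continuous feature (one common approach)."""
    candidates = sorted(set(values))
    best = 0.0
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        best = max(best, information_gain([v <= threshold for v in values], labels))
    return best


# Hypothetical daily records for one city.
aqi_level = ["Good", "Good", "Light Pollution", "Light Pollution"]
wind = ["strong", "strong", "calm", "calm"]   # categorical, perfectly informative
weekday = ["Mon", "Tue", "Mon", "Tue"]        # categorical, uninformative
pm25 = [20.0, 30.0, 150.0, 160.0]             # continuous

print(information_gain(wind, aqi_level))     # 1.0
print(information_gain(weekday, aqi_level))  # 0.0
print(best_split_gain(pm25, aqi_level))      # 1.0
```

Features would then be ranked by gain and the top ones passed to the SVM; the cut-off used in the paper is not stated in the abstract.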
7375 Steganalysis of Data Hiding via Halftoning and Coordinate Projection
Authors: Woong Hee Kim, Ilhwan Park
Abstract:
Steganography is the art of hiding and transmitting data through apparently innocuous carriers in an effort to conceal the existence of the data. Many steganography algorithms have been proposed recently, and many of them use digital image data as the carrier. In the data hiding scheme based on halftoning and coordinate projection, still image data is used as the carrier, and the carrier image data are modified to embed the hidden data. In this paper, we present three features for the analysis of data hiding via halftoning and coordinate projection. We also present a classifier that uses these three features.
Keywords: Steganography, steganalysis, digital halftoning, data hiding.
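The abstract does not say which halftoning method the hiding scheme uses. To illustrate what halftoning does to a grayscale carrier, the sketch below implements Floyd-Steinberg error diffusion, a classic halftoning algorithm, which turns gray levels into a pattern of black and white pixels while roughly preserving local average intensity. The embedding, coordinate projection and the three steganalysis features are not shown.

```python
def floyd_steinberg(gray):
    """Binarize a grayscale image (rows of 0-255 values) by Floyd-Steinberg error diffusion."""
    h, w = len(gray), len(gray[0])
    img = [list(map(float, row)) for row in gray]  # working copy that accumulates error
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            new = 255 if old >= 128 else 0
            out[y][x] = new
            err = old - new
            # Push the quantization error onto not-yet-processed neighbours.
            if x + 1 < w:
                img[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1][x - 1] += err * 3 / 16
                img[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1][x + 1] += err * 1 / 16
    return out


mid_gray = [[128] * 16 for _ in range(16)]
halftone = floyd_steinberg(mid_gray)
mean = sum(map(sum, halftone)) / (16 * 16)
print(sorted({v for row in halftone for v in row}), round(mean))
```

Because every output pixel is either 0 or 255, any embedding that flips or moves halftone dots changes local dot statistics, which is the kind of trace a steganalysis feature can look for.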
7374 Biological Data Integration using SOA
Authors: Noura Meshaan Al-Otaibi, Amin Yousef Noaman
Abstract:
Nowadays, scientific data is inevitably digital and is stored in a wide variety of formats in heterogeneous systems. Scientists need an integrated view of remote or local heterogeneous data sources, with advanced tools for accessing, analyzing and visualizing the data. This research proposes using Service Oriented Architecture (SOA) to integrate biological data from different data sources. This work shows that SOA can solve the problems faced in the integration process and make it easier for biologists to access biological data. There are several ways to implement SOA, but web services are the most popular. The Microsoft .NET Framework was used to implement the proposed architecture.
Keywords: Bioinformatics, Biological data, Data Integration, SOA and Web Services.
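The paper implements its architecture as web services on the .NET Framework. The Python sketch below only mirrors the architectural idea in-process: each heterogeneous source is wrapped behind the same service contract, and an integration service queries all of them and merges the answers into one view. No HTTP or SOAP layer is shown, and all class names, field names and data are hypothetical.

```python
import csv
import io
from typing import Dict, List, Protocol


class DataSourceService(Protocol):
    """Uniform contract every wrapped data source exposes (hypothetical)."""

    def find_gene(self, gene_id: str) -> Dict[str, str]: ...


class CsvSequenceService:
    """Wraps a hypothetical CSV export of sequence data."""

    def __init__(self, text: str):
        self.rows = {row["gene_id"]: row for row in csv.DictReader(io.StringIO(text))}

    def find_gene(self, gene_id: str) -> Dict[str, str]:
        row = self.rows.get(gene_id)
        return {"sequence": row["sequence"]} if row else {}


class AnnotationService:
    """Wraps a hypothetical in-memory annotation store with its own field names."""

    def __init__(self, records: Dict[str, Dict[str, str]]):
        self.records = records

    def find_gene(self, gene_id: str) -> Dict[str, str]:
        rec = self.records.get(gene_id)
        return {"function": rec["desc"]} if rec else {}  # map "desc" to the shared schema


class IntegrationService:
    """Queries every registered service and merges the answers into one view."""

    def __init__(self, services: List[DataSourceService]):
        self.services = services

    def integrated_view(self, gene_id: str) -> Dict[str, str]:
        view = {"gene_id": gene_id}
        for service in self.services:
            view.update(service.find_gene(gene_id))
        return view


hub = IntegrationService([
    CsvSequenceService("gene_id,sequence\nG1,ATGCC\nG2,TTGA\n"),
    AnnotationService({"G1": {"desc": "kinase"}}),
])
print(hub.integrated_view("G1"))  # {'gene_id': 'G1', 'sequence': 'ATGCC', 'function': 'kinase'}
```

The design point is that the integration layer depends only on the shared contract, so a new source can be added by writing one more wrapper, without changing the client or the other services.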