Search results for: distributed data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8082

7812 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and can be used for data filtering in artificial intelligence. Because classification applies to all kinds of data, it is used in nearly every field of modern life. Classification helps us group items according to the features deemed interesting and useful. In this paper, we compare two classification methods, Naïve Bayes and ADTree, used to detect spam e-mail. This choice is motivated by the fact that the Naïve Bayes algorithm is based on probability calculus while the ADTree algorithm is based on decision trees. The parameter settings of both classifiers are chosen to maximize the true positive rate and minimize the false positive rate. The experimental results present classification accuracy and cost analysis with a view to choosing the optimal classifier for spam detection. The number of attributes is also examined to find a tradeoff between attribute count and classification accuracy.
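
As an illustration of the probability-based classifier named in the abstract above, the following is a minimal sketch of a multinomial Naive Bayes spam filter with add-one smoothing. It is not the authors' implementation: the function names `train_naive_bayes` and `predict`, the token-list input format, and the toy messages are all assumptions made for this example.

```python
import math
from collections import Counter


def train_naive_bayes(messages, labels):
    """Fit a multinomial Naive Bayes model on tokenized messages.

    messages: list of token lists; labels: list of class names.
    (Both input formats are assumptions made for this sketch.)
    """
    doc_counts = Counter(labels)
    word_counts = {c: Counter() for c in doc_counts}
    for tokens, label in zip(messages, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return {
        "doc_counts": doc_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "total_docs": len(labels),
    }


def predict(model, tokens):
    """Return the class with the highest log-posterior for the tokens."""
    best_class, best_score = None, -math.inf
    vocab_size = len(model["vocab"])
    for c, n_docs in model["doc_counts"].items():
        # Log prior: share of training messages belonging to class c.
        score = math.log(n_docs / model["total_docs"])
        total_words = sum(model["word_counts"][c].values())
        for w in tokens:
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = model["word_counts"][c][w]
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy training data, invented for illustration only.
toy_messages = [
    ["win", "money", "now"],
    ["cheap", "money"],
    ["meeting", "tomorrow"],
    ["lunch", "tomorrow", "noon"],
]
toy_labels = ["spam", "spam", "ham", "ham"]
toy_model = train_naive_bayes(toy_messages, toy_labels)
```

With this toy model, `predict(toy_model, ["money"])` selects the spam class because "money" appears only in the spam messages.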

Keywords: Classification, data mining, spam filtering, naive Bayes, decision tree.

Downloads: 1448
7811 Benefits and Issues of Open-Cut Coal Mining on the Socio-Economic Environment - The Iban Community in Mukah, Sarawak, Malaysia

Authors: Edward Lim

Abstract:

This paper deals principally with the socio-economic impact on the local Iban community in Mukah Division, Sarawak, of the open-cut coal mining industry, which commenced in 2003. To date, no studies have been carried out by either the public or private sector to analyze how the Iban community is coping with the large influx of cash into their society. The Iban community has traditionally practiced shifting cultivation and the farming of domesticated animals, with a portion of the younger generation working as laborers and professionals. This paper represents the views and observations of the author, supported by some statistical facts extracted from published articles and non-published reports. The paper deals primarily with the following areas:
• Background of the coal mining industry in Mukah Division, Sarawak;
• Benefits of the coal mining industry for the Iban community;
• Issues and problems arising in the Iban community because of the presence of the coal mining industry; and
• Possible actions that need to be taken to overcome these issues and problems.

Keywords: Coal Mining, Iban Community, Malaysia, Sub-Bituminous Coal.

Downloads: 2402
7810 Replicating Data Objects in Large-scale Distributed Computing Systems using Extended Vickrey Auction

Authors: Samee Ullah Khan, Ishfaq Ahmad

Abstract:

This paper proposes a novel game theoretical technique to address the problem of data object replication in large-scale distributed computing systems. The proposed technique draws inspiration from computational economic theory and employs the extended Vickrey auction. Specifically, players in a non-cooperative environment compete for scarce server-side memory space to replicate data objects so as to minimize the total network object transfer cost, while maintaining object concurrency. Optimization of such a cost in turn leads to load balancing, fault-tolerance and reduced user access time. The method is experimentally evaluated against four well-known techniques from the literature: branch and bound, greedy, bin-packing and genetic algorithms. The experimental results reveal that the proposed approach outperforms the four techniques in both execution time and solution quality.
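
For readers unfamiliar with the auction format mentioned in the abstract above, here is a minimal sketch of a plain (not extended) Vickrey auction, in which the highest bidder wins but pays the second-highest bid. The paper's extended variant and its cost model are not reproduced here; the function name `vickrey_auction` and the bidder names are hypothetical.

```python
def vickrey_auction(bids):
    """Run a sealed-bid second-price (Vickrey) auction.

    bids: dict mapping a bidder name to its bid (hypothetical inputs).
    Returns (winner, price): the highest bidder wins and pays the
    second-highest bid.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bidders")
    # Rank bidders from highest to lowest bid.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


# Example: three hypothetical agents bidding for one replica slot.
winner, price = vickrey_auction({"agent_a": 10, "agent_b": 7, "agent_c": 3})
```

Because the winner's payment does not depend on its own bid, bidding one's true valuation is a dominant strategy in this format, which is one reason it is attractive in non-cooperative settings.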

Keywords: Auctions, data replication, pricing, static allocation.

Downloads: 1424
7809 Distributed Case Based Reasoning for Intelligent Tutoring System: An Agent Based Student Modeling Paradigm

Authors: O. P. Rishi, Rekha Govil, Madhavi Sinha

Abstract:

Online learning with Intelligent Tutoring Systems (ITS) is becoming very popular: the system models the student's learning behavior and presents the learning material (content, questions and answers, assignments) accordingly. In today's distributed computing environment, the tutoring system can take advantage of networking to reuse the model of one student for students from other similar groups. In this paper we present a methodology in which, using Case Based Reasoning (CBR), an ITS provides student modeling for online learning in a distributed environment with the help of agents. The paper describes the approach, the architecture, and the agent characteristics for such a system. This concept can be deployed to develop an ITS where the tutor can author and the students can learn locally, whereas the ITS can model the students' learning globally in a distributed environment. The advantage of such an approach is that both the learning material (domain knowledge) and the student model can be globally distributed, thus enhancing the efficiency of the ITS while reducing the bandwidth requirement and complexity of the system.

Keywords: CBR, ITS, student modeling, distributed system, intelligent agent.

Downloads: 2123
7808 Harnessing Replication in Object Allocation

Authors: H. T. Barney, G. C. Low

Abstract:

The design of distributed systems involves the partitioning of the system into components or partitions and the allocation of these components to physical nodes. Techniques have been proposed for both the partitioning and allocation process. However these techniques suffer from a number of limitations. For instance object replication has the potential to greatly improve the performance of an object orientated distributed system but can be difficult to use effectively and there are few techniques that support the developer in harnessing object replication. This paper presents a methodological technique that helps developers decide how objects should be allocated in order to improve performance in a distributed system that supports replication. The performance of the proposed technique is demonstrated and tested on an example system.

Keywords: Allocation, Distributed Systems, Replication.

Downloads: 1399
7807 An Application for Web Mining Systems with Services Oriented Architecture

Authors: Thiago M. R. Dias, Gray F. Moita, Paulo E. M. Almeida

Abstract:

Although the World Wide Web is considered the largest source of information in existence today, its inherently dynamic nature can make the task of finding useful and qualified information a very frustrating experience. This study presents research on Web information mining systems and proposes an implementation of these systems by means of components that can be built using Web services technology. This implies that they can encompass features offered by a service-oriented architecture (SOA) and that specific components may be used by other tools, independent of platform or programming language. Hence, the main objective of this work is to provide an architecture for Web mining systems, divided into stages, where each step is a component that incorporates the characteristics of SOA. The separation of these steps was designed based upon the existing literature. Interesting results were obtained and are shown here.

Keywords: Web Mining, Service Oriented Architecture, WebServices.

Downloads: 1430
7806 Analysis of Road Repairs in Undermined Areas

Authors: Tomáš Seidler, Marek Mihola, Denisa Cihlarova

Abstract:

The article presents the results of an analysis of maps of expected subsidence in undermined areas for road repair management. The analysis was carried out in the Karvina district of the Czech Republic and covered undermined areas with ongoing or finished deep mining in the years 2003 - 2009. The article discusses how local road maintenance authorities can determine, with limited data available, the areas that will need the most repairs in the future. A new map of surface curvature was calculated from the expected subsidence maps. Combined with road maps and historical data about repairs, the results yielded five main categories of undermined areas, providing a very simple tool for management.

Keywords: GIS, Map of Subsidence, Road, Undermined Area

Downloads: 1266
7805 Parallel-Distributed Software Implementation of Buchberger Algorithm

Authors: Praloy Kumar Biswas, Prof. Dipanwita Roy Chowdhury

Abstract:

Gröbner basis calculation forms a key part of computational commutative algebra and many other areas. One important ramification of Gröbner basis theory is a means to solve a system of non-linear equations. This is why it has become very important in areas where the solution of non-linear equations is needed, for instance in algebraic cryptanalysis and coding theory. This paper explores a parallel-distributed implementation of Gröbner basis calculation over GF(2), using the Buchberger algorithm. OpenMP and MPI-C language constructs have been used to implement the scheme. Results are provided comparing the performance of the standalone and hybrid (parallel-distributed) implementations.

Keywords: Gröbner basis, Buchberger Algorithm, Distributed-Parallel Computation, OpenMP, MPI.

Downloads: 1790
7804 The Use of Classifiers in Image Analysis of Oil Wells Profiling Process and the Automatic Identification of Events

Authors: Jaqueline M. R. Vieira

Abstract:

Different strategies and tools are available in the oil and gas industry for detecting and analyzing tension and possible fractures in borehole walls. Most of these techniques are based on manual observation of the captured borehole images. While this strategy may be feasible and convenient with small images and little data, it becomes difficult and error-prone when large image databases must be processed. Moreover, the patterns may differ across the image area, depending on many characteristics (drilling strategy, rock components, rock strength, etc.). In this work we propose the inclusion of data-mining classification strategies in order to create a knowledge database of the segmented curves. After some time spent using the system and manually marking the parts of borehole images that correspond to tension regions and breakout areas, these classifiers allow the system to automatically indicate and suggest new candidate regions with higher accuracy. We suggest the use of different classifier methods in order to achieve different knowledge dataset configurations.

Keywords: Brazil, classifiers, data-mining, Image Segmentation, oil well visualization.

Downloads: 2509
7803 Metadata Update Mechanism Improvements in Data Grid

Authors: S. Farokhzad, M. Reza Salehnamadi

Abstract:

Grid environments aggregate geographically distributed resources. Grids come in three types: computational, data, and storage. This paper presents research on data grids. A data grid is used for providing and securing access to data from among many heterogeneous sources. Users need not worry about where the data is located, provided that they can access it. Metadata is used for accessing data in a data grid. Presently, the application metadata catalogue and the SRB middleware package are used in data grids for metadata management. In this paper, simultaneous and rapid updating, streamlining and searching are made possible by classifying the tables that hold the metadata and converting each table into numerous tables. In addition, the most appropriate division is determined with regard to the specific application. As a result of this technique, some requests can be executed concurrently and in a pipelined fashion.

Keywords: Grids, data grid, metadata, update.

Downloads: 1654
7802 A Comprehensive Review on Different Mixed Data Clustering Ensemble Methods

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, M. Sharmila

Abstract:

An extensive amount of work has been done in data clustering research under the unsupervised learning technique in data mining during the past two decades. Moreover, several approaches and methods have emerged focusing on clustering diverse data types, features of cluster models and similarity rates of clusters. However, no single clustering algorithm is best at extracting efficient clusters. Consequently, to address this issue, a new technique called the Cluster Ensemble method emerged. This approach serves as an alternative method for the cluster analysis problem. The main objective of the Cluster Ensemble is to aggregate diverse clustering solutions so as to attain accuracy and to improve on the quality of the individual clustering algorithms. Due to the massive and rapid development of new methods in the field of data mining, a critical analysis of existing techniques and future directions is needed. This paper presents a comparative analysis of different cluster ensemble methods along with their methodologies and salient features. This analysis should be useful for the community of clustering experts and help in deciding on the most appropriate method for the problem at hand.
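
The keywords of this entry mention the co-association matrix, a common building block for combining clusterings. The sketch below shows one way it can be computed: entry (i, j) is the fraction of base clusterings that place items i and j in the same cluster. The function name `co_association` and the example partitions are assumptions for illustration, not taken from the paper.

```python
def co_association(partitions):
    """Build a co-association matrix from several base clusterings.

    partitions: list of label lists, one per base clustering, all of the
    same length (hypothetical input format). Entry [i][j] of the result
    is the fraction of clusterings in which items i and j share a cluster.
    """
    n_items = len(partitions[0])
    counts = [[0] * n_items for _ in range(n_items)]
    for labels in partitions:
        for i in range(n_items):
            for j in range(n_items):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    n_partitions = len(partitions)
    return [[c / n_partitions for c in row] for row in counts]


# Three invented clusterings of four items.
example_partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
example_matrix = co_association(example_partitions)
```

A consensus function can then treat this matrix as a similarity measure, for example by running a hierarchical clustering on it; that step is omitted here.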

Keywords: Clustering, Cluster Ensemble Methods, Coassociation matrix, Consensus Function, Median Partition.

Downloads: 2069
7801 Distributed System Computing Resource Scheduling Algorithm Based on Deep Reinforcement Learning

Authors: Yitao Lei, Xingxiang Zhai, Burra Venkata Durga Kumar

Abstract:

As the quantity and complexity of computing in large-scale software systems increase, distributed system computing becomes increasingly important. A distributed system realizes high-performance computing through collaboration between different computing resources. Without efficient resource scheduling, the misuse of distributed computing may cause resource waste and high costs. However, resource scheduling is usually an NP-hard problem, so no general solution is available. Some optimization algorithms do exist, such as genetic algorithms and ant colony optimization, but the large scale of distributed systems makes these traditional optimization algorithms challenging to apply. Heuristic and machine learning algorithms are usually applied in this situation to ease the computing load. We therefore review traditional resource scheduling optimization algorithms and introduce a deep reinforcement learning method that utilizes the perceptual ability of neural networks and the decision-making ability of reinforcement learning. Using the machine learning method, we try to find important factors that influence the performance of distributed system computing and help the distributed system schedule computing resources efficiently. This paper surveys the application of deep reinforcement learning to distributed system computing resource scheduling. The research proposes a deep reinforcement learning method that uses a recurrent neural network to optimize resource scheduling. The paper concludes with the challenges and improvement directions for deep reinforcement learning-based resource scheduling algorithms.

Keywords: Resource scheduling, deep reinforcement learning, distributed system, artificial intelligence.

Downloads: 406
7800 The Link between Distributed Leadership and Educational Outcomes: An Overview of Research

Authors: Maria Eliophotou Menon

Abstract:

School leadership is commonly considered to have a significant influence on school effectiveness and improvement. Effective school leaders are expected to successfully introduce and support change and innovation at the school unit. Despite an abundance of studies on educational leadership, very few studies have provided evidence on the link between leadership models, and specific educational and school outcomes. This is true of a popular contemporary approach to leadership, namely, distributed leadership. The paper provides an overview of research findings on the effect of distributed leadership on educational outcomes. The theoretical basis for this approach to leadership is presented, with reference to methodological and research limitations. The paper discusses research findings and draws their implications for educational research on school leadership.

Keywords: Distributed leadership, educational outcomes, leadership research.

Downloads: 3664
7799 Anomaly Based On Frequent-Outlier for Outbreak Detection in Public Health Surveillance

Authors: Zalizah Awang Long, Abdul Razak Hamdan, Azuraliza Abu Bakar

Abstract:

Public health surveillance systems focus on outbreak detection and the data sources used. Variation or aberration in the frequency distribution of health data, compared to historical data, is often used to detect outbreaks. It is important that new techniques be developed to improve the detection rate, thereby reducing wastage of resources in public health. Thus, the objective is to develop a technique by applying frequent mining and outlier mining techniques to outbreak detection. Fourteen datasets from UCI were tested with the proposed technique. The effectiveness of each technique was measured by t-test. The overall performance shows that DTK can be used to detect outliers within frequent datasets. In conclusion, the anomaly-based frequent-outlier technique for outbreak detection can be used to identify outliers within frequent datasets.

Keywords: Outlier detection, frequent-outlier, outbreak, anomaly, surveillance, public health

Downloads: 2231
7798 Parallel Querying of Distributed Ontologies with Shared Vocabulary

Authors: Sharjeel Aslam, Vassil Vassilev, Karim Ouazzane

Abstract:

Ontologies and various semantic repositories have become a convenient approach for implementing model-driven architectures of distributed systems on the Web. SPARQL is the standard query language for querying such repositories. However, although SPARQL is a well-established standard for querying semantic repositories in RDF and OWL format and there are commonly used APIs which support it, like Jena for Java, its parallel option is not incorporated in them. This article presents a complete framework consisting of an object algebra for parallel RDF and an index-based implementation of a parallel query engine capable of dealing with distributed RDF ontologies which share a common vocabulary. It has been implemented in Java and, to validate the algorithms, has been applied to the problem of organizing virtual exhibitions on the Web.

Keywords: Distributed ontologies, parallel querying, semantic indexing, shared vocabulary, SPARQL.

Downloads: 610
7797 Heuristics Analysis for Distributed Scheduling using MONARC Simulation Tool

Authors: Florin Pop

Abstract:

Simulation is a very powerful method for high-performance and high-quality design in distributed systems, and now perhaps the only one, considering the heterogeneity, complexity and cost of distributed systems. In Grid environments, for example, it is hard and even impossible to perform scheduler performance evaluation in a repeatable and controllable manner, as resources and users are distributed across multiple organizations with their own policies. In addition, Grid test-beds are limited, and creating an adequately sized test-bed is expensive and time consuming. Scalability, reliability and fault-tolerance become important requirements for distributed systems in order to support distributed computation. A distributed system with such characteristics is called dependable. Large environments, like Cloud, offer unique advantages, such as low cost and dependability, and satisfy QoS for all users. Resource management in large environments relies on efficient scheduling algorithms guided by QoS constraints. This paper presents the performance evaluation of scheduling heuristics guided by different optimization criteria. The algorithms for distributed scheduling are analyzed with the aim of satisfying user constraints while at the same time taking into account the independent capabilities of resources. This analysis acts as a profiling step for algorithm calibration. The performance evaluation is based on simulation. The simulator is MONARC, a powerful tool for large-scale distributed systems simulation. The novelty of this paper consists in synthetic analysis results that offer guidelines for scheduler service configuration and support empirically based decisions. The results could be used in decisions regarding optimizations to existing Grid DAG scheduling and for selecting the proper algorithm for DAG scheduling in various practical situations.

Keywords: Scheduling, Simulation, Performance Evaluation, QoS, Distributed Systems, MONARC

Downloads: 1708
7796 An Efficient Algorithm for Reliability Lower Bound of Distributed Systems

Authors: Mohamed H. S. Mohamed, Yang Xiao-zong, Liu Hong-wei, Wu Zhi-bo

Abstract:

The reliability of distributed systems and computer networks has been modeled by a probabilistic network or a graph G. Computing the residual connectedness reliability (RCR), denoted by R(G), under the node fault model is very useful, but is an NP-hard problem. Since computing the exact value of R(G) may take time exponential in the network size, it is important to calculate a tight approximation, especially a lower bound, in moderate time. In this paper, we propose an efficient algorithm for the reliability lower bound of distributed systems with unreliable nodes. We also apply our algorithm to several typical classes of networks to evaluate the lower bounds and show the effectiveness of our algorithm.

Keywords: Distributed systems, probabilistic network, residual connectedness reliability, lower bound.

Downloads: 1630
7795 A Review: Comparative Analysis of Different Categorical Data Clustering Ensemble Methods

Authors: S. Sarumathi, N. Shanthi, M. Sharmila

Abstract:

In the recent past a large amount of work has been done in data clustering research under the unsupervised learning technique in data mining. Furthermore, several algorithms and methods have been proposed focusing on clustering different data types, representation of cluster models, and accuracy rates of the clusters. However, no single clustering algorithm proves to be the most efficient in providing the best results. Accordingly, to address this issue, a new technique called the Cluster Ensemble method emerged. The cluster ensemble is a good alternative approach to the cluster analysis problem. Its main aim is to merge different clustering solutions so as to achieve accuracy and to improve the quality of individual data clusterings. The substantial and continuing development of new methods in the sphere of data mining, together with the sustained interest in inventing new algorithms, makes a critical analysis of the existing techniques and future directions necessary. This paper presents a comparative study of different cluster ensemble methods along with their features, systematic working process, and the average accuracy and error rates of each ensemble method. This comprehensive analysis should be useful for the community of clustering practitioners and help in deciding on the most suitable method for the problem at hand.

Keywords: Clustering, Cluster Ensemble methods, Co-association matrix, Consensus function, Median partition.

Downloads: 2555
7794 An HCI Template for Distributed Applications

Authors: Xizhi Li

Abstract:

Both software applications and their development environment are becoming more and more distributed. This trend impacts not only the way software computes, but also how it looks. This article proposes a Human Computer Interface (HCI) template from three representative applications we have developed. These applications include a Multi-Agent System based software, a 3D Internet computer game with distributed game world logic, and a programming language environment used in constructing distributed neural network and its visualizations. HCI concepts that are common to these applications are described in abstract terms in the template. These include off-line presentation of global entities, entities inside a hierarchical namespace, communication and languages, reconfiguration of entity references in a graph, impersonation and access right, etc. We believe the metaphor that underlies an HCI concept as well as the relationships between a bunch of HCI concepts are crucial to the design of software systems and vice versa.

Keywords: HCI, MAS, computer game, programming language

Downloads: 1483
7793 Time Comparative Simulator for Distributed Process Scheduling Algorithms

Authors: Nazleeni Samiha Haron, Anang Hudaya Muhamad Amin, Mohd Hilmi Hasan, Izzatdin Abdul Aziz, Wirdhayu Mohd Wahid

Abstract:

In any distributed system, process scheduling plays a vital role in determining the efficiency of the system. Process scheduling algorithms are used to ensure that the components of the system maximize their utilization and complete all assigned processes within a specified period of time. This paper focuses on the development of a comparative simulator for distributed process scheduling algorithms. The objectives of the work carried out include the development of the comparative simulator, as well as a comparative study of three distributed process scheduling algorithms: sender-initiated, receiver-initiated and hybrid sender-receiver-initiated algorithms. The comparative study was based on the Average Waiting Time (AWT) and Average Turnaround Time (ATT) of the processes involved. The simulation results show that the performance of the algorithms depends on the number of nodes in the system.
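
To make the two metrics named in the abstract above concrete, the following sketch computes the Average Waiting Time and Average Turnaround Time for processes served first-come, first-served on a single node. This deliberately ignores the sender-initiated and receiver-initiated load-sharing policies the paper compares; the function name `average_times` and the `(arrival_time, burst_time)` input format are assumptions for illustration.

```python
def average_times(processes):
    """Compute (AWT, ATT) for first-come, first-served scheduling on one node.

    processes: list of (arrival_time, burst_time) tuples (hypothetical format).
    Waiting time is start minus arrival; turnaround time is finish minus arrival.
    """
    clock = 0
    waits, turnarounds = [], []
    # Serve processes in order of arrival time.
    for arrival, burst in sorted(processes):
        start = max(clock, arrival)  # the node may sit idle until arrival
        finish = start + burst
        waits.append(start - arrival)
        turnarounds.append(finish - arrival)
        clock = finish
    n = len(processes)
    return sum(waits) / n, sum(turnarounds) / n


# Three invented processes arriving one time unit apart, each needing 3 units.
awt, att = average_times([(0, 3), (1, 3), (2, 3)])
```

In this example the waiting times are 0, 2 and 4 and the turnaround times are 3, 5 and 7, so the averages are 2.0 and 5.0.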

Keywords: Distributed Systems, Load Sharing, Process Scheduling, AWT and ATT

Downloads: 1576
7792 Revisiting Distributed Protocols for Mobility at the Application Layer

Authors: N. Nouali, H. Drias, A. Doucet

Abstract:

For more than a decade, many proposals and standards have been designed to deal with mobility issues; however, there are still some serious limitations in basing solutions on them. In this paper we discuss the possibility of handling mobility at the application layer. We do this while revisiting the conventional implementation of the Two Phase Commit (2PC) protocol, which is a fundamental asset of transactional technology for ensuring the consistent commitment of distributed transactions. The solution is based on an execution framework providing an efficient extension that is aware of mobility and preserves the 2PC principle.
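
The conventional Two Phase Commit protocol referred to in the abstract above can be sketched as follows: a coordinator first collects votes from all participants, then commits only if every vote is yes. This is a minimal in-memory illustration without timeouts, logging, or failure recovery, and it does not include the paper's mobility-aware extension; the `Participant` class and `two_phase_commit` function are hypothetical names.

```python
class Participant:
    """A hypothetical transaction participant for illustration."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes (and hold resources) or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinate a commit decision across all participants."""
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only if all voted yes, otherwise abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


# One run where everyone can commit, one where a participant votes no.
ok_group = [Participant("db1"), Participant("db2")]
ok_result = two_phase_commit(ok_group)
bad_group = [Participant("db1"), Participant("mobile", can_commit=False)]
bad_result = two_phase_commit(bad_group)
```

The all-or-nothing decision in the second phase is the principle that any extension of the protocol would need to preserve.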

Keywords: Application layer, distributed mobile protocols, mobility management, mobile transaction processing.

Downloads: 1569
7791 Distribution Feeder Reconfiguration Considering Distributed Generators

Authors: R. Khorshidi, T. Niknam, M. Nayeripour

Abstract:

Recently, distributed generation technologies have received much attention for the potential energy savings and reliability assurances that might be achieved as a result of their widespread adoption. Fueling the attention have been the possibilities of international agreements to reduce greenhouse gas emissions, electricity sector restructuring, high power reliability requirements for certain activities, and concern about easing transmission and distribution capacity bottlenecks and congestion. It is therefore necessary to investigate the impact of these kinds of generators on distribution feeder reconfiguration. This paper presents an approach for distribution reconfiguration considering Distributed Generators (DGs). The objective function is the sum of electrical power losses. A Tabu search optimization is used to solve the optimal operation problem. The approach is tested on a real distribution feeder.

Keywords: Distributed Generator, Daily Optimal Operation, Genetic Algorithm.

Downloads: 1659
7790 Optimal Planning of Dispatchable Distributed Generators for Power Loss Reduction in Unbalanced Distribution Networks

Authors: Mahmoud M. Othman, Y. G. Hegazy, A. Y. Abdelaziz

Abstract:

This paper proposes a novel heuristic algorithm that aims to determine the best size and location of distributed generators in unbalanced distribution networks. The proposed heuristic algorithm can deal with planning cases where power loss is to be optimized without violating the practical constraints of the system. The distributed generation units in the proposed algorithm are modeled as voltage-controlled nodes with the flexibility to be converted to constant power factor nodes in case of reactive power limit violation. The proposed algorithm is implemented in MATLAB and tested on the IEEE 37-node feeder. The results obtained show the effectiveness of the proposed algorithm.

Keywords: Distributed generation, heuristic approach, Optimization, planning.

Downloads: 1768
7789 Improving University Operations with Data Mining: Predicting Student Performance

Authors: Mladen Dragičević, Mirjana Pejić Bach, Vanja Šimičević

Abstract:

The purpose of this paper is to develop models that would enable the prediction of student success. These models could improve the allocation of students among colleges and optimize the newly introduced model of government subsidies for higher education. To collect data, an anonymous survey was carried out among final-year undergraduate students using a random sampling method. Decision trees were created, of which the two most successful in predicting student success were chosen based on two criteria: Grade Point Average (GPA) and the time a student needs to finish the undergraduate program (time-to-degree). Decision trees have been shown to be a good method for classifying student success, and they could be improved further by increasing the survey sample and developing specialized decision trees for each type of college. These types of methods have great potential for use in decision support systems.

Keywords: Data mining, knowledge discovery in databases, prediction models, student success.

Downloads 2455
7788 Research on IBR-Driven Distributed Collaborative Visualization System

Authors: Yin Runmin, Song Changfeng

Abstract:

Image-based rendering (IBR) techniques have recently spread to a broad range of fields, which raises a critical challenge: building an IBR-driven visualization platform that meets the requirements of high performance, large-scale aggregation and concentration of distributed visualization resources, deployment by multiple operators, and CSCW design. This paper presents a unique IBR-based visualization dataflow model that reflects the specific characteristics of IBR techniques, then discusses the prominent features of an IBR-driven distributed collaborative visualization (DCV) system, and finally proposes a novel prototype. The prototype provides three well-defined levels of modules, namely the Central Visualization Server, the Local Proxy Server and the Visualization Aid Environment, through which collaboration data and control flow according to the dataflow model. This three-level architecture makes it easy to construct IBR-oriented applications. The augmented collaboration strategy employed not only achieves convenient synchronous control by multiple users and stable processing management, but is also extensible and scalable.

Keywords: Image-Based Rendering, Distributed Collaborative Visualization, Computer Supported Cooperative Work, Model and Simulation, Modular Visualization Environment.

Downloads 1438
7787 A Survey of Semantic Integration Approaches in Bioinformatics

Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir

Abstract:

Technological advances in computer science and data analysis are continuously producing huge volumes of biological data, which are available on the web. Such advances require powerful data integration techniques to extract the knowledge and information pertinent to a specific question. Biomedical exploration of these big data often requires complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, focusing on those developed in the ontology community.

Keywords: Semantic data integration, biological ontology, linked data, semantic web, OWL, RDF.

Downloads 1781
7786 Optimization of Air Pollution Control Model for Mining

Authors: Zunaira Asif, Zhi Chen

Abstract:

Sustainable air quality management is recognized as one of the most serious environmental concerns in mining regions. Mining operations emit various types of pollutants that have significant impacts on the environment. This study presents a stochastic control strategy by developing an air pollution control model to achieve a cost-effective solution. The optimization method is formulated to predict the cost of treatment using linear programming with an objective function and multiple constraints. The constraints focus mainly on two factors: metal production should not exceed the available resources, and air quality should meet the standard criteria for each pollutant. The applicability of the model is explored through a case study of an open-pit metal mine in Utah, USA. The method also uses meteorological data as a dispersion transfer function to reflect practical local conditions. Probabilistic analysis and the uncertainties in meteorological conditions are handled by Monte Carlo simulation. Reasonable results have been obtained for selecting the optimized treatment technology for PM2.5, PM10, NOx, and SO2. Additional comparison analysis shows that a baghouse is the least-cost option for particulate matter compared with electrostatic precipitators and wet scrubbers, whereas non-selective catalytic reduction and dry flue-gas desulfurization are suitable for NOx and SO2 reduction respectively. Thus, this model can help planners reduce these pollutants at marginal cost by suggesting pollution control devices, while accounting for dynamic meteorological conditions and mining activities.

Keywords: Air pollution, linear programming, mining, optimization, treatment technologies.

Downloads 1535
7785 Does Practice Reflect Theory? An Exploratory Study of a Successful Knowledge Management System

Authors: Janet L. Kourik, Peter E. Maher

Abstract:

To investigate the correspondence of theory and practice, a successfully implemented Knowledge Management System (KMS) is explored through the lens of Alavi and Leidner's proposed KMS framework for the analysis of an information system in knowledge management (Framework-AISKM). The applied KMS was designed to manage curricular knowledge in a distributed university environment. The motivation for the KMS is discussed along with the types of knowledge necessary in an academic setting. Elements of the KMS involved in all phases of capturing and disseminating knowledge are described. As the KMS matures, the resulting data stores form the precursor to, and the potential for, knowledge mining. The findings from this exploratory study indicate substantial correspondence between the successful KMS and the theory-based framework, providing provisional confirmation for the framework while suggesting factors that contributed to the system's success. Avenues for future work are described.

Keywords: Applied KMS, education, knowledge management (KM), KM framework, knowledge management system (KMS).

Downloads 996
7784 A Graph-Based Approach for Placement of No-Replicated Databases in Grid

Authors: Cherif Haddad, Faouzi Ben Charrada

Abstract:

In a wide-area environment such as a Grid, data placement is an important aspect of distributed database systems. In this paper, we address the problem of the initial placement of non-replicated database fragments in a Grid architecture. We propose a graph-based approach that takes resource restrictions into account. The goal is to optimize the use of computing, storage and communication resources. The proposed approach is developed in two phases: in the first phase, we group fragments using knowledge about fragment dependencies, and in the second phase, we determine an efficient placement of the fragment groups on the Grid. We also show, via experimental analysis, that our approach gives solutions that are close to optimal for different databases and Grid configurations.

Keywords: Grid computing, Distributed systems, Data resources management, Database systems, Database placement.

Downloads 1591
7783 Cross-Search Technique and its Visualization of Peer-to-Peer Distributed Clinical Documents

Authors: Yong Jun Choi, Juman Byun, Simon Berkovich

Abstract:

One of the ubiquitous routines in medical practice is searching through voluminous piles of clinical documents. In this paper we introduce a distributed system to search and exchange clinical documents. Clinical documents are distributed peer-to-peer. Relevant information is found in multiple iterations of cross-searches between the clinical text and its domain encyclopedia.

Keywords: Clinical documents, cross-search, document exchange, information retrieval, peer-to-peer.

Downloads 1254