Search results for: mining software repositories
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2627

Search results for: mining software repositories

2477 Explorative Data Mining of Constructivist Learning Experiences and Activities with Multiple Dimensions

Authors: Patrick Wessa, Bart Baesens

Abstract:

This paper discusses the use of explorative data mining tools that allow the educator to explore new relationships between reported learning experiences and actual activities, even if there are multiple dimensions with a large number of measured items. The underlying technology is based on the so-called Compendium Platform for Reproducible Computing (http://www.freestatistics.org) which was built on top the computational R Framework (http://www.wessa.net).

Keywords: Reproducible computing, data mining, explorative data analysis, compendium technology, computer assisted education

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1252
2476 Genetic-based Anomaly Detection in Logs of Process Aware Systems

Authors: Hanieh Jalali, Ahmad Baraani

Abstract:

Nowaday-s, many organizations use systems that support business process as a whole or partially. However, in some application domains, like software development and health care processes, a normative Process Aware System (PAS) is not suitable, because a flexible support is needed to respond rapidly to new process models. On the other hand, a flexible Process Aware System may be vulnerable to undesirable and fraudulent executions, which imposes a tradeoff between flexibility and security. In order to make this tradeoff available, a genetic-based anomaly detection model for logs of Process Aware Systems is presented in this paper. The detection of an anomalous trace is based on discovering an appropriate process model by using genetic process mining and detecting traces that do not fit the appropriate model as anomalous trace; therefore, when used in PAS, this model is an automated solution that can support coexistence of flexibility and security.

Keywords: Anomaly Detection, Genetic Algorithm, ProcessAware Systems, Process Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1924
2475 A Simplified and Effective Algorithm Used to Mine Similar Processes: An Illustrated Example

Authors: Min-Hsun Kuo, Yun-Shiow Chen

Abstract:

The running logs of a process hold valuable information about its executed activity behavior and generated activity logic structure. Theses informative logs can be extracted, analyzed and utilized to improve the efficiencies of the process's execution and conduction. One of the techniques used to accomplish the process improvement is called as process mining. To mine similar processes is such an improvement mission in process mining. Rather than directly mining similar processes using a single comparing coefficient or a complicate fitness function, this paper presents a simplified heuristic process mining algorithm with two similarity comparisons that are able to relatively conform the activity logic sequences (traces) of mining processes with those of a normalized (regularized) one. The relative process conformance is to find which of the mining processes match the required activity sequences and relationships, further for necessary and sufficient applications of the mined processes to process improvements. One similarity presented is defined by the relationships in terms of the number of similar activity sequences existing in different processes; another similarity expresses the degree of the similar (identical) activity sequences among the conforming processes. Since these two similarities are with respect to certain typical behavior (activity sequences) occurred in an entire process, the common problems, such as the inappropriateness of an absolute comparison and the incapability of an intrinsic information elicitation, which are often appeared in other process conforming techniques, can be solved by the relative process comparison presented in this paper. To demonstrate the potentiality of the proposed algorithm, a numerical example is illustrated.

Keywords: process mining, process similarity, artificial intelligence, process conformance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1442
2474 Mine Production Index (MPI): New Method to Evaluate Effectiveness of Mining Machinery

Authors: Amol Lanke, Hadi Hoseinie, Behzad Ghodrati

Abstract:

OEE has been used in many industries as measure of performance. However due to limitations of original OEE, it has been modified by various researchers. OEE for mining application is special version of classic equation, carries these limitation over. In this paper it has been aimed to modify the OEE for mining application by introducing the weights to the elements of it and termed as Mine Production index (MPi). As a special application of new index MPishovel has been developed by authors. This can be used for evaluating the shovel effectiveness. Based on analysis, utilization followed by performance and availability were ranked in this order. To check the applicability of this index, a case study was done on four electrical and one hydraulic shovel in a Swedish mine. The results shows that MPishovel can evaluate production effectiveness of shovels and can determine effectiveness values in optimistic view compared to OEE. MPi with calculation not only give the effectiveness but also can predict which elements should be focused for improving the productivity.

Keywords: Mining, Overall equipment efficiency (OEE), Mine Production index, Shovels.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4744
2473 Web Content Mining: A Solution to Consumer's Product Hunt

Authors: Syed Salman Ahmed, Zahid Halim, Rauf Baig, Shariq Bashir

Abstract:

With the rapid growth in business size, today's businesses orient towards electronic technologies. Amazon.com and e-bay.com are some of the major stakeholders in this regard. Unfortunately the enormous size and hugely unstructured data on the web, even for a single commodity, has become a cause of ambiguity for consumers. Extracting valuable information from such an everincreasing data is an extremely tedious task and is fast becoming critical towards the success of businesses. Web content mining can play a major role in solving these issues. It involves using efficient algorithmic techniques to search and retrieve the desired information from a seemingly impossible to search unstructured data on the Internet. Application of web content mining can be very encouraging in the areas of Customer Relations Modeling, billing records, logistics investigations, product cataloguing and quality management. In this paper we present a review of some very interesting, efficient yet implementable techniques from the field of web content mining and study their impact in the area specific to business user needs focusing both on the customer as well as the producer. The techniques we would be reviewing include, mining by developing a knowledge-base repository of the domain, iterative refinement of user queries for personalized search, using a graphbased approach for the development of a web-crawler and filtering information for personalized search using website captions. These techniques have been analyzed and compared on the basis of their execution time and relevance of the result they produced against a particular search.

Keywords: Data mining, web mining, search engines, knowledge discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2052
2472 A Framework to Support Reuse in Object-Oriented Software Development

Authors: Fathi Taibi

Abstract:

Reusability is a quality desired attribute in software products. Generally, it could be achieved through adopting development methods that promote it and achieving software qualities that have been linked with high reusability proneness. With the exponential growth in mobile application development, software reuse became an integral part in a substantial number of projects. Similarly, software reuse has become widely practiced in start-up companies. However, this has led to new emerging problems. Firstly, the reused code does not meet the required quality and secondly, the reuse intentions are dubious. This work aims to propose a framework to support reuse in Object-Oriented (OO) software development. The framework comprises a process that uses a proposed reusability assessment metric and a formal foundation to specify the elements of the reused code and the relationships between them. The framework is empirically evaluated using a wide range of open-source projects and mobile applications. The results are analyzed to help understand the reusability proneness of OO software and the possible means to improve it.

Keywords: Software reusability, software metrics, object-oriented software, modularity, low complexity, understandability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 379
2471 Static and Dynamic Complexity Analysis of Software Metrics

Authors: Kamaljit Kaur, Kirti Minhas, Neha Mehan, Namita Kakkar

Abstract:

Software complexity metrics are used to predict critical information about reliability and maintainability of software systems. Object oriented software development requires a different approach to software complexity metrics. Object Oriented Software Metrics can be broadly classified into static and dynamic metrics. Static Metrics give information at the code level whereas dynamic metrics provide information on the actual runtime. In this paper we will discuss the various complexity metrics, and the comparison between static and dynamic complexity.

Keywords: Static Complexity, Dynamic Complexity, Halstead Metric, Mc Cabe's Metric.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3213
2470 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we used data mining to extract biomedical knowledge. In general, complex biomedical data collected in studies of populations are treated by statistical methods, although they are robust, they are not sufficient in themselves to harness the potential wealth of data. For that you used in step two learning algorithms: the Decision Trees and Support Vector Machine (SVM). These supervised classification methods are used to make the diagnosis of thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.

Keywords: A classifier, Algorithms decision tree, knowledge extraction, Support Vector Machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1870
2469 Social and Economic Effects of Mining Industry Restructuring in Romania -Case Studies

Authors: Andra Costache, Gica Pehoiu

Abstract:

As in other countries from Central and Eastern Europe, the economic restructuring occurred in the last decade of the twentieth century affected the mining industry in Romania, an oversize and heavily subsidized sector before 1989. After more than a decade since the beginning of mining restructuring, an evaluation of current social implications of the process it is required, together with an efficiency analysis of the adaptation mechanisms developed at governmental level. This article aims to provide an insight into these issues through case studies conducted in the most important coal basin of Romania, Petroşani Depression.

Keywords: case studies, government programs, miningrestructuring, social effects.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2659
2468 Designing a Tool for Software Maintenance

Authors: Amir Ngah, Masita Abdul Jalil, Zailani Abdullah

Abstract:

The aim of software maintenance is to maintain the software system in accordance with advancement in software and hardware technology. One of the early works on software maintenance is to extract information at higher level of abstraction. In this paper, we present the process of how to design an information extraction tool for software maintenance. The tool can extract the basic information from old programs such as about variables, based classes, derived classes, objects of classes, and functions. The tool have two main parts; the lexical analyzer module that can read the input file character by character, and the searching module which users can get the basic information from the existing programs. We implemented this tool for a patterned sub-C++ language as an input file.

Keywords: Extraction tool, software maintenance, reverse engineering, C++.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2398
2467 Identification of Reusable Software Modules in Function Oriented Software Systems using Neural Network Based Technique

Authors: Sonia Manhas, Parvinder S. Sandhu, Vinay Chopra, Nirvair Neeru

Abstract:

The cost of developing the software from scratch can be saved by identifying and extracting the reusable components from already developed and existing software systems or legacy systems [6]. But the issue of how to identify reusable components from existing systems has remained relatively unexplored. We have used metric based approach for characterizing a software module. In this present work, the metrics McCabe-s Cyclometric Complexity Measure for Complexity measurement, Regularity Metric, Halstead Software Science Indicator for Volume indication, Reuse Frequency metric and Coupling Metric values of the software component are used as input attributes to the different types of Neural Network system and reusability of the software component is calculated. The results are recorded in terms of Accuracy, Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).

Keywords: Software reusability, Neural Networks, MAE, RMSE, Accuracy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1867
2466 A New Type of Integration Error and its Influence on Integration Testing Techniques

Authors: P. Prema, B. Ramadoss

Abstract:

Testing is an activity that is required both in the development and maintenance of the software development life cycle in which Integration Testing is an important activity. Integration testing is based on the specification and functionality of the software and thus could be called black-box testing technique. The purpose of integration testing is testing integration between software components. In function or system testing, the concern is with overall behavior and whether the software meets its functional specifications or performance characteristics or how well the software and hardware work together. This explains the importance and necessity of IT for which the emphasis is on interactions between modules and their interfaces. Software errors should be discovered early during IT to reduce the costs of correction. This paper introduces a new type of integration error, presenting an overview of Integration Testing techniques with comparison of each technique and also identifying which technique detects what type of error.

Keywords: Integration Error, Integration Error Types, Integration Testing Techniques, Software Testing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2216
2465 Forest Risk and Vulnerability Assessment: A Case Study from East Bokaro Coal Mining Area in India

Authors: Sujata Upgupta, Prasoon Kumar Singh

Abstract:

The expansion of large scale coal mining into forest areas is a potential hazard for the local biodiversity and wildlife. The objective of this study is to provide a picture of the threat that coal mining poses to the forests of the East Bokaro landscape. The vulnerable forest areas at risk have been assessed and the priority areas for conservation have been presented. The forested areas at risk in the current scenario have been assessed and compared with the past conditions using classification and buffer based overlay approach. Forest vulnerability has been assessed using an analytical framework based on systematic indicators and composite vulnerability index values. The results indicate that more than 4 km2 of forests have been lost from 1973 to 2016. Large patches of forests have been diverted for coal mining projects. Forests in the northern part of the coal field within 1-3 km radius around the coal mines are at immediate risk. The original contiguous forests have been converted into fragmented and degraded forest patches. Most of the collieries are located within or very close to the forests thus threatening the biodiversity and hydrology of the surrounding regions. Based on the vulnerability values estimated, it was concluded that more than 90% of the forested grids in East Bokaro are highly vulnerable to mining. The forests in the sub-districts of Bermo and Chandrapura have been identified as the most vulnerable to coal mining activities. This case study would add to the capacity of the forest managers and mine managers to address the risk and vulnerability of forests at a small landscape level in order to achieve sustainable development.

Keywords: Coal mining, forest, indicators, vulnerability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1160
2464 Software Reliability Prediction Model Analysis

Authors: L. Mirtskhulava, M. Khunjgurua, N. Lomineishvili, K. Bakuria

Abstract:

Software reliability prediction gives a great opportunity to measure the software failure rate at any point throughout system test. A software reliability prediction model provides with the technique for improving reliability. Software reliability is very important factor for estimating overall system reliability, which depends on the individual component reliabilities. It differs from hardware reliability in that it reflects the design perfection. Main reason of software reliability problems is high complexity of software. Various approaches can be used to improve the reliability of software. We focus on software reliability model in this article, assuming that there is a time redundancy, the value of which (the number of repeated transmission of basic blocks) can be an optimization parameter. We consider given mathematical model in the assumption that in the system may occur not only irreversible failures, but also a failure that can be taken as self-repairing failures that significantly affect the reliability and accuracy of information transfer. Main task of the given paper is to find a time distribution function (DF) of instructions sequence transmission, which consists of random number of basic blocks. We consider the system software unreliable; the time between adjacent failures has exponential distribution.

Keywords: Exponential distribution, conditional mean time to failure, distribution function, mathematical model, software reliability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1680
2463 Text-Mining Approach for Evaluation of Affective Management Practices

Authors: Masaaki Saito, Qin Tang, Hiroyuki Umemuro

Abstract:

The purpose of this paper is to propose a text mining approach to evaluate companies- practices on affective management. Affective management argues that it is critical to take stakeholders- affects into consideration during decision-making process, along with the traditional numerical and rational indices. CSR reports published by companies were collected as source information. Indices were proposed based on the frequency and collocation of words relevant to affective management concept using text mining approach to analyze the text information of CSR reports. In addition, the relationships between the results obtained using proposed indices and traditional indicators of business performance were investigated using correlation analysis. Those correlations were also compared between manufacturing and non-manufacturing companies. The results of this study revealed the possibility to evaluate affective management practices of companies based on publicly available text documents.

Keywords: Affective management, Affect, Stakeholder, Text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1845
2462 A New Dimension in Software Risk Managment

Authors: Masood Uzzafer

Abstract:

A dynamic risk management framework for software projects is presented. Currently available software risk management frameworks and risk assessment models are static in nature and lacks feedback capability. Such risk management frameworks are not capable of providing the risk assessment of futuristic changes in risk events. A dynamic risk management framework for software project is needed that provides futuristic assessment of risk events.

Keywords: Software Risk Management, Dynamic Models, Software Project Managment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1739
2461 Mining of Interesting Prediction Rules with Uniform Two-Level Genetic Algorithm

Authors: Bilal Alatas, Ahmet Arslan

Abstract:

The main goal of data mining is to extract accurate, comprehensible and interesting knowledge from databases that may be considered as large search spaces. In this paper, a new, efficient type of Genetic Algorithm (GA) called uniform two-level GA is proposed as a search strategy to discover truly interesting, high-level prediction rules, a difficult problem and relatively little researched, rather than discovering classification knowledge as usual in the literatures. The proposed method uses the advantage of uniform population method and addresses the task of generalized rule induction that can be regarded as a generalization of the task of classification. Although the task of generalized rule induction requires a lot of computations, which is usually not satisfied with the normal algorithms, it was demonstrated that this method increased the performance of GAs and rapidly found interesting rules.

Keywords: Classification rule mining, data mining, genetic algorithms.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1592
2460 Automata Theory Approach for Solving Frequent Pattern Discovery Problems

Authors: Renáta Iváncsy, István Vajk

Abstract:

The various types of frequent pattern discovery problem, namely, the frequent itemset, sequence and graph mining problems are solved in different ways which are, however, in certain aspects similar. The main approach of discovering such patterns can be classified into two main classes, namely, in the class of the levelwise methods and in that of the database projection-based methods. The level-wise algorithms use in general clever indexing structures for discovering the patterns. In this paper a new approach is proposed for discovering frequent sequences and tree-like patterns efficiently that is based on the level-wise issue. Because the level-wise algorithms spend a lot of time for the subpattern testing problem, the new approach introduces the idea of using automaton theory to solve this problem.

Keywords: Frequent pattern discovery, graph mining, pushdownautomaton, sequence mining, state machine, tree mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1626
2459 Modeling Metrics for Monitoring Software Project Performance Based On the GQM Model

Authors: Mariayee Doraisamy, Suhaimi Bin Ibrahim, Mohd Naz’ri Mahrin

Abstract:

There are several methods to monitor software projects and the objective for monitoring is to ensure that the software projects are developed and delivered successfully. A performance measurement is a method that is closely associated with monitoring and it can be scrutinized by looking at two important attributes which are efficiency and effectiveness both of which are factors that are important for the success of a software project. Consequently, a successful steering is achieved by monitoring and controlling a software project via the performance measurement criteria and metrics. Hence, this paper is aimed at identifying the performance measurement criteria and the metrics for monitoring the performance of a software project by using the Goal Question Metrics (GQM) approach. The GQM approach is utilized to ensure that the identified metrics are reliable and useful. These identified metrics are useful guidelines for project managers to monitor the performance of their software projects.

Keywords: Software project performance, Goal Question Metrics, Performance Measurement Criteria, Metrics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2525
2458 Towards a Suitable and Systematic Approach for Component Based Software Development

Authors: Kuljit Kaur, Parminder Kaur, Jaspreet Bedi, Hardeep Singh

Abstract:

Software crisis refers to the situation in which the developers are not able to complete the projects within time and budget constraints and moreover these overscheduled and over budget projects are of low quality as well. Several methodologies have been adopted form time to time to overcome this situation and now in the focus is component based software engineering. In this approach, emphasis is on reuse of already existing software artifacts. But the results can not be achieved just by preaching the principles; they need to be practiced as well. This paper highlights some of the very basic elements of this approach, which has to be in place to get the desired goals of high quality, low cost with shorter time-to-market software products.

Keywords: Component Model, Software Components, SoftwareRepository, Process Models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1764
2457 Mining Correlated Bicluster from Web Usage Data Using Discrete Firefly Algorithm Based Biclustering Approach

Authors: K. Thangavel, R. Rathipriya

Abstract:

For the past one decade, biclustering has become popular data mining technique not only in the field of biological data analysis but also in other applications like text mining, market data analysis with high-dimensional two-way datasets. Biclustering clusters both rows and columns of a dataset simultaneously, as opposed to traditional clustering which clusters either rows or columns of a dataset. It retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. Firefly Algorithm (FA) is a recently-proposed metaheuristic inspired by the collective behavior of fireflies. This paper provides a preliminary assessment of discrete version of FA (DFA) while coping with the task of mining coherent and large volume bicluster from web usage dataset. The experiments were conducted on two web usage datasets from public dataset repository whereby the performance of FA was compared with that exhibited by other population-based metaheuristic called binary Particle Swarm Optimization (PSO). The results achieved demonstrate the usefulness of DFA while tackling the biclustering problem.

Keywords: Biclustering, Binary Particle Swarm Optimization, Discrete Firefly Algorithm, Firefly Algorithm, Usage profile Web usage mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2133
2456 Performance Analysis of Software Reliability Models using Matrix Method

Authors: RajPal Garg, Kapil Sharma, Rajive Kumar, R. K. Garg

Abstract:

This paper presents a computational methodology based on matrix operations for a computer based solution to the problem of performance analysis of software reliability models (SRMs). A set of seven comparison criteria have been formulated to rank various non-homogenous Poisson process software reliability models proposed during the past 30 years to estimate software reliability measures such as the number of remaining faults, software failure rate, and software reliability. Selection of optimal SRM for use in a particular case has been an area of interest for researchers in the field of software reliability. Tools and techniques for software reliability model selection found in the literature cannot be used with high level of confidence as they use a limited number of model selection criteria. A real data set of middle size software project from published papers has been used for demonstration of matrix method. The result of this study will be a ranking of SRMs based on the Permanent value of the criteria matrix formed for each model based on the comparison criteria. The software reliability model with highest value of the Permanent is ranked at number – 1 and so on.

Keywords: Matrix method, Model ranking, Model selection, Model selection criteria, Software reliability models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2316
2455 Computer Software for Calculating Electron Mobility of Semiconductors Compounds; Case Study for N-Gan

Authors: Emad A. Ahmed

Abstract:

Computer software to calculate electron mobility with respect to different scattering mechanism has been developed. This software is adopted completely Graphical User Interface (GUI) technique and its interface has been designed by Microsoft Visual basic 6.0. As a case study the electron mobility of n-GaN was performed using this software. The behavior of the mobility for n-GaN due to elastic scattering processes and its relation to temperature and doping concentration were discussed. The results agree with other available theoretical and experimental data.

Keywords: Electron mobility, relaxation time, GaN, Scattering, Computer software, computation physics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3862
2454 A Sequential Pattern Mining Method Based On Sequential Interestingness

Authors: Shigeaki Sakurai, Youichi Kitahara, Ryohei Orihara

Abstract:

Sequential mining methods efficiently discover all frequent sequential patterns included in sequential data. These methods use the support, which is the previous criterion that satisfies the Apriori property, to evaluate the frequency. However, the discovered patterns do not always correspond to the interests of analysts, because the patterns are common and the analysts cannot get new knowledge from the patterns. The paper proposes a new criterion, namely, the sequential interestingness, to discover sequential patterns that are more attractive for the analysts. The paper shows that the criterion satisfies the Apriori property and how the criterion is related to the support. Also, the paper proposes an efficient sequential mining method based on the proposed criterion. Lastly, the paper shows the effectiveness of the proposed method by applying the method to two kinds of sequential data.

Keywords: Sequential mining, Support, Confidence, Apriori property

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1275
2453 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-

Authors: Nieto Bernal Wilson, Carmona Suarez Edgar

Abstract:

The organizations have structured and unstructured information in different formats, sources, and systems. Part of these come from ERP under OLTP processing that support the information system, however these organizations in OLAP processing level, presented some deficiencies, part of this problematic lies in that does not exist interesting into extract knowledge from their data sources, as also the absence of operational capabilities to tackle with these kind of projects.  Data Warehouse and its applications are considered as non-proprietary tools, which are of great interest to business intelligence, since they are repositories basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. The following paper present a structured methodology, simple, inspired from the agile development models as Scrum, XP and AUP. Also the models object relational, spatial data models, and the base line of data modeling under UML and Big data, from this way sought to deliver an agile methodology for the developing of data warehouses, simple and of easy application. The methodology naturally take into account the application of process for the respectively information analysis, visualization and data mining, particularly for patterns generation and derived models from the objects facts structured.

Keywords: Data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1478
2452 Feature Based Unsupervised Intrusion Detection

Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein

Abstract:

The goal of a network-based intrusion detection system is to classify activities of network traffics into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning plays an important role in many sciences; including intrusion detection system (IDS) using both supervised and unsupervised techniques. However, one of the essential steps of data mining is feature selection that helps in improving the efficiency, performance and prediction rate of proposed approach. This paper applies unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we have used the new NSL-KDD dataset, which is a modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a 2 class classifications have been implemented (Normal, Attack). Weka framework which is a java based open source software consists of a collection of machine learning algorithms for data mining tasks has been used in the testing process. The experimental results show that the proposed approach is very accurate with low false positive rate and high true positive rate and it takes less learning time in comparison with using the full features of the dataset with the same algorithm.

Keywords: Information Gain (IG), Intrusion Detection System (IDS), K-means Clustering, Weka.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2776
2451 Multiple-Level Sequential Pattern Discovery from Customer Transaction Databases

Authors: An Chen, Huilin Ye

Abstract:

Mining sequential patterns from large customer transaction databases has been recognized as a key research topic in database systems. However, the previous works more focused on mining sequential patterns at a single concept level. In this study, we introduced concept hierarchies into this problem and present several algorithms for discovering multiple-level sequential patterns based on the hierarchies. An experiment was conducted to assess the performance of the proposed algorithms. The performances of the algorithms were measured by the relative time spent on completing the mining tasks on two different datasets. The experimental results showed that the performance depends on the characteristics of the datasets and the pre-defined threshold of minimal support for each level of the concept hierarchy. Based on the experimental results, some suggestions were also given for how to select appropriate algorithm for a certain datasets.

Keywords: Data Mining, Multiple-Level Sequential Pattern, Concept Hierarchy, Customer Transaction Database.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1453
2450 A Review on Factors Influencing Implementation of Secure Software Development Practices

Authors: Sri Lakshmi Kanniah, Mohd Naz’ri Mahrin

Abstract:

More and more businesses and services are depending on software to run their daily operations and business services. At the same time, cyber-attacks are becoming more covert and sophisticated, posing threats to software. Vulnerabilities exist in the software due to the lack of security practices during the phases of software development. Implementation of secure software development practices can improve the resistance to attacks. Many methods, models and standards for secure software development have been developed. However, despite the efforts, they still come up against difficulties in their deployment and the processes are not institutionalized. There is a set of factors that influence the successful deployment of secure software development processes. In this study, the methodology and results from a systematic literature review of factors influencing the implementation of secure software development practices is described. A total of 44 primary studies were analysed as a result of the systematic review. As a result of the study, a list of twenty factors has been identified. Some of factors that affect implementation of secure software development practices are: Involvement of the security expert, integration between security and development team, developer’s skill and expertise, development time and communication between stakeholders. The factors were further classified into four categories which are institutional context, people and action, project content and system development process. The results obtained show that it is important to take into account organizational, technical and people issues in order to implement secure software development initiatives.

Keywords: Secure software development, software development, software security, systematic literature review.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2493
2449 Generator of Hypotheses an Approach of Data Mining Based on Monotone Systems Theory

Authors: Rein Kuusik, Grete Lind

Abstract:

Generator of hypotheses is a new method for data mining. It makes possible to classify the source data automatically and produces a particular enumeration of patterns. Pattern is an expression (in a certain language) describing facts in a subset of facts. The goal is to describe the source data via patterns and/or IF...THEN rules. Used evaluation criteria are deterministic (not probabilistic). The search results are trees - form that is easy to comprehend and interpret. Generator of hypotheses uses very effective algorithm based on the theory of monotone systems (MS) named MONSA (MONotone System Algorithm).

Keywords: data mining, monotone systems, pattern, rule.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1255
2448 Plant Varieties Selection System

Authors: Kitti Koonsanit, Chuleerat Jaruskulchai, Poonsak Miphokasap, Apisit Eiumnoh

Abstract:

In the end of the day, meteorological data and environmental data becomes widely used such as plant varieties selection system. Variety plant selection for planted area is of almost importance for all crops, including varieties of sugarcane. Since sugarcane have many varieties. Variety plant non selection for planting may not be adapted to the climate or soil conditions for planted area. Poor growth, bloom drop, poor fruit, and low price are to be from varieties which were not recommended for those planted area. This paper presents plant varieties selection system for planted areas in Thailand from meteorological data and environmental data by the use of decision tree techniques. With this software developed as an environmental data analysis tool, it can analyze resulting easier and faster. Our software is a front end of WEKA that provides fundamental data mining functions such as classify, clustering, and analysis functions. It also supports pre-processing, analysis, and decision tree output with exporting result. After that, our software can export and display data result to Google maps API in order to display result and plot plant icons effectively.

Keywords: Plant varieties selection system, decision tree, expert recommendation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1792