Search results for: mining software repositories
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2627

Search results for: mining software repositories

2357 The Cost of Innovation in Software Development Projects

Authors: Mihai Liviu Despa

Abstract:

The paper tackles the topic of determining the cost of innovation in software development projects. Innovation can be achieved either in a planned or unplanned manner. The paper approaches the scenarios were innovation is planned for. As a starting point an innovative software development project is analyzed. The project is depicted step by step as it was implemented, from inception to delivery. Costs that are proprietary to innovation in software development are isolated based on the author’s personal experience in managing the above mentioned project. Innovation costs components identified by the author are then validated using open discussions with software development professionals and projects managers on LinkedIn groups. In order to receive relevant feedback only groups that focus on software development and innovation management are targeted. Additional innovation cost components suggested by software development professionals and projects managers are also considered. Based on the identified cost components an indicator is built. The indicator is meant to formalize the process of determining the cost of innovation in a software development project. The indicator aggregates all the innovation cost components that are identified in the research process. The process of calculating each cost component is also described. Conclusions are formulated and new related research topics are submitted for debate.

Keywords: Innovation cost, IT project management, software development.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2071
2356 Augmenting Use Case View for Modeling

Authors: Pradip Peter Dey, Bhaskar Raj Sinha, Mohammad Amin, Hassan Badkoobehi

Abstract:

Mathematical, graphical and intuitive models are often constructed in the development process of computational systems. The Unified Modeling Language (UML) is one of the most popular modeling languages used by practicing software engineers. This paper critically examines UML models and suggests an augmented use case view with the addition of new constructs for modeling software. It also shows how a use case diagram can be enhanced. The improved modeling constructs are presented with examples for clarifying important design and implementation issues.

Keywords: Software architecture, software design, Unified Modeling Language (UML), user interface.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1945
2355 An Assessment of Software Process Optimization Compared to International Best Practice in Bangladesh

Authors: Mohammad Shahadat Hossain Chowdhury, Tania Taharima Chowdhary, Hasan Sarwar

Abstract:

The challenge for software development house in Bangladesh is to find a path of using minimum process rather than CMMI or ISO type gigantic practice and process area. The small and medium size organization in Bangladesh wants to ensure minimum basic Software Process Improvement (SPI) in day to day operational activities. Perhaps, the basic practices will ensure to realize their company's improvement goals. This paper focuses on the key issues in basic software practices for small and medium size software organizations, who are unable to effort the CMMI, ISO, ITIL etc. compliance certifications. This research also suggests a basic software process practices model for Bangladesh and it will show the mapping of our suggestions with international best practice. In this IT competitive world for software process improvement, Small and medium size software companies that require collaboration and strengthening to transform their current perspective into inseparable global IT scenario. This research performed some investigations and analysis on some projects- life cycle, current good practice, effective approach, reality and pain area of practitioners, etc. We did some reasoning, root cause analysis, comparative analysis of various approach, method, practice and justifications of CMMI and real life. We did avoid reinventing the wheel, where our focus is for minimal practice, which will ensure a dignified satisfaction between organizations and software customer.

Keywords: Compare with CMMI practices, Key success factors, Small and medium software house, Software process improvement; Software process optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1855
2354 Applying Fuzzy FP-Growth to Mine Fuzzy Association Rules

Authors: Chien-Hua Wang, Wei-Hsuan Lee, Chin-Tzong Pang

Abstract:

In data mining, the association rules are used to find for the associations between the different items of the transactions database. As the data collected and stored, rules of value can be found through association rules, which can be applied to help managers execute marketing strategies and establish sound market frameworks. This paper aims to use Fuzzy Frequent Pattern growth (FFP-growth) to derive from fuzzy association rules. At first, we apply fuzzy partition methods and decide a membership function of quantitative value for each transaction item. Next, we implement FFP-growth to deal with the process of data mining. In addition, in order to understand the impact of Apriori algorithm and FFP-growth algorithm on the execution time and the number of generated association rules, the experiment will be performed by using different sizes of databases and thresholds. Lastly, the experiment results show FFPgrowth algorithm is more efficient than other existing methods.

Keywords: Data mining, association rule, fuzzy frequent patterngrowth.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1799
2353 Applications of Genetic Programming in Data Mining

Authors: Saleh Mesbah Elkaffas, Ahmed A. Toony

Abstract:

This paper details the application of a genetic programming framework for induction of useful classification rules from a database of income statements, balance sheets, and cash flow statements for North American public companies. Potentially interesting classification rules are discovered. Anomalies in the discovery process merit further investigation of the application of genetic programming to the dataset for the problem domain.

Keywords: Genetic programming, data mining classification rule.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1544
2352 Similar Cultural Factors Compensate for Communication Problems in Japan's Software Globalization Business

Authors: Phong Tran

Abstract:

A research effort to find the reality of the business of Japan-s software globalization of enterprise-level business software systems has found that while the number of Japan-made enterpriselevel software systems is comparable with those of the other G7 countries, the business is limited to the East and Southeast Asian markets. This indicates that this business has a problem in the European and USA markets. Based on the knowledge that the research has established, the research concludes that the communication problems arise from the lack of individualists' communication styles and foreign language skills in Japan's software globalization is compensated by similarities in certain Japanese cultural factors and Japan's cultural power in the East and Southeast Asian markets and that this business does not have this compensation factor in the European and American markets due to dissimilarities and no cultural power.

Keywords: Cultural factors, global business, Japan, software globalization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1572
2351 Software Industrialization in Systems Integration

Authors: Matthias Minich, B. Harriehausen-Muehlbauer, C. Wentzel

Abstract:

Today-s economy is in a permanent change, causing merger and acquisitions and co operations between enterprises. As a consequence, process adaptations and realignments result in systems integration and software development projects. Processes and procedures to execute such projects are still reliant on craftsman-ship of highly skilled workers. A generally accepted, industrialized production, characterized by high efficiency and quality, seems inevitable. In spite of this, current concepts of software industrialization are aimed at traditional software engineering and do not consider the characteristics of systems integration. The present work points out these particularities and discusses the applicability of existing industrial concepts in the systems integration domain. Consequently it defines further areas of research necessary to bring the field of systems integration closer to an industrialized production, allowing a higher efficiency, quality and return on investment.

Keywords: Software Industrialization, Systems Integration, Software Product Lines, Component Based Development, Model Driven Development.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2528
2350 A Proposal for Systematic Mapping Study of Software Security Testing, Verification and Validation

Authors: Adriano Bessa Albuquerque, Francisco Jose Barreto Nunes

Abstract:

Software vulnerabilities are increasing and not only impact services and processes availability as well as information confidentiality, integrity and privacy, but also cause changes that interfere in the development process. Security test could be a solution to reduce vulnerabilities. However, the variety of test techniques with the lack of real case studies of applying tests focusing on software development life cycle compromise its effective use. This paper offers an overview of how a Systematic Mapping Study (MS) about security verification, validation and test (VVT) was performed, besides presenting general results about this study.

Keywords: Software test, software security verification validation and test, security test institutionalization, systematic mapping study.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1624
2349 Architecture, Implementation and Application of Tools for Experimental Analysis

Authors: Tom Dowling, Adam Duffy

Abstract:

This paper presents an architecture to assist in the development of tools to perform experimental analysis. Existing implementations of tools based on this architecture are also described in this paper. These tools are applied to the real world problem of fault attack emulation and detection in cryptographic algorithms.

Keywords: Software Architectures and Design, Software Componentsand Reuse, Engineering Secure Software.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1400
2348 Agile Software Development Implementation in Developing a Diet Tracker Mobile Application

Authors: Dwi Puspita Sari, Gulnur Baltabayeva, Nadia Salman, Maxut Toleuov, Vijay Kanabar

Abstract:

Technology era drives people to use mobile phone to support their daily life activities. Technology development has a rapid phase which pushes the IT company to adjust any technology changes in order to fulfill customer’s satisfaction. As a result of that, many companies in the USA emerged from systematics software development approach to agile software development approach in developing systems and applications to develop many mobile phone applications in a short phase to fulfill user’s needs. As a systematic approach is considered as time consuming, costly, and too risky, agile software development has become a more popular approach to use for developing software including mobile applications. This paper reflects a short-term project to develop a diet tracker mobile application using agile software development that focused on applying scrum framework in the development process.

Keywords: Agile software development, scrum, diet tracker, mobile application.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 898
2347 Novelty as a Measure of Interestingness in Knowledge Discovery

Authors: Vasudha Bhatnagar, Ahmed Sultan Al-Hegami, Naveen Kumar

Abstract:

Rule Discovery is an important technique for mining knowledge from large databases. Use of objective measures for discovering interesting rules leads to another data mining problem, although of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules to ultimately improve the overall efficiency of KDD process. In this paper we study novelty of the discovered rules as a subjective measure of interestingness. We propose a hybrid approach based on both objective and subjective measures to quantify novelty of the discovered rules in terms of their deviations from the known rules (knowledge). We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to the user specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are promising.

Keywords: Knowledge Discovery in Databases (KDD), Interestingness, Subjective Measures, Novelty Index.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1806
2346 An Automated Approach to the Nozzle Configuration of Polycrystalline Diamond Compact Drill Bits for Effective Cuttings Removal

Authors: R. Suresh, Pavan Kumar Nimmagadda, Ming Zo Tan, Shane Hart, Sharp Ugwuocha

Abstract:

Polycrystalline diamond compact (PDC) drill bits are extensively used in the oil and gas industry as well as the mining industry. Industry engineers continually improve upon PDC drill bit designs and hydraulic conditions. Optimized injection nozzles play a key role in improving the drilling performance and efficiency of these ever changing PDC drill bits. In the first part of this study, computational fluid dynamics (CFD) modelling is performed to investigate the hydrodynamic characteristics of drilling fluid flow around the PDC drill bit. An Open-source CFD software – OpenFOAM simulates the flow around the drill bit, based on the field input data. A specifically developed console application integrates the entire CFD process including, domain extraction, meshing, and solving governing equations and post-processing. The results from the OpenFOAM solver are then compared with that of the ANSYS Fluent software. The data from both software programs agree. The second part of the paper describes the parametric study of the PDC drill bit nozzle to determine the effect of parameters such as number of nozzles, nozzle velocity, nozzle radial position and orientations on the flow field characteristics and bit washing patterns. After analyzing a series of nozzle configurations, the best configuration is identified and recommendations are made for modifying the PDC bit design.

Keywords: ANSYS Fluent, computational fluid dynamics, nozzle configuration, OpenFOAM, PDC dill bit.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 986
2345 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning

Authors: Walid Cherif

Abstract:

Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.

Keywords: Data mining, knowledge discovery, machine learning, similarity measurement, supervised classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1527
2344 Cloud Computing-s Software-as-a-Service (SaaS) Delivery Model Benefits Technical Courses in Higher Education

Authors: Janet L. Kourik, Jiangping Wang

Abstract:

Software-as-a-Service (SaaS) is a form of cloud computing that relieves the user of the burden of hardware and software installation and management. SaaS can be used at the course level to enhance curricula and student experience. When cloud computing and SaaS are included in educational literature, the focus is typically on implementing administrative functions. Yet, SaaS can make more immediate and substantial contributions to the technical course content in educational offerings. This paper explores cloud computing and SaaS, provides examples, reports on experiences using SaaS to offer specialized software in courses, and analyzes the advantages and disadvantages of using SaaS at the course level. The paper contributes to the literature in higher education by analyzing the major technical concepts, potential, and constraints for using SaaS to deliver specialized software at the course level. Further it may enable more educators and students to benefit from this emerging technology.

Keywords: Cloud computing, software-as-a-service, e-service, higher education.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2424
2343 A Review on WEB Resources in Teaching of Geotechnical Engineering

Authors: Amin Chegenizadeh, Hamid Nikraz

Abstract:

The use of computer hardware and software in education and training dates to the early 1940s, when American researchers developed flight simulators which used analog computers to generate simulated onboard instrument data.Computer software is widely used to help engineers and undergraduate student solve their problems quickly and more accurately. This paper presents the list of computer software in geotechnical engineering.

Keywords: Geotechnical, Teaching, Courseware

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1743
2342 An Investigation on the Variation of Software Development Productivity

Authors: Zhizhong Jiang, Peter Naudé, Craig Comstock

Abstract:

The productivity of software development is one of the major concerns for project managers. Given the increasing complexity of the software being developed and the concomitant rise in the typical project size, the productivity has not consistently improved. By analyzing the latest release of ISBSG data repository with 4106 projects ever developed, we report on the factors found to significantly influence productivity, and present an original model for the estimation of productivity during project design. We further illustrate that software development productivity has experienced irregular variations between the years 1995 and 2005. Considering the factors significant to productivity, we found its variations are primarily caused by the variations of average team size for the development and the unbalanced use of the less productive development language 3GL.

Keywords: Development Platform, Function Point, Language, Productivity, Software Engineering, Team Size.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1675
2341 Requirement Engineering and Software Product Line Scoping Paradigm

Authors: Ahmed Mateen, Zhu Qingsheng, Faisal Shahzad

Abstract:

Requirement Engineering (RE) is a part being created for programming structure during the software development lifecycle. Software product line development is a new topic area within the domain of software engineering. It also plays important role in decision making and it is ultimately helpful in rising business environment for productive programming headway. Decisions are central to engineering processes and they hold them together. It is argued that better decisions will lead to better engineering. To achieve better decisions requires that they are understood in detail. In order to address the issues, companies are moving towards Software Product Line Engineering (SPLE) which helps in providing large varieties of products with minimum development effort and cost. This paper proposed a new framework for software product line and compared with other models. The results can help to understand the needs in SPL testing, by identifying points that still require additional investigation. In our future scenario, we will combine this model in a controlled environment with industrial SPL projects which will be the new horizon for SPL process management testing strategies.

Keywords: Requirements engineering, software product lines, scoping, process structure, domain specific language.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 828
2340 Measuring Process Component Design on Achieving Managerial Goals

Authors: Eakong Atiptamvaree, Twittie Senivongse

Abstract:

Process-oriented software development is a new software development paradigm in which software design is modeled by a business process which is in turn translated into a process execution language for execution. The building blocks of this paradigm are software units that are composed together to work according to the flow of the business process. This new paradigm still exhibits the characteristic of the applications built with the traditional software component technology. This paper discusses an approach to apply a traditional technique for software component fabrication to the design of process-oriented software units, called process components. These process components result from decomposing a business process of a particular application domain into subprocesses, and these process components can be reused to design the business processes of other application domains. The decomposition considers five managerial goals, namely cost effectiveness, ease of assembly, customization, reusability, and maintainability. The paper presents how to design or decompose process components from a business process model and measure some technical features of the design that would affect the managerial goals. A comparison between the measurement values from different designs can tell which process component design is more appropriate for the managerial goals that have been set. The proposed approach can be applied in Web Services environment which accommodates process-oriented software development.

Keywords: Business Process Model, Managerial Goals, ProcessComponent.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1512
2339 A Cumulative Learning Approach to Data Mining Employing Censored Production Rules (CPRs)

Authors: Rekha Kandwal, Kamal K.Bharadwaj

Abstract:

Knowledge is indispensable but voluminous knowledge becomes a bottleneck for efficient processing. A great challenge for data mining activity is the generation of large number of potential rules as a result of mining process. In fact sometimes result size is comparable to the original data. Traditional data mining pruning activities such as support do not sufficiently reduce the huge rule space. Moreover, many practical applications are characterized by continual change of data and knowledge, thereby making knowledge voluminous with each change. The most predominant representation of the discovered knowledge is the standard Production Rules (PRs) in the form If P Then D. Michalski & Winston proposed Censored Production Rules (CPRs), as an extension of production rules, that exhibit variable precision and supports an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form: If P Then D Unless C, where C (Censor) is an exception to the rule. Such rules are employed in situations in which the conditional statement 'If P Then D' holds frequently and the assertion C holds rarely. By using a rule of this type we are free to ignore the exception conditions, when the resources needed to establish its presence, are tight or there is simply no information available as to whether it holds or not. Thus the 'If P Then D' part of the CPR expresses important information while the Unless C part acts only as a switch changes the polarity of D to ~D. In this paper a scheme based on Dempster-Shafer Theory (DST) interpretation of a CPR is suggested for discovering CPRs from the discovered flat PRs. The discovery of CPRs from flat rules would result in considerable reduction of the already discovered rules. The proposed scheme incrementally incorporates new knowledge and also reduces the size of knowledge base considerably with each episode. Examples are given to demonstrate the behaviour of the proposed scheme. The suggested cumulative learning scheme would be useful in mining data streams.

Keywords: Censored production rules, cumulative learning, data mining, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1484
2338 A Cognitive Measurement of Complexity and Comprehension for Object-Oriented Code

Authors: Amit Kumar Jakhar, Kumar Rajnish

Abstract:

Inherited complexity is one of the difficult tasks in software engineering field. Further, it is said that there is no physical laws or standard guidelines suit for designing different types of software. Hence, to make the software engineering as a matured engineering discipline like others, it is necessary that it has its own theoretical frameworks and laws. Software designing and development is a human effort which takes a lot of time and considers various parameters for successful completion of the software. The cognitive informatics plays an important role for understanding the essential characteristics of the software. The aim of this work is to consider the fundamental characteristics of the source code of Object-Oriented software i.e. complexity and understandability. The complexity of the programs is analyzed with the help of extracted important attributes of the source code, which is further utilized to evaluate the understandability factor. The aforementioned characteristics are analyzed on the basis of 16 C++ programs by distributing them to forty MCA students. They all tried to understand the source code of the given program and mean time is taken as the actual time needed to understand the program. For validation of this work, Briand’s framework is used and the presented metric is also evaluated comparatively with existing metric which proves its robustness.

Keywords: Software metrics, object-oriented, complexity, cognitive weight, understandability, basic control structures.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1122
2337 Multi-view Description of Real-Time Systems- Architecture

Authors: A. Bessam, M. T. Kimour

Abstract:

Real-time embedded systems should benefit from component-based software engineering to handle complexity and deal with dependability. In these systems, applications should not only be logically correct but also behave within time windows. However, in the current component based software engineering approaches, a few of component models handles time properties in a manner that allows efficient analysis and checking at the architectural level. In this paper, we present a meta-model for component-based software description that integrates timing issues. To achieve a complete functional model of software components, our meta-model focuses on four functional aspects: interface, static behavior, dynamic behavior, and interaction protocol. With each aspect we have explicitly associated a time model. Such a time model can be used to check a component-s design against certain properties and to compute the timing properties of component assemblies.

Keywords: Real-time systems, Software architecture, software component, dependability, time properties, ADL, metamodeling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1635
2336 Improving the Performance of Proxy Server by Using Data Mining Technique

Authors: P. Jomsri

Abstract:

Currently, web usage make a huge data from a lot of user attention. In general, proxy server is a system to support web usage from user and can manage system by using hit rates. This research tries to improve hit rates in proxy system by applying data mining technique. The data set are collected from proxy servers in the university and are investigated relationship based on several features. The model is used to predict the future access websites. Association rule technique is applied to get the relation among Date, Time, Main Group web, Sub Group web, and Domain name for created model. The results showed that this technique can predict web content for the next day, moreover the future accesses of websites increased from 38.15% to 85.57 %. This model can predict web page access which tends to increase the efficient of proxy servers as a result. In additional, the performance of internet access will be improved and help to reduce traffic in networks.

Keywords: Association rule, proxy server, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3062
2335 Application of Artificial Neural Network for Predicting Maintainability Using Object-Oriented Metrics

Authors: K. K. Aggarwal, Yogesh Singh, Arvinder Kaur, Ruchika Malhotra

Abstract:

Importance of software quality is increasing leading to development of new sophisticated techniques, which can be used in constructing models for predicting quality attributes. One such technique is Artificial Neural Network (ANN). This paper examined the application of ANN for software quality prediction using Object- Oriented (OO) metrics. Quality estimation includes estimating maintainability of software. The dependent variable in our study was maintenance effort. The independent variables were principal components of eight OO metrics. The results showed that the Mean Absolute Relative Error (MARE) was 0.265 of ANN model. Thus we found that ANN method was useful in constructing software quality model.

Keywords: Software quality, Measurement, Metrics, Artificial neural network, Coupling, Cohesion, Inheritance, Principal component analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2572
2334 Classifier Based Text Mining for Neural Network

Authors: M. Govindarajan, R. M. Chandrasekaran

Abstract:

Text Mining is around applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), or Text data mining or Text Mining. In Neural Network that address classification problems, training set, testing set, learning rate are considered as key tasks. That is collection of input/output patterns that are used to train the network and used to assess the network performance, set the rate of adjustments. This paper describes a proposed back propagation neural net classifier that performs cross validation for original Neural Network. In order to reduce the optimization of classification accuracy, training time. The feasibility the benefits of the proposed approach are demonstrated by means of five data sets like contact-lenses, cpu, weather symbolic, Weather, labor-nega-data. It is shown that , compared to exiting neural network, the training time is reduced by more than 10 times faster when the dataset is larger than CPU or the network has many hidden units while accuracy ('percent correct') was the same for all datasets but contact-lences, which is the only one with missing attributes. For contact-lences the accuracy with Proposed Neural Network was in average around 0.3 % less than with the original Neural Network. This algorithm is independent of specify data sets so that many ideas and solutions can be transferred to other classifier paradigms.

Keywords: Back propagation, classification accuracy, textmining, time complexity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4217
2333 Text Mining Technique for Data Mining Application

Authors: M. Govindarajan

Abstract:

Text Mining is around applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), or Text data mining or Text Mining. In decision tree approach is most useful in classification problem. With this technique, tree is constructed to model the classification process. There are two basic steps in the technique: building the tree and applying the tree to the database. This paper describes a proposed C5.0 classifier that performs rulesets, cross validation and boosting for original C5.0 in order to reduce the optimization of error ratio. The feasibility and the benefits of the proposed approach are demonstrated by means of medial data set like hypothyroid. It is shown that, the performance of a classifier on the training cases from which it was constructed gives a poor estimate by sampling or using a separate test file, either way, the classifier is evaluated on cases that were not used to build and evaluate the classifier are both are large. If the cases in hypothyroid.data and hypothyroid.test were to be shuffled and divided into a new 2772 case training set and a 1000 case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of see5 is its ability to classifiers called rulesets. The ruleset has an error rate 0.5 % on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive is by f-fold –cross- validation. The error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.

Keywords: C5.0, Error Ratio, text mining, training data, test data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2488
2332 Analysis of a Population of Diabetic Patients Databases with Classifiers

Authors: Murat Koklu, Yavuz Unal

Abstract:

Data mining can be called as a technique to extract information from data. It is the process of obtaining hidden information and then turning it into qualified knowledge by statistical and artificial intelligence technique. One of its application areas is medical area to form decision support systems for diagnosis just by inventing meaningful information from given medical data. In this study a decision support system for diagnosis of illness that make use of data mining and three different artificial intelligence classifier algorithms namely Multilayer Perceptron, Naive Bayes Classifier and J.48. Pima Indian dataset of UCI Machine Learning Repository was used. This dataset includes urinary and blood test results of 768 patients. These test results consist of 8 different feature vectors. Obtained classifying results were compared with the previous studies. The suggestions for future studies were presented.

Keywords: Artificial Intelligence, Classifiers, Data Mining, Diabetic Patients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5431
2331 Ensemble Approach for Predicting Student's Academic Performance

Authors: L. A. Muhammad, M. S. Argungu

Abstract:

Educational data mining (EDM) has recorded substantial considerations. Techniques of data mining in one way or the other have been proposed to dig out out-of-sight knowledge in educational data. The result of the study got assists academic institutions in further enhancing their process of learning and methods of passing knowledge to students. Consequently, the performance of students boasts and the educational products are by no doubt enhanced. This study adopted a student performance prediction model premised on techniques of data mining with Students' Essential Features (SEF). SEF are linked to the learner's interactivity with the e-learning management system. The performance of the student's predictive model is assessed by a set of classifiers, viz. Bayes Network, Logistic Regression, and Reduce Error Pruning Tree (REP). Consequently, ensemble methods of Bagging, Boosting, and Random Forest (RF) are applied to improve the performance of these single classifiers. The study reveals that the result shows a robust affinity between learners' behaviors and their academic attainment. Result from the study shows that the REP Tree and its ensemble record the highest accuracy of 83.33% using SEF. Hence, in terms of the Receiver Operating Curve (ROC), boosting method of REP Tree records 0.903, which is the best. This result further demonstrates the dependability of the proposed model.

Keywords: Ensemble, bagging, Random Forest, boosting, data mining, classifiers, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 760
2330 Arabic Light Stemmer for Better Search Accuracy

Authors: Sahar Khedr, Dina Sayed, Ayman Hanafy

Abstract:

Arabic is one of the most ancient and critical languages in the world. It has over than 250 million Arabic native speakers and more than twenty countries having Arabic as one of its official languages. In the past decade, we have witnessed a rapid evolution in smart devices, social network and technology sector which led to the need to provide tools and libraries that properly tackle the Arabic language in different domains. Stemming is one of the most crucial linguistic fundamentals. It is used in many applications especially in information extraction and text mining fields. The motivation behind this work is to enhance the Arabic light stemmer to serve the data mining industry and leverage it in an open source community. The presented implementation works on enhancing the Arabic light stemmer by utilizing and enhancing an algorithm that provides an extension for a new set of rules and patterns accompanied by adjusted procedure. This study has proven a significant enhancement for better search accuracy with an average 10% improvement in comparison with previous works.

Keywords: Arabic data mining, Arabic Information extraction, Arabic Light stemmer, Arabic stemmer.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1496
2329 Mining Network Data for Intrusion Detection through Naïve Bayesian with Clustering

Authors: Dewan Md. Farid, Nouria Harbi, Suman Ahmmed, Md. Zahidur Rahman, Chowdhury Mofizur Rahman

Abstract:

Network security attacks are the violation of information security policy that received much attention to the computational intelligence society in the last decades. Data mining has become a very useful technique for detecting network intrusions by extracting useful knowledge from large number of network data or logs. Naïve Bayesian classifier is one of the most popular data mining algorithm for classification, which provides an optimal way to predict the class of an unknown example. It has been tested that one set of probability derived from data is not good enough to have good classification rate. In this paper, we proposed a new learning algorithm for mining network logs to detect network intrusions through naïve Bayesian classifier, which first clusters the network logs into several groups based on similarity of logs, and then calculates the prior and conditional probabilities for each group of logs. For classifying a new log, the algorithm checks in which cluster the log belongs and then use that cluster-s probability set to classify the new log. We tested the performance of our proposed algorithm by employing KDD99 benchmark network intrusion detection dataset, and the experimental results proved that it improves detection rates as well as reduces false positives for different types of network intrusions.

Keywords: Clustering, detection rate, false positive, naïveBayesian classifier, network intrusion detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5535
2328 Estimation of Component Reusability through Reusability Metrics

Authors: Aditya Pratap Singh, Pradeep Tomar

Abstract:

Software reusability is an essential characteristic of Component-Based Software (CBS). The component reusability is an important assess for the effective reuse of components in CBS. The attributes of reusability proposed by various researchers are studied and four of them are identified as potential factors affecting reusability. This paper proposes metric for reusability estimation of black-box software component along with metrics for Interface Complexity, Understandability, Customizability and Reliability. An experiment is performed for estimation of reusability through a case study on a sample web application using a real world component.

Keywords: Component-based software, component reusability, customizability, interface complexity, reliability, understandability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3058