Search results for: Web Page Categorization
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 140

80 A Proposed Framework for Improving IT Utilization in the Energy Industry

Authors: Jin Kyung Park, Ji Yeon Cho, Yong Ho Shim, Su Jin Kim, Bong Gyou Lee

Abstract:

The purpose of this study is to suggest a direction for future research on the energy-IT industry that can serve as a framework for increasing IT utilization in the energy industry. Green IT has recently become a global issue because of global environmental pollution, and the role of IT in the energy industry is becoming more important. However, related studies have been oriented toward the IT industry and are insufficient for planning Green energy. Therefore, after analyzing existing studies related to Green energy and Green IT, a re-categorization of the Green energy-IT industry is proposed. The proposed framework is based on the energy industry and links energy and IT. The results of this study offer comprehensive insight into the Green energy-IT industry and provide useful implications and guidelines for increasing IT utilization in the energy industry.

Keywords: Energy-IT Industry, Green Energy, Green IT, IT Utilization

PDF Downloads: 1296

79 Systematic Functional Analysis Methods for Design Retrieval and Documentation

Authors: L. Zehtaban, D. Roller

Abstract:

Apart from geometry, functionality is one of the most significant hallmarks of a product; it can be considered the fundamental justification for the product's existence. A functional analysis with a complete and reliable descriptor therefore has high potential to improve the product development process in various fields, especially knowledge-based design. One important application of functional analysis and indexing is design retrieval and reuse: more than 75% of the design activity in new product development involves reusing earlier, existing design know-how. Thus, the analysis and categorization of product functions, followed by functional indexing, directly influences design optimization. This paper describes and evaluates the major classes of functional analysis by discussing their principal methods, and concludes by presenting a novel hybrid approach for functional analysis.

Keywords: Functional analysis, design reuse, functional indexing and representation.

PDF Downloads: 5106

78 Definition in Law: Transgender Identities and Marriage

Authors: Kimberly Tao

Abstract:

This paper looks at transgender identities and the law in the context of marriage. It focuses particularly on the role of language and definition in classifying transgender individuals into a legal category. Two lines of cases in transgender jurisprudence are examined. The first line decided the definition of 'man' and 'woman' on the basis of biological criteria, while the second held that biological factors should not be the sole criterion for defining a man or a woman. Three categories were found to classify transgender people, namely male, female and 'monstrous'. Since transgender people challenge the core gender distinction that the law stresses, they are often regarded as problematic and monstrous, which has subjected them to severe legal consequences. This paper discusses these issues by analyzing and comparing different cases in transgender jurisprudence and by examining how these issues play out in contemporary Hong Kong.

Keywords: Transgender, Monstrousness, Categorization, Definition.

PDF Downloads: 2148

77 Evolutionary Feature Selection for Text Documents using the SVM

Authors: Daniel I. Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, we present three feature selection methods: Information Gain, Support Vector Machine feature selection (called SVM_FS) and a Genetic Algorithm with SVM (called GA_SVM). We show that the best results were obtained with the GA_SVM method for a relatively small dimension of the feature vector.
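
As a minimal sketch of the Information Gain criterion named above (not the authors' code), the snippet below scores each term by how much knowing whether it occurs in a document reduces the entropy of the class label, then keeps the top-k terms. It assumes documents are already preprocessed into sets of terms; the helper names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)]."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Keep the k terms with the highest information gain (ties broken alphabetically)."""
    vocabulary = set().union(*docs)
    ranked = sorted(vocabulary, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]
print(select_top_k(docs, labels, 2))
```

Here "goal" and "vote" each separate the two classes perfectly, while "team" occurs equally in both and carries no information. On a real corpus, the term/class counts would normally be gathered in one pass rather than rescanning all documents for every term.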

Keywords: Feature Selection, Learning with Kernels, Support Vector Machine, Genetic Algorithm, and Classification.

PDF Downloads: 1660

76 Feature Selection Methods for an Improved SVM Classifier

Authors: Daniel Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, three feature selection methods are evaluated: Random Selection, Information Gain (IG) and Support Vector Machine feature selection (called SVM_FS). We show that the best results were obtained with the SVM_FS method for a relatively small dimension of the feature vector. We also present a novel method to better correlate the SVM kernel's parameters (polynomial or Gaussian kernel).
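
The abstract does not spell out how SVM_FS ranks features. A common SVM-based heuristic, assumed here, is to train a linear SVM and keep the features with the largest absolute weights |w_j|. The sketch below trains that linear SVM with Pegasos-style sub-gradient descent (a swapped-in training method, without a bias term) in plain Python; all names and the toy data are illustrative.

```python
import random


def train_linear_svm(X, y, lam=0.01, iterations=2000, seed=0):
    """Pegasos-style stochastic sub-gradient descent for a linear SVM (no bias term)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, iterations + 1):
        i = rng.randrange(len(X))
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        eta = 1.0 / (lam * t)
        # Regularization step: shrink w by the factor (1 - 1/t)
        w = [(1 - eta * lam) * wj for wj in w]
        if margin < 1:
            # Hinge loss is active: move w toward y_i * x_i
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_feature_ranking(X, y, k, **kwargs):
    """Indices of the k features with the largest |w_j| (assumed SVM_FS-like criterion)."""
    w = train_linear_svm(X, y, **kwargs)
    order = sorted(range(len(w)), key=lambda j: -abs(w[j]))
    return order[:k]


# Feature 0 separates the classes; feature 1 is symmetric noise.
X = [[1.0, 0.5], [1.0, -0.5], [-1.0, 0.5], [-1.0, -0.5]]
y = [1, 1, -1, -1]
print(svm_feature_ranking(X, y, 1))
```

In the toy data, every update adds +1 to the first weight but only plus or minus 0.5 to the second, so feature 0 always ends up with the larger weight. The kernel-parameter correlation method mentioned in the abstract is not covered by this sketch.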

Keywords: Feature Selection, Learning with Kernels, Support Vector Machine, and Classification.

PDF Downloads: 1779

75 Trends in IT Consulting in Austria

Authors: Michael Torggler

Abstract:

IT consultants often serve as an important interface between technological, organizational and managerial structures. As a result, the services offered are frequently assigned to different disciplines, which can reduce transparency in the market for consulting services. However, not all consulting products are suitable for every company because of different frameworks and business processes. In this context, the question arises as to which consulting products are currently offered, how they can be compared, and how the supply side of the IT consulting market is structured. The presented study aims to shed light on the IT consulting market by giving an overview of the current structure of the supply side for IT consulting services and by proposing a categorization of the currently available consulting services (consulting fields), which provides a theoretical background for the empirical study. Apart from these theoretical considerations, the empirical results of field surveys on the Austrian IT consulting market are presented and analyzed.

Keywords: IT Consulting, Management Consulting, IS Consulting, Consulting Fields, Market study.

PDF Downloads: 1365

74 A Similarity Measure for Clustering and its Applications

Authors: Guadalupe J. Torres, Ram B. Basnet, Andrew H. Sung, Srinivas Mukkamala, Bernardete M. Ribeiro

Abstract:

This paper introduces a measure of similarity between two clusterings of the same dataset produced by two different algorithms, or even by the same algorithm (K-means, for instance, usually produces different clusterings of the same dataset when run with different initializations). We then apply the measure to calculate the similarity between pairs of clusterings, with special interest in comparing various machine clusterings with human clusterings of datasets. The similarity measure can thus be used to identify the best clustering algorithm (in terms of being most similar to human clustering) for a specific problem at hand. Experimental results on the text categorization problem for a Portuguese corpus (using a translation-into-English approach) are presented, as well as results on the well-known IRIS benchmark dataset. The significance and other potential applications of the proposed measure are discussed.

Keywords: Clustering Algorithms, Clustering Applications, Similarity Measures, Text Clustering

PDF Downloads: 1519

73 A New Method of Adaptation in Integrated Learning Environment

Authors: Ildar Galeev, Renat Mustaphin, C. Ardil

Abstract:

A new method of adaptation in a partially integrated learning environment that includes an electronic textbook (ET) and an integrated tutoring system (ITS) is described. The adaptation algorithm is described in detail. It includes: establishing the interconnections of operations and concepts; estimating the mastery level of every concept; estimating the student's non-mastery level, at the current learning step, of the information on each page of the ET; and creating a rank-ordered list of links to the e-manual pages containing information that requires repeated study.
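
The last two steps of the algorithm (estimating non-mastery per page and ranking the pages) can be illustrated with the sketch below. It assumes, hypothetically, that concept mastery levels are already estimated in [0, 1] and that a page's non-mastery level is the mean of (1 - mastery) over the concepts it presents; the paper's actual estimation formulas are not given in the abstract, and the function name and data layout are illustrative.

```python
def rank_pages_for_review(page_concepts, mastery, threshold=0.5):
    """Return pages whose estimated non-mastery level is at least `threshold`, worst first.

    page_concepts: page id -> concepts presented on that page (hypothetical structure)
    mastery: concept id -> estimated mastery level in [0, 1]; unknown concepts count as 0
    """
    non_mastery = {
        page: sum(1.0 - mastery.get(c, 0.0) for c in concepts) / len(concepts)
        for page, concepts in page_concepts.items()
        if concepts
    }
    flagged = [page for page, level in non_mastery.items() if level >= threshold]
    # Highest non-mastery first; page id breaks ties deterministically
    return sorted(flagged, key=lambda page: (-non_mastery[page], page))


pages = {"p1": ["loops"], "p2": ["loops", "arrays"], "p3": ["functions"]}
mastery = {"loops": 0.9, "arrays": 0.2, "functions": 0.3}
print(rank_pages_for_review(pages, mastery, threshold=0.4))
```

With these illustrative numbers, page p3 (non-mastery 0.7) is listed before p2 (0.45), and p1 (0.1) is not flagged.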

Keywords: Adaptation, Integrated Learning Environment, Integrated Tutoring System, Electronic Textbook.

PDF Downloads: 1419

72 Auto Classification for Search Intelligence

Authors: Lilac A. E. Al-Safadi

Abstract:

This paper proposes an algorithm for the auto-classification of Web pages using data mining techniques. We consider the problem of discovering association rules between terms in a set of Web pages belonging to a category in a search engine database, and present an auto-classification algorithm for this problem that is fundamentally based on the Apriori algorithm. The proposed technique has two phases. The first is a training phase, in which human experts determine the categories of different Web pages and the supervised data mining algorithm associates these categories with appropriately weighted index terms according to the highest-support rules among the most frequent words. The second is the categorization phase, in which a web crawler crawls the World Wide Web to build a database categorized according to the result of the data mining approach. This database contains URLs and their categories.
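
The Apriori step that the training phase builds on can be sketched as follows: frequent term sets are grown level by level, and a candidate of size k is counted only if all of its (k-1)-subsets are frequent. Each Web page of a category is treated as a transaction (a set of index terms); the function name, the support threshold and the toy pages are illustrative, and the term weighting described in the abstract is not reproduced.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single terms
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets, then prune candidates with an infrequent subset
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


pages = [{"goal", "match", "team"}, {"goal", "team"}, {"goal", "match"}, {"vote", "party"}]
frequent_sets = apriori(pages, min_support=2)
```

Association rules such as goal => match can then be read off the result, with confidence support({goal, match}) / support({goal}) = 2/3 for these toy pages.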

Keywords: Information Processing on the Web, Data Mining, Document Classification.

PDF Downloads: 1584

71 Efficient Web Usage Mining Based on K-Medoids Clustering Technique

Authors: P. Sengottuvelan, T. Gopalakrishnan

Abstract:

Web Usage Mining is the application of data mining techniques to discover usage patterns in web log data, in order to understand and better serve the needs of Web-based applications. A user's experience on the Internet may be improved by minimizing web access latency, which can be done by predicting the next page a user will request so that it can be prefetched and cached. Therefore, to improve the quality of web services, the user's web navigation behavior needs to be studied. This behavior is analyzed by modeling the web navigation history. We propose a technique that clusters user sessions based on the K-medoids technique.
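
A minimal K-medoids sketch for clustering sessions is shown below. It assumes each session is the set of pages visited and measures the distance between sessions with the Jaccard distance; both the distance and the first-k initialization are assumptions, since the abstract does not specify them. The alternating assign/update scheme used here is one common K-medoids variant, not necessarily the paper's.

```python
def jaccard_distance(a, b):
    """1 - size(intersection) / size(union) for two sets of visited pages."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def k_medoids(sessions, k, max_iter=100):
    """Alternating K-medoids; returns (medoid indices, medoid index for each session)."""
    n = len(sessions)
    dist = [[jaccard_distance(sessions[i], sessions[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # assumption: the first k sessions are the initial medoids
    for _ in range(max_iter):
        # Assignment step: each session joins its nearest medoid
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # Update step: the member with the smallest total distance becomes the medoid
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members)) if members else m
            for m, members in clusters.items()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels


sessions = [
    {"/home", "/news"}, {"/home", "/news", "/sport"}, {"/news", "/sport"},
    {"/shop", "/cart"}, {"/shop", "/cart", "/pay"}, {"/cart", "/pay"},
]
medoids, labels = k_medoids(sessions, k=2)
```

Because medoids are actual sessions, each cluster is represented by a real navigation path, which is convenient when choosing pages to prefetch.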

Keywords: Clustering, K-medoids, Recommendation, User Session, Web Usage Mining.

PDF Downloads: 1358

70 Stochastic Learning Algorithms for Modeling Human Category Learning

Authors: Toshihiko Matsuka, James E. Corter

Abstract:

Most neural network (NN) models of human category learning use a gradient-based learning method, which assumes that locally optimal changes are made to model parameters on each learning trial. This method tends to underpredict variability in individual-level cognitive processes. In addition, many recent models of human category learning have been criticized for not being able to replicate the rapid changes in categorization accuracy and attention processes observed in empirical studies. In this paper, we introduce stochastic learning algorithms for NN models of human category learning and show that using these algorithms can result in (a) rapid changes in accuracy and attention allocation, and (b) different learning trajectories and more realistic variability at the individual level.

Keywords: category learning, cognitive modeling, radial basis function, stochastic optimization.

PDF Downloads: 1577

69 A Technique for Execution of Written Values on Shared Variables

Authors: Parvinder S. Sandhu, Vijay K. Banga, Prateek Gupta, Amit Verma

Abstract:

This paper conceptualizes the release consistency technique together with user-defined synchronization. A programming model based on objects and classes is illustrated and demonstrated. The paper centers on phases, events and parallel execution. A technique by which written values become visible on shared variables is implemented. The second part of the paper covers the implementation of user-defined high-level synchronization primitives and a system architecture with memory protocols. Finally, the paper proposes techniques that are central to deciding when to validate and invalidate a stale page.
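
The core idea of release consistency, that writes only need to become visible to other processes at synchronization points, can be shown with a toy, single-threaded model. In the sketch below (hypothetical class and method names, not the paper's implementation), a process buffers its writes until `release()`, and another process sees them only after `acquire()` invalidates its possibly stale cached copies.

```python
class SharedMemory:
    """Globally visible copy of the shared variables."""

    def __init__(self, initial=None):
        self.values = dict(initial or {})


class Process:
    """A process with a local cache and a buffer of unpublished writes."""

    def __init__(self, memory):
        self.memory = memory
        self.cache = {}    # local copies, possibly stale
        self.pending = {}  # writes made since the last release

    def write(self, var, value):
        self.pending[var] = value
        self.cache[var] = value  # a process always sees its own writes

    def read(self, var):
        if var not in self.cache:  # cache miss: fetch from shared memory
            self.cache[var] = self.memory.values.get(var)
        return self.cache[var]

    def release(self):
        # Synchronization point: publish all buffered writes at once
        self.memory.values.update(self.pending)
        self.pending.clear()

    def acquire(self):
        # Synchronization point: invalidate stale copies, keep own unpublished writes
        self.cache = dict(self.pending)


mem = SharedMemory({"x": 0})
p1, p2 = Process(mem), Process(mem)
first = p2.read("x")    # 0, now cached by p2
p1.write("x", 42)       # buffered in p1 only
p1.release()            # published to shared memory
stale = p2.read("x")    # still 0: p2 has not acquired yet
p2.acquire()
fresh = p2.read("x")    # 42
```

Delaying visibility until release/acquire is what lets a real implementation batch coherence traffic instead of propagating every single write.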

Keywords: synchronization objects, barrier, phases and events, shared memory

PDF Downloads: 1147

68 An Empirical Analysis of Arabic WebPages Classification using Fuzzy Operators

Authors: Ahmad T. Al-Taani, Noor Aldeen K. Al-Awad

Abstract:

In this study, a fuzzy similarity approach for Arabic web page classification is presented. The approach uses a fuzzy term-category relation, using membership degrees for the training data and degree values for a test web page. Six measures are used and compared in this study: Einstein, Algebraic, Hamacher, MinMax, Special case fuzzy and Bounded Difference. These measures are applied and compared using 50 different Arabic web pages. The Einstein measure gave the best performance among the measures. The study concludes with an analysis of these measures and closing remarks.
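
The fuzzy operators compared here are t-norms. The sketch below writes out the standard Algebraic, Einstein, Hamacher, minimum (the "Min" half of MinMax) and bounded-difference t-norms and uses each one to score a test page against every category. It assumes, for illustration only, that term membership degrees are already known for the page and the categories, and that a category's score is the sum of t-norm values over the page's terms; the paper's exact similarity formula and its "special case" measure are not reproduced. English keys stand in for Arabic terms.

```python
def algebraic_product(a, b):
    return a * b


def einstein_product(a, b):
    # a*b / (2 - (a + b - a*b)); the denominator is at least 1 for a, b in [0, 1]
    return a * b / (2 - (a + b - a * b))


def hamacher_product(a, b):
    # a*b / (a + b - a*b), defined as 0 when a = b = 0
    return 0.0 if a == b == 0 else a * b / (a + b - a * b)


def minimum(a, b):
    return min(a, b)


def bounded_difference(a, b):
    return max(0.0, a + b - 1)


T_NORMS = {
    "Algebraic": algebraic_product,
    "Einstein": einstein_product,
    "Hamacher": hamacher_product,
    "Min": minimum,
    "Bounded Difference": bounded_difference,
}


def category_score(page, category_terms, t_norm):
    """Sum of t-norm values between the page's and the category's membership of each page term."""
    return sum(t_norm(mu, category_terms.get(term, 0.0)) for term, mu in page.items())


def classify(page, categories, t_norm):
    """Return the category with the highest fuzzy score for the page."""
    return max(categories, key=lambda name: category_score(page, categories[name], t_norm))


categories = {"sport": {"goal": 0.9, "team": 0.6}, "economy": {"market": 0.8, "bank": 0.7}}
page = {"goal": 0.7, "team": 0.4, "bank": 0.1}
for name, t_norm in T_NORMS.items():
    print(name, classify(page, categories, t_norm))
```

All five operators agree on this toy page; they differ in how strongly they penalize weak memberships (bounded difference, for example, discards any pair whose degrees sum to 1 or less).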

Keywords: Text classification, HTML documents, Web pages, Machine learning, Fuzzy logic, Arabic Web pages.

PDF Downloads: 1862

67 An Optimal Algorithm for HTML Page Building Process

Authors: Maryam Jasim Abdullah, Bassim. H. Graimed, Jalal. S. Hameed

Abstract:

Demand for web services is growing with the increasing number of Web users. Web services are delivered through Web applications, whose size is affected by their users' requirements and interests; differences in these requirements and interests lead to growth in Web application size. An efficient way to save storage space for more data and information is to implement algorithms that compress the contents of Web application documents. This paper introduces an algorithm that reduces Web application size by reducing the contents of HTML files. It removes unimportant content regardless of the HTML file size, without discarding any character that is expected in the HTML building process.
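
A minimal illustration of shrinking HTML without touching its text content is shown below: it drops comments and collapses whitespace that only serves as indentation. This is a generic sketch under the caveats listed in its docstring, not the paper's algorithm.

```python
import re

COMMENTS = re.compile(r"<!--.*?-->", re.DOTALL)
WHITESPACE_BETWEEN_TAGS = re.compile(r">\s+<")
WHITESPACE_RUNS = re.compile(r"\s+")


def reduce_html(html):
    """Remove comments and indentation-only whitespace from an HTML string.

    Caveats (why this is only a sketch): it also collapses whitespace inside
    <pre>, <textarea> and <script>, drops conditional comments, and removing
    the space between two inline elements can change how they render.
    """
    html = COMMENTS.sub("", html)
    html = WHITESPACE_BETWEEN_TAGS.sub("><", html)
    html = WHITESPACE_RUNS.sub(" ", html)
    return html.strip()


page = "<html>\n  <!-- navigation -->\n  <body>\n    <p>Hello   world</p>\n  </body>\n</html>\n"
print(reduce_html(page))
```

A production reducer would parse the document (for example into a DOM tree) so that whitespace-sensitive elements can be skipped instead of being rewritten by regular expressions.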

Keywords: HTML code, HTML tag, WEB applications, Document compression, DOM tree.

PDF Downloads: 1999

66 Determination of Temperature and Velocity Fields in a Corridor at a Central Interim Spent Fuel Storage Facility Using Numerical Simulation

Authors: V. Salajka, J. Kala, P. Hradil

Abstract:

This article describes a numerical model of a corridor at a Central Interim Spent Fuel Storage Facility (hereinafter CISFSF). The model takes into account the effect of air flows on the temperature of the stored waste. The computational model was implemented in the ANSYS/CFX environment as a CFD solution, which was compared with an approximate analytical calculation. The article includes a categorization of the individual ventilation alternatives for such underground systems. The aim was to evaluate a ventilation system for a CISFSF with regard to its stability and its capacity to provide sufficient ventilation to remove the heat produced by stored casks of spent nuclear fuel.

Keywords: Temperature fields, Spent Fuel, Interim storage facility, CFD.

PDF Downloads: 1356

65 A Comparative Study of Web-pages Classification Methods using Fuzzy Operators Applied to Arabic Web-pages

Authors: Ahmad T. Al-Taani, Noor Aldeen K. Al-Awad

Abstract:

In this study, a fuzzy similarity approach for Arabic web page classification is presented. The approach uses a fuzzy term-category relation, using membership degrees for the training data and degree values for a test web page. Six measures are used and compared in this study: Einstein, Algebraic, Hamacher, MinMax, Special case fuzzy and Bounded Difference. These measures are applied and compared using 50 different Arabic web pages. The Einstein measure gave the best performance among the measures. The study concludes with an analysis of these measures and closing remarks.

Keywords: Text classification, HTML, web pages, machine learning, fuzzy logic, Arabic web pages.

PDF Downloads: 2191

64 Web Based Remote Access Microcontroller Laboratory

Authors: H. Çimen, İ. Yabanova, M. Nartkaya, S. M. Çinar

Abstract:

This paper presents a web-based remote access microcontroller laboratory. Owing to rapid developments in electronics and computer technologies, microcontroller-based devices and appliances are found in all aspects of daily life. Before the remote access microcontroller laboratory was implemented, the teaching staff developed an experiment set for microcontroller training, designed with the requirements of technical teaching and industrial applications in mind. Students perform experiments by connecting to the experiment set, which is attached to the computer acting as the web server. They can program the microcontroller, control digital and analog inputs, and observe the experiment. The laboratory experiment web page can be accessed at www.elab.aku.edu.tr.

Keywords: Embedded systems education, distance learning, internet-based control, remote microcontroller laboratory.

PDF Downloads: 2228

63 Compression of Semistructured Documents

Authors: Leo Galambos, Jan Lansky, Katsiaryna Chernik

Abstract:

EGOTHOR is a search engine that indexes the Web and allows users to search Web documents. Its hit list contains the URL and title of each hit, plus a snippet that briefly shows the match. Assembling the snippet almost always requires full knowledge of the original document (mostly an HTML page), so the search engine must store the full text of the documents as part of the index. This requirement calls for an appropriate compression algorithm to reduce the space demand. One solution would be to use common compression methods such as gzip or bzip2, but it may be preferable to develop a new method that takes advantage of the document structure or, rather, the textual character of the documents. Special text compression algorithms and methods for compressing XML documents already exist. The aim of this paper is to integrate the two approaches to achieve an optimal compression ratio.
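
The gzip/bzip2 baseline mentioned above is easy to measure with Python's standard library. The sketch below reports compressed size divided by original size for each method; it is only the baseline against which a structure-aware method would be compared, and the toy HTML input is illustrative.

```python
import bz2
import gzip


def compression_ratios(text):
    """Compressed size / original size for gzip and bzip2 (smaller is better)."""
    raw = text.encode("utf-8")
    return {
        "gzip": len(gzip.compress(raw)) / len(raw),
        "bzip2": len(bz2.compress(raw)) / len(raw),
    }


html = "<tr><td>row</td><td>value</td></tr>\n" * 200
print(compression_ratios(html))
```

Highly repetitive markup like this compresses extremely well with either method; realistic pages mix markup with prose, which is where separating structure from text is expected to pay off.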

Keywords: Compression, search engine, HTML, XML.

PDF Downloads: 1532

62 Evaluating the Effectiveness of Memory Overcommit Techniques on KVM-based Hosting Platform

Authors: Chin-Hung Li

Abstract:

Determining how many virtual machines a Linux host can run can be a challenge. One of the hardest tasks is finding the balance among performance, density and usability. The KVM hypervisor has become the most popular open-source full virtualization solution, and it supports several ways of running guests with more memory than the host actually has. Because minimum and maximum guest memory requirements differ widely, this paper presents initial results on same-page merging, ballooning and live migration techniques aimed at optimal memory usage on a KVM-based cloud platform. Given the design of the initial experiments, the resulting data are a useful reference for system administrators. The experiments show that each method offers a different reliability tradeoff.
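
Same-page merging (for example Linux KSM, which KVM relies on) saves memory by keeping a single copy of pages whose contents are identical. The back-of-the-envelope sketch below estimates that saving for a set of guest memory images by hashing fixed-size pages. The 4 KiB page size and the function names are assumptions, and real KSM scans live anonymous memory and compares page contents rather than trusting a hash.

```python
import hashlib

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def split_pages(image, page_size=PAGE_SIZE):
    """Split a memory image (bytes) into fixed-size pages."""
    return [image[i:i + page_size] for i in range(0, len(image), page_size)]


def same_page_savings(guest_images, page_size=PAGE_SIZE):
    """Return (total pages, unique pages, pages that merging could free)."""
    unique = set()
    total = 0
    for image in guest_images:
        for page in split_pages(image, page_size):
            total += 1
            unique.add(hashlib.sha256(page).digest())
    return total, len(unique), total - len(unique)


zero_page = bytes(PAGE_SIZE)
a_page = b"A" * PAGE_SIZE
guest1 = zero_page * 2 + a_page
guest2 = zero_page + a_page
print(same_page_savings([guest1, guest2]))
```

The two toy guests hold 5 pages but only 2 distinct contents, so merging could free 3 pages; the savings on real guests depend heavily on how similar their operating systems and workloads are.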

Keywords: Kernel-based Virtual Machine, Overcommit, Virtualization.

PDF Downloads: 3078

61 Principles of Editing and Story Telling in Relation to Editorial Graphic Design

Authors: Melike Taşcıoğlu

Abstract:

This paper aims to combine film-editing principles with basic design principles to explore what graphic designers do in terms of storytelling. The sequential aspect of film is designed and examined through the art of editing. Examining the rules, principles and formulas of film editing can be used as a method by graphic designers to further practice the art of storytelling. There are many publications and extensive research on design basics; however, time, pace, dramatic structure and choreography are not well defined in the area of graphic design. In this era of creative storytelling and interdisciplinary collaboration, not only film editors but also graphic designers and students of art and design should understand the theory and practice of editing to be able to create a strong mise-en-scène and not only a mise-en-page.

Keywords: Design principles, editing principles, editorial design, film editing, graphic design, storytelling.

PDF Downloads: 2549

60 Quantifying the Sustainable Building Criteria Based on Case Studies from Malaysia

Authors: Fahanim Abdul Rashid, Muhammad Azzam Ismail, Deo Prasad

Abstract:

To encourage the construction of green homes (GH) in Malaysia, a simple and attainable framework for designing and building GHs is needed. This can be achieved by aligning GH principles with Cole's 'Sustainable Building Criteria' (SBC). This set of considerations was used to categorize the GH features of three case studies from Malaysia. Although categorizing building features is useful for exploring the sustainability inclinations of each house, the overall impact of the building features in each of the five SBCs is unknown. Therefore, this paper explores the possibility of quantifying the impact of the building features categorized under SBC1, "Buildings will have to adapt to the new environment and restore damaged ecology while mitigating resource use", based on existing GH assessment tools and methods and other literature. The process reported in this paper could lead to a new dimension in green home rating and assessment methods.

Keywords: Green homes, Malaysia, Sustainable Building Criteria, Sustainable homes

PDF Downloads: 2098

59 A K-Means Based Clustering Approach for Finding Faulty Modules in Open Source Software Systems

Authors: Parvinder S. Sandhu, Jagdeep Singh, Vikas Gupta, Mandeep Kaur, Sonia Manhas, Ramandeep Sidhu

Abstract:

Prediction of fault-prone modules is one way to support software quality engineering. Clustering is used to determine the intrinsic grouping in a set of unlabeled data. Among the clustering techniques available in the literature, the K-Means approach is the most widely used. This paper introduces a K-Means-based clustering approach for finding the fault-proneness of object-oriented systems. Its contribution is that it uses the metric values of the JEdit open-source software to generate rules for categorizing software modules as faulty or non-faulty, and then validates these rules empirically. The results are measured in terms of prediction accuracy, probability of detection and probability of false alarms.
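
A minimal sketch of the overall idea: cluster modules by their metric vectors with K-Means (k = 2), label the cluster with the larger centroid as fault-prone, and score the predictions with accuracy, probability of detection (PD = TP / (TP + FN)) and probability of false alarm (PF = FP / (FP + TN)). The metric values, the first-k initialization and the "larger centroid means faulty" rule are assumptions for illustration; the paper derives its own rules from JEdit metrics.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain K-Means; the first k points are the initial centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # New centroid = mean of members (an empty cluster keeps its old centroid)
        updated = [
            [sum(values) / len(members) for values in zip(*members)] if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def predict_faulty(points, centroids):
    """Assumption: the cluster whose centroid has the largest metric sum is fault-prone."""
    faulty = max(range(len(centroids)), key=lambda c: sum(centroids[c]))
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) == faulty
        for p in points
    ]


def evaluate(predicted, actual):
    """Accuracy, probability of detection (PD) and probability of false alarm (PF)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    return {
        "accuracy": (tp + tn) / len(actual),
        "pd": tp / (tp + fn) if tp + fn else 0.0,
        "pf": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical (complexity, coupling) metrics per module and known fault labels
modules = [(2, 1), (3, 2), (2, 2), (20, 9), (25, 11), (22, 10)]
actual = [False, False, False, True, True, True]
predicted = predict_faulty(modules, kmeans(modules, 2))
print(evaluate(predicted, actual))
```

On real metric data the clusters overlap, so PD and PF trade off against each other; reporting both, as the abstract does, guards against a predictor that simply flags every module.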

Keywords: K-Means, Software Fault, Classification, Object-Oriented Metrics.

PDF Downloads: 2259

58 Virtual Training, Human-Computer and Software Interactions, and Social-Based Embodiness

Authors: Philippe Fauquet-Alekhine

Abstract:

In high-risk industries, simulation training has always been conceived in terms of a high degree of fidelity to the real operational situation. Owing to recent progress, this way of training is changing, and so are the human-computer and software interactions: the interactions between trainees during simulation training sessions tend to become virtual, transforming social-based embodiness (the way subjects integrate social skills for interpersonal relationships with co-workers). Based on an analysis of training for eight different professions, a categorization of interactions helped produce an analytical tool, the social interactions table. This tool may be very valuable for pointing out changes in social interactions when training sessions move from a high-fidelity simulator to a virtual simulator. In such cases, it helps designers of professional training to analyze and assess the consequences of a potential lack of social-based embodiness.

Keywords: Interface, interaction, simulator, virtual training.

PDF Downloads: 1748

57 Service-Oriented Architecture for Object-Centric Information Fusion

Authors: Jeffrey A. Dunne, Kevin Ligozio

Abstract:

In many applications there is a broad variety of information relevant to a focal "object" of interest, and the fusion of such heterogeneous data types is desirable for classification and categorization. While these various data types can sometimes be treated as orthogonal (such as the hull number, superstructure color, and speed of an oil tanker), there are instances where the inference and the correlation between quantities can provide improved fusion capabilities (such as the height, weight, and gender of a person). A service-oriented architecture has been designed and prototyped to support the fusion of information for such "object-centric" situations. It is modular, scalable, and flexible, and designed to support new data sources, fusion algorithms, and computational resources without affecting existing services. The architecture is designed to simplify the incorporation of legacy systems, support exact and probabilistic entity disambiguation, recognize and utilize multiple types of uncertainties, and minimize network bandwidth requirements.

Keywords: Data fusion, distributed computing, service-oriented architecture, SOA

PDF Downloads: 1427

56 Sustainability Assessment of Municipal Wastewater Treatment

Authors: Yousra Zakaria Ahmed, Ahmed El Gendy, Salah El Haggar

Abstract:

In this paper, our methodology for assessing the sustainability of wastewater treatment technologies in Egypt is presented, together with the preliminary list of factors to be considered and their ranking. The factors include, but are not limited to, pollutant removal efficiency and energy consumption under the environmental dimension; construction cost, operation and maintenance costs, and required land area cost under the economic dimension; and public acceptance, noise, and the generation of job opportunities for local residents under the social dimension. This methodology is intended to be a user-friendly screening tool to support decision making when investigating different wastewater treatment technologies in Egypt. Based on the results presented in this paper, it can generally be concluded that the categorization of some social and environmental aspects of sustainability is subjective and highly dependent on local conditions and the researchers' background.
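
If the ranked factors are combined into a single screening score, a weighted sum is one simple option. The sketch below is hypothetical throughout: the factor names follow the abstract, but the weights, the normalized scores (0 = worst, 1 = best), the technology names and the weighted-sum aggregation itself are illustrative assumptions, not the paper's values or method.

```python
def screening_score(scores, weights):
    """Weighted mean of normalized factor scores (0 = worst, 1 = best)."""
    total = sum(weights.values())
    return sum(weights[factor] * scores[factor] for factor in weights) / total


def rank_technologies(candidates, weights):
    """Technology names sorted from highest to lowest screening score."""
    return sorted(candidates, key=lambda name: -screening_score(candidates[name], weights))


# Hypothetical weights and scores, for illustration only
weights = {"removal_efficiency": 3, "energy": 2, "construction_cost": 2,
           "land_area": 1, "public_acceptance": 2}
candidates = {
    "technology A": {"removal_efficiency": 0.9, "energy": 0.3, "construction_cost": 0.4,
                     "land_area": 0.9, "public_acceptance": 0.6},
    "technology B": {"removal_efficiency": 0.7, "energy": 0.9, "construction_cost": 0.8,
                     "land_area": 0.2, "public_acceptance": 0.7},
}
print(rank_technologies(candidates, weights))
```

Because the abstract notes that some social and environmental categorizations are subjective, any such weights would need to be set and justified for local conditions.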

Keywords: Sustainability, wastewater treatment, sustainability assessment, Egypt.

PDF Downloads: 1526

55 The Influence of Some Polyphenols on Human Erythrocytes Glutathione S-Transferase Activity

Authors: Mustafa Erat

Abstract:

Glutathione S-transferase (GST) was purified from human erythrocytes, and the effects of some polyphenols on the enzyme activity were investigated. After preparation of the erythrocyte hemolysate, the enzyme was purified by Glutathione-Agarose affinity chromatography with a yield of 81%. The purified enzyme showed a single band on SDS-PAGE. The effects of polyphenolic compounds such as catechin, dopa, dopamine, pyrogallol and catechol on in vitro GST activity were examined. Catechin was found to inhibit the enzyme, whereas the other compounds acted neither as inhibitors nor as activators. The IC50 value (the inhibitor concentration that reduces enzyme activity by 50%) was estimated to be 10 mM. The Ki constants were calculated as 6.38 ± 0.70 mM with GSH as the substrate and 3.86 ± 0.78 mM with CDNB as the substrate, using the equations of the graphs for the inhibitor, and the inhibition type was determined to be non-competitive.
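
Assuming the "equations of graphs" refer to double-reciprocal (Lineweaver-Burk) plots, the standard rate law for pure non-competitive inhibition and its double-reciprocal form are given below for reference; $V_{\max}$ and $K_m$ are the uninhibited parameters, $[S]$ is the substrate concentration (GSH or CDNB) and $[I]$ is the inhibitor concentration.

```latex
$$
v = \frac{V_{\max}\,[S]}{\left(K_m + [S]\right)\left(1 + \dfrac{[I]}{K_i}\right)}
\qquad\Longrightarrow\qquad
\frac{1}{v} = \frac{K_m}{V_{\max}}\left(1 + \frac{[I]}{K_i}\right)\frac{1}{[S]}
            + \frac{1}{V_{\max}}\left(1 + \frac{[I]}{K_i}\right)
$$
```

In this form, the inhibitor multiplies both the slope and the y-intercept by $(1 + [I]/K_i)$ while the x-intercept stays at $-1/K_m$, which is why lines for different inhibitor concentrations meet on the x-axis under pure non-competitive inhibition.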

Keywords: Drug resistance, Glutathione S-transferase, Inhibition.

PDF Downloads: 2142

54 Unsupervised Feature Selection Using Feature Density Functions

Authors: Mina Alibeigi, Sattar Hashemi, Ali Hamzeh

Abstract:

Since dealing with high-dimensional data is computationally complex and sometimes even intractable, several feature reduction methods have recently been developed to reduce the dimensionality of data and thereby simplify analysis in applications such as text categorization, signal processing, image retrieval and gene expression analysis. Among feature reduction techniques, feature selection is one of the most popular because it preserves the original features. In this paper, we propose a new unsupervised feature selection method that removes redundant features from the original feature space using the probability density functions of the features. To show the effectiveness of the proposed method, popular feature selection methods have been implemented and compared. Experimental results on several datasets from the UCI repository illustrate the effectiveness of the proposed method compared with the other methods in terms of both classification accuracy and the number of selected features.

Keywords: Feature, Feature Selection, Filter, Probability Density Function

PDF Downloads: 2031

53 Evaluating some Feature Selection Methods for an Improved SVM Classifier

Authors: Daniel Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. Four feature selection methods are evaluated: Random Selection, Information Gain (IG), Support Vector Machine feature selection (called SVM_FS) and Genetic Algorithm with SVM (called GA_FS). We show that the best results were obtained with the SVM_FS and GA_FS methods for a relatively small dimension of the feature vector, compared with the IG method, which requires longer vectors for quite similar classification accuracies. We also present a novel method to better correlate the SVM kernel's parameters (polynomial or Gaussian kernel).

Keywords: Feature selection, learning with kernels, support vector machine, genetic algorithms and classification.

PDF Downloads: 1491

52 The Functionality and Usage of CRM Systems

Authors: Michael Torggler

Abstract:

Modern information and communication technologies offer a variety of support options for the efficient handling of customer relationships. CRM systems have been developed, which are designed to support the processes in the areas of marketing, sales and service. Along with technological progress, CRM systems are constantly changing, i.e. the systems are continually enhanced by new functions. However, not all functions are suitable for every company because of different frameworks and business processes. In this context the question arises whether or not CRM systems are widely used in Austrian companies and which business processes are most frequently supported by CRM systems. This paper aims to shed light on the popularity of CRM systems in Austrian companies in general and the use of different functions to support their daily business. First of all, the paper provides a theoretical overview of the structure of modern CRM systems and proposes a categorization of currently available software functionality for collaborative, operational and analytical CRM processes, which provides the theoretical background for the empirical study. Apart from these theoretical considerations, the paper presents the empirical results of a field survey on the use of CRM systems in Austrian companies and analyzes its findings.

Keywords: CRM systems, CRM system adoption, CRM system diffusion, CRM functionality, Market study.

PDF Downloads: 3931

51 A Web Text Mining Flexible Architecture

Authors: M. Castellano, G. Mastronardi, A. Aprile, G. Tarricone

Abstract:

Text Mining is an important step in the Knowledge Discovery process. It is used to extract hidden information from unstructured or semi-structured data. This matters because much of the information on the Web is semi-structured, owing to the nested structure of HTML code, and much of it is also linked and redundant. Web Text Mining supports the whole knowledge mining process: the mining, extraction and integration of useful data, information and knowledge from Web page contents. In this paper, we present a Web Text Mining process able to discover knowledge in a distributed and heterogeneous multi-organization environment. The process is based on a flexible architecture and is implemented in four steps that examine web content and extract useful hidden information through mining techniques. Our Web Text Mining prototype starts by retrieving Web job offers, from which a Text Mining process extracts the information needed to classify them quickly: essentially, the job location and the required skills.
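
The extraction step for job offers (location and skills) can be approximated by dictionary matching, as in the sketch below. The skill vocabulary, the place list and the function name are hypothetical; a real pipeline would use the architecture's own text-mining steps and would handle multi-word skills, place names and tokens such as "C++".

```python
import re

SKILLS = {"java", "python", "sql", "linux"}  # hypothetical skill vocabulary
PLACES = {"london", "milan", "paris"}        # hypothetical place gazetteer


def extract_offer_features(text):
    """Return the places and skills mentioned in a job offer (single-word matches only)."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {"places": sorted(tokens & PLACES), "skills": sorted(tokens & SKILLS)}


offer = "Java developer wanted in Milan. SQL and Linux required."
print(extract_offer_features(offer))
```

Matching against curated vocabularies keeps the classification fast and explainable, at the cost of missing skills that are not yet in the list.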

Keywords: Web text mining, flexible architecture, knowledge discovery.

PDF Downloads: 2615