Search results for: Text Features.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2040

Search results for: Text Features.

1950 Emotional Analysis for Text Search Queries on Internet

Authors: Gemma García López

Abstract:

The goal of this study is to analyze if search queries carried out in search engines such as Google, can offer emotional information about the user that performs them. Knowing the emotional state in which the Internet user is located can be a key to achieve the maximum personalization of content and the detection of worrying behaviors. For this, two studies were carried out using tools with advanced natural language processing techniques. The first study determines if a query can be classified as positive, negative or neutral, while the second study extracts emotional content from words and applies the categorical and dimensional models for the representation of emotions. In addition, we use search queries in Spanish and English to establish similarities and differences between two languages. The results revealed that text search queries performed by users on the Internet can be classified emotionally. This allows us to better understand the emotional state of the user at the time of the search, which could involve adapting the technology and personalizing the responses to different emotional states.

Keywords: Emotion classification, text search queries, emotional analysis, sentiment analysis in text, natural language processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 662
1949 Video Data Mining based on Information Fusion for Tamper Detection

Authors: Girija Chetty, Renuka Biswas

Abstract:

In this paper, we propose novel algorithmic models based on information fusion and feature transformation in crossmodal subspace for different types of residue features extracted from several intra-frame and inter-frame pixel sub-blocks in video sequences for detecting digital video tampering or forgery. An evaluation of proposed residue features – the noise residue features and the quantization features, their transformation in cross-modal subspace, and their multimodal fusion, for emulated copy-move tamper scenario shows a significant improvement in tamper detection accuracy as compared to single mode features without transformation in cross-modal subspace.

Keywords: image tamper detection, digital forensics, correlation features image fusion

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1856
1948 Exploiting Global Self Similarity for Head-Shoulder Detection

Authors: Lae-Jeong Park, Jung-Ho Moon

Abstract:

People detection from images has a variety of applications such as video surveillance and driver assistance system, but is still a challenging task and more difficult in crowded environments such as shopping malls in which occlusion of lower parts of human body often occurs. Lack of the full-body information requires more effective features than common features such as HOG. In this paper, new features are introduced that exploits global self-symmetry (GSS) characteristic in head-shoulder patterns. The features encode the similarity or difference of color histograms and oriented gradient histograms between two vertically symmetric blocks. The domain-specific features are rapid to compute from the integral images in Viola-Jones cascade-of-rejecters framework. The proposed features are evaluated with our own head-shoulder dataset that, in part, consists of a well-known INRIA pedestrian dataset. Experimental results show that the GSS features are effective in reduction of false alarmsmarginally and the gradient GSS features are preferred more often than the color GSS ones in the feature selection.

Keywords: Pedestrian detection, cascade of rejecters, feature extraction, self-symmetry, HOG.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2353
1947 Improved Processing Speed for Text Watermarking Algorithm in Color Images

Authors: Hamza A. Al-Sewadi, Akram N. A. Aldakari

Abstract:

Copyright protection and ownership proof of digital multimedia are achieved nowadays by digital watermarking techniques. A text watermarking algorithm for protecting the property rights and ownership judgment of color images is proposed in this paper. Embedding is achieved by inserting texts elements randomly into the color image as noise. The YIQ image processing model is found to be faster than other image processing methods, and hence, it is adopted for the embedding process. An optional choice of encrypting the text watermark before embedding is also suggested (in case required by some applications), where, the text can is encrypted using any enciphering technique adding more difficulty to hackers. Experiments resulted in embedding speed improvement of more than double the speed of other considered systems (such as least significant bit method, and separate color code methods), and a fairly acceptable level of peak signal to noise ratio (PSNR) with low mean square error values for watermarking purposes.

Keywords: Steganography, watermarking, private keys, time complexity measurements.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 772
1946 Improving Classification Accuracy with Discretization on Datasets Including Continuous Valued Features

Authors: Mehmet Hacibeyoglu, Ahmet Arslan, Sirzat Kahramanli

Abstract:

This study analyzes the effect of discretization on classification of datasets including continuous valued features. Six datasets from UCI which containing continuous valued features are discretized with entropy-based discretization method. The performance improvement between the dataset with original features and the dataset with discretized features is compared with k-nearest neighbors, Naive Bayes, C4.5 and CN2 data mining classification algorithms. As the result the classification accuracies of the six datasets are improved averagely by 1.71% to 12.31%.

Keywords: Data mining classification algorithms, entropy-baseddiscretization method

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2417
1945 Using Reservoir Models for Monitoring Geothermal Surface Features

Authors: John P. O’Sullivan, Thomas M. P. Ratouis, Michael J. O’Sullivan

Abstract:

As the use of geothermal energy grows internationally more effort is required to monitor and protect areas with rare and important geothermal surface features. A number of approaches are presented for developing and calibrating numerical geothermal reservoir models that are capable of accurately representing geothermal surface features. The approaches are discussed in the context of cases studies of the Rotorua geothermal system and the Orakei-korako geothermal system, both of which contain important surface features. The results show that models are able to match the available field data accurately and hence can be used as valuable tools for predicting the future response of the systems to changes in use.

Keywords: Geothermal reservoir models, surface features, monitoring, TOUGH2.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2024
1944 Investigating Relationship between Product Features and Supply Chain Integration

Authors: Saied Rasul Hosseini Baharanchi

Abstract:

This paper addresses integration issues in supply chain, and tries to investigate how different aspects of integration are linked with some product features. Integration in this study is interpreted as "internal", "upstream" (supply), and "downstream" (demand). Two features of product innovative and quality are considered. To examine the relationships between supply chain integrations – as mentioned above, and product features, this research follows the survey method in automotive industry.The results imply that supply chain upstream integration has a higher impact on product quality, comparing to internal and supply chain downstream integrations. It is also found that the influence of supply chain downstream integration on product innovation is greater than other variables. In brief, this study mainly tackles the importance of specific level of supply chain integrations and its effects on two product features.

Keywords: Supply chain upstream integration, supply chaindownstream integration, internal integration, product features

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1784
1943 Developing an Advanced Algorithm Capable of Classifying News, Articles and Other Textual Documents Using Text Mining Techniques

Authors: R. B. Knudsen, O. T. Rasmussen, R. A. Alphinas

Abstract:

The reason for conducting this research is to develop an algorithm that is capable of classifying news articles from the automobile industry, according to the competitive actions that they entail, with the use of Text Mining (TM) methods. It is needed to test how to properly preprocess the data for this research by preparing pipelines which fits each algorithm the best. The pipelines are tested along with nine different classification algorithms in the realm of regression, support vector machines, and neural networks. Preliminary testing for identifying the optimal pipelines and algorithms resulted in the selection of two algorithms with two different pipelines. The two algorithms are Logistic Regression (LR) and Artificial Neural Network (ANN). These algorithms are optimized further, where several parameters of each algorithm are tested. The best result is achieved with the ANN. The final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78, and an F1 score of 0.76. By removing three of the classes that created noise, the final algorithm is capable of reaching an accuracy of 94%.

Keywords: Artificial neural network, competitive dynamics, logistic regression, text classification, text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 461
1942 Image Search by Features of Sorted Gray level Histogram Polynomial Curve

Authors: Awais Adnan, Muhammad Ali, Amir Hanif Dar

Abstract:

Image Searching was always a problem specially when these images are not properly managed or these are distributed over different locations. Currently different techniques are used for image search. On one end, more features of the image are captured and stored to get better results. Storing and management of such features is itself a time consuming job. While on the other extreme if fewer features are stored the accuracy rate is not satisfactory. Same image stored with different visual properties can further reduce the rate of accuracy. In this paper we present a new concept of using polynomials of sorted histogram of the image. This approach need less overhead and can cope with the difference in visual features of image.

Keywords: Sorted Histogram, Polynomial Curves, feature pointsof images, Grayscale, visual properties of image.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1375
1941 Evaluating 8D Reports Using Text-Mining

Authors: Benjamin Kuester, Bjoern Eilert, Malte Stonis, Ludger Overmeyer

Abstract:

Increasing quality requirements make reliable and effective quality management indispensable. This includes the complaint handling in which the 8D method is widely used. The 8D report as a written documentation of the 8D method is one of the key quality documents as it internally secures the quality standards and acts as a communication medium to the customer. In practice, however, the 8D report is mostly faulty and of poor quality. There is no quality control of 8D reports today. This paper describes the use of natural language processing for the automated evaluation of 8D reports. Based on semantic analysis and text-mining algorithms the presented system is able to uncover content and formal quality deficiencies and thus increases the quality of the complaint processing in the long term.

Keywords: 8D report, complaint management, evaluation system, text-mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 975
1940 A New Graphical Password: Combination of Recall & Recognition Based Approach

Authors: Md. Asraful Haque, Babbar Imam

Abstract:

Information Security is the most describing problem in present times. To cop up with the security of the information, the passwords were introduced. The alphanumeric passwords are the most popular authentication method and still used up to now. However, text based passwords suffer from various drawbacks such as they are easy to crack through dictionary attacks, brute force attacks, keylogger, social engineering etc. Graphical Password is a good replacement for text password. Psychological studies say that human can remember pictures better than text. So this is the fact that graphical passwords are easy to remember. But at the same time due to this reason most of the graphical passwords are prone to shoulder surfing. In this paper, we have suggested a shoulder-surfing resistant graphical password authentication method. The system is a combination of recognition and pure recall based techniques. Proposed scheme can be useful for smart hand held devices (like smart phones i.e. PDAs, iPod, iPhone, etc) which are more handy and convenient to use than traditional desktop computer systems.

Keywords: Authentication, Graphical Password, Text Password, Information Security, Shoulder-surfing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4089
1939 Effective Features for Disambiguation of Turkish Verbs

Authors: Zeynep Orhan, Zeynep Altan

Abstract:

This paper summarizes the results of some experiments for finding the effective features for disambiguation of Turkish verbs. Word sense disambiguation is a current area of investigation in which verbs have the dominant role. Generally verbs have more senses than the other types of words in the average and detecting these features for verbs may lead to some improvements for other word types. In this paper we have considered only the syntactical features that can be obtained from the corpus and tested by using some famous machine learning algorithms.

Keywords: Word sense disambiguation, feature selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1714
1938 Spacial Poetic Text throughout Samih al-Qasim's Poetry

Authors: Saleem Abu Jaber, Khaled Igbaria

Abstract:

For readers, space/place is one of the most significant references to reveal deep significances and indications in modern Arabic poetic texts. Generally, when poets evoke places and/or spaces, they do not mean to refer readers to detailed geographic or physical spaces, but to the symbolic significances and dimensions that those spaces have and through which poets encourage spacial awareness in their readers. Recently, as a result, there has been a great deal of interest in research addressing spacial poetic texts and dimensions in modern Arabic poetry in general and in Palestinian poetry in particular. Samih al-Qasim is one of the most recent prominent Palestinian revolutionary poets. Al-Qasim has published six series of poems that are well known in the Arab world. Although several researchers have studied al-Qasim's poetry, to our knowledge, yet no one has studied the aspect of spacial poetic text in his poetry. Therefore, this paper seeks to fill a gap in the scholarship that has not been addressed up to now. This article aims, not only to demonstrate the presence of spacial poetic text and dimensions throughout al-Qasim's poetry, but also to investigate the purpose for which the poet uses spacial poetic text. Our theory is that the poet, consciously and significantly, uses spacial poetic texts to magnify the Palestinian identity of the Palestinian readers.  Methodologically, we applied a descriptive analytic method, referencing al-Qasim's poetry, addressing spacial poetic texts practically but not theoretically or statistically.

Keywords: Samih al-Qasim, place and space, Palestinian poetry, spacial poetic text.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 801
1937 Role of Natural Language Processing in Information Retrieval; Challenges and Opportunities

Authors: Khaled M. Alhawiti

Abstract:

This paper aims to analyze the role of natural language processing (NLP). The paper will discuss the role in the context of automated data retrieval, automated question answer, and text structuring. NLP techniques are gaining wider acceptance in real life applications and industrial concerns. There are various complexities involved in processing the text of natural language that could satisfy the need of decision makers. This paper begins with the description of the qualities of NLP practices. The paper then focuses on the challenges in natural language processing. The paper also discusses major techniques of NLP. The last section describes opportunities and challenges for future research.

Keywords: Data Retrieval, Information retrieval, Natural Language Processing, Text Structuring.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2771
1936 Different Multimedia Presentation Types and Students' Interpretation Achievement

Authors: Cenk Akbiyik, Gonul Altin Akbiyik

Abstract:

The main purpose of the study was to determine whether students- interpretation achievement differed with the use of various multimedia presentation types. Four groups of students, text only (T), audio only (A), text and audio (TA), text and image (TI), were arranged and they were presented the same story via different types of multimedia presentations. Inference achievement was measured by a critical thinking inference test. Higher mean scores for the TA group compared to the other three groups were found. Also when compared pairwise, interpretation achievement of the TA group differed significantly from scores of the T and TI groups. These differences were interpreted with the increased cognitive load. Increased cognitive load for the TA group may have invited students to put more effort into comprehending the text, thus resulting in better test scores. Findings of the study can be seen as a sign of the importance of learning situations and learning outcomes in multimedia-supported learning environments and may have practical benefits for instructional designers.

Keywords: Multimedia, cognitive multimedia, dual coding, cognitive load, critical thinking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3394
1935 Evolutionary Feature Selection for Text Documents using the SVM

Authors: Daniel I. Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, we present three feature selection methods: Information Gain, Support Vector Machine feature selection called (SVM_FS) and Genetic Algorithm with SVM (called GA_SVM). We show that the best results were obtained with GA_SVM method for a relatively small dimension of the feature vector.

Keywords: Feature Selection, Learning with Kernels, Support Vector Machine, Genetic Algorithm, and Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1658
1934 A Study of Touching Characters in Degraded Gurmukhi Text

Authors: M. K. Jindal, G. S. Lehal, R. K. Sharma

Abstract:

Character segmentation is an important preprocessing step for text recognition. In degraded documents, existence of touching characters decreases recognition rate drastically, for any optical character recognition (OCR) system. In this paper a study of touching Gurmukhi characters is carried out and these characters have been divided into various categories after a careful analysis.Structural properties of the Gurmukhi characters are used for defining the categories. New algorithms have been proposed to segment the touching characters in middle zone. These algorithms have shown a reasonable improvement in segmenting the touching characters in degraded Gurmukhi script. The algorithms proposed in this paper are applicable only to machine printed text.

Keywords: Character Segmentation, Middle Zone, Touching Characters.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1791
1933 Web Page Watermarking: XML files using Synonyms and Acronyms

Authors: Nighat Mir, Sayed Afaq Hussain

Abstract:

Advent enhancements in the field of computing have increased massive use of web based electronic documents. Current Copyright protection laws are inadequate to prove the ownership for electronic documents and do not provide strong features against copying and manipulating information from the web. This has opened many channels for securing information and significant evolutions have been made in the area of information security. Digital Watermarking has developed into a very dynamic area of research and has addressed challenging issues for digital content. Watermarking can be visible (logos or signatures) and invisible (encoding and decoding). Many visible watermarking techniques have been studied for text documents but there are very few for web based text. XML files are used to trade information on the internet and contain important information. In this paper, two invisible watermarking techniques using Synonyms and Acronyms are proposed for XML files to prove the intellectual ownership and to achieve the security. Analysis is made for different attacks and amount of capacity to be embedded in the XML file is also noticed. A comparative analysis for capacity is also made for both methods. The system has been implemented using C# language and all tests are made practically to get the results.

Keywords: Watermarking, Extensible Markup Language (XML), Synonyms, Acronyms, Copyright protection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2227
1932 Providing Medical Information in Braille: Research and Development of Automatic Braille Translation Program for Japanese “eBraille“

Authors: Aki Sugano, Mika Ohta, Mineko Ikegami, Kenji Miura, Sayo Tsukamoto, Akihiro Ichinose, Toshiko Ohshima, Eiichi Maeda, Masako Matsuura, Yutaka Takao

Abstract:

Along with the advances in medicine, providing medical information to individual patient is becoming more important. In Japan such information via Braille is hardly provided to blind and partially sighted people. Thus we are researching and developing a Web-based automatic translation program “eBraille" to translate Japanese text into Japanese Braille. First we analyzed the Japanese transcription rules to implement them on our program. We then added medical words to the dictionary of the program to improve its translation accuracy for medical text. Finally we examined the efficacy of statistical learning models (SLMs) for further increase of word segmentation accuracy in braille translation. As a result, eBraille had the highest translation accuracy in the comparison with other translation programs, improved the accuracy for medical text and is utilized to make hospital brochures in braille for outpatients and inpatients.

Keywords: Automatic Braille translation, Medical text, Partially sighted people.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1555
1931 A Web-Based System for Mapping Features into ISO 14649-Compliant Machining Workingsteps

Authors: J. C. T. Benavente, J. C. E. Ferreira

Abstract:

The rapid development of manufacturing and information systems has caused significant changes in manufacturing environments in recent decades. Mass production has given way to flexible manufacturing systems, in which an important characteristic is customized or "on demand" production. In this scenario, the seamless and without gaps information flow becomes a key factor for success of enterprises. In this paper we present a framework to support the mapping of features into machining workingsteps compliant with the ISO 14649 standard (known as STEP-NC). The system determines how the features can be made with the available manufacturing resources. Examples of the mapping method are presented for features such as a pocket with a general surface.

Keywords: Features, ISO 14649 standard, STEP-NC, mapping, machining workingsteps.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1838
1930 Image Spam Detection Using Color Features and K-Nearest Neighbor Classification

Authors: T. Kumaresan, S. Sanjushree, C. Palanisamy

Abstract:

Image spam is a kind of email spam where the spam text is embedded with an image. It is a new spamming technique being used by spammers to send their messages to bulk of internet users. Spam email has become a big problem in the lives of internet users, causing time consumption and economic losses. The main objective of this paper is to detect the image spam by using histogram properties of an image. Though there are many techniques to automatically detect and avoid this problem, spammers employing new tricks to bypass those techniques, as a result those techniques are inefficient to detect the spam mails. In this paper we have proposed a new method to detect the image spam. Here the image features are extracted by using RGB histogram, HSV histogram and combination of both RGB and HSV histogram. Based on the optimized image feature set classification is done by using k- Nearest Neighbor(k-NN) algorithm. Experimental result shows that our method has achieved better accuracy. From the result it is known that combination of RGB and HSV histogram with k-NN algorithm gives the best accuracy in spam detection.

Keywords: File Type, HSV Histogram, k-NN, RGB Histogram, Spam Detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2089
1929 Edit Distance Algorithm to Increase Storage Efficiency of Javanese Corpora

Authors: Aji P. Wibawa, Andrew Nafalski, Neil Murray, Wayan F. Mahmudy

Abstract:

Since the one-to-one word translator does not have the facility to translate pragmatic aspects of Javanese, the parallel text alignment model described uses a phrase pair combination. The algorithm aligns the parallel text automatically from the beginning to the end of each sentence. Even though the results of the phrase pair combination outperform the previous algorithm, it is still inefficient. Recording all possible combinations consume more space in the database and time consuming. The original algorithm is modified by applying the edit distance coefficient to improve the data-storage efficiency. As a result, the data-storage consumption is 90% reduced as well as its learning period (42s).

Keywords: edit distance coefficient, Javanese, parallel text alignment, phrase pair combination

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1672
1928 A Text Clustering System based on k-means Type Subspace Clustering and Ontology

Authors: Liping Jing, Michael K. Ng, Xinhua Yang, Joshua Zhexue Huang

Abstract:

This paper presents a text clustering system developed based on a k-means type subspace clustering algorithm to cluster large, high dimensional and sparse text data. In this algorithm, a new step is added in the k-means clustering process to automatically calculate the weights of keywords in each cluster so that the important words of a cluster can be identified by the weight values. For understanding and interpretation of clustering results, a few keywords that can best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to the WordNet to identify the set of noun words and consolidate the synonymy and hyponymy words. Experimental results have shown that the clustering algorithm is superior to the other subspace clustering algorithms, such as PROCLUS and HARP and kmeans type algorithm, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selection of the words to represent the topics of the clusters.

Keywords: Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2406
1927 Image Segmentation Using the K-means Algorithm for Texture Features

Authors: Wan-Ting Lin, Chuen-Horng Lin, Tsung-Ho Wu, Yung-Kuan Chan

Abstract:

This study aims to segment objects using the K-means algorithm for texture features. Firstly, the algorithm transforms color images into gray images. This paper describes a novel technique for the extraction of texture features in an image. Then, in a group of similar features, objects and backgrounds are differentiated by using the K-means algorithm. Finally, this paper proposes a new object segmentation algorithm using the morphological technique. The experiments described include the segmentation of single and multiple objects featured in this paper. The region of an object can be accurately segmented out. The results can help to perform image retrieval and analyze features of an object, as are shown in this paper.

Keywords: k-mean, multiple objects, segmentation, texturefeatures.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2779
1926 The Negative Effect of Traditional Loops Style on the Performance of Algorithms

Authors: Mahmoud Moh'd Mhashi

Abstract:

A new algorithm called Character-Comparison to Character-Access (CCCA) is developed to test the effect of both: 1) converting character-comparison and number-comparison into character-access and 2) the starting point of checking on the performance of the checking operation in string searching. An experiment is performed using both English text and DNA text with different sizes. The results are compared with five algorithms, namely, Naive, BM, Inf_Suf_Pref, Raita, and Cycle. With the CCCA algorithm, the results suggest that the evaluation criteria of the average number of total comparisons are improved up to 35%. Furthermore, the results suggest that the clock time required by the other algorithms is improved in range from 22.13% to 42.33% by the new CCCA algorithm.

Keywords: Pattern matching, string searching, charactercomparison, character-access, text type, and checking

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1222
1925 Feature Selection with Kohonen Self Organizing Classification Algorithm

Authors: Francesco Maiorana

Abstract:

In this paper a one-dimension Self Organizing Map algorithm (SOM) to perform feature selection is presented. The algorithm is based on a first classification of the input dataset on a similarity space. From this classification for each class a set of positive and negative features is computed. This set of features is selected as result of the procedure. The procedure is evaluated on an in-house dataset from a Knowledge Discovery from Text (KDT) application and on a set of publicly available datasets used in international feature selection competitions. These datasets come from KDT applications, drug discovery as well as other applications. The knowledge of the correct classification available for the training and validation datasets is used to optimize the parameters for positive and negative feature extractions. The process becomes feasible for large and sparse datasets, as the ones obtained in KDT applications, by using both compression techniques to store the similarity matrix and speed up techniques of the Kohonen algorithm that take advantage of the sparsity of the input matrix. These improvements make it feasible, by using the grid, the application of the methodology to massive datasets.

Keywords: Clustering algorithm, Data mining, Feature selection, Grid, Kohonen Self Organizing Map.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3001
1924 Identification of Arousal and Relaxation by using SVM-Based Fusion of PPG Features

Authors: Chi Jung Kim, Mincheol Whang, Eui Chul Lee

Abstract:

In this paper, we propose a new method to distinguish between arousal and relaxation states by using multiple features acquired from a photoplethysmogram (PPG) and support vector machine (SVM). To induce arousal and relaxation states in subjects, 2 kinds of sound stimuli are used, and their corresponding biosignals are obtained using the PPG sensor. Two features–pulse to pulse interval (PPI) and pulse amplitude (PA)–are extracted from acquired PPG data, and a nonlinear classification between arousal and relaxation is performed using SVM. This methodology has several advantages when compared with previous similar studies. Firstly, we extracted 2 separate features from PPG, i.e., PPI and PA. Secondly, in order to improve the classification accuracy, SVM-based nonlinear classification was performed. Thirdly, to solve classification problems caused by generalized features of whole subjects, we defined each threshold according to individual features. Experimental results showed that the average classification accuracy was 74.67%. Also, the proposed method showed the better identification performance than the single feature based methods. From this result, we confirmed that arousal and relaxation can be classified using SVM and PPG features.

Keywords: Support Vector Machine, PPG, Emotion Recognition, Arousal, Relaxation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2432
1923 Comparison between XGBoost, LightGBM and CatBoost Using a Home Credit Dataset

Authors: Essam Al Daoud

Abstract:

Gradient boosting methods have been proven to be a very important strategy. Many successful machine learning solutions were developed using the XGBoost and its derivatives. The aim of this study is to investigate and compare the efficiency of three gradient methods. Home credit dataset is used in this work which contains 219 features and 356251 records. However, new features are generated and several techniques are used to rank and select the best features. The implementation indicates that the LightGBM is faster and more accurate than CatBoost and XGBoost using variant number of features and records.

Keywords: Gradient boosting, XGBoost, LightGBM, CatBoost, home credit.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8948
1922 Text Mining Technique for Data Mining Application

Authors: M. Govindarajan

Abstract:

Text Mining is around applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), or Text data mining or Text Mining. In decision tree approach is most useful in classification problem. With this technique, tree is constructed to model the classification process. There are two basic steps in the technique: building the tree and applying the tree to the database. This paper describes a proposed C5.0 classifier that performs rulesets, cross validation and boosting for original C5.0 in order to reduce the optimization of error ratio. The feasibility and the benefits of the proposed approach are demonstrated by means of medial data set like hypothyroid. It is shown that, the performance of a classifier on the training cases from which it was constructed gives a poor estimate by sampling or using a separate test file, either way, the classifier is evaluated on cases that were not used to build and evaluate the classifier are both are large. If the cases in hypothyroid.data and hypothyroid.test were to be shuffled and divided into a new 2772 case training set and a 1000 case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of see5 is its ability to classifiers called rulesets. The ruleset has an error rate 0.5 % on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive is by f-fold –cross- validation. The error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.

Keywords: C5.0, Error Ratio, text mining, training data, test data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2430
1921 Deployment of Service Quality Characteristics

Authors: Shuki Dror

Abstract:

This work discusses an innovative methodology for deployment of service quality characteristics. Four groups of organizational features that may influence the quality of services are identified: human resource, technology, planning, and organizational relationships. A House of Service Quality (HOSQ) matrix is built to extract the desired improvement in the service quality characteristics and to translate them into a hierarchy of important organizational features. The Mean Square Error (MSE) criterion enables the pinpointing of the few essential service quality characteristics to be improved as well as selection of the vital organizational features. The method was implemented in an engineering supply enterprise and provides useful information on its vital service dimensions.

Keywords: HOQ, organizational features, service quality.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1817