Search results for: classification of big data actors
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8127

Search results for: classification of big data actors

7737 Exploring Structure of Mobile Ecosystem: Inter-Industry Network Analysis Approach

Authors: Yongyoon Suh, Chulhyun Kim, Moon-soo Kim

Abstract:

As increasing importance of symbiosis and cooperation among mobile communication industries, the mobile ecosystem has been especially highlighted in academia and practice. The structure of mobile ecosystem is quite complex and the ecological role of actors is important to understand that structure. In this respect, this study aims to explore structure of mobile ecosystem in the case of Korea using inter-industry network analysis. Then, the ecological roles in mobile ecosystem are identified using centrality measures as a result of network analysis: degree of centrality, closeness, and betweenness. The result shows that the manufacturing and service industries are separate. Also, the ecological roles of some actors are identified based on the characteristics of ecological terms: keystone, niche, and dominator. Based on the result of this paper, we expect that the policy makers can formulate the future of mobile industry and healthier mobile ecosystem can be constructed.

Keywords: Mobile ecosystem, structure, ecological roles, network analysis, network index.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2059
7736 Risk Classification of SMEs by Early Warning Model Based on Data Mining

Authors: Nermin Ozgulbas, Ali Serhan Koyuncugil

Abstract:

One of the biggest problems of SMEs is their tendencies to financial distress because of insufficient finance background. In this study, an Early Warning System (EWS) model based on data mining for financial risk detection is presented. CHAID algorithm has been used for development of the EWS. Developed EWS can be served like a tailor made financial advisor in decision making process of the firms with its automated nature to the ones who have inadequate financial background. Besides, an application of the model implemented which covered 7,853 SMEs based on Turkish Central Bank (TCB) 2007 data. By using EWS model, 31 risk profiles, 15 risk indicators, 2 early warning signals, and 4 financial road maps has been determined for financial risk mitigation.

Keywords: Early Warning Systems, Data Mining, Financial Risk, SMEs.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3374
7735 Text Mining Technique for Data Mining Application

Authors: M. Govindarajan

Abstract:

Text Mining is around applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), or Text data mining or Text Mining. In decision tree approach is most useful in classification problem. With this technique, tree is constructed to model the classification process. There are two basic steps in the technique: building the tree and applying the tree to the database. This paper describes a proposed C5.0 classifier that performs rulesets, cross validation and boosting for original C5.0 in order to reduce the optimization of error ratio. The feasibility and the benefits of the proposed approach are demonstrated by means of medial data set like hypothyroid. It is shown that, the performance of a classifier on the training cases from which it was constructed gives a poor estimate by sampling or using a separate test file, either way, the classifier is evaluated on cases that were not used to build and evaluate the classifier are both are large. If the cases in hypothyroid.data and hypothyroid.test were to be shuffled and divided into a new 2772 case training set and a 1000 case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of see5 is its ability to classifiers called rulesets. The ruleset has an error rate 0.5 % on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive is by f-fold –cross- validation. The error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.

Keywords: C5.0, Error Ratio, text mining, training data, test data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2478
7734 Corporate Credit Rating using Multiclass Classification Models with order Information

Authors: Hyunchul Ahn, Kyoung-Jae Kim

Abstract:

Corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has been one of the attractive research topics in the literature. In recent years, multiclass classification models such as artificial neural network (ANN) or multiclass support vector machine (MSVM) have become a very appealing machine learning approaches due to their good performance. However, most of them have only focused on classifying samples into nominal categories, thus the unique characteristic of the credit rating - ordinality - has been seldom considered in their approaches. This study proposes new types of ANN and MSVM classifiers, which are named OMANN and OMSVM respectively. OMANN and OMSVM are designed to extend binary ANN or SVM classifiers by applying ordinal pairwise partitioning (OPP) strategy. These models can handle ordinal multiple classes efficiently and effectively. To validate the usefulness of these two models, we applied them to the real-world bond rating case. We compared the results of our models to those of conventional approaches. The experimental results showed that our proposed models improve classification accuracy in comparison to typical multiclass classification techniques with the reduced computation resource.

Keywords: Artificial neural network, Corporate credit rating, Support vector machines, Ordinal pairwise partitioning

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3427
7733 Meta Random Forests

Authors: Praveen Boinee, Alessandro De Angelis, Gian Luca Foresti

Abstract:

Leo Breimans Random Forests (RF) is a recent development in tree based classifiers and quickly proven to be one of the most important algorithms in the machine learning literature. It has shown robust and improved results of classifications on standard data sets. Ensemble learning algorithms such as AdaBoost and Bagging have been in active research and shown improvements in classification results for several benchmarking data sets with mainly decision trees as their base classifiers. In this paper we experiment to apply these Meta learning techniques to the random forests. We experiment the working of the ensembles of random forests on the standard data sets available in UCI data sets. We compare the original random forest algorithm with their ensemble counterparts and discuss the results.

Keywords: Random Forests [RF], ensembles, UCI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2694
7732 A New Classification of Risk-Reduction Options to Improve the Risk-Reduction Readiness of the Railway Industry

Authors: Eberechi Weli, Michael Todinov

Abstract:

The gap between the selection of risk-reduction options in the railway industry and the task of their effective implementation results in compromised safety and substantial losses. An effective risk management must necessarily integrate the evaluation phases with the implementation phase. This paper proposes an essential categorisation of risk reduction measures that best addresses a standard railway industry portfolio. By categorising the risk reduction options into design, operational, procedural and technical options, it is guaranteed that the efforts of the implementation facilitators (people, processes and supporting systems) are systematically harmonised. The classification is based on an integration of fundamental principles of risk reduction in the railway industry with the systems engineering approach.

This paper argues that the use of a similar classification approach is an attribute of organisations possessing a superior level of risk-reduction readiness. The integration of the proposed rational classification structure provides a solid ground for effective risk reduction.

Keywords: Cost effectiveness, organisational readiness, risk reduction, railway, system engineering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1794
7731 Transformation of the Business Model in an Occupational Health Care Company Embedded in an Emerging Personal Data Ecosystem: A Case Study in Finland

Authors: Tero Huhtala, Minna Pikkarainen, Saila Saraniemi

Abstract:

Information technology has long been used as an enabler of exchange for goods and services. Services are evolving from generic to personalized, and the reverse use of customer data has been discussed in both academia and industry for the past few years. This article presents the results of an empirical case study in the area of preventive health care services. The primary data were gathered in workshops, in which future personal data-based services were conceptualized by analyzing future scenarios from a business perspective. The aim of this study is to understand business model transformation in emerging personal data ecosystems. The work was done as a case study in the context of occupational healthcare. The results have implications to theory and practice, indicating that adopting personal data management principles requires transformation of the business model, which, if successfully managed, may provide access to more resources, potential to offer better value, and additional customer channels. These advantages correlate with the broadening of the business ecosystem. Expanding the scope of this study to include more actors would improve the validity of the research. The results draw from existing literature and are based on findings from a case study and the economic properties of the healthcare industry in Finland.

Keywords: Ecosystem, business model, personal data, preventive healthcare.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1130
7730 Automatic Classification of Lung Diseases from CT Images

Authors: Abobaker Mohammed Qasem Farhan, Shangming Yang, Mohammed Al-Nehari

Abstract:

Pneumonia is a kind of lung disease that creates congestion in the chest. Such pneumonic conditions lead to loss of life due to the severity of high congestion. Pneumonic lung disease is caused by viral pneumonia, bacterial pneumonia, or COVID-19 induced pneumonia. The early prediction and classification of such lung diseases help reduce the mortality rate. We propose the automatic Computer-Aided Diagnosis (CAD) system in this paper using the deep learning approach. The proposed CAD system takes input from raw computerized tomography (CT) scans of the patient's chest and automatically predicts disease classification. We designed the Hybrid Deep Learning Algorithm (HDLA) to improve accuracy and reduce processing requirements. The raw CT scans are pre-processed first to enhance their quality for further analysis. We then applied a hybrid model that consists of automatic feature extraction and classification. We propose the robust 2D Convolutional Neural Network (CNN) model to extract the automatic features from the pre-processed CT image. This CNN model assures feature learning with extremely effective 1D feature extraction for each input CT image. The outcome of the 2D CNN model is then normalized using the Min-Max technique. The second step of the proposed hybrid model is related to training and classification using different classifiers. The simulation outcomes using the publicly available dataset prove the robustness and efficiency of the proposed model compared to state-of-art algorithms.

Keywords: CT scans, COVID-19, deep learning, image processing, pneumonia, lung disease.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 571
7729 Data Mining on the Router Logs for Statistical Application Classification

Authors: M. Rahmati, S.M. Mirzababaei

Abstract:

With the advance of information technology in the new era the applications of Internet to access data resources has steadily increased and huge amount of data have become accessible in various forms. Obviously, the network providers and agencies, look after to prevent electronic attacks that may be harmful or may be related to terrorist applications. Thus, these have facilitated the authorities to under take a variety of methods to protect the special regions from harmful data. One of the most important approaches is to use firewall in the network facilities. The main objectives of firewalls are to stop the transfer of suspicious packets in several ways. However because of its blind packet stopping, high process power requirements and expensive prices some of the providers are reluctant to use the firewall. In this paper we proposed a method to find a discriminate function to distinguish between usual packets and harmful ones by the statistical processing on the network router logs. By discriminating these data, an administrator may take an approach action against the user. This method is very fast and can be used simply in adjacent with the Internet routers.

Keywords: Data Mining, Firewall, Optimization, Packetclassification, Statistical Pattern Recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1642
7728 Spectral Mixture Model Applied to Cannabis Parcel Determination

Authors: Levent Basayigit, Sinan Demir, Yusuf Ucar, Burhan Kara

Abstract:

Many research projects require accurate delineation of the different land cover type of the agricultural area. Especially it is critically important for the definition of specific plants like cannabis. However, the complexity of vegetation stands structure, abundant vegetation species, and the smooth transition between different seconder section stages make vegetation classification difficult when using traditional approaches such as the maximum likelihood classifier. Most of the time, classification distinguishes only between trees/annual or grain. It has been difficult to accurately determine the cannabis mixed with other plants. In this paper, a mixed distribution models approach is applied to classify pure and mix cannabis parcels using Worldview-2 imagery in the Lakes region of Turkey. Five different land use types (i.e. sunflower, maize, bare soil, and cannabis) were identified in the image. A constrained Gaussian mixture discriminant analysis (GMDA) was used to unmix the image. In the study, 255 reflectance ratios derived from spectral signatures of seven bands (Blue-Green-Yellow-Red-Rededge-NIR1-NIR2) were randomly arranged as 80% for training and 20% for test data. Gaussian mixed distribution model approach is proved to be an effective and convenient way to combine very high spatial resolution imagery for distinguishing cannabis vegetation. Based on the overall accuracies of the classification, the Gaussian mixed distribution model was found to be very successful to achieve image classification tasks. This approach is sensitive to capture the illegal cannabis planting areas in the large plain. This approach can also be used for monitoring and determination with spectral reflections in illegal cannabis planting areas.

Keywords: Gaussian mixture discriminant analysis, spectral mixture model, World View-2, land parcels.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 787
7727 Machine Learning for Music Aesthetic Annotation Using MIDI Format: A Harmony-Based Classification Approach

Authors: Lin Yang, Zhian Mi, Jiacheng Xiao, Rong Li

Abstract:

Swimming with the tide of deep learning, the field of music information retrieval (MIR) experiences parallel development and a sheer variety of feature-learning models has been applied to music classification and tagging tasks. Among those learning techniques, the deep convolutional neural networks (CNNs) have been widespreadly used with better performance than the traditional approach especially in music genre classification and prediction. However, regarding the music recommendation, there is a large semantic gap between the corresponding audio genres and the various aspects of a song that influence user preference. In our study, aiming to bridge the gap, we strive to construct an automatic music aesthetic annotation model with MIDI format for better comparison and measurement of the similarity between music pieces in the way of harmonic analysis. We use the matrix of qualification converted from MIDI files as input to train two different classifiers, support vector machine (SVM) and Decision Tree (DT). Experimental results in performance of a tag prediction task have shown that both learning algorithms are capable of extracting high-level properties in an end-to end manner from music information. The proposed model is helpful to learn the audience taste and then the resulting recommendations are likely to appeal to a niche consumer.

Keywords: Harmonic analysis, machine learning, music classification and tagging, MIDI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 738
7726 Hybrid Approach for Software Defect Prediction Using Machine Learning with Optimization Technique

Authors: C. Manjula, Lilly Florence

Abstract:

Software technology is developing rapidly which leads to the growth of various industries. Now-a-days, software-based applications have been adopted widely for business purposes. For any software industry, development of reliable software is becoming a challenging task because a faulty software module may be harmful for the growth of industry and business. Hence there is a need to develop techniques which can be used for early prediction of software defects. Due to complexities in manual prediction, automated software defect prediction techniques have been introduced. These techniques are based on the pattern learning from the previous software versions and finding the defects in the current version. These techniques have attracted researchers due to their significant impact on industrial growth by identifying the bugs in software. Based on this, several researches have been carried out but achieving desirable defect prediction performance is still a challenging task. To address this issue, here we present a machine learning based hybrid technique for software defect prediction. First of all, Genetic Algorithm (GA) is presented where an improved fitness function is used for better optimization of features in data sets. Later, these features are processed through Decision Tree (DT) classification model. Finally, an experimental study is presented where results from the proposed GA-DT based hybrid approach is compared with those from the DT classification technique. The results show that the proposed hybrid approach achieves better classification accuracy.

Keywords: Decision tree, genetic algorithm, machine learning, software defect prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1454
7725 Detection of Action Potentials in the Presence of Noise Using Phase-Space Techniques

Authors: Christopher Paterson, Richard Curry, Alan Purvis, Simon Johnson

Abstract:

Emerging Bio-engineering fields such as Brain Computer Interfaces, neuroprothesis devices and modeling and simulation of neural networks have led to increased research activity in algorithms for the detection, isolation and classification of Action Potentials (AP) from noisy data trains. Current techniques in the field of 'unsupervised no-prior knowledge' biosignal processing include energy operators, wavelet detection and adaptive thresholding. These tend to bias towards larger AP waveforms, AP may be missed due to deviations in spike shape and frequency and correlated noise spectrums can cause false detection. Also, such algorithms tend to suffer from large computational expense. A new signal detection technique based upon the ideas of phasespace diagrams and trajectories is proposed based upon the use of a delayed copy of the AP to highlight discontinuities relative to background noise. This idea has been used to create algorithms that are computationally inexpensive and address the above problems. Distinct AP have been picked out and manually classified from real physiological data recorded from a cockroach. To facilitate testing of the new technique, an Auto Regressive Moving Average (ARMA) noise model has been constructed bases upon background noise of the recordings. Along with the AP classification means this model enables generation of realistic neuronal data sets at arbitrary signal to noise ratio (SNR).

Keywords: Action potential detection, Low SNR, Phase spacediagrams/trajectories, Unsupervised/no-prior knowledge.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1635
7724 A New Weighted LDA Method in Comparison to Some Versions of LDA

Authors: Delaram Jarchi, Reza Boostani

Abstract:

Linear Discrimination Analysis (LDA) is a linear solution for classification of two classes. In this paper, we propose a variant LDA method for multi-class problem which redefines the between class and within class scatter matrices by incorporating a weight function into each of them. The aim is to separate classes as much as possible in a situation that one class is well separated from other classes, incidentally, that class must have a little influence on classification. It has been suggested to alleviate influence of classes that are well separated by adding a weight into between class scatter matrix and within class scatter matrix. To obtain a simple and effective weight function, ordinary LDA between every two classes has been used in order to find Fisher discrimination value and passed it as an input into two weight functions and redefined between class and within class scatter matrices. Experimental results showed that our new LDA method improved classification rate, on glass, iris and wine datasets, in comparison to different versions of LDA.

Keywords: Discriminant vectors, weighted LDA, uncorrelation, principle components, Fisher-face method, Bootstarp method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1513
7723 Classification of the Latin Alphabet as Pattern on ARToolkit Markers for Augmented Reality Applications

Authors: Mohamed Badeche, Mohamed Benmohammed

Abstract:

augmented reality is a technique used to insert virtual objects in real scenes. One of the most used libraries in the area is the ARToolkit library. It is based on the recognition of the markers that are in the form of squares with a pattern inside. This pattern which is mostly textual is source of confusing. In this paper, we present the results of a classification of Latin characters as a pattern on the ARToolkit markers to know the most distinguishable among them.

Keywords: ARToolkit library, augmented reality, K-means, patterns

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1826
7722 Multistage Data Envelopment Analysis Model for Malmquist Productivity Index Using Grey's System Theory to Evaluate Performance of Electric Power Supply Chain in Iran

Authors: Mesbaholdin Salami, Farzad Movahedi Sobhani, Mohammad Sadegh Ghazizadeh

Abstract:

Evaluation of organizational performance is among the most important measures that help organizations and entities continuously improve their efficiency. Organizations can use the existing data and results from the comparison of units under investigation to obtain an estimation of their performance. The Malmquist Productivity Index (MPI) is an important index in the evaluation of overall productivity, which considers technological developments and technical efficiency at the same time. This article proposed a model based on the multistage MPI, considering limited data (Grey’s theory). This model can evaluate the performance of units using limited and uncertain data in a multistage process. It was applied by the electricity market manager to Iran’s electric power supply chain (EPSC), which contains uncertain data, to evaluate the performance of its actors. Results from solving the model showed an improvement in the accuracy of future performance of the units under investigation, using the Grey’s system theory. This model can be used in all case studies, in which MPI is used and there are limited or uncertain data.

Keywords: Malmquist Index, Grey's Theory, Charnes Cooper & Rhodes (CCR) Model, network data envelopment analysis, Iran electricity power chain.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 544
7721 Effects of Hidden Unit Sizes and Autoregressive Features in Mental Task Classification

Authors: Ramaswamy Palaniappan, Nai-Jen Huan

Abstract:

Classification of electroencephalogram (EEG) signals extracted during mental tasks is a technique that is actively pursued for Brain Computer Interfaces (BCI) designs. In this paper, we compared the classification performances of univariateautoregressive (AR) and multivariate autoregressive (MAR) models for representing EEG signals that were extracted during different mental tasks. Multilayer Perceptron (MLP) neural network (NN) trained by the backpropagation (BP) algorithm was used to classify these features into the different categories representing the mental tasks. Classification performances were also compared across different mental task combinations and 2 sets of hidden units (HU): 2 to 10 HU in steps of 2 and 20 to 100 HU in steps of 20. Five different mental tasks from 4 subjects were used in the experimental study and combinations of 2 different mental tasks were studied for each subject. Three different feature extraction methods with 6th order were used to extract features from these EEG signals: AR coefficients computed with Burg-s algorithm (ARBG), AR coefficients computed with stepwise least square algorithm (ARLS) and MAR coefficients computed with stepwise least square algorithm. The best results were obtained with 20 to 100 HU using ARBG. It is concluded that i) it is important to choose the suitable mental tasks for different individuals for a successful BCI design, ii) higher HU are more suitable and iii) ARBG is the most suitable feature extraction method.

Keywords: Autoregressive, Brain-Computer Interface, Electroencephalogram, Neural Network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1796
7720 Water End-Use Classification with Contemporaneous Water-Energy Data and Deep Learning Network

Authors: Khoi A. Nguyen, Rodney A. Stewart, Hong Zhang

Abstract:

‘Water-related energy’ is energy use which is directly or indirectly influenced by changes to water use. Informatics applying a range of mathematical, statistical and rule-based approaches can be used to reveal important information on demand from the available data provided at second, minute or hourly intervals. This study aims to combine these two concepts to improve the current water end use disaggregation problem through applying a wide range of most advanced pattern recognition techniques to analyse the concurrent high-resolution water-energy consumption data. The obtained results have shown that recognition accuracies of all end-uses have significantly increased, especially for mechanised categories, including clothes washer, dishwasher and evaporative air cooler where over 95% of events were correctly classified.

Keywords: Deep learning network, smart metering, water end use, water-energy data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1352
7719 A Software Framework for Predicting Oil-Palm Yield from Climate Data

Authors: Mohd. Noor Md. Sap, A. Majid Awan

Abstract:

Intelligent systems based on machine learning techniques, such as classification, clustering, are gaining wide spread popularity in real world applications. This paper presents work on developing a software system for predicting crop yield, for example oil-palm yield, from climate and plantation data. At the core of our system is a method for unsupervised partitioning of data for finding spatio-temporal patterns in climate data using kernel methods which offer strength to deal with complex data. This work gets inspiration from the notion that a non-linear data transformation into some high dimensional feature space increases the possibility of linear separability of the patterns in the transformed space. Therefore, it simplifies exploration of the associated structure in the data. Kernel methods implicitly perform a non-linear mapping of the input data into a high dimensional feature space by replacing the inner products with an appropriate positive definite function. In this paper we present a robust weighted kernel k-means algorithm incorporating spatial constraints for clustering the data. The proposed algorithm can effectively handle noise, outliers and auto-correlation in the spatial data, for effective and efficient data analysis by exploring patterns and structures in the data, and thus can be used for predicting oil-palm yield by analyzing various factors affecting the yield.

Keywords: Pattern analysis, clustering, kernel methods, spatial data, crop yield

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1967
7718 A Spatial Hypergraph Based Semi-Supervised Band Selection Method for Hyperspectral Imagery Semantic Interpretation

Authors: Akrem Sellami, Imed Riadh Farah

Abstract:

Hyperspectral imagery (HSI) typically provides a wealth of information captured in a wide range of the electromagnetic spectrum for each pixel in the image. Hence, a pixel in HSI is a high-dimensional vector of intensities with a large spectral range and a high spectral resolution. Therefore, the semantic interpretation is a challenging task of HSI analysis. We focused in this paper on object classification as HSI semantic interpretation. However, HSI classification still faces some issues, among which are the following: The spatial variability of spectral signatures, the high number of spectral bands, and the high cost of true sample labeling. Therefore, the high number of spectral bands and the low number of training samples pose the problem of the curse of dimensionality. In order to resolve this problem, we propose to introduce the process of dimensionality reduction trying to improve the classification of HSI. The presented approach is a semi-supervised band selection method based on spatial hypergraph embedding model to represent higher order relationships with different weights of the spatial neighbors corresponding to the centroid of pixel. This semi-supervised band selection has been developed to select useful bands for object classification. The presented approach is evaluated on AVIRIS and ROSIS HSIs and compared to other dimensionality reduction methods. The experimental results demonstrate the efficacy of our approach compared to many existing dimensionality reduction methods for HSI classification.

Keywords: Hyperspectral image, spatial hypergraph, dimensionality reduction, semantic interpretation, band selection, feature extraction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1211
7717 Use of Gaussian-Euclidean Hybrid Function Based Artificial Immune System for Breast Cancer Diagnosis

Authors: Cuneyt Yucelbas, Seral Ozsen, Sule Yucelbas, Gulay Tezel

Abstract:

Due to the fact that there exist only a small number of complex systems in artificial immune system (AIS) that work out nonlinear problems, nonlinear AIS approaches, among the well-known solution techniques, need to be developed. Gaussian function is usually used as similarity estimation in classification problems and pattern recognition. In this study, diagnosis of breast cancer, the second type of the most widespread cancer in women, was performed with different distance calculation functions that euclidean, gaussian and gaussian-euclidean hybrid function in the clonal selection model of classical AIS on Wisconsin Breast Cancer Dataset (WBCD), which was taken from the University of California, Irvine Machine-Learning Repository. We used 3-fold cross validation method to train and test the dataset. According to the results, the maximum test classification accuracy was reported as 97.35% by using of gaussian-euclidean hybrid function for fold-3. Also, mean of test classification accuracies for all of functions were obtained as 94.78%, 94.45% and 95.31% with use of euclidean, gaussian and gaussian-euclidean, respectively. With these results, gaussian-euclidean hybrid function seems to be a potential distance calculation method, and it may be considered as an alternative distance calculation method for hard nonlinear classification problems.

Keywords: Artificial Immune System, Breast Cancer Diagnosis, Euclidean Function, Gaussian Function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2108
7716 Integrated ACOR/IACOMV-R-SVM Algorithm

Authors: Hiba Basim Alwan, Ku Ruhana Ku-Mahamud

Abstract:

A direction for ACO is to optimize continuous and mixed (discrete and continuous) variables in solving problems with various types of data. Support Vector Machine (SVM), which originates from the statistical approach, is a present day classification technique. The main problems of SVM are selecting feature subset and tuning the parameters. Discretizing the continuous value of the parameters is the most common approach in tuning SVM parameters. This process will result in loss of information which affects the classification accuracy. This paper presents two algorithms that can simultaneously tune SVM parameters and select the feature subset. The first algorithm, ACOR-SVM, will tune SVM parameters, while the second IACOMV-R-SVM algorithm will simultaneously tune SVM parameters and select the feature subset. Three benchmark UCI datasets were used in the experiments to validate the performance of the proposed algorithms. The results show that the proposed algorithms have good performances as compared to other approaches.

Keywords: Continuous ant colony optimization, incremental continuous ant colony, simultaneous optimization, support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 869
7715 Rank-Based Chain-Mode Ensemble for Binary Classification

Authors: Chongya Song, Kang Yen, Alexander Pons, Jin Liu

Abstract:

In the field of machine learning, the ensemble has been employed as a common methodology to improve the performance upon multiple base classifiers. However, the true predictions are often canceled out by the false ones during consensus due to a phenomenon called “curse of correlation” which is represented as the strong interferences among the predictions produced by the base classifiers. In addition, the existing practices are still not able to effectively mitigate the problem of imbalanced classification. Based on the analysis on our experiment results, we conclude that the two problems are caused by some inherent deficiencies in the approach of consensus. Therefore, we create an enhanced ensemble algorithm which adopts a designed rank-based chain-mode consensus to overcome the two problems. In order to evaluate the proposed ensemble algorithm, we employ a well-known benchmark data set NSL-KDD (the improved version of dataset KDDCup99 produced by University of New Brunswick) to make comparisons between the proposed and 8 common ensemble algorithms. Particularly, each compared ensemble classifier uses the same 22 base classifiers, so that the differences in terms of the improvements toward the accuracy and reliability upon the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus is proved to be a more effective ensemble solution than the traditional consensus approach, which outperforms the 8 ensemble algorithms by 20% on almost all compared metrices which include accuracy, precision, recall, F1-score and area under receiver operating characteristic curve.

Keywords: Consensus, curse of correlation, imbalanced classification, rank-based chain-mode ensemble.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 717
7714 Effectual Role of Local Level Partnership Schemes in Affordable Housing Delivery

Authors: Hala S. Mekawy

Abstract:

Affordable housing delivery for low and lower middle income families is a prominent problem in many developing countries; governments alone are unable to address this challenge due to diverse financial and regulatory constraints, and the private sector's contribution is rare and assists only middle-income households even when institutional and legal reforms are conducted to persuade it to go down market. Also, the market-enabling policy measures advocated by the World Bank since the early nineties have been strongly criticized and proven to be inappropriate to developing country contexts, where it is highly unlikely that the formal private sector can reach low income population. In addition to governments and private developers, affordable housing delivery systems involve an intricate network of relationships between a diverse range of actors. Collaboration between them was proven to be vital, and hence, an approach towards partnership schemes for affordable housing delivery has emerged. The basic premise of this paper is that addressing housing affordability challenges in Egypt demands direct public support, as markets and market actors alone would never succeed in delivering decent affordable housing to low and lower middle income groups. It argues that this support would ideally be through local level partnership schemes, with a leading decentralized local government role, and partners being identified according to specific local conditions. It attempts to identify major attributes that would ensure the fulfillment of the goals of such schemes in the Egyptian context. This is based upon evidence from diversified worldwide experiences, in addition to the main outcomes of a questionnaire that was conducted to specialists and chief actors in the field.

Keywords: Affordable housing, partnership schemes.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2785
7713 Optimal Multilayer Perceptron Structure For Classification of HIV Sub-Type Viruses

Authors: Zeyneb Kurt, Oguzhan Yavuz

Abstract:

The feature of HIV genome is in a wide range because of it is highly heterogeneous. Hence, the infection ability of the virus changes related with different chemokine receptors. From this point, R5 and X4 HIV viruses use CCR5 and CXCR5 coreceptors respectively while R5X4 viruses can utilize both coreceptors. Recently, in Bioinformatics, R5X4 viruses have been studied to classify by using the coreceptors of HIV genome. The aim of this study is to develop the optimal Multilayer Perceptron (MLP) for high classification accuracy of HIV sub-type viruses. To accomplish this purpose, the unit number in hidden layer was incremented one by one, from one to a particular number. The statistical data of R5X4, R5 and X4 viruses was preprocessed by the signal processing methods. Accessible residues of these virus sequences were extracted and modeled by Auto-Regressive Model (AR) due to the dimension of residues is large and different from each other. Finally the pre-processed dataset was used to evolve MLP with various number of hidden units to determine R5X4 viruses. Furthermore, ROC analysis was used to figure out the optimal MLP structure.

Keywords: Multilayer Perceptron, Auto-Regressive Model, HIV, ROC Analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1429
7712 Model Discovery and Validation for the Qsar Problem using Association Rule Mining

Authors: Luminita Dumitriu, Cristina Segal, Marian Craciun, Adina Cocu, Lucian P. Georgescu

Abstract:

There are several approaches in trying to solve the Quantitative 1Structure-Activity Relationship (QSAR) problem. These approaches are based either on statistical methods or on predictive data mining. Among the statistical methods, one should consider regression analysis, pattern recognition (such as cluster analysis, factor analysis and principal components analysis) or partial least squares. Predictive data mining techniques use either neural networks, or genetic programming, or neuro-fuzzy knowledge. These approaches have a low explanatory capability or non at all. This paper attempts to establish a new approach in solving QSAR problems using descriptive data mining. This way, the relationship between the chemical properties and the activity of a substance would be comprehensibly modeled.

Keywords: association rules, classification, data mining, Quantitative Structure - Activity Relationship.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1777
7711 Graph-Based Text Similarity Measurement by Exploiting Wikipedia as Background Knowledge

Authors: Lu Zhang, Chunping Li, Jun Liu, Hui Wang

Abstract:

Text similarity measurement is a fundamental issue in many textual applications such as document clustering, classification, summarization and question answering. However, prevailing approaches based on Vector Space Model (VSM) more or less suffer from the limitation of Bag of Words (BOW), which ignores the semantic relationship among words. Enriching document representation with background knowledge from Wikipedia is proven to be an effective way to solve this problem, but most existing methods still cannot avoid similar flaws of BOW in a new vector space. In this paper, we propose a novel text similarity measurement which goes beyond VSM and can find semantic affinity between documents. Specifically, it is a unified graph model that exploits Wikipedia as background knowledge and synthesizes both document representation and similarity computation. The experimental results on two different datasets show that our approach significantly improves VSM-based methods in both text clustering and classification.

Keywords: Text classification, Text clustering, Text similarity, Wikipedia

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2104
7710 Real-Time Testing of Steel Strip Welds based on Bayesian Decision Theory

Authors: Julio Molleda, Daniel F. García, Juan C. Granda, Francisco J. Suárez

Abstract:

One of the main trouble in a steel strip manufacturing line is the breakage of whatever weld carried out between steel coils, that are used to produce the continuous strip to be processed. A weld breakage results in a several hours stop of the manufacturing line. In this process the damages caused by the breakage must be repaired. After the reparation and in order to go on with the production it will be necessary a restarting process of the line. For minimizing this problem, a human operator must inspect visually and manually each weld in order to avoid its breakage during the manufacturing process. The work presented in this paper is based on the Bayesian decision theory and it presents an approach to detect, on real-time, steel strip defective welds. This approach is based on quantifying the tradeoffs between various classification decisions using probability and the costs that accompany such decisions.

Keywords: Classification, Pattern Recognition, ProbabilisticReasoning, Statistical Data Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1397
7709 An Exact Solution to Support Vector Mixture

Authors: Monjed Ezzeddinne, Nicolas Lefebvre, Régis Lengellé

Abstract:

This paper presents a new version of the SVM mixture algorithm initially proposed by Kwok for classification and regression problems. For both cases, a slight modification of the mixture model leads to a standard SVM training problem, to the existence of an exact solution and allows the direct use of well known decomposition and working set selection algorithms. Only the regression case is considered in this paper but classification has been addressed in a very similar way. This method has been successfully applied to engine pollutants emission modeling.

Keywords: Identification, Learning systems, Mixture ofExperts, Support Vector Machines.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1358
7708 Weighted k-Nearest-Neighbor Techniques for High Throughput Screening Data

Authors: Kozak K, M. Kozak, K. Stapor

Abstract:

The k-nearest neighbors (knn) is a simple but effective method of classification. In this paper we present an extended version of this technique for chemical compounds used in High Throughput Screening, where the distances of the nearest neighbors can be taken into account. Our algorithm uses kernel weight functions as guidance for the process of defining activity in screening data. Proposed kernel weight function aims to combine properties of graphical structure and molecule descriptors of screening compounds. We apply the modified knn method on several experimental data from biological screens. The experimental results confirm the effectiveness of the proposed method.

Keywords: biological screening, kernel methods, KNN, QSAR

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2258