Search results for: statistical features

2242 Quantitative Genetics Researches on Milk Protein Systems of Romanian Grey Steppe Breed

Authors: V. Maciuc, Şt. Creangă, I. Gîlcă, V. Ujică

Abstract:

The paper makes part from a complex research project on Romanian Grey Steppe, a unique breed in terms of biological and cultural-historical importance, on the verge of extinction and which has been included in a preservation programme of genetic resources from Romania. The study of genetic polymorphism of protean fractions, especially kappa-casein, and the genotype relations of these lactoproteins with some quantitative and qualitative features of milk yield represents a current theme and a novelty for this breed. In the estimation of the genetic parameters we used R.E.M.L. (Restricted Maximum Likelihood) method. The main lactoprotein from milk, kappa - casein (K-cz), characterized in the specialized literature as a feature having a high degree of hereditary transmission, behaves as such in the nucleus under study, a value also confirmed by the heritability coefficient (h2 = 0.57 %). We must mention the medium values for milk and fat quantity (h2=0.26, 0.29 %) and the fat and protein percentage from milk having a high hereditary influence h2 = 0.71 - 0.63 %. Correlations between kappa-casein and the milk quantity are negative and strong. Between kappa-casein and other qualitative features of milk (fat content 0.58-0.67 % and protein content 0.77- 0.87%), there are positive and very strong correlations. At the same time, between kappa-casein and β casein (β-cz), β lactoglobulin (β- lg) respectively, correlations are positive having high values (0.37 – 0.45 %), indicating the same causes and determining factors for the two groups of features.

Keywords: breed, genetic preservation, lactoproteins, Romanian Grey Steppe

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1703

2241 Speech Enhancement by Marginal Statistical Characterization in the Log Gabor Wavelet Domain

Authors: Suman Senapati, Goutam Saha

Abstract:

This work presents a fusion of Log Gabor Wavelet (LGW) and Maximum a Posteriori (MAP) estimator as a speech enhancement tool for acoustical background noise reduction. The probability density function (pdf) of the speech spectral amplitude is approximated by a Generalized Laplacian Distribution (GLD). Compared to earlier estimators the proposed method estimates the underlying statistical model more accurately by appropriately choosing the model parameters of GLD. Experimental results show that the proposed estimator yields a higher improvement in Segmental Signal-to-Noise Ratio (S-SNR) and lower Log-Spectral Distortion (LSD) in two different noisy environments compared to other estimators.

Keywords: Speech Enhancement, Generalized Laplacian Distribution, Log Gabor Wavelet, Bayesian MAP Marginal Estimator.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1617

2240 A New Fast Intra Prediction Mode Decision Algorithm for H.264/AVC Encoders

Authors: A. Elyousfi, A. Tamtaoui, E. Bouyakhf

Abstract:

The H.264/AVC video coding standard contains a number of advanced features. Ones of the new features introduced in this standard is the multiple intramode prediction. Its function exploits directional spatial correlation with adjacent block for intra prediction. With this new features, intra coding of H.264/AVC offers a considerably higher improvement in coding efficiency compared to other compression standard, but computational complexity is increased significantly when brut force rate distortion optimization (RDO) algorithm is used. In this paper, we propose a new fast intra prediction mode decision method for the complexity reduction of H.264 video coding. for luma intra prediction, the proposed method consists of two step: in the first step, we make the RDO for four mode of intra 4x4 block, based the distribution of RDO cost of those modes and the idea that the fort correlation with adjacent mode, we select the best mode of intra 4x4 block. In the second step, we based the fact that the dominating direction of a smaller block is similar to that of bigger block, the candidate modes of 8x8 blocks and 16x16 macroblocks are determined. So, in case of chroma intra prediction, the variance of the chroma pixel values is much smaller than that of luma ones, since our proposed uses only the mode DC. Experimental results show that the new fast intra mode decision algorithm increases the speed of intra coding significantly with negligible loss of PSNR.

Keywords: Intra prediction, H264/AVC, video coding, encodercomplexity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2489

2239 Statistical Description of Wave Interactions in 1D Defect Turbulence

Authors: Yusuke Uchiyama, Hidetoshi Konno

Abstract:

We have investigated statistical properties of the defect turbulence in 1D CGLE wherein many body interaction is involved between local depressing wave (LDW) and local standing wave (LSW). It is shown that the counting number fluctuation of LDW is subject to the sub-Poisson statistics (SUBP). The physical origin of the SUBP can be ascribed to pair extinction of LDWs based on the master equation approach. It is also shown that the probability density function (pdf) of inter-LDW distance can be identified by the hyper gamma distribution. Assuming a superstatistics of the exponential distribution (Poisson configuration), a plausible explanation is given. It is shown further that the pdf of amplitude of LDW has a fattail. The underlying mechanism of its fluctuation is examined by introducing a generalized fractional Poisson configuration.

Keywords: sub-Poisson statistics, hyper gamma distribution, fractional Poisson configuration.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1543

2238 Electricity Generation from Renewables and Targets: An Application of Multivariate Statistical Techniques

Authors: Filiz Ersoz, Taner Ersoz, Tugrul Bayraktar

Abstract:

Renewable energy is referred to as "clean energy" and common popular support for the use of renewable energy (RE) is to provide electricity with zero carbon dioxide emissions. This study provides useful insight into the European Union (EU) RE, especially, into electricity generation obtained from renewables, and their targets. The objective of this study is to identify groups of European countries, using multivariate statistical analysis and selected indicators. The hierarchical clustering method is used to decide the number of clusters for EU countries. The conducted statistical hierarchical cluster analysis is based on the Ward’s clustering method and squared Euclidean distances. Hierarchical cluster analysis identified eight distinct clusters of European countries. Then, non-hierarchical clustering (k-means) method was applied. Discriminant analysis was used to determine the validity of the results with data normalized by Z score transformation. To explore the relationship between the selected indicators, correlation coefficients were computed. The results of the study reveal the current situation of RE in European Union Member States.

Keywords: Share of electricity generation, CO2 emission, targets, multivariate methods, hierarchical clustering, K-means clustering, discriminant analyzed, correlation, EU member countries.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1237

2237 Statistical Study of Drink Markets: Case Study

Authors: Seyed Habib A. Rahmati, Arash Haji Karimi, Reza Saffari, Zeeya Rashvand

Abstract:

An important official knowledge in each country is to have a comprehensive knowledge about markets of each group of products. Drink markets are one the most important markets of each country as a sub-group of nourishment markets. This paper is going to study these markets in Iran. To do so, first, two drink products are selected as pilot, including milk and concentrate. Then, for each product, two groups of information are estimated for the last five years, including 1) total consumption (demand) and 2) total production. Finally, the two groups of productions are compared statistically by means of two statistical tests called t test and Mann- Whitney test. The implemented Different related tables and figures are also illustrated to show the method more explicitly.

Keywords: Market evaluation, Drink, Estimation, Mann- Whitney test

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1338

2236 Experimental Testing of Statistical Size Effect in Civil Engineering Structures

Authors: Jana Kaděrová, Miroslav Vořechovský

Abstract:

The presented paper copes with an experimental evaluation of a model based on modified Weibull size effect theory. Classical statistical Weibull theory was modified by introducing a new parameter (correlation length lp) representing the spatial autocorrelation of a random mechanical properties of material. This size effect modification was observed on two different materials used in civil engineering: unreinforced (plain) concrete and multi-filament yarns made of alkaliresistant (AR) glass which are used for textile-reinforced concrete. The behavior under flexural, resp. tensile loading was investigated by laboratory experiments. A high number of specimens of different sizes was tested to obtain statistically significant data which were subsequently corrected and statistically processed. Due to a distortion of the measured displacements caused by the unstiff experiment device, only the maximal load values were statistically evaluated. Results of the experiments showed a decreasing strength with an increasing sample length. Size effect curves were obtained and the correlation length was fitted according to measured data. Results did not exclude the existence of the proposed new parameter lp.

Keywords: Statistical size effect, concrete, multi filaments yarns, experiment, autocorrelation length.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1971

2235 Performance Prediction of Multi-Agent Based Simulation Applications on the Grid

Authors: Dawit Mengistu, Lars Lundberg, Paul Davidsson

Abstract:

A major requirement for Grid application developers is ensuring performance and scalability of their applications. Predicting the performance of an application demands understanding its specific features. This paper discusses performance modeling and prediction of multi-agent based simulation (MABS) applications on the Grid. An experiment conducted using a synthetic MABS workload explains the key features to be included in the performance model. The results obtained from the experiment show that the prediction model developed for the synthetic workload can be used as a guideline to understand to estimate the performance characteristics of real world simulation applications.

Keywords: Grid computing, Performance modeling, Performance prediction, Multi-agent simulation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1440

2234 Morphological Analysis of English L1-Persian L2 Adult Learners’ Interlanguage: From the Perspective of SLA Variation

Authors: Maassoumeh Bemani Naeini

Abstract:

Studies on interlanguage have long been engaged in describing the phenomenon of variation in SLA. Pursuing the same goal and particularly addressing the role of linguistic features, this study describes the use of Persian morphology in the interlanguage of two adult English-speaking learners of Persian L2. Taking the general approach of a combination of contrastive analysis, error analysis and interlanguage analysis, this study focuses on the identification and prediction of some possible instances of transfer from English L1 to Persian L2 across six elicitation tasks aiming to investigate whether any of contextual features may variably influence the learners’ order of morpheme accuracy in the areas of copula, possessives, articles, demonstratives, plural form, personal pronouns, and genitive cases. Results describe the existence of task variation in the interlanguage system of Persian L2 learners.

Keywords: English L1, Interlanguage Analysis, Persian L2, SLA variation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1298

2233 High Impedance Faults Detection Technique Based on Wavelet Transform

Authors: Ming-Ta Yang, Jin-Lung Guan, Jhy-Cherng Gu

Abstract:

The purpose of this paper is to solve the problem of protecting aerial lines from high impedance faults (HIFs) in distribution systems. This investigation successfully applies 3I0 zero sequence current to solve HIF problems. The feature extraction system based on discrete wavelet transform (DWT) and the feature identification technique found on statistical confidence are then applied to discriminate effectively between the HIFs and the switch operations. Based on continuous wavelet transform (CWT) pattern recognition of HIFs is proposed, also. Staged fault testing results demonstrate that the proposed wavelet based algorithm is feasible performance well.

Keywords: Continuous wavelet transform, discrete wavelet transform, high impedance faults, statistical confidence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2315

2232 Driving Mechanism of Urban Sprawl in Chinese Context from the Perspective of Domestic and Overseas Comparison

Authors: Tingke Wu, Yaping Huang

Abstract:

Many cities in China have been experiencing serious urban sprawl since the 1980s, which pose great challenges to a country with scare cultivated land and huge population. Because of different social and economic context and development stage, driving forces of urban sprawl in China are quite different from developed countries. Therefore, it is of great importance to probe into urban sprawl driving mechanism in Chinese context. By a comparison study of the background and features of urban sprawl between China and developed countries, this research establishes an analytical framework for sprawl dynamic mechanism in China. By literature review and analyzing data from national statistical yearbook, it then probes into the driving mechanism and the primary cause of urban sprawl. The results suggest that population increase, economic growth, traffic and information technology development lead to rapid expansion of urban space; defects of land institution and lack of effective guidance give rise to low efficiency of urban land use. Moreover, urban sprawl is ultimately attributed to imperfections of policy and institution. On this basis, this research puts forward several sprawl control strategies in Chinese context.

Keywords: China, driving forces, driving mechanism, land institution, urban expansion, urban sprawl.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 751

2231 Application of Statistical Approach for Optimizing CMCase Production by Bacillus tequilensis S28 Strain via Submerged Fermentation Using Wheat Bran as Carbon Source

Authors: A. Sharma, R. Tewari, S. K. Soni

Abstract:

Biofuels production has come forth as a future technology to combat the problem of depleting fossil fuels. Bio-based ethanol production from enzymatic lignocellulosic biomass degradation serves an efficient method and catching the eye of scientific community. High cost of the enzyme is the major obstacle in preventing the commercialization of this process. Thus main objective of the present study was to optimize composition of medium components for enhancing cellulase production by newly isolated strain of Bacillus tequilensis. Nineteen factors were taken into account using statistical Plackett-Burman Design. The significant variables influencing the cellulose production were further employed in statistical Response Surface Methodology using Central Composite Design for maximizing cellulase production. The optimum medium composition for cellulase production was: peptone (4.94 g/L), ammonium chloride (4.99 g/L), yeast extract (2.00 g/L), Tween-20 (0.53 g/L), calcium chloride (0.20 g/L) and cobalt chloride (0.60 g/L) with pH 7, agitation speed 150 rpm and 72 h incubation at 37oC. Analysis of variance (ANOVA) revealed high coefficient of determination (R2) of 0.99. Maximum cellulase productivity of 11.5 IU/ml was observed against the model predicted value of 13 IU/ml. This was found to be optimally active at 60oC and pH 5.5.

Keywords: Bacillus tequilensis, CMCase, Submerged Fermentation, Optimization, Plackett-Burman Design, Response Surface Methodology.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3053

2230 Secure Image Retrieval Based On Orthogonal Decomposition under Cloud Environment

Authors: Yanyan Xu, Lizhi Xiong, Zhengquan Xu, Li Jiang

Abstract:

In order to protect data privacy, image with sensitive or private information needs to be encrypted before being outsourced to the cloud. However, this causes difficulties in image retrieval and data management. A secure image retrieval method based on orthogonal decomposition is proposed in the paper. The image is divided into two different components, for which encryption and feature extraction are executed separately. As a result, cloud server can extract features from an encrypted image directly and compare them with the features of the queried images, so that the user can thus obtain the image. Different from other methods, the proposed method has no special requirements to encryption algorithms. Experimental results prove that the proposed method can achieve better security and better retrieval precision.

Keywords: Secure image retrieval, secure search, orthogonal decomposition, secure cloud computing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2108

2229 Spurious Crests in Second-Order Waves

Authors: M. A. Tayfun

Abstract:

Occurrences of spurious crests on the troughs of large, relatively steep second-order Stokes waves are anomalous and not an inherent characteristic of real waves. Here, the effects of such occurrences on the statistics described by the standard second-order stochastic model are examined theoretically and by way of simulations. Theoretical results and simulations indicate that when spurious occurrences are sufficiently large, the standard model leads to physically unrealistic surface features and inaccuracies in the statistics of various surface features, in particular, the troughs and thus zero-crossing heights of large waves. Whereas inaccuracies can be fairly noticeable for long-crested waves in both deep and shallower depths, they tend to become relatively insignificant in directional waves.

Keywords: Large waves, non-linear effects, simulation, spectra, spurious crests, Stokes waves, wave breaking, wave statistics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1303

2228 Effect of Speed and Torque on Statistical Parameters in Tapered Bearing Fault Detection

Authors: Sylvester A. Aye, Philippus S. Heyns

Abstract:

The effect of the rotational speed and axial torque on the diagnostics of tapered rolling element bearing defects was investigated. The accelerometer was mounted on the bearing housing and connected to Sound and Vibration Analyzer (SVAN 958) and was used to measure the accelerations from the bearing housing. The data obtained from the bearing was processed to detect damage of the bearing using statistical tools and the results were subsequently analyzed to see if bearing damage had been captured. From this study it can be seen that damage is more evident when the bearing is loaded. Also, at the incipient stage of damage the crest factor and kurtosis values are high but as time progresses the crest factors and kurtosis values decrease whereas the peak and RMS values are low at the incipient stage but increase with damage.

Keywords: crest factor, damage detection, kurtosis, RMS, tapered roller bearing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2300

2227 Poli4SDG: An Application for Environmental Crises Management and Gender Support

Authors: Angelica S. Valeriani, Lorenzo Biasiolo

Abstract:

In recent years, the scale of the impact of climate change and its related side effects has become ever more massive and devastating. Sustainable Development Goals (SDGs), promoted by United Nations, aim to front issues related to climate change, among others. In particular, the project CROWD4SDG focuses on a bunch of SDGs, since it promotes environmental activities and climate-related issues. In this context, we developed a prototype of an application, under advanced development considering web design, that focuses on SDG 13 (SDG on climate action) by providing users with useful instruments to face environmental crises and climate-related disasters. Our prototype is thought and structured for both web and mobile development. The main goal of the application, POLI4SDG, is to help users to get through emergency services. To this extent, an organized overview and classification prove to be very effective and helpful to people in need. A careful analysis of data related to environmental crises prompted us to integrate the user contribution, i.e. exploiting a core principle of Citizen Science, into the realization of a public catalog, available for consulting and organized according to typology and specific features. In addition, gender equality and opportunity features are considered in the prototype, in order to allow women, often the most vulnerable category, to have direct support. The overall description of the application functionalities is detailed. Moreover, implementation features and properties of the prototype are discussed.

Keywords: Crowdsourcing, social media, SDG, climate change, natural disasters, gender equality.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 664

2226 A Novel SVM-Based OOK Detector in Low SNR Infrared Channels

Authors: J. P. Dubois, O. M. Abdul-Latif

Abstract:

Support Vector Machine (SVM) is a recent class of statistical classification and regression techniques playing an increasing role in applications to detection problems in various engineering problems, notably in statistical signal processing, pattern recognition, image analysis, and communication systems. In this paper, SVM is applied to an infrared (IR) binary communication system with different types of channel models including Ricean multipath fading and partially developed scattering channel with additive white Gaussian noise (AWGN) at the receiver. The structure and performance of SVM in terms of the bit error rate (BER) metric is derived and simulated for these channel stochastic models and the computational complexity of the implementation, in terms of average computational time per bit, is also presented. The performance of SVM is then compared to classical binary signal maximum likelihood detection using a matched filter driven by On-Off keying (OOK) modulation. We found that the performance of SVM is superior to that of the traditional optimal detection schemes used in statistical communication, especially for very low signal-to-noise ratio (SNR) ranges. For large SNR, the performance of the SVM is similar to that of the classical detectors. The implication of these results is that SVM can prove very beneficial to IR communication systems that notoriously suffer from low SNR at the cost of increased computational complexity.

Keywords: Least square-support vector machine, on-off keying, matched filter, maximum likelihood detector, wireless infrared communication.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1944

2225 Dicotyledon Weed Quantification Algorithm for Selective Herbicide Application in Maize Crops: Statistical Evaluation of the Potential Herbicide Savings

Authors: Morten Stigaard Laursen, Rasmus Nyholm Jørgensen, Henrik Skov Midtiby, Anders Krogh Mortensen, Sanmohan Baby

Abstract:

This work contributes a statistical model and simulation framework yielding the best estimate possible for the potential herbicide reduction when using the MoDiCoVi algorithm all the while requiring a efficacy comparable to conventional spraying. In June 2013 a maize field located in Denmark were seeded. The field was divided into parcels which was assigned to one of two main groups: 1) Control, consisting of subgroups of no spray and full dose spraty; 2) MoDiCoVi algorithm subdivided into five different leaf cover thresholds for spray activation. In addition approximately 25% of the parcels were seeded with additional weeds perpendicular to the maize rows. In total 299 parcels were randomly assigned with the 28 different treatment combinations. In the statistical analysis, bootstrapping was used for balancing the number of replicates. The achieved potential herbicide savings was found to be 70% to 95% depending on the initial weed coverage. However additional field trials covering more seasons and locations are needed to verify the generalisation of these results. There is a potential for further herbicide savings as the time interval between the first and second spraying session was not long enough for the weeds to turn yellow, instead they only stagnated in growth.

Keywords: Weed crop discrimination, macrosprayer, herbicide reduction, site-specific, sprayer-boom.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1037

2224 Analysis of Differences between Public and Experts’ Views Regarding Sustainable Development of Developing Cities: A Case Study in the Iraqi Capital Baghdad

Authors: Marwah Mohsin, Thomas Beach, Alan Kwan, Mahdi Ismail

Abstract:

This paper describes the differences in views on sustainable development between the general public and experts in a developing country, Iraq. This paper will answer the question: How do the views of the public differ from the generally accepted view of experts in the context of sustainable urban development in Iraq? In order to answer this question, the views of both the public and the experts will be analysed. These results are taken from a public survey and a Delphi questionnaire. These will be analysed using statistical methods in order to identify the significant differences. This will enable investigation of the different perceptions between the public perceptions and the experts’ views towards urban sustainable development factors. This is important due to the fact that different viewpoints between policy-makers and the public will impact on the acceptance by the public of any future sustainable development work that is undertaken. The brief findings of the statistical analysis show that the views of both the public and the experts are considered different in most of the variables except six variables show no differences. Those variables are ‘The importance of establishing sustainable cities in Iraq’, ‘Mitigate traffic congestion’, ‘Waste recycling and separating’, ‘Use wastewater recycling’, ‘Parks and green spaces’, and ‘Promote investment’.

Keywords: Urban sustainable development, experts’ views, public views, statistical analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 531

2223 Resolving Dependency Ambiguity of Subordinate Clauses using Support Vector Machines

Authors: Sang-Soo Kim, Seong-Bae Park, Sang-Jo Lee

Abstract:

In this paper, we propose a method of resolving dependency ambiguities of Korean subordinate clauses based on Support Vector Machines (SVMs). Dependency analysis of clauses is well known to be one of the most difficult tasks in parsing sentences, especially in Korean. In order to solve this problem, we assume that the dependency relation of Korean subordinate clauses is the dependency relation among verb phrase, verb and endings in the clauses. As a result, this problem is represented as a binary classification task. In order to apply SVMs to this problem, we selected two kinds of features: static and dynamic features. The experimental results on STEP2000 corpus show that our system achieves the accuracy of 73.5%.

Keywords: Dependency analysis, subordinate clauses, binaryclassification, support vector machines.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1584

2222 Classification of Potential Biomarkers in Breast Cancer Using Artificial Intelligence Algorithms and Anthropometric Datasets

Authors: Aref Aasi, Sahar Ebrahimi Bajgani, Erfan Aasi

Abstract:

Breast cancer (BC) continues to be the most frequent cancer in females and causes the highest number of cancer-related deaths in women worldwide. Inspired by recent advances in studying the relationship between different patient attributes and features and the disease, in this paper, we have tried to investigate the different classification methods for better diagnosis of BC in the early stages. In this regard, datasets from the University Hospital Centre of Coimbra were chosen, and different machine learning (ML)-based and neural network (NN) classifiers have been studied. For this purpose, we have selected favorable features among the nine provided attributes from the clinical dataset by using a random forest algorithm. This dataset consists of both healthy controls and BC patients, and it was noted that glucose, BMI, resistin, and age have the most importance, respectively. Moreover, we have analyzed these features with various ML-based classifier methods, including Decision Tree (DT), K-Nearest Neighbors (KNN), eXtreme Gradient Boosting (XGBoost), Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machine (SVM) along with NN-based Multi-Layer Perceptron (MLP) classifier. The results revealed that among different techniques, the SVM and MLP classifiers have the most accuracy, with amounts of 96% and 92%, respectively. These results divulged that the adopted procedure could be used effectively for the classification of cancer cells, and also it encourages further experimental investigations with more collected data for other types of cancers.

Keywords: Breast cancer, health diagnosis, Machine Learning, biomarker classification, Neural Network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 288

2221 Standard Deviation of Mean and Variance of Rows and Columns of Images for CBIR

Authors: H. B. Kekre, Kavita Patil

Abstract:

This paper describes a novel and effective approach to content-based image retrieval (CBIR) that represents each image in the database by a vector of feature values called “Standard deviation of mean vectors of color distribution of rows and columns of images for CBIR". In many areas of commerce, government, academia, and hospitals, large collections of digital images are being created. This paper describes the approach that uses contents as feature vector for retrieval of similar images. There are several classes of features that are used to specify queries: colour, texture, shape, spatial layout. Colour features are often easily obtained directly from the pixel intensities. In this paper feature extraction is done for the texture descriptor that is 'variance' and 'Variance of Variances'. First standard deviation of each row and column mean is calculated for R, G, and B planes. These six values are obtained for one image which acts as a feature vector. Secondly we calculate variance of the row and column of R, G and B planes of an image. Then six standard deviations of these variance sequences are calculated to form a feature vector of dimension six. We applied our approach to a database of 300 BMP images. We have determined the capability of automatic indexing by analyzing image content: color and texture as features and by applying a similarity measure Euclidean distance.

Keywords: Standard deviation Image retrieval, color distribution, Variance, Variance of Variance, Euclidean distance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3733

2220 A Molding Surface Auto-Inspection System

Authors: Ssu-Han Chen, Der-Baau Perng

Abstract:

Molding process in IC manufacturing secures chips against the harms done by hot, moisture or other external forces. While a chip was being molded,defects like cracks, dilapidation, or voids may be embedding on the molding surface. The molding surfaces the study poises to treat and the ones on the market, though, differ in the surface where texture similar to defects is everywhere. Manual inspection usually passes over low-contrast cracks or voids; hence an automatic optical inspection system for molding surface is necessary. The proposed system is consisted of a CCD, a coaxial light, a back light as well as a motion control unit. Based on the property of statistical textures of the molding surface, a series of digital image processing and classification procedure is carried out. After training of the parameter associated with above algorithm, result of the experiment suggests that the accuracy rate is up to 93.75%, contributing to the inspection quality of IC molding surface.

Keywords: Molding surface, machine vision, statistical texture, discrete Fourier transformation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2736

2219 Tidal Data Analysis using ANN

Authors: Ritu Vijay, Rekha Govil

Abstract:

The design of a complete expansion that allows for compact representation of certain relevant classes of signals is a central problem in signal processing applications. Achieving such a representation means knowing the signal features for the purpose of denoising, classification, interpolation and forecasting. Multilayer Neural Networks are relatively a new class of techniques that are mathematically proven to approximate any continuous function arbitrarily well. Radial Basis Function Networks, which make use of Gaussian activation function, are also shown to be a universal approximator. In this age of ever-increasing digitization in the storage, processing, analysis and communication of information, there are numerous examples of applications where one needs to construct a continuously defined function or numerical algorithm to approximate, represent and reconstruct the given discrete data of a signal. Many a times one wishes to manipulate the data in a way that requires information not included explicitly in the data, which is done through interpolation and/or extrapolation. Tidal data are a very perfect example of time series and many statistical techniques have been applied for tidal data analysis and representation. ANN is recent addition to such techniques. In the present paper we describe the time series representation capabilities of a special type of ANN- Radial Basis Function networks and present the results of tidal data representation using RBF. Tidal data analysis & representation is one of the important requirements in marine science for forecasting.

Keywords: ANN, RBF, Tidal Data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1646

2218 An Improved Fast Search Method Using Histogram Features for DNA Sequence Database

Authors: Qiu Chen, Feifei Lee, Koji Kotani, Tadahiro Ohmi

Abstract:

In this paper, we propose an efficient hierarchical DNA sequence search method to improve the search speed while the accuracy is being kept constant. For a given query DNA sequence, firstly, a fast local search method using histogram features is used as a filtering mechanism before scanning the sequences in the database. An overlapping processing is newly added to improve the robustness of the algorithm. A large number of DNA sequences with low similarity will be excluded for latter searching. The Smith-Waterman algorithm is then applied to each remainder sequences. Experimental results using GenBank sequence data show the proposed method combining histogram information and Smith-Waterman algorithm is more efficient for DNA sequence search.

Keywords: Fast search, DNA sequence, Histogram feature, Smith-Waterman algorithm, Local search

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1317

2217 Dynamic Features Selection for Heart Disease Classification

Authors: Walid MOUDANI

Abstract:

The healthcare environment is generally perceived as being information rich yet knowledge poor. However, there is a lack of effective analysis tools to discover hidden relationships and trends in data. In fact, valuable knowledge can be discovered from application of data mining techniques in healthcare system. In this study, a proficient methodology for the extraction of significant patterns from the Coronary Heart Disease warehouses for heart attack prediction, which unfortunately continues to be a leading cause of mortality in the whole world, has been presented. For this purpose, we propose to enumerate dynamically the optimal subsets of the reduced features of high interest by using rough sets technique associated to dynamic programming. Therefore, we propose to validate the classification using Random Forest (RF) decision tree to identify the risky heart disease cases. This work is based on a large amount of data collected from several clinical institutions based on the medical profile of patient. Moreover, the experts- knowledge in this field has been taken into consideration in order to define the disease, its risk factors, and to establish significant knowledge relationships among the medical factors. A computer-aided system is developed for this purpose based on a population of 525 adults. The performance of the proposed model is analyzed and evaluated based on set of benchmark techniques applied in this classification problem.

Keywords: Multi-Classifier Decisions Tree, Features Reduction, Dynamic Programming, Rough Sets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2523

2216 Describing Learning Features of Reusable Resources: A Proposal

Authors: Serena Alvino, Paola Forcheri, Maria Grazia Ierardi, Luigi Sarti

Abstract:

One of the main advantages of the LO paradigm is to allow the availability of good quality, shareable learning material through the Web. The effectiveness of the retrieval process requires a formal description of the resources (metadata) that closely fits the user-s search criteria; in spite of the huge international efforts in this field, educational metadata schemata often fail to fulfil this requirement. This work aims to improve the situation, by the definition of a metadata model capturing specific didactic features of shareable learning resources. It classifies LOs into “teacher-oriented" and “student-oriented" categories, in order to describe the role a LO is to play when it is integrated into the educational process. This article describes the model and a first experimental validation process that has been carried out in a controlled environment.

Keywords: Learning object, pedagogical metadata, experimental validation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1533

2215 Support Vector Machine for Persian Font Recognition

Authors: A. Borji, M. Hamidi

Abstract:

In this paper we examine the use of global texture analysis based approaches for the purpose of Persian font recognition in machine-printed document images. Most existing methods for font recognition make use of local typographical features and connected component analysis. However derivation of such features is not an easy task. Gabor filters are appropriate tools for texture analysis and are motivated by human visual system. Here we consider document images as textures and use Gabor filter responses for identifying the fonts. The method is content independent and involves no local feature analysis. Two different classifiers Weighted Euclidean Distance and SVM are used for the purpose of classification. Experiments on seven different type faces and four font styles show average accuracy of 85% with WED and 82% with SVM classifier over typefaces

Keywords: Persian font recognition, support vector machine, gabor filter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1696

2214 Model Discovery and Validation for the Qsar Problem using Association Rule Mining

Authors: Luminita Dumitriu, Cristina Segal, Marian Craciun, Adina Cocu, Lucian P. Georgescu

Abstract:

There are several approaches in trying to solve the Quantitative 1Structure-Activity Relationship (QSAR) problem. These approaches are based either on statistical methods or on predictive data mining. Among the statistical methods, one should consider regression analysis, pattern recognition (such as cluster analysis, factor analysis and principal components analysis) or partial least squares. Predictive data mining techniques use either neural networks, or genetic programming, or neuro-fuzzy knowledge. These approaches have a low explanatory capability or non at all. This paper attempts to establish a new approach in solving QSAR problems using descriptive data mining. This way, the relationship between the chemical properties and the activity of a substance would be comprehensibly modeled.

Keywords: association rules, classification, data mining, Quantitative Structure - Activity Relationship.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1777

2213 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine

Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li

Abstract:

Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.

Keywords: Machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 931