Search results for: multivariate statistical approach
5844 SVM-Based Detection of SAR Images in Partially Developed Speckle Noise
Authors: J. P. Dubois, O. M. Abdul-Latif
Abstract:
Support Vector Machine (SVM) is a statistical learning tool that was initially developed by Vapnik in 1979 and later developed to a more complex concept of structural risk minimization (SRM). SVM is playing an increasing role in applications to detection problems in various engineering problems, notably in statistical signal processing, pattern recognition, image analysis, and communication systems. In this paper, SVM was applied to the detection of SAR (synthetic aperture radar) images in the presence of partially developed speckle noise. The simulation was done for single look and multi-look speckle models to give a complete overlook and insight to the new proposed model of the SVM-based detector. The structure of the SVM was derived and applied to real SAR images and its performance in terms of the mean square error (MSE) metric was calculated. We showed that the SVM-detected SAR images have a very low MSE and are of good quality. The quality of the processed speckled images improved for the multi-look model. Furthermore, the contrast of the SVM detected images was higher than that of the original non-noisy images, indicating that the SVM approach increased the distance between the pixel reflectivity levels (the detection hypotheses) in the original images.Keywords: Least Square-Support Vector Machine, SyntheticAperture Radar. Partially Developed Speckle, Multi-Look Model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14945843 Statistical Feature Extraction Method for Wood Species Recognition System
Authors: Mohd Iz'aan Paiz Bin Zamri, Anis Salwa Mohd Khairuddin, Norrima Mokhtar, Rubiyah Yusof
Abstract:
Effective statistical feature extraction and classification are important in image-based automatic inspection and analysis. An automatic wood species recognition system is designed to perform wood inspection at custom checkpoints to avoid mislabeling of timber which will results to loss of income to the timber industry. The system focuses on analyzing the statistical pores properties of the wood images. This paper proposed a fuzzy-based feature extractor which mimics the experts’ knowledge on wood texture to extract the properties of pores distribution from the wood surface texture. The proposed feature extractor consists of two steps namely pores extraction and fuzzy pores management. The total number of statistical features extracted from each wood image is 38 features. Then, a backpropagation neural network is used to classify the wood species based on the statistical features. A comprehensive set of experiments on a database composed of 5200 macroscopic images from 52 tropical wood species was used to evaluate the performance of the proposed feature extractor. The advantage of the proposed feature extraction technique is that it mimics the experts’ interpretation on wood texture which allows human involvement when analyzing the wood texture. Experimental results show the efficiency of the proposed method.Keywords: Classification, fuzzy, inspection system, image analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16965842 A Optimal Subclass Detection Method for Credit Scoring
Authors: Luciano Nieddu, Giuseppe Manfredi, Salvatore D'Acunto, Katia La Regina
Abstract:
In this paper a non-parametric statistical pattern recognition algorithm for the problem of credit scoring will be presented. The proposed algorithm is based on a clustering k- means algorithm and allows for the determination of subclasses of homogenous elements in the data. The algorithm will be tested on two benchmark datasets and its performance compared with other well known pattern recognition algorithm for credit scoring.
Keywords: Constrained clustering, Credit scoring, Statistical pattern recognition, Supervised classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20065841 Advances in Artificial Intelligence Using Speech Recognition
Authors: Khaled M. Alhawiti
Abstract:
This research study aims to present a retrospective study about speech recognition systems and artificial intelligence. Speech recognition has become one of the widely used technologies, as it offers great opportunity to interact and communicate with automated machines. Precisely, it can be affirmed that speech recognition facilitates its users and helps them to perform their daily routine tasks, in a more convenient and effective manner. This research intends to present the illustration of recent technological advancements, which are associated with artificial intelligence. Recent researches have revealed the fact that speech recognition is found to be the utmost issue, which affects the decoding of speech. In order to overcome these issues, different statistical models were developed by the researchers. Some of the most prominent statistical models include acoustic model (AM), language model (LM), lexicon model, and hidden Markov models (HMM). The research will help in understanding all of these statistical models of speech recognition. Researchers have also formulated different decoding methods, which are being utilized for realistic decoding tasks and constrained artificial languages. These decoding methods include pattern recognition, acoustic phonetic, and artificial intelligence. It has been recognized that artificial intelligence is the most efficient and reliable methods, which are being used in speech recognition.Keywords: Speech recognition, acoustic phonetic, artificial intelligence, Hidden Markov Models (HMM), statistical models of speech recognition, human machine performance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 79035840 Using Artificial Neural Network to Predict Collisions on Horizontal Tangents of 3D Two-Lane Highways
Authors: Omer F. Cansiz, Said M. Easa
Abstract:
The purpose of this study is mainly to predict collision frequency on the horizontal tangents combined with vertical curves using artificial neural network methods. The proposed ANN models are compared with existing regression models. First, the variables that affect collision frequency were investigated. It was found that only the annual average daily traffic, section length, access density, the rate of vertical curvature, smaller curve radius before and after the tangent were statistically significant according to related combinations. Second, three statistical models (negative binomial, zero inflated Poisson and zero inflated negative binomial) were developed using the significant variables for three alignment combinations. Third, ANN models are developed by applying the same variables for each combination. The results clearly show that the ANN models have the lowest mean square error value than those of the statistical models. Similarly, the AIC values of the ANN models are smaller to those of the regression models for all the combinations. Consequently, the ANN models have better statistical performances than statistical models for estimating collision frequency. The ANN models presented in this paper are recommended for evaluating the safety impacts 3D alignment elements on horizontal tangents.Keywords: Collision frequency, horizontal tangent, 3D two-lane highway, negative binomial, zero inflated Poisson, artificial neural network.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16025839 Statistical Characteristics of Distribution of Radiation-Induced Defects under Random Generation
Authors: Pavlo Selyshchev
Abstract:
We consider fluctuations of defects density taking into account their interaction. Stochastic field of displacement generation rate gives random defect distribution. We determinate statistical characteristics (mean and dispersion) of random field of point defect distribution as function of defect generation parameters, temperature and properties of irradiated crystal.
Keywords: Irradiation, Primary Defects, Interaction, Fluctuations.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17995838 Integrated ACOR/IACOMV-R-SVM Algorithm
Authors: Hiba Basim Alwan, Ku Ruhana Ku-Mahamud
Abstract:
A direction for ACO is to optimize continuous and mixed (discrete and continuous) variables in solving problems with various types of data. Support Vector Machine (SVM), which originates from the statistical approach, is a present day classification technique. The main problems of SVM are selecting feature subset and tuning the parameters. Discretizing the continuous value of the parameters is the most common approach in tuning SVM parameters. This process will result in loss of information which affects the classification accuracy. This paper presents two algorithms that can simultaneously tune SVM parameters and select the feature subset. The first algorithm, ACOR-SVM, will tune SVM parameters, while the second IACOMV-R-SVM algorithm will simultaneously tune SVM parameters and select the feature subset. Three benchmark UCI datasets were used in the experiments to validate the performance of the proposed algorithms. The results show that the proposed algorithms have good performances as compared to other approaches.Keywords: Continuous ant colony optimization, incremental continuous ant colony, simultaneous optimization, support vector machine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8215837 Real-Time Testing of Steel Strip Welds based on Bayesian Decision Theory
Authors: Julio Molleda, Daniel F. García, Juan C. Granda, Francisco J. Suárez
Abstract:
One of the main trouble in a steel strip manufacturing line is the breakage of whatever weld carried out between steel coils, that are used to produce the continuous strip to be processed. A weld breakage results in a several hours stop of the manufacturing line. In this process the damages caused by the breakage must be repaired. After the reparation and in order to go on with the production it will be necessary a restarting process of the line. For minimizing this problem, a human operator must inspect visually and manually each weld in order to avoid its breakage during the manufacturing process. The work presented in this paper is based on the Bayesian decision theory and it presents an approach to detect, on real-time, steel strip defective welds. This approach is based on quantifying the tradeoffs between various classification decisions using probability and the costs that accompany such decisions.Keywords: Classification, Pattern Recognition, ProbabilisticReasoning, Statistical Data Analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13695836 A Practical Approach for Testing the Process Quality
Authors: Mou-Yuan Liao, Chien-Wei Wu, Chien-Hua Lin
Abstract:
Process capability index Cpk is the most widely used index in making managerial decisions since it provides bounds on the process yield for normally distributed processes. However, existent methods for assessing process performance which constructed by statistical inference may unfortunately lead to fine results, because uncertainties exist in most real-world applications. Thus, this study adopts fuzzy inference to deal with testing of Cpk . A brief score is obtained for assessing a supplier’s process instead of a severe evaluation.
Keywords: Process capability analysis, quality control.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13855835 A Machine Learning Approach for Anomaly Detection in Environmental IoT-Driven Wastewater Purification Systems
Authors: Giovanni Cicceri, Roberta Maisano, Nathalie Morey, Salvatore Distefano
Abstract:
The main goal of this paper is to present a solution for a water purification system based on an Environmental Internet of Things (EIoT) platform to monitor and control water quality and machine learning (ML) models to support decision making and speed up the processes of purification of water. A real case study has been implemented by deploying an EIoT platform and a network of devices, called Gramb meters and belonging to the Gramb project, on wastewater purification systems located in Calabria, south of Italy. The data thus collected are used to control the wastewater quality, detect anomalies and predict the behaviour of the purification system. To this extent, three different statistical and machine learning models have been adopted and thus compared: Autoregressive Integrated Moving Average (ARIMA), Long Short Term Memory (LSTM) autoencoder, and Facebook Prophet (FP). The results demonstrated that the ML solution (LSTM) out-perform classical statistical approaches (ARIMA, FP), in terms of both accuracy, efficiency and effectiveness in monitoring and controlling the wastewater purification processes.Keywords: EIoT, machine learning, anomaly detection, environment monitoring.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9385834 Manufacturing Dispersions Based Simulation and Synthesis of Design Tolerances
Authors: Nassima Cheikh, Abdelmadjid Cheikh, Said Hamou
Abstract:
The objective of this work which is based on the approach of simultaneous engineering is to contribute to the development of a CIM tool for the synthesis of functional design dimensions expressed by average values and tolerance intervals. In this paper, the dispersions method known as the Δl method which proved reliable in the simulation of manufacturing dimensions is used to develop a methodology for the automation of the simulation. This methodology is constructed around three procedures. The first procedure executes the verification of the functional requirements by automatically extracting the functional dimension chains in the mechanical sub-assembly. Then a second procedure performs an optimization of the dispersions on the basis of unknown variables. The third procedure uses the optimized values of the dispersions to compute the optimized average values and tolerances of the functional dimensions in the chains. A statistical and cost based approach is integrated in the methodology in order to take account of the capabilities of the manufacturing processes and to distribute optimal values among the individual components of the chains.Keywords: functional tolerances, manufacturing dispersions, simulation, CIM.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14285833 Wheat Yield Prediction through Agro Meteorological Indices for Ardebil District
Authors: Fariba Esfandiary, Ghafoor Aghaie, Ali Dolati Mehr
Abstract:
Wheat prediction was carried out using different meteorological variables together with agro meteorological indices in Ardebil district for the years 2004-2005 & 2005–2006. On the basis of correlation coefficients, standard error of estimate as well as relative deviation of predicted yield from actual yield using different statistical models, the best subset of agro meteorological indices were selected including daily minimum temperature (Tmin), accumulated difference of maximum & minimum temperatures (TD), growing degree days (GDD), accumulated water vapor pressure deficit (VPD), sunshine hours (SH) & potential evapotranspiration (PET). Yield prediction was done two months in advance before harvesting time which was coincide with commencement of reproductive stage of wheat (5th of June). It revealed that in the final statistical models, 83% of wheat yield variability was accounted for variation in above agro meteorological indices.
Keywords: Wheat yields prediction, agro meteorological indices, statistical models
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20965832 A Prototype of Augmented Reality for Visualising Large Sensors’ Datasets
Authors: Folorunso Olufemi Ayinde, Mohd Shahrizal Sunar, Sarudin Kari, Dzulkifli Mohamad
Abstract:
In this paper we discuss the development of an Augmented Reality (AR) - based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil leakages sensors datasets. Sensors generates significant amount of multivariate datasets during normal and leak situations. Therefore we have developed a data model to effectively manage such data and enhance the computational support needed for the effective data explorations. A challenge of this approach is to reduce the data inefficiency powered by the disparate, repeated, inconsistent and missing attributes of most available sensors datasets. To handle this challenge, this paper aim to develop an AR-based scientific visualization interface which automatically identifies, localise and visualizes all necessary data relevant to a particularly selected region of interest (ROI) along the virtual pipeline network. Necessary system architectural supports needed as well as the interface requirements for such visualizations are also discussed in this paper.
Keywords: Sensor Leakages Datasets, Augmented Reality, Sensor Data-Model, Scientific Visualization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16185831 Lean Models Classification: Towards a Holistic View
Authors: Y. Tiamaz, N. Souissi
Abstract:
The purpose of this paper is to present a classification of Lean models which aims to capture all the concepts related to this approach and thus facilitate its implementation. This classification allows the identification of the most relevant models according to several dimensions. From this perspective, we present a review and an analysis of Lean models literature and we propose dimensions for the classification of the current proposals while respecting among others the axes of the Lean approach, the maturity of the models as well as their application domains. This classification allowed us to conclude that researchers essentially consider the Lean approach as a toolbox also they design their models to solve problems related to a specific environment. Since Lean approach is no longer intended only for the automotive sector where it was invented, but to all fields (IT, Hospital, ...), we consider that this approach requires a generic model that is capable of being implemented in all areas.
Keywords: Lean approach, lean models, classification, dimensions, holistic view.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12045830 Coverage Probability Analysis of WiMAX Network under Additive White Gaussian Noise and Predicted Empirical Path Loss Model
Authors: Chaudhuri Manoj Kumar Swain, Susmita Das
Abstract:
This paper explores a detailed procedure of predicting a path loss (PL) model and its application in estimating the coverage probability in a WiMAX network. For this a hybrid approach is followed in predicting an empirical PL model of a 2.65 GHz WiMAX network deployed in a suburban environment. Data collection, statistical analysis, and regression analysis are the phases of operations incorporated in this approach and the importance of each of these phases has been discussed properly. The procedure of collecting data such as received signal strength indicator (RSSI) through experimental set up is demonstrated. From the collected data set, empirical PL and RSSI models are predicted with regression technique. Furthermore, with the aid of the predicted PL model, essential parameters such as PL exponent as well as the coverage probability of the network are evaluated. This research work may assist in the process of deployment and optimisation of any cellular network significantly.
Keywords: WiMAX, RSSI, path loss, coverage probability, regression analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6445829 Neural Networks: From Black Box towards Transparent Box Application to Evapotranspiration Modeling
Authors: A. Johannet, B. Vayssade, D. Bertin
Abstract:
Neural networks are well known for their ability to model non linear functions, but as statistical methods usually does, they use a no parametric approach thus, a priori knowledge is not obvious to be taken into account no more than the a posteriori knowledge. In order to deal with these problematics, an original way to encode the knowledge inside the architecture is proposed. This method is applied to the problem of the evapotranspiration inside karstic aquifer which is a problem of huge utility in order to deal with water resource.Keywords: Neural-Networks, Hydrology, Evapotranpiration, Hidden Function Modeling.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17545828 Investigation of the Main Trends of Tourist Expenses in Georgia
Authors: Nino Abesadze, Marine Mindorashvili, Nino Paresashvili
Abstract:
The main purpose of the article is to make complex statistical analysis of tourist expenses of foreign visitors. We used mixed technique of selection that implies rules of random and proportional selection. Computer software SPSS was used to compute statistical data for corresponding analysis. Corresponding methodology of tourism statistics was implemented according to international standards. Important information was collected and grouped from the major Georgian airports. Techniques of statistical observation were prepared. A representative population of foreign visitors and a rule of selection of respondents were determined. We have a trend of growth of tourist numbers and share of tourists from post-soviet countries constantly increases. Level of satisfaction with tourist facilities and quality of service has grown, but still we have a problem of disparity between quality of service and prices. The design of tourist expenses of foreign visitors is diverse; competitiveness of tourist products of Georgian tourist companies is higher.
Keywords: Tourist, expenses, methods, statistics, analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9005827 A New Approach for Flexible Document Categorization
Authors: Jebari Chaker, Ounelli Habib
Abstract:
In this paper we propose a new approach for flexible document categorization according to the document type or genre instead of topic. Our approach implements two homogenous classifiers: contextual classifier and logical classifier. The contextual classifier is based on the document URL, whereas, the logical classifier use the logical structure of the document to perform the categorization. The final categorization is obtained by combining contextual and logical categorizations. In our approach, each document is assigned to all predefined categories with different membership degrees. Our experiments demonstrate that our approach is best than other genre categorization approaches.
Keywords: Categorization, combination, flexible, logicalstructure, genre, category, URL.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14365826 A Time-Reducible Approach to Compute Determinant |I-X|
Authors: Wang Xingbo
Abstract:
Computation of determinant in the form |I-X| is primary and fundamental because it can help to compute many other determinants. This article puts forward a time-reducible approach to compute determinant |I-X|. The approach is derived from the Newton’s identity and its time complexity is no more than that to compute the eigenvalues of the square matrix X. Mathematical deductions and numerical example are presented in detail for the approach. By comparison with classical approaches the new approach is proved to be superior to the classical ones and it can naturally reduce the computational time with the improvement of efficiency to compute eigenvalues of the square matrix.Keywords: Algorithm, determinant, computation, eigenvalue, time complexity.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11115825 The Application of the Queuing Theory in the Traffic Flow of Intersection
Authors: Shuguo Yang, Xiaoyan Yang
Abstract:
It is practically significant to research the traffic flow of intersection because the capacity of intersection affects the efficiency of highway network directly. This paper analyzes the traffic conditions of an intersection in certain urban by the methods of queuing theory and statistical experiment, sets up a corresponding mathematical model and compares it with the actual values. The result shows that queuing theory is applied in the study of intersection traffic flow and it can provide references for the other similar designs.
Keywords: Intersection, Queuing theory, Statistical experiment, System metrics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 74785824 Monte Carlo Analysis and Fuzzy Sets for Uncertainty Propagation in SIS Performance Assessment
Authors: Fares Innal, Yves Dutuit, Mourad Chebila
Abstract:
The object of this work is the probabilistic performance evaluation of safety instrumented systems (SIS), i.e. the average probability of dangerous failure on demand (PFDavg) and the average frequency of failure (PFH), taking into account the uncertainties related to the different parameters that come into play: failure rate (λ), common cause failure proportion (β), diagnostic coverage (DC)... This leads to an accurate and safe assessment of the safety integrity level (SIL) inherent to the safety function performed by such systems. This aim is in keeping with the requirement of the IEC 61508 standard with respect to handling uncertainty. To do this, we propose an approach that combines (1) Monte Carlo simulation and (2) fuzzy sets. Indeed, the first method is appropriate where representative statistical data are available (using pdf of the relating parameters), while the latter applies in the case characterized by vague and subjective information (using membership function). The proposed approach is fully supported with a suitable computer code.
Keywords: Fuzzy sets, Monte Carlo simulation, Safety instrumented system, Safety integrity level.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 27365823 Asymmetrical Informative Estimation for Macroeconomic Model: Special Case in the Tourism Sector of Thailand
Authors: Chukiat Chaiboonsri, Satawat Wannapan
Abstract:
This paper used an asymmetric informative concept to apply in the macroeconomic model estimation of the tourism sector in Thailand. The variables used to statistically analyze are Thailand international and domestic tourism revenues, the expenditures of foreign and domestic tourists, service investments by private sectors, service investments by the government of Thailand, Thailand service imports and exports, and net service income transfers. All of data is a time-series index which was observed between 2002 and 2015. Empirically, the tourism multiplier and accelerator were estimated by two statistical approaches. The first was the result of the Generalized Method of Moments model (GMM) based on the assumption which the tourism market in Thailand had perfect information (Symmetrical data). The second was the result of the Maximum Entropy Bootstrapping approach (MEboot) based on the process that attempted to deal with imperfect information and reduced uncertainty in data observations (Asymmetrical data). In addition, the tourism leakages were investigated by a simple model based on the injections and leakages concept. The empirical findings represented the parameters computed from the MEboot approach which is different from the GMM method. However, both of the MEboot estimation and GMM model suggests that Thailand’s tourism sectors are in a period capable of stimulating the economy.
Keywords: Thailand tourism, maximum entropy bootstrapping approach, macroeconomic model, asymmetric information.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12075822 Posture Recognition using Combined Statistical and Geometrical Feature Vectors based on SVM
Authors: Omer Rashid, Ayoub Al-Hamadi, Axel Panning, Bernd Michaelis
Abstract:
It is hard to percept the interaction process with machines when visual information is not available. In this paper, we have addressed this issue to provide interaction through visual techniques. Posture recognition is done for American Sign Language to recognize static alphabets and numbers. 3D information is exploited to obtain segmentation of hands and face using normal Gaussian distribution and depth information. Features for posture recognition are computed using statistical and geometrical properties which are translation, rotation and scale invariant. Hu-Moment as statistical features and; circularity and rectangularity as geometrical features are incorporated to build the feature vectors. These feature vectors are used to train SVM for classification that recognizes static alphabets and numbers. For the alphabets, curvature analysis is carried out to reduce the misclassifications. The experimental results show that proposed system recognizes posture symbols by achieving recognition rate of 98.65% and 98.6% for ASL alphabets and numbers respectively.Keywords: Feature Extraction, Posture Recognition, Pattern Recognition, Application.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14785821 A File Splitting Technique for Reducing the Entropy of Text Files
Authors: Abdel-Rahman M. Jaradat, , Mansour I. Irshid, Talha T. Nassar
Abstract:
A novel file splitting technique for the reduction of the nth-order entropy of text files is proposed. The technique is based on mapping the original text file into a non-ASCII binary file using a new codeword assignment method and then the resulting binary file is split into several subfiles each contains one or more bits from each codeword of the mapped binary file. The statistical properties of the subfiles are studied and it is found that they reflect the statistical properties of the original text file which is not the case when the ASCII code is used as a mapper. The nth-order entropy of these subfiles are determined and it is found that the sum of their entropies is less than that of the original text file for the same values of extensions. These interesting statistical properties of the resulting subfiles can be used to achieve better compression ratios when conventional compression techniques are applied to these subfiles individually and on a bit-wise basis rather than on character-wise basis.
Keywords: Bit-wise compression, entropy, file splitting, source mapping.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13975820 An Adequate Choice of Initial Sample Size for Selection Approach
Authors: Mohammad H. Almomani, Rosmanjawati Abdul Rahman
Abstract:
In this paper, we consider the effect of the initial sample size on the performance of a sequential approach that used in selecting a good enough simulated system, when the number of alternatives is very large. We implement a sequential approach on M=M=1 queuing system under some parameter settings, with a different choice of the initial sample sizes to explore the impacts on the performance of this approach. The results show that the choice of the initial sample size does affect the performance of our selection approach.Keywords: Ranking and Selection, Ordinal Optimization, Optimal Computing Budget Allocation, Subset Selection, Indifference-Zone, Initial Sample Size.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12575819 The Usefulness of Logical Structure in Flexible Document Categorization
Authors: Jebari Chaker, Ounalli Habib
Abstract:
This paper presents a new approach for automatic document categorization. Exploiting the logical structure of the document, our approach assigns a HTML document to one or more categories (thesis, paper, call for papers, email, ...). Using a set of training documents, our approach generates a set of rules used to categorize new documents. The approach flexibility is carried out with rule weight association representing your importance in the discrimination between possible categories. This weight is dynamically modified at each new document categorization. The experimentation of the proposed approach provides satisfactory results.Keywords: categorization rule, document categorization, flexible categorization, logical structure.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12085818 Isolation and Classification of Red Blood Cells in Anemic Microscopic Images
Authors: Jameela Ali Alkrimi, Loay E. George, Azizah Suliman, Abdul Rahim Ahmad, Karim Al-Jashamy
Abstract:
Red blood cells (RBCs) are among the most commonly and intensively studied type of blood cells in cell biology. Anemia is a lack of RBCs is characterized by its level compared to the normal hemoglobin level. In this study, a system based image processing methodology was developed to localize and extract RBCs from microscopic images. Also, the machine learning approach is adopted to classify the localized anemic RBCs images. Several textural and geometrical features are calculated for each extracted RBCs. The training set of features was analyzed using principal component analysis (PCA). With the proposed method, RBCs were isolated in 4.3secondsfrom an image containing 18 to 27 cells. The reasons behind using PCA are its low computation complexity and suitability to find the most discriminating features which can lead to accurate classification decisions. Our classifier algorithm yielded accuracy rates of 100%, 99.99%, and 96.50% for K-nearest neighbor (K-NN) algorithm, support vector machine (SVM), and neural network RBFNN, respectively. Classification was evaluated in highly sensitivity, specificity, and kappa statistical parameters. In conclusion, the classification results were obtained within short time period, and the results became better when PCA was used.
Keywords: Red blood cells, pre-processing image algorithms, classification algorithms, principal component analysis PCA, confusion matrix, kappa statistical parameters, ROC.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 31575817 AudioMine: Medical Data Mining in Heterogeneous Audiology Records
Authors: Shaun Cox, Michael Oakes, Stefan Wermter, Maurice Hawthorne
Abstract:
We report on the results of a pilot study in which a data-mining tool was developed for mining audiology records. The records were heterogeneous in that they contained numeric, category and textual data. The tools developed are designed to observe associations between any field in the records and any other field. The techniques employed were the statistical chi-squared test, and the use of self-organizing maps, an unsupervised neural learning approach.
Keywords: Audiology, data mining, chi-squared, self-organizing maps
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16205816 Earthquake Classification in Molluca Collision Zone Using Conventional Statistical Methods
Authors: H. J. Wattimanela, U. S. Passaribu, N. T. Puspito, S. W. Indratno
Abstract:
Molluca Collision Zone is located at the junction of the Eurasian, Australian, Pacific and the Philippines plates. Between the Sangihe arc, west of the collision zone, and to the east of Halmahera arc is active collision and convex toward the Molluca Sea. This research will analyze the behavior of earthquake occurrence in Molluca Collision Zone related to the distributions of an earthquake in each partition regions, determining the type of distribution of a occurrence earthquake of partition regions, and the mean occurence of earthquakes each partition regions, and the correlation between the partitions region. We calculate number of earthquakes using partition method and its behavioral using conventional statistical methods. In this research, we used data of shallow earthquakes type and its magnitudes ≥4 SR (period 1964-2013). From the results, we can classify partitioned regions based on the correlation into two classes: strong and very strong. This classification can be used for early warning system in disaster management.
Keywords: Molluca Collision Zone, partition regions, conventional statistical methods, Earthquakes, classifications, disaster management.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19385815 Clustering Categorical Data Using Hierarchies (CLUCDUH)
Authors: Gökhan Silahtaroğlu
Abstract:
Clustering large populations is an important problem when the data contain noise and different shapes. A good clustering algorithm or approach should be efficient enough to detect clusters sensitively. Besides space complexity, time complexity also gains importance as the size grows. Using hierarchies we developed a new algorithm to split attributes according to the values they have and choosing the dimension for splitting so as to divide the database roughly into equal parts as much as possible. At each node we calculate some certain descriptive statistical features of the data which reside and by pruning we generate the natural clusters with a complexity of O(n).Keywords: Clustering, tree, split, pruning, entropy, gini.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1502