Search results for: statistical methods
4913 A Relationship Extraction Method from Literary Fiction Considering Korean Linguistic Features
Authors: Hee-Jeong Ahn, Kee-Won Kim, Seung-Hoon Kim
Abstract:
Knowledge of the relationships between characters can help readers understand the overall story or plot of a work of literary fiction. In this paper, we present a method for extracting specific relationships between characters from Korean literary fiction. Generally, methods for extracting relationships between characters in text are statistical or computational methods based on the sentence distance between characters, without considering Korean linguistic features. Furthermore, it is difficult for them to extract directed relationships from text, such as one-sided love, because they consider only the weight of a relationship and not its direction. Therefore, in order to identify specific relationships between characters, we propose a statistical method that considers linguistic features, such as syntactic patterns and speech verbs in Korean. The result of our method is represented by a weighted directed graph of the relationships between the characters. Furthermore, we expect that the proposed method could be applied to relationship analysis between characters in other content, such as movies or TV dramas.
Keywords: Data mining, Korean linguistic feature, literary fiction, relationship extraction.
Downloads 1795
4912 On Preprocessing of Speech Signals
Authors: Ayaz Keerio, Bhargav Kumar Mitra, Philip Birch, Rupert Young, Chris Chatwin
Abstract:
Preprocessing of speech signals is considered a crucial step in the development of a robust and efficient speech or speaker recognition system. In this paper, we present some popular statistical outlier-detection based strategies to segregate the silence/unvoiced part of the speech signal from the voiced portion. The proposed methods are based on the 3σ edit rule and the Hampel identifier, which are compared with conventional techniques: (i) short-time energy (STE) based methods, and (ii) distribution based methods. The results obtained after applying the proposed strategies to some test voice signals are encouraging.
Keywords: STE based methods, Mahalanobis distance, 3σ edit rule, Hampel Identifier.
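As a rough illustration of the two outlier rules named in the abstract, the following Python sketch flags outliers in a list of numbers with the 3σ edit rule and with the Hampel identifier. It is not the authors' implementation: the function names, the threshold `k = 3`, and the 1.4826 scale factor (the usual constant that makes the median absolute deviation comparable to a standard deviation under normality) are assumptions, and the step that maps flagged samples to voiced or unvoiced speech frames is not shown.

```python
import statistics


def three_sigma_outliers(x, k=3.0):
    """Flag samples farther than k standard deviations from the mean."""
    mu = statistics.fmean(x)
    sigma = statistics.pstdev(x)
    return [abs(v - mu) > k * sigma for v in x]


def hampel_outliers(x, k=3.0):
    """Flag samples farther than k scaled MADs from the median."""
    med = statistics.median(x)
    # Median absolute deviation (MAD) around the median.
    mad = statistics.median([abs(v - med) for v in x])
    scale = 1.4826 * mad  # assumed normal-consistency constant
    return [abs(v - med) > k * scale for v in x]


# Hypothetical toy data: seven values near 1.0 and one extreme value.
samples = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 50.0]
print(three_sigma_outliers(samples))
print(hampel_outliers(samples))
```

On this toy series the extreme value inflates the mean and standard deviation enough that the 3σ rule does not flag it, while the median-based Hampel identifier does. This masking effect is the usual argument for preferring robust location and scale estimates.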
Downloads 1710
4911 Comparison of Neural Network and Logistic Regression Methods to Predict Xerostomia after Radiotherapy
Authors: Hui-Min Ting, Tsair-Fwu Lee, Ming-Yuan Cho, Pei-Ju Chao, Chun-Ming Chang, Long-Chang Chen, Fu-Min Fang
Abstract:
To evaluate the ability to predict xerostomia after radiotherapy, we constructed and compared neural network and logistic regression models. In this study, 61 patients who completed a questionnaire about their quality of life (QoL) before and after a full course of radiation therapy were included. Based on this questionnaire, some statistical data about the condition of the patients’ salivary glands were obtained, and these subjects were included as the inputs of the neural network and logistic regression models in order to predict the probability of xerostomia. Seven variables were then selected from the statistical data according to Cramer’s V and point-biserial correlation values and were trained by each model to obtain the respective outputs which were 0.88 and 0.89 for AUC, 9.20 and 7.65 for SSE, and 13.7% and 19.0% for MAPE, respectively. These parameters demonstrate that both neural network and logistic regression methods are effective for predicting conditions of parotid glands.
Keywords: NPC, ANN, logistic regression, xerostomia.
Downloads 1637
4910 Novel Adaptive Channel Equalization Algorithms by Statistical Sampling
Authors: János Levendovszky, András Oláh
Abstract:
In this paper, novel statistical sampling based equalization techniques and CNN based detection are proposed to increase the spectral efficiency of multiuser communication systems over fading channels. Multiuser communication combined with selective fading can result in interferences which severely deteriorate the quality of service in wireless data transmission (e.g. CDMA in mobile communication). The paper introduces new equalization methods to combat interferences by minimizing the Bit Error Rate (BER) as a function of the equalizer coefficients. This provides higher performance than the traditional Minimum Mean Square Error equalization. Since the calculation of BER as a function of the equalizer coefficients is of exponential complexity, statistical sampling methods are proposed to approximate the gradient which yields fast equalization and superior performance to the traditional algorithms. Efficient estimation of the gradient is achieved by using stratified sampling and the Li-Silvester bounds. A simple mechanism is derived to identify the dominant samples in real-time, for the sake of efficient estimation. The equalizer weights are adapted recursively by minimizing the estimated BER. The near-optimal performance of the new algorithms is also demonstrated by extensive simulations. The paper has also developed a (Cellular Neural Network) CNN based approach to detection. In this case fast quadratic optimization has been carried out by t, whereas the task of equalizer is to ensure the required template structure (sparseness) for the CNN. The performance of the method has also been analyzed by simulations.
Keywords: Cellular Neural Network, channel equalization, communication over fading channels, multiuser communication, spectral efficiency, statistical sampling.
Downloads 1520
4909 Metrology-Inspired Methods to Assess the Biases of Artificial Intelligence Systems
Authors: Belkacem Laimouche
Abstract:
With the field of Artificial Intelligence (AI) experiencing exponential growth, fueled by technological advancements that pave the way for increasingly innovative and promising applications, there is an escalating need to develop rigorous methods for assessing their performance in pursuit of transparency and equity. This article proposes a metrology-inspired statistical framework for evaluating bias and explainability in AI systems. Drawing from the principles of metrology, we propose a pioneering approach, using a concrete example, to evaluate the accuracy and precision of AI models, as well as to quantify the sources of measurement uncertainty that can lead to bias in their predictions. Furthermore, we explore a statistical approach for evaluating the explainability of AI systems based on their ability to provide interpretable and transparent explanations of their predictions.
Keywords: Artificial intelligence, metrology, measurement uncertainty, prediction error, bias, machine learning algorithms, probabilistic models, inter-laboratory comparison, data analysis, data reliability, bias impact assessment, bias measurement.
Downloads 144
4908 Investigation of the Main Trends of Tourist Expenses in Georgia
Authors: Nino Abesadze, Marine Mindorashvili, Nino Paresashvili
Abstract:
The main purpose of the article is to carry out a complex statistical analysis of the expenses of foreign visitors. We used a mixed selection technique that combines rules of random and proportional selection. SPSS software was used to compute the statistical data for the corresponding analysis. The tourism statistics methodology was implemented according to international standards. Important information was collected and grouped from the major Georgian airports. Techniques of statistical observation were prepared. A representative population of foreign visitors and a rule for selecting respondents were determined. Tourist numbers show a growth trend, and the share of tourists from post-Soviet countries is constantly increasing. The level of satisfaction with tourist facilities and quality of service has grown, but there is still a disparity between quality of service and prices. The structure of foreign visitors' tourist expenses is diverse; the competitiveness of the tourist products of Georgian tourist companies is higher.
Keywords: Tourist, expenses, methods, statistics, analysis.
Downloads 949
4907 Data Mining on the Router Logs for Statistical Application Classification
Authors: M. Rahmati, S.M. Mirzababaei
Abstract:
With the advance of information technology, the use of the Internet to access data resources has steadily increased, and a huge amount of data has become accessible in various forms. Network providers and agencies naturally seek to prevent electronic attacks that may be harmful or may be related to terrorist applications. This has led the authorities to undertake a variety of methods to protect special regions from harmful data. One of the most important approaches is to use a firewall in the network facilities. The main objective of firewalls is to stop the transfer of suspicious packets in several ways. However, because of their blind packet stopping, high processing power requirements and high prices, some providers are reluctant to use firewalls. In this paper we propose a method to find a discriminant function that distinguishes between usual packets and harmful ones by statistical processing of the network router logs. By discriminating these data, an administrator may take appropriate action against the user. This method is very fast and can be used simply alongside Internet routers.
Keywords: Data Mining, Firewall, Optimization, Packet classification, Statistical Pattern Recognition.
Downloads 1655
4906 A Usability Testing Approach to Evaluate User-Interfaces in Business Administration
Authors: Salaheddin Odeh, Ibrahim O. Adwan
Abstract:
This interdisciplinary study is an investigation to evaluate user-interfaces in business administration. The study is going to be implemented on two computerized business administration systems with two distinctive user-interfaces, so that differences between the two systems can be determined. Both systems, a commercial one and a prototype developed for the purpose of this study, deal with ordering of supplies, tendering procedures, issuing purchase orders, controlling the movement of the stocks against their actual balances on the shelves and editing them on their tabulations. In the second suggested system, modern computer graphics and multimedia issues were taken into consideration to cover the drawbacks of the first system. To highlight differences between the two investigated systems regarding some chosen standard quality criteria, the study employs various statistical techniques and methods to evaluate the users' interaction with both systems. The study variables are divided into two groups: independent variables, representing the interfaces of the two systems, and dependent variables, embracing efficiency, effectiveness, satisfaction, error rate etc.
Keywords: Evaluation and usability testing, software prototyping, statistical methods, user-interface design.
Downloads 1466
4905 Using Artificial Neural Network to Predict Collisions on Horizontal Tangents of 3D Two-Lane Highways
Authors: Omer F. Cansiz, Said M. Easa
Abstract:
The purpose of this study is mainly to predict collision frequency on horizontal tangents combined with vertical curves using artificial neural network methods. The proposed ANN models are compared with existing regression models. First, the variables that affect collision frequency were investigated. It was found that only the annual average daily traffic, section length, access density, the rate of vertical curvature, and the smaller curve radius before and after the tangent were statistically significant according to related combinations. Second, three statistical models (negative binomial, zero inflated Poisson and zero inflated negative binomial) were developed using the significant variables for three alignment combinations. Third, ANN models were developed by applying the same variables for each combination. The results clearly show that the ANN models have lower mean square error values than the statistical models. Similarly, the AIC values of the ANN models are smaller than those of the regression models for all the combinations. Consequently, the ANN models have better statistical performance than the statistical models for estimating collision frequency. The ANN models presented in this paper are recommended for evaluating the safety impacts of 3D alignment elements on horizontal tangents.
Keywords: Collision frequency, horizontal tangent, 3D two-lane highway, negative binomial, zero inflated Poisson, artificial neural network.
Downloads 1637
4904 A Brief Study about Nonparametric Adherence Tests
Authors: Vinicius R. Domingues, Luan C. S. M. Ozelim
Abstract:
Statistical study has become indispensable in various fields of knowledge. Geotechnics is no different: the study of probabilistic and statistical methods has gained importance because of its use in characterizing the uncertainties inherent in soil properties. One situation engineers constantly face is the definition of a probability distribution that adequately represents the sampled data. To be able to discard bad distributions, goodness-of-fit tests are necessary. In this paper, three non-parametric goodness-of-fit tests are applied to a computationally generated data set to test its goodness of fit to a series of known distributions. It is shown that the use of the normal distribution does not always provide satisfactory results regarding the physical and behavioral representation of the modeled parameters.
Keywords: Kolmogorov-Smirnov, Anderson-Darling, Cramer-Von-Mises, Nonparametric adherence tests.
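To make the first of the listed tests concrete, here is a minimal Python sketch of the one-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical distribution function of a sample and a candidate distribution function. The function names and the toy samples are assumptions for illustration; the sketch computes only the statistic, not a p-value, and does not cover the Anderson-Darling or Cramér-von Mises tests.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """One-sample KS statistic D_n = sup_x |F_n(x) - F(x)|.

    The empirical CDF F_n jumps at each sorted observation, so the
    supremum is reached just before or just after one of those jumps.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical usage: compare a small sample against a standard normal.
print(ks_statistic([-1.2, -0.3, 0.1, 0.4, 1.5], normal_cdf))
```

A smaller statistic means the candidate distribution tracks the sample more closely, so the same function can be called with several candidate CDFs to rank them before applying a formal test.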
Downloads 1843
4903 An Approach to Correlate the Statistical-Based Lorenz Method, as a Way of Measuring Heterogeneity, with Kozeny-Carman Equation
Authors: H. Khanfari, M. Johari Fard
Abstract:
Dealing with carbonate reservoirs can be mind-boggling for reservoir engineers because of the various diagenetic processes that cause properties to vary through the reservoir. A good estimate of reservoir heterogeneity, defined as the variation in rock properties with location in a reservoir or formation, helps in modeling the reservoir and thus offers a better understanding of its behavior. Most reservoirs are heterogeneous formations whose mineralogy, organic content, natural fractures, and other properties vary from place to place. Over the years, reservoir engineers have tried to establish methods to describe heterogeneity, because it is important in modeling reservoir flow and in well testing. Geological methods are used to describe the variations in rock properties because of the similarities of the environments in which different beds were deposited. To illustrate the vertical heterogeneity of a reservoir, two methods are generally used in petroleum work: the Dykstra-Parsons permeability variation (V) and the Lorenz coefficient (L), both of which are reviewed briefly in this paper. The Lorenz concept is based on statistics and has been used in petroleum engineering from that point of view. In this paper, we correlate the statistics-based Lorenz method with a petroleum concept, namely the Kozeny-Carman equation, and derive the straight-line plot of the Lorenz graph for a homogeneous system. Finally, we apply the two methods to a heterogeneous field in South Iran and discuss each separately, with numbers and figures. As expected, these methods show a great departure from homogeneity. Therefore, for future investment, the reservoir needs to be treated carefully.
Keywords: Carbonate reservoirs, heterogeneity, homogeneous system, Dykstra-Parsons permeability variations (V), Lorenz coefficient (L).
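For readers unfamiliar with the Lorenz coefficient, a minimal Python sketch of the usual textbook construction follows. Layers are ordered by decreasing permeability-to-porosity ratio, cumulative flow capacity (k·h) is plotted against cumulative storage capacity (φ·h), and L is twice the area between that curve and the diagonal. The function name, the tuple layout and the layer values are hypothetical, and the paper's correlation with the Kozeny-Carman equation is not reproduced here.

```python
def lorenz_coefficient(layers):
    """Lorenz coefficient from (permeability, porosity, thickness) tuples.

    Returns 0 for a homogeneous system and approaches 1 as the flow
    capacity concentrates in a small share of the storage capacity.
    """
    # Best-flowing layers first: sort by k / phi, descending.
    ordered = sorted(layers, key=lambda layer: layer[0] / layer[1], reverse=True)
    total_kh = sum(k * h for k, _, h in ordered)
    total_phih = sum(phi * h for _, phi, h in ordered)

    area = 0.0
    f_prev = c_prev = 0.0
    cum_kh = cum_phih = 0.0
    for k, phi, h in ordered:
        cum_kh += k * h
        cum_phih += phi * h
        f = cum_kh / total_kh      # cumulative flow capacity fraction
        c = cum_phih / total_phih  # cumulative storage capacity fraction
        area += 0.5 * (f + f_prev) * (c - c_prev)  # trapezoid rule
        f_prev, c_prev = f, c

    # The diagonal (homogeneous case) encloses an area of 0.5.
    return 2.0 * (area - 0.5)


# Hypothetical layers: (permeability in mD, porosity fraction, thickness in m).
homogeneous = [(100.0, 0.2, 1.0)] * 3
contrasting = [(1000.0, 0.2, 1.0), (10.0, 0.2, 1.0)]
print(lorenz_coefficient(homogeneous), lorenz_coefficient(contrasting))
```

For identical layers the flow-capacity curve coincides with the diagonal, which matches the straight-line plot for a homogeneous system mentioned in the abstract.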
Downloads 1791
4902 Evaluation of Clustering Based on Preprocessing in Gene Expression Data
Authors: Seo Young Kim, Toshimitsu Hamasaki
Abstract:
Microarrays have become effective, broadly used tools in biological and medical research to address a wide range of problems, including classification of disease subtypes and tumors. Many statistical methods are available for analyzing and systematizing these complex data into meaningful information, and one of the main goals in analyzing gene expression data is the detection of samples or genes with similar expression patterns. In this paper, we express and compare the performance of several clustering methods based on data preprocessing, including strategies of normalization or noise clearness. We also evaluate each of these clustering methods with validation measures for both simulated data and real gene expression data. Consequently, clustering methods which are commonly used in microarray data analysis are affected by normalization and by the degree of noise and clearness of the datasets.
Keywords: Gene expression, clustering, data preprocessing.
Downloads 1740
4901 Informal Inferential Reasoning Using a Modelling Approach within a Computer-Based Simulation
Authors: Theodosia Prodromou
Abstract:
The article investigates how 14- to 15-year-olds build informal conceptions of inferential statistics as they engage in a modelling process and build their own computer simulations with dynamic statistical software. This study proposes four primary phases of informal inferential reasoning for the students in the statistical modeling and simulation process. Findings show shifts in the conceptual structures across the four phases and point to the potential of all of these phases for fostering the development of students' robust knowledge of the logic of inference when using computer based simulations to model and investigate statistical questions.
Keywords: Inferential reasoning, learning, modelling, statistical inference, simulation.
Downloads 1474
4900 Classification Control for Discrimination between Interictal Epileptic and Non – Epileptic Pathological EEG Events
Authors: Sozon H. Papavlasopoulos, Marios S. Poulos, George D. Bokos, Angelos M. Evangelou
Abstract:
In this study, the problem of discriminating between interictal epileptic and non-epileptic pathological EEG cases, which present episodic loss of consciousness, is investigated. We verify the accuracy of the feature extraction method of autocross-correlated coefficients, which were extracted and studied in a previous study. For this purpose we used, on the one hand, a suitably constructed supervised LVQ1 artificial neural network and, on the other, a cross-correlation technique. To reinforce this verification we used a statistical procedure based on a chi-square control. The classification and statistical results showed that the proposed feature extraction is a significantly accurate method for diagnostic discrimination between interictal and non-interictal EEG events; specifically, the classification procedure showed that the LVQ neural method is superior to the cross-correlation one.
Keywords: Cross-Correlation Methods, Diagnostic Test, Interictal Epileptic, LVQ1 neural network, Auto-Cross-Correlation Methods, chi-square test.
Downloads 1519
4899 Visual-Graphical Methods for Exploring Longitudinal Data
Authors: H. W. Ker
Abstract:
Longitudinal data typically have the characteristics of changes over time, nonlinear growth patterns, between-subjects variability, and within errors exhibiting heteroscedasticity and dependence. Exploring such data is more complicated than exploring cross-sectional data. The purpose of this paper is to organize and integrate various visual-graphical techniques for exploring longitudinal data. By applying the proposed methods, investigators can answer research questions that include characterizing or describing the growth patterns at both group and individual level, identifying the time points where important changes occur and unusual subjects, selecting suitable statistical models, and suggesting possible within-error variance.
Keywords: Data exploration, exploratory analysis, HLMs/LMEs, longitudinal data, visual-graphical methods.
Downloads 2095
4898 Quantum Statistical Mechanical Formulations of Three-Body Problems via Non-Local Potentials
Authors: A. Maghari, V. H. Maleki
Abstract:
In this paper, we present a quantum statistical mechanical formulation based on our recent analytical expressions for the partial-wave transition matrix of a three-particle system. We report the quantum reactive cross sections for three-body scattering processes 1+(2,3)→1+(2,3) as well as recombination 1+(2,3)→1+(3,1) between one atom and a weakly-bound dimer. The analytical expressions of the three-particle transition matrices and their corresponding cross-sections were obtained from the three-dimensional Faddeev equations subjected to rank-two non-local separable potentials of the generalized Yamaguchi form. The equilibrium quantum statistical mechanical properties, such as the partition function and equation of state, as well as non-equilibrium quantum statistical properties, such as transport cross-sections and their corresponding transport collision integrals, were formulated analytically. This yields the transport properties, such as the viscosity and diffusion coefficient, of a moderately dense gas.
Keywords: Statistical mechanics, Nonlocal separable potential, three-body interaction, Faddeev equations.
Downloads 2120
4897 Statistical Computational of Volatility in Financial Time Series Data
Authors: S. Al Wadi, Mohd Tahir Ismail, Samsul Ariffin Abdul Karim
Abstract:
It is well known that, with developments in the economic sector and financial crises occurring all over the world, volatility measurement is the most important concept in financial time series. Therefore, in this paper we discuss the volatility of the Amman stock market (Jordan) for a certain period of time. Since the wavelet transform is one of the best-known filtering methods and has grown very quickly in the last decade, we compare this method with the traditional technique, the fast Fourier transform, to decide the best method for analyzing the volatility. The comparison is done on some of the statistical properties using Matlab.
Keywords: Fast Fourier transforms, Haar wavelet transform, Matlab (Wavelet tools), stocks market, Volatility.
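The Haar wavelet transform listed in the keywords can be sketched in a few lines. The abstract says the study itself used Matlab, so the Python below is only an illustration of one level of the orthonormal Haar transform and its inverse; the function names and the toy series are assumptions, and no stock market data are used.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar transform of an even-length series.

    Returns (approximation, detail): scaled pairwise sums capture the
    smooth trend, scaled pairwise differences capture local fluctuation.
    """
    if len(x) % 2:
        raise ValueError("series length must be even")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the original series from one level of Haar coefficients."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


# Hypothetical toy series, not market data.
series = [4.0, 2.0, 5.0, 5.0]
approx, detail = haar_step(series)
print(approx, detail)
print(haar_inverse(approx, detail))
```

Because this form of the transform is orthonormal, the sum of squares of the series equals the sum of squares of the coefficients, so the energy in the detail coefficients can serve as a simple measure of short-term fluctuation.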
Downloads 2318
4896 Statistical (Radio) Path Loss Modelling: For RF Propagations within localized Indoor and Outdoor Environments of the Academic Building of INTI University College (Laureate International Universities)
Authors: Emmanuel O.O. Ojakominor, Tian F. Lai
Abstract:
A handful of propagation textbooks that discuss radio frequency (RF) propagation models merely list the models and perhaps discuss them rather briefly; this may well be frustrating for the first-time modeller who has no idea how these models could have been derived. This paper provides an introduction to modelling the radio channel. For the modelling practice discussed here, signal strength field measurements had to be conducted beforehand (this was done at 469 MHz); to be precise, this paper primarily concerns empirically/statistically modelling the radio channel, and thus provides results obtained from empirically modelling the environments in question. On the whole, this paper proposes three propagation models, corresponding to three measured environments. The models have been derived by making the most of statistical measures. The first two models were derived via simple linear regression analysis, whereas the third was derived using multiple regression analysis (with five predictors). Additionally, as implied by the title of this paper, both indoor and outdoor environments have been measured; however, two of the environments are neither entirely indoor nor entirely outdoor. The other environment is completely indoor.
Keywords: RF propagation, radio channel modelling, statistical methods.
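The abstract states that two of the models came from simple linear regression but does not give their form. As one plausible illustration, the Python sketch below fits the common log-distance model PL(d) = PL0 + 10·n·log10(d / 1 m) by ordinary least squares. The model form, the function names and the synthetic measurements are all assumptions and are not taken from the paper.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_log_distance(distances_m, path_loss_db):
    """Estimate the path loss exponent n and the loss PL0 at 1 m.

    Regressing path loss on 10 * log10(d) makes the slope equal to the
    exponent n and the intercept equal to PL0.
    """
    xs = [10.0 * math.log10(d) for d in distances_m]
    return fit_line(xs, path_loss_db)


# Synthetic, noise-free data generated with n = 3 and PL0 = 40 dB.
distances = [1.0, 2.0, 5.0, 10.0, 20.0]
losses = [40.0 + 30.0 * math.log10(d) for d in distances]
exponent, pl0 = fit_log_distance(distances, losses)
print(exponent, pl0)
```

With real field measurements the residuals around the fitted line would not be zero, and their spread is often reported alongside the fitted exponent.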
Downloads 2434
4895 Statistical Analysis-Driven Risk Assessment of Criteria Air Pollutants: A Sulfur Dioxide Case Study
Authors: Ehsan Bashiri
Abstract:
A 7-step method (with 25 sub-steps) to assess risk of air pollutants is introduced. These steps are: pre-considerations, sampling, statistical analysis, exposure matrix and likelihood, dose-response matrix and likelihood, total risk evaluation, and discussion of findings. All mentioned words and expressions are well understood; however, almost all steps have been modified, improved, and coupled in such a way that a comprehensive method has been prepared. Accordingly, the SADRA (Statistical Analysis-Driven Risk Assessment) emphasizes extensive and ongoing application of analytical statistics in traditional risk assessment models. A Sulfur Dioxide case study validates the claim and provides a good illustration for this method.
Keywords: Criteria air pollutants, Matrix of risk, Risk assessment, Statistical analysis.
Downloads 1706
4894 Predicting Automotive Interior Noise Including Wind Noise by Statistical Energy Analysis
Authors: Yoshio Kurosawa
Abstract:
The applications of soundproof materials for reduction of high frequency automobile interior noise have been researched. This paper presents a sound pressure prediction technique including wind noise by Hybrid Statistical Energy Analysis (HSEA) in order to reduce weight of acoustic insulations. HSEA uses both analytical SEA and experimental SEA. As a result of chassis dynamo test and road test, the validity of SEA modeling was shown, and utility of the method was confirmed.
Keywords: Vibration, noise, car, statistical energy analysis.
Downloads 1577
4893 Automated Process Quality Monitoring with Prediction of Fault Condition Using Measurement Data
Authors: Hyun-Woo Cho
Abstract:
Detection of incipient abnormal events is important to improve safety and reliability of machine operations and reduce losses caused by failures. Improper set-up or alignment of parts often leads to severe problems in many machines. The construction of prediction models for predicting faulty conditions is essential in deciding when to perform machine maintenance. This paper presents a multivariate calibration monitoring approach based on the statistical analysis of machine measurement data. The calibration model is used to predict two faulty conditions from historical reference data. This approach utilizes genetic algorithm (GA) based variable selection, and we evaluate the predictive performance of several prediction methods using real data. The results show that the calibration model based on supervised probabilistic principal component analysis (SPPCA) yielded the best performance in this work. By adopting a proper variable selection scheme in calibration models, the prediction performance can be improved by excluding non-informative variables from the model building steps.
Keywords: Prediction, operation monitoring, on-line data, nonlinear statistical methods, empirical model.
Downloads 1658
4892 Structural Parsing of Natural Language Text in Tamil Using Phrase Structure Hybrid Language Model
Authors: Selvam M, Natarajan. A M, Thangarajan R
Abstract:
Parsing is important in Linguistics and Natural Language Processing to understand the syntax and semantics of a natural language grammar. Parsing natural language text is challenging because of problems like ambiguity and inefficiency. Also, the interpretation of natural language text depends on context based techniques. A probabilistic component is essential to resolve ambiguity in both syntax and semantics, thereby increasing the accuracy and efficiency of the parser. The Tamil language has some inherent features which are more challenging. In order to obtain solutions, a lexicalized and statistical approach is to be applied in the parsing with the aid of a language model. Statistical models mainly focus on the semantics of the language and are suitable for large vocabulary tasks, whereas structural methods focus on syntax and model small vocabulary tasks. A statistical language model based on trigrams for the Tamil language with a medium vocabulary of 5000 words has been built. Though statistical parsing gives better performance through trigram probabilities and large vocabulary size, it has some disadvantages, such as a focus on semantics rather than syntax and a lack of support for free word order and long-term relationships. To overcome these disadvantages, a structural component is to be incorporated in statistical language models, which leads to the implementation of hybrid language models. This paper has attempted to build a phrase structured hybrid language model which resolves the above mentioned disadvantages. In the development of the hybrid language model, a new part of speech tag set for the Tamil language has been developed, with more than 500 tags, which has wider coverage. A phrase structured Treebank has been developed with 326 Tamil sentences, which covers more than 5000 words. A hybrid language model has been trained with the phrase structured Treebank using the immediate head parsing technique. A lexicalized and statistical parser which employs this hybrid language model and the immediate head parsing technique gives better results than pure grammar and trigram based models.
Keywords: Hybrid Language Model, Immediate Head Parsing, Lexicalized and Statistical Parsing, Natural Language Processing, Parts of Speech, Probabilistic Context Free Grammar, Tamil Language, Tree Bank.
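To show what a trigram language model estimates, here is a minimal Python sketch of maximum-likelihood trigram probabilities computed from a toy corpus. It illustrates only the statistical baseline mentioned in the abstract, not the hybrid model or the immediate head parser. The function names, the padding symbols and the English placeholder sentences are assumptions, and no smoothing is applied.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word contexts over tokenized sentences."""
    trigrams, contexts = Counter(), Counter()
    for tokens in sentences:
        # Pad so that the first real word also has a two-token history.
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            trigrams[(a, b, c)] += 1
            contexts[(a, b)] += 1
    return trigrams, contexts


def trigram_prob(trigrams, contexts, a, b, c):
    """Maximum-likelihood estimate of P(c | a, b); 0.0 for unseen contexts."""
    seen = contexts[(a, b)]
    return trigrams[(a, b, c)] / seen if seen else 0.0


# Hypothetical placeholder corpus standing in for tokenized Tamil text.
corpus = [["the", "cat", "sat"], ["the", "cat", "ran"]]
tri, ctx = train_trigram(corpus)
print(trigram_prob(tri, ctx, "the", "cat", "sat"))
```

Because each probability depends only on the two preceding words, a model of this kind cannot represent dependencies between distant words, which is the limitation the abstract's structural component is meant to address.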
Downloads 3644
4891 Analysis of Web User Identification Methods
Authors: Renáta Iváncsy, Sándor Juhász
Abstract:
Web usage mining has become a popular research area, as a huge amount of data is available online. These data can be used for several purposes, such as web personalization, web structure enhancement, web navigation prediction etc. However, the raw log files are not directly usable; they have to be preprocessed in order to transform them into a suitable format for different data mining tasks. One of the key issues in the preprocessing phase is to identify web users. Identifying users based on web log files is not a straightforward problem, thus various methods have been developed. There are several difficulties that have to be overcome, such as client side caching, changing and shared IP addresses and so on. This paper presents three different methods for identifying web users. Two of them are the most commonly used methods in web log mining systems, whereas the third one is our novel approach that uses a complex cookie-based method to identify web users. Furthermore, we also take steps towards identifying the individuals behind the impersonal web users. To demonstrate the efficiency of the new method we developed an implementation called the Web Activity Tracking (WAT) system, which aims at a more precise distinction of web users based on log data. We present some statistical analysis created by the WAT on real data about the behavior of Hungarian web users, and a comprehensive analysis and comparison of the three methods.
Keywords: Data preparation, Tracking individuals, Web user identification, Web usage mining
Downloads 4393
4890 Data and Spatial Analysis for Economy and Education of 28 E.U. Member-States for 2014
Authors: Alexiou Dimitra, Fragkaki Maria
Abstract:
The objective of the paper is to study geographic, economic and educational variables and their contribution to determining the position of each member-state among the EU-28 countries, based on the values of seven variables as given by Eurostat. The data analysis methods of Multiple Factorial Correspondence Analysis (MFCA), Principal Component Analysis and Factor Analysis have been used. The cross-tabulation tables of data consist of the values of seven variables for the 28 countries for 2014. The data are manipulated using the CHIC Analysis V 1.1 software package. The results of this program using MFCA and Ascending Hierarchical Classification are given in arithmetic and graphical form. For comparison, the Factor procedure of the statistical package IBM SPSS 20 has been applied to the same data. The numerical and graphical results, presented with tables and graphs, demonstrate the agreement between the two methods. The most important result is the study of the relation between the 28 countries and the position of each country in groups or clouds, which are formed according to the values of the corresponding variables.
Keywords: Multiple factorial correspondence analysis, principal component analysis, factor analysis, E.U.-28 countries, statistical package IBM SPSS 20, CHIC Analysis V 1.1 Software, Eurostat.eu statistics.
Downloads 1085
4889 Application of Scanning Electron Microscopy and X-Ray Evaluation of the Main Digestion Methods for Determination of Macroelements in Plant Tissue
Authors: Krasimir I. Ivanov, Penka S. Zapryanova, Stefan V. Krustev, Violina R. Angelova
Abstract:
Three commonly used digestion methods (dry ashing, acid digestion, and microwave digestion) in different variants were compared for digestion of tobacco leaves. Three main macroelements (K, Ca and Mg) were analysed using an AAS spectrometer (Spectra AA 220, Varian, Australia). The accuracy and precision of the measurements were evaluated using the Polish reference material CTR-VTL-2 (Virginia tobacco leaves). To elucidate the problems with elemental recovery, X-ray and SEM–EDS analyses of all residues after digestion were performed. The X-ray investigation showed the formation of KClO4 when HClO4 was used as part of the acid mixture. The use of HF in Ca and Mg determination led to the formation of CaF2 and MgF2. The results were confirmed by energy dispersive X-ray microanalysis. The SPSS program for Windows was used for statistical data processing.
Keywords: Digestion methods, determination of macroelements, plant tissue.
Downloads 940
4888 Towards Integrating Statistical Color Features for Human Skin Detection
Authors: Mohd Zamri Osman, Mohd Aizaini Maarof, Mohd Foad Rohani
Abstract:
Human skin detection is recognized as the primary step in most applications such as face detection, illicit image filtering, hand recognition and video surveillance. The performance of any skin detection application relies greatly on two components: the feature extraction and the classification method. Skin color is the most vital information used for skin detection. However, a color feature alone sometimes cannot handle images that have the same color distribution as skin. A pixel-based color feature does not eliminate skin-like colors, because the intensities of skin and skin-like colors fall under the same distribution. Hence, statistical color analysis, such as the mean and standard deviation, is exploited as an additional feature to increase the reliability of the skin detector. In this paper, we studied the effectiveness of statistical color features for human skin detection. Furthermore, the paper analyzed the integrated color and texture features using eight classifiers with three color spaces: RGB, YCbCr, and HSV. The experimental results show that integrating the statistical feature using a Random Forest classifier achieved a significant performance, with an F1-score of 0.969.
Keywords: Color space, neural network, random forest, skin detection, statistical feature.
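As an illustration of the idea of adding the mean and standard deviation to a pixel's color value, here is a minimal Python sketch. The abstract does not say how the statistics are computed, so a 3x3 neighbourhood on a single color channel is assumed; the function names and sample patches are hypothetical and this is not the authors' feature extractor.

```python
from statistics import mean, pstdev


def window_stats(channel, row, col, radius=1):
    """Mean and population standard deviation of a square window,
    clipped at the image border (window size is an assumption)."""
    height, width = len(channel), len(channel[0])
    values = [
        channel[r][c]
        for r in range(max(0, row - radius), min(height, row + radius + 1))
        for c in range(max(0, col - radius), min(width, col + radius + 1))
    ]
    return mean(values), pstdev(values)


def pixel_features(channel, row, col):
    """Feature vector: raw pixel value plus local mean and local std."""
    local_mean, local_std = window_stats(channel, row, col)
    return [channel[row][col], local_mean, local_std]


# Two hypothetical 3x3 patches whose centre pixel has the same value.
flat = [[120, 120, 120], [120, 120, 120], [120, 120, 120]]
textured = [[100, 140, 100], [140, 120, 140], [100, 140, 100]]

print(pixel_features(flat, 1, 1))
print(pixel_features(textured, 1, 1))
```

Both centre pixels share the same value and the same local mean, so a pixel-only feature cannot separate them; only the local standard deviation differs, which is the kind of extra information a classifier could use.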
Downloads 1955
4887 Using Statistical Significance and Prediction to Test Long/Short Term Public Services and Patients Cohorts: A Case Study in Scotland
Authors: Sotirios Raptis
Abstract:
Health and Social care (HSc) services planning and scheduling are facing unprecedented challenges due to pandemic pressure, and also suffer from unplanned spending that is negatively impacted by the global financial crisis. Data-driven approaches can help to improve policies and to plan and design service provision schedules, using algorithms that assist healthcare managers in facing unexpected demands with fewer resources. The paper discusses services packing using statistical significance tests and machine learning (ML) to evaluate demand similarity and coupling. This is achieved by predicting the range of the demand (class) using ML methods such as Classification and Regression Trees (CART), Random Forests (RF), and Logistic Regression (LGR). The Chi-Squared and Student's significance tests are applied to data on services delivered in Scotland over a 39-year span. The demands are associated using probabilities and form parts of statistical hypotheses. These hypotheses, as their NULL part, assume that the target demand is statistically dependent on other services' demands. This linking is checked using the data. In addition, ML methods are used to linearly predict the target demands from the statistically found associations and to extend the linear dependence of the target's demand to independent demands, thus forming groups of services. Statistical tests confirmed the ML coupling, made the prediction statistically meaningful, and proved that a target service can be matched reliably to other services, while ML showed that such marked relationships can also be linear. Zero padding was used for missing yearly records and better illustrated such relationships, both for limited years and for the entire span, offering long-term data visualizations, while limited-year periods explained how well patient numbers can be related over short periods of time, or how they can change over time, as opposed to behaviours across more years.
The prediction performance of the associations was measured using metrics such as Receiver Operating Characteristic (ROC), Area Under Curve (AUC) and Accuracy (ACC), as well as the Chi-Squared and Student's statistical tests. Co-plots and comparison tables for the RF, CART, and LGR methods, as well as the p-values from the tests and Information Exchange (IE/MIE) measures, are provided, showing the relative performance of the ML methods and of the statistical tests as well as the behaviour under different learning ratios. The impact of k-neighbours classification (k-NN), Cross-Correlation (CC) and C-Means (CM) first groupings was also studied over limited years and for the entire span. It was found that CART generally lagged behind RF and LGR, but in some interesting cases LGR reached an AUC = 0, falling below CART, while the ACC was as high as 0.912, showing that ML methods can be confused by zero padding, by irregularities in the data, or by outliers. On average, 3 linear predictors were sufficient; LGR was found to compete well with RF, and CART followed with the same performance at higher learning ratios. Services were packed only when the significance level (p-value) of their association coefficient was more than 0.05. Social factor relationships were observed between home care services and treatment of old people, low birth weights, alcoholism, drug abuse, and emergency admissions. The work found that different HSc services can be well packed as plans of limited duration, across various service sectors and learning configurations, as confirmed by statistical hypotheses.
Keywords: Class, cohorts, data frames, grouping, prediction, probabilities, services.
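For readers unfamiliar with the Chi-Squared test mentioned in this abstract, the following is a minimal Python sketch of the conventional Pearson test on a 2x2 contingency table. It uses the textbook formulation in which the null hypothesis is independence, which may differ from the hypothesis setup the abstract describes. The counts are hypothetical and not taken from the Scottish data, no continuity correction is applied, and all marginal totals are assumed to be nonzero.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table
    (1 degree of freedom, no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: years with high/low demand for service A (rows)
# against years with high/low demand for service B (columns).
observed = [[20, 5], [6, 19]]
stat, p_value = chi_squared_2x2(observed)
print(f"chi2 = {stat:.3f}, p = {p_value:.2g}")
```

A small p-value here would be read, under the conventional formulation, as evidence that the two demand levels are associated; for tables larger than 2x2 a general chi-squared distribution function would be needed instead of the `erfc` shortcut.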
Downloads 461
4886 Analysing and Classifying VLF Transients
Authors: Ernst D. Schmitter
Abstract:
Monitoring lightning electromagnetic pulses (sferics) and other terrestrial as well as extraterrestrial transient radiation signals is of considerable interest for practical and theoretical purposes in astro- and geophysics as well as meteorology. When managing a continuous flow of data, automation of the analysis and classification process is important. Features based on a combination of wavelet and statistical methods proved efficient for this task and serve as input into a radial basis function network that is trained to discriminate transient shapes from pulse-like to wave-like. We concentrate on signals in the Very Low Frequency (VLF, 3-30 kHz) range in this paper, but the developed methods are independent of this specific choice.
Keywords: Transient signals, statistics, wavelets, neural networks
Downloads 1880
4885 Comparison of Experimental Relationships to Determine Flow Discharge in Meandering Compound Channels Using M5 Decision Tree Model
Authors: Mehdi Kheradmand, Mehdi Azhdary Moghaddam, Abdolreza Zahiri, Khalil Ghorbani
Abstract:
This research compares the results of major methods of determining flow discharge using experimental relationships with results from the M5 decision tree model in meandering compound sections in several laboratory channels. It was found that the M5 decision tree model achieved better accuracy, in terms of statistical parameters, than the said methods. This suggests that the M5 decision tree model substantially improves the accuracy of the calculated flow discharge in meandering compound channels.
Keywords: Stage-discharge relationship, M5 decision tree model, compound section, meandering compound channel.
Downloads 233
4884 The Effect of Cooperation Teaching Method on Learning of Students in Primary Schools
Authors: Fereshteh Afkari, Davood Bagheri
Abstract:
This study reviews the effect of the cooperative teaching method on learning. It compares the amount of learning of first-grade primary school girl students trained with the cooperative method against that of students trained with other, traditional methods, in courses including mathematics, writing, and experimental science. The survey was performed on 100 students in Tehran, selected by random cluster sampling from among girl students in the first primary grade. Given the topic, a semi-experimental research method was used, and the necessary information was collected through a questionnaire and examination questions prepared by the researcher in collaboration with teachers and authorities in this field who teach the related courses. The research sample was divided into experimental and control groups. The experimental group was trained using the cooperative method and the control group using traditional methods, in courses including mathematics, writing, and experimental science. The results, obtained using the t-test for two independent groups, show that training through cooperation leads to positive results and increases student learning in comparison with traditional methods. Cooperative methods also increased skill in solving mathematics exercises, improved understanding, and raised the skill level of students in practical lessons such as science and writing.
Keywords: method of teaching, learning, collaboration
Downloads 1638