Search results for: Statistical data analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 13845


13635 Person-Environment Fit (PE Fit): Evidence from Brazil

Authors: Jucelia Appio, Danielle Deimling De Carli, Bruno Henrique Rocha Fernandes, Nelson Natalino Frizon

Abstract:

The purpose of this paper is to investigate whether there are positive and significant correlations between the dimensions of Person-Environment Fit (Person-Job, Person-Organization, Person-Group and Person-Supervisor) at the “Best Companies to Work for” in Brazil in 2017. A quantitative approach with a descriptive method was used; the research sample was the "150 Best Companies to Work for", drawn from the database collected in 2017 and provided by Fundação Instituto de Administração (FIA) of the University of São Paulo (USP). For the data analysis, asymmetry and kurtosis, factor analysis, the Kaiser-Meyer-Olkin (KMO) test, Bartlett's sphericity test and Cronbach's alpha were applied to the 69 research variables, and Pearson's correlation analysis was used to test the hypothesis. The main result is a positive and significant correlation between the dimensions of Person-Environment Fit, corroborating hypothesis H1 that there is a positive and significant correlation between Person-Job Fit, Person-Organization Fit, Person-Group Fit and Person-Supervisor Fit.

Keywords: Human resource management, person-environment fit, strategic people management, best companies to work for.

Downloads: 998
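As an illustration of two of the statistics named in the abstract above, the following minimal sketch computes Pearson's correlation and Cronbach's alpha with the Python standard library. The function names and the sample scores are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of respondent scores per item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent total score
    item_variance = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Hypothetical 5-point scores for six respondents (not data from the paper).
pj_fit = [4, 5, 3, 4, 2, 5]
po_fit = [4, 4, 3, 5, 2, 5]
print(round(pearson_r(pj_fit, po_fit), 3))
print(round(cronbach_alpha([pj_fit, po_fit]), 3))
```

A correlation close to 1 between two fit dimensions, together with a high alpha for the items that make up a scale, is the kind of evidence the abstract describes.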
13634 Non-negative Principal Component Analysis for Face Recognition

Authors: Zhang Yan, Yu Bin

Abstract:

Principal component analysis (PCA) is often combined with state-of-the-art classification algorithms to recognize human faces. However, because it is a global feature selection algorithm, PCA captures only those features that contribute to the global characteristics of the data. It misses features that contribute to local characteristics because each principal component contains only some level of the global characteristics. In this study, we present a novel face recognition approach using non-negative principal component analysis, which adds a non-negativity constraint to improve data locality and help elucidate latent data structures. Experiments are performed on the Cambridge ORL face database. We demonstrate the strong performance of the algorithm in recognizing human faces in comparison with the PCA and NREMF approaches.

Keywords: classification, face recognition, non-negative principal component analysis (NPCA)

Downloads: 1695
13633 Statistical Analysis of the Impact of Maritime Transport Gross Domestic Product on Nigeria’s Economy

Authors: K. P. Oyeduntan, K. Oshinubi

Abstract:

Nigeria is referred to as the ‘Giant of Africa’ because of its large population, land mass and economy. However, it still trails far behind many smaller economies on the continent in terms of maritime operations. Because the maritime industry is the spark plug for national growth, housing the most crucial infrastructure that generates wealth for a nation, it is worrisome that a nation with six seaports lags in maritime activities. In this research, we studied how the Gross Domestic Product (GDP) of maritime transport influences the Nigerian economy. To do this, we applied Simple Linear Regression (SLR), Support Vector Machine (SVM), Polynomial Regression Model (PRM), Generalized Additive Model (GAM) and Generalized Linear Mixed Model (GLMM) to model the relationship between the nation’s Total GDP (TGDP) and the Maritime Transport GDP (MGDP) using 20 years of time series data. The results showed that the MGDP is statistically significant to the Nigerian economy. Among the statistical tools applied, the PRM of order 4 describes the relationship better than the other methods. The recommendations presented in this study will guide policy makers and help improve the economy of Nigeria.

Keywords: Economy, GDP, maritime transport, port, regression.

Downloads: 142
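To make the comparison between a simple linear fit and an order-4 polynomial fit concrete, here is a minimal sketch using ordinary least squares via the normal equations. The data are synthetic (20 hypothetical yearly observations), and the helper names `fit_polynomial` and `r_squared` are assumptions for illustration; the paper's actual data and software are not given in the abstract.

```python
def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns a prediction function."""
    lo, hi = min(x), max(x)

    def scale(v):
        # Map x to [-1, 1] to keep the normal equations well conditioned.
        return 2.0 * (v - lo) / (hi - lo) - 1.0

    n = degree + 1
    rows = [[scale(v) ** p for p in range(n)] for v in x]
    # Augmented normal equations: (A^T A | A^T y)
    m = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    coef = [m[i][n] / m[i][i] for i in range(n)]
    return lambda v: sum(c * scale(v) ** p for p, c in enumerate(coef))


def r_squared(y, y_hat):
    """Coefficient of determination."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins for MGDP (x) and TGDP (y) over 20 years.
mgdp = [float(i) for i in range(20)]
tgdp = [0.01 * (v - 10.0) ** 4 + 2.0 * v for v in mgdp]

linear = fit_polynomial(mgdp, tgdp, 1)
quartic = fit_polynomial(mgdp, tgdp, 4)
r2_linear = r_squared(tgdp, [linear(v) for v in mgdp])
r2_quartic = r_squared(tgdp, [quartic(v) for v in mgdp])
print(r2_linear, r2_quartic)
```

On real data a higher-order polynomial always fits the training sample at least as well as a lower-order one, so a comparison like the one in the abstract would normally also rely on significance tests or out-of-sample error.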
13632 Customer Relationship Management on Social Media Affecting Brand Loyalty of Siam Commercial Bank in Bangkok

Authors: Charawee Butbumrung

Abstract:

The purpose of this research was to study how customer relationship management on social media affects brand loyalty toward Siam Commercial Bank in Bangkok. The statistics used in the data analysis were frequency, mean, standard deviation, and Pearson’s correlation coefficient, computed with a social science statistics program. The study found that the majority of the respondents were female, aged 37–47, held a bachelor's degree, and had a monthly income between 10,001 and 15,000 Baht. In addition, customer relationship management scored high overall and on each aspect: formulating, maintaining, and extending the customer relationship. Furthermore, hypothesis testing showed that differences in customers’ age, education, occupation, and average monthly income were associated with differences in brand loyalty at a statistical significance level of 0.05, and that customer relationship management was positively related to brand loyalty at a low level, with a statistical significance level of 0.05.

Keywords: Brand loyalty, customer relationship, management, Siam Commercial Bank, social media.

Downloads: 1131
13631 Statistical Analysis for Overdispersed Medical Count Data

Authors: Y. N. Phang, E. F. Loh

Abstract:

Many researchers have suggested the use of zero inflated Poisson (ZIP) and zero inflated negative binomial (ZINB) models in modeling overdispersed medical count data with extra variation caused by extra zeros and unobserved heterogeneity. The studies indicate that ZIP and ZINB always provide a better fit than the standard Poisson and negative binomial models in modeling overdispersed medical count data. In this study, we propose the use of Zero Inflated Inverse Trinomial (ZIIT), Zero Inflated Poisson Inverse Gaussian (ZIPIG) and Zero Inflated Strict Arcsine (ZISA) models in modeling overdispersed medical count data. These proposed models are not widely used by researchers, especially in the medical field. The results show that these three models can serve as alternatives in modeling overdispersed medical count data. This is supported by the application of these models to a real-life medical data set. Inverse trinomial, Poisson inverse Gaussian and strict arcsine are discrete distributions whose variance is a cubic function of the mean. Therefore, ZIIT, ZIPIG and ZISA are able to accommodate data with excess zeros and very heavy tails. They are recommended for modeling overdispersed medical count data when ZIP and ZINB are inadequate.

Keywords: Zero inflated, inverse trinomial distribution, Poisson inverse Gaussian distribution, strict arcsine distribution, Pearson’s goodness of fit.

Downloads: 3315
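For readers unfamiliar with zero inflation, the sketch below implements the baseline ZIP model mentioned in the abstract above, in its standard parameterization: with probability `pi` an observation is a structural zero, otherwise it follows a Poisson distribution with rate `lam`. The function names are illustrative, and the ZIIT, ZIPIG and ZISA models proposed in the paper are not shown.

```python
from math import exp, factorial


def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero inflated Poisson with rate lam and zero-inflation pi."""
    poisson = exp(-lam) * lam ** k / factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean(lam, pi):
    return (1.0 - pi) * lam


def zip_variance(lam, pi):
    # Variance exceeds the mean whenever pi > 0, i.e. the model is overdispersed.
    return (1.0 - pi) * lam * (1.0 + pi * lam)


lam, pi = 2.0, 0.3
print(zip_pmf(0, lam, pi), zip_mean(lam, pi), zip_variance(lam, pi))
```

The variance formula shows why extra zeros alone produce overdispersion relative to a plain Poisson model, where mean and variance are equal.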
13630 Categorical Data Modeling: Logistic Regression Software

Authors: Abdellatif Tchantchane

Abstract:

Matlab-based software for logistic regression was developed to enhance the teaching of quantitative topics and to assist researchers in analyzing a wide range of applications involving categorical data. The software offers the option of performing stepwise logistic regression to select the most significant predictors. It includes a feature to detect influential observations in the data, and it investigates the effect of dropping or misclassifying an observation on a predictor variable. The input data may consist either of a set of individual responses (yes/no) with the predictor variables or of grouped records summarizing various categories for each unique set of predictor variable values. Graphical displays are used to output various statistical results and to assess the goodness of fit of the logistic regression model. The software recognizes possible convergence constraints when present in the data and notifies the user accordingly.

Keywords: Logistic regression, Matlab, Categorical data, Influential observation.

Downloads: 1881
13629 Statistical Optimization of Process Conditions for Disinfection of Water Using Defatted Moringa oleifera Seed Extract

Authors: Suleyman A. Muyibi, Munirat A. Idris, Saedi Jami, Parveen Jamal, Mohd Ismail Abdul Karim

Abstract:

In this study, statistical optimization design was used to determine the optimum disinfection parameters for defatted crude Moringa oleifera seed extracts against Escherichia coli (E. coli) bacterial cells. The classical one-factor-at-a-time (OFAT) method and response surface methodology (RSM) were used. The possible optimum ranges of dosage, contact time and mixing rate from the OFAT study were 25 mg/l to 200 mg/l, 30 minutes to 240 minutes and 100 rpm to 160 rpm, respectively. Analysis of variance (ANOVA) of the statistical optimization using a face-centered central composite design showed that dosage, contact time and mixing rate were highly significant. The optimum disinfection condition was a dosage of 125 mg/l at a contact time of 30 minutes with a mixing rate of 120 rpm.

Keywords: E. coli, disinfection, Moringa oleifera, response surface methodology.

Downloads: 2588
13628 Multistage Condition Monitoring System of Aircraft Gas Turbine Engine

Authors: A. M. Pashayev, D. D. Askerov, C. Ardil, R. A. Sadiqov, P. S. Abdullayev

Abstract:

Research shows that the application of probabilistic-statistical methods is unfounded, especially at the early stage of diagnosing the technical condition of an aviation Gas Turbine Engine (GTE), when the flight information is fuzzy, limited and uncertain. Hence, the efficiency of applying Soft Computing technology, using Fuzzy Logic and Neural Network methods, at these diagnosing stages is considered. For this purpose, fuzzy multiple linear and non-linear models (fuzzy regression equations), obtained from statistical fuzzy data, are trained with high accuracy. To build a more adequate model of the GTE technical condition, the dynamics of changes in the skewness and kurtosis coefficients are analysed. Studies of the changes in the skewness and kurtosis coefficient values show that the distributions of GTE work parameters have a fuzzy character. Hence, consideration of fuzzy skewness and kurtosis coefficients is expedient. Investigation of the dynamics of changes in the basic characteristics of GTE work parameters leads to the conclusion that Fuzzy Statistical Analysis is necessary at the preliminary identification of the engines' technical condition. Studies of the changes in correlation coefficient values also show their fuzzy character. Therefore, the results of Fuzzy Correlation Analysis are proposed for model selection. When sufficient information is available, a recurrent algorithm for identifying the aviation GTE technical condition (using Hard Computing technology) is proposed, based on measurements of the input and output parameters of the multiple linear and non-linear generalised models in the presence of measurement noise (the new recursive Least Squares Method (LSM)). The developed GTE condition monitoring system provides stage-by-stage estimation of the engine's technical condition. As an application of the given technique, the technical condition of a new operating aviation engine was estimated.

Keywords: aviation gas turbine engine, technical condition, fuzzy logic, neural networks, fuzzy statistics

Downloads: 1569
13627 Coalescing Data Marts

Authors: N. Parimala, P. Pahwa

Abstract:

OLAP uses multidimensional structures to provide access to data for analysis. Traditionally, OLAP operations have focused on retrieving data from a single data mart. An exception is the drill-across operator, which, however, is restricted to retrieving facts on the common dimensions of multiple data marts. Our concern is to define further operations for retrieving data from multiple data marts. To this end, we have defined six operations that coalesce data marts. In doing so, we consider both the common and the non-common dimensions of the data marts.

Keywords: Data warehouse, Dimension, OLAP, Star Schema.

Downloads: 1558
13626 Studying the Causes and Affecting Factors of Motorcycle Accidents: A Case Study on the Road Accidents in Zanjan Province (IRAN) - 2007

Authors: A. Beheshti, S. Salkhordeh, H. Amini

Abstract:

Based on statistics released by the Islamic Republic of Iran Police (IRIP), of the 9555 motorcycle accidents that happened in 2007, 857 riders died and 11219 were injured. If we also consider the deaths and injuries in accidents involving other vehicles that resulted from traffic violations by motorcycle riders, attention to motorcycle accidents is clearly necessary. Therefore, in this study we investigated the traits and issues related to production, application and training, along with the causes of motorcycle accidents, from the four perspectives of road, human, environment and vehicle, based on statistical and geographical analysis of accident sheets prepared by the Iran Road Patrol Department (IRPD). Riders' unfamiliarity with the regulations and techniques of motorcycling, failure to use safety equipment, the inadequacy of roads and junction design for safe motorcycle traffic, and the lack of sufficient control by the responsible organizations are among the major causes of these accidents.

Keywords: Motorcycle, Motorcycle riders, Road accidents, Statistical analysis of accidents.

Downloads: 1581
13625 Robust Regression and its Application in Financial Data Analysis

Authors: Mansoor Momeni, Mahmoud Dehghan Nayeri, Ali Faal Ghayoumi, Hoda Ghorbani

Abstract:

This research aims to describe the application of robust regression and its advantages over the least squares regression method in analyzing financial data. To do this, the relationship between earnings per share, book value of equity per share and share price (the price model), and between earnings per share, annual change in earnings per share and stock return (the return model), is examined using both robust and least squares regressions, and the outcomes are compared. The comparison shows that robust regression allows a better and more realistic analysis because it eliminates or reduces the contribution of outliers and influential data. Therefore, robust regression is recommended for obtaining more precise results in financial data analysis.

Keywords: Financial data analysis, Influential data, Outliers, Robust regression.

Downloads: 1931
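The effect described in the abstract above, where a single outlier distorts a least squares fit while a robust fit discounts it, can be sketched as follows. The abstract does not say which robust estimator the authors used; this example assumes Huber M-estimation fitted by iteratively reweighted least squares, with the conventional tuning constant 1.345 and a MAD-based scale. The data and function names are hypothetical.

```python
from statistics import median


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = slope * x + intercept."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, my - slope * mx


def huber_line(x, y, c=1.345, iterations=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    w = [1.0] * len(x)
    for _ in range(iterations):
        slope, intercept = weighted_line(x, y, w)
        resid = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
        centre = median(resid)
        # Robust residual scale from the median absolute deviation (MAD).
        scale = median([abs(r - centre) for r in resid]) / 0.6745
        if scale == 0:
            break
        # Full weight for small residuals, shrinking weight for large ones.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
    return slope, intercept


# Hypothetical data on the line y = 2x + 1 with one gross outlier.
x = [float(i) for i in range(10)]
y = [2.0 * xi + 1.0 for xi in x]
y[9] = 100.0

ols_slope, ols_intercept = weighted_line(x, y, [1.0] * len(x))
robust_slope, robust_intercept = huber_line(x, y)
print(ols_slope, robust_slope)
```

The ordinary least squares slope is pulled far from 2 by the one outlier, while the reweighted fit stays close to the slope of the remaining points.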
13624 Multidimensional and Data Mining Analysis for Property Investment Risk Analysis

Authors: Nur Atiqah Rochin Demong, Jie Lu, Farookh Khadeer Hussain

Abstract:

Property investment in the real estate industry carries high risk because of high cost and the uncertainty factors that affect the decisions made. The analytic hierarchy process, which has existed for some time, relies on an expert's opinion to measure the uncertainty of the risk factors in risk analysis. Consequently, experts with different levels of experience will form different opinions, leading to conflict among experts in the field. The objective of this paper is to propose a new technique to measure the uncertainty of the risk factors based on a multidimensional data model and data mining techniques as a deterministic approach. The proposed technique consists of a basic framework with four modules: user, technology, end-user access tools and applications. Property investment risk analysis is defined here as a micro-level analysis, because the features of the property are considered in the analysis.

Keywords: Uncertainty factors, data mining, multidimensional data model, risk analysis.

Downloads: 2922
13623 A Software Tool Design for Cerebral Infarction of MR Images

Authors: Kyoung-Jong Park, Woong-Gi Jeon, Hee-Cheol Kim, Dong-Eog Kim, Heung-Kook Choi

Abstract:

A brain MR imaging-based clinical research and analysis system was built, with the aim of supporting large-scale data. We used generally available clinical data to build the large-scale data set. For registration, a region growing algorithm was used to select the lesion ROI, and the Mesh-warp algorithm was implemented for matching. Matching errors were corrected individually. In addition, large volumes of ROI research data can be accumulated using our compression method. In this way, correct decision criteria for the research results were suggested. The experimental groups were defined by age, sex, MR type, patient ID and smoking, all of which can easily be queried. The resulting data were visualized as overlapped images using a color table and analyzed with a statistical package. The system was evaluated on chronic ischemic damage in patients with acute cerebral infarction. The location associated with the neurologic disability index was in the center portion facing the lateral ventricle, where the corona radiata was found. Finally, system reliability was measured by both inter-user and intra-user registration correlation.

Keywords: Software tool design, Cerebral infarction, Brain MR image, Registration

Downloads: 1663
13622 Survey on Arabic Sentiment Analysis in Twitter

Authors: Sarah O. Alhumoud, Mawaheb I. Altuwaijri, Tarfa M. Albuhairi, Wejdan M. Alohaideb

Abstract:

Large-scale data stream analysis has recently become one of the important business and research priorities. Social networks like Twitter and other micro-blogging platforms hold an enormous amount of data that is large in volume, velocity and variety. Extracting valuable information and trends from these data would aid understanding and decision-making. Multiple analysis techniques are deployed for English content. However, Arabic is one of the languages that produces a large amount of data over social networks yet is among the least analyzed. This paper is a survey of the research efforts to analyze Arabic content on Twitter, focusing on the tools and methods used to extract sentiment from that content.

Keywords: Big Data, Social Networks, Sentiment Analysis.

Downloads: 4348
13621 Impact of Financial System’s Development on Economic Development: An Empirical Investigation

Authors: Vilma Deltuvaitė

Abstract:

Comparisons of financial development across countries are central to answering many of the questions on factors leading to economic development. For this reason, this study analyzes the implications of the financial system’s development for a country’s economic development. The aim of the article is to analyze the impact of the financial system’s development on economic development. The following research methods were used: systemic, logical and comparative analysis of scientific literature, analysis of statistical data, and a time series model (Autoregressive Distributed Lag (ARDL) model). The empirical results suggest a positive short- and long-term effect of stock market development on GDP per capita.

Keywords: Banking sector, economic development, financial system’s development, stock market, private bond market.

Downloads: 2124
13620 Statistical Computational of Volatility in Financial Time Series Data

Authors: S. Al Wadi, Mohd Tahir Ismail, Samsul Ariffin Abdul Karim

Abstract:

It is well known that, amid developments in the economic sector and the financial crises occurring around the world, volatility measurement is the most important concept in financial time series. Therefore, in this paper we discuss the volatility of the Amman stock market (Jordan) over a certain period of time. Since the wavelet transform is one of the best-known filtering methods and has grown very quickly in the last decade, we compare it with a traditional technique, the fast Fourier transform, to decide on the best method for analyzing volatility. The comparison is based on some statistical properties, computed using Matlab.

Keywords: Fast Fourier transforms, Haar wavelet transform, Matlab (Wavelet tools), stocks market, Volatility.

Downloads: 2316
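The keywords above mention the Haar wavelet transform. As a rough illustration of how it separates a series into a smooth part and a fluctuation part, here is a one-level orthonormal Haar step in Python (the paper itself uses Matlab). The return values are hypothetical, and treating the detail-coefficient energy as a volatility proxy is an assumption made for this sketch, not a method stated in the abstract.

```python
from math import sqrt


def haar_step(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / sqrt(2) for a, b in pairs]   # local averages (trend)
    detail = [(a - b) / sqrt(2) for a, b in pairs]   # local differences
    return approx, detail


def haar_inverse(approx, detail):
    """Reconstruct the signal from one level of Haar coefficients."""
    out = []
    for s, d in zip(approx, detail):
        out += [(s + d) / sqrt(2), (s - d) / sqrt(2)]
    return out


# Hypothetical daily returns (not data from the Amman stock market).
returns = [0.010, -0.020, 0.015, 0.005, -0.030, 0.025, 0.000, -0.010]
approx, detail = haar_step(returns)
detail_energy = sum(d * d for d in detail)  # assumed proxy for short-term volatility
print(detail_energy)
```

Because the transform is orthonormal, the energy of the original series equals the combined energy of the approximation and detail coefficients, which makes the split between trend and fluctuation easy to interpret.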
13619 Novel NMR-Technology to Assess Food Quality and Safety

Authors: Markus Link, Manfred Spraul, Hartmut Schaefer, Fang Fang, Birk Schuetz

Abstract:

High Resolution NMR Spectroscopy offers unique screening capabilities for food quality and safety by combining non-targeted and targeted screening in one analysis.

The objective is to demonstrate that, owing to its extreme reproducibility, NMR can detect the smallest changes in the concentrations of many components in a mixture; these changes are best monitored by statistical evaluation, while the method also delivers reliable quantification results.

The methodology typically uses a 400 MHz high resolution instrument under full automation after minimal sample preparation.

For example, one fruit juice analysis in a push-button operation takes at most 15 minutes and delivers a multitude of results, which are automatically summarized in a PDF report.

The method has been proven on fruit juices, where previously unknown frauds could be detected. In addition, conventional targeted parameters are obtained in the same analysis. This technology has the advantage that NMR is completely quantitative and concentration calibration only has to be done once for all compounds. Since NMR is so reproducible, it is also transferable between different instruments (with the same field strength) and laboratories. Based on strict SOPs, statistical models developed once can be used on multiple instruments, and strategies for compound identification and quantification are likewise applicable across labs.

Keywords: Automated solution, NMR, non-targeted screening, targeted screening.

Downloads: 2247
13618 Influence of Parameters of Modeling and Data Distribution for Optimal Condition on Locally Weighted Projection Regression Method

Authors: Farhad Asadi, Mohammad Javad Mollakazemi, Aref Ghafouri

Abstract:

Recent research in neural networks and neuroscience on modeling complex time series data and on statistical learning has focused mostly on learning from high-dimensional input spaces and signals. Local linear models are a strong choice for modeling local nonlinearity in data series. Locally weighted projection regression is a flexible and powerful algorithm for nonlinear approximation in high-dimensional signal spaces. In this paper, different learning scenarios for one- and two-dimensional data series with different distributions are investigated in simulation, and noise is added to the data to produce different disordered distributions in the time series and to evaluate the algorithm's local prediction of nonlinearity. The performance of the algorithm is then simulated, and its sensitivity to the data distribution, when the data are widely distributed or when data points are few, is explained, together with the influence of the important local-validity parameter under different data distributions.

Keywords: Local nonlinear estimation, LWPR algorithm, Online training method.

Downloads: 1601
13617 Clustering Categorical Data Using Hierarchies (CLUCDUH)

Authors: Gökhan Silahtaroğlu

Abstract:

Clustering large populations is an important problem when the data contain noise and clusters of different shapes. A good clustering algorithm or approach should be efficient enough to detect clusters sensitively. Besides space complexity, time complexity also gains importance as the data size grows. Using hierarchies, we developed a new algorithm that splits attributes according to their values and chooses the splitting dimension so as to divide the database into roughly equal parts. At each node, we calculate certain descriptive statistical features of the data residing there, and by pruning we generate the natural clusters with a complexity of O(n).

Keywords: Clustering, tree, split, pruning, entropy, gini.

Downloads: 1555
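The keywords above list entropy and gini, and the abstract describes choosing a splitting dimension that divides the data into roughly equal parts. One plausible reading, offered here purely as an assumption and not as the CLUCDUH algorithm itself, is to prefer the attribute whose value distribution is most balanced, that is, the one with the highest impurity. The function names and rows below are hypothetical.

```python
from collections import Counter
from math import log2


def gini(values):
    """Gini impurity of a list of categorical values."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def most_balanced_attribute(rows, attributes):
    """Pick the attribute whose values split the rows most evenly (highest Gini)."""
    return max(attributes, key=lambda a: gini([row[a] for row in rows]))


# Hypothetical categorical records.
rows = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "L"},
]
print(most_balanced_attribute(rows, ["color", "size"]))
```

Here `size` splits the four rows two and two, while `color` splits them three and one, so `size` has the higher impurity and would be chosen under this assumed rule.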
13616 Assessing Basic Computer Applications’ Skills of College-Level Students in Saudi Arabia

Authors: Mohammed A. Gharawi, Majed M. Khoja

Abstract:

This paper reports the findings of a study conducted at the Institute of Public Administration (IPA) in Saudi Arabia. The study applied both qualitative and quantitative approaches to assess the levels of basic computer applications’ skills among students enrolled in the preparatory programs of the institution. Qualitative data were collected from semi-structured interviews with instructors who had previously been assigned to teach introductory information technology courses. Quantitative data were collected through a self-report questionnaire and a written statistical test. Three hundred eighty enrolled students responded to the questionnaire and one hundred forty-two completed the statistical test. The results indicate that most of the students enrolled in the IPA’s preparatory programs lack the skills necessary to deal with computer applications.

Keywords: Assessment, Computer Applications, Computer Literacy, Institute of Public Administration, Saudi Arabia.

Downloads: 2682
13615 Speed Characteristics of Mixed Traffic Flow on Urban Arterials

Authors: Ashish Dhamaniya, Satish Chandra

Abstract:

Speed and traffic volume data are collected on different sections of four-lane and six-lane roads in three metropolitan cities in India. Speed data are analyzed to fit a statistical distribution to individual vehicle speed data and to the speed data of all vehicles. It is noted that the speed data of individual vehicles generally follow a normal distribution, but the speed data of all vehicles combined at a section of urban road may or may not follow a normal distribution, depending on the composition of the traffic stream. A new term, Speed Spread Ratio (SSR), is introduced in this paper: the ratio of the difference between the 85th and 50th percentile speeds to the difference between the 50th and 15th percentile speeds. If SSR is unity, the speed data are truly normally distributed. It is noted that on six-lane urban roads, speed data follow a normal distribution only when SSR is in the range of 0.86 – 1.11. This range of SSR is also validated on four-lane roads.

Keywords: Normal distribution, percentile speed, speed spread ratio, traffic volume.

Downloads: 4245
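The Speed Spread Ratio defined in the abstract above is SSR = (V85 - V50) / (V50 - V15). A minimal sketch of the calculation follows. The percentile interpolation rule and the speed samples are assumptions for illustration, since the abstract does not specify how percentiles were computed; the 0.86 to 1.11 range is the one reported in the abstract.

```python
def percentile(sorted_values, p):
    """p-th percentile by linear interpolation between order statistics."""
    pos = p * (len(sorted_values) - 1) / 100
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def speed_spread_ratio(speeds):
    """SSR = (V85 - V50) / (V50 - V15)."""
    data = sorted(speeds)
    p15, p50, p85 = (percentile(data, p) for p in (15, 50, 85))
    return (p85 - p50) / (p50 - p15)


def within_reported_range(ssr, low=0.86, high=1.11):
    """Check SSR against the range the abstract reports for normally distributed data."""
    return low <= ssr <= high


# Hypothetical spot speeds in km/h: one symmetric sample, one right-skewed sample.
symmetric = list(range(1, 102))
skewed = [20, 22, 24, 26, 28, 30, 40, 55, 70, 90]
print(speed_spread_ratio(symmetric), within_reported_range(speed_spread_ratio(symmetric)))
print(speed_spread_ratio(skewed), within_reported_range(speed_spread_ratio(skewed)))
```

A symmetric sample gives an SSR of one, while a sample with a long tail of fast vehicles pushes the ratio well above the reported range.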
13614 Characterisation and Classification of Natural Transients

Authors: Ernst D. Schmitter

Abstract:

Monitoring lightning electromagnetic pulses (sferics) and other terrestrial as well as extraterrestrial transient radiation signals is of considerable interest for practical and theoretical purposes in astro- and geophysics as well as meteorology. To manage a continuous flow of data, automation of the detection and classification process is important. Features based on a combination of wavelet and statistical methods proved efficient for the analysis and characterisation of transients and as input to a radial basis function network trained to discriminate transients ranging from pulse-like to wave-like.

Keywords: transient signals, statistics, wavelets, neural networks

Downloads: 1449
13613 A Monte Carlo Method to Data Stream Analysis

Authors: Kittisak Kerdprasop, Nittaya Kerdprasop, Pairote Sattayatham

Abstract:

Data stream analysis is the process of computing various summaries and derived values from large amounts of data that are continuously generated at a rapid rate. The nature of a stream does not allow each data element to be revisited. Furthermore, data processing must be fast to produce timely analysis results. These requirements impose constraints on the design of the algorithms, which must balance correctness against timely responses. Several techniques have been proposed over the past few years to address these challenges. These techniques can be categorized as either data-oriented or task-oriented. The data-oriented approach analyzes a subset of the data or a smaller transformed representation, whereas the task-oriented scheme solves the problem directly via approximation techniques. We propose a hybrid approach to tackle the data stream analysis problem. The data stream is both statistically transformed to a smaller size and has its characteristics computationally approximated. We adopt a Monte Carlo method in the approximation step. The data reduction is performed horizontally and vertically through our EMR sampling method. The proposed method is analyzed by a series of experiments. We apply our algorithm to clustering and classification tasks to evaluate the utility of our approach.

Keywords: Data Stream, Monte Carlo, Sampling, Density Estimation.

Downloads: 1416
13612 Pattern Recognition Using Feature Based Die-Map Clustering in the Semiconductor Manufacturing Process
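The constraint that a stream cannot be revisited can be illustrated with reservoir sampling, a well-known one-pass technique that keeps a uniform random sample of fixed size. This is a substitute chosen for illustration only; it is not the authors' EMR sampling method, whose details are not given in the abstract. The function name and the stream are hypothetical.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream seen only once."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # inclusive on both ends
            if j < k:
                sample[j] = item         # replace with probability k / (i + 1)
    return sample


# Hypothetical stream of 10,000 readings, consumed lazily in a single pass.
stream = (x % 100 for x in range(10_000))
sample = reservoir_sample(stream, k=200, seed=42)
estimated_mean = sum(sample) / len(sample)   # Monte Carlo estimate of the stream mean
print(len(sample), estimated_mean)
```

Summaries such as the mean can then be estimated from the sample instead of the full stream, trading some accuracy for bounded memory and a single pass.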

Authors: Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek

Abstract:

As big data analysis becomes important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and analysis of the causes of failure are closely related. The purpose of this study is to analyze patterns that affect the final test results using die-map-based clustering. Much research has been conducted using die data from the semiconductor test process. However, such analysis is limited because the test data are less directly related to the final test results. Therefore, this study proposes a framework for analysis through clustering using more detailed data than existing die data. The study consists of three phases. In the first phase, a die map is created from fail bit data in each sub-area of the die. In the second phase, clustering is performed on the map data. The third phase finds patterns that affect the final test result. Finally, the three proposed phases are applied to actual industrial data, and the experimental results show the potential for field application.

Keywords: Die-Map Clustering, Feature Extraction, Pattern Recognition, Semiconductor Manufacturing Process.

Downloads: 3151
13611 Behavioral Response of Bee Farmers to Climate Change in South East, Nigeria

Authors: Jude A. Mbanasor, Chigozirim N. Onwusiribe

Abstract:

The enigma of climate change is no longer an illusion but a reality. In recent years, the Nigerian climate has changed, as shown by changing patterns of rainfall and sunshine, increasing levels of carbon and nitrous emissions, and deforestation. This study analyzed the behavioural response of beekeepers to variations in the climate and the adaptation techniques developed in response to those variations. Beekeeping is a viable economic activity for the alleviation of poverty, as its products, which include honey, wax, pollen, propolis, royal jelly, venom, queens, and bees and their larvae, are all marketable. The study adopted a multistage sampling technique to select 120 beekeepers from the five states of Southeast Nigeria. Well-structured questionnaires and focus group discussions were used to collect the required data. Statistical tools such as principal component analysis, data envelopment models, graphs, and charts were used for the data analysis. Changing patterns of rainfall and sunshine, together with the increasing rate of deforestation, had a negative effect on the habitat of the bees. The beekeepers have adopted the Kenya Top bar and Langstroth hives, and they establish their hives on fallow farmland close to cultivated communal farms with more flowering crops.

Keywords: Climate, smart, smallholder, farmer, socioeconomic, response.

Downloads 607
13610 A Heuristic Statistical Model for Lifetime Distribution Analysis of Complicated Systems in the Reliability Centered Maintenance

Authors: Mojtaba Mahdavi, Mohamad Mahdavi, Maryam Yazdani

Abstract:

This paper explores a heuristic conceptual model for developing Reliability Centered Maintenance (RCM), especially in preventive strategy. In most real cases, where the complexity of the system demands a high degree of reliability, the model proposes choosing the more appropriate of two reliability functions: one based on the lifetime distribution and another based on the relevant Extreme Value (EV) distribution. A statistical and mathematical approach is used to estimate and verify these two distribution functions. The better of the two, whichever is more reliable, is then chosen. A numerical industrial case study is reviewed to illustrate the concepts of this paper more clearly.

Keywords: Lifetime distribution, Reliability, Estimation, Extreme value, Improving model, Series, Parallel.

Downloads 1479
13609 Dimensionality Reduction in Modal Analysis for Structural Health Monitoring

Authors: Elia Favarelli, Enrico Testi, Andrea Giorgetti

Abstract:

Autonomous structural health monitoring (SHM) of many structures and bridges has become a topic of paramount importance for maintenance purposes and safety reasons. This paper proposes a set of machine learning (ML) tools to perform automatic feature selection and detection of anomalies in a bridge from vibrational data, and compares different feature extraction schemes to increase the accuracy and reduce the amount of data collected. As a case study, the Z-24 bridge is considered because of the extensive database of accelerometric data in both standard and damaged conditions. The proposed framework starts from the first four fundamental frequencies extracted through operational modal analysis (OMA) and clustering, followed by time-domain filtering (tracking). The extracted fundamental frequencies are then fed to a dimensionality reduction block implemented through two different approaches: feature selection (intelligent multiplexer), which tries to estimate the most reliable frequencies based on the evaluation of some statistical features (i.e., entropy, variance, kurtosis), and feature extraction (auto-associative neural network (ANN)), which combines the fundamental frequencies to extract new damage-sensitive features in a low-dimensional feature space. Finally, one-class classification (OCC) algorithms perform anomaly detection; they are trained with standard-condition points and tested with both normal and anomalous ones. In particular, principal component analysis (PCA), kernel principal component analysis (KPCA), and auto-associative neural network (ANN) approaches are presented and their performance is compared. It is also shown that, by evaluating the correct features, the anomaly can be detected with accuracy and an F1 score greater than 95%.

Keywords: Anomaly detection, dimensionality reduction, frequencies selection, modal analysis, neural network, structural health monitoring, vibration measurement.
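
Illustrative note: the one-class setup described in the abstract above (train only on standard-condition points, then flag departures) and the F1 score used to report results can be sketched as follows. The detector here is a simple per-feature z-score threshold, chosen only for brevity; it is not the PCA, KPCA, or ANN approach of the paper, and the names `fit_normal_profile`, `anomaly_score`, and `f1_score` are hypothetical.

```python
import math


def fit_normal_profile(train):
    """Learn per-feature mean and sample standard deviation from normal data.

    Assumes at least two training points and no constant feature.
    """
    n = len(train)
    dims = len(train[0])
    means = [sum(row[d] for row in train) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((row[d] - means[d]) ** 2 for row in train) / (n - 1))
        for d in range(dims)
    ]
    return means, stds


def anomaly_score(point, means, stds):
    """Largest absolute z-score across features; higher means more unusual."""
    return max(abs(x - m) / s for x, m, s in zip(point, means, stds))


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall, with 1 marking an anomaly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A point would be declared anomalous when `anomaly_score` exceeds a chosen threshold (3.0 is a common but arbitrary choice); the threshold trades false alarms against missed detections.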

Downloads 708
13608 A Propagator Method like Algorithm for Estimation of Multiple Real-Valued Sinusoidal Signal Frequencies

Authors: Sambit Prasad Kar, P. Palanisamy

Abstract:

In this paper, a novel method is proposed for estimating the frequencies of multiple one-dimensional real-valued sinusoidal signals in the presence of additive Gaussian noise. A computationally simple frequency estimation method with efficient statistical performance is attractive in many array signal processing applications. The prime focus of this paper is to combine the subspace-based technique with a simple peak search approach. This paper presents a variant of the Propagator Method (PM), in which a collaborative approach of SUMWE and the Propagator Method is applied to estimate multiple real-valued sine wave frequencies. A new data model is proposed in which the dimension of the signal subspace equals the number of frequencies present in the observation. In contrast, in the conventional MUSIC method for estimating the frequencies of real-valued sinusoidal signals, the signal subspace dimension is twice the number of frequencies. The statistical performance of the proposed method is analyzed, and an explicit expression for the asymptotic (large-sample) mean-squared error (MSE), or variance of the estimation error, is derived. The performance of the method is demonstrated, and the theoretical analysis is substantiated through numerical examples. The proposed method can achieve sustained high estimation accuracy and frequency resolution at a lower SNR, which is verified in simulation by comparison with the conventional MUSIC, ESPRIT and Propagator methods.

Keywords: Frequency estimation, peak search, subspace-based method without eigen decomposition, quadratic convex function.
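
Illustrative note: the peak search idea mentioned in the abstract above can be shown with a plain periodogram scanned over a frequency grid. This is only a baseline sketch; it does not implement the Propagator Method or SUMWE, and `periodogram` and `estimate_frequencies` are hypothetical names. Frequencies are in cycles per sample.

```python
import math


def periodogram(x, freqs):
    """Power of a real signal x at each normalized frequency in freqs."""
    n = len(x)
    power = []
    for f in freqs:
        re = sum(x[t] * math.cos(2 * math.pi * f * t) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * f * t) for t in range(n))
        power.append((re * re + im * im) / n)
    return power


def estimate_frequencies(x, num_tones, grid_size=500):
    """Return the num_tones grid frequencies with the strongest local peaks."""
    # Grid over (0, 0.5), excluding the end points.
    freqs = [0.5 * k / grid_size for k in range(1, grid_size)]
    p = periodogram(x, freqs)
    # A local peak is higher than its left neighbor and not lower than its right.
    peaks = [
        i for i in range(1, len(p) - 1) if p[i] > p[i - 1] and p[i] >= p[i + 1]
    ]
    peaks.sort(key=lambda i: p[i], reverse=True)
    return sorted(freqs[i] for i in peaks[:num_tones])
```

The resolution of this baseline is limited to roughly one over the number of samples, which is the kind of limit that subspace-based methods such as those named in the abstract aim to improve on.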

Downloads 1731
13607 Review of the Road Crash Data Availability in Iraq

Authors: Abeer K. Jameel, Harry Evdorides

Abstract:

Iraq is a middle-income country where road safety is considered one of the leading causes of death. To control road risk, the General Statistical Organization of the Iraqi Ministry of Planning has started to organise a system for collecting traffic accident data, with details of causes and severity. These data are published as an annual report. In this paper, a review of the available crash data in Iraq is presented. The available data represent accident rates at an aggregated level, classified by accident type, road user details, crash severity, vehicle type, causes, and number of casualties. The review is organised according to the types of models used in road safety studies and research, and according to the road safety data required in road construction tasks. The available data are also compared with the road safety dataset published in the United Kingdom, as an example of a developed country. It is concluded that the data in Iraq are suitable for descriptive and exploratory models, aggregated-level comparison analysis, and evaluating and monitoring the progress of overall traffic safety performance. However, important traffic safety studies require disaggregated data and details of the factors affecting the likelihood of traffic crashes. Some studies require spatial geographic details, such as the location of accidents, which are essential for ranking roads by their level of safety and for identifying the most dangerous roads in Iraq, which require a tactical plan to control this issue. Global road safety agencies interested in solving this problem in low- and middle-income countries have designed road safety assessment methodologies that are based on road attribute data only. Therefore, this research recommends using one of these methodologies.

Keywords: Data availability, Iraq, road safety.

Downloads 931
13606 Injury Prediction for Soccer Players Using Machine Learning

Authors: Amiel Satvedi, Richard Pyne

Abstract:

Injuries in professional sports occur on a regular basis. Some may be minor, while others can have a huge impact on a player’s career and earning potential. In soccer, there is a high risk of players picking up injuries during game time. This research seeks to help soccer players reduce the risk of getting injured by predicting the likelihood of injury in the near future and then providing recommendations for intervention. The injury prediction tool uses a soccer player’s number of minutes played on the field, number of appearances, distance covered, and performance data for the current and previous seasons as variables to conduct statistical analysis and produce injury predictions using a machine learning linear regression model.

Keywords: Injury predictor, soccer injury prevention, machine learning in soccer, big data in soccer.
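
Illustrative note: a linear regression over workload variables, as described in the abstract above, can be sketched with ordinary least squares solved through the normal equations. The feature choice, scaling, and the names `fit_linear_regression` and `predict` are assumptions for illustration; the paper's actual model, data, and preprocessing are not given here.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations.

    rows: list of feature tuples (for example, scaled minutes played and
    appearances); targets: one numeric outcome per row. Returns
    [intercept, w1, w2, ...]. Assumes the features are not collinear.
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Build A = X^T X and b = X^T y.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    b = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * k
    for i in reversed(range(k)):
        s = sum(A[i][j] * w[j] for j in range(i + 1, k))
        w[i] = (b[i] - s) / A[i][i]
    return w


def predict(w, row):
    """Predicted outcome for one feature tuple."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))
```

Because a linear model's output is unbounded, treating it as an injury likelihood would require clipping or a different link function; that design choice is outside what the abstract specifies.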

Downloads 1747