Search results for: statistical data analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 43115

42965 Application of Artificial Neural Network Technique for Diagnosing Asthma

Authors: Azadeh Bashiri

Abstract:

Introduction: Lack of proper diagnosis and inadequate treatment of asthma lead to physical and financial complications. This study aimed to use data mining techniques to create a neural-network-based intelligent system for the diagnosis of asthma. Methods: The study population comprised patients who had visited one of the lung clinics in Tehran. Data were analyzed using the SPSS statistical tool, and Pearson's chi-square coefficient was the basis of decision making for data ranking. The neural network was trained using the back-propagation learning technique. Results: Based on the SPSS analysis used to select the top factors, 13 effective factors were selected. The data were mixed in various forms across different runs, so different models were built for training and testing the networks, and in all modes the network correctly predicted 100% of cases. Conclusion: Using data mining methods before designing the system structure, with the aim of reducing the data dimension and choosing the optimal data, leads to a more accurate system. Therefore, given the nature of medical data, data mining approaches are necessary.
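The ranking step described above can be illustrated with a minimal sketch, assuming each candidate factor is summarized as a contingency table of counts against the diagnosis. The factor names and counts below are hypothetical and are not taken from the study, which used SPSS rather than this code.

```python
def chi_square_statistic(table):
    """Pearson's chi-square statistic for a contingency table given as a list of rows.

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of factor and diagnosis.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def rank_factors(tables):
    """Return factor names sorted by descending chi-square statistic."""
    return sorted(tables, key=lambda name: chi_square_statistic(tables[name]), reverse=True)


# Hypothetical counts: rows = factor present/absent, columns = asthma yes/no.
example_tables = {
    "wheezing": [[30, 10], [10, 30]],
    "pet_ownership": [[21, 19], [19, 21]],
}
ranking = rank_factors(example_tables)
```

Factors with larger statistics are more strongly associated with the diagnosis, so keeping the top-ranked ones is one way to reduce the input dimension before training a network.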

Keywords: asthma, data mining, Artificial Neural Network, intelligent system

Procedia PDF Downloads 279
42964 AI-Driven Solutions for Optimizing Master Data Management

Authors: Srinivas Vangari

Abstract:

In the era of big data, ensuring the accuracy, consistency, and reliability of critical data assets is crucial for data-driven enterprises. Master Data Management (MDM) plays a crucial role in this endeavor. This paper investigates the role of Artificial Intelligence (AI) in enhancing MDM, focusing on how AI-driven solutions can automate and optimize various stages of the master data lifecycle. By integrating AI (Quantitative and Qualitative Analysis) into processes such as data creation, maintenance, enrichment, and usage, organizations can achieve significant improvements in data quality and operational efficiency. Quantitative analysis is employed to measure the impact of AI on key metrics, including data accuracy, processing speed, and error reduction. For instance, our study demonstrates an 18% improvement in data accuracy and a 75% reduction in duplicate records across multiple systems post-AI implementation. Furthermore, AI’s predictive maintenance capabilities reduced data obsolescence by 22%, as indicated by statistical analyses of data usage patterns over a 12-month period. Complementing this, a qualitative analysis delves into the specific AI-driven strategies that enhance MDM practices, such as automating data entry and validation, which resulted in a 28% decrease in manual errors. Insights from case studies highlight how AI-driven data cleansing processes reduced inconsistencies by 25% and how AI-powered enrichment strategies improved data relevance by 24%, thus boosting decision-making accuracy. The findings demonstrate that AI significantly enhances data quality and integrity, leading to improved enterprise performance through cost reduction, increased compliance, and more accurate, real-time decision-making. These insights underscore the value of AI as a critical tool in modern data management strategies, offering a competitive edge to organizations that leverage its capabilities.

Keywords: artificial intelligence, master data management, data governance, data quality

Procedia PDF Downloads 23
42963 Statistical Analysis of Rainfall Change over the Blue Nile Basin

Authors: Hany Mustafa, Mahmoud Roushdi, Khaled Kheireldin

Abstract:

Rainfall variability is an important feature of semi-arid climates. Climate change is very likely to increase the frequency, magnitude, and variability of extreme weather events such as droughts, floods, and storms. The Blue Nile Basin is facing extreme climate change-related events such as floods and droughts, and their possible impacts on ecosystems, livelihoods, agriculture, livestock, and biodiversity are expected. Rainfall variability is a threat to food production in the Blue Nile Basin countries. This study investigates the long-term variations and trends of seasonal and annual precipitation over the Blue Nile Basin for the 102-year period 1901-2002. Six statistical trend analyses of precipitation were performed with the nonparametric Mann-Kendall test and Sen's slope estimator. In addition, four absolute statistical homogeneity tests (the Standard Normal Homogeneity Test, the Buishand range test, the Pettitt test, and the Von Neumann ratio test) were applied to test the homogeneity of the rainfall data using XLSTAT software; results with a p-value less than alpha = 0.05 were considered significant. The percentages of significant trends obtained for each parameter in the different seasons are presented. The study recommends that adaptation strategies be streamlined into relevant policies, enhancing local farmers’ adaptive capacity to face future climate change effects.
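For readers unfamiliar with the two trend statistics named above, here is a minimal sketch of the Mann-Kendall test and Sen's slope estimator. It assumes an evenly spaced series without tied values (no tie correction is applied) and uses the normal approximation for the p-value. The rainfall series is hypothetical, and the study itself used XLSTAT.

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall(values):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Assumption: no tied values, so the variance has no tie correction.
    """
    n = len(values)
    # S = number of increasing pairs minus number of decreasing pairs (time order).
    s = sum((later > earlier) - (later < earlier)
            for earlier, later in combinations(values, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value


def sens_slope(values):
    """Median of all pairwise slopes (Sen's slope) for unit time steps."""
    slopes = [(values[j] - values[i]) / (j - i)
              for i, j in combinations(range(len(values)), 2)]
    return median(slopes)


# Hypothetical annual rainfall totals (mm), for illustration only.
rainfall = [812.0, 790.5, 805.2, 770.1, 765.8, 748.3, 752.9, 730.4]
s_stat, z_stat, p_val = mann_kendall(rainfall)
trend_per_year = sens_slope(rainfall)
```

A trend would be called significant at alpha = 0.05 when `p_val` is below 0.05, and the sign of `trend_per_year` gives its direction.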

Keywords: Blue Nile basin, climate change, Mann-Kendall test, trend analysis

Procedia PDF Downloads 555
42962 Statistical Analysis to Select Evacuation Route

Authors: Zaky Musyarof, Dwi Yono Sutarto, Dwima Rindy Atika, R. B. Fajriya Hakim

Abstract:

Each country should be responsible for the safety of its people, especially those living in disaster-prone areas. One such service is providing evacuation routes for them. So far, however, the selection of evacuation routes does not seem to be well organized: when a disaster happens, many people accumulate along the steps of the evacuation route. That condition is dangerous because it hampers the evacuation process. Using several methods of statistical analysis, the authors suggest how to prepare an evacuation route that is organized and based on people’s habits. Those methods are association rules, sequential pattern mining, hierarchical cluster analysis, and fuzzy logic.

Keywords: association rules, sequential pattern mining, cluster analysis, fuzzy logic, evacuation route

Procedia PDF Downloads 509
42961 Effects of Video Games and Online Chat on Mathematics Performance in High School: An Approach of Multivariate Data Analysis

Authors: Lina Wu, Wenyi Lu, Ye Li

Abstract:

Taking boys who are heavy video game players and girls who are avid online chatters as emblematic of current adolescent culture, this data analysis project verifies the displacement effect on deteriorating mathematics performance. To evaluate correlation or regression coefficients between playing video games or chatting online and mathematics performance, compared with other factors, we use multivariate analysis techniques and take gender differences into account. We find that the most important reason for the negative sign of the displacement effect on mathematics performance is students’ poor academic background. The statistical analysis methods in this project could be applied to study internet users’ academic performance from high school through college.

Keywords: correlation coefficients, displacement effect, multivariate analysis technique, regression coefficients

Procedia PDF Downloads 369
42960 Metrology-Inspired Methods to Assess the Biases of Artificial Intelligence Systems

Authors: Belkacem Laimouche

Abstract:

With the field of artificial intelligence (AI) experiencing exponential growth, fueled by technological advancements that pave the way for increasingly innovative and promising applications, there is an escalating need to develop rigorous methods for assessing their performance in pursuit of transparency and equity. This article proposes a metrology-inspired statistical framework for evaluating bias and explainability in AI systems. Drawing from the principles of metrology, we propose a pioneering approach, using a concrete example, to evaluate the accuracy and precision of AI models, as well as to quantify the sources of measurement uncertainty that can lead to bias in their predictions. Furthermore, we explore a statistical approach for evaluating the explainability of AI systems based on their ability to provide interpretable and transparent explanations of their predictions.

Keywords: artificial intelligence, metrology, measurement uncertainty, prediction error, bias, machine learning algorithms, probabilistic models, interlaboratory comparison, data analysis, data reliability, measurement of bias impact on predictions, improvement of model accuracy and reliability

Procedia PDF Downloads 108
42959 Evaluation of the Efficiency of French Language Educational Software for Learners in Semnan Province, Iran

Authors: Alireza Hashemi

Abstract:

In recent decades, language teaching methodology has undergone significant changes due to the advent of computers and the growth of educational software. French language education has also benefited from these developments, and various software has been produced to facilitate the learning of this language. However, the question arises whether these software programs meet the educational needs of Iranian learners, particularly in Semnan Province. The aim of this study is to evaluate the efficiency and effectiveness of French language educational software for learners in Semnan Province, considering educational, cultural, and technical criteria. In this study, content analysis and performance evaluation methods were used to examine the educational software ‘Français Facile’. This software was evaluated based on criteria such as teaching methods, cultural compatibility, and technical features. To collect data, standardized questionnaires and semi-structured interviews with learners in Semnan Province were used. Additionally, the SPSS statistical software was employed for quantitative data analysis, and the thematic analysis method was used for qualitative data. The results indicated that the ‘Français Facile’ software has strengths such as providing diverse educational content and an interactive learning environment. However, some weaknesses include the lack of alignment of educational content with the learning culture of learners in Semnan Province and technical issues in software execution. Statistical data showed that 65% of learners were satisfied with the educational content, but 55% reported issues related to cultural alignment with their needs. This study indicates that to enhance the efficiency of French language educational software, there is a need to localize educational content and improve technical infrastructure. 
Producing locally adapted educational software can improve the quality of language learning and increase the motivation of learners in Semnan Province. This research emphasizes the importance of understanding the cultural and educational needs of learners in the development of educational software and recommends that developers of educational software pay special attention to these aspects.

Keywords: educational software, French language, Iran, learners in Semnan province

Procedia PDF Downloads 47
42958 Credit Card Fraud Detection with Ensemble Model: A Meta-Heuristic Approach

Authors: Gong Zhilin, Jing Yang, Jian Yin

Abstract:

The purpose of this paper is to develop a novel system for credit card fraud detection based on sequential modeling of data using hybrid deep learning models. The proposed model comprises five major phases: pre-processing, imbalanced-data handling, feature extraction, optimal feature selection, and fraud detection with an ensemble classifier. The collected raw data (input) are pre-processed to enhance data quality by alleviating missing data, noisy data, and null values. The pre-processed data are class-imbalanced in nature, and therefore they are handled with the K-means clustering-based SMOTE model. From the class-balanced data, the most relevant features are extracted, such as improved Principal Component Analysis (PCA) features, statistical features (mean, median, standard deviation), and higher-order statistical features (skewness and kurtosis). Among the extracted features, the optimal features are selected with the Self-improved Arithmetic Optimization Algorithm (SI-AOA). This SI-AOA model is a conceptual improvement of the standard Arithmetic Optimization Algorithm. The ensemble classifier consists of deep learning models: Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and an optimized Quantum Deep Neural Network (QDNN). The LSTM and CNN are trained with the extracted optimal features. The outputs of the LSTM and CNN are fed as input to the optimized QDNN, which provides the final detection outcome. Since the QDNN is the ultimate detector, its weight function is fine-tuned with the SI-AOA.

Keywords: credit card, data mining, fraud detection, money transactions

Procedia PDF Downloads 136
42957 Characteristic Function in Estimation of Probability Distribution Moments

Authors: Vladimir S. Timofeev

Abstract:

This article considers the problem of estimating distributional moments. A new approach to moment estimation based on the characteristic function is proposed. Using statistical simulation, the author shows that the new approach has some robustness properties. Numerical differentiation is used to calculate the derivatives of the characteristic function. The results confirm that the author’s idea works with a certain efficiency and can be recommended for statistical applications.
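The general idea can be sketched from the standard identities E[X] = -i φ'(0) and E[X²] = -φ''(0), where φ is the characteristic function. The code below is only an illustration under assumptions: it uses the empirical characteristic function and central finite differences with a hypothetical step size `h`, and it is not the author's estimator.

```python
import cmath


def empirical_cf(sample, t):
    """Empirical characteristic function: mean of exp(i * t * x) over the sample."""
    return sum(cmath.exp(1j * t * x) for x in sample) / len(sample)


def cf_moments(sample, h=1e-3):
    """Estimate E[X] and E[X^2] by numerically differentiating the empirical CF at 0.

    h is an illustrative step size for the central differences (an assumption).
    """
    phi_plus = empirical_cf(sample, h)
    phi_minus = empirical_cf(sample, -h)
    # First derivative at 0 by central difference; E[X] = -i * phi'(0).
    d1 = (phi_plus - phi_minus) / (2 * h)
    first_moment = (-1j * d1).real
    # Second derivative at 0; phi(0) = 1 for any sample; E[X^2] = -phi''(0).
    d2 = (phi_plus - 2.0 + phi_minus) / (h * h)
    second_moment = (-d2).real
    return first_moment, second_moment


m1, m2 = cf_moments([1.0, 2.0, 3.0])
```

With a finite step, each observation enters the first-moment estimate through sin(h·x)/h, which is bounded, so the choice of `h` controls how strongly a single extreme value can pull the estimate in this sketch.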

Keywords: characteristic function, distributional moments, robustness, outlier, statistical estimation problem, statistical simulation

Procedia PDF Downloads 509
42956 Research on Transmission Parameters Determination Method Based on Dynamic Characteristic Analysis

Authors: Baoshan Huang, Fanbiao Bao, Bing Li, Lianghua Zeng, Yi Zheng

Abstract:

A parameter control strategy based on statistical characteristics can be used to analyze the choice of transmission ratios for an automobile transmission. The number and spacing of the gears can be determined according to the differences between the transmission gears. The distribution of transmission ratios needs to satisfy a certain distribution law. The shift control strategy of the vehicle is analyzed according to the statistical characteristics of the driving parameters. The above analysis leads to a CVT shift schedule adjustment algorithm based on statistical characteristic parameters: by adjusting the size of the relevant parameter according to a certain algorithm, the location of the target point between the best efficiency curve and the dynamic curve can be adjusted, altering the vehicle characteristics. Based on the dynamic characteristics and the practical application of the vehicle, this paper presents a scheme for setting the transmission ratio.

Keywords: vehicle dynamics, transmission ratio, transmission parameters, statistical characteristics

Procedia PDF Downloads 409
42955 Series-Parallel Systems Reliability Optimization Using Genetic Algorithm and Statistical Analysis

Authors: Essa Abrahim Abdulgader Saleem, Thien-My Dao

Abstract:

The main objective of this paper is to optimize series-parallel system reliability using a Genetic Algorithm (GA) and statistical analysis, considering system reliability constraints that involve the redundancy levels of selected components, total cost, and total weight. To perform this work, first, the mathematical model that maximizes system reliability subject to maximum system cost and maximum system weight constraints is presented; second, a statistical analysis is used to optimize the GA parameters; and third, the GA is used to optimize series-parallel system reliability. The objective is to determine the strategy for choosing the redundancy level of each subsystem so as to maximize the overall system reliability subject to total cost and total weight constraints. Finally, the reliability optimization results for the series-parallel system case study are shown, and comparisons with previous results are presented to demonstrate the performance of our GA.
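The optimization problem described above can be made concrete with a small sketch. It assumes the common textbook model in which subsystem i holds n_i identical, independent components of reliability r_i in parallel and the subsystems are connected in series. For brevity it replaces the genetic algorithm with exhaustive search over a small range of redundancy levels, and all numbers are hypothetical.

```python
from itertools import product
from math import prod


def system_reliability(component_reliability, redundancy):
    """Series system of parallel subsystems: product of 1 - (1 - r_i) ** n_i."""
    return prod(1.0 - (1.0 - r) ** n
                for r, n in zip(component_reliability, redundancy))


def best_redundancy(component_reliability, cost, weight,
                    max_cost, max_weight, max_units=4):
    """Exhaustive search (a stand-in for the GA) over redundancy levels 1..max_units."""
    best = None
    for levels in product(range(1, max_units + 1), repeat=len(component_reliability)):
        total_cost = sum(c * n for c, n in zip(cost, levels))
        total_weight = sum(w * n for w, n in zip(weight, levels))
        if total_cost > max_cost or total_weight > max_weight:
            continue  # violates a constraint
        reliability = system_reliability(component_reliability, levels)
        if best is None or reliability > best[1]:
            best = (levels, reliability)
    return best


# Hypothetical two-subsystem example with unit costs and weights.
levels, reliability = best_redundancy([0.9, 0.8], cost=[1, 1], weight=[1, 1],
                                      max_cost=3, max_weight=3)
```

Exhaustive search scales exponentially with the number of subsystems, which is one reason heuristic methods such as a GA are attractive for larger instances.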

Keywords: reliability, optimization, meta-heuristic, genetic algorithm, redundancy

Procedia PDF Downloads 341
42954 Developing Structured Sizing Systems for Manufacturing Ready-Made Garments of Indian Females Using Decision Tree-Based Data Mining

Authors: Hina Kausher, Sangita Srivastava

Abstract:

In India, there is a lack of a standard, systematic sizing approach for producing ready-made garments. Garment manufacturing companies use their own size tables, created by modifying international sizing charts for ready-made garments. The purpose of this study is to tabulate anthropometric data that cover the variety of figure proportions in both height and girth. A total of 3,000 data records were collected through an anthropometric survey of females aged 16 to 80 years from several states of India to produce a sizing system suitable for clothing manufacture and retailing. These data are used for the statistical analysis of body measurements and the formulation of sizing systems and body measurement tables. Factor analysis is used to filter the control body dimensions from a large number of variables. Decision tree-based data mining is used to cluster the data. A standard, structured sizing system can facilitate pattern grading and garment production. Moreover, it can exceed buying ratios and upgrade size allocations to retail segments.

Keywords: anthropometric data, data mining, decision tree, garments manufacturing, sizing systems, ready-made garments

Procedia PDF Downloads 137
42953 A Crowdsourced Homeless Data Collection System and Its Econometric Analysis: Strengthening Inclusive Public Administration Policies

Authors: Praniil Nagaraj

Abstract:

This paper proposes a method to collect homeless data using crowdsourcing and presents an approach to analyze the data, demonstrating its potential to strengthen existing and future policies aimed at promoting socio-economic equilibrium. This paper's contributions can be categorized into three main areas. Firstly, a unique method for collecting homeless data is introduced, utilizing a user-friendly smartphone app (currently available for Android). The app enables the general public to quickly record information about homeless individuals, including the number of people and details about their living conditions. The collected data, including date, time, and location, is anonymized and securely transmitted to the cloud. It is anticipated that an increasing number of users motivated to contribute to society will adopt the app, thus expanding the data collection efforts. Duplicate data is addressed through simple classification methods, and historical data is utilized to fill in missing information. The second contribution of this paper is the description of data analysis techniques applied to the collected data. By combining this new data with existing information, statistical regression analysis is employed to gain insights into various aspects, such as distinguishing between unsheltered and sheltered homeless populations, as well as examining their correlation with factors like unemployment rates, housing affordability, and labor demand. Initial data is collected in San Francisco, while pre-existing information is drawn from three cities: San Francisco, New York City, and Washington D.C., facilitating the conduction of simulations. The third contribution focuses on demonstrating the practical implications of the data processing results. The challenges faced by key stakeholders, including charitable organizations and local city governments, are taken into consideration. Two case studies are presented as examples. 
The first case study explores improving the efficiency of food and necessities distribution, as well as medical assistance, driven by charitable organizations. The second case study examines the correlation between micro-geographic budget expenditure by local city governments and homeless information to justify budget allocation and expenditures. The ultimate objective of this endeavor is to enable the continuous enhancement of the quality of life for the underprivileged. It is hoped that through increased crowdsourcing of data from the public, the Generosity Curve and the Need Curve will intersect, leading to a better world for all.

Keywords: crowdsourcing, homelessness, socio-economic policies, statistical analysis

Procedia PDF Downloads 54
42952 Predicting National Football League (NFL) Match with Score-Based System

Authors: Marcho Setiawan Handok, Samuel S. Lemma, Abdoulaye Fofana, Naseef Mansoor

Abstract:

This paper proposes a method to predict the outcome of National Football League matches using data from 2019 to 2022 and compares it with other popular models. The model uses open-source statistical data for each team, such as passing yards, rushing yards, fumbles lost, and scoring. Each statistic has an offensive and a defensive component. For instance, a data set of anticipated values for a specific matchup is created by comparing the offensive passing yards gained by one team with the defensive passing yards allowed by the opposition. We evaluated the model’s performance by comparing its results with those of established prediction algorithms. This research uses a neural network to predict the score of a National Football League match and then predict the winner of the game.

Keywords: game prediction, NFL, football, artificial neural network

Procedia PDF Downloads 89
42951 Effects of Process Parameter Variation on the Surface Roughness of Rapid Prototyped Samples Using Design of Experiments

Authors: R. Noorani, K. Peerless, J. Mandrell, A. Lopez, R. Dalberto, M. Alzebaq

Abstract:

Rapid prototyping (RP) is an additive manufacturing technology used in industry that works by systematically depositing layers of working material to construct larger, computer-modeled parts. A key challenge associated with this technology is that RP parts often feature undesirable levels of surface roughness for certain applications. To combat this phenomenon, an experimental technique called Design of Experiments (DOE) can be employed during the growth procedure to statistically analyze which RP growth parameters are most influential to part surface roughness. Utilizing DOE to identify such factors is important because it is a technique that can be used to optimize a manufacturing process, which saves time, money, and increases product quality. In this study, a four-factor/two level DOE experiment was performed to investigate the effect of temperature, layer thickness, infill percentage, and infill speed on the surface roughness of RP prototypes. Samples were grown using the sixteen different possible growth combinations associated with a four-factor/two level study, and then the surface roughness data was gathered for each set of factors. After applying DOE statistical analysis to these data, it was determined that layer thickness played the most significant role in the prototype surface roughness.
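As an illustration of the four-factor, two-level design described above, the sketch below enumerates the sixteen runs in coded units (-1 = low, +1 = high) and computes each factor's main effect as the difference between the mean response at the high and low levels. The roughness values are synthetic and only demonstrate the arithmetic; they are not the study's measurements.

```python
from itertools import product
from statistics import mean

FACTORS = ("temperature", "layer_thickness", "infill_percentage", "infill_speed")


def full_factorial(n_factors):
    """All 2**n_factors runs of a two-level design in coded units (-1, +1)."""
    return list(product((-1, 1), repeat=n_factors))


def main_effects(design, responses):
    """Main effect per factor: mean response at +1 minus mean response at -1."""
    effects = {}
    for k, name in enumerate(FACTORS):
        high = [y for run, y in zip(design, responses) if run[k] == 1]
        low = [y for run, y in zip(design, responses) if run[k] == -1]
        effects[name] = mean(high) - mean(low)
    return effects


design = full_factorial(len(FACTORS))
# Synthetic roughness: dominated by layer thickness, slightly affected by temperature.
roughness = [10.0 + 0.5 * run[0] + 3.0 * run[1] for run in design]
effects = main_effects(design, roughness)
most_influential = max(effects, key=lambda name: abs(effects[name]))
```

A full DOE analysis would also estimate interaction effects and test significance, which this sketch omits.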

Keywords: rapid prototyping, surface roughness, design of experiments, statistical analysis, factors and levels

Procedia PDF Downloads 263
42950 Urbanization and Income Inequality in Thailand

Authors: Acumsiri Tantikarnpanit

Abstract:

This paper aims to examine the relationship between urbanization and income inequality in Thailand during the period 2002–2020, using a panel of data for 76 provinces collected from Thailand’s National Statistical Office (Labor Force Survey: LFS), as well as geospatial data from the U.S. Air Force Defense Meteorological Satellite Program (DMSP) and the Visible Infrared Imaging Radiometer Suite Day/Night band (VIIRS-DNB) satellite for nineteen selected years. This paper employs two different definitions to identify urban areas: 1) Urban areas defined by Thailand's National Statistical Office (Labor Force Survey: LFS), and 2) Urban areas estimated using nighttime light data from the DMSP and VIIRS-DNB satellite. The second method includes two sub-categories: 2.1) Determining urban areas by calculating nighttime light density corresponding to a population density of 300 people per square kilometer, and 2.2) Calculating urban areas based on nighttime light density corresponding to a population density of 1,500 people per square kilometer. The empirical analysis based on Ordinary Least Squares (OLS), fixed effects, and random effects models reveals a consistent U-shaped relationship between income inequality and urbanization. The findings from the econometric analysis demonstrate that urbanization or population density has a significant and negative impact on income inequality. Moreover, the square of urbanization shows a statistically significant positive impact on income inequality. Additionally, there is a negative association between logarithmically transformed income and income inequality. This paper also proposes the inclusion of satellite imagery, geospatial data, and spatial econometric techniques in future studies to conduct quantitative analysis of spatial relationships.
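The sign pattern reported above is what produces a U shape. A generic quadratic specification, written here with assumed notation (the symbols and the exact set of controls are not taken from the paper), makes the reasoning explicit:

```latex
% Assumed notation: I = income inequality, U = urbanization, Y = income,
% i = province, t = year, \mu_i = province effect (fixed or random).
\[
  I_{it} = \beta_0 + \beta_1 U_{it} + \beta_2 U_{it}^{2}
           + \gamma \ln Y_{it} + \mu_i + \varepsilon_{it}
\]
% Reported signs: \beta_1 < 0, \beta_2 > 0, \gamma < 0.
\[
  \frac{\partial I_{it}}{\partial U_{it}} = \beta_1 + 2\beta_2 U_{it} = 0
  \quad\Longrightarrow\quad
  U^{*} = -\frac{\beta_1}{2\beta_2} > 0
\]
```

With a negative linear coefficient and a positive squared coefficient, inequality falls as urbanization rises up to the turning point U* and rises beyond it.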

Keywords: income inequality, nighttime light, population density, Thailand, urbanization

Procedia PDF Downloads 81
42949 Statistical Description of Counterpoise Effective Length Based on Regressive Formulas

Authors: Petar Sarajcev, Josip Vasilj, Damir Jakus

Abstract:

This paper presents a novel statistical description of the counterpoise effective length due to lightning surges, where the (impulse) effective length had been obtained by means of regressive formulas applied to the transient simulation results. The effective length is described in terms of a statistical distribution function, from which median, mean, variance, and other parameters of interest could be readily obtained. The influence of lightning current amplitude, lightning front duration, and soil resistivity on the effective length has been accounted for, assuming statistical nature of these parameters. A method for determining the optimal counterpoise length, in terms of the statistical impulse effective length, is also presented. It is based on estimating the number of dangerous events associated with lightning strikes. Proposed statistical description and the associated method provide valuable information which could aid the design engineer in optimising physical lengths of counterpoises in different grounding arrangements and soil resistivity situations.

Keywords: counterpoise, grounding conductor, effective length, lightning, Monte Carlo method, statistical distribution

Procedia PDF Downloads 431
42948 Empirical and Indian Automotive Equity Portfolio Decision Support

Authors: P. Sankar, P. James Daniel Paul, Siddhant Sahu

Abstract:

A brief review of the empirical studies on stock market decision support methodology indicates that they are at the threshold of validating the accuracy of traditional models as well as fuzzy, artificial neural network, and decision tree models. Many researchers have attempted to compare these models using various data sets worldwide. However, the research community has yet to reach conclusive confidence in the models that have emerged. This paper uses automotive sector stock prices from the National Stock Exchange (NSE), India, and analyzes them for intra-sectoral support for stock market decisions. The study identifies the significant variables and their lags that affect stock prices using OLS analysis and decision tree classifiers.

Keywords: Indian automotive sector, stock market decisions, equity portfolio analysis, decision tree classifiers, statistical data analysis

Procedia PDF Downloads 490
42947 A Brief Study about Nonparametric Adherence Tests

Authors: Vinicius R. Domingues, Luan C. S. M. Ozelim

Abstract:

Statistical study has become indispensable in various fields of knowledge. Geotechnics is no different: the study of probabilistic and statistical methods has gained importance because of its use in characterizing the uncertainties inherent in soil properties. One situation engineers constantly face is the definition of a probability distribution that adequately represents the sampled data. To be able to discard bad distributions, goodness-of-fit tests are necessary. In this paper, three nonparametric goodness-of-fit tests are applied to a computationally generated data set to test its fit to a series of known distributions. It is shown that the use of the normal distribution does not always provide satisfactory results regarding the physical and behavioral representation of the modeled parameters.
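Of the three tests listed in the keywords, the Kolmogorov-Smirnov statistic is the simplest to sketch: it is the largest gap between the empirical distribution function and a candidate CDF. The example below builds a deterministic, exponential-shaped sample (an assumption for illustration, not the paper's data set) and compares an exponential candidate with a normal one.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Check the gap just after and just before each step of the empirical CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def exponential_cdf(x, rate=1.0):
    """CDF of the exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


# Deterministic sample: mid-point quantiles of a unit-rate exponential distribution.
n = 50
sample = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]

normal_fit = NormalDist(mean(sample), stdev(sample))
d_exponential = ks_statistic(sample, exponential_cdf)
d_normal = ks_statistic(sample, normal_fit.cdf)
```

A smaller D indicates a closer fit, so here the normal candidate fits the skewed sample worse than the exponential one. Note that critical values differ when a candidate's parameters are estimated from the same sample, as is done for the normal fit above.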

Keywords: Kolmogorov-Smirnov test, Anderson-Darling test, Cramer-Von-Mises test, nonparametric adherence tests

Procedia PDF Downloads 449
42946 The Approach of Male and Female Spectators about the Presence of Female Spectators in Sport Stadiums of Iran

Authors: Mohammad Reza Boroumand Devlagh, Seyed Mohammad Hosein Razavi, Fatemeh Ahmadi, Azam Fazli Darzi

Abstract:

The presence of women in Iranian stadiums has long been considered and debated by governmental experts and authorities; however, no conclusion has yet been reached. The present study therefore investigates the attitudes of male and female spectators toward the presence of female spectators in Iranian stadiums. The statistical population of the study includes all male and female spectators who have not experienced watching men's championship matches live in stadiums. A sample of 224 subjects was selected from the statistical population through stratified random sampling. For data collection, a researcher-made questionnaire was used, whose validity was confirmed by university professors and whose reliability was confirmed through a preliminary study (r = 0.81). Data analysis was done using descriptive and inferential statistics at P < 0.05. The results of the study showed that both male and female spectators significantly agreed with the presence of women in stadiums and that there is no significant difference between male and female attitudes concerning the presence of female spectators in the sport stadiums of Iran (sig = 0.867).

Keywords: male, female spectators, Iran, sport stadiums, population

Procedia PDF Downloads 553
42945 Use of Statistical Correlations for the Estimation of Shear Wave Velocity from Standard Penetration Test-N-Values: Case Study of Algiers Area

Authors: Soumia Merat, Lynda Djerbal, Ramdane Bahar, Mohammed Amin Benbouras

Abstract:

Along with shear wave velocity, many soil parameters are associated with the standard penetration test (SPT), a dynamic in situ experiment. SPT-N data and geophysical data are often not both available for the same area. Statistical analysis of the correlation between these parameters is an alternative method for estimating Vₛ conveniently, without additional investigations or data acquisition. Shear wave velocity is a basic engineering tool required to define the dynamic properties of soils. In many instances, engineers opt for empirical correlations between shear wave velocity (Vₛ) and reliable static field test data, such as standard penetration test (SPT) N values and Cone Penetration Test (CPT) values, to estimate shear wave velocity or dynamic soil parameters. The relation between Vₛ and SPT-N values for the Algiers area is predicted using the collected data and is compared with previously suggested formulas for Vₛ determination by measuring the Root Mean Square Error (RMSE) of each model. The Algiers area is situated in a high seismic zone (Zone III [RPA 2003: réglement parasismique algerien]); therefore, the study is important for this region. The principal aim of this paper is to compare the field measurements of the down-hole test with the empirical models to show which of these proposed formulas is applicable for predicting shear wave velocity values.
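The model comparison described above can be sketched as follows. The sketch assumes power-law correlations of the form Vₛ = a·N^b, a shape many published SPT correlations take. The coefficients, model names, and measurements are hypothetical placeholders, not values from this study or from any cited formula.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def power_law_vs(n_value, a, b):
    """Assumed correlation form: Vs = a * N**b (Vs in m/s, N = SPT blow count)."""
    return a * n_value ** b


def rank_models(n_values, measured_vs, models):
    """Return (model name, RMSE) pairs sorted from best to worst fit."""
    scores = {
        name: rmse(measured_vs, [power_law_vs(n, a, b) for n in n_values])
        for name, (a, b) in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical down-hole measurements and candidate coefficients (a, b).
spt_n = [5, 10, 20, 35]
measured = [162.0, 205.0, 248.0, 301.0]
candidates = {"model_a": (100.0, 0.30), "model_b": (80.0, 0.35)}
ranking = rank_models(spt_n, measured, candidates)
```

The model at the head of `ranking` has the lowest RMSE against the down-hole measurements and would be the preferred correlation under this criterion.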

Keywords: empirical models, RMSE, shear wave velocity, standard penetration test

Procedia PDF Downloads 341
42944 Storage System Validation Study for Raw Cocoa Beans Using Minitab® 17 and R (R-3.3.1)

Authors: Anthony Oppong Kyekyeku, Sussana Antwi-Boasiako, Emmanuel De-Graft Johnson Owusu Ansah

Abstract:

In this observational study, the performance of a known conventional storage system was tested and evaluated for fitness for its intended purpose. The system's scope has been extended to the storage of dry cocoa beans. The system's sensitivity, reproducibility and uncertainties are not known in detail. This study discusses the system performance in the context of existing literature on factors that influence the quality of cocoa beans during storage. Controlled conditions were defined precisely for the system to give a reliable baseline within specific established procedures. Minitab® 17 and R statistical software (R-3.3.1) were used for the statistical analyses. The approach to the storage system testing was to observe and compare, through laboratory test methods, the quality of the cocoa bean samples before and after storage. The samples were kept in Kilner jars, and the temperature of the storage environment was controlled and monitored over a period of 408 days. Standard test methods used in the international cocoa trade, such as cut test analysis, moisture determination with the Aqua boy KAM III model and bean count determination, were used for quality assessment. The data analysis assumed the entire population as a sample in order to establish a reliable baseline for the data collected. The study concluded a statistically significant mean value at the 95% Confidence Interval (CI) for the performance data analysed before and after storage for all variables observed. Correlational graphs showed a strong positive correlation for all variables investigated, with the exception of All Other Defect (AOD). The weak relationship between the before and after data for AOD had an explained variability of 51.8%, with the unexplained variability attributable to the uncontrolled condition of hidden infestation before storage. The current study concluded with a high-performance criterion for the storage system.

Keywords: benchmarking performance data, cocoa beans, hidden infestation, storage system validation

Procedia PDF Downloads 176
42943 Using Discriminant Analysis to Forecast Crime Rate in Nigeria

Authors: O. P. Popoola, O. A. Alawode, M. O. Olayiwola, A. M. Oladele

Abstract:

This research work uses discriminant analysis to forecast the crime rate in Nigeria between 1996 and 2008. The work examines how gender (male and female) relates to offences committed against the government, against other properties, disturbance in public places, murder/robbery offences and other offences. The data used were collected from the National Bureau of Statistics (NBS). The SPSS statistical package was used to analyse the data. Time plots were drawn for all 29 offences obtained from the raw data. Eigenvalues and multivariate tests, Wilks’ Lambda, standardized canonical discriminant function coefficients and the predicted classifications were estimated. The research shows that the distribution of the scores from each function is standardized to have a mean of 0 and a standard deviation of 1. The magnitudes of the coefficients indicate how strongly each discriminating variable affects the score. In the predicted group membership, of the 172 cases predicted to belong to the crime-against-Government group, 66 were correctly predicted and 106 were incorrectly predicted. After going through the predicted classifications, we found that in most groups the number of correctly predicted cases was smaller than the number of incorrectly predicted cases.

Keywords: discriminant analysis, DA, multivariate analysis of variance, MANOVA, canonical correlation, Wilks’ Lambda

Procedia PDF Downloads 475
42942 Statistical Analysis of Cables in Long-Span Cable-Stayed Bridges

Authors: Ceshi Sun, Yueyu Zhao, Yaobing Zhao, Zhiqiang Wang, Jian Peng, Pengxin Guo

Abstract:

With the rapid development of transportation, China now has more than 100 cable-stayed bridges with a main span longer than 300 m. In order to ascertain the statistical relationships among the design parameters of stay cables and their distribution characteristics, 1500 cables were selected from 25 practical long-span cable-stayed bridges. A new relationship between the first-order frequency and the length of the cable was found by curve fitting. Then, based on this relationship, other interesting relationships were deduced. Several probability density functions (PDFs) were used to investigate the distributions of the first-order frequency, the stress level and the Irvine parameter. It was found that these parameters obey the Lognormal distribution, the Weibull distribution and the generalized Pareto distribution, respectively. Scatter diagrams of the three parameters were plotted, and their 95% confidence intervals were also investigated.
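As an illustration of the distribution-fitting step, the sketch below fits a Lognormal distribution by maximum likelihood, which reduces to taking the mean and standard deviation of the log-values. The synthetic sample, the variable names, and the choice to report a central 95% interval of the fitted distribution are assumptions made for the example; the abstract does not state how the authors performed the fit or built their intervals.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def fit_lognormal(samples):
    """Maximum-likelihood Lognormal fit: mean and std of the log-values."""
    logs = [math.log(x) for x in samples]
    return fmean(logs), pstdev(logs)


def central_interval(mu, sigma, level=0.95):
    """Central interval of the fitted Lognormal containing `level` probability."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(mu - z * sigma), math.exp(mu + z * sigma)


# Synthetic stand-in for 1500 first-order cable frequencies (Hz); NOT study data.
rng = random.Random(0)
frequencies = [rng.lognormvariate(0.0, 0.5) for _ in range(1500)]

mu_hat, sigma_hat = fit_lognormal(frequencies)
low, high = central_interval(mu_hat, sigma_hat)
print(f"mu={mu_hat:.3f} sigma={sigma_hat:.3f} 95% interval=({low:.3f}, {high:.3f})")
```

The Weibull and generalized Pareto fits mentioned in the abstract have no closed-form estimators of this kind and would normally need a numerical optimizer.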

Keywords: cable, cable-stayed bridge, long-span, statistical analysis

Procedia PDF Downloads 639
42941 Choosing an Optimal Epsilon for Differentially Private Arrhythmia Analysis

Authors: Arin Ghazarian, Cyril Rakovski

Abstract:

Differential privacy has become the leading technique to protect the privacy of individuals in a database while allowing useful analysis to be done and the results to be shared. It puts a guarantee on the amount of privacy loss in the worst-case scenario. Differential privacy is not a toggle between full privacy and zero privacy. It controls the tradeoff between the accuracy of the results and the privacy loss using a single key parameter called

Keywords: arrhythmia, cardiology, differential privacy, ECG, epsilon, medical data, privacy preserving analytics, statistical databases
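The accuracy/privacy tradeoff governed by epsilon (the parameter named in the title and keywords) can be illustrated with the Laplace mechanism, a standard way to release a differentially private count. This is a generic sketch, not the method of the paper: the count, the sensitivity of 1, and the function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: add noise with scale sensitivity / epsilon to a count."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
true_count = 120  # hypothetical number of records matching a query
trials = 2000

# Smaller epsilon means stronger privacy but a noisier (less accurate) answer.
mean_abs_error = {}
for eps in (0.1, 1.0, 10.0):
    errors = [abs(private_count(true_count, eps, rng) - true_count) for _ in range(trials)]
    mean_abs_error[eps] = sum(errors) / trials
    print(f"epsilon={eps}: mean absolute error ~ {mean_abs_error[eps]:.2f}")
```

For this mechanism the expected absolute error equals sensitivity / epsilon, so halving epsilon doubles the typical error of the released value.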

Procedia PDF Downloads 159
42940 An Extended Inverse Pareto Distribution, with Applications

Authors: Abdel Hadi Ebraheim

Abstract:

This paper introduces a new extension of the Inverse Pareto distribution in the framework of the Marshall-Olkin (1997) family of distributions. This model is capable of modeling various shapes of aging and failure data. The statistical properties of the new model are discussed. Several methods are used to estimate the parameters involved. Explicit expressions are derived for different types of moments of value in reliability analysis. In addition, the order statistics of samples from the proposed model are studied. Finally, the usefulness of the new model for modeling reliability data is illustrated using two real data sets and a simulation study.

Keywords: Pareto distribution, Marshall-Olkin, reliability, hazard functions, moments, estimation

Procedia PDF Downloads 86
42939 Investigating the Effects of Data Transformations on a Bi-Dimensional Chi-Square Test

Authors: Alexandru George Vaduva, Adriana Vlad, Bogdan Badea

Abstract:

In this research, we conduct a Monte Carlo analysis on a two-dimensional χ² test, which is used to determine the minimum distance required for independent sampling in the context of chaotic signals. We investigate the impact on the χ² test of transforming initial data sets from any probability distribution to new signals with a uniform distribution using the Spearman rank correlation. This transformation removes the randomness of the data pairs, and as a result, the observed distribution of χ² test values differs from the expected distribution. We propose a solution to this problem and evaluate it using another chaotic signal.
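A two-dimensional χ² independence statistic for pairs sampled from a chaotic signal can be sketched as follows. The example assumes the logistic map x ← 4x(1 − x) (the keywords mention the logistic map, but the parameter value, bin count, seed, and sampling stride here are illustrative choices) and uses the usual contingency-table statistic with expected counts taken from the observed marginals. It does not reproduce the uniform transformation or the correction proposed in the paper.

```python
def logistic_series(x0, length):
    """Trajectory of the logistic map x -> 4x(1 - x), assumed chaotic signal."""
    series, x = [], x0
    for _ in range(length):
        series.append(x)
        x = 4.0 * x * (1.0 - x)
    return series


def chi_square_2d(pairs, bins):
    """Pearson chi-square statistic on a bins x bins table of pairs in [0, 1]."""
    table = [[0] * bins for _ in range(bins)]
    for a, b in pairs:
        i = min(int(a * bins), bins - 1)
        j = min(int(b * bins), bins - 1)
        table[i][j] += 1
    n = len(pairs)
    rows = [sum(r) for r in table]
    cols = [sum(table[i][j] for i in range(bins)) for j in range(bins)]
    stat = 0.0
    for i in range(bins):
        for j in range(bins):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def pair_statistic(series, distance, n_pairs, stride, bins=5):
    """Statistic for pairs (x[k], x[k + distance]) taken every `stride` samples."""
    pairs = [(series[k * stride], series[k * stride + distance]) for k in range(n_pairs)]
    return chi_square_2d(pairs, bins)


series = logistic_series(0.3, 2000 * 25 + 21)
stat_near = pair_statistic(series, distance=1, n_pairs=2000, stride=25)
stat_far = pair_statistic(series, distance=20, n_pairs=2000, stride=25)
print(f"distance 1: {stat_near:.1f}, distance 20: {stat_far:.1f}")
```

With 5 × 5 bins the statistic has (5 − 1)² = 16 degrees of freedom under independence, so a value far above that range at a given distance indicates that samples that close together are still dependent.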

Keywords: chaotic signals, logistic map, Pearson’s test, Chi Square test, bivariate distribution, statistical independence

Procedia PDF Downloads 102
42938 The Analysis of Emergency Shutdown Valves Torque Data in Terms of Its Use as a Health Indicator for System Prognostics

Authors: Ewa M. Laskowska, Jorn Vatn

Abstract:

Industry 4.0 focuses on digital optimization of industrial processes. The idea is to use extracted data to build a decision support model that enables the use of those data for real-time decision making. In terms of predictive maintenance, the desired decision support tool would be a model enabling prognostics of a system's health based on the current condition of the equipment considered. Within the area of system prognostics and health management, a commonly used health indicator is the Remaining Useful Lifetime (RUL) of a system. Because the RUL is a random variable, it has to be estimated based on available health indicators. Health indicators can be of different types and come from different sources. They can be process variables, equipment performance variables, data related to the number of experienced failures, etc. The aim of this study is the analysis of performance variables of emergency shutdown valves (ESV) used in the oil and gas industry. An ESV is inspected periodically, and at each inspection the torque and time of valve operation are registered. The data will be analyzed by means of machine learning or statistical analysis. The purpose is to investigate whether the available data could be used as a health indicator for prognostic purposes. The second objective is to examine the most efficient way to incorporate the data into a predictive model. The idea is to check whether the data can be applied in the form of explanatory variables in a Markov process, or whether other stochastic processes would be more convenient for building an RUL model based on the information coming from the registered data.

Keywords: emergency shutdown valves, health indicator, prognostics, remaining useful lifetime, RUL

Procedia PDF Downloads 96
42937 A Comparative Study on Automatic Feature Classification Methods of Remote Sensing Images

Authors: Lee Jeong Min, Lee Mi Hee, Eo Yang Dam

Abstract:

Geospatial feature extraction is a very important issue in remote sensing research. Image classification has traditionally been based on statistical techniques, but in recent years data mining and machine learning techniques for automated image processing have been applied to remote sensing, with a focus on the possibility of improved results. In this study, artificial neural network and decision tree techniques are applied to classify high-resolution satellite images; the results are compared with those of maximum likelihood classification (MLC), a statistical technique, and the pros and cons of each technique are analyzed.

Keywords: remote sensing, artificial neural network, decision tree, maximum likelihood classification

Procedia PDF Downloads 351
42936 Introduction of Robust Multivariate Process Capability Indices

Authors: Behrooz Khalilloo, Hamid Shahriari, Emad Roghanian

Abstract:

Process capability indices (PCIs) are important concepts in statistical quality control; they measure the capability of processes and the extent to which processes meet certain specifications. An important issue in statistical quality control is parameter estimation. Under the assumption of multivariate normality, the distribution parameters, namely the mean vector and the variance-covariance matrix, must be estimated when they are unknown. Classic estimation methods such as method of moments estimation (MME) or maximum likelihood estimation (MLE) give good estimates of the population parameters when the data are not contaminated. But when outliers exist in the data, MME and MLE give poor estimates of the population parameters. We therefore need estimators that perform well in the presence of outliers. In this work, robust M-estimators are used to estimate these parameters, and robust process capability indices are introduced based on the robust parameter estimators. The performance of these robust estimators in the presence of outliers and their effects on process capability indices are evaluated using real and simulated multivariate data. The results indicate that the proposed robust capability indices perform much better than the existing process capability indices.
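The effect of outliers on a classical estimate, and how an M-estimator limits it, can be shown with a univariate Huber location estimate. This is a simplified stand-in for the multivariate M-estimators of the paper, whose exact form the abstract does not give; the tuning constant 1.345, the MAD-based scale, and the data are assumptions made for the example.

```python
from statistics import fmean, median


def huber_location(data, c=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging."""
    mu = median(data)
    # Robust scale: median absolute deviation, rescaled for normal data.
    scale = 1.4826 * median([abs(x - mu) for x in data])
    if scale == 0:
        return mu
    for _ in range(max_iter):
        # Points within c * scale keep full weight; farther points are down-weighted.
        weights = [
            1.0 if abs(x - mu) <= c * scale else c * scale / abs(x - mu)
            for x in data
        ]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical process measurements with two gross outliers appended.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
contaminated = clean + [25.0, 30.0]

classical = fmean(contaminated)
robust = huber_location(contaminated)
print(f"sample mean: {classical:.2f}, Huber estimate: {robust:.2f}")
```

The sample mean is pulled strongly toward the outliers, while the Huber estimate stays close to the bulk of the data; a capability index computed from the robust estimate would be correspondingly less distorted.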

Keywords: multivariate process capability indices, robust M-estimator, outlier, multivariate quality control, statistical quality control

Procedia PDF Downloads 288