Search results for: statistical data analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 41552

Search results for: statistical data analysis

41492 Immobilization of Lipase Enzyme by Low Cost Material: A Statistical Approach

Authors: Md. Z. Alam, Devi R. Asih, Md. N. Salleh

Abstract:

Immobilization of lipase enzyme produced from palm oil mill effluent (POME) by the activated carbon (AC) among the low cost support materials was optimized. The results indicated that immobilization of 94% was achieved by AC as the most suitable support material. A sequential optimization strategy based on a statistical experimental design, including one-factor-at-a-time (OFAT) method was used to determine the equilibrium time. Three components influencing lipase immobilization were optimized by the response surface methodology (RSM) based on the face-centered central composite design (FCCCD). On the statistical analysis of the results, the optimum enzyme concentration loading, agitation rate and carbon active dosage were found to be 30 U/ml, 300 rpm and 8 g/L respectively, with a maximum immobilization activity of 3732.9 U/g-AC after 2 hrs of immobilization. Analysis of variance (ANOVA) showed a high regression coefficient (R2) of 0.999, which indicated a satisfactory fit of the model with the experimental data. The parameters were statistically significant at p<0.05.

Keywords: activated carbon, POME based lipase, immobilization, adsorption

Procedia PDF Downloads 228
41491 Dataset Quality Index:Development of Composite Indicator Based on Standard Data Quality Indicators

Authors: Sakda Loetpiparwanich, Preecha Vichitthamaros

Abstract:

Nowadays, poor data quality is considered one of the majority costs for a data project. The data project with data quality awareness almost as much time to data quality processes while data project without data quality awareness negatively impacts financial resources, efficiency, productivity, and credibility. One of the processes that take a long time is defining the expectations and measurements of data quality because the expectation is different up to the purpose of each data project. Especially, big data project that maybe involves with many datasets and stakeholders, that take a long time to discuss and define quality expectations and measurements. Therefore, this study aimed at developing meaningful indicators to describe overall data quality for each dataset to quick comparison and priority. The objectives of this study were to: (1) Develop a practical data quality indicators and measurements, (2) Develop data quality dimensions based on statistical characteristics and (3) Develop Composite Indicator that can describe overall data quality for each dataset. The sample consisted of more than 500 datasets from public sources obtained by random sampling. After datasets were collected, there are five steps to develop the Dataset Quality Index (SDQI). First, we define standard data quality expectations. Second, we find any indicators that can measure directly to data within datasets. Thirdly, each indicator aggregates to dimension using factor analysis. Next, the indicators and dimensions were weighted by an effort for data preparing process and usability. Finally, the dimensions aggregate to Composite Indicator. The results of these analyses showed that: (1) The developed useful indicators and measurements contained ten indicators. (2) the developed data quality dimension based on statistical characteristics, we found that ten indicators can be reduced to 4 dimensions. (3) The developed Composite Indicator, we found that the SDQI can describe overall datasets quality of each dataset and can separate into 3 Level as Good Quality, Acceptable Quality, and Poor Quality. The conclusion, the SDQI provide an overall description of data quality within datasets and meaningful composition. We can use SQDI to assess for all data in the data project, effort estimation, and priority. The SDQI also work well with Agile Method by using SDQI to assessment in the first sprint. After passing the initial evaluation, we can add more specific data quality indicators into the next sprint.

Keywords: data quality, dataset quality, data quality management, composite indicator, factor analysis, principal component analysis

Procedia PDF Downloads 117
41490 Statistical Analysis of Surface Roughness and Tool Life Using (RSM) in Face Milling

Authors: Mohieddine Benghersallah, Lakhdar Boulanouar, Salim Belhadi

Abstract:

Currently, higher production rate with required quality and low cost is the basic principle in the competitive manufacturing industry. This is mainly achieved by using high cutting speed and feed rates. Elevated temperatures in the cutting zone under these conditions shorten tool life and adversely affect the dimensional accuracy and surface integrity of component. Thus it is necessary to find optimum cutting conditions (cutting speed, feed rate, machining environment, tool material and geometry) that can produce components in accordance with the project and having a relatively high production rate. Response surface methodology is a collection of mathematical and statistical techniques that are useful for modelling and analysis of problems in which a response of interest is influenced by several variables and the objective is to optimize this response. The work presented in this paper examines the effects of cutting parameters (cutting speed, feed rate and depth of cut) on to the surface roughness through the mathematical model developed by using the data gathered from a series of milling experiments performed.

Keywords: Statistical analysis (RSM), Bearing steel, Coating inserts, Tool life, Surface Roughness, End milling.

Procedia PDF Downloads 410
41489 Shock Compressibility of Iron Alloys Calculated in the Framework of Quantum-Statistical Models

Authors: Maxim A. Kadatskiy, Konstantin V. Khishchenko

Abstract:

Iron alloys are widespread components in various types of structural materials which are exposed to intensive thermal and mechanical loads. Various quantum-statistical cell models with the approximation of self-consistent field can be used for the prediction of the behavior of these materials under extreme conditions. The application of these models is even more valid, the higher the temperature and the density of matter. Results of Hugoniot calculation for iron alloys in the framework of three quantum-statistical (the Thomas–Fermi, the Thomas–Fermi with quantum and exchange corrections and the Hartree–Fock–Slater) models are presented. Results of quantum-statistical calculations are compared with results from other reliable models and available experimental data. It is revealed a good agreement between results of calculation and experimental data for terra pascal pressures. Advantages and disadvantages of this approach are shown.

Keywords: alloy, Hugoniot, iron, terapascal pressure

Procedia PDF Downloads 324
41488 Impact of Crises on Official Statistics: Environmental Statistics at Statistical Centre for the Cooperation Council for the Arab Countries of the Gulf during the COVID-19 Pandemic: A Case Study

Authors: Ibtihaj Al-Siyabi

Abstract:

The crisis of COVID-19 posed enormous challenges to the statistical providers. While official statistics were disrupted by the pandemic and related containment measures, there was a growing and pressing need for real-time data and statistics to inform decisions. This paper gives an account of the way the pandemic impacted the operations of the National Statistical Offices (NSOs) in general in terms of data collection and methods used and the main challenges encountered by them based on international surveys. It highlights the performance of the Statistical Centre for the Cooperation Council for the Arab Countries of the Gulf, GCC-STAT, and its responsiveness to the pandemic placing special emphasis on environmental statistics. The paper concludes by confirming the GCC-STAT’s resilience and success in facing the challenges.

Keywords: NSO, COVID-19, statistics, crisis, pandemic

Procedia PDF Downloads 110
41487 Accelerating Side Channel Analysis with Distributed and Parallelized Processing

Authors: Kyunghee Oh, Dooho Choi

Abstract:

Although there is no theoretical weakness in a cryptographic algorithm, Side Channel Analysis can find out some secret data from the physical implementation of a cryptosystem. The analysis is based on extra information such as timing information, power consumption, electromagnetic leaks or even sound which can be exploited to break the system. Differential Power Analysis is one of the most popular analyses, as computing the statistical correlations of the secret keys and power consumptions. It is usually necessary to calculate huge data and takes a long time. It may take several weeks for some devices with countermeasures. We suggest and evaluate the methods to shorten the time to analyze cryptosystems. Our methods include distributed computing and parallelized processing.

Keywords: DPA, distributed computing, parallelized processing, side channel analysis

Procedia PDF Downloads 402
41486 Time-Domain Analysis of Pulse Parameters Effects on Crosstalk in High-Speed Circuits

Authors: Loubna Tani, Nabih Elouzzani

Abstract:

Crosstalk among interconnects and printed-circuit board (PCB) traces is a major limiting factor of signal quality in high-speed digital and communication equipments especially when fast data buses are involved. Such a bus is considered as a planar multiconductor transmission line. This paper will demonstrate how the finite difference time domain (FDTD) method provides an exact solution of the transmission-line equations to analyze the near end and the far end crosstalk. In addition, this study makes it possible to analyze the rise time effect on the near and far end voltages of the victim conductor. The paper also discusses a statistical analysis, based upon a set of several simulations. Such analysis leads to a better understanding of the phenomenon and yields useful information.

Keywords: multiconductor transmission line, crosstalk, finite difference time domain (FDTD), printed-circuit board (PCB), rise time, statistical analysis

Procedia PDF Downloads 413
41485 Implementation of Achterbahn-128 for Images Encryption and Decryption

Authors: Aissa Belmeguenai, Khaled Mansouri

Abstract:

In this work, an efficient implementation of Achterbahn-128 for images encryption and decryption was introduced. The implementation for this simulated project is written by MATLAB.7.5. At first two different original images are used for validate the proposed design. Then our developed program was used to transform the original images data into image digits file. Finally, we used our implemented program to encrypt and decrypt images data. Several tests are done for proving the design performance including visual tests and security analysis; we discuss the security analysis of the proposed image encryption scheme including some important ones like key sensitivity analysis, key space analysis, and statistical attacks.

Keywords: Achterbahn-128, stream cipher, image encryption, security analysis

Procedia PDF Downloads 513
41484 TDApplied: An R Package for Machine Learning and Inference with Persistence Diagrams

Authors: Shael Brown, Reza Farivar

Abstract:

Persistence diagrams capture valuable topological features of datasets that other methods cannot uncover. Still, their adoption in data pipelines has been limited due to the lack of publicly available tools in R (and python) for analyzing groups of them with machine learning and statistical inference. In an easy-to-use and scalable R package called TDApplied, we implement several applied analysis methods tailored to groups of persistence diagrams. The two main contributions of our package are comprehensiveness (most functions do not have implementations elsewhere) and speed (shown through benchmarking against other R packages). We demonstrate applications of the tools on simulated data to illustrate how easily practical analyses of any dataset can be enhanced with topological information.

Keywords: machine learning, persistence diagrams, R, statistical inference

Procedia PDF Downloads 59
41483 A Multivariate Statistical Approach for Water Quality Assessment of River Hindon, India

Authors: Nida Rizvi, Deeksha Katyal, Varun Joshi

Abstract:

River Hindon is an important river catering the demand of highly populated rural and industrial cluster of western Uttar Pradesh, India. Water quality of river Hindon is deteriorating at an alarming rate due to various industrial, municipal and agricultural activities. The present study aimed at identifying the pollution sources and quantifying the degree to which these sources are responsible for the deteriorating water quality of the river. Various water quality parameters, like pH, temperature, electrical conductivity, total dissolved solids, total hardness, calcium, chloride, nitrate, sulphate, biological oxygen demand, chemical oxygen demand and total alkalinity were assessed. Water quality data obtained from eight study sites for one year has been subjected to the two multivariate techniques, namely, principal component analysis and cluster analysis. Principal component analysis was applied with the aim to find out spatial variability and to identify the sources responsible for the water quality of the river. Three Varifactors were obtained after varimax rotation of initial principal components using principal component analysis. Cluster analysis was carried out to classify sampling stations of certain similarity, which grouped eight different sites into two clusters. The study reveals that the anthropogenic influence (municipal, industrial, waste water and agricultural runoff) was the major source of river water pollution. Thus, this study illustrates the utility of multivariate statistical techniques for analysis and elucidation of multifaceted data sets, recognition of pollution sources/factors and understanding temporal/spatial variations in water quality for effective river water quality management.

Keywords: cluster analysis, multivariate statistical techniques, river Hindon, water quality

Procedia PDF Downloads 441
41482 Electricity Generation from Renewables and Targets: An Application of Multivariate Statistical Techniques

Authors: Filiz Ersoz, Taner Ersoz, Tugrul Bayraktar

Abstract:

Renewable energy is referred to as "clean energy" and common popular support for the use of renewable energy (RE) is to provide electricity with zero carbon dioxide emissions. This study provides useful insight into the European Union (EU) RE, especially, into electricity generation obtained from renewables, and their targets. The objective of this study is to identify groups of European countries, using multivariate statistical analysis and selected indicators. The hierarchical clustering method is used to decide the number of clusters for EU countries. The conducted statistical hierarchical cluster analysis is based on the Ward’s clustering method and squared Euclidean distances. Hierarchical cluster analysis identified eight distinct clusters of European countries. Then, non-hierarchical clustering (k-means) method was applied. Discriminant analysis was used to determine the validity of the results with data normalized by Z score transformation. To explore the relationship between the selected indicators, correlation coefficients were computed. The results of the study reveal the current situation of RE in European Union Member States.

Keywords: share of electricity generation, k-means clustering, discriminant, CO2 emission

Procedia PDF Downloads 399
41481 A Proposed Algorithm for Obtaining the Map of Subscribers’ Density Distribution for a Mobile Wireless Communication Network

Authors: C. Temaneh-Nyah, F. A. Phiri, D. Karegeya

Abstract:

This paper presents an algorithm for obtaining the map of subscriber’s density distribution for a mobile wireless communication network based on the actual subscriber's traffic data obtained from the base station. This is useful in statistical characterization of the mobile wireless network.

Keywords: electromagnetic compatibility, statistical analysis, simulation of communication network, subscriber density

Procedia PDF Downloads 296
41480 On-Line Data-Driven Multivariate Statistical Prediction Approach to Production Monitoring

Authors: Hyun-Woo Cho

Abstract:

Detection of incipient abnormal events in production processes is important to improve safety and reliability of manufacturing operations and reduce losses caused by failures. The construction of calibration models for predicting faulty conditions is quite essential in making decisions on when to perform preventive maintenance. This paper presents a multivariate calibration monitoring approach based on the statistical analysis of process measurement data. The calibration model is used to predict faulty conditions from historical reference data. This approach utilizes variable selection techniques, and the predictive performance of several prediction methods are evaluated using real data. The results shows that the calibration model based on supervised probabilistic model yielded best performance in this work. By adopting a proper variable selection scheme in calibration models, the prediction performance can be improved by excluding non-informative variables from their model building steps.

Keywords: calibration model, monitoring, quality improvement, feature selection

Procedia PDF Downloads 340
41479 Statistical Analysis with Prediction Models of User Satisfaction in Software Project Factors

Authors: Katawut Kaewbanjong

Abstract:

We analyzed a volume of data and found significant user satisfaction in software project factors. A statistical significance analysis (logistic regression) and collinearity analysis determined the significance factors from a group of 71 pre-defined factors from 191 software projects in ISBSG Release 12. The eight prediction models used for testing the prediction potential of these factors were Neural network, k-NN, Naïve Bayes, Random forest, Decision tree, Gradient boosted tree, linear regression and logistic regression prediction model. Fifteen pre-defined factors were truly significant in predicting user satisfaction, and they provided 82.71% prediction accuracy when used with a neural network prediction model. These factors were client-server, personnel changes, total defects delivered, project inactive time, industry sector, application type, development type, how methodology was acquired, development techniques, decision making process, intended market, size estimate approach, size estimate method, cost recording method, and effort estimate method. These findings may benefit software development managers considerably.

Keywords: prediction model, statistical analysis, software project, user satisfaction factor

Procedia PDF Downloads 97
41478 A Study on Big Data Analytics, Applications and Challenges

Authors: Chhavi Rana

Abstract:

The aim of the paper is to highlight the existing development in the field of big data analytics. Applications like bioinformatics, smart infrastructure projects, Healthcare, and business intelligence contain voluminous and incremental data, which is hard to organise and analyse and can be dealt with using the framework and model in this field of study. An organization's decision-making strategy can be enhanced using big data analytics and applying different machine learning techniques and statistical tools on such complex data sets that will consequently make better things for society. This paper reviews the current state of the art in this field of study as well as different application domains of big data analytics. It also elaborates on various frameworks in the process of Analysis using different machine-learning techniques. Finally, the paper concludes by stating different challenges and issues raised in existing research.

Keywords: big data, big data analytics, machine learning, review

Procedia PDF Downloads 60
41477 A Study on Big Data Analytics, Applications, and Challenges

Authors: Chhavi Rana

Abstract:

The aim of the paper is to highlight the existing development in the field of big data analytics. Applications like bioinformatics, smart infrastructure projects, healthcare, and business intelligence contain voluminous and incremental data which is hard to organise and analyse and can be dealt with using the framework and model in this field of study. An organisation decision-making strategy can be enhanced by using big data analytics and applying different machine learning techniques and statistical tools to such complex data sets that will consequently make better things for society. This paper reviews the current state of the art in this field of study as well as different application domains of big data analytics. It also elaborates various frameworks in the process of analysis using different machine learning techniques. Finally, the paper concludes by stating different challenges and issues raised in existing research.

Keywords: big data, big data analytics, machine learning, review

Procedia PDF Downloads 76
41476 Drying Kinects of Soybean Seeds

Authors: Amanda Rithieli Pereira Dos Santos, Rute Quelvia De Faria, Álvaro De Oliveira Cardoso, Anderson Rodrigo Da Silva, Érica Leão Fernandes Araújo

Abstract:

The study of the kinetics of drying has great importance for the mathematical modeling, allowing to know about the processes of transference of heat and mass between the products and to adjust dryers managing new technologies for these processes. The present work had the objective of studying the kinetics of drying of soybean seeds and adjusting different statistical models to the experimental data varying cultivar and temperature. Soybean seeds were pre-dried in a natural environment in order to reduce and homogenize the water content to the level of 14% (b.s.). Then, drying was carried out in a forced air circulation oven at controlled temperatures of 38, 43, 48, 53 and 58 ± 1 ° C, using two soybean cultivars, BRS 8780 and Sambaíba, until reaching a hygroscopic equilibrium. The experimental design was completely randomized in factorial 5 x 2 (temperature x cultivar) with 3 replicates. To the experimental data were adjusted eleven statistical models used to explain the drying process of agricultural products. Regression analysis was performed using the least squares Gauss-Newton algorithm to estimate the parameters. The degree of adjustment was evaluated from the analysis of the coefficient of determination (R²), the adjusted coefficient of determination (R² Aj.) And the standard error (S.E). The models that best represent the drying kinetics of soybean seeds are those of Midilli and Logarítmico.

Keywords: curve of drying seeds, Glycine max L., moisture ratio, statistical models

Procedia PDF Downloads 600
41475 Evaluation of the Mechanical Behavior of a Retaining Wall Structure on a Weathered Soil through Probabilistic Methods

Authors: P. V. S. Mascarenhas, B. C. P. Albuquerque, D. J. F. Campos, L. L. Almeida, V. R. Domingues, L. C. S. M. Ozelim

Abstract:

Retaining slope structures are increasingly considered in geotechnical engineering projects due to extensive urban cities growth. These kinds of engineering constructions may present instabilities over the time and may require reinforcement or even rebuilding of the structure. In this context, statistical analysis is an important tool for decision making regarding retaining structures. This study approaches the failure probability of the construction of a retaining wall over the debris of an old and collapsed one. The new solution’s extension length will be of approximately 350 m and will be located over the margins of the Lake Paranoá, Brasilia, in the capital of Brazil. The building process must also account for the utilization of the ruins as a caisson. A series of in situ and laboratory experiments defined local soil strength parameters. A Standard Penetration Test (SPT) defined the in situ soil stratigraphy. Also, the parameters obtained were verified using soil data from a collection of masters and doctoral works from the University of Brasília, which is similar to the local soil. Initial studies show that the concrete wall is the proper solution for this case, taking into account the technical, economic and deterministic analysis. On the other hand, in order to better analyze the statistical significance of the factor-of-safety factors obtained, a Monte Carlo analysis was performed for the concrete wall and two more initial solutions. A comparison between the statistical and risk results generated for the different solutions indicated that a Gabion solution would better fit the financial and technical feasibility of the project.

Keywords: economical analysis, probability of failure, retaining walls, statistical analysis

Procedia PDF Downloads 391
41474 Statistical Model of Water Quality in Estero El Macho, Machala-El Oro

Authors: Rafael Zhindon Almeida

Abstract:

Surface water quality is an important concern for the evaluation and prediction of water quality conditions. The objective of this study is to develop a statistical model that can accurately predict the water quality of the El Macho estuary in the city of Machala, El Oro province. The methodology employed in this study is of a basic type that involves a thorough search for theoretical foundations to improve the understanding of statistical modeling for water quality analysis. The research design is correlational, using a multivariate statistical model involving multiple linear regression and principal component analysis. The results indicate that water quality parameters such as fecal coliforms, biochemical oxygen demand, chemical oxygen demand, iron and dissolved oxygen exceed the allowable limits. The water of the El Macho estuary is determined to be below the required water quality criteria. The multiple linear regression model, based on chemical oxygen demand and total dissolved solids, explains 99.9% of the variance of the dependent variable. In addition, principal component analysis shows that the model has an explanatory power of 86.242%. The study successfully developed a statistical model to evaluate the water quality of the El Macho estuary. The estuary did not meet the water quality criteria, with several parameters exceeding the allowable limits. The multiple linear regression model and principal component analysis provide valuable information on the relationship between the various water quality parameters. The findings of the study emphasize the need for immediate action to improve the water quality of the El Macho estuary to ensure the preservation and protection of this valuable natural resource.

Keywords: statistical modeling, water quality, multiple linear regression, principal components, statistical models

Procedia PDF Downloads 68
41473 Georgia Case: Tourism Expenses of International Visitors on the Basis of Growing Attractiveness

Authors: Nino Abesadze, Marine Mindorashvili, Nino Paresashvili

Abstract:

At present actual tourism indicators cannot be calculated in Georgia, making it impossible to perform their quantitative analysis. Therefore, the study conducted by us is highly important from a theoretical as well as practical standpoint. The main purpose of the article is to make complex statistical analysis of tourist expenses of foreign visitors and to calculate statistical attractiveness indices of the tourism potential of Georgia. During the research, the method involving random and proportional selection has been applied. Computer software SPSS was used to compute statistical data for corresponding analysis. Corresponding methodology of tourism statistics was implemented according to international standards. Important information was collected and grouped from major Georgian airports, and a representative population of foreign visitors and a rule of selection of respondents were determined. The results show a trend of growth in tourist numbers and the share of tourists from post-soviet countries are constantly increasing. The level of satisfaction with tourist facilities and quality of service has improved, but still we have a problem of disparity between the service quality and the prices. The design of tourist expenses of foreign visitors is diverse; competitiveness of tourist products of Georgian tourist companies is higher. Attractiveness of popular cities of Georgia has increased by 43%.

Keywords: tourist, expenses, indexes, statistics, analysis

Procedia PDF Downloads 311
41472 Aspects of Tone in the Educated Nigeria Accent of English

Authors: Nkereke Essien

Abstract:

The study seeks to analyze tone in the Educated Nigerian accent of English (ENAE) using the three tones: Low (L), High (H) and Low-High (LH). The aim is to find out whether there are any differences or similarities in the performance of the experimental group and the control. To achieve this, twenty educated Nigerian speakers of English who are educated in the language were selected by a Stratified Random Sampling (SRS) technique from two federal universities in Nigeria. They were given a passage to read and their intonation patterns were compared with that of a native speaker (control). The data were analyzed using Pierrehumbert’s (1980) intonation system of analysis. Three different approaches were employed in the analysis of the intonation Phrase (IP) as used by Pierrehumbert: perceptual, statistical and acoustic. We first analyzed our data from the passage and utterances using Willcoxon Matched Pairs Signs Ranks Test to establish the differences between the performance of the experimental group and the control. Then, the one-way Analysis of variance (ANOVA) statistical and Tukey-Krammar Post Hoc Tests were used to test for any significant difference in the performances of the twenty subjects. The acoustic data were presented to corroborate both the perceptual and statistical findings. Finally, the tonal patterns of the selected subjects in the three categories - A, B, C, were compared with those of the control. Our findings revealed that the tonal pattern of the Educated Nigerian Accent of English (ENAE) is significantly different from the tonal pattern of the Standard British Accent of English (SBAE) as represented by the control. A high preference for unidirectional tones, especially, the high tones was observed in the performance of the experimental group. Also, high tones do not necessarily correspond to stressed syllables and low tones to unstressed syllables.

Keywords: accent, intonation phrase (IP), tonal patterns, tone

Procedia PDF Downloads 203
41471 Analysis of Operating Speed on Four-Lane Divided Highways under Mixed Traffic Conditions

Authors: Chaitanya Varma, Arpan Mehar

Abstract:

The present study demonstrates the procedure to analyse speed data collected on various four-lane divided sections in India. Field data for the study was collected at different straight and curved sections on rural highways with the help of radar speed gun and video camera. The data collected at the sections were analysed and parameters pertain to speed distributions were estimated. The different statistical distribution was analysed on vehicle type speed data and for mixed traffic speed data. It was found that vehicle type speed data was either follows the normal distribution or Log-normal distribution, whereas the mixed traffic speed data follows more than one type of statistical distribution. The most common fit observed on mixed traffic speed data were Beta distribution and Weibull distribution. The separate operating speed model based on traffic and roadway geometric parameters were proposed in the present study. The operating speed model with traffic parameters and curve geometry parameters were established. Two different operating speed models were proposed with variables 1/R and Ln(R) and were found to be realistic with a different range of curve radius. The models developed in the present study are simple and realistic and can be used for forecasting operating speed on four-lane highways.

Keywords: highway, mixed traffic flow, modeling, operating speed

Procedia PDF Downloads 444
41470 Empirical Study of Running Correlations in Exam Marks: Same Statistical Pattern as Chance

Authors: Weisi Guo

Abstract:

It is well established that there may be running correlations in sequential exam marks due to students sitting in the order of course registration patterns. As such, a random and non-sequential sampling of exam marks is a standard recommended practice. Here, the paper examines a large number of exam data stretching several years across different modules to see the degree to which it is true. Using the real mark distribution as a generative process, it was found that random simulated data had no more sequential randomness than the real data. That is to say, the running correlations that one often observes are statistically identical to chance. Digging deeper, it was found that some high running correlations have students that indeed share a common course history and make similar mistakes. However, at the statistical scale of a module question, the combined effect is statistically similar to the random shuffling of papers. As such, there may not be the need to take random samples for marks, but it still remains good practice to mark papers in a random sequence to reduce the repetitive marking bias and errors.

Keywords: data analysis, empirical study, exams, marking

Procedia PDF Downloads 162
41469 Students' Statistical Reasoning and Attitudes towards Statistics in Blended Learning, E-Learning and On-Campus Learning

Authors: Petros Roussos

Abstract:

The present study focused on students' statistical reasoning related to Null Hypothesis Statistical Testing and p-values. Its objective was to test the hypothesis that neither the place (classroom, at a distance, online) nor the medium that actually supports the learning (ICT, internet, books) has an effect on understanding of statistical concepts. In addition, it was expected that students' attitudes towards statistics would not predict understanding of statistical concepts. The sample consisted of 385 undergraduate and postgraduate students from six state and private universities (five in Greece and one in Cyprus). Students were administered two questionnaires: a) the Greek version of the Survey of Attitudes Toward Statistics, and b) a short instrument which measures students' understanding of statistical significance and p-values. Results suggest that attitudes towards statistics do not predict students' understanding of statistical concepts, whereas the medium did not have an effect.

Keywords: attitudes towards statistics, blended learning, e-learning, statistical reasoning

Procedia PDF Downloads 291
41468 Analyzing the Relationship between the Spatial Characteristics of Cultural Structure, Activities, and the Tourism Demand

Authors: Deniz Karagöz

Abstract:

This study is attempt to comprehend the relationship between the spatial characteristics of cultural structure, activities and the tourism demand in Turkey. The analysis divided into four parts. The first part consisted of a cultural structure and cultural activity (CSCA) index provided by principal component analysis. The analysis determined four distinct dimensions, namely, cultural activity/structure, accessing culture, consumption, and cultural management. The exploratory spatial data analysis employed to determine the spatial models of cultural structure and cultural activities in 81 provinces in Turkey. Global Moran I indices is used to ascertain the cultural activities and the structural clusters. Finally, the relationship between the cultural activities/cultural structure and tourism demand was analyzed. The raw/original data of the study official databases. The data on the cultural structure and activities gathered from the Turkish Statistical Institute and the data related to the tourism demand was provided by the Republic of Turkey Ministry of Culture and Tourism.

Keywords: cultural activities, cultural structure, spatial characteristics, tourism demand, Turkey

Procedia PDF Downloads 538
41467 Transportation Accidents Mortality Modeling in Thailand

Authors: W. Sriwattanapongse, S. Prasitwattanaseree, S. Wongtrangan

Abstract:

The transportation accidents mortality is a major problem that leads to loss of human lives, and economic. The objective was to identify patterns of statistical modeling for estimating mortality rates due to transportation accidents in Thailand by using data from 2000 to 2009. The data was taken from the death certificate, vital registration database. The number of deaths and mortality rates were computed classifying by gender, age, year and region. There were 114,790 cases of transportation accidents deaths. The highest average age-specific transport accident mortality rate is 3.11 per 100,000 per year in males, Southern region and the lowest average age-specific transport accident mortality rate is 1.79 per 100,000 per year in females, North-East region. Linear, poisson and negative binomial models were chosen for fitting statistical model. Among the models fitted, the best was chosen based on the analysis of deviance and AIC. The negative binomial model was clearly appropriate fitted.

Keywords: transportation accidents, mortality, modeling, analysis of deviance

Procedia PDF Downloads 227
41466 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on $k$-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms.

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 144
41465 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Darlington Mapiye, Mpho Mokoatle, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0 %, 80.5 %, 80.5 %, 63.6 %, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 131
41464 Statistical Characteristics of Code Formula for Design of Concrete Structures

Authors: Inyeol Paik, Ah-Ryang Kim

Abstract:

In this research, a statistical analysis is carried out to examine the statistical properties of the formula given in the design code for concrete structures. The design formulas of the Korea highway bridge design code - the limit state design method (KHBDC) which is the current national bridge design code and the design code for concrete structures by Korea Concrete Institute (KCI) are applied for the analysis. The safety levels provided by the strength formulas of the design codes are defined based on the probabilistic and statistical theory.KHBDC is a reliability-based design code. The load and resistance factors of this code were calibrated to attain the target reliability index. It is essential to define the statistical properties for the design formulas in this calibration process. In general, the statistical characteristics of a member strength are due to the following three factors. The first is due to the difference between the material strength of the actual construction and that used in the design calculation. The second is the difference between the actual dimensions of the constructed sections and those used in design calculation. The third is the difference between the strength of the actual member and the formula simplified for the design calculation. In this paper, the statistical study is focused on the third difference. The formulas for calculating the shear strength of concrete members are presented in different ways in KHBDC and KCI. In this study, the statistical properties of design formulas were obtained through comparison with the database which comprises the experimental results from the reference publications. The test specimen was either reinforced with the shear stirrup or not. For an applied database, the bias factor was about 1.12 and the coefficient of variation was about 0.18. By applying the statistical properties of the design formula to the reliability analysis, it is shown that the resistance factors of the current design codes satisfy the target reliability indexes of both codes. Also, the minimum resistance factors of the KHBDC which is written in the material resistance factor format and KCE which is in the member resistance format are obtained and the results are presented. A further research is underway to calibrate the resistance factors of the high strength and high-performance concrete design guide.

Keywords: concrete design code, reliability analysis, resistance factor, shear strength, statistical property

Procedia PDF Downloads 299
41463 Impact of Gaming Environment in Education

Authors: Md. Ataur Rahman Bhuiyan, Quazi Mahabubul Hasan, Md. Rifat Ullah

Abstract:

In this research, we did explore the effectiveness of the gaming environment in education and compared it with the traditional education system. We take several workshops in both learning environments. We measured student’s performance by providing a grading score (by professional academics) on their attitude in different criteria. We also collect data from survey questionnaires to understand student’s experiences towards education and study. Finally, we examine the impact of the different learning environments by applying statistical hypothesis tests, the T-test, and the ANOVA test.

Keywords: gamification, game-based learning, education, statistical analysis, human-computer interaction

Procedia PDF Downloads 201