Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 43115

Search results for: statistical data analysis

43025 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on $k$-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms.

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 174

43024 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Darlington Mapiye, Mpho Mokoatle, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0 %, 80.5 %, 80.5 %, 63.6 %, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 165

43023 Impact of Climate on Sugarcane Yield Over Belagavi District, Karnataka Using Statistical Mode

Authors: Girish Chavadappanavar

Abstract:

The impact of climate on agriculture could result in problems with food security and may threaten the livelihood activities upon which much of the population depends. In the present study, the development of a statistical yield forecast model has been carried out for sugarcane production over Belagavi district, Karnataka using weather variables of crop growing season and past observed yield data for the period of 1971 to 2010. The study shows that this type of statistical yield forecast model could efficiently forecast yield 5 weeks and even 10 weeks in advance of the harvest for sugarcane within an acceptable limit of error. The performance of the model in predicting yields at the district level for sugarcane crops is found quite satisfactory for both validation (2007 and 2008) as well as forecasting (2009 and 2010).In addition to the above study, the climate variability of the area has also been studied, and hence, the data series was tested for Mann Kendall Rank Statistical Test. The maximum and minimum temperatures were found to be significant with opposite trends (decreasing trend in maximum and increasing in minimum temperature), while the other three are found in significant with different trends (rainfall and evening time relative humidity with increasing trend and morning time relative humidity with decreasing trend).

Keywords: climate impact, regression analysis, yield and forecast model, sugar models

Procedia PDF Downloads 76

43022 Regional Flood-Duration-Frequency Models for Norway

Authors: Danielle M. Barna, Kolbjørn Engeland, Thordis Thorarinsdottir, Chong-Yu Xu

Abstract:

Design flood values give estimates of flood magnitude within a given return period and are essential to making adaptive decisions around land use planning, infrastructure design, and disaster mitigation. Often design flood values are needed at locations with insufficient data. Additionally, in hydrologic applications where flood retention is important (e.g., floodplain management and reservoir design), design flood values are required at different flood durations. A statistical approach to this problem is a development of a regression model for extremes where some of the parameters are dependent on flood duration in addition to being covariate-dependent. In hydrology, this is called a regional flood-duration-frequency (regional-QDF) model. Typically, the underlying statistical distribution is chosen to be the Generalized Extreme Value (GEV) distribution. However, as the support of the GEV distribution depends on both its parameters and the range of the data, special care must be taken with the development of the regional model. In particular, we find that the GEV is problematic when developing a GAMLSS-type analysis due to the difficulty of proposing a link function that is independent of the unknown parameters and the observed data. We discuss these challenges in the context of developing a regional QDF model for Norway.

Keywords: design flood values, bayesian statistics, regression modeling of extremes, extreme value analysis, GEV

Procedia PDF Downloads 75

43021 Using Business Intelligence Capabilities to Improve the Quality of Decision-Making: A Case Study of Mellat Bank

Authors: Jalal Haghighat Monfared, Zahra Akbari

Abstract:

Today, business executives need to have useful information to make better decisions. Banks have also been using information tools so that they can direct the decision-making process in order to achieve their desired goals by rapidly extracting information from sources with the help of business intelligence. The research seeks to investigate whether there is a relationship between the quality of decision making and the business intelligence capabilities of Mellat Bank. Each of the factors studied is divided into several components, and these and their relationships are measured by a questionnaire. The statistical population of this study consists of all managers and experts of Mellat Bank's General Departments (including 190 people) who use commercial intelligence reports. The sample size of this study was 123 randomly determined by statistical method. In this research, relevant statistical inference has been used for data analysis and hypothesis testing. In the first stage, using the Kolmogorov-Smirnov test, the normalization of the data was investigated and in the next stage, the construct validity of both variables and their resulting indexes were verified using confirmatory factor analysis. Finally, using the structural equation modeling and Pearson's correlation coefficient, the research hypotheses were tested. The results confirmed the existence of a positive relationship between decision quality and business intelligence capabilities in Mellat Bank. Among the various capabilities, including data quality, correlation with other systems, user access, flexibility and risk management support, the flexibility of the business intelligence system was the most correlated with the dependent variable of the present research. This shows that it is necessary for Mellat Bank to pay more attention to choose the required business intelligence systems with high flexibility in terms of the ability to submit custom formatted reports. Subsequently, the quality of data on business intelligence systems showed the strongest relationship with quality of decision making. Therefore, improving the quality of data, including the source of data internally or externally, the type of data in quantitative or qualitative terms, the credibility of the data and perceptions of who uses the business intelligence system, improves the quality of decision making in Mellat Bank.

Keywords: business intelligence, business intelligence capability, decision making, decision quality

Procedia PDF Downloads 115

43020 R Data Science for Technology Management

Authors: Sunghae Jun

Abstract:

Technology management (TM) is important issue in a company improving the competitiveness. Among many activities of TM, technology analysis (TA) is important factor, because most decisions for management of technology are decided by the results of TA. TA is to analyze the developed results of target technology using statistics or Delphi. TA based on Delphi is depended on the experts’ domain knowledge, in comparison, TA by statistics and machine learning algorithms use objective data such as patent or paper instead of the experts’ knowledge. Many quantitative TA methods based on statistics and machine learning have been studied, and these have been used for technology forecasting, technological innovation, and management of technology. They applied diverse computing tools and many analytical methods case by case. It is not easy to select the suitable software and statistical method for given TA work. So, in this paper, we propose a methodology for quantitative TA using statistical computing software called R and data science to construct a general framework of TA. From the result of case study, we also show how our methodology is applied to real field. This research contributes to R&D planning and technology valuation in TM areas.

Keywords: technology management, R system, R data science, statistics, machine learning

Procedia PDF Downloads 460

43019 Analysis of Lead Time Delays in Supply Chain: A Case Study

Authors: Abdel-Aziz M. Mohamed, Nermeen Coutry

Abstract:

Lead time is an important measure of supply chain performance. It impacts both customer satisfactions as well as the total cost of inventory. This paper presents the result of a study on the analysis of the customer order lead-time for a multinational company. In the study, the lead time was divided into three stages: order entry, order fulfillment, and order delivery. A sample of size 2,425 order lines from the company records were considered for this study. The sample data includes information regarding customer orders from the time of order entry until order delivery. Data regarding the lead time of each sage for different orders were also provided. Summary statistics on lead time data reveals that about 30% of the orders were delivered after the scheduled due date. The result of the multiple linear regression analysis technique revealed that component type, logistics parameter, order size and the customer type have significant impact on lead time. Data analysis on the stages of lead time indicates that stage 2 consumes over 50% of the lead time. Pareto analysis was made to study the reasons for the customer order delay in each of the 3 stages. Recommendation was given to resolve the problem.

Keywords: lead time reduction, customer satisfaction, service quality, statistical analysis

Procedia PDF Downloads 737

43018 Analysis and Prediction of Netflix Viewing History Using Netflixlatte as an Enriched Real Data Pool

Authors: Amir Mabhout, Toktam Ghafarian, Amirhossein Farzin, Zahra Makki, Sajjad Alizadeh, Amirhossein Ghavi

Abstract:

The high number of Netflix subscribers makes it attractive for data scientists to extract valuable knowledge from the viewers' behavioural analyses. This paper presents a set of statistical insights into viewers' viewing history. After that, a deep learning model is used to predict the future watching behaviour of the users based on previous watching history within the Netflixlatte data pool. Netflixlatte in an aggregated and anonymized data pool of 320 Netflix viewers with a length 250 000 data points recorded between 2008-2022. We observe insightful correlations between the distribution of viewing time and the COVID-19 pandemic outbreak. The presented deep learning model predicts future movie and TV series viewing habits with an average loss of 0.175.

Keywords: data analysis, deep learning, LSTM neural network, netflix

Procedia PDF Downloads 264

43017 GPS Refinement in Cities Using Statistical Approach

Authors: Ashwani Kumar

Abstract:

GPS plays an important role in everyday life for safe and convenient transportation. While pedestrians use hand held devices to know their position in a city, vehicles in intelligent transport systems use relatively sophisticated GPS receivers for estimating their current position. However, in urban areas where the GPS satellites are occluded by tall buildings, trees and reflections of GPS signals from nearby vehicles, GPS position estimation becomes poor. In this work, an exhaustive GPS data is collected at a single point in urban area under different times of day and under dynamic environmental conditions. The data is analyzed and statistical refinement methods are used to obtain optimal position estimate among all the measured positions. The results obtained are compared with publically available datasets and obtained position estimation refinement results are promising.

Keywords: global positioning system, statistical approach, intelligent transport systems, least squares estimation

Procedia PDF Downloads 290

43016 Space Telemetry Anomaly Detection Based On Statistical PCA Algorithm

Authors: Bassem Nassar, Wessam Hussein, Medhat Mokhtar

Abstract:

The crucial concern of satellite operations is to ensure the health and safety of satellites. The worst case in this perspective is probably the loss of a mission but the more common interruption of satellite functionality can result in compromised mission objectives. All the data acquiring from the spacecraft are known as Telemetry (TM), which contains the wealth information related to the health of all its subsystems. Each single item of information is contained in a telemetry parameter, which represents a time-variant property (i.e. a status or a measurement) to be checked. As a consequence, there is a continuous improvement of TM monitoring systems in order to reduce the time required to respond to changes in a satellite's state of health. A fast conception of the current state of the satellite is thus very important in order to respond to occurring failures. Statistical multivariate latent techniques are one of the vital learning tools that are used to tackle the aforementioned problem coherently. Information extraction from such rich data sources using advanced statistical methodologies is a challenging task due to the massive volume of data. To solve this problem, in this paper, we present a proposed unsupervised learning algorithm based on Principle Component Analysis (PCA) technique. The algorithm is particularly applied on an actual remote sensing spacecraft. Data from the Attitude Determination and Control System (ADCS) was acquired under two operation conditions: normal and faulty states. The models were built and tested under these conditions and the results shows that the algorithm could successfully differentiate between these operations conditions. Furthermore, the algorithm provides competent information in prediction as well as adding more insight and physical interpretation to the ADCS operation.

Keywords: space telemetry monitoring, multivariate analysis, PCA algorithm, space operations

Procedia PDF Downloads 420

43015 Statistical Convergence for the Approximation of Linear Positive Operators

Authors: Neha Bhardwaj

Abstract:

In this paper, we consider positive linear operators and study the Voronovskaya type result of the operator then obtain an error estimate in terms of the higher order modulus of continuity of the function being approximated and its A-statistical convergence. Also, we compute the corresponding rate of A-statistical convergence for the linear positive operators.

Keywords: Poisson distribution, Voronovskaya, modulus of continuity, a-statistical convergence

Procedia PDF Downloads 337

43014 Statistical Shape Analysis of the Human Upper Airway

Authors: Ramkumar Gunasekaran, John Cater, Vinod Suresh, Haribalan Kumar

Abstract:

The main objective of this project is to develop a statistical shape model using principal component analysis that could be used for analyzing the shape of the human airway. The ultimate goal of this project is to identify geometric risk factors for diagnosis and management of Obstructive Sleep Apnoea (OSA). Anonymous CBCT scans of 25 individuals were obtained from the Otago Radiology Group. The airways were segmented between the hard-palate and the aryepiglottic fold using snake active contour segmentation. The point data cloud of the segmented images was then fitted with a bi-cubic mesh, and pseudo landmarks were placed to perform PCA on the segmented airway to analyze the shape of the airway and to find the relationship between the shape and OSA risk factors. From the PCA results, the first four modes of variation were found to be significant. Mode 1 was interpreted to be the overall length of the airway, Mode 2 was related to the anterior-posterior width of the retroglossal region, Mode 3 was related to the lateral dimension of the oropharyngeal region and Mode 4 was related to the anterior-posterior width of the oropharyngeal region. All these regions are subjected to the risk factors of OSA.

Keywords: medical imaging, image processing, FEM/BEM, statistical modelling

Procedia PDF Downloads 518

43013 Forecasting the Influences of Information and Communication Technology on the Structural Changes of Japanese Industrial Sectors: A Study Using Statistical Analysis

Authors: Ubaidillah Zuhdi, Shunsuke Mori, Kazuhisa Kamegai

Abstract:

The purpose of this study is to forecast the influences of Information and Communication Technology (ICT) on the structural changes of Japanese economies based on Leontief Input-Output (IO) coefficients. This study establishes a statistical analysis to predict the future interrelationships among industries. We employ the Constrained Multivariate Regression (CMR) model to analyze the historical changes of input-output coefficients. Statistical significance of the model is then tested by Likelihood Ratio Test (LRT). In our model, ICT is represented by two explanatory variables, i.e. computers (including main parts and accessories) and telecommunications equipment. A previous study, which analyzed the influences of these variables on the structural changes of Japanese industrial sectors from 1985-2005, concluded that these variables had significant influences on the changes in the business circumstances of Japanese commerce, business services and office supplies, and personal services sectors. The projected future Japanese economic structure based on the above forecast generates the differentiated direct and indirect outcomes of ICT penetration.

Keywords: forecast, ICT, industrial structural changes, statistical analysis

Procedia PDF Downloads 378

43012 Application of Stochastic Models to Annual Extreme Streamflow Data

Authors: Karim Hamidi Machekposhti, Hossein Sedghi

Abstract:

This study was designed to find the best stochastic model (using of time series analysis) for annual extreme streamflow (peak and maximum streamflow) of Karkheh River at Iran. The Auto-regressive Integrated Moving Average (ARIMA) model used to simulate these series and forecast those in future. For the analysis, annual extreme streamflow data of Jelogir Majin station (above of Karkheh dam reservoir) for the years 1958–2005 were used. A visual inspection of the time plot gives a little increasing trend; therefore, series is not stationary. The stationarity observed in Auto-Correlation Function (ACF) and Partial Auto-Correlation Function (PACF) plots of annual extreme streamflow was removed using first order differencing (d=1) in order to the development of the ARIMA model. Interestingly, the ARIMA(4,1,1) model developed was found to be most suitable for simulating annual extreme streamflow for Karkheh River. The model was found to be appropriate to forecast ten years of annual extreme streamflow and assist decision makers to establish priorities for water demand. The Statistical Analysis System (SAS) and Statistical Package for the Social Sciences (SPSS) codes were used to determinate of the best model for this series.

Keywords: stochastic models, ARIMA, extreme streamflow, Karkheh river

Procedia PDF Downloads 151

43011 A Cross-Dialect Statistical Analysis of Final Declarative Intonation in Tuvinian

Authors: D. Beziakina, E. Bulgakova

Abstract:

This study continues the research on Tuvinian intonation and presents a general cross-dialect analysis of intonation of Tuvinian declarative utterances, specifically the character of the tone movement in order to test the hypothesis about the prevalence of level tone in some Tuvinian dialects. The results of the analysis of basic pitch characteristics of Tuvinian speech (in general and in comparison with two other Turkic languages - Uzbek and Azerbaijani) are also given in this paper. The goal of our work was to obtain the ranges of pitch parameter values typical for Tuvinian speech. Such language-specific values can be used in speaker identification systems in order to get more accurate results of ethnic speech analysis. We also present the results of a cross-dialect analysis of declarative intonation in the poorly studied Tuvinian language.

Keywords: speech analysis, statistical analysis, speaker recognition, identification of person

Procedia PDF Downloads 476

43010 Improving Road Infrastructure Safety Management Through Statistical Analysis of Road Accident Data. Case Study: Streets in Bucharest

Authors: Dimitriu Corneliu-Ioan, Gheorghe FrațIlă

Abstract:

Romania has one of the highest rates of road deaths among European Union Member States, and there is a concern that the country will not meet its goal of "zero deaths" by 2050. The European Union also aims to halve the number of people seriously injured in road accidents by 2030. Therefore, there is a need to improve road infrastructure safety management in Romania. The aim of this study is to analyze road accident data through statistical methods to assess the current state of road infrastructure safety in Bucharest. The study also aims to identify trends and make forecasts regarding serious road accidents and their consequences. The objective is to provide insights that can help prioritize measures to increase road safety, particularly in urban areas. The research utilizes statistical analysis methods, including exploratory analysis and descriptive statistics. Databases from the Traffic Police and the Romanian Road Authority are analyzed using Excel. Road risks are compared with the main causes of road accidents to identify correlations. The study emphasizes the need for better quality and more diverse collection of road accident data for effective analysis in the field of road infrastructure engineering. The research findings highlight the importance of prioritizing measures to improve road safety in urban areas, where serious accidents and their consequences are more frequent. There is a correlation between the measures ordered by road safety auditors and the main causes of serious accidents in Bucharest. The study also reveals the significant social costs of road accidents, amounting to approximately 3% of GDP, emphasizing the need for collaboration between local and central administrations in allocating resources for road safety. This research contributes to a clearer understanding of the current road infrastructure safety situation in Romania. The findings provide critical insights that can aid decision-makers in allocating resources efficiently and institutionally cooperating to achieve sustainable road safety. The data used for this study are collected from the Traffic Police and the Romanian Road Authority. The data processing involves exploratory analysis and descriptive statistics using the Excel tool. The analysis allows for a better understanding of the factors contributing to the current road safety situation and helps inform managerial decisions to eliminate or reduce road risks. The study addresses the state of road infrastructure safety in Bucharest and analyzes the trends and forecasts regarding serious road accidents and their consequences. It studies the correlation between road safety measures and the main causes of serious accidents. To improve road safety, cooperation between local and central administrations towards joint financial efforts is important. This research highlights the need for statistical data processing methods to substantiate managerial decisions in road infrastructure management. It emphasizes the importance of improving the quality and diversity of road accident data collection. The research findings provide a critical perspective on the current road safety situation in Romania and offer insights to identify appropriate solutions to reduce the number of serious road accidents in the future.

Keywords: road death rate, strategic objective, serious road accidents, road safety, statistical analysis

Procedia PDF Downloads 90

43009 The Inherent Flaw in the NBA Playoff Structure

Authors: Larry Turkish

Abstract:

Introduction: The NBA is an example of mediocrity and this will be evident in the following paper. The study examines and evaluates the characteristics of the NBA champions. As divisions and playoff teams increase, there is an increase in the probability that the champion originates from the mediocre category. Since it’s inception in 1947, the league has been mediocre and continues to this day. Why does a professional league allow any team with a less than 50% winning percentage into the playoffs? As long as the finances flow into the league, owners will not change the current algorithm. The objective of this paper is to determine if the regular season has meaning in finding an NBA champion. Statistical Analysis: The data originates from the NBA website. The following variables are part of the statistical analysis: Rank, the rank of a team relative to other teams in the league based on the regular season win-loss record; Winning Percentage of a team based on the regular season; Divisions, the number of divisions within the league and Playoff Teams, the number of playoff teams relative to a particular season. The following statistical applications are applied to the data: Pearson Product-Moment Correlation, Analysis of Variance, Factor and Regression analysis. Conclusion: The results indicate that the divisional structure and number of playoff teams results in a negative effect on the winning percentage of playoff teams. It also prevents teams with higher winning percentages from accessing the playoffs. Recommendations: 1. Teams that have a winning percentage greater than 1 standard deviation from the mean from the regular season will have access to playoffs. (Eliminates mediocre teams.) 2. Eliminate Divisions (Eliminates weaker teams from access to playoffs.) 3. Eliminate Conferences (Eliminates weaker teams from access to the playoffs.) 4. Have a balanced regular season schedule, (Reduces the number of regular season games, creates equilibrium, reduces bias) that will reduce the need for load management.

Keywords: alignment, mediocrity, regression, z-score

Procedia PDF Downloads 133

43008 Spatial Data Science for Data Driven Urban Planning: The Youth Economic Discomfort Index for Rome

Authors: Iacopo Testi, Diego Pajarito, Nicoletta Roberto, Carmen Greco

Abstract:

Today, a consistent segment of the world’s population lives in urban areas, and this proportion will vastly increase in the next decades. Therefore, understanding the key trends in urbanization, likely to unfold over the coming years, is crucial to the implementation of sustainable urban strategies. In parallel, the daily amount of digital data produced will be expanding at an exponential rate during the following years. The analysis of various types of data sets and its derived applications have incredible potential across different crucial sectors such as healthcare, housing, transportation, energy, and education. Nevertheless, in city development, architects and urban planners appear to rely mostly on traditional and analogical techniques of data collection. This paper investigates the prospective of the data science field, appearing to be a formidable resource to assist city managers in identifying strategies to enhance the social, economic, and environmental sustainability of our urban areas. The collection of different new layers of information would definitely enhance planners' capabilities to comprehend more in-depth urban phenomena such as gentrification, land use definition, mobility, or critical infrastructural issues. Specifically, the research results correlate economic, commercial, demographic, and housing data with the purpose of defining the youth economic discomfort index. The statistical composite index provides insights regarding the economic disadvantage of citizens aged between 18 years and 29 years, and results clearly display that central urban zones and more disadvantaged than peripheral ones. The experimental set up selected the city of Rome as the testing ground of the whole investigation. The methodology aims at applying statistical and spatial analysis to construct a composite index supporting informed data-driven decisions for urban planning.

Keywords: data science, spatial analysis, composite index, Rome, urban planning, youth economic discomfort index

Procedia PDF Downloads 138

43007 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Model

Authors: Alam Ali, Ashok Kumar Pathak

Abstract:

Path analysis is a statistical technique used to evaluate the strength of the direct and indirect effects of variables. One or more structural regression equations are used to estimate a series of parameters in order to find the better fit of data. Sometimes, exogenous variables do not show a significant strength of their direct and indirect effect when the assumption of classical regression (ordinary least squares (OLS)) are violated by the nature of the data. The main motive of this article is to investigate the efficacy of the copula-based regression approach over the classical regression approach and calculate the direct and indirect effects of variables when data violates the OLS assumption and variables are linked through an elliptical copula. We perform this study using a well-organized numerical scheme. Finally, a real data application is also presented to demonstrate the performance of the superiority of the copula approach.

Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique

Procedia PDF Downloads 77

43006 The Use of Multivariate Statistical and GIS for Characterization Groundwater Quality in Laghouat Region, Algeria

Authors: Rouighi Mustapha, Bouzid Laghaa Souad, Rouighi Tahar

Abstract:

Due to rain Shortage and the increase of population in the last years, wells excavation and groundwater use for different purposes had been increased without any planning. This is a great challenge for our country. Moreover, this scarcity of water resources in this region is unfortunately combined with rapid fresh water resources quality deterioration, due to salinity and contamination processes. Therefore, it is necessary to conduct the studies about groundwater quality in Algeria. In this work consists in the identification of the factors which influence the water quality parameters in Laghouat region by using statistical analysis Principal Component Analysis (PCA), Hierarchical Cluster Analysis (HCA) and geographic information system (GIS) in an attempt to discriminate the sources of the variation of water quality variations. The results of PCA technique indicate that variables responsible for water quality composition are mainly related to soluble salts variables; natural processes and the nature of the rock which modifies significantly the water chemistry. Inferred from the positive correlation between K+ and NO3-, NO3- is believed to be human induced rather than naturally originated. In this study, the multivariate statistical analysis and GIS allows the hydrogeologist to have supplementary tools in the characterization and evaluating of aquifers.

Keywords: cluster, analysis, GIS, groundwater, laghouat, quality

Procedia PDF Downloads 329

43005 Simulations to Predict Solar Energy Potential by ERA5 Application at North Africa

Authors: U. Ali Rahoma, Nabil Esawy, Fawzia Ibrahim Moursy, A. H. Hassan, Samy A. Khalil, Ashraf S. Khamees

Abstract:

The design of any solar energy conversion system requires the knowledge of solar radiation data obtained over a long period. Satellite data has been widely used to estimate solar energy where no ground observation of solar radiation is available, yet there are limitations on the temporal coverage of satellite data. Reanalysis is a “retrospective analysis” of the atmosphere parameters generated by assimilating observation data from various sources, including ground observation, satellites, ships, and aircraft observation with the output of NWP (Numerical Weather Prediction) models, to develop an exhaustive record of weather and climate parameters. The evaluation of the performance of reanalysis datasets (ERA-5) for North Africa against high-quality surface measured data was performed using statistical analysis. The estimation of global solar radiation (GSR) distribution over six different selected locations in North Africa during ten years from the period time 2011 to 2020. The root means square error (RMSE), mean bias error (MBE) and mean absolute error (MAE) of reanalysis data of solar radiation range from 0.079 to 0.222, 0.0145 to 0.198, and 0.055 to 0.178, respectively. The seasonal statistical analysis was performed to study seasonal variation of performance of datasets, which reveals the significant variation of errors in different seasons—the performance of the dataset changes by changing the temporal resolution of the data used for comparison. The monthly mean values of data show better performance, but the accuracy of data is compromised. The solar radiation data of ERA-5 is used for preliminary solar resource assessment and power estimation. The correlation coefficient (R2) varies from 0.93 to 99% for the different selected sites in North Africa in the present research. The goal of this research is to give a good representation for global solar radiation to help in solar energy application in all fields, and this can be done by using gridded data from European Centre for Medium-Range Weather Forecasts ECMWF and producing a new model to give a good result.

Keywords: solar energy, solar radiation, ERA-5, potential energy

Procedia PDF Downloads 216

43004 Statistical Assessment of Models for Determination of Soil–Water Characteristic Curves of Sand Soils

Authors: S. J. Matlan, M. Mukhlisin, M. R. Taha

Abstract:

Characterization of the engineering behavior of unsaturated soil is dependent on the soil-water characteristic curve (SWCC), a graphical representation of the relationship between water content or degree of saturation and soil suction. A reasonable description of the SWCC is thus important for the accurate prediction of unsaturated soil parameters. The measurement procedures for determining the SWCC, however, are difficult, expensive, and time-consuming. During the past few decades, researchers have laid a major focus on developing empirical equations for predicting the SWCC, with a large number of empirical models suggested. One of the most crucial questions is how precisely existing equations can represent the SWCC. As different models have different ranges of capability, it is essential to evaluate the precision of the SWCC models used for each particular soil type for better SWCC estimation. It is expected that better estimation of SWCC would be achieved via a thorough statistical analysis of its distribution within a particular soil class. With this in view, a statistical analysis was conducted in order to evaluate the reliability of the SWCC prediction models against laboratory measurement. Optimization techniques were used to obtain the best-fit of the model parameters in four forms of SWCC equation, using laboratory data for relatively coarse-textured (i.e., sandy) soil. The four most prominent SWCCs were evaluated and computed for each sample. The result shows that the Brooks and Corey model is the most consistent in describing the SWCC for sand soil type. The Brooks and Corey model prediction also exhibit compatibility with samples ranging from low to high soil water content in which subjected to the samples that evaluated in this study.

Keywords: soil-water characteristic curve (SWCC), statistical analysis, unsaturated soil, geotechnical engineering

Procedia PDF Downloads 342

43003 Degumming of Eri Silk Fabric with Ionic Liquid

Authors: Shweta K. Vyas, Rakesh Musale, Sanjeev R. Shukla

Abstract:

Eri silk is a non mulberry silk which is obtained without killing the silkworms and hence it is also known as Ahmisa silk. In the present study, the results on degumming of eri silk with alkaline peroxide have been compared with those obtained by using ionic liquid (IL) 1-Butyl-3-methylimidazolium chloride [BMIM]Cl. Experiments were designed to find out the optimum processing parameters for degumming of eri silk by response surface methodology. The statistical software, Design-Expert 6.0 was used for regression analysis and graphical analysis of the responses obtained by running the set of designed experiments. Analysis of variance (ANOVA) was used to estimate the statistical parameters. The polynomial equation of quadratic order was employed to fit the experimental data. The quality and model terms were evaluated by F-test. Three dimensional surface plots were prepared to study the effect of variables on different responses. The optimum conditions for IL treatment were selected from predicted combinations and the experiments were repeated under these conditions to determine the reproducibility.

Keywords: silk degumming, ionic liquid, response surface methodology, ANOVA

Procedia PDF Downloads 596

43002 Multivariate Analysis of Spectroscopic Data for Agriculture Applications

Authors: Asmaa M. Hussein, Amr Wassal, Ahmed Farouk Al-Sadek, A. F. Abd El-Rahman

Abstract:

In this study, a multivariate analysis of potato spectroscopic data was presented to detect the presence of brown rot disease or not. Near-Infrared (NIR) spectroscopy (1,350-2,500 nm) combined with multivariate analysis was used as a rapid, non-destructive technique for the detection of brown rot disease in potatoes. Spectral measurements were performed in 565 samples, which were chosen randomly at the infection place in the potato slice. In this study, 254 infected and 311 uninfected (brown rot-free) samples were analyzed using different advanced statistical analysis techniques. The discrimination performance of different multivariate analysis techniques, including classification, pre-processing, and dimension reduction, were compared. Applying a random forest algorithm classifier with different pre-processing techniques to raw spectra had the best performance as the total classification accuracy of 98.7% was achieved in discriminating infected potatoes from control.

Keywords: Brown rot disease, NIR spectroscopy, potato, random forest

Procedia PDF Downloads 193

43001 A Data Envelopment Analysis Model in a Multi-Objective Optimization with Fuzzy Environment

Authors: Michael Gidey Gebru

Abstract:

Most of Data Envelopment Analysis models operate in a static environment with input and output parameters that are chosen by deterministic data. However, due to ambiguity brought on shifting market conditions, input and output data are not always precisely gathered in real-world scenarios. Fuzzy numbers can be used to address this kind of ambiguity in input and output data. Therefore, this work aims to expand crisp Data Envelopment Analysis into Data Envelopment Analysis with fuzzy environment. In this study, the input and output data are regarded as fuzzy triangular numbers. Then, the Data Envelopment Analysis model with fuzzy environment is solved using a multi-objective method to gauge the Decision Making Units' efficiency. Finally, the developed Data Envelopment Analysis model is illustrated with an application on real data 50 educational institutions.

Keywords: efficiency, Data Envelopment Analysis, fuzzy, higher education, input, output

Procedia PDF Downloads 68

43000 Resistance and Sub-Resistances of RC Beams Subjected to Multiple Failure Modes

Authors: F. Sangiorgio, J. Silfwerbrand, G. Mancini

Abstract:

Geometric and mechanical properties all influence the resistance of RC structures and may, in certain combination of property values, increase the risk of a brittle failure of the whole system. This paper presents a statistical and probabilistic investigation on the resistance of RC beams designed according to Eurocodes 2 and 8, and subjected to multiple failure modes, under both the natural variation of material properties and the uncertainty associated with cross-section and transverse reinforcement geometry. A full probabilistic model based on JCSS Probabilistic Model Code is derived. Different beams are studied through material nonlinear analysis via Monte Carlo simulations. The resistance model is consistent with Eurocode 2. Both a multivariate statistical evaluation and the data clustering analysis of outcomes are then performed. Results show that the ultimate load behaviour of RC beams subjected to flexural and shear failure modes seems to be mainly influenced by the combination of the mechanical properties of both longitudinal reinforcement and stirrups, and the tensile strength of concrete, of which the latter appears to affect the overall response of the system in a nonlinear way. The model uncertainty of the resistance model used in the analysis plays undoubtedly an important role in interpreting results.

Keywords: modelling, Monte Carlo simulations, probabilistic models, data clustering, reinforced concrete members, structural design

Procedia PDF Downloads 474

42999 Qualitative Data Analysis for Health Care Services

Authors: Taner Ersoz, Filiz Ersoz

Abstract:

This study was designed enable application of multivariate technique in the interpretation of categorical data for measuring health care services satisfaction in Turkey. The data was collected from a total of 17726 respondents. The establishment of the sample group and collection of the data were carried out by a joint team from The Ministry of Health and Turkish Statistical Institute (Turk Stat) of Turkey. The multiple correspondence analysis (MCA) was used on the data of 2882 respondents who answered the questionnaire in full. The multiple correspondence analysis indicated that, in the evaluation of health services females, public employees, younger and more highly educated individuals were more concerned and complainant than males, private sector employees, older and less educated individuals. Overall 53 % of the respondents were pleased with the improvements in health care services in the past three years. This study demonstrates the public consciousness in health services and health care satisfaction in Turkey. It was found that most the respondents were pleased with the improvements in health care services over the past three years. Awareness of health service quality increases with education levels. Older individuals and males would appear to have lower expectancies in health services.

Keywords: multiple correspondence analysis, multivariate categorical data, health care services, health satisfaction survey

Procedia PDF Downloads 249

42998 Examining the Attitudes of Pre-School Teachers towards Values Education in Terms of Gender, School Type, Professional Seniority and Location

Authors: Hatice Karakoyun, Mustafa Akdag

Abstract:

This study has been made to examine the attitudes of pre-school teachers towards values education. The study has been made as a general scanning model. The study’s working group contains 108 pre-school teachers who worked in Diyarbakır, Turkey. In this study Values Education Attitude Scale (VEAS), which developed by Yaşaroğlu (2014), was used. In order to analyze the data for sociodemographic structure, percentage and frequency values were examined. The Kolmogorov-Smirnov method was used in determination of the normal distribution of data. During analyzing the data, KolmogorovSimirnov test and the normal curved histograms were examined to determine which statistical analyzes would be applied on the scale and it was found that the distribution was not normal. Thus, the Mann Whitney U analysis technique which is one of the nonparametric statistical analysis techniques were used to test the difference of the scores obtained from the scale in terms of independent variables. According to the analyses, it seems that pre-school teachers’ attitudes toward values education are positive. According to the scale with the highest average, it points out that pre-school teachers think that values education is very important for students’ and children’s future. The variables included in the scale (gender, seniority, age group, education, school type, school place) seem to have no effect on the pre-school teachers’ attitude grades which joined to the study.

Keywords: attitude scale, pedagogy, pre-school teacher, values education

Procedia PDF Downloads 250

42997 A Comparation Analysis of Islamic Bank Efficiency in the United Kingdom and Indonesia during Eurozone Crisis Using Data Envelopment Analysis

Authors: Nisful Laila, Fatin Fadhilah Hasib, Puji Sucia Sukmaningrum, Achsania Hendratmi

Abstract:

The purpose of this study is to determine and comparing the level of efficiency of Islamic Banks in Indonesia and United Kingdom during eurozone sovereign debt crisis. This study using a quantitative non-parametric approach with Data Envelopment Analysis (DEA) VRS assumption, and a statistical tool Mann-Whitney U-Test. The samples are 11 Islamic Banks in Indonesia and 4 Islamic Banks in England. This research used mediating approach. Input variable consists of total deposit, asset, and the cost of labour. Output variable consists of financing and profit/loss. This study shows that the efficiency of Islamic Bank in Indonesia and United Kingdom are varied and fluctuated during the observation period. There is no significant different the efficiency performance of Islamic Banks in Indonesia and United Kingdom.

Keywords: data envelopment analysis, efficiency, eurozone crisis, islamic bank

Procedia PDF Downloads 330

42996 The Quality Assessment of Seismic Reflection Survey Data Using Statistical Analysis: A Case Study of Fort Abbas Area, Cholistan Desert, Pakistan

Authors: U. Waqas, M. F. Ahmed, A. Mehmood, M. A. Rashid

Abstract:

In geophysical exploration surveys, the quality of acquired data holds significant importance before executing the data processing and interpretation phases. In this study, 2D seismic reflection survey data of Fort Abbas area, Cholistan Desert, Pakistan was taken as test case in order to assess its quality on statistical bases by using normalized root mean square error (NRMSE), Cronbach’s alpha test (α) and null hypothesis tests (t-test and F-test). The analysis challenged the quality of the acquired data and highlighted the significant errors in the acquired database. It is proven that the study area is plain, tectonically least affected and rich in oil and gas reserves. However, subsurface 3D modeling and contouring by using acquired database revealed high degrees of structural complexities and intense folding. The NRMSE had highest percentage of residuals between the estimated and predicted cases. The outcomes of hypothesis testing also proved the biasness and erraticness of the acquired database. Low estimated value of alpha (α) in Cronbach’s alpha test confirmed poor reliability of acquired database. A very low quality of acquired database needs excessive static correction or in some cases, reacquisition of data is also suggested which is most of the time not feasible on economic grounds. The outcomes of this study could be used to assess the quality of large databases and to further utilize as a guideline to establish database quality assessment models to make much more informed decisions in hydrocarbon exploration field.

Keywords: Data quality, Null hypothesis, Seismic lines, Seismic reflection survey

Procedia PDF Downloads 168

‹
1
2
3
4
5
6
7
8
9
10
...
1437
1438
›