Search results for: statistical data analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 41552

Search results for: statistical data analysis

41522 R Statistical Software Applied in Reliability Analysis: Case Study of Diesel Generator Fans

Authors: Jelena Vucicevic

Abstract:

Reliability analysis represents a very important task in different areas of work. In any industry, this is crucial for maintenance, efficiency, safety and monetary costs. There are ways to calculate reliability, unreliability, failure density and failure rate. This paper will try to introduce another way of calculating reliability by using R statistical software. R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. The R programming environment is a widely used open source system for statistical analysis and statistical programming. It includes thousands of functions for the implementation of both standard and new statistical methods. R does not limit user only to operation related only to these functions. This program has many benefits over other similar programs: it is free and, as an open source, constantly updated; it has built-in help system; the R language is easy to extend with user-written functions. The significance of the work is calculation of time to failure or reliability in a new way, using statistic. Another advantage of this calculation is that there is no need for technical details and it can be implemented in any part for which we need to know time to fail in order to have appropriate maintenance, but also to maximize usage and minimize costs. In this case, calculations have been made on diesel generator fans but the same principle can be applied to any other part. The data for this paper came from a field engineering study of the time to failure of diesel generator fans. The ultimate goal was to decide whether or not to replace the working fans with a higher quality fan to prevent future failures. Seventy generators were studied. For each one, the number of hours of running time from its first being put into service until fan failure or until the end of the study (whichever came first) was recorded. Dataset consists of two variables: hours and status. Hours show the time of each fan working and status shows the event: 1- failed, 0- censored data. Censored data represent cases when we cannot track the specific case, so it could fail or success. Gaining the result by using R was easy and quick. The program will take into consideration censored data and include this into the results. This is not so easy in hand calculation. For the purpose of the paper results from R program have been compared to hand calculations in two different cases: censored data taken as a failure and censored data taken as a success. In all three cases, results are significantly different. If user decides to use the R for further calculations, it will give more precise results with work on censored data than the hand calculation.

Keywords: censored data, R statistical software, reliability analysis, time to failure

Procedia PDF Downloads 383
41521 A Cross-Gender Statistical Analysis of Tuvinian Intonation Features in Comparison With Uzbek and Azerbaijani

Authors: Daria Beziakina, Elena Bulgakova

Abstract:

The paper deals with cross-gender and cross-linguistic comparison of pitch characteristics for Tuvinian with two other Turkic languages - Uzbek and Azerbaijani, based on the results of statistical analysis of pitch parameter values and intonation patterns used by male and female speakers. The main goal of our work is to obtain the ranges of pitch parameter values typical for Tuvinian speakers for the purpose of automatic language identification. We also propose a cross-gender analysis of declarative intonation in the poorly studied Tuvinian language. The ranges of pitch parameter values were obtained by means of specially developed software that deals with the distribution of pitch values and allows us to obtain statistical language-specific pitch intervals.

Keywords: speech analysis, statistical analysis, speaker recognition, identification of person

Procedia PDF Downloads 329
41520 Analysis of the Statistical Characterization of Significant Wave Data Exceedances for Designing Offshore Structures

Authors: Rui Teixeira, Alan O’Connor, Maria Nogal

Abstract:

The statistical theory of extreme events is progressively a topic of growing interest in all the fields of science and engineering. The changes currently experienced by the world, economic and environmental, emphasized the importance of dealing with extreme occurrences with improved accuracy. When it comes to the design of offshore structures, particularly offshore wind turbines, the importance of efficiently characterizing extreme events is of major relevance. Extreme events are commonly characterized by extreme values theory. As an alternative, the accurate modeling of the tails of statistical distributions and the characterization of the low occurrence events can be achieved with the application of the Peak-Over-Threshold (POT) methodology. The POT methodology allows for a more refined fit of the statistical distribution by truncating the data with a minimum value of a predefined threshold u. For mathematically approximating the tail of the empirical statistical distribution the Generalised Pareto is widely used. Although, in the case of the exceedances of significant wave data (H_s) the 2 parameters Weibull and the Exponential distribution, which is a specific case of the Generalised Pareto distribution, are frequently used as an alternative. The Generalized Pareto, despite the existence of practical cases where it is applied, is not completely recognized as the adequate solution to model exceedances over a certain threshold u. References that set the Generalised Pareto distribution as a secondary solution in the case of significant wave data can be identified in the literature. In this framework, the current study intends to tackle the discussion of the application of statistical models to characterize exceedances of wave data. Comparison of the application of the Generalised Pareto, the 2 parameters Weibull and the Exponential distribution are presented for different values of the threshold u. Real wave data obtained in four buoys along the Irish coast was used in the comparative analysis. Results show that the application of the statistical distributions to characterize significant wave data needs to be addressed carefully and in each particular case one of the statistical models mentioned fits better the data than the others. Depending on the value of the threshold u different results are obtained. Other variables of the fit, as the number of points and the estimation of the model parameters, are analyzed and the respective conclusions were drawn. Some guidelines on the application of the POT method are presented. Modeling the tail of the distributions shows to be, for the present case, a highly non-linear task and, due to its growing importance, should be addressed carefully for an efficient estimation of very low occurrence events.

Keywords: extreme events, offshore structures, peak-over-threshold, significant wave data

Procedia PDF Downloads 251
41519 Lambda-Levelwise Statistical Convergence of a Sequence of Fuzzy Numbers

Authors: F. Berna Benli, Özgür Keskin

Abstract:

Lately, many mathematicians have been studied the statistical convergence of a sequence of fuzzy numbers. We know that Lambda-statistically convergence is a kind of convergence between ordinary convergence and statistical convergence. In this paper, we will introduce the new kind of convergence such as λ-levelwise statistical convergence. Then, we will define the concept of the λ-levelwise statistical cluster and limit points of a sequence of fuzzy numbers. Also, we will discuss the relations between the sets of λ-levelwise statistical cluster points and λ-levelwise statistical limit points of sequences of fuzzy numbers. This work has been extended in this paper, where some relations have been considered such that when lambda-statistical limit inferior and lambda-statistical limit superior for lambda-statistically convergent sequences of fuzzy numbers are equal. Furthermore, lambda-statistical boundedness condition for different sequences of fuzzy numbers has been studied.

Keywords: fuzzy number, λ-levelwise statistical cluster points, λ-levelwise statistical convergence, λ-levelwise statistical limit points, λ-statistical cluster points, λ-statistical convergence, λ-statistical limit points

Procedia PDF Downloads 452
41518 Exploring the Spatial Characteristics of Mortality Map: A Statistical Area Perspective

Authors: Jung-Hong Hong, Jing-Cen Yang, Cai-Yu Ou

Abstract:

The analysis of geographic inequality heavily relies on the use of location-enabled statistical data and quantitative measures to present the spatial patterns of the selected phenomena and analyze their differences. To protect the privacy of individual instance and link to administrative units, point-based datasets are spatially aggregated to area-based statistical datasets, where only the overall status for the selected levels of spatial units is used for decision making. The partition of the spatial units thus has dominant influence on the outcomes of the analyzed results, well known as the Modifiable Areal Unit Problem (MAUP). A new spatial reference framework, the Taiwan Geographical Statistical Classification (TGSC), was recently introduced in Taiwan based on the spatial partition principles of homogeneous consideration of the number of population and households. Comparing to the outcomes of the traditional township units, TGSC provides additional levels of spatial units with finer granularity for presenting spatial phenomena and enables domain experts to select appropriate dissemination level for publishing statistical data. This paper compares the results of respectively using TGSC and township unit on the mortality data and examines the spatial characteristics of their outcomes. For the mortality data between the period of January 1st, 2008 and December 31st, 2010 of the Taitung County, the all-cause age-standardized death rate (ASDR) ranges from 571 to 1757 per 100,000 persons, whereas the 2nd dissemination area (TGSC) shows greater variation, ranged from 0 to 2222 per 100,000. The finer granularity of spatial units of TGSC clearly provides better outcomes for identifying and evaluating the geographic inequality and can be further analyzed with the statistical measures from other perspectives (e.g., population, area, environment.). The management and analysis of the statistical data referring to the TGSC in this research is strongly supported by the use of Geographic Information System (GIS) technology. An integrated workflow that consists of the tasks of the processing of death certificates, the geocoding of street address, the quality assurance of geocoded results, the automatic calculation of statistic measures, the standardized encoding of measures and the geo-visualization of statistical outcomes is developed. This paper also introduces a set of auxiliary measures from a geographic distribution perspective to further examine the hidden spatial characteristics of mortality data and justify the analyzed results. With the common statistical area framework like TGSC, the preliminary results demonstrate promising potential for developing a web-based statistical service that can effectively access domain statistical data and present the analyzed outcomes in meaningful ways to avoid wrong decision making.

Keywords: mortality map, spatial patterns, statistical area, variation

Procedia PDF Downloads 237
41517 South African Students' Statistical Literacy in the Conceptual Understanding about Measures of Central Tendency after Completing Their High School Studies

Authors: Lukanda Kalobo

Abstract:

In South Africa, the High School Mathematics Curriculum provides teachers with specific aims and skills to be developed which involves the understanding about the measures of central tendency. The exploration begins with the definitions of statistical literacy, measurement of central tendency and a discussion on why statistical literacy is essential today. It furthermore discusses the statistical literacy basics involved in understanding the concepts of measures of central tendency. The statistical literacy test on the measures of central tendency, was used to collect data which was administered to 78 first year students direct from high schools. The results indicated that students seemed to have forgotten about the statistical literacy in understanding the concepts of measure of central tendency after completing their high school study. The authors present inferences regarding the alignment between statistical literacy and the understanding of the concepts about the measures of central tendency, leading to the conclusion that there is a need to provide in-service and pre-service training.

Keywords: conceptual understanding, mean, median, mode, statistical literacy

Procedia PDF Downloads 284
41516 Probability Sampling in Matched Case-Control Study in Drug Abuse

Authors: Surya R. Niraula, Devendra B Chhetry, Girish K. Singh, S. Nagesh, Frederick A. Connell

Abstract:

Background: Although random sampling is generally considered to be the gold standard for population-based research, the majority of drug abuse research is based on non-random sampling despite the well-known limitations of this kind of sampling. Method: We compared the statistical properties of two surveys of drug abuse in the same community: one using snowball sampling of drug users who then identified “friend controls” and the other using a random sample of non-drug users (controls) who then identified “friend cases.” Models to predict drug abuse based on risk factors were developed for each data set using conditional logistic regression. We compared the precision of each model using bootstrapping method and the predictive properties of each model using receiver operating characteristics (ROC) curves. Results: Analysis of 100 random bootstrap samples drawn from the snowball-sample data set showed a wide variation in the standard errors of the beta coefficients of the predictive model, none of which achieved statistical significance. One the other hand, bootstrap analysis of the random-sample data set showed less variation, and did not change the significance of the predictors at the 5% level when compared to the non-bootstrap analysis. Comparison of the area under the ROC curves using the model derived from the random-sample data set was similar when fitted to either data set (0.93, for random-sample data vs. 0.91 for snowball-sample data, p=0.35); however, when the model derived from the snowball-sample data set was fitted to each of the data sets, the areas under the curve were significantly different (0.98 vs. 0.83, p < .001). Conclusion: The proposed method of random sampling of controls appears to be superior from a statistical perspective to snowball sampling and may represent a viable alternative to snowball sampling.

Keywords: drug abuse, matched case-control study, non-probability sampling, probability sampling

Procedia PDF Downloads 473
41515 Chemical Variability in the Essential Oils from the Leaves and Buds of Syzygium Species

Authors: Rabia Waseem, Low Kah Hin, Najihah Mohamed Hashim

Abstract:

The variability in the chemical components of the Syzygium species essential oils has been evaluated. The leaves of Syzygium species have been collected from Perak, Malaysia. The essential oils extracted by using the conventional Hydro-distillation extraction procedure and analyzed by using Gas chromatography System attached with Mass Spectrometry (GCMS). Twenty-seven constituents were found in Syzygium species in which the major constituents include: α-Pinene (3.94%), α-Thujene (2.16%), α-Terpineol (2.95%), g-Elemene (2.89%) and D-Limonene (14.59%). The aim of this study was the comparison between the evaluated data and existing literature to fortify the major variability through statistical analysis.

Keywords: chemotaxonomy, cluster analysis, essential oil, medicinal plants, statistical analysis

Procedia PDF Downloads 287
41514 Investigation of the Main Trends of Tourist Expenses in Georgia

Authors: Nino Abesadze, Marine Mindorashvili, Nino Paresashvili

Abstract:

The main purpose of the article is to make complex statistical analysis of tourist expenses of foreign visitors. We used mixed technique of selection that implies rules of random and proportional selection. Computer software SPSS was used to compute statistical data for corresponding analysis. Corresponding methodology of tourism statistics was implemented according to international standards. Important information was collected and grouped from the major Georgian airports. Techniques of statistical observation were prepared. A representative population of foreign visitors and a rule of selection of respondents were determined. We have a trend of growth of tourist numbers and share of tourists from post-soviet countries constantly increases. Level of satisfaction with tourist facilities and quality of service has grown, but still we have a problem of disparity between quality of service and prices. The design of tourist expenses of foreign visitors is diverse; competitiveness of tourist products of Georgian tourist companies is higher.

Keywords: tourist, expenses, methods, statistics, analysis

Procedia PDF Downloads 319
41513 Examining Statistical Monitoring Approach against Traditional Monitoring Techniques in Detecting Data Anomalies during Conduct of Clinical Trials

Authors: Sheikh Omar Sillah

Abstract:

Introduction: Monitoring is an important means of ensuring the smooth implementation and quality of clinical trials. For many years, traditional site monitoring approaches have been critical in detecting data errors but not optimal in identifying fabricated and implanted data as well as non-random data distributions that may significantly invalidate study results. The objective of this paper was to provide recommendations based on best statistical monitoring practices for detecting data-integrity issues suggestive of fabrication and implantation early in the study conduct to allow implementation of meaningful corrective and preventive actions. Methodology: Electronic bibliographic databases (Medline, Embase, PubMed, Scopus, and Web of Science) were used for the literature search, and both qualitative and quantitative studies were sought. Search results were uploaded into Eppi-Reviewer Software, and only publications written in the English language from 2012 were included in the review. Gray literature not considered to present reproducible methods was excluded. Results: A total of 18 peer-reviewed publications were included in the review. The publications demonstrated that traditional site monitoring techniques are not efficient in detecting data anomalies. By specifying project-specific parameters such as laboratory reference range values, visit schedules, etc., with appropriate interactive data monitoring, statistical monitoring can offer early signals of data anomalies to study teams. The review further revealed that statistical monitoring is useful to identify unusual data patterns that might be revealing issues that could impact data integrity or may potentially impact study participants' safety. However, subjective measures may not be good candidates for statistical monitoring. Conclusion: The statistical monitoring approach requires a combination of education, training, and experience sufficient to implement its principles in detecting data anomalies for the statistical aspects of a clinical trial.

Keywords: statistical monitoring, data anomalies, clinical trials, traditional monitoring

Procedia PDF Downloads 53
41512 Solving Dimensionality Problem and Finding Statistical Constructs on Latent Regression Models: A Novel Methodology with Real Data Application

Authors: Sergio Paez Moncaleano, Alvaro Mauricio Montenegro

Abstract:

This paper presents a novel statistical methodology for measuring and founding constructs in Latent Regression Analysis. This approach uses the qualities of Factor Analysis in binary data with interpretations on Item Response Theory (IRT). In addition, based on the fundamentals of submodel theory and with a convergence of many ideas of IRT, we propose an algorithm not just to solve the dimensionality problem (nowadays an open discussion) but a new research field that promises more fear and realistic qualifications for examiners and a revolution on IRT and educational research. In the end, the methodology is applied to a set of real data set presenting impressive results for the coherence, speed and precision. Acknowledgments: This research was financed by Colciencias through the project: 'Multidimensional Item Response Theory Models for Practical Application in Large Test Designed to Measure Multiple Constructs' and both authors belong to SICS Research Group from Universidad Nacional de Colombia.

Keywords: item response theory, dimensionality, submodel theory, factorial analysis

Procedia PDF Downloads 347
41511 Secured Embedding of Patient’s Confidential Data in Electrocardiogram Using Chaotic Maps

Authors: Butta Singh

Abstract:

This paper presents a chaotic map based approach for secured embedding of patient’s confidential data in electrocardiogram (ECG) signal. The chaotic map generates predefined locations through the use of selective control parameters. The sample value difference method effectually hides the confidential data in ECG sample pairs at these predefined locations. Evaluation of proposed method on all 48 records of MIT-BIH arrhythmia ECG database demonstrates that the embedding does not alter the diagnostic features of cover ECG. The secret data imperceptibility in stego-ECG is evident through various statistical and clinical performance measures. Statistical metrics comprise of Percentage Root Mean Square Difference (PRD) and Peak Signal to Noise Ratio (PSNR). Further, a comparative analysis between proposed method and existing approaches was also performed. The results clearly demonstrated the superiority of proposed method.

Keywords: chaotic maps, ECG steganography, data embedding, electrocardiogram

Procedia PDF Downloads 166
41510 Presenting a Model in the Analysis of Supply Chain Management Components by Using Statistical Distribution Functions

Authors: Ramin Rostamkhani, Thurasamy Ramayah

Abstract:

One of the most important topics of today’s industrial organizations is the challenging issue of supply chain management. In this field, scientists and researchers have published numerous practical articles and models, especially in the last decade. In this research, to our best knowledge, the discussion of data modeling of supply chain management components using well-known statistical distribution functions has been considered. The world of science owns mathematics, and showing the behavior of supply chain data based on the characteristics of statistical distribution functions is innovative research that has not been published anywhere until the moment of doing this research. In an analytical process, describing different aspects of functions including probability density, cumulative distribution, reliability, and failure function can reach the suitable statistical distribution function for each of the components of the supply chain management. It can be applied to predict the behavior data of the relevant component in the future. Providing a model to adapt the best statistical distribution function in the supply chain management components will be a big revolution in the field of the behavior of the supply chain management elements in today's industrial organizations. Demonstrating the final results of the proposed model by introducing the process capability indices before and after implementing it alongside verifying the approach through the relevant assessment as an acceptable verification is a final step. The introduced approach can save the required time and cost to achieve the organizational goals. Moreover, it can increase added value in the organization.

Keywords: analyzing, process capability indices, statistical distribution functions, supply chain management components

Procedia PDF Downloads 75
41509 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Saeed Hassan Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analysing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics

Procedia PDF Downloads 548
41508 Quantum Statistical Machine Learning and Quantum Time Series

Authors: Omar Alzeley, Sergey Utev

Abstract:

Minimizing a constrained multivariate function is the fundamental of Machine learning, and these algorithms are at the core of data mining and data visualization techniques. The decision function that maps input points to output points is based on the result of optimization. This optimization is the central of learning theory. One approach to complex systems where the dynamics of the system is inferred by a statistical analysis of the fluctuations in time of some associated observable is time series analysis. The purpose of this paper is a mathematical transition from the autoregressive model of classical time series to the matrix formalization of quantum theory. Firstly, we have proposed a quantum time series model (QTS). Although Hamiltonian technique becomes an established tool to detect a deterministic chaos, other approaches emerge. The quantum probabilistic technique is used to motivate the construction of our QTS model. The QTS model resembles the quantum dynamic model which was applied to financial data. Secondly, various statistical methods, including machine learning algorithms such as the Kalman filter algorithm, are applied to estimate and analyses the unknown parameters of the model. Finally, simulation techniques such as Markov chain Monte Carlo have been used to support our investigations. The proposed model has been examined by using real and simulated data. We establish the relation between quantum statistical machine and quantum time series via random matrix theory. It is interesting to note that the primary focus of the application of QTS in the field of quantum chaos was to find a model that explain chaotic behaviour. Maybe this model will reveal another insight into quantum chaos.

Keywords: machine learning, simulation techniques, quantum probability, tensor product, time series

Procedia PDF Downloads 446
41507 Statistical and Land Planning Study of Tourist Arrivals in Greece during 2005-2016

Authors: Dimitra Alexiou

Abstract:

During the last 10 years, in spite of the economic crisis, the number of tourists arriving in Greece has increased, particularly during the tourist season from April to October. In this paper, the number of annual tourist arrivals is studied to explore their preferences with regard to the month of travel, the selected destinations, as well the amount of money spent. The collected data are processed with statistical methods, yielding numerical and graphical results. From the computation of statistical parameters and the forecasting with exponential smoothing, useful conclusions are arrived at that can be used by the Greek tourism authorities, as well as by tourist organizations, for planning purposes for the coming years. The results of this paper and the computed forecast can also be used for decision making by private tourist enterprises that are investing in Greece. With regard to the statistical methods, the method of Simple Exponential Smoothing of time series of data is employed. The search for a best forecast for 2017 and 2018 provides the value of the smoothing coefficient. For all statistical computations and graphics Microsoft Excel is used.

Keywords: tourism, statistical methods, exponential smoothing, land spatial planning, economy

Procedia PDF Downloads 238
41506 Quantitative Assessment of Soft Tissues by Statistical Analysis of Ultrasound Backscattered Signals

Authors: Da-Ming Huang, Ya-Ting Tsai, Shyh-Hau Wang

Abstract:

Ultrasound signals backscattered from the soft tissues are mainly depending on the size, density, distribution, and other elastic properties of scatterers in the interrogated sample volume. The quantitative analysis of ultrasonic backscattering is frequently implemented using the statistical approach due to that of backscattering signals tends to be with the nature of the random variable. Thus, the statistical analysis, such as Nakagami statistics, has been applied to characterize the density and distribution of scatterers of a sample. Yet, the accuracy of statistical analysis could be readily affected by the receiving signals associated with the nature of incident ultrasound wave and acoustical properties of samples. Thus, in the present study, efforts were made to explore such effects as the ultrasound operational modes and attenuation of biological tissue on the estimation of corresponding Nakagami statistical parameter (m parameter). In vitro measurements were performed from healthy and pathological fibrosis porcine livers using different single-element ultrasound transducers and duty cycles of incident tone burst ranging respectively from 3.5 to 7.5 MHz and 10 to 50%. Results demonstrated that the estimated m parameter tends to be sensitively affected by the use of ultrasound operational modes as well as the tissue attenuation. The healthy and pathological tissues may be characterized quantitatively by m parameter under fixed measurement conditions and proper calibration.

Keywords: ultrasound backscattering, statistical analysis, operational mode, attenuation

Procedia PDF Downloads 298
41505 The Relationships between Market Orientation and Competitiveness of Companies in Banking Sector

Authors: Patrik Jangl, Milan Mikuláštík

Abstract:

The objective of the paper is to measure and compare market orientation of Swiss and Czech banks, as well as examine statistically the degree of influence it has on competitiveness of the institutions. The analysis of market orientation is based on the collecting, analysis and correct interpretation of the data. Descriptive analysis of market orientation describe current situation. Research of relation of competitiveness and market orientation in the sector of big international banks is suggested with the expectation of existence of a strong relationship. Partially, the work served as reconfirmation of suitability of classic methodologies to measurement of banks’ market orientation. Two types of data were gathered. Firstly, by measuring subjectively perceived market orientation of a company and secondly, by quantifying its competitiveness. All data were collected from a sample of small, mid-sized and large banks. We used numerical secondary character data from the international statistical financial Bureau Van Dijk’s BANKSCOPE database. Statistical analysis led to the following results. Assuming classical market orientation measures to be scientifically justified, Czech banks are statistically less market-oriented than Swiss banks. Secondly, among small Swiss banks, which are not broadly internationally active, small relationship exist between market orientation measures and market share based competitiveness measures. Thirdly, among all Swiss banks, a strong relationship exists between market orientation measures and market share based competitiveness measures. Above results imply existence of a strong relation of this measure in sector of big international banks. A strong statistical relationship has been proven to exist between market orientation measures and equity/total assets ratio in Switzerland.

Keywords: market orientation, competitiveness, marketing strategy, measurement of market orientation, relation between market orientation and competitiveness, banking sector

Procedia PDF Downloads 451
41504 Wind Farm Power Performance Verification Using Non-Parametric Statistical Inference

Authors: M. Celeska, K. Najdenkoski, V. Dimchev, V. Stoilkov

Abstract:

Accurate determination of wind turbine performance is necessary for economic operation of a wind farm. At present, the procedure to carry out the power performance verification of wind turbines is based on a standard of the International Electrotechnical Commission (IEC). In this paper, nonparametric statistical inference is applied to designing a simple, inexpensive method of verifying the power performance of a wind turbine. A statistical test is explained, examined, and the adequacy is tested over real data. The methods use the information that is collected by the SCADA system (Supervisory Control and Data Acquisition) from the sensors embedded in the wind turbines in order to carry out the power performance verification of a wind farm. The study has used data on the monthly output of wind farm in the Republic of Macedonia, and the time measuring interval was from January 1, 2016, to December 31, 2016. At the end, it is concluded whether the power performance of a wind turbine differed significantly from what would be expected. The results of the implementation of the proposed methods showed that the power performance of the specific wind farm under assessment was acceptable.

Keywords: canonical correlation analysis, power curve, power performance, wind energy

Procedia PDF Downloads 315
41503 The Importance of Knowledge Innovation for External Audit on Anti-Corruption

Authors: Adel M. Qatawneh

Abstract:

This paper aimed to determine the importance of knowledge innovation for external audit on anti-corruption in the entire Jordanian bank companies are listed in Amman Stock Exchange (ASE). The study importance arises from the need to recognize the Knowledge innovation for external audit and anti-corruption as the development in the world of business, the variables that will be affected by external audit innovation are: reliability of financial data, relevantly of financial data, consistency of the financial data, Full disclosure of financial data and protecting the rights of investors to achieve the objectives of the study a questionnaire was designed and distributed to the society of the Jordanian bank are listed in Amman Stock Exchange. The data analysis found out that the banks in Jordan have a positive importance of Knowledge innovation for external audit on anti-corruption. They agree on the benefit of Knowledge innovation for external audit on anti-corruption. The statistical analysis showed that Knowledge innovation for external audit had a positive impact on the anti-corruption and that external audit has a significantly statistical relationship with anti-corruption, reliability of financial data, consistency of the financial data, a full disclosure of financial data and protecting the rights of investors.

Keywords: knowledge innovation, external audit, anti-corruption, Amman Stock Exchange

Procedia PDF Downloads 446
41502 Verification of Satellite and Observation Measurements to Build Solar Energy Projects in North Africa

Authors: Samy A. Khalil, U. Ali Rahoma

Abstract:

The measurements of solar radiation, satellite data has been routinely utilize to estimate solar energy. However, the temporal coverage of satellite data has some limits. The reanalysis, also known as "retrospective analysis" of the atmosphere's parameters, is produce by fusing the output of NWP (Numerical Weather Prediction) models with observation data from a variety of sources, including ground, and satellite, ship, and aircraft observation. The result is a comprehensive record of the parameters affecting weather and climate. The effectiveness of reanalysis datasets (ERA-5) for North Africa was evaluate against high-quality surfaces measured using statistical analysis. Estimating the distribution of global solar radiation (GSR) over five chosen areas in North Africa through ten-years during the period time from 2011 to 2020. To investigate seasonal change in dataset performance, a seasonal statistical analysis was conduct, which showed a considerable difference in mistakes throughout the year. By altering the temporal resolution of the data used for comparison, the performance of the dataset is alter. Better performance is indicate by the data's monthly mean values, but data accuracy is degraded. Solar resource assessment and power estimation are discuses using the ERA-5 solar radiation data. The average values of mean bias error (MBE), root mean square error (RMSE) and mean absolute error (MAE) of the reanalysis data of solar radiation vary from 0.079 to 0.222, 0.055 to 0.178, and 0.0145 to 0.198 respectively during the period time in the present research. The correlation coefficient (R2) varies from 0.93 to 99% during the period time in the present research. This research's objective is to provide a reliable representation of the world's solar radiation to aid in the use of solar energy in all sectors.

Keywords: solar energy, ERA-5 analysis data, global solar radiation, North Africa

Procedia PDF Downloads 80
41501 Reduction in Hot Metal Silicon through Statistical Analysis at G-Blast Furnace, Tata Steel Jamshedpur

Authors: Shoumodip Roy, Ankit Singhania, Santanu Mallick, Abhiram Jha, M. K. Agarwal, R. V. Ramna, Uttam Singh

Abstract:

The quality of hot metal at any blast furnace is judged by the silicon content in it. Lower hot metal silicon not only enhances process efficiency at steel melting shops but also reduces hot metal costs. The Hot metal produced at G-Blast furnace Tata Steel Jamshedpur has a significantly higher Si content than Benchmark Blast furnaces. The higher content of hot metal Si is mainly due to inferior raw material quality than those used in benchmark blast furnaces. With minimum control over raw material quality, the only option left to control hot metal Si is via optimizing the furnace parameters. Therefore, in order to identify the levers to reduce hot metal Si, Data mining was carried out, and multiple regression models were developed. The statistical analysis revealed that Slag B3{(CaO+MgO)/SiO2}, Slag Alumina and Hot metal temperature are key controllable parameters affecting hot metal silicon. Contour Plots were used to determine the optimum range of levels identified through statistical analysis. A trial plan was formulated to operate relevant parameters, at G blast furnace, in the identified range to reduce hot metal silicon. This paper details out the process followed and subsequent reduction in hot metal silicon by 15% at G blast furnace.

Keywords: blast furnace, optimization, silicon, statistical tools

Procedia PDF Downloads 203
41500 Investigation on Performance of Change Point Algorithm in Time Series Dynamical Regimes and Effect of Data Characteristics

Authors: Farhad Asadi, Mohammad Javad Mollakazemi

Abstract:

In this paper, Bayesian online inference in models of data series are constructed by change-points algorithm, which separated the observed time series into independent series and study the change and variation of the regime of the data with related statistical characteristics. variation of statistical characteristics of time series data often represent separated phenomena in the some dynamical system, like a change in state of brain dynamical reflected in EEG signal data measurement or a change in important regime of data in many dynamical system. In this paper, prediction algorithm for studying change point location in some time series data is simulated. It is verified that pattern of proposed distribution of data has important factor on simpler and smother fluctuation of hazard rate parameter and also for better identification of change point locations. Finally, the conditions of how the time series distribution effect on factors in this approach are explained and validated with different time series databases for some dynamical system.

Keywords: time series, fluctuation in statistical characteristics, optimal learning, change-point algorithm

Procedia PDF Downloads 406
41499 Nonparametric Path Analysis with Truncated Spline Approach in Modeling Rural Poverty in Indonesia

Authors: Usriatur Rohma, Adji Achmad Rinaldo Fernandes

Abstract:

Nonparametric path analysis is a statistical method that does not rely on the assumption that the curve is known. The purpose of this study is to determine the best nonparametric truncated spline path function between linear and quadratic polynomial degrees with 1, 2, and 3-knot points and to determine the significance of estimating the best nonparametric truncated spline path function in the model of the effect of population migration and agricultural economic growth on rural poverty through the variable unemployment rate using the t-test statistic at the jackknife resampling stage. The data used in this study are secondary data obtained from statistical publications. The results showed that the best model of nonparametric truncated spline path analysis is quadratic polynomial degree with 3-knot points. In addition, the significance of the best-truncated spline nonparametric path function estimation using jackknife resampling shows that all exogenous variables have a significant influence on the endogenous variables.

Keywords: nonparametric path analysis, truncated spline, linear, quadratic, rural poverty, jackknife resampling

Procedia PDF Downloads 13
41498 Analysis on Prediction Models of TBM Performance and Selection of Optimal Input Parameters

Authors: Hang Lo Lee, Ki Il Song, Hee Hwan Ryu

Abstract:

An accurate prediction of TBM(Tunnel Boring Machine) performance is very difficult for reliable estimation of the construction period and cost in preconstruction stage. For this purpose, the aim of this study is to analyze the evaluation process of various prediction models published since 2000 for TBM performance, and to select the optimal input parameters for the prediction model. A classification system of TBM performance prediction model and applied methodology are proposed in this research. Input and output parameters applied for prediction models are also represented. Based on these results, a statistical analysis is performed using the collected data from shield TBM tunnel in South Korea. By performing a simple regression and residual analysis utilizinFg statistical program, R, the optimal input parameters are selected. These results are expected to be used for development of prediction model of TBM performance.

Keywords: TBM performance prediction model, classification system, simple regression analysis, residual analysis, optimal input parameters

Procedia PDF Downloads 291
41497 Generation of Quasi-Measurement Data for On-Line Process Data Analysis

Authors: Hyun-Woo Cho

Abstract:

For ensuring the safety of a manufacturing process one should quickly identify an assignable cause of a fault in an on-line basis. To this end, many statistical techniques including linear and nonlinear methods have been frequently utilized. However, such methods possessed a major problem of small sample size, which is mostly attributed to the characteristics of empirical models used for reference models. This work presents a new method to overcome the insufficiency of measurement data in the monitoring and diagnosis tasks. Some quasi-measurement data are generated from existing data based on the two indices of similarity and importance. The performance of the method is demonstrated using a real data set. The results turn out that the presented methods are able to handle the insufficiency problem successfully. In addition, it is shown to be quite efficient in terms of computational speed and memory usage, and thus on-line implementation of the method is straightforward for monitoring and diagnosis purposes.

Keywords: data analysis, diagnosis, monitoring, process data, quality control

Procedia PDF Downloads 459
41496 Enabling Quantitative Urban Sustainability Assessment with Big Data

Authors: Changfeng Fu

Abstract:

Sustainable urban development has been widely accepted a common sense in the modern urban planning and design. However, the measurement and assessment of urban sustainability, especially the quantitative assessment have been always an issue obsessing planning and design professionals. This paper will present an on-going research on the principles and technologies to develop a quantitative urban sustainability assessment principles and techniques which aim to integrate indicators, geospatial and geo-reference data, and assessment techniques together into a mechanism. It is based on the principles and techniques of geospatial analysis with GIS and statistical analysis methods. The decision-making technologies and methods such as AHP and SMART are also adopted to address overall assessment conclusions. The possible interfaces and presentation of data and quantitative assessment results are also described. This research is based on the knowledge, situations and data sources of UK, but it is potentially adaptable to other countries or regions. The implementation potentials of the mechanism are also discussed.

Keywords: urban sustainability assessment, quantitative analysis, sustainability indicator, geospatial data, big data

Procedia PDF Downloads 340
41495 Clustering of Association Rules of ISIS & Al-Qaeda Based on Similarity Measures

Authors: Tamanna Goyal, Divya Bansal, Sanjeev Sofat

Abstract:

In world-threatening terrorist attacks, where early detection, distinction, and prediction are effective diagnosis techniques and for functionally accurate and precise analysis of terrorism data, there are so many data mining & statistical approaches to assure accuracy. The computational extraction of derived patterns is a non-trivial task which comprises specific domain discovery by means of sophisticated algorithm design and analysis. This paper proposes an approach for similarity extraction by obtaining the useful attributes from the available datasets of terrorist attacks and then applying feature selection technique based on the statistical impurity measures followed by clustering techniques on the basis of similarity measures. On the basis of degree of participation of attributes in the rules, the associative dependencies between the attacks are analyzed. Consequently, to compute the similarity among the discovered rules, we applied a weighted similarity measure. Finally, the rules are grouped by applying using hierarchical clustering. We have applied it to an open source dataset to determine the usability and efficiency of our technique, and a literature search is also accomplished to support the efficiency and accuracy of our results.

Keywords: association rules, clustering, similarity measure, statistical approaches

Procedia PDF Downloads 299
41494 Ontological Modeling Approach for Statistical Databases Publication in Linked Open Data

Authors: Bourama Mane, Ibrahima Fall, Mamadou Samba Camara, Alassane Bah

Abstract:

At the level of the National Statistical Institutes, there is a large volume of data which is generally in a format which conditions the method of publication of the information they contain. Each household or business data collection project includes a dissemination platform for its implementation. Thus, these dissemination methods previously used, do not promote rapid access to information and especially does not offer the option of being able to link data for in-depth processing. In this paper, we present an approach to modeling these data to publish them in a format intended for the Semantic Web. Our objective is to be able to publish all this data in a single platform and offer the option to link with other external data sources. An application of the approach will be made on data from major national surveys such as the one on employment, poverty, child labor and the general census of the population of Senegal.

Keywords: Semantic Web, linked open data, database, statistic

Procedia PDF Downloads 161
41493 Microarray Data Visualization and Preprocessing Using R and Bioconductor

Authors: Ruchi Yadav, Shivani Pandey, Prachi Srivastava

Abstract:

Microarrays provide a rich source of data on the molecular working of cells. Each microarray reports on the abundance of tens of thousands of mRNAs. Virtually every human disease is being studied using microarrays with the hope of finding the molecular mechanisms of disease. Bioinformatics analysis plays an important part of processing the information embedded in large-scale expression profiling studies and for laying the foundation for biological interpretation. A basic, yet challenging task in the analysis of microarray gene expression data is the identification of changes in gene expression that are associated with particular biological conditions. Careful statistical design and analysis are essential to improve the efficiency and reliability of microarray experiments throughout the data acquisition and analysis process. One of the most popular platforms for microarray analysis is Bioconductor, an open source and open development software project based on the R programming language. This paper describes specific procedures for conducting quality assessment, visualization and preprocessing of Affymetrix Gene Chip and also details the different bioconductor packages used to analyze affymetrix microarray data and describe the analysis and outcome of each plots.

Keywords: microarray analysis, R language, affymetrix visualization, bioconductor

Procedia PDF Downloads 457