Search results for: statistical data analysis
42788 Analyzing Large Scale Recurrent Event Data with a Divide-And-Conquer Approach
Authors: Jerry Q. Cheng
Abstract:
Analyzing large-scale recurrent event data currently poses many challenges, such as memory limitations and unscalable computing time. In this research, a divide-and-conquer method is proposed using parametric frailty models. Specifically, the data are randomly divided into many subsets, and the maximum likelihood estimator is obtained from each subset. A weighted method is then proposed to combine these individual estimators into the final estimator. It is shown that this divide-and-conquer estimator is asymptotically equivalent to the estimator based on the full data. Simulation studies are conducted to demonstrate the performance of the proposed method. The approach is applied to a large real dataset of repeated heart failure hospitalizations.
Keywords: big data analytics, divide-and-conquer, recurrent event data, statistical computing
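The split, estimate, and combine steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' frailty-model estimator: as a stand-in, it assumes a simple Poisson rate model (whose maximum likelihood estimate is the sample mean) and sample-size weights, and the function names and synthetic counts are hypothetical.

```python
import random

def poisson_rate_mle(counts):
    """MLE of a Poisson rate: the sample mean of the event counts."""
    return sum(counts) / len(counts)

def divide_and_conquer_estimate(counts, n_subsets, seed=0):
    """Randomly split the data, estimate on each subset, combine with weights."""
    rng = random.Random(seed)
    shuffled = counts[:]
    rng.shuffle(shuffled)
    # Assumption: a plain round-robin split of the shuffled data.
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    estimates = [poisson_rate_mle(s) for s in subsets]
    # Assumption: weights proportional to subset size (illustrative choice).
    weights = [len(s) / len(shuffled) for s in subsets]
    return sum(w * e for w, e in zip(weights, estimates))

# Synthetic event counts per subject, purely illustrative.
rng = random.Random(42)
counts = [rng.randint(0, 6) for _ in range(10_000)]

full_estimate = poisson_rate_mle(counts)
combined_estimate = divide_and_conquer_estimate(counts, n_subsets=20)
print(full_estimate, combined_estimate)
```

In this toy model the weighted combination reproduces the full-data estimate up to rounding; for a frailty model the agreement would only be asymptotic, as the abstract states.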
Procedia PDF Downloads 172
42787 Short Life Cycle Time Series Forecasting
Authors: Shalaka Kadam, Dinesh Apte, Sagar Mainkar
Abstract:
Product life cycles are becoming shorter due to increased competition in the market, shorter product development times and increased product diversity. Short life cycles are normal in the retail industry, the style business, entertainment media, and the telecom and semiconductor industries. Accurate demand forecasting for short life cycle products is of special interest to many researchers and organizations. Because of the short life cycle, the amount of historical data available for forecasting is minimal, or even absent when new or modified products are launched in the market. Companies dealing with such products want to increase the accuracy of demand forecasting so that they can utilize the full potential of the market without oversupplying. The challenge is to develop a forecasting model that forecasts accurately while handling large variations in the data and considering the complex relationships between its various parameters. Many statistical models have been proposed in the literature for forecasting time series data. Traditional time series forecasting models do not work well for short life cycles because of the lack of historical data, and artificial neural network (ANN) models are very time consuming. We have studied the existing forecasting models and their limitations. This work proposes an effective and powerful approach for short life cycle time series forecasting that takes into consideration different scenarios of data availability for short life cycle products. We then suggest a methodology that combines statistical analysis with structured judgement; the approach can be applied across domains. We then describe the method of creating a profile from analogous products. This profile can then be used to forecast products using the historical data of analogous products.
We have designed an application which combines data, analytics and domain knowledge using point-and-click technology. The forecasting results are compared using the MAPE, MSE and RMSE error scores. Conclusion: Based on the results, no single approach is sufficient for short life cycle forecasting, and two or more approaches need to be combined to achieve the desired accuracy.
Keywords: forecast, short life cycle product, structured judgement, time series
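The three error scores named in this abstract follow standard definitions, sketched below. The function name and the sample numbers are illustrative assumptions and are not taken from the paper.

```python
import math

def forecast_errors(actual, forecast):
    """Return (MAPE in percent, MSE, RMSE) for two equal-length sequences.

    MAPE is undefined when an actual value is zero; this sketch assumes
    strictly non-zero actuals.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    mape = 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n
    return mape, mse, math.sqrt(mse)

# Hypothetical demand figures for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
mape, mse, rmse = forecast_errors(actual, forecast)
print(mape, mse, rmse)
```

MAPE is scale-free, which makes it convenient for comparing products with different demand levels, while MSE and RMSE penalize large misses more heavily.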
Procedia PDF Downloads 363
42786 Content-Based Color Image Retrieval Based on the 2-D Histogram and Statistical Moments
Authors: El Asnaoui Khalid, Aksasse Brahim, Ouanan Mohammed
Abstract:
In this paper, we are interested in the problem of finding similar images in a large database. For this purpose, we propose a new algorithm based on a combination of the 2-D histogram intersection in the HSV space and statistical moments. The proposed histogram is based on a 3x3 window and not only on the intensity of the pixel. This approach overcomes the drawback of the conventional 1-D histogram, which ignores the spatial distribution of pixels in the image, while the statistical moments are used to escape the effects of the discretisation of the color space, which is intrinsic to the use of histograms. We compare the performance of our new algorithm with various state-of-the-art methods and show that it has several advantages: it is fast, consumes little memory and requires no learning. To validate our results, we apply this algorithm to search for similar images in different image databases.
Keywords: 2-D histogram, statistical moments, indexing, similarity distance, histograms intersection
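The histogram intersection measure mentioned in this abstract can be sketched on flattened, normalized histograms. This shows only the generic similarity score; the paper's 3x3-window 2-D histogram and its moment features are not reproduced, and the bin counts below are made up.

```python
def normalize(hist):
    """Scale bin counts so that they sum to 1."""
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram is empty")
    return [h / total for h in hist]

def histogram_intersection(hist_a, hist_b):
    """Similarity in [0, 1]: sum of bin-wise minima of the normalized histograms."""
    a, b = normalize(hist_a), normalize(hist_b)
    return sum(min(x, y) for x, y in zip(a, b))

# Hypothetical bin counts (a 2-D histogram would be flattened to this form).
query = [4, 0, 6, 10]
same_shape = [2, 0, 3, 5]      # same distribution, fewer pixels
disjoint = [0, 7, 0, 0]        # no bins in common with the query

print(histogram_intersection(query, same_shape))
print(histogram_intersection(query, disjoint))
```

Because the histograms are normalized first, two images with the same color distribution but different sizes score as identical.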
Procedia PDF Downloads 460
42785 Advances in Mathematical Sciences: Unveiling the Power of Data Analytics
Authors: Zahid Ullah, Atlas Khan
Abstract:
The rapid advancements in data collection, storage, and processing capabilities have led to an explosion of data in various domains. In this era of big data, mathematical sciences play a crucial role in uncovering valuable insights and driving informed decision-making through data analytics. The purpose of this abstract is to present the latest advances in mathematical sciences and their application in harnessing the power of data analytics. This abstract highlights the interdisciplinary nature of data analytics, showcasing how mathematics intersects with statistics, computer science, and other related fields to develop cutting-edge methodologies. It explores key mathematical techniques such as optimization, mathematical modeling, network analysis, and computational algorithms that underpin effective data analysis and interpretation. The abstract emphasizes the role of mathematical sciences in addressing real-world challenges across different sectors, including finance, healthcare, engineering, social sciences, and beyond. It showcases how mathematical models and statistical methods extract meaningful insights from complex datasets, facilitating evidence-based decision-making and driving innovation. Furthermore, the abstract emphasizes the importance of collaboration and knowledge exchange among researchers, practitioners, and industry professionals. It recognizes the value of interdisciplinary collaborations and the need to bridge the gap between academia and industry to ensure the practical application of mathematical advancements in data analytics. The abstract highlights the significance of ongoing research in mathematical sciences and its impact on data analytics. It emphasizes the need for continued exploration and innovation in mathematical methodologies to tackle emerging challenges in the era of big data and digital transformation. 
In summary, this abstract sheds light on the advances in mathematical sciences and their pivotal role in unveiling the power of data analytics. It calls for interdisciplinary collaboration, knowledge exchange, and ongoing research to further unlock the potential of mathematical methodologies in addressing complex problems and driving data-driven decision-making in various domains.
Keywords: mathematical sciences, data analytics, advances, unveiling
Procedia PDF Downloads 97
42784 Statistical Analysis for Overdispersed Medical Count Data
Authors: Y. N. Phang, E. F. Loh
Abstract:
Many researchers have suggested the use of zero inflated Poisson (ZIP) and zero inflated negative binomial (ZINB) models in modeling over-dispersed medical count data with extra variation caused by extra zeros and unobserved heterogeneity. The studies indicate that ZIP and ZINB always provide a better fit than the standard Poisson and negative binomial models in modeling over-dispersed medical count data. In this study, we propose the use of Zero Inflated Inverse Trinomial (ZIIT), Zero Inflated Poisson Inverse Gaussian (ZIPIG) and Zero Inflated Strict Arcsine (ZISA) models in modeling over-dispersed medical count data. These proposed models are not widely used, especially in the medical field. The results show that these three suggested models can serve as alternative models for over-dispersed medical count data. This is supported by the application of these models to a real-life medical data set. Inverse trinomial, Poisson inverse Gaussian, and strict arcsine are discrete distributions whose variance is a cubic function of the mean. Therefore, ZIIT, ZIPIG and ZISA are able to accommodate data with excess zeros and very heavy tails. They are recommended for modeling over-dispersed medical count data when ZIP and ZINB are inadequate.
Keywords: zero inflated, inverse trinomial distribution, Poisson inverse Gaussian distribution, strict arcsine distribution, Pearson’s goodness of fit
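As background for the zero-inflated models discussed in this abstract, the sketch below writes out the standard ZIP probability mass function, which mixes a point mass at zero with an ordinary Poisson distribution. It covers only the baseline ZIP model, not the ZIIT, ZIPIG or ZISA models the paper proposes, and the parameter values are arbitrary.

```python
import math

def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson model.

    pi  : probability of a structural (extra) zero
    lam : rate of the Poisson component
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

# Arbitrary illustrative parameters.
lam, pi = 2.0, 0.3
total = sum(zip_pmf(k, lam, pi) for k in range(60))
mean = sum(k * zip_pmf(k, lam, pi) for k in range(60))
print(zip_pmf(0, lam, pi), total, mean)
```

The mean of the mixture is (1 - pi) * lam, and the probability of zero exceeds that of a plain Poisson with the same rate, which is how the model absorbs excess zeros.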
Procedia PDF Downloads 552
42783 Transport Related Air Pollution Modeling Using Artificial Neural Network
Authors: K. D. Sharma, M. Parida, S. S. Jain, Anju Saini, V. K. Katiyar
Abstract:
Air quality models form one of the most important components of an urban air quality management plan. Various statistical modeling techniques (regression, multiple regression and time series analysis) have been used to predict air pollution concentrations in the urban environment. These models calculate pollution concentrations from observed traffic, meteorological and pollution data after an appropriate relationship has been obtained empirically between these parameters. Artificial neural networks (ANN) are increasingly used as an alternative tool for modeling pollutants from vehicular traffic, particularly in urban areas. In the present paper, an attempt has been made to model traffic air pollution, specifically CO concentration, using neural networks. For CO concentration, two scenarios were considered: first, with only classified traffic volume as input, and second, with both classified traffic volume and meteorological variables. The results showed that CO concentration can be predicted with good accuracy using an artificial neural network (ANN).
Keywords: air quality management, artificial neural network, meteorological variables, statistical modeling
Procedia PDF Downloads 530
42782 Application of Data Mining Techniques for Tourism Knowledge Discovery
Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee
Abstract:
Five implementations of three data mining classification techniques were tested for extracting important insights from tourism data. The aim was to find the best-performing algorithm among those compared for tourism knowledge discovery. The knowledge discovery process from data was used as the process model, and 10-fold cross validation was used for testing. Various data preprocessing activities were performed to obtain the final dataset for model building. Classification models of the selected algorithms were built under different scenarios on the preprocessed dataset. The best-performing algorithm on the tourism dataset was Random Forest (76%) before applying information-gain-based attribute selection, and J48 (C4.5) (75%) after selecting the attributes most relevant to the class (target) attribute. In terms of model building time, attribute selection improved the efficiency of all algorithms; Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented; they reveal intricate, non-trivial knowledge that simple statistical analysis would not otherwise discover, although the accuracy of the classification algorithms was mediocre.
Keywords: classification algorithms, data mining, knowledge discovery, tourism
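The 10-fold cross validation used for testing in this abstract partitions the data into ten folds, each serving once as the test set. The sketch below shows only that index bookkeeping; the function name, the sample size and the shuffling scheme are assumptions, and no classifier from the study is reproduced.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment keeps fold sizes within one sample of each other.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

# Hypothetical dataset size for illustration.
splits = list(k_fold_indices(n_samples=103, k=10))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))
```

A model would be fitted on each training part and scored on the matching test part, with the reported accuracy being the average over the ten folds.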
Procedia PDF Downloads 301
42781 A Modular Framework for Enabling Analysis for Educators with Different Levels of Data Mining Skills
Authors: Kyle De Freitas, Margaret Bernard
Abstract:
Enabling data mining analysis among a wider audience of educators is an active area of research within the educational data mining (EDM) community. The paper proposes a framework for developing an environment that caters for educators who have few technical data mining skills as well as for more advanced users with some data mining expertise. The framework architecture was developed through a review of the strengths and weaknesses of existing models in the literature. The proposed framework provides a modular architecture that lets future researchers focus on the development of specific areas within the EDM process. Finally, the paper highlights a strategy of enabling analysis through either predefined questions or a guided data mining process, and shows how the developed questions and the analysis conducted can be reused and extended over time.
Keywords: educational data mining, learning management system, learning analytics, EDM framework
Procedia PDF Downloads 330
42780 Experimental Investigation of On-Body Channel Modelling at 2.45 GHz
Authors: Hasliza A. Rahim, Fareq Malek, Nur A. M. Affendi, Azuwa Ali, Norshafinash Saudin, Latifah Mohamed
Abstract:
This paper presents an experimental investigation of on-body channel fading at 2.45 GHz considering two states of user body movement: stationary and mobile. A pair of body-worn antennas was used in this measurement campaign. A statistical analysis was performed by comparing the measured on-body path loss to five well-known distributions: lognormal, normal, Nakagami, Weibull and Rayleigh. The results showed that, for the upper-arm-to-left-chest link, the average path loss with a moving arm was up to 3.5 dB higher than the path loss in the sitting position. The analysis also concluded that the Nakagami distribution provided the best fit for most of the on-body static link path loss in the standing still and sitting positions, while the arm movement is best described by the lognormal distribution.
Keywords: on-body channel communications, fading characteristics, statistical model, body movement
Procedia PDF Downloads 359
42779 Lineup Optimization Model of Basketball Players Based on the Prediction of Recursive Neural Networks
Authors: Wang Yichen, Haruka Yamashita
Abstract:
In recent years, in the field of sports, decision making based on the analysis of accumulated sports data, such as selecting the members for a game and the game strategy, has been widely attempted. In fact, in the NBA basketball league, where the world's highest-level players gather, teams analyze the data using various statistical techniques to win games. However, it is difficult to analyze play-by-play game data, such as ball tracking or the motion of the players, because the situation of the game changes rapidly and the structure of the data is complicated. Therefore, an analysis method for real-time game play data is considered necessary. In this research, we propose an analytical model for "determining the optimal lineup composition" using real-time play data, a task considered difficult for all coaches. Because replacing the entire lineup is too complicated, the actual questions for the replacement of players are "whether or not the lineup should be changed" and “whether or not Small Ball lineup is adopted”. Therefore, we propose an analytical model for the optimal player selection problem based on Small Ball lineups. In basketball, we can accumulate scoring data for each play, which indicates a player's contribution to the game, and the scoring data can be considered as time series data. In order to compare the importance of players in different situations and lineups, we combine an RNN (Recurrent Neural Network) model, which can analyze time series data, and an NN (Neural Network) model, which can analyze the situation on the field, to build a score prediction model. This model is capable of identifying the current optimal lineup for different situations. In this research, we collected the accumulated NBA data from 2019-2020.
Then we apply the method to the actual basketball play data to verify the reliability of the proposed model.
Keywords: recurrent neural network, players lineup, basketball data, decision making model
Procedia PDF Downloads 136
42778 Quantile Coherence Analysis: Application to Precipitation Data
Authors: Yaeji Lim, Hee-Seok Oh
Abstract:
Coherence analysis measures the linear time-invariant relationship between two data sets and has been studied in various fields such as signal processing, engineering, and medical science. However, classical coherence analysis tends to be sensitive to outliers and focuses only on the mean relationship. In this paper, we generalize the cross periodogram to the quantile cross periodogram, which provides a richer picture of the inter-relationship between two data sets. This is a general version of the Laplace cross periodogram. We prove its asymptotic distribution under the long range process and compare it with ordinary coherence through numerical examples. We also present a real data example to confirm the usefulness of quantile coherence analysis.
Keywords: coherence, cross periodogram, spectrum, quantile
Procedia PDF Downloads 395
42777 Materialized View Effect on Query Performance
Authors: Yusuf Ziya Ayık, Ferhat Kahveci
Abstract:
Currently, database management systems have various tools such as backup and maintenance, and also provide statistical information such as resource usage and security. In terms of query performance, this paper covers query optimization, views, indexed tables, pre-computed materialized views, and query performance analysis, in which alternative query plans can be created and the least costly one selected to optimize a query. Indexes and views can be created for related table columns. The literature review of this study showed that, over time, despite the growing capabilities of database management systems, only database administrators are aware of the need to deal with archival and transactional data types differently. These data may be constantly changing data used in everyday life, or may come from a completed questionnaire whose data input has finished. For both types of data, the database uses its capabilities; but, as shown in the findings section, instead of repeating similar heavy calculations that produce the same results with the same query over survey results, using materialized view results can be simpler. In this study, this performance difference was observed quantitatively in terms of the cost of the query.
Keywords: cost of query, database management systems, materialized view, query performance
Procedia PDF Downloads 282
42776 Students’ Speech Anxiety in Blended Learning
Authors: Mary Jane B. Suarez
Abstract:
Public speaking anxiety (PSA), also known as speech anxiety, is persistent in traditional communication classes, especially for students who learn English as a second language. Speech anxiety intensified when communication skills assessments moved to an online or remote mode of learning because of the perils of the COVID-19 virus. Both teachers and students experienced great uncertainty about how to teach and learn speaking skills effectively amidst the pandemic. Communication skills assessments such as public speaking, oral presentations, and student reporting have taken on new meaning with Google Meet, Zoom, and other online platforms. Though such technologies have paved the way for more creative ways for students to acquire and develop communication skills, the effectiveness of these assessment tools stands in question. This mixed-method study aimed to determine the factors that affected the public speaking skills of students in a communication class; to probe the gaps in assessing the speaking skills of students attending online classes vis-à-vis the implementation of remote and blended modalities of learning; and to recommend ways to address students' public speaking anxiety when performing a speaking task online and to bridge the assessment gaps, based on the outcome of the study, in order to achieve a smooth transition from online to on-ground instruction toward a better post-pandemic academic milieu. Using a convergent parallel design, both quantitative and qualitative data were reconciled by probing the public speaking anxiety of students and the potential assessment gaps encountered in an online English communication class under remote and blended learning. There were four phases in applying the convergent parallel design.
The first phase was data collection, where both quantitative and qualitative data were collected using document reviews and focus group discussions. The second phase was data analysis, where quantitative data were treated using statistical testing, particularly frequency, percentage, and mean, with the Microsoft Excel application and IBM Statistical Package for Social Sciences (SPSS) version 19, and qualitative data were examined using thematic analysis. The third phase was the merging of the data analysis results to compare the desired learning competencies with the actual learning competencies of students. Finally, the fourth phase was the interpretation of the merged data, which led to the finding that a significantly high percentage of students experienced public speaking anxiety whenever they delivered speaking tasks online. Assessment gaps were also identified by comparing the desired learning competencies of the formative and alternative assessments implemented with the actual speaking performances of students, which showed evidence that students' public speaking anxiety was not properly identified and processed.
Keywords: blended learning, communication skills assessment, public speaking anxiety, speech anxiety
Procedia PDF Downloads 107
42775 Social Anxiety Connection with Individual Characteristics: Theory of Mind, Verbal Irony Comprehension and Personal Traits
Authors: Anano Tenieshvili, Teona Lodia
Abstract:
Social anxiety disorder (SAD) is one of the most common mental health problems not only in adults but also in adolescents. Individuals with SAD exhibit difficulties in interpersonal relationships and in understanding and regulating emotions. For social and emotional adaptation, it is crucial to identify, understand, accept and manage emotions correctly. Researchers actively study the factors that contribute to the development and maintenance of this condition. Therefore, the main purpose of this study is to acquire knowledge about the association between social anxiety and individual characteristics, such as theory of mind (ToM), verbal irony comprehension, and personal traits. 112 adolescents aged from 12 to 18 were selected for this research; 15 of them are diagnosed with social anxiety disorder. Statistical analysis was performed on the entire sample, and two groups, adolescents with and without social anxiety disorder, were also compared separately. Social anxiety and personal traits were assessed by questionnaires. Theory of mind and comprehension of verbal irony were measured using tests. Statistical analysis indicated a positive relationship between social anxiety and comprehension of ironic criticism. Moreover, social anxiety was significantly positively correlated with neuroticism and isolation tendency, whereas it was negatively related to extraversion and frustration tolerance. In addition, statistical analysis revealed a positive relationship between ToM and verbal irony comprehension. However, the relationship between social anxiety and ToM was not statistically significant. In conclusion, the current research expands knowledge about social anxiety and supports the results of some previous studies.
Keywords: personal traits, social anxiety, theory of mind, verbal irony comprehension
Procedia PDF Downloads 210
42774 Social Anxiety Connection with Individual Characteristics: Theory of Mind, Verbal Irony Comprehension and Personal Traits
Authors: Anano Tenieshvili, Teona Lodia
Abstract:
Social anxiety disorder (SAD) is one of the most common mental health problems not only in adults but also in adolescents. Individuals with SAD exhibit difficulties in interpersonal relationships and in understanding and regulating emotions. For social and emotional adaptation, it is crucial to identify, understand, accept and manage emotions correctly. Researchers actively study the factors that contribute to the development and maintenance of this condition. Therefore, the main purpose of this study is to acquire knowledge about the association between social anxiety and individual characteristics, such as theory of mind (ToM), verbal irony comprehension, and personal traits. 112 adolescents aged from 12 to 18 were selected for this research; 15 of them are diagnosed with social anxiety disorder. Statistical analysis was performed on the entire sample, and two groups, adolescents with and without social anxiety disorder, were also compared separately. Social anxiety and personal traits were assessed by questionnaires. Theory of mind and comprehension of verbal irony were measured using tests. Statistical analysis indicated a positive relationship between social anxiety and comprehension of ironic criticism. Moreover, social anxiety was significantly positively correlated with neuroticism and isolation tendency, whereas it was negatively related to extraversion and frustration tolerance. In addition, statistical analysis revealed a positive relationship between ToM and verbal irony comprehension. However, the relationship between social anxiety and ToM was not statistically significant. In conclusion, the current research expands knowledge about social anxiety and supports the results of some previous studies.
Keywords: personal traits, social anxiety, theory of mind, verbal irony comprehension
Procedia PDF Downloads 131
42773 Study and Simulation of a Sever Dust Storm over West and South West of Iran
Authors: Saeed Farhadypour, Majid Azadi, Habibolla Sayyari, Mahmood Mosavi, Shahram Irani, Aliakbar Bidokhti, Omid Alizadeh Choobari, Ziba Hamidi
Abstract:
In recent decades, the frequency of dust events has increased significantly in the west and south-west of Iran. First, dust events during the period 1990-2013 are surveyed using historical dust data collected at 6 weather stations scattered over the west and south-west of Iran. After statistical analysis of the observational data, one of the most severe dust storm events in the region, which occurred from 3rd to 6th July 2009, is selected and analyzed. The WRF-Chem model is used to simulate the amount of PM10 and its transport to the region. The initial and lateral boundary conditions for the model are obtained from GFS data with 0.5°×0.5° spatial resolution. In the simulation, two aerosol schemes (GOCART and MADE/SORGAM) with 3 options (chem_opt = 106, 300 and 303) were evaluated. Statistical analysis of the historical data showed that the south-west of Iran has a high frequency of dust events: Bushehr station has the highest frequency among the stations and Urmia station the lowest. In the period 1990 to 2013, the years 2009 and 1998, with 3221 and 100 respectively, had the highest and lowest numbers of dust events; by month, June and July had the highest frequency of dust events and December the lowest. Model results showed that the MADE/SORGAM scheme predicted the values and trends of PM10 better than the other schemes and performed better in comparison with the observations. Finally, the distribution of PM10 and the surface wind maps obtained from numerical modeling showed the formation of dust plumes in Iraq and Syria and their transport to the west and south-west of Iran. In addition, comparing the MODIS satellite image acquired on 4th July 2009 with the model output at the same time showed the good ability of WRF-Chem to simulate the spatial distribution of dust.
Keywords: dust storm, MADE/SORGAM scheme, PM10, WRF-Chem
Procedia PDF Downloads 27242772 Troubleshooting Petroleum Equipment Based on Wireless Sensors Based on Bayesian Algorithm
Authors: Vahid Bayrami Rad
Abstract:
In this research, common methods and techniques have been investigated with a focus on intelligent fault finding and monitoring systems in the oil industry. Remote and intelligent control methods are considered a necessity for implementing various operations in the oil industry, and benefiting from the knowledge extracted, with the help of data mining algorithms, from the countless data generated is a way to speed up the operational process of monitoring and troubleshooting in today's big oil companies. Therefore, by comparing data mining algorithms and checking their efficiency, structure and response under different conditions, the proposed (Bayesian) algorithm, using data clustering and analysis and data evaluation with a colored Petri net, provides an applicable and dynamic model from the point of view of reliability and response time. By using this method, it is possible to achieve a dynamic and consistent model of the remote control system, prevent the occurrence of leakage in oil pipelines and refineries, and reduce costs and human and financial errors. The statistical data obtained from the evaluation process show an increase in reliability, availability and speed for the proposed method compared to previous methods.
Keywords: wireless sensors, petroleum equipment troubleshooting, Bayesian algorithm, colored Petri net, rapid miner, data mining-reliability
Procedia PDF Downloads 7242771 Saving Energy at a Wastewater Treatment Plant through Electrical and Production Data Analysis
Authors: Adriano Araujo Carvalho, Arturo Alatrista Corrales
Abstract:
This paper shows how analysis of electrical energy consumption and production data was used to find opportunities to save energy at the Taboada wastewater treatment plant in Callao, Peru. The data were accessed through independent data networks for the electrical and process instruments and analyzed under an ISO 50001 energy audit, which considered Energy Performance Indexes for each process and a step-by-step guide presented in this text. By applying this methodology and data mining techniques to information gathered through electronic multimeters (conveniently placed on substation switchboards connected to a cloud network), it was possible to identify the performance of each process thoroughly and thus reveal saving opportunities that were previously hidden. The data analysis reduced both costs and energy use, allowing the plant to save significant resources and to be certified under ISO 50001.
Keywords: energy and production data analysis, energy management, ISO 50001, wastewater treatment plant energy analysis
Procedia PDF Downloads 20142770 On the Fourth-Order Hybrid Beta Polynomial Kernels in Kernel Density Estimation
Authors: Benson Ade Eniola Afere
Abstract:
This paper introduces a family of fourth-order hybrid beta polynomial kernels developed for statistical analysis. The assessment of these kernels' performance centers on two critical metrics: asymptotic mean integrated squared error (AMISE) and kernel efficiency. Through the utilization of both simulated and real-world datasets, a comprehensive evaluation was conducted, facilitating a thorough comparison with conventional fourth-order polynomial kernels. The evaluation procedure encompassed the computation of AMISE and efficiency values for both the proposed hybrid kernels and the established classical kernels. The consistently observed trend was the superior performance of the hybrid kernels when compared to their classical counterparts. This trend persisted across diverse datasets, underscoring the resilience and efficacy of the hybrid approach. By leveraging these performance metrics and conducting evaluations on both simulated and real-world data, this study furnishes compelling evidence in favour of the superiority of the proposed hybrid beta polynomial kernels. The discernible enhancement in performance, as indicated by lower AMISE values and higher efficiency scores, strongly suggests that the proposed kernels offer heightened suitability for statistical analysis tasks when compared to traditional kernels.
Keywords: AMISE, efficiency, fourth-order Kernels, hybrid Kernels, Kernel density estimation
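For orientation, the sketch below implements one classical fourth-order polynomial kernel of the kind this abstract uses as a baseline: the fourth-order Epanechnikov kernel K(u) = (15/32)(3 - 10u^2 + 7u^4) on [-1, 1]. It is not the paper's hybrid beta kernel; the sample, bandwidth and helper names are assumptions chosen for illustration.

```python
def fourth_order_epanechnikov(u):
    """Classical fourth-order kernel supported on [-1, 1]."""
    if abs(u) > 1.0:
        return 0.0
    return (15.0 / 32.0) * (3.0 - 10.0 * u**2 + 7.0 * u**4)

def kde(x, sample, bandwidth):
    """Kernel density estimate at x; may be slightly negative for this kernel."""
    n = len(sample)
    total = sum(fourth_order_epanechnikov((x - xi) / bandwidth) for xi in sample)
    return total / (n * bandwidth)

def kernel_moment(order, steps=20_000):
    """Midpoint-rule approximation of the integral of u**order * K(u) on [-1, 1]."""
    width = 2.0 / steps
    total = 0.0
    for i in range(steps):
        u = -1.0 + (i + 0.5) * width
        total += (u ** order) * fourth_order_epanechnikov(u)
    return total * width

# Hypothetical sample and bandwidth.
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
density_at_zero = kde(0.0, sample, bandwidth=1.0)
print(kernel_moment(0), kernel_moment(2), kernel_moment(4))
```

A kernel is fourth-order when it integrates to one, its second moment vanishes, and its fourth moment does not; the numerical moments printed above check exactly those conditions.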
Procedia PDF Downloads 74
42769 The Value of Dynamic Priorities in Motor Learning between Some Basic Skills in Beginner's Basketball, U14 Years
Authors: Guebli Abdelkader, Regiueg Madani, Sbaa Bouabdellah
Abstract:
The goal of this study is to determine the value of dynamic priorities in motor learning between some basic skills in beginners’ basketball (U14), focusing on shooting and defense against the shooter. We present the statistical results of comparisons and correlations between the study samples in skill tests for shooting and for defense against the shooter. To this end, we selected 40 middle-school boys divided into four groups: two control groups (CS1, CS2) and two experimental groups (ES1: training on shooting, then defense against the shooter; ES2: training on defense against the shooter, then shooting). For the statistical analysis, we used F and t tests for statistical differences and the correlation coefficient (R) for the correlation analysis. Based on the statistical analyses, we confirm the importance of prioritizing basic basketball skills during the motor learning process. The benefit of the experimental group training is a saving in the time needed to acquire new motor skills in basketball, with the ES2 ordering emerging as the more successful dynamic motor learning method for enhancing basic skills among beginner basketball players.
Keywords: basic skills, basketball, motor learning, children
Procedia PDF Downloads 173
42768 Unlocking the Puzzle of Borrowing Adult Data for Designing Hybrid Pediatric Clinical Trials
Authors: Rajesh Kumar G
Abstract:
A challenging aspect of any clinical trial is planning the study design carefully so that it meets the study objective in an optimal way, and validating the assumptions made during protocol design. A pediatric study brings the added challenges of stringent guidelines and difficulty in recruiting the necessary subjects. Unlike adult trials, pediatric trials have little historical data available, yet such data are required to validate planning assumptions. Typically, pediatric studies are initiated as soon as a drug is approved for marketing to adults, so the pediatric study can be well planned using historical adult study information together with available pediatric pilot study data or simulated pediatric data. Generalizing a historical adult study to a new pediatric study is a tedious task; however, it is possible by integrating various statistical techniques and exploiting the advantages of a hybrid study design, which helps to achieve the study objective more smoothly even in the presence of many constraints. This paper explains how a hybrid study design can be planned together with an integrated technique (SEV) for planning the pediatric study. In brief, the SEV technique (Simulation, Estimation (using borrowed adult data and applying Bayesian methods), Validation) simulates the planned study data and obtains the desired estimates in order to validate the assumptions. This method of validation can be used to improve the accuracy of data analysis, ensuring that results are as valid and reliable as possible, which allows informed decisions to be made well ahead of study initiation. Based on the collected data, this technique also gives insight into best practices for using historical study data and simulated data alike.
Keywords: adaptive design, simulation, borrowing data, Bayesian model
Procedia PDF Downloads 81
42767 Comparative Evaluation of Equity Indicators in the Matikiw Community-Based Forest Management Project in Pakil, Laguna and the Minayutan and Bacong Sigsigan Community-Based Forest Management Project in Famy, Laguna
Authors: Katherine Arquio
Abstract:
Community-based Forest Management (CBFM) is one of the integrative programs that slowly turned the course of forest management from traditional corporate practice to community-based practice, resulting in people empowerment. One of its goals is to promote socio-economic welfare, including social equity, among the people in the community. This study looks at the equity aspect of the program, particularly whether there are equity differences between two CBFM sites: Matikiw in Pakil, Laguna, and Minayutan and Bacong Sigsigan in Famy, Laguna. Equity indicators were identified first, since they formed the basis of the survey questions; the survey proper was then conducted, followed by the analysis. A two-tailed t-test was used as the statistical tool, since the difference between the two sites is the focus of the study. Statistical analysis was done using the statistical software Stata. Thirty-two indicators were identified, and the results showed that only 13 of them differed significantly between the two sites. These 13 indicators were significantly observed only in Matikiw; the other 19 indicators were commonly observed in both areas and are suitable as equity indicators for the CBFM program.
Keywords: social equity, CBFM, social forestry, equity indicators
Procedia PDF Downloads 387
42766 An Improved Two-dimensional Ordered Statistical Constant False Alarm Detection
Authors: Weihao Wang, Zhulin Zong
Abstract:
Two-dimensional ordered statistical constant false alarm detection is a widely used method for detecting weak target signals in radar signal processing applications. The method is based on analyzing the statistical characteristics of the noise and clutter present in the radar signal and then using this information to set an appropriate detection threshold. In this approach, the reference cell of the unit to be detected is divided into several reference subunits. These subunits are used to estimate the noise level and adjust the detection threshold, with the aim of minimizing the false alarm rate. By using an ordered statistical approach, the method is able to effectively suppress the influence of clutter and noise, resulting in a low false alarm rate. The detection process involves a number of steps, including filtering the input radar signal to remove any noise or clutter, estimating the noise level based on the statistical characteristics of the reference subunits, and finally, setting the detection threshold based on the estimated noise level. One of the main advantages of two-dimensional ordered statistical constant false alarm detection is its ability to detect weak target signals in the presence of strong clutter and noise. This is achieved by carefully analyzing the statistical properties of the signal and using an ordered statistical approach to estimate the noise level and adjust the detection threshold. In conclusion, two-dimensional ordered statistical constant false alarm detection is a powerful technique for detecting weak target signals in radar signal processing applications. By dividing the reference cell into several subunits and using an ordered statistical approach to estimate the noise level and adjust the detection threshold, this method is able to effectively suppress the influence of clutter and noise and maintain a low false alarm rate.
Keywords: two-dimensional, ordered statistical, constant false alarm, detection, weak target signals
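The threshold-setting step described above can be sketched as a plain two-dimensional ordered-statistic CFAR detector. This is a generic textbook-style version, not the improved subunit scheme of the paper, which the abstract does not specify. The function name `os_cfar_2d` and the parameters `ref`, `guard`, `rank_fraction` and `alpha` are assumptions chosen for illustration; in practice `alpha` would be derived from the desired false alarm probability.

```python
def os_cfar_2d(power, ref=2, guard=1, rank_fraction=0.75, alpha=5.0):
    """Return (row, col) cells whose power exceeds an ordered-statistic threshold.

    power: 2D list of non-negative power values (e.g. a range-Doppler map).
    ref: width of the reference band around the guard band, in cells.
    guard: width of the guard band around the cell under test, in cells.
    rank_fraction: which ordered statistic of the reference cells to use.
    alpha: threshold scaling factor (assumed value, not from the source).
    """
    rows, cols = len(power), len(power[0])
    half = ref + guard
    detections = []
    # Cells too close to the border are skipped because their window is incomplete.
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            # Reference cells: everything in the window outside the guard region.
            reference = [
                power[i][j]
                for i in range(r - half, r + half + 1)
                for j in range(c - half, c + half + 1)
                if abs(i - r) > guard or abs(j - c) > guard
            ]
            reference.sort()
            k = int(rank_fraction * (len(reference) - 1))
            noise_level = reference[k]  # ordered-statistic noise estimate
            if power[r][c] > alpha * noise_level:
                detections.append((r, c))
    return detections


if __name__ == "__main__":
    grid = [[1.0] * 9 for _ in range(9)]
    grid[4][4] = 50.0  # a single strong cell on a flat noise floor
    print(os_cfar_2d(grid))
```

Using a rank below the maximum means a few strong interferers inside the reference window do not inflate the noise estimate, which is the main reason for preferring an ordered statistic over a simple average.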
Procedia PDF Downloads 84
42765 Quantification of the Non-Registered Electrical and Electronic Equipment for Domestic Consumption and Enhancing E-Waste Estimation: A Case Study on TVs in Vietnam
Authors: Ha Phuong Tran, Feng Wang, Jo Dewulf, Hai Trung Huynh, Thomas Schaubroeck
Abstract:
The rapid growth and complex composition of waste electrical and electronic equipment (e-waste) have made it one of the most problematic waste streams worldwide. Precise information on its size at national, regional and global levels has therefore been highlighted as a prerequisite for a proper management system. However, obtaining it is very challenging, especially in developing countries, where both a formal e-waste management system and the statistical data needed for e-waste estimation, i.e. data on the production, sale and trade of electrical and electronic equipment (EEE), are often lacking. Moreover, there is an inflow of non-registered electrical and electronic equipment, which ‘invisibly’ enters the domestic EEE market and is then used for domestic consumption. The non-registered, invisible and (in most cases) illicit nature of this flow makes it difficult or even impossible to capture in any statistical system. The e-waste generated from it is thus often left uncounted in current e-waste estimates based on statistical market data. Therefore, this study focuses on enhancing e-waste estimation in developing countries and proposes a calculation pathway to quantify the magnitude of the non-registered EEE inflow. An advanced Input-Output Analysis model (the Sale-Stock-Lifespan model) has been integrated into the calculation procedure. In general, the Sale-Stock-Lifespan model helps to improve the quality of the input data for modeling (i.e. it performs data consolidation to create a more accurate lifespan profile and models a dynamic lifespan to account for its changes over time), which in turn improves the quality of the e-waste estimation. To demonstrate these objectives, a case study on televisions (TVs) in Vietnam is used. The results show that the amount of waste TVs in Vietnam has increased fourfold since 2000. This upward trend is expected to continue. In 2035, a total of 9.51 million TVs are predicted to be discarded. 
Moreover, the estimate of the non-registered TV inflow shows that it may have contributed on average about 15% of the total TVs sold on the Vietnamese market over the period 2002 to 2013. To address potential uncertainties associated with the estimation models and input data, a sensitivity analysis was applied. The results show that the estimates of both waste and non-registered inflow depend on two parameters: the number of TVs used per household and the lifespan. In particular, with a 1% increase in the TV in-use rate, the average market share of the non-registered inflow in the period 2002-2013 increases by 0.95%. However, it decreases from 27% to 15% when the constant unadjusted lifespan is replaced by the dynamic adjusted lifespan. The effect of these two parameters on the amount of waste TVs generated each year is more complex and non-linear over time. To conclude, despite the remaining uncertainty, this study is the first attempt to apply the Sale-Stock-Lifespan model to improve e-waste estimation in developing countries and to quantify the non-registered EEE inflow to domestic consumption. It can therefore be further improved in future with more knowledge and data.
Keywords: e-waste, non-registered electrical and electronic equipment, TVs, Vietnam
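The relationship between sales, stock and lifespan can be illustrated with a minimal sketch. It assumes a Weibull lifespan distribution with constant `shape` and `scale` parameters, which the abstract does not state, and it uses made-up sales figures; the function names are hypothetical. Units sold in year s are taken to be discarded during year t with probability F(t - s + 1) - F(t - s), where F is the lifespan distribution function, and sales are assumed to be given for consecutive years.

```python
import math


def weibull_cdf(age, shape, scale):
    """Probability that a unit has been discarded by the given age (in years)."""
    if age <= 0:
        return 0.0
    return 1.0 - math.exp(-((age / scale) ** shape))


def waste_by_year(sales, shape=2.0, scale=10.0):
    """sales: dict mapping year -> units sold. Returns dict year -> units discarded."""
    years = sorted(sales)
    waste = {}
    for t in years:
        total = 0.0
        for s in years:
            if s > t:
                break
            age = t - s
            # Share of the year-s cohort that fails during year t.
            share = weibull_cdf(age + 1, shape, scale) - weibull_cdf(age, shape, scale)
            total += sales[s] * share
        waste[t] = total
    return waste


def stock_by_year(sales, waste):
    """In-use stock at the end of each year: cumulative sales minus cumulative waste."""
    stock, running = {}, 0.0
    for t in sorted(sales):
        running += sales[t] - waste[t]
        stock[t] = running
    return stock


if __name__ == "__main__":
    sales = {2000: 100.0, 2001: 120.0, 2002: 150.0}  # illustrative figures only
    waste = waste_by_year(sales)
    stock = stock_by_year(sales, waste)
    for year in sorted(sales):
        print(year, round(waste[year], 2), round(stock[year], 2))
```

A dynamic lifespan, as described in the abstract, would amount to letting `shape` and `scale` depend on the sales year. Comparing the modelled stock with an independently observed stock is one plausible way to expose a missing inflow, although the abstract does not detail the calculation pathway actually used.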
Procedia PDF Downloads 249
42764 Effect of Genuine Missing Data Imputation on Prediction of Urinary Incontinence
Authors: Suzan Arslanturk, Mohammad-Reza Siadat, Theophilus Ogunyemi, Ananias Diokno
Abstract:
Missing data is a common challenge in statistical analyses of most clinical survey datasets. A variety of methods have been developed to enable the analysis of survey data with missing values, of which imputation is the most commonly used. However, to minimize the bias introduced by imputation, one must choose the right imputation technique and apply it to the correct type of missing data. In this paper, we identify different types of missing values: missing data due to skip pattern (SPMD), undetermined missing data (UMD), and genuine missing data (GMD), and apply rough set imputation to only the GMD portion of the missing data. We use rough set imputation to evaluate the effect of such imputation on prediction by generating several simulation datasets based on an existing epidemiological dataset (MESA). To measure how well each dataset lends itself to the prediction model (logistic regression), we use p-values from the Wald test. To evaluate the accuracy of the prediction, we consider the width of the 95% confidence interval for the probability of incontinence. Both imputed and non-imputed simulation datasets were fit to the prediction model, and both turned out to be significant (p-value < 0.05). However, the Wald score shows a better fit for the imputed than for the non-imputed datasets (28.7 vs. 23.4). The average confidence interval width decreased by 10.4% when the imputed dataset was used, indicating higher precision. The results show that using the rough set method for missing data imputation on GMD data improves the predictive capability of the logistic regression. Further studies are required to generalize this conclusion to other clinical survey datasets.
Keywords: rough set, imputation, clinical survey data simulation, genuine missing data, predictive index
Procedia PDF Downloads 171
42763 Factors Influencing Savings of People between 30-40 Years Old in Dusit District, Bangkok Metropolis
Authors: Charawee Butbumrung
Abstract:
The purpose of this research was to study the factors influencing the savings of people between 30 and 40 years old in Dusit District, Bangkok Metropolis. The statistics used in the data analysis were frequency, mean, standard deviation, t-test, one-way ANOVA and Pearson’s correlation coefficient, computed with a statistical program for the social sciences. Hypothesis testing showed that married people earning different monthly salaries saved by depositing into the bank at different levels, and that people of different occupations saved in the form of life insurance at different levels, at a statistical significance level of 0.05. Testing the influence of saving motivations showed that people saved for use in sickness or old age and for their children. Worthiness and certainty influenced saving in the same direction at a high level, while the motivations of public relations, annual tax reduction, inducement by others and bonus gifts influenced saving in the same direction at a moderate level, at a statistical significance level of 0.05.
Keywords: Dusit District, factors, saving, Bangkok Metropolis
Procedia PDF Downloads 249
42762 Health Monitoring and Failure Detection of Electronic and Structural Components in Small Unmanned Aerial Vehicles
Authors: Gopi Kandaswamy, P. Balamuralidhar
Abstract:
Fully autonomous small Unmanned Aerial Vehicles (UAVs) are increasingly being used in many commercial applications. Although a lot of research has been done to develop safe, reliable and durable UAVs, accidents due to electronic and structural failures are not uncommon and pose a huge safety risk to the UAV operators and the public. Hence there is a strong need for an automated health monitoring system for UAVs with a view to minimizing mission failures thereby increasing safety. This paper describes our approach to monitoring the electronic and structural components in a small UAV without the need for additional sensors to do the monitoring. Our system monitors data from four sources: sensors, navigation algorithms, control inputs from the operator and flight controller outputs. It then does statistical analysis on the data and applies a rule based engine to detect failures. This information can then be fed back into the UAV and a decision to continue or abort the mission can be taken automatically by the UAV and independent of the operator. Our system has been verified using data obtained from real flights over the past year from UAVs of various sizes that have been designed and deployed by us for various applications.
Keywords: fault detection, health monitoring, unmanned aerial vehicles, vibration analysis
Procedia PDF Downloads 267
42761 Improvement of the Q-System Using the Rock Engineering System: A Case Study of Water Conveyor Tunnel of Azad Dam
Authors: Sahand Golmohammadi, Sana Hosseini Shirazi
Abstract:
Because the status and mechanical parameters of discontinuities in the rock mass are included in the calculations, rock engineering classification methods are often used as a starting point for the design of different types of structures. The Q-system is one of the most frequently used methods for stability analysis and for determining the support systems of underground structures in rock, including tunnels. The method requires six main parameters of the rock mass: the rock quality designation (RQD), joint set number (Jn), joint roughness number (Jr), joint alteration number (Ja), joint water parameter (Jw) and stress reduction factor (SRF). To achieve a reasonable and optimal design, identifying the parameters that govern the stability of these structures is one of the most important goals and most necessary actions in rock engineering. It is therefore necessary to study the relationships between the parameters of a system and how they interact with each other and, ultimately, with the whole system. This research attempts to determine the most effective parameters (key parameters) among the six rock mass parameters of the Q-system using the rock engineering system (RES) method, in order to improve the relationships between the parameters in the calculation of the Q value. The RES is a method for determining the degree of cause and effect of a system's parameters by constructing an interaction matrix. In this research, the geomechanical data collected from the water conveyor tunnel of Azad Dam were used to construct the interaction matrix of the Q-system. For this purpose, instead of the conventional coding methods, which are always accompanied by defects such as uncertainty, the Q-system interaction matrix is coded using a statistical analysis of the data that determines the correlation coefficients between the parameters. 
So, the effect of each parameter on the system is evaluated with greater certainty. The results of this study show that the formed interaction matrix provides a reasonable estimate of the effective parameters in the Q-system. Among the six parameters of the Q-system, the SRF and Jr parameters have the maximum and minimum impact on the system, respectively, and also the RQD and Jw parameters have the maximum and minimum impact on the system, respectively. Therefore, by developing this method, we can obtain a more accurate relation to the rock mass classification by weighting the required parameters in the Q-system.
Keywords: Q-system, rock engineering system, statistical analysis, rock mass, tunnel
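For orientation, the standard Q value combines the six parameters as Q = (RQD/Jn) × (Jr/Ja) × (Jw/SRF). The sketch below computes it and then codes an interaction matrix from absolute Pearson correlation coefficients, which is one plausible reading of the statistical coding described above rather than the authors' exact procedure. The function names and sample figures are hypothetical. Note that a correlation-coded matrix is symmetric, so on its own it yields an overall interaction intensity per parameter and cannot separate cause from effect.

```python
import math


def q_value(rqd, jn, jr, ja, jw, srf):
    """Standard Q-system rock mass quality: (RQD/Jn) * (Jr/Ja) * (Jw/SRF)."""
    return (rqd / jn) * (jr / ja) * (jw / srf)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def interaction_matrix(columns):
    """Off-diagonal entries are |r| between parameters; the diagonal is left at zero."""
    names = list(columns)
    return {
        a: {b: 0.0 if a == b else abs(pearson(columns[a], columns[b])) for b in names}
        for a in names
    }


def parameter_weights(matrix):
    """Normalised row sums: each parameter's share of the total interaction intensity."""
    totals = {name: sum(row.values()) for name, row in matrix.items()}
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


if __name__ == "__main__":
    print(q_value(rqd=90, jn=9, jr=3, ja=1, jw=1, srf=1))
    # Made-up measurements at four tunnel sections, for illustration only.
    data = {"RQD": [60, 70, 80, 90], "Jr": [1, 2, 2, 3], "SRF": [5, 2.5, 2.5, 1]}
    print(parameter_weights(interaction_matrix(data)))
```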
Procedia PDF Downloads 76
42760 Quality of Age Reporting from Tanzania 2012 Census Results: An Assessment Using Whipple’s Index, Myer’s Blended Index, and Age-Sex Accuracy Index
Authors: A. Sathiya Susuman, Hamisi F. Hamisi
Abstract:
Background: Many socio-economic and demographic data are attributed by age and sex. However, a variety of irregularities and misstatements are noted in age-related data, and fewer in sex data because of the biological differences between the genders. Given the misreporting of age data despite its importance in demographic and epidemiological studies, this study aims to assess the quality of the 2012 Tanzania Population and Housing Census (PHC) results. Methods: Data for the analysis were downloaded from the Tanzania National Bureau of Statistics. Age heaping and digit preference were measured using summary indices, viz. Whipple’s index, Myers’ blended index, and the age-sex accuracy index. Results: The recorded Whipple’s index for both sexes was 154.43; males had the lowest index, about 152.65, while females had the highest, about 156.07. For Myers’ blended index, the preferences were at digits ‘0’ and ‘5’, while avoidance was at digits ‘1’ and ‘3’ for both sexes. Finally, the age-sex accuracy index stood at 59.8, where the sex ratio score was 5.82 and the age ratio scores were 20.89 and 21.4 for males and females, respectively. Conclusion: The evaluation of the 2012 PHC data using these demographic techniques shows the data to be inaccurate as a result of systematic heaping and digit preference/avoidance. Thus, innovative methods of data collection, along with measuring and minimizing errors using statistical techniques, should be used to ensure the accuracy of age data.
Keywords: age heaping, digit preference/avoidance, summary indices, Whipple’s index, Myer’s index, age-sex accuracy index
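Whipple's index, the first of the summary indices above, has a simple closed form: the population reported at ages ending in 0 or 5 between 25 and 60, divided by one fifth of the total population aged 23 to 62, times 100. A minimal sketch follows; the function name `whipple_index` and the input layout (a dict of single-year age counts) are assumptions, and the example populations are synthetic rather than census figures.

```python
def whipple_index(pop_by_age):
    """Whipple's index from a dict mapping single-year age -> population count.

    100 means no preference for ages ending in 0 or 5; 500 means every
    reported age in the 23-62 range ends in 0 or 5.
    """
    heaped = sum(pop_by_age.get(age, 0) for age in range(25, 61, 5))
    total = sum(pop_by_age.get(age, 0) for age in range(23, 63))
    return 100.0 * 5.0 * heaped / total


if __name__ == "__main__":
    uniform = {age: 1000 for age in range(0, 100)}
    print(whipple_index(uniform))  # no heaping

    heaped = dict(uniform)
    for age in range(25, 61, 5):
        heaped[age] += 500  # extra reports at ages ending in 0 or 5
    print(round(whipple_index(heaped), 2))
```

On the United Nations scale commonly cited for this index, values between 125 and 175 are classed as rough, which is where the reported 154.43 falls.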
Procedia PDF Downloads 479
42759 Prompt Design for Code Generation in Data Analysis Using Large Language Models
Authors: Lu Song Ma Li Zhi
Abstract:
With the rapid advancement of artificial intelligence technology, large language models (LLMs) have become a milestone in the field of natural language processing, demonstrating remarkable capabilities in semantic understanding, intelligent question answering, and text generation. These models are gradually penetrating various industries, particularly showcasing significant application potential in the data analysis domain. However, retraining or fine-tuning these models requires substantial computational resources and ample downstream task datasets, which poses a significant challenge for many enterprises and research institutions. Without modifying the internal parameters of the large models, prompt engineering techniques can rapidly adapt these models to new domains. This paper proposes a prompt design strategy aimed at leveraging the capabilities of large language models to automate the generation of data analysis code. By carefully designing prompts, data analysis requirements can be described in natural language, which the large language model can then understand and convert into executable data analysis code, thereby greatly enhancing the efficiency and convenience of data analysis. This strategy not only lowers the threshold for using large models but also significantly improves the accuracy and efficiency of data analysis. Our approach includes requirements for the precision of natural language descriptions, coverage of diverse data analysis needs, and mechanisms for immediate feedback and adjustment. Experimental results show that with this prompt design strategy, large language models perform exceptionally well in multiple data analysis tasks, generating high-quality code and significantly shortening the data analysis cycle. 
This method provides an efficient and convenient tool for the data analysis field and demonstrates the enormous potential of large language models in practical applications.
Keywords: large language models, prompt design, data analysis, code generation
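The three ingredients the abstract lists (precise natural-language descriptions, coverage of the data at hand, and a feedback loop) can be sketched as two prompt builders. The template wording, function names and the `df` variable name are all hypothetical, since the abstract does not publish the authors' prompts, and no particular model API is assumed: the functions only build strings.

```python
def build_analysis_prompt(requirement, columns, library="pandas"):
    """Turn a natural-language requirement and a column schema into a code-generation prompt."""
    schema = "\n".join(f"- {name}: {dtype}" for name, dtype in columns.items())
    return (
        "You are a data analysis assistant.\n"
        f"Write executable Python code using {library}.\n"
        "The data is already loaded in a DataFrame named df with these columns:\n"
        f"{schema}\n"
        f"Task: {requirement}\n"
        "Return only code, with no explanation."
    )


def build_feedback_prompt(previous_code, error_message):
    """Follow-up prompt that feeds an execution error back to the model for correction."""
    return (
        "The following code failed when executed.\n"
        f"Code:\n{previous_code}\n"
        f"Error:\n{error_message}\n"
        "Return a corrected version of the code only."
    )


if __name__ == "__main__":
    prompt = build_analysis_prompt(
        "Compute mean sales per region",
        {"region": "str", "sales": "float"},
    )
    print(prompt)
```

Stating the schema explicitly in the prompt is one way to keep the generated code from guessing column names; the feedback prompt covers the adjustment step when the first attempt fails.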
Procedia PDF Downloads 48