Search results for: jackknife resampling
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 32

Search results for: jackknife resampling

32 Nonparametric Path Analysis with Truncated Spline Approach in Modeling Rural Poverty in Indonesia

Authors: Usriatur Rohma, Adji Achmad Rinaldo Fernandes

Abstract:

Nonparametric path analysis is a statistical method that does not rely on the assumption that the curve is known. The purpose of this study is to determine the best nonparametric truncated spline path function between linear and quadratic polynomial degrees with 1, 2, and 3-knot points and to determine the significance of estimating the best nonparametric truncated spline path function in the model of the effect of population migration and agricultural economic growth on rural poverty through the variable unemployment rate using the t-test statistic at the jackknife resampling stage. The data used in this study are secondary data obtained from statistical publications. The results showed that the best model of nonparametric truncated spline path analysis is quadratic polynomial degree with 3-knot points. In addition, the significance of the best-truncated spline nonparametric path function estimation using jackknife resampling shows that all exogenous variables have a significant influence on the endogenous variables.

Keywords: nonparametric path analysis, truncated spline, linear, quadratic, rural poverty, jackknife resampling

Procedia PDF Downloads 46
31 Nonparametric Path Analysis with a Truncated Spline Approach in Modeling Waste Management Behavior Patterns

Authors: Adji Achmad Rinaldo Fernandes, Usriatur Rohma

Abstract:

Nonparametric path analysis is a statistical method that does not rely on the assumption that the curve is known. The purpose of this study is to determine the best truncated spline nonparametric path function between linear and quadratic polynomial degrees with 1, 2, and 3 knot points and to determine the significance of estimating the best truncated spline nonparametric path function in the model of the effect of perceived benefits and perceived convenience on behavior to convert waste into economic value through the intention variable of changing people's mindset about waste using the t test statistic at the jackknife resampling stage. The data used in this study are primary data obtained from research grants. The results showed that the best model of nonparametric truncated spline path analysis is quadratic polynomial degree with 3 knot points. In addition, the significance of the best truncated spline nonparametric path function estimation using jackknife resampling shows that all exogenous variables have a significant influence on the endogenous variables.

Keywords: nonparametric path analysis, truncated spline, linear, kuadratic, behavior to turn waste into economic value, jackknife resampling

Procedia PDF Downloads 46
30 Cross-Validation of the Data Obtained for ω-6 Linoleic and ω-3 α-Linolenic Acids Concentration of Hemp Oil Using Jackknife and Bootstrap Resampling

Authors: Vibha Devi, Shabina Khanam

Abstract:

Hemp (Cannabis sativa) possesses a rich content of ω-6 linoleic and ω-3 linolenic essential fatty acid in the ratio of 3:1, which is a rare and most desired ratio that enhances the quality of hemp oil. These components are beneficial for the development of cell and body growth, strengthen the immune system, possess anti-inflammatory action, lowering the risk of heart problem owing to its anti-clotting property and a remedy for arthritis and various disorders. The present study employs supercritical fluid extraction (SFE) approach on hemp seed at various conditions of parameters; temperature (40 - 80) °C, pressure (200 - 350) bar, flow rate (5 - 15) g/min, particle size (0.430 - 1.015) mm and amount of co-solvent (0 - 10) % of solvent flow rate through central composite design (CCD). CCD suggested 32 sets of experiments, which was carried out. As SFE process includes large number of variables, the present study recommends the application of resampling techniques for cross-validation of the obtained data. Cross-validation refits the model on each data to achieve the information regarding the error, variability, deviation etc. Bootstrap and jackknife are the most popular resampling techniques, which create a large number of data through resampling from the original dataset and analyze these data to check the validity of the obtained data. Jackknife resampling is based on the eliminating one observation from the original sample of size N without replacement. For jackknife resampling, the sample size is 31 (eliminating one observation), which is repeated by 32 times. Bootstrap is the frequently used statistical approach for estimating the sampling distribution of an estimator by resampling with replacement from the original sample. For bootstrap resampling, the sample size is 32, which was repeated by 100 times. Estimands for these resampling techniques are considered as mean, standard deviation, variation coefficient and standard error of the mean. For ω-6 linoleic acid concentration, mean value was approx. 58.5 for both resampling methods, which is the average (central value) of the sample mean of all data points. Similarly, for ω-3 linoleic acid concentration, mean was observed as 22.5 through both resampling. Variance exhibits the spread out of the data from its mean. Greater value of variance exhibits the large range of output data, which is 18 for ω-6 linoleic acid (ranging from 48.85 to 63.66 %) and 6 for ω-3 linoleic acid (ranging from 16.71 to 26.2 %). Further, low value of standard deviation (approx. 1 %), low standard error of the mean (< 0.8) and low variance coefficient (< 0.2) reflect the accuracy of the sample for prediction. All the estimator value of variance coefficients, standard deviation and standard error of the mean are found within the 95 % of confidence interval.

Keywords: resampling, supercritical fluid extraction, hemp oil, cross-validation

Procedia PDF Downloads 140
29 Analysis of Path Nonparametric Truncated Spline Maximum Cubic Order in Farmers Loyalty Modeling

Authors: Adji Achmad Rinaldo Fernandes

Abstract:

Path analysis tests the relationship between variables through cause and effect. Before conducting further tests on path analysis, the assumption of linearity must be met. If the shape of the relationship is not linear and the shape of the curve is unknown, then use a nonparametric approach, one of which is a truncated spline. The purpose of this study is to estimate the function and get the best model on the nonparametric truncated spline path of linear, quadratic, and cubic orders with 1 and 2-knot points and determine the significance of the best function estimator in modeling farmer loyalty through the jackknife resampling method. This study uses secondary data through questionnaires to farmers in Sumbawa Regency who use SP-36 subsidized fertilizer products as many as 100 respondents. Based on the results of the analysis, it is known that the best-truncated spline nonparametric path model is the quadratic order of 2 knots with a coefficient of determination of 85.50%; the significance of the best-truncated spline nonparametric path estimator shows that all exogenous variables have a significant effect on endogenous variables.

Keywords: nonparametric path analysis, farmer loyalty, jackknife resampling, truncated spline

Procedia PDF Downloads 45
28 Software Verification of Systematic Resampling for Optimization of Particle Filters

Authors: Osiris Terry, Kenneth Hopkinson, Laura Humphrey

Abstract:

Systematic resampling is the most popularly used resampling method in particle filters. This paper seeks to further the understanding of systematic resampling by defining a formula made up of variables from the sampling equation and the particle weights. The formula is then verified via SPARK, a software verification language. The verified systematic resampling formula states that the minimum/maximum number of possible samples taken of a particle is equal to the floor/ceiling value of particle weight divided by the sampling interval, respectively. This allows for the creation of a randomness spectrum that each resampling method can fall within. Methods on the lower end, e.g., systematic resampling, have less randomness and, thus, are quicker to reach an estimate. Although lower randomness allows for error by having a larger bias towards the size of the weight, having this bias creates vulnerabilities to the noise in the environment, e.g., jamming. Conclusively, this is the first step in characterizing each resampling method. This will allow target-tracking engineers to pick the best resampling method for their environment instead of choosing the most popularly used one.

Keywords: SPARK, software verification, resampling, systematic resampling, particle filter, tracking

Procedia PDF Downloads 83
27 Approximate Confidence Interval for Effect Size Base on Bootstrap Resampling Method

Authors: S. Phanyaem

Abstract:

This paper presents the confidence intervals for the effect size base on bootstrap resampling method. The meta-analytic confidence interval for effect size is proposed that are easy to compute. A Monte Carlo simulation study was conducted to compare the performance of the proposed confidence intervals with the existing confidence intervals. The best confidence interval method will have a coverage probability close to 0.95. Simulation results have shown that our proposed confidence intervals perform well in terms of coverage probability and expected length.

Keywords: effect size, confidence interval, bootstrap method, resampling

Procedia PDF Downloads 595
26 Robust Shrinkage Principal Component Parameter Estimator for Combating Multicollinearity and Outliers’ Problems in a Poisson Regression Model

Authors: Arum Kingsley Chinedu, Ugwuowo Fidelis Ifeanyi, Oranye Henrietta Ebele

Abstract:

The Poisson regression model (PRM) is a nonlinear model that belongs to the exponential family of distribution. PRM is suitable for studying count variables using appropriate covariates and sometimes experiences the problem of multicollinearity in the explanatory variables and outliers on the response variable. This study aims to address the problem of multicollinearity and outliers jointly in a Poisson regression model. We developed an estimator called the robust modified jackknife PCKL parameter estimator by combining the principal component estimator, modified jackknife KL and transformed M-estimator estimator to address both problems in a PRM. The superiority conditions for this estimator were established, and the properties of the estimator were also derived. The estimator inherits the characteristics of the combined estimators, thereby making it efficient in addressing both problems. And will also be of immediate interest to the research community and advance this study in terms of novelty compared to other studies undertaken in this area. The performance of the estimator (robust modified jackknife PCKL) with other existing estimators was compared using mean squared error (MSE) as a performance evaluation criterion through a Monte Carlo simulation study and the use of real-life data. The results of the analytical study show that the estimator outperformed other existing estimators compared with by having the smallest MSE across all sample sizes, different levels of correlation, percentages of outliers and different numbers of explanatory variables.

Keywords: jackknife modified KL, outliers, multicollinearity, principal component, transformed M-estimator.

Procedia PDF Downloads 64
25 Exploring Data Leakage in EEG Based Brain-Computer Interfaces: Overfitting Challenges

Authors: Khalida Douibi, Rodrigo Balp, Solène Le Bars

Abstract:

In the medical field, applications related to human experiments are frequently linked to reduced samples size, which makes the training of machine learning models quite sensitive and therefore not very robust nor generalizable. This is notably the case in Brain-Computer Interface (BCI) studies, where the sample size rarely exceeds 20 subjects or a few number of trials. To address this problem, several resampling approaches are often used during the data preparation phase, which is an overly critical step in a data science analysis process. One of the naive approaches that is usually applied by data scientists consists in the transformation of the entire database before the resampling phase. However, this can cause model’ s performance to be incorrectly estimated when making predictions on unseen data. In this paper, we explored the effect of data leakage observed during our BCI experiments for device control through the real-time classification of SSVEPs (Steady State Visually Evoked Potentials). We also studied potential ways to ensure optimal validation of the classifiers during the calibration phase to avoid overfitting. The results show that the scaling step is crucial for some algorithms, and it should be applied after the resampling phase to avoid data leackage and improve results.

Keywords: data leackage, data science, machine learning, SSVEP, BCI, overfitting

Procedia PDF Downloads 152
24 Using the Bootstrap for Problems Statistics

Authors: Brahim Boukabcha, Amar Rebbouh

Abstract:

The bootstrap method based on the idea of exploiting all the information provided by the initial sample, allows us to study the properties of estimators. In this article we will present a theoretical study on the different methods of bootstrapping and using the technique of re-sampling in statistics inference to calculate the standard error of means of an estimator and determining a confidence interval for an estimated parameter. We apply these methods tested in the regression models and Pareto model, giving the best approximations.

Keywords: bootstrap, error standard, bias, jackknife, mean, median, variance, confidence interval, regression models

Procedia PDF Downloads 379
23 Deep Learning-Based Classification of 3D CT Scans with Real Clinical Data; Impact of Image format

Authors: Maryam Fallahpoor, Biswajeet Pradhan

Abstract:

Background: Artificial intelligence (AI) serves as a valuable tool in mitigating the scarcity of human resources required for the evaluation and categorization of vast quantities of medical imaging data. When AI operates with optimal precision, it minimizes the demand for human interpretations and, thereby, reduces the burden on radiologists. Among various AI approaches, deep learning (DL) stands out as it obviates the need for feature extraction, a process that can impede classification, especially with intricate datasets. The advent of DL models has ushered in a new era in medical imaging, particularly in the context of COVID-19 detection. Traditional 2D imaging techniques exhibit limitations when applied to volumetric data, such as Computed Tomography (CT) scans. Medical images predominantly exist in one of two formats: neuroimaging informatics technology initiative (NIfTI) and digital imaging and communications in medicine (DICOM). Purpose: This study aims to employ DL for the classification of COVID-19-infected pulmonary patients and normal cases based on 3D CT scans while investigating the impact of image format. Material and Methods: The dataset used for model training and testing consisted of 1245 patients from IranMehr Hospital. All scans shared a matrix size of 512 × 512, although they exhibited varying slice numbers. Consequently, after loading the DICOM CT scans, image resampling and interpolation were performed to standardize the slice count. All images underwent cropping and resampling, resulting in uniform dimensions of 128 × 128 × 60. Resolution uniformity was achieved through resampling to 1 mm × 1 mm × 1 mm, and image intensities were confined to the range of (−1000, 400) Hounsfield units (HU). For classification purposes, positive pulmonary COVID-19 involvement was designated as 1, while normal images were assigned a value of 0. Subsequently, a U-net-based lung segmentation module was applied to obtain 3D segmented lung regions. The pre-processing stage included normalization, zero-centering, and shuffling. Four distinct 3D CNN models (ResNet152, ResNet50, DensNet169, and DensNet201) were employed in this study. Results: The findings revealed that the segmentation technique yielded superior results for DICOM images, which could be attributed to the potential loss of information during the conversion of original DICOM images to NIFTI format. Notably, ResNet152 and ResNet50 exhibited the highest accuracy at 90.0%, and the same models achieved the best F1 score at 87%. ResNet152 also secured the highest Area under the Curve (AUC) at 0.932. Regarding sensitivity and specificity, DensNet201 achieved the highest values at 93% and 96%, respectively. Conclusion: This study underscores the capacity of deep learning to classify COVID-19 pulmonary involvement using real 3D hospital data. The results underscore the significance of employing DICOM format 3D CT images alongside appropriate pre-processing techniques when training DL models for COVID-19 detection. This approach enhances the accuracy and reliability of diagnostic systems for COVID-19 detection.

Keywords: deep learning, COVID-19 detection, NIFTI format, DICOM format

Procedia PDF Downloads 87
22 Separating Landform from Noise in High-Resolution Digital Elevation Models through Scale-Adaptive Window-Based Regression

Authors: Anne M. Denton, Rahul Gomes, David W. Franzen

Abstract:

High-resolution elevation data are becoming increasingly available, but typical approaches for computing topographic features, like slope and curvature, still assume small sliding windows, for example, of size 3x3. That means that the digital elevation model (DEM) has to be resampled to the scale of the landform features that are of interest. Any higher resolution is lost in this resampling. When the topographic features are computed through regression that is performed at the resolution of the original data, the accuracy can be much higher, and the reported result can be adjusted to the length scale that is relevant locally. Slope and variance are calculated for overlapping windows, meaning that one regression result is computed per raster point. The number of window centers per area is the same for the output as for the original DEM. Slope and variance are computed by performing regression on the points in the surrounding window. Such an approach is computationally feasible because of the additive nature of regression parameters and variance. Any doubling of window size in each direction only takes a single pass over the data, corresponding to a logarithmic scaling of the resulting algorithm as a function of the window size. Slope and variance are stored for each aggregation step, allowing the reported slope to be selected to minimize variance. The approach thereby adjusts the effective window size to the landform features that are characteristic to the area within the DEM. Starting with a window size of 2x2, each iteration aggregates 2x2 non-overlapping windows from the previous iteration. Regression results are stored for each iteration, and the slope at minimal variance is reported in the final result. As such, the reported slope is adjusted to the length scale that is characteristic of the landform locally. The length scale itself and the variance at that length scale are also visualized to aid in interpreting the results for slope. The relevant length scale is taken to be half of the window size of the window over which the minimum variance was achieved. The resulting process was evaluated for 1-meter DEM data and for artificial data that was constructed to have defined length scales and added noise. A comparison with ESRI ArcMap was performed and showed the potential of the proposed algorithm. The resolution of the resulting output is much higher and the slope and aspect much less affected by noise. Additionally, the algorithm adjusts to the scale of interest within the region of the image. These benefits are gained without additional computational cost in comparison with resampling the DEM and computing the slope over 3x3 images in ESRI ArcMap for each resolution. In summary, the proposed approach extracts slope and aspect of DEMs at the lengths scales that are characteristic locally. The result is of higher resolution and less affected by noise than existing techniques.

Keywords: high resolution digital elevation models, multi-scale analysis, slope calculation, window-based regression

Procedia PDF Downloads 127
21 Robust and Transparent Spread Spectrum Audio Watermarking

Authors: Ali Akbar Attari, Ali Asghar Beheshti Shirazi

Abstract:

In this paper, we propose a blind and robust audio watermarking scheme based on spread spectrum in Discrete Wavelet Transform (DWT) domain. Watermarks are embedded in the low-frequency coefficients, which is less audible. The key idea is dividing the audio signal into small frames, and magnitude of the 6th level of DWT approximation coefficients is modifying based upon the Direct Sequence Spread Spectrum (DSSS) technique. Also, the psychoacoustic model for enhancing in imperceptibility, as well as Savitsky-Golay filter for increasing accuracy in extraction, is used. The experimental results illustrate high robustness against most common attacks, i.e. Gaussian noise addition, Low pass filter, Resampling, Requantizing, MP3 compression, without significant perceptual distortion (ODG is higher than -1). The proposed scheme has about 83 bps data payload.

Keywords: audio watermarking, spread spectrum, discrete wavelet transform, psychoacoustic, Savitsky-Golay filter

Procedia PDF Downloads 199
20 Invasive Ranges of Gorse (Ulex europaeus) in South Australia and Sri Lanka Using Species Distribution Modelling

Authors: Champika S. Kariyawasam

Abstract:

The distribution of gorse (Ulex europaeus) plants in South Australia has been modelled using 126 presence-only location data as a function of seven climate parameters. The predicted range of U. europaeus is mainly along the Mount Lofty Ranges in the Adelaide Hills and on Kangaroo Island. Annual precipitation and yearly average aridity index appeared to be the highest contributing variables to the final model formulation. The Jackknife procedure was employed to identify the contribution of different variables to gorse model outputs and response curves were used to predict changes with changing environmental variables. Based on this analysis, it was revealed that the combined effect of one or more variables could make a completely different impact to the original variables on their own to the model prediction. This work also demonstrates the need for a careful approach when selecting environmental variables for projecting correlative models to climatically distinct area. Maxent acts as a robust model when projecting the fitted species distribution model to another area with changing climatic conditions, whereas the generalized linear model, bioclim, and domain models to be less robust in this regard. These findings are important not only for predicting and managing invasive alien gorse in South Australia and Sri Lanka but also in other countries of the invasive range.

Keywords: invasive species, Maxent, species distribution modelling, Ulex europaeus

Procedia PDF Downloads 133
19 Efficacy of Conservation Strategies for Endangered Garcinia gummi gutta under Climate Change in Western Ghats

Authors: Malay K. Pramanik

Abstract:

Climate change is continuously affecting the ecosystem, species distribution as well as global biodiversity. The assessment of the species potential distribution and the spatial changes under various climate change scenarios is a significant step towards the conservation and mitigation of habitat shifts, and species' loss and vulnerability. In this context, the present study aimed to predict the influence of current and future climate on an ecologically vulnerable medicinal species, Garcinia gummi-gutta, of the southern Western Ghats using Maximum Entropy (MaxEnt) modeling. The future projections were made for the period of 2050 and 2070 with RCP (Representative Concentration Pathways) scenario of 4.5 and 8.5 using 84 species occurrence data, and climatic variables from three different models of Intergovernmental Panel for Climate Change (IPCC) fifth assessment. Climatic variables contributions were assessed using jackknife test and AOC value 0.888 indicates the model perform with high accuracy. The major influencing variables will be annual precipitation, precipitation of coldest quarter, precipitation seasonality, and precipitation of driest quarter. The model result shows that the current high potential distribution of the species is around 1.90% of the study area, 7.78% is good potential; about 90.32% is moderate to very low potential for species suitability. Finally, the results of all model represented that there will be a drastic decline in the suitable habitat distribution by 2050 and 2070 for all the RCP scenarios. The study signifies that MaxEnt model might be an efficient tool for ecosystem management, biodiversity protection, and species re-habitation planning under climate change.

Keywords: Garcinia gummi gutta, maximum entropy modeling, medicinal plants, climate change, western ghats, MaxEnt

Procedia PDF Downloads 390
18 An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data

Authors: Ruchika Malhotra, Megha Khanna

Abstract:

The development of change prediction models can help the software practitioners in planning testing and inspection resources at early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of the software quality data. A data with very few minority outcome categories leads to inefficient learning process and a classification model developed from the imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for adeptly handling the imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics while dealing with the imbalanced data. In order to empirically validate different alternatives, the study uses change data from three application packages of open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using resampling method and robust performance measures.

Keywords: change proneness, empirical validation, imbalanced learning, machine learning techniques, object-oriented metrics

Procedia PDF Downloads 418
17 Effects of Fire on Vegetation of the Prairies and Black Oak Sand Savannas of Kankakee, Illinois

Authors: Megan Alkazoff, Charles Ruffner

Abstract:

Tallgrass prairies and sand savannas, once covering northern to central Illinois, are ecosystems in need of restoration and conservation in the Midwestern United States. The Nature Conservancy manages five sites containing fragments of remaining tallgrass prairies and sand savannas within the Kankakee Sands using techniques such as prescribed burning and invasive species removal. The objective of this study was to conduct a ten-year resampling of transects established on these five sites during previous studies to assess whether the management tools applied there are helping maintain the tallgrass prairie and sand savannas. During the summer of 2020, permanent transect lines were sampled using a quadrat to determine the % Cover Class of each species rooted in the quadrat. Data gathered was analyzed using linear regression to illustrate the relationship between fire occurrence and species composition on the landscape. The fire frequency had a highly significant effect (P= 0.0025) on the species richness of all sites. The frequency of fire had a non-significant effect (P>0.05) on the Floristic Quality Index, percent C value 4-10, and bare-ground percentage of a site. These results suggest that fire on the landscape, both wild and prescribed, have increased biodiversity on all five sites but has not affected the Floristic Quality Index, percent C value 4-10, and the percentage of bare-ground on the sites.

Keywords: fire, floristic quality assessment, sand savanna, species richness, tallgrass prairie

Procedia PDF Downloads 178
16 Bioclimatic Niches of Endangered Garcinia indica Species on the Western Ghats: Predicting Habitat Suitability under Current and Future Climate

Authors: Malay K. Pramanik

Abstract:

In recent years, climate change has become a major threat and has been widely documented in the geographic distribution of many plant species. However, the impacts of climate change on the distribution of ecologically vulnerable medicinal species remain largely unknown. The identification of a suitable habitat for a species under climate change scenario is a significant step towards the mitigation of biodiversity decline. The study, therefore, aims to predict the impact of current, and future climatic scenarios on the distribution of the threatened Garcinia indica across the northern Western Ghats using Maximum Entropy (MaxEnt) modelling. The future projections were made for the year 2050 and 2070 with all Representative Concentration Pathways (RCPs) scenario (2.6, 4.5, 6.0, and 8.5) using 56 species occurrence data, and 19 bioclimatic predictors from the BCC-CSM1.1 model of the Intergovernmental Panel for Climate Change’s (IPCC) 5th assessment. The bioclimatic variables were minimised to a smaller number of variables after a multicollinearity test, and their contributions were assessed using jackknife test. The AUC value of 0.956 ± 0.023 indicates that the model performs with excellent accuracy. The study identified that temperature seasonality (39.5 ± 3.1%), isothermality (19.2 ± 1.6%), and annual precipitation (12.7 ± 1.7%) would be the major influencing variables in the current and future distribution. The model predicted 10.5% (19318.7 sq. km) of the study area as moderately to very highly suitable, while 82.60% (151904 sq. km) of the study area was identified as ‘unsuitable’ or ‘very low suitable’. Our predictions of climate change impact on habitat suitability suggest that there will be a drastic reduction in the suitability by 5.29% and 5.69% under RCP 8.5 for 2050 and 2070, respectively. Finally, the results signify that the model might be an effective tool for biodiversity protection, ecosystem management, and species re-habitation planning under future climate change scenarios.

Keywords: Garcinia Indica, maximum entropy modelling, climate change, MaxEnt, Western Ghats, medicinal plants

Procedia PDF Downloads 155
15 Internet Use, Social Networks, Loneliness and Quality of Life among Adults Aged 50 and Older: Mediating and Moderating Effects

Authors: Rabia Khaliala, Adi Vitman-Schorr

Abstract:

Background: The increase in longevity of people on one hand, and on the other hand the fact that the social networks in later life become increasingly narrower, highlight the importance of Internet use to enhance quality of life (QoL). However, whether Internet use increases or decreases social networks, loneliness and quality of life is not clear-cut. Purposes: To explore the direct and/or indirect effects of Internet use on QoL, and to examine whether ethnicity and time the elderly spent with family moderate the mediation effect of Internet use on quality of life throughout loneliness. Methods: This descriptive-correlational study was carried out in 2016 by structured interviews with a convenience sample of 502 respondents aged 50 and older, living in northern Israel. Bootstrapping with resampling strategies was used for testing mediation a model. Results: Use of the Internet was found to be positively associated with QoL. However, this relationship was mediated by loneliness, and moderated by the time the elderly spent with family members. In addition, respondents' ethnicity significantly moderated the mediation effect between Internet use and loneliness. Conclusions: Internet use can enhance QoL of older adults directly or indirectly by reducing loneliness. However, these effects are conditional on other variables. The indirect effect moderated by ethnicity, and the direct effect moderated by the time the elderly spend with their families. Researchers and practitioners should be aware of these interactions which can impact loneliness and quality of life of older persons differently.

Keywords: internet use, loneliness, quality of life, social contacts

Procedia PDF Downloads 185
14 Improving Activity Recognition Classification of Repetitious Beginner Swimming Using a 2-Step Peak/Valley Segmentation Method with Smoothing and Resampling for Machine Learning

Authors: Larry Powell, Seth Polsley, Drew Casey, Tracy Hammond

Abstract:

Human activity recognition (HAR) systems have shown positive performance when recognizing repetitive activities like walking, running, and sleeping. Water-based activities are a reasonably new area for activity recognition. However, water-based activity recognition has largely focused on supporting the elite and competitive swimming population, which already has amazing coordination and proper form. Beginner swimmers are not perfect, and activity recognition needs to support the individual motions to help beginners. Activity recognition algorithms are traditionally built around short segments of timed sensor data. Using a time window input can cause performance issues in the machine learning model. The window’s size can be too small or large, requiring careful tuning and precise data segmentation. In this work, we present a method that uses a time window as the initial segmentation, then separates the data based on the change in the sensor value. Our system uses a multi-phase segmentation method that pulls all peaks and valleys for each axis of an accelerometer placed on the swimmer’s lower back. This results in high recognition performance using leave-one-subject-out validation on our study with 20 beginner swimmers, with our model optimized from our final dataset resulting in an F-Score of 0.95.

Keywords: time window, peak/valley segmentation, feature extraction, beginner swimming, activity recognition

Procedia PDF Downloads 122
13 Modelling Conceptual Quantities Using Support Vector Machines

Authors: Ka C. Lam, Oluwafunmibi S. Idowu

Abstract:

Uncertainty in cost is a major factor affecting performance of construction projects. To our knowledge, several conceptual cost models have been developed with varying degrees of accuracy. Incorporating conceptual quantities into conceptual cost models could improve the accuracy of early predesign cost estimates. Hence, the development of quantity models for estimating conceptual quantities of framed reinforced concrete structures using supervised machine learning is the aim of the current research. Using measured quantities of structural elements and design variables such as live loads and soil bearing pressures, response and predictor variables were defined and used for constructing conceptual quantities models. Twenty-four models were developed for comparison using a combination of non-parametric support vector regression, linear regression, and bootstrap resampling techniques. R programming language was used for data analysis and model implementation. Gross soil bearing pressure and gross floor loading were discovered to have a major influence on the quantities of concrete and reinforcement used for foundations. Building footprint and gross floor loading had a similar influence on beams and slabs. Future research could explore the modelling of other conceptual quantities for walls, finishes, and services using machine learning techniques. Estimation of conceptual quantities would assist construction planners in early resource planning and enable detailed performance evaluation of early cost predictions.

Keywords: bootstrapping, conceptual quantities, modelling, reinforced concrete, support vector regression

Procedia PDF Downloads 204
12 Assessing Functional Structure in European Marine Ecosystems Using a Vector-Autoregressive Spatio-Temporal Model

Authors: Katyana A. Vert-Pre, James T. Thorson, Thomas Trancart, Eric Feunteun

Abstract:

In marine ecosystems, spatial and temporal species structure is an important component of ecosystems’ response to anthropological and environmental factors. Although spatial distribution patterns and fish temporal series of abundance have been studied in the past, little research has been allocated to the joint dynamic spatio-temporal functional patterns in marine ecosystems and their use in multispecies management and conservation. Each species represents a function to the ecosystem, and the distribution of these species might not be random. A heterogeneous functional distribution will lead to a more resilient ecosystem to external factors. Applying a Vector-Autoregressive Spatio-Temporal (VAST) model for count data, we estimate the spatio-temporal distribution, shift in time, and abundance of 140 species of the Eastern English Chanel, Bay of Biscay and Mediterranean Sea. From the model outputs, we determined spatio-temporal clusters, calculating p-values for hierarchical clustering via multiscale bootstrap resampling. Then, we designed a functional map given the defined cluster. We found that the species distribution within the ecosystem was not random. Indeed, species evolved in space and time in clusters. Moreover, these clusters remained similar over time deriving from the fact that species of a same cluster often shifted in sync, keeping the overall structure of the ecosystem similar overtime. Knowing the co-existing species within these clusters could help with predicting data-poor species distribution and abundance. Further analysis is being performed to assess the ecological functions represented in each cluster.

Keywords: cluster distribution shift, European marine ecosystems, functional distribution, spatio-temporal model

Procedia PDF Downloads 193
11 Two-Stage Hospital Efficiency Analysis Including Qualitative Evidence: A Greek Case

Authors: Panos Xenos, Milton Nektarios, John Yfantopoulos

Abstract:

Background: Policy makers, professional organizations and payers have introduced a variety of initiatives and reforms for the health systems worldwide, aimed at improving hospital efficiency. Their efforts are concentrated in two main categories: to constrain increasing healthcare costs and to enhance quality of services provided. Research Objectives: This study examines the efficiency of 112 Greek public hospitals for the year 2009, evaluates the importance of bootstrapping techniques and investigates the effect of contextual factors on hospital efficiency. Furthermore, the effect of qualitative evidence, on hospital efficiency is explored using data from 28 large hospitals. Methods: We applied Data Envelopment Analysis, augmented by bootstrapping techniques, to estimate efficiency scores. In order to measure the effect of environmental factors on hospital efficiency we used Tobit regression analysis. The significance of our models is evaluated using statistical tests to compare distributions. Results: The Kolmogorov-Smirnov test between the original and the bootstrap-corrected efficiency indicates that their distributions are significantly different (p-value<0.01). The environmental factors, that seem to influence efficiency, are Occupancy Rating and the ratio between Outpatient Visits and Inpatient Days. Results indicate that the inclusion of the quality variable in DEA modelling generates statistically significant variations in efficiency scores (p-value<0.05). Conclusions: The inclusion of quality variables and the use of bootstrap resampling in efficiency analysis impose a statistically significant effect on the distribution of efficiency scores. As a policy conclusion we highlight the importance of these methods on hospital efficiency analysis and, by implication, on healthcare resource allocation.

Keywords: hospitals, efficiency, quality, data envelopment analysis, Greek public hospital sector

Procedia PDF Downloads 309
10 Efficient Video Compression Technique Using Convolutional Neural Networks and Generative Adversarial Network

Authors: P. Karthick, K. Mahesh

Abstract:

Video has become an increasingly significant component of our digital everyday contact. With the advancement of greater contents and shows of the resolution, its significant volume poses serious obstacles to the objective of receiving, distributing, compressing, and revealing video content of high quality. In this paper, we propose the primary beginning to complete a deep video compression model that jointly upgrades all video compression components. The video compression method involves splitting the video into frames, comparing the images using convolutional neural networks (CNN) to remove duplicates, repeating the single image instead of the duplicate images by recognizing and detecting minute changes using generative adversarial network (GAN) and recorded with long short-term memory (LSTM). Instead of the complete image, the small changes generated using GAN are substituted, which helps in frame level compression. Pixel wise comparison is performed using K-nearest neighbours (KNN) over the frame, clustered with K-means, and singular value decomposition (SVD) is applied for each and every frame in the video for all three color channels [Red, Green, Blue] to decrease the dimension of the utility matrix [R, G, B] by extracting its latent factors. Video frames are packed with parameters with the aid of a codec and converted to video format, and the results are compared with the original video. Repeated experiments on several videos with different sizes, duration, frames per second (FPS), and quality results demonstrate a significant resampling rate. On average, the result produced had approximately a 10% deviation in quality and more than 50% in size when compared with the original video.

Keywords: video compression, K-means clustering, convolutional neural network, generative adversarial network, singular value decomposition, pixel visualization, stochastic gradient descent, frame per second extraction, RGB channel extraction, self-detection and deciding system

Procedia PDF Downloads 187
9 Development of a Robust Protein Classifier to Predict EMT Status of Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (CESC) Tumors

Authors: ZhenlinJu, Christopher P. Vellano, RehanAkbani, Yiling Lu, Gordon B. Mills

Abstract:

The epithelial–mesenchymal transition (EMT) is a process by which epithelial cells acquire mesenchymal characteristics, such as profound disruption of cell-cell junctions, loss of apical-basolateral polarity, and extensive reorganization of the actin cytoskeleton to induce cell motility and invasion. A hallmark of EMT is its capacity to promote metastasis, which is due in part to activation of several transcription factors and subsequent downregulation of E-cadherin. Unfortunately, current approaches have yet to uncover robust protein marker sets that can classify tumors as possessing strong EMT signatures. In this study, we utilize reverse phase protein array (RPPA) data and consensus clustering methods to successfully classify a subset of cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) tumors into an EMT protein signaling group (EMT group). The overall survival (OS) of patients in the EMT group is significantly worse than those in the other Hormone and PI3K/AKT signaling groups. In addition to a shrinkage and selection method for linear regression (LASSO), we applied training/test set and Monte Carlo resampling approaches to identify a set of protein markers that predicts the EMT status of CESC tumors. We fit a logistic model to these protein markers and developed a classifier, which was fixed in the training set and validated in the testing set. The classifier robustly predicted the EMT status of the testing set with an area under the curve (AUC) of 0.975 by Receiver Operating Characteristic (ROC) analysis. This method not only identifies a core set of proteins underlying an EMT signature in cervical cancer patients, but also provides a tool to examine protein predictors that drive molecular subtypes in other diseases.

Keywords: consensus clustering, TCGA CESC, Silhouette, Monte Carlo LASSO

Procedia PDF Downloads 467
8 Physical Activity and Academic Achievement: How Physical Activity Should Be Implemented to Enhance Mathematical Achievement and Mathematical Self-Concept

Authors: Laura C. Dapp, Claudia M. Roebers

Abstract:

Being physically active has many benefits for children and adolescents. It is crucial for various aspects of physical and mental health, the development of a healthy self-concept, and also positively influences academic performance and school achievement. In addressing the still incomplete understanding of the link between physical activity (PA) and academic achievement, the current study scrutinized the open issue of how PA has to be implemented to positively affect mathematical outcomes in N = 138 fourth graders. Therefore, the current study distinguished between structured PA (formal, organized, adult-led exercise and deliberate sports practice) and unstructured PA (non-formal, playful, peer-led physically active play and sports activities). Results indicated that especially structured PA has the potential to contribute to mathematical outcomes. Although children spent almost twice as much time engaging in unstructured PA as compared to structured PA, only structured PA was significantly related to mathematical achievement as well as to mathematical self-concept. Furthermore, the pending issue concerning the quantity of PA needed to enhance children’s mathematical achievement was addressed. As to that, results indicated that the amount of time spent in structured PA constitutes a critical factor in accounting for mathematical outcomes, since children engaging in PA for two hours or more a week were shown to be both the ones with the highest mathematical self-concept as well as those attaining the highest mathematical achievement scores. Finally, the present study investigated the indirect effect of PA on mathematical achievement by controlling for the mathematical self-concept as a mediating variable. The results of a maximum likelihood mediation analysis with a 2’000 resampling bootstrapping procedure for the 95% confidence intervals revealed a full mediation, indicating that PA improves mathematical self-concept, which, in turn, positively affects mathematical achievement. Thus, engaging in high amounts of structured PA constitutes an advantageous leisure activity for children and adolescents, not only to enhance their physical health but also to foster their self-concept in a way that is favorable and encouraging for promoting their academic achievement. Note: The content of this abstract is partially based on a paper published elswhere by the authors.

Keywords: Academic Achievement, Mathematical Performance, Physical Activity, Self-Concept

Procedia PDF Downloads 112
7 Utilizing Spatial Uncertainty of On-The-Go Measurements to Design Adaptive Sampling of Soil Electrical Conductivity in a Rice Field

Authors: Ismaila Olabisi Ogundiji, Hakeem Mayowa Olujide, Qasim Usamot

Abstract:

The main reasons for site-specific management for agricultural inputs are to increase the profitability of crop production, to protect the environment and to improve products’ quality. Information about the variability of different soil attributes within a field is highly essential for the decision-making process. Lack of fast and accurate acquisition of soil characteristics remains one of the biggest limitations of precision agriculture due to being expensive and time-consuming. Adaptive sampling has been proven as an accurate and affordable sampling technique for planning within a field for site-specific management of agricultural inputs. This study employed spatial uncertainty of soil apparent electrical conductivity (ECa) estimates to identify adaptive re-survey areas in the field. The original dataset was grouped into validation and calibration groups where the calibration group was sub-grouped into three sets of different measurements pass intervals. A conditional simulation was performed on the field ECa to evaluate the ECa spatial uncertainty estimates by the use of the geostatistical technique. The grouping of high-uncertainty areas for each set was done using image segmentation in MATLAB, then, high and low area value-separate was identified. Finally, an adaptive re-survey was carried out on those areas of high-uncertainty. Adding adaptive re-surveying significantly minimized the time required for resampling whole field and resulted in ECa with minimal error. For the most spacious transect, the root mean square error (RMSE) yielded from an initial crude sampling survey was minimized after an adaptive re-survey, which was close to that value of the ECa yielded with an all-field re-survey. The estimated sampling time for the adaptive re-survey was found to be 45% lesser than that of all-field re-survey. The results indicate that designing adaptive sampling through spatial uncertainty models significantly mitigates sampling cost, and there was still conformity in the accuracy of the observations.

Keywords: soil electrical conductivity, adaptive sampling, conditional simulation, spatial uncertainty, site-specific management

Procedia PDF Downloads 132
6 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on $k$-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms.

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 166
5 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Darlington Mapiye, Mpho Mokoatle, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0 %, 80.5 %, 80.5 %, 63.6 %, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 158
4 Efficient Residual Road Condition Segmentation Network Based on Reconstructed Images

Authors: Xiang Shijie, Zhou Dong, Tian Dan

Abstract:

This paper focuses on the application of real-time semantic segmentation technology in complex road condition recognition, aiming to address the critical issue of how to improve segmentation accuracy while ensuring real-time performance. Semantic segmentation technology has broad application prospects in fields such as autonomous vehicle navigation and remote sensing image recognition. However, current real-time semantic segmentation networks face significant technical challenges and optimization gaps in balancing speed and accuracy. To tackle this problem, this paper conducts an in-depth study and proposes an innovative Guided Image Reconstruction Module. By resampling high-resolution images into a set of low-resolution images, this module effectively reduces computational complexity, allowing the network to more efficiently extract features within limited resources, thereby improving the performance of real-time segmentation tasks. In addition, a dual-branch network structure is designed in this paper to fully leverage the advantages of different feature layers. A novel Hybrid Attention Mechanism is also introduced, which can dynamically capture multi-scale contextual information and effectively enhance the focus on important features, thus improving the segmentation accuracy of the network in complex road condition. Compared with traditional methods, the proposed model achieves a better balance between accuracy and real-time performance and demonstrates competitive results in road condition segmentation tasks, showcasing its superiority. Experimental results show that this method not only significantly improves segmentation accuracy while maintaining real-time performance, but also remains stable across diverse and complex road conditions, making it highly applicable in practical scenarios. By incorporating the Guided Image Reconstruction Module, dual-branch structure, and Hybrid Attention Mechanism, this paper presents a novel approach to real-time semantic segmentation tasks, which is expected to further advance the development of this field.

Keywords: hybrid attention mechanism, image reconstruction, real-time, road status recognition

Procedia PDF Downloads 21
3 Association of Temperature Factors with Seropositive Results against Selected Pathogens in Dairy Cow Herds from Central and Northern Greece

Authors: Marina Sofia, Alexios Giannakopoulos, Antonia Touloudi, Dimitris C Chatzopoulos, Zoi Athanasakopoulou, Vassiliki Spyrou, Charalambos Billinis

Abstract:

Fertility of dairy cattle can be affected by heat stress when the ambient temperature increases above 30°C and the relative humidity ranges from 35% to 50%. The present study was conducted on dairy cattle farms during summer months in Greece and aimed to identify the serological profile against pathogens that could affect fertility and to associate the positive serological results at herd level with temperature factors. A total of 323 serum samples were collected from clinically healthy dairy cows of 8 herds, located in Central and Northern Greece. ELISA tests were performed to detect antibodies against selected pathogens that affect fertility, namely Chlamydophila abortus, Coxiella burnetii, Neospora caninum, Toxoplasma gondii and Infectious Bovine Rhinotracheitis Virus (IBRV). Eleven climatic variables were derived from the WorldClim version 1.4. and ArcGIS V.10.1 software was used for analysis of the spatial information. Five different MaxEnt models were applied to associate the temperature variables with the locations of seropositive Chl. abortus, C. burnetii, N. caninum, T. gondii and IBRV herds (one for each pathogen). The logistic outputs were used for the interpretation of the results. ROC analyses were performed to evaluate the goodness of fit of the models’ predictions. Jackknife tests were used to identify the variables with a substantial contribution to each model. The seropositivity rates of pathogens varied among the 8 herds (0.85-4.76% for Chl. abortus, 4.76-62.71% for N. caninum, 3.8-43.47% for C. burnetii, 4.76-39.28% for T. gondii and 47.83-78.57% for IBRV). The variables of annual temperature range, mean diurnal range and maximum temperature of the warmest month gave a contribution to all five models. The regularized training gains, the training AUCs and the unregularized training gains were estimated. The mean diurnal range gave the highest gain when used in isolation and decreased the gain the most when it was omitted in the two models for seropositive Chl.abortus and IBRV herds. The annual temperature range increased the gain when used alone and decreased the gain the most when it was omitted in the models for seropositive C. burnetii, N. caninum and T. gondii herds. In conclusion, antibodies against Chl. abortus, C. burnetii, N. caninum, T. gondii and IBRV were detected in most herds suggesting circulation of pathogens that could cause infertility. The results of the spatial analyses demonstrated that the annual temperature range, mean diurnal range and maximum temperature of the warmest month could affect positively the possible pathogens’ presence. Acknowledgment: This research has been co‐financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH–CREATE–INNOVATE (project code: T1EDK-01078).

Keywords: dairy cows, seropositivity, spatial analysis, temperature factors

Procedia PDF Downloads 197