Search results for: Imbalanced dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1136

Search results for: Imbalanced dataset

116 Applying GIS Geographic Weighted Regression Analysis to Assess Local Factors Impeding Smallholder Farmers from Participating in Agribusiness Markets: A Case Study of Vihiga County, Western Kenya

Authors: Mwehe Mathenge, Ben G. J. S. Sonneveld, Jacqueline E. W. Broerse

Abstract:

Smallholder farmers are important drivers of agriculture productivity, food security, and poverty reduction in Sub-Saharan Africa. However, they are faced with myriad challenges in their efforts at participating in agribusiness markets. How the geographic explicit factors existing at the local level interact to impede smallholder farmers' decision to participates (or not) in agribusiness markets is not well understood. Deconstructing the spatial complexity of the local environment could provide a deeper insight into how geographically explicit determinants promote or impede resource-poor smallholder farmers from participating in agribusiness. This paper’s objective was to identify, map, and analyze local spatial autocorrelation in factors that impede poor smallholders from participating in agribusiness markets. Data were collected using geocoded researcher-administered survey questionnaires from 392 households in Western Kenya. Three spatial statistics methods in geographic information system (GIS) were used to analyze data -Global Moran’s I, Cluster and Outliers Analysis (Anselin Local Moran’s I), and geographically weighted regression. The results of Global Moran’s I reveal the presence of spatial patterns in the dataset that was not caused by spatial randomness of data. Subsequently, Anselin Local Moran’s I result identified spatially and statistically significant local spatial clustering (hot spots and cold spots) in factors hindering smallholder participation. Finally, the geographically weighted regression results unearthed those specific geographic explicit factors impeding market participation in the study area. The results confirm that geographically explicit factors are indispensable in influencing the smallholder farming decisions, and policymakers should take cognizance of them. Additionally, this research demonstrated how geospatial explicit analysis conducted at the local level, using geographically disaggregated data, could help in identifying households and localities where the most impoverished and resource-poor smallholder households reside. In designing spatially targeted interventions, policymakers could benefit from geospatial analysis methods in understanding complex geographic factors and processes that interact to influence smallholder farmers' decision-making processes and choices.

Keywords: agribusiness markets, GIS, smallholder farmers, spatial statistics, disaggregated spatial data

Procedia PDF Downloads 114
115 Real Estate Trend Prediction with Artificial Intelligence Techniques

Authors: Sophia Liang Zhou

Abstract:

For investors, businesses, consumers, and governments, an accurate assessment of future housing prices is crucial to critical decisions in resource allocation, policy formation, and investment strategies. Previous studies are contradictory about macroeconomic determinants of housing price and largely focused on one or two areas using point prediction. This study aims to develop data-driven models to accurately predict future housing market trends in different markets. This work studied five different metropolitan areas representing different market trends and compared three-time lagging situations: no lag, 6-month lag, and 12-month lag. Linear regression (LR), random forest (RF), and artificial neural network (ANN) were employed to model the real estate price using datasets with S&P/Case-Shiller home price index and 12 demographic and macroeconomic features, such as gross domestic product (GDP), resident population, personal income, etc. in five metropolitan areas: Boston, Dallas, New York, Chicago, and San Francisco. The data from March 2005 to December 2018 were collected from the Federal Reserve Bank, FBI, and Freddie Mac. In the original data, some factors are monthly, some quarterly, and some yearly. Thus, two methods to compensate missing values, backfill or interpolation, were compared. The models were evaluated by accuracy, mean absolute error, and root mean square error. The LR and ANN models outperformed the RF model due to RF’s inherent limitations. Both ANN and LR methods generated predictive models with high accuracy ( > 95%). It was found that personal income, GDP, population, and measures of debt consistently appeared as the most important factors. It also showed that technique to compensate missing values in the dataset and implementation of time lag can have a significant influence on the model performance and require further investigation. The best performing models varied for each area, but the backfilled 12-month lag LR models and the interpolated no lag ANN models showed the best stable performance overall, with accuracies > 95% for each city. This study reveals the influence of input variables in different markets. It also provides evidence to support future studies to identify the optimal time lag and data imputing methods for establishing accurate predictive models.

Keywords: linear regression, random forest, artificial neural network, real estate price prediction

Procedia PDF Downloads 78
114 Frequent Pattern Mining for Digenic Human Traits

Authors: Atsuko Okazaki, Jurg Ott

Abstract:

Some genetic diseases (‘digenic traits’) are due to the interaction between two DNA variants. For example, certain forms of Retinitis Pigmentosa (a genetic form of blindness) occur in the presence of two mutant variants, one in the ROM1 gene and one in the RDS gene, while the occurrence of only one of these mutant variants leads to a completely normal phenotype. Detecting such digenic traits by genetic methods is difficult. A common approach to finding disease-causing variants is to compare 100,000s of variants between individuals with a trait (cases) and those without the trait (controls). Such genome-wide association studies (GWASs) have been very successful but hinge on genetic effects of single variants, that is, there should be a difference in allele or genotype frequencies between cases and controls at a disease-causing variant. Frequent pattern mining (FPM) methods offer an avenue at detecting digenic traits even in the absence of single-variant effects. The idea is to enumerate pairs of genotypes (genotype patterns) with each of the two genotypes originating from different variants that may be located at very different genomic positions. What is needed is for genotype patterns to be significantly more common in cases than in controls. Let Y = 2 refer to cases and Y = 1 to controls, with X denoting a specific genotype pattern. We are seeking association rules, ‘X → Y’, with high confidence, P(Y = 2|X), significantly higher than the proportion of cases, P(Y = 2) in the study. Clearly, generally available FPM methods are very suitable for detecting disease-associated genotype patterns. We use fpgrowth as the basic FPM algorithm and built a framework around it to enumerate high-frequency digenic genotype patterns and to evaluate their statistical significance by permutation analysis. Application to a published dataset on opioid dependence furnished results that could not be found with classical GWAS methodology. There were 143 cases and 153 healthy controls, each genotyped for 82 variants in eight genes of the opioid system. The aim was to find out whether any of these variants were disease-associated. The single-variant analysis did not lead to significant results. Application of our FPM implementation resulted in one significant (p < 0.01) genotype pattern with both genotypes in the pattern being heterozygous and originating from two variants on different chromosomes. This pattern occurred in 14 cases and none of the controls. Thus, the pattern seems quite specific to this form of substance abuse and is also rather predictive of disease. An algorithm called Multifactor Dimension Reduction (MDR) was developed some 20 years ago and has been in use in human genetics ever since. This and our algorithms share some similar properties, but they are also very different in other respects. The main difference seems to be that our algorithm focuses on patterns of genotypes while the main object of inference in MDR is the 3 × 3 table of genotypes at two variants.

Keywords: digenic traits, DNA variants, epistasis, statistical genetics

Procedia PDF Downloads 102
113 Investigating the Impacts on Cyclist Casualty Severity at Roundabouts: A UK Case Study

Authors: Nurten Akgun, Dilum Dissanayake, Neil Thorpe, Margaret C. Bell

Abstract:

Cycling has gained a great attention with comparable speeds, low cost, health benefits and reducing the impact on the environment. The main challenge associated with cycling is the provision of safety for the people choosing to cycle as their main means of transport. From the road safety point of view, cyclists are considered as vulnerable road users because they are at higher risk of serious casualty in the urban network but more specifically at roundabouts. This research addresses the development of an enhanced mathematical model by including a broad spectrum of casualty related variables. These variables were geometric design measures (approach number of lanes and entry path radius), speed limit, meteorological condition variables (light, weather, road surface) and socio-demographic characteristics (age and gender), as well as contributory factors. Contributory factors included driver’s behavior related variables such as failed to look properly, sudden braking, a vehicle passing too close to a cyclist, junction overshot, failed to judge other person’s path, restart moving off at the junction, poor turn or manoeuvre and disobeyed give-way. Tyne and Wear in the UK were selected as a case study area. The cyclist casualty data was obtained from UK STATS19 National dataset. The reference categories for the regression model were set to slight and serious cyclist casualties. Therefore, binary logistic regression was applied. Binary logistic regression analysis showed that approach number of lanes was statistically significant at the 95% level of confidence. A higher number of approach lanes increased the probability of severity of cyclist casualty occurrence. In addition, sudden braking statistically significantly increased the cyclist casualty severity at the 95% level of confidence. The result concluded that cyclist casualty severity was highly related to approach a number of lanes and sudden braking. Further research should be carried out an in-depth analysis to explore connectivity of sudden braking and approach number of lanes in order to investigate the driver’s behavior at approach locations. The output of this research will inform investment in measure to improve the safety of cyclists at roundabouts.

Keywords: binary logistic regression, casualty severity, cyclist safety, roundabout

Procedia PDF Downloads 160
112 The Trade Flow of Small Association Agreements When Rules of Origin Are Relaxed

Authors: Esmat Kamel

Abstract:

This paper aims to shed light on the extent to which the Agadir Association agreement has fostered inter regional trade between the E.U_26 and the Agadir_4 countries; once that we control for the evolution of Agadir agreement’s exports to the rest of the world. The next valid question will be regarding any remarkable variation in the spatial/sectoral structure of exports, and to what extent has it been induced by the Agadir agreement itself and precisely after the adoption of rules of origin and the PANEURO diagonal cumulative scheme? The paper’s empirical dataset covering a timeframe from [2000 -2009] was designed to account for sector specific export and intermediate flows and the bilateral structured gravity model was custom tailored to capture sector and regime specific rules of origin and the Poisson Pseudo Maximum Likelihood Estimator was used to calculate the gravity equation. The methodological approach of this work is considered to be a threefold one which starts first by conducting a ‘Hierarchal Cluster Analysis’ to classify final export flows showing a certain degree of linkage between each other. The analysis resulted in three main sectoral clusters of exports between Agadir_4 and E.U_26: cluster 1 for Petrochemical related sectors, cluster 2 durable goods and finally cluster 3 for heavy duty machinery and spare parts sectors. Second step continues by taking export flows resulting from the 3 clusters to be subject to treatment with diagonal Rules of origin through ‘The Double Differences Approach’, versus an equally comparable untreated control group. Third step is to verify results through a robustness check applied by ‘Propensity Score Matching’ to validate that the same sectoral final export and intermediate flows increased when rules of origin were relaxed. Through all the previous analysis, a remarkable and partial significance of the interaction term combining both treatment effects and time for the coefficients of 13 out of the 17 covered sectors turned out to be partially significant and it further asserted that treatment with diagonal rules of origin contributed in increasing Agadir’s_4 final and intermediate exports to the E.U._26 on average by 335% and in changing Agadir_4 exports structure and composition to the E.U._26 countries.

Keywords: agadir association agreement, structured gravity model, hierarchal cluster analysis, double differences estimation, propensity score matching, diagonal and relaxed rules of origin

Procedia PDF Downloads 297
111 Development of a Bi-National Thyroid Cancer Clinical Quality Registry

Authors: Liane J. Ioannou, Jonathan Serpell, Joanne Dean, Cino Bendinelli, Jenny Gough, Dean Lisewski, Julie Miller, Win Meyer-Rochow, Stan Sidhu, Duncan Topliss, David Walters, John Zalcberg, Susannah Ahern

Abstract:

Background: The occurrence of thyroid cancer is increasing throughout the developed world, including Australia and New Zealand, and since the 1990s has become the fastest increasing malignancy. Following the success of a number of institutional databases that monitor outcomes after thyroid surgery, the Australian and New Zealand Endocrine Surgeons (ANZES) agreed to auspice the development of a bi-national thyroid cancer registry. Objectives: To establish a bi-national population-based clinical quality registry with the aim of monitoring and improving the quality of care provided to patients diagnosed with thyroid cancer in Australia and New Zealand. Patients and Methods: The Australian and New Zealand Thyroid Cancer Registry (ANZTCR) captures clinical data for all patients, over the age of 18 years, diagnosed with thyroid cancer, confirmed by histopathology report, that have been diagnosed, assessed or treated at a contributing hospital. Data is collected by endocrine surgeons using a web-based interface, REDCap, primarily via direct data entry. Results: A multi-disciplinary Steering Committee was formed, and with operational support from Monash University the ANZTCR was established in early 2017. The pilot phase of the registry is currently operating in Victoria, New South Wales, Queensland, Western Australia and South Australia, with over 30 sites expected to come on board across Australia and New Zealand in 2018. A modified-Delphi process was undertaken to determine the key quality indicators to be reported by the registry, and a minimum dataset was developed comprising information regarding thyroid cancer diagnosis, pathology, surgery, and 30-day follow up. Conclusion: There are very few established thyroid cancer registries internationally, yet clinical quality registries have shown valuable outcomes and patient benefits in other cancers. The establishment of the ANZTCR provides the opportunity for Australia and New Zealand to further understand the current practice in the treatment of thyroid cancer and reasons for variation in outcomes. The engagement of endocrine surgeons in supporting this initiative is crucial. While the pilot registry has a focus on early clinical outcomes, it is anticipated that future collection of longer-term outcome data particularly for patients with the poor prognostic disease will add significant further value to the registry.

Keywords: thyroid cancer, clinical registry, population health, quality improvement

Procedia PDF Downloads 170
110 Knowledge Graph Development to Connect Earth Metadata and Standard English Queries

Authors: Gabriel Montague, Max Vilgalys, Catherine H. Crawford, Jorge Ortiz, Dava Newman

Abstract:

There has never been so much publicly accessible atmospheric and environmental data. The possibilities of these data are exciting, but the sheer volume of available datasets represents a new challenge for researchers. The task of identifying and working with a new dataset has become more difficult with the amount and variety of available data. Datasets are often documented in ways that differ substantially from the common English used to describe the same topics. This presents a barrier not only for new scientists, but for researchers looking to find comparisons across multiple datasets or specialists from other disciplines hoping to collaborate. This paper proposes a method for addressing this obstacle: creating a knowledge graph to bridge the gap between everyday English language and the technical language surrounding these datasets. Knowledge graph generation is already a well-established field, although there are some unique challenges posed by working with Earth data. One is the sheer size of the databases – it would be infeasible to replicate or analyze all the data stored by an organization like The National Aeronautics and Space Administration (NASA) or the European Space Agency. Instead, this approach identifies topics from metadata available for datasets in NASA’s Earthdata database, which can then be used to directly request and access the raw data from NASA. By starting with a single metadata standard, this paper establishes an approach that can be generalized to different databases, but leaves the challenge of metadata harmonization for future work. Topics generated from the metadata are then linked to topics from a collection of English queries through a variety of standard and custom natural language processing (NLP) methods. The results from this method are then compared to a baseline of elastic search applied to the metadata. This comparison shows the benefits of the proposed knowledge graph system over existing methods, particularly in interpreting natural language queries and interpreting topics in metadata. For the research community, this work introduces an application of NLP to the ecological and environmental sciences, expanding the possibilities of how machine learning can be applied in this discipline. But perhaps more importantly, it establishes the foundation for a platform that can enable common English to access knowledge that previously required considerable effort and experience. By making this public data accessible to the full public, this work has the potential to transform environmental understanding, engagement, and action.

Keywords: earth metadata, knowledge graphs, natural language processing, question-answer systems

Procedia PDF Downloads 123
109 Multicollinearity and MRA in Sustainability: Application of the Raise Regression

Authors: Claudia García-García, Catalina B. García-García, Román Salmerón-Gómez

Abstract:

Much economic-environmental research includes the analysis of possible interactions by using Moderated Regression Analysis (MRA), which is a specific application of multiple linear regression analysis. This methodology allows analyzing how the effect of one of the independent variables is moderated by a second independent variable by adding a cross-product term between them as an additional explanatory variable. Due to the very specification of the methodology, the moderated factor is often highly correlated with the constitutive terms. Thus, great multicollinearity problems arise. The appearance of strong multicollinearity in a model has important consequences. Inflated variances of the estimators may appear, there is a tendency to consider non-significant regressors that they probably are together with a very high coefficient of determination, incorrect signs of our coefficients may appear and also the high sensibility of the results to small changes in the dataset. Finally, the high relationship among explanatory variables implies difficulties in fixing the individual effects of each one on the model under study. These consequences shifted to the moderated analysis may imply that it is not worth including an interaction term that may be distorting the model. Thus, it is important to manage the problem with some methodology that allows for obtaining reliable results. After a review of those works that applied the MRA among the ten top journals of the field, it is clear that multicollinearity is mostly disregarded. Less than 15% of the reviewed works take into account potential multicollinearity problems. To overcome the issue, this work studies the possible application of recent methodologies to MRA. Particularly, the raised regression is analyzed. This methodology mitigates collinearity from a geometrical point of view: the collinearity problem arises because the variables under study are very close geometrically, so by separating both variables, the problem can be mitigated. Raise regression maintains the available information and modifies the problematic variables instead of deleting variables, for example. Furthermore, the global characteristics of the initial model are also maintained (sum of squared residuals, estimated variance, coefficient of determination, global significance test and prediction). The proposal is implemented to data from countries of the European Union during the last year available regarding greenhouse gas emissions, per capita GDP and a dummy variable that represents the topography of the country. The use of a dummy variable as the moderator is a special variant of MRA, sometimes called “subgroup regression analysis.” The main conclusion of this work is that applying new techniques to the field can improve in a substantial way the results of the analysis. Particularly, the use of raised regression mitigates great multicollinearity problems, so the researcher is able to rely on the interaction term when interpreting the results of a particular study.

Keywords: multicollinearity, MRA, interaction, raise

Procedia PDF Downloads 75
108 R Statistical Software Applied in Reliability Analysis: Case Study of Diesel Generator Fans

Authors: Jelena Vucicevic

Abstract:

Reliability analysis represents a very important task in different areas of work. In any industry, this is crucial for maintenance, efficiency, safety and monetary costs. There are ways to calculate reliability, unreliability, failure density and failure rate. This paper will try to introduce another way of calculating reliability by using R statistical software. R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. The R programming environment is a widely used open source system for statistical analysis and statistical programming. It includes thousands of functions for the implementation of both standard and new statistical methods. R does not limit user only to operation related only to these functions. This program has many benefits over other similar programs: it is free and, as an open source, constantly updated; it has built-in help system; the R language is easy to extend with user-written functions. The significance of the work is calculation of time to failure or reliability in a new way, using statistic. Another advantage of this calculation is that there is no need for technical details and it can be implemented in any part for which we need to know time to fail in order to have appropriate maintenance, but also to maximize usage and minimize costs. In this case, calculations have been made on diesel generator fans but the same principle can be applied to any other part. The data for this paper came from a field engineering study of the time to failure of diesel generator fans. The ultimate goal was to decide whether or not to replace the working fans with a higher quality fan to prevent future failures. Seventy generators were studied. For each one, the number of hours of running time from its first being put into service until fan failure or until the end of the study (whichever came first) was recorded. Dataset consists of two variables: hours and status. Hours show the time of each fan working and status shows the event: 1- failed, 0- censored data. Censored data represent cases when we cannot track the specific case, so it could fail or success. Gaining the result by using R was easy and quick. The program will take into consideration censored data and include this into the results. This is not so easy in hand calculation. For the purpose of the paper results from R program have been compared to hand calculations in two different cases: censored data taken as a failure and censored data taken as a success. In all three cases, results are significantly different. If user decides to use the R for further calculations, it will give more precise results with work on censored data than the hand calculation.

Keywords: censored data, R statistical software, reliability analysis, time to failure

Procedia PDF Downloads 379
107 Secondary Prisonization and Mental Health: A Comparative Study with Elderly Parents of Prisoners Incarcerated in Remote Jails

Authors: Luixa Reizabal, Inaki Garcia, Eneko Sansinenea, Ainize Sarrionandia, Karmele Lopez De Ipina, Elsa Fernandez

Abstract:

Although the effects of incarceration in prisons close to prisoners’ and their families’ residences have been studied, little is known about the effects of remote incarceration. The present study shows the impact of secondary prisonization on mental health of elderly parents of Basque prisoners who are incarcerated in prisons located far away from prisoners’ and their families’ residences. Secondary prisonization refers to the effects that imprisonment of a family member has on relatives. In the study, psychological effects are analyzed by means of comparative methodology. Specifically, levels of psychopathology (depression, anxiety, and stress) and positive mental health (psychological, social, and emotional well-being) are studied in a sample of parents over 65 years old of prisoners incarcerated in prisons located a long distance away (concretely, some of them in a distance of less than 400 km, while others farther than 400 km) from the Basque Country. The dataset consists of data collected through a questionnaire and from a spontaneous speech recording. The statistical and automatic analyses show that levels of psychopathology and positive mental health of elderly parents of prisoners incarcerated in remote jails are affected by the incarceration of their sons or daughters. Concretely, these parents show higher levels of depression, anxiety, and stress and lower levels of emotional (but not psychological or social) wellbeing than parents with no imprisoned daughters or sons. These findings suggest that parents with imprisoned sons or daughters suffer the impact of secondary prisonization on their mental health. When comparing parents with sons or daughters incarcerated within 400 kilometers from home and parents whose sons or daughters are incarcerated farther than 400 kilometers from home, the latter present higher levels of psychopathology, but also higher levels of positive mental health (although the difference between the two groups is not statistically significant). These findings might be explained by resilience. In fact, in traumatic situations, people can develop a force to cope with the situation, and even present a posttraumatic growth. Bearing in mind all these findings, it could be concluded that secondary prisonization implies for elderly parents with sons or daughters incarcerated in remote jails suffering and, in consequence, that changes in the penitentiary policy applied to Basque prisoners are required in order to finish this suffering.

Keywords: automatic spontaneous speech analysis, elderly parents, machine learning, positive mental health, psychopathology, remote incarceration, secondary prisonization

Procedia PDF Downloads 252
106 Predicting Resistance of Commonly Used Antimicrobials in Urinary Tract Infections: A Decision Tree Analysis

Authors: Meera Tandan, Mohan Timilsina, Martin Cormican, Akke Vellinga

Abstract:

Background: In general practice, many infections are treated empirically without microbiological confirmation. Understanding susceptibility of antimicrobials during empirical prescribing can be helpful to reduce inappropriate prescribing. This study aims to apply a prediction model using a decision tree approach to predict the antimicrobial resistance (AMR) of urinary tract infections (UTI) based on non-clinical features of patients over 65 years. Decision tree models are a novel idea to predict the outcome of AMR at an initial stage. Method: Data was extracted from the database of the microbiological laboratory of the University Hospitals Galway on all antimicrobial susceptibility testing (AST) of urine specimens from patients over the age of 65 from January 2011 to December 2014. The primary endpoint was resistance to common antimicrobials (Nitrofurantoin, trimethoprim, ciprofloxacin, co-amoxiclav and amoxicillin) used to treat UTI. A classification and regression tree (CART) model was generated with the outcome ‘resistant infection’. The importance of each predictor (the number of previous samples, age, gender, location (nursing home, hospital, community) and causative agent) on antimicrobial resistance was estimated. Sensitivity, specificity, negative predictive (NPV) and positive predictive (PPV) values were used to evaluate the performance of the model. Seventy-five percent (75%) of the data were used as a training set and validation of the model was performed with the remaining 25% of the dataset. Results: A total of 9805 UTI patients over 65 years had their urine sample submitted for AST at least once over the four years. E.coli, Klebsiella, Proteus species were the most commonly identified pathogens among the UTI patients without catheter whereas Sertia, Staphylococcus aureus; Enterobacter was common with the catheter. The validated CART model shows slight differences in the sensitivity, specificity, PPV and NPV in between the models with and without the causative organisms. The sensitivity, specificity, PPV and NPV for the model with non-clinical predictors was between 74% and 88% depending on the antimicrobial. Conclusion: The CART models developed using non-clinical predictors have good performance when predicting antimicrobial resistance. These models predict which antimicrobial may be the most appropriate based on non-clinical factors. Other CART models, prospective data collection and validation and an increasing number of non-clinical factors will improve model performance. The presented model provides an alternative approach to decision making on antimicrobial prescribing for UTIs in older patients.

Keywords: antimicrobial resistance, urinary tract infection, prediction, decision tree

Procedia PDF Downloads 229
105 Evaluation of Soil Erosion Risk and Prioritization for Implementation of Management Strategies in Morocco

Authors: Lahcen Daoudi, Fatima Zahra Omdi, Abldelali Gourfi

Abstract:

In Morocco, as in most Mediterranean countries, water scarcity is a common situation because of low and unevenly distributed rainfall. The expansions of irrigated lands, as well as the growth of urban and industrial areas and tourist resorts, contribute to an increase of water demand. Therefore in the 1960s Morocco embarked on an ambitious program to increase the number of dams to boost water retention capacity. However, the decrease in the capacity of these reservoirs caused by sedimentation is a major problem; it is estimated at 75 million m3/year. Dams and reservoirs became unusable for their intended purposes due to sedimentation in large rivers that result from soil erosion. Soil erosion presents an important driving force in the process affecting the landscape. It has become one of the most serious environmental problems that raised much interest throughout the world. Monitoring soil erosion risk is an important part of soil conservation practices. The estimation of soil loss risk is the first step for a successful control of water erosion. The aim of this study is to estimate the soil loss risk and its spatial distribution in the different fields of Morocco and to prioritize areas for soil conservation interventions. The approach followed is the Revised Universal Soil Loss Equation (RUSLE) using remote sensing and GIS, which is the most popular empirically based model used globally for erosion prediction and control. This model has been tested in many agricultural watersheds in the world, particularly for large-scale basins due to the simplicity of the model formulation and easy availability of the dataset. The spatial distribution of the annual soil loss was elaborated by the combination of several factors: rainfall erosivity, soil erodability, topography, and land cover. The average annual soil loss estimated in several basins watershed of Morocco varies from 0 to 50t/ha/year. Watersheds characterized by high-erosion-vulnerability are located in the North (Rif Mountains) and more particularly in the Central part of Morocco (High Atlas Mountains). This variation of vulnerability is highly correlated to slope variation which indicates that the topography factor is the main agent of soil erosion within these basin catchments. These results could be helpful for the planning of natural resources management and for implementing sustainable long-term management strategies which are necessary for soil conservation and for increasing over the projected economic life of the dam implemented.

Keywords: soil loss, RUSLE, GIS-remote sensing, watershed, Morocco

Procedia PDF Downloads 434
104 Shark Detection and Classification with Deep Learning

Authors: Jeremy Jenrette, Z. Y. C. Liu, Pranav Chimote, Edward Fox, Trevor Hastie, Francesco Ferretti

Abstract:

Suitable shark conservation depends on well-informed population assessments. Direct methods such as scientific surveys and fisheries monitoring are adequate for defining population statuses, but species-specific indices of abundance and distribution coming from these sources are rare for most shark species. We can rapidly fill these information gaps by boosting media-based remote monitoring efforts with machine learning and automation. We created a database of shark images by sourcing 24,546 images covering 219 species of sharks from the web application spark pulse and the social network Instagram. We used object detection to extract shark features and inflate this database to 53,345 images. We packaged object-detection and image classification models into a Shark Detector bundle. We developed the Shark Detector to recognize and classify sharks from videos and images using transfer learning and convolutional neural networks (CNNs). We applied these models to common data-generation approaches of sharks: boosting training datasets, processing baited remote camera footage and online videos, and data-mining Instagram. We examined the accuracy of each model and tested genus and species prediction correctness as a result of training data quantity. The Shark Detector located sharks in baited remote footage and YouTube videos with an average accuracy of 89\%, and classified located subjects to the species level with 69\% accuracy (n =\ eight species). The Shark Detector sorted heterogeneous datasets of images sourced from Instagram with 91\% accuracy and classified species with 70\% accuracy (n =\ 17 species). Data-mining Instagram can inflate training datasets and increase the Shark Detector’s accuracy as well as facilitate archiving of historical and novel shark observations. Base accuracy of genus prediction was 68\% across 25 genera. The average base accuracy of species prediction within each genus class was 85\%. The Shark Detector can classify 45 species. All data-generation methods were processed without manual interaction. As media-based remote monitoring strives to dominate methods for observing sharks in nature, we developed an open-source Shark Detector to facilitate common identification applications. Prediction accuracy of the software pipeline increases as more images are added to the training dataset. We provide public access to the software on our GitHub page.

Keywords: classification, data mining, Instagram, remote monitoring, sharks

Procedia PDF Downloads 91
103 Deep Learning for Image Correction in Sparse-View Computed Tomography

Authors: Shubham Gogri, Lucia Florescu

Abstract:

Medical diagnosis and radiotherapy treatment planning using Computed Tomography (CT) rely on the quantitative accuracy and quality of the CT images. At the same time, requirements for CT imaging include reducing the radiation dose exposure to patients and minimizing scanning time. A solution to this is the sparse-view CT technique, based on a reduced number of projection views. This, however, introduces a new problem— the incomplete projection data results in lower quality of the reconstructed images. To tackle this issue, deep learning methods have been applied to enhance the quality of the sparse-view CT images. A first approach involved employing Mir-Net, a dedicated deep neural network designed for image enhancement. This showed promise, utilizing an intricate architecture comprising encoder and decoder networks, along with the incorporation of the Charbonnier Loss. However, this approach was computationally demanding. Subsequently, a specialized Generative Adversarial Network (GAN) architecture, rooted in the Pix2Pix framework, was implemented. This GAN framework involves a U-Net-based Generator and a Discriminator based on Convolutional Neural Networks. To bolster the GAN's performance, both Charbonnier and Wasserstein loss functions were introduced, collectively focusing on capturing minute details while ensuring training stability. The integration of the perceptual loss, calculated based on feature vectors extracted from the VGG16 network pretrained on the ImageNet dataset, further enhanced the network's ability to synthesize relevant images. A series of comprehensive experiments with clinical CT data were conducted, exploring various GAN loss functions, including Wasserstein, Charbonnier, and perceptual loss. The outcomes demonstrated significant image quality improvements, confirmed through pertinent metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) between the corrected images and the ground truth. Furthermore, learning curves and qualitative comparisons added evidence of the enhanced image quality and the network's increased stability, while preserving pixel value intensity. The experiments underscored the potential of deep learning frameworks in enhancing the visual interpretation of CT scans, achieving outcomes with SSIM values close to one and PSNR values reaching up to 76.

Keywords: generative adversarial networks, sparse view computed tomography, CT image correction, Mir-Net

Procedia PDF Downloads 120
102 Characterization of Mycoplasma Pneumoniae Causing Exacerbation of Asthma: A Prototypical Finding from Sri Lanka

Authors: Lakmini Wijesooriya, Vicki Chalker, Jessica Day, Priyantha Perera, N. P. Sunil-Chandra

Abstract:

M. pneumoniae has been identified as an etiology for exacerbation of asthma (EQA), although viruses play a major role in EOA. M. pneumoniae infection is treated empirically with macrolides, and its antibiotic sensitivity is not detected routinely. Characterization of the organism by genotyping and determination of macrolide resistance is important epidemiologically as it guides the empiric antibiotic treatment. To date, there is no such characterization of M. pneumoniae performed in Sri Lanka. The present study describes the characterization of M. pneumoniae detected from a child with EOA following a screening of 100 children with EOA. Of the hundred children with EOA, M. pneumoniae was identified only in one child by Real-Time polymerase chain reaction (PCR) test for identifying the community-acquired respiratory distress syndrome (CARDS) toxin nucleotide sequences. The M. pneumoniae identified from this patient underwent detection of macrolide resistance via conventional PCR, amplifying and sequencing the region of the 23S rDNA gene that contains single nucleotide polymorphisms that confer resistance. Genotyping of the isolate was performed via nested Multilocus Sequence Typing (MLST) in which eight (8) housekeeping genes (ppa, pgm, gyrB, gmk, glyA, atpA, arcC, and adk) were amplified via nested PCR followed by gene sequencing and analysis. As per MLST analysis, the M. pneumoniae was identified as sequence type 14 (ST14), and no mutations that confer resistance were detected. Resistance to macrolides in M. pneumoniae is an increasing problem globally. Establishing surveillance systems is the key to informing local prescriptions. In the absence of local surveillance data, antibiotics are started empirically. If the relevant microbiological samples are not obtained before antibiotic therapy, as in most occasions in children, the course of antibiotic is completed without a microbiological diagnosis. This happens more frequently in therapy for M. pneumoniae which is treated with a macrolide in most patients. Hence, it is important to understand the macrolide sensitivity of M. pneumoniae in the setting. The M. pneumoniae detected in the present study was macrolide sensitive. Further studies are needed to examine a larger dataset in Sri Lanka to determine macrolide resistance levels to inform the use of macrolides in children with EOA. The MLST type varies in different geographical settings, and it also provides a clue to the existence of macrolide resistance. The present study enhances the database of the global distribution of different genotypes of M. pneumoniae as this is the first such characterization performed with the increased number of samples to determine macrolide resistance level in Sri Lanka. M. pneumoniae detected from a child with exacerbation of asthma in Sri Lanka was characterized as ST14 by MLST and no mutations that confer resistance were detected.

Keywords: mycoplasma pneumoniae, Sri Lanka, characterization, macrolide resistance

Procedia PDF Downloads 157
101 A Narrative Inquiry of Identity Formation of Chinese Fashion Designers

Authors: Lily Ye

Abstract:

The contemporary fashion industry has witnessed the global rise of Chinese fashion designers. China plays more and more important role in this sector globally. One of the key debates in contemporary time is the conception of Chinese fashion. A close look at previous discussions on Chinese fashion reveals that most of them are explored through the lens of cultural knowledge and assumptions, using the dichotomous models of East and West. The results of these studies generate an essentialist and orientalist notion of Chinoiserie and Chinese fashion, which sees individual designers from China as undifferential collective members marked by a unique and fixed set of cultural scripts. This study challenges this essentialist conceptualization and brings fresh insights to the discussion of Chinese fashion identity against the backdrop of globalisation. Different from a culturalist approach to researching Chinese fashion, this paper presents an alternative position to address the research agenda through the mobilisation of Giddens’ (1991) theory of reflexive identity formation, privileging individuals’ agency and reflexivity. This approach to the discussion of identity formation not only challenges the traditional view seeing identity as the distinctive and essential characteristics belonging to any given individual or shared by all members of a particular social category or group but highlights fashion designers’ strategic agency and their role as fashion activist. This study draws evidence from a textual analysis of published stories of a group of established Chinese designers such as Guo Pei, Huishan Zhang, Masha Ma, Uma Wang, and Ma Ke. In line with Giddens’ concept of 'reflexive project of the self', this study uses a narrative methodology. Narratives are verbal accounts or stories relating to experiences of Chinese fashion designers. This approach offers the fashion designers a chance to 'speak' for themselves and show the depths and complexities of their experiences. It also emphasises the nuances of identity formation in fashion designers, whose experiences cannot be captured in neat typologies. Thematic analysis (Braun and Clarke, 2006) is adopted to identify and investigate common themes across the whole dataset. At the centre of the analysis is individuals’ self-articulation of their perceptions, experiences and themselves in relation to culture, fashion and identity. The finding indicates that identity is constructed around anchors such as agency, cultural hybridity, reflexivity and sustainability rather than traditional collective categories such as culture and ethnicity. Thus, the old East-West dichotomy is broken down, and essentialised social categories are challenged by the multiplicity and fragmentation of self and cultural hybridity created within designers’ 'small narratives'.

Keywords: Chinoiserie, fashion identity, fashion activism, narrative inquiry

Procedia PDF Downloads 262
100 Development of a Novel Clinical Screening Tool, Using the BSGE Pain Questionnaire, Clinical Examination and Ultrasound to Predict the Severity of Endometriosis Prior to Laparoscopic Surgery

Authors: Marlin Mubarak

Abstract:

Background: Endometriosis is a complex disabling disease affecting young females in the reproductive period mainly. The aim of this project is to generate a diagnostic model to predict severity and stage of endometriosis prior to Laparoscopic surgery. This will help to improve the pre-operative diagnostic accuracy of stage 3 & 4 endometriosis and as a result, refer relevant women to a specialist centre for complex Laparoscopic surgery. The model is based on the British Society of Gynaecological Endoscopy (BSGE) pain questionnaire, clinical examination and ultrasound scan. Design: This is a prospective, observational, study, in which women completed the BSGE pain questionnaire, a BSGE requirement. Also, as part of the routine preoperative assessment patient had a routine ultrasound scan and when recto-vaginal and deep infiltrating endometriosis was suspected an MRI was performed. Setting: Luton & Dunstable University Hospital. Patients: Symptomatic women (n = 56) scheduled for laparoscopy due to pelvic pain. The age ranged between 17 – 52 years of age (mean 33.8 years, SD 8.7 years). Interventions: None outside the recognised and established endometriosis centre protocol set up by BSGE. Main Outcome Measure(s): Sensitivity and specificity of endometriosis diagnosis predicted by symptoms based on BSGE pain questionnaire, clinical examinations and imaging. Findings: The prevalence of diagnosed endometriosis was calculated to be 76.8% and the prevalence of advanced stage was 55.4%. Deep infiltrating endometriosis in various locations was diagnosed in 32/56 women (57.1%) and some had DIE involving several locations. Logistic regression analysis was performed on 36 clinical variables to create a simple clinical prediction model. After creating the scoring system using variables with P < 0.05, the model was applied to the whole dataset. The sensitivity was 83.87% and specificity 96%. The positive likelihood ratio was 20.97 and the negative likelihood ratio was 0.17, indicating that the model has a good predictive value and could be useful in predicting advanced stage endometriosis. Conclusions: This is a hypothesis-generating project with one operator, but future proposed research would provide validation of the model and establish its usefulness in the general setting. Predictive tools based on such model could help organise the appropriate investigation in clinical practice, reduce risks associated with surgery and improve outcome. It could be of value for future research to standardise the assessment of women presenting with pelvic pain. The model needs further testing in a general setting to assess if the initial results are reproducible.

Keywords: deep endometriosis, endometriosis, minimally invasive, MRI, ultrasound.

Procedia PDF Downloads 320
99 Predicting Football Player Performance: Integrating Data Visualization and Machine Learning

Authors: Saahith M. S., Sivakami R.

Abstract:

In the realm of football analytics, particularly focusing on predicting football player performance, the ability to forecast player success accurately is of paramount importance for teams, managers, and fans. This study introduces an elaborate examination of predicting football player performance through the integration of data visualization methods and machine learning algorithms. The research entails the compilation of an extensive dataset comprising player attributes, conducting data preprocessing, feature selection, model selection, and model training to construct predictive models. The analysis within this study will involve delving into feature significance using methodologies like Select Best and Recursive Feature Elimination (RFE) to pinpoint pertinent attributes for predicting player performance. Various machine learning algorithms, including Random Forest, Decision Tree, Linear Regression, Support Vector Regression (SVR), and Artificial Neural Networks (ANN), will be explored to develop predictive models. The evaluation of each model's performance utilizing metrics such as Mean Squared Error (MSE) and R-squared will be executed to gauge their efficacy in predicting player performance. Furthermore, this investigation will encompass a top player analysis to recognize the top-performing players based on the anticipated overall performance scores. Nationality analysis will entail scrutinizing the player distribution based on nationality and investigating potential correlations between nationality and player performance. Positional analysis will concentrate on examining the player distribution across various positions and assessing the average performance of players in each position. Age analysis will evaluate the influence of age on player performance and identify any discernible trends or patterns associated with player age groups. The primary objective is to predict a football player's overall performance accurately based on their individual attributes, leveraging data-driven insights to enrich the comprehension of player success on the field. By amalgamating data visualization and machine learning methodologies, the aim is to furnish valuable tools for teams, managers, and fans to effectively analyze and forecast player performance. This research contributes to the progression of sports analytics by showcasing the potential of machine learning in predicting football player performance and offering actionable insights for diverse stakeholders in the football industry.

Keywords: football analytics, player performance prediction, data visualization, machine learning algorithms, random forest, decision tree, linear regression, support vector regression, artificial neural networks, model evaluation, top player analysis, nationality analysis, positional analysis

Procedia PDF Downloads 16
98 Covid Medical Imaging Trial: Utilising Artificial Intelligence to Identify Changes on Chest X-Ray of COVID

Authors: Leonard Tiong, Sonit Singh, Kevin Ho Shon, Sarah Lewis

Abstract:

Investigation into the use of artificial intelligence in radiology continues to develop at a rapid rate. During the coronavirus pandemic, the combination of an exponential increase in chest x-rays and unpredictable staff shortages resulted in a huge strain on the department's workload. There is a World Health Organisation estimate that two-thirds of the global population does not have access to diagnostic radiology. Therefore, there could be demand for a program that could detect acute changes in imaging compatible with infection to assist with screening. We generated a conventional neural network and tested its efficacy in recognizing changes compatible with coronavirus infection. Following ethics approval, a deidentified set of 77 normal and 77 abnormal chest x-rays in patients with confirmed coronavirus infection were used to generate an algorithm that could train, validate and then test itself. DICOM and PNG image formats were selected due to their lossless file format. The model was trained with 100 images (50 positive, 50 negative), validated against 28 samples (14 positive, 14 negative), and tested against 26 samples (13 positive, 13 negative). The initial training of the model involved training a conventional neural network in what constituted a normal study and changes on the x-rays compatible with coronavirus infection. The weightings were then modified, and the model was executed again. The training samples were in batch sizes of 8 and underwent 25 epochs of training. The results trended towards an 85.71% true positive/true negative detection rate and an area under the curve trending towards 0.95, indicating approximately 95% accuracy in detecting changes on chest X-rays compatible with coronavirus infection. Study limitations include access to only a small dataset and no specificity in the diagnosis. Following a discussion with our programmer, there are areas where modifications in the weighting of the algorithm can be made in order to improve the detection rates. Given the high detection rate of the program, and the potential ease of implementation, this would be effective in assisting staff that is not trained in radiology in detecting otherwise subtle changes that might not be appreciated on imaging. Limitations include the lack of a differential diagnosis and application of the appropriate clinical history, although this may be less of a problem in day-to-day clinical practice. It is nonetheless our belief that implementing this program and widening its scope to detecting multiple pathologies such as lung masses will greatly assist both the radiology department and our colleagues in increasing workflow and detection rate.

Keywords: artificial intelligence, COVID, neural network, machine learning

Procedia PDF Downloads 66
97 Cross-Validation of the Data Obtained for ω-6 Linoleic and ω-3 α-Linolenic Acids Concentration of Hemp Oil Using Jackknife and Bootstrap Resampling

Authors: Vibha Devi, Shabina Khanam

Abstract:

Hemp (Cannabis sativa) possesses a rich content of ω-6 linoleic and ω-3 linolenic essential fatty acid in the ratio of 3:1, which is a rare and most desired ratio that enhances the quality of hemp oil. These components are beneficial for the development of cell and body growth, strengthen the immune system, possess anti-inflammatory action, lowering the risk of heart problem owing to its anti-clotting property and a remedy for arthritis and various disorders. The present study employs supercritical fluid extraction (SFE) approach on hemp seed at various conditions of parameters; temperature (40 - 80) °C, pressure (200 - 350) bar, flow rate (5 - 15) g/min, particle size (0.430 - 1.015) mm and amount of co-solvent (0 - 10) % of solvent flow rate through central composite design (CCD). CCD suggested 32 sets of experiments, which was carried out. As SFE process includes large number of variables, the present study recommends the application of resampling techniques for cross-validation of the obtained data. Cross-validation refits the model on each data to achieve the information regarding the error, variability, deviation etc. Bootstrap and jackknife are the most popular resampling techniques, which create a large number of data through resampling from the original dataset and analyze these data to check the validity of the obtained data. Jackknife resampling is based on the eliminating one observation from the original sample of size N without replacement. For jackknife resampling, the sample size is 31 (eliminating one observation), which is repeated by 32 times. Bootstrap is the frequently used statistical approach for estimating the sampling distribution of an estimator by resampling with replacement from the original sample. For bootstrap resampling, the sample size is 32, which was repeated by 100 times. Estimands for these resampling techniques are considered as mean, standard deviation, variation coefficient and standard error of the mean. For ω-6 linoleic acid concentration, mean value was approx. 58.5 for both resampling methods, which is the average (central value) of the sample mean of all data points. Similarly, for ω-3 linoleic acid concentration, mean was observed as 22.5 through both resampling. Variance exhibits the spread out of the data from its mean. Greater value of variance exhibits the large range of output data, which is 18 for ω-6 linoleic acid (ranging from 48.85 to 63.66 %) and 6 for ω-3 linoleic acid (ranging from 16.71 to 26.2 %). Further, low value of standard deviation (approx. 1 %), low standard error of the mean (< 0.8) and low variance coefficient (< 0.2) reflect the accuracy of the sample for prediction. All the estimator value of variance coefficients, standard deviation and standard error of the mean are found within the 95 % of confidence interval.

Keywords: resampling, supercritical fluid extraction, hemp oil, cross-validation

Procedia PDF Downloads 121
96 Detailed Ichnofacies and Sedimentological Analysis of the Cambrian Succession (Tal Group) of the Nigalidhar Syncline, Lesser Himalaya, India and the Interpretation of Its Palaeoenvironment

Authors: C. A. Sharma, Birendra P. Singh

Abstract:

Ichnofacies analysis is considered the best paleontological tool for interpreting ancient depositional environments. Nineteen (19) ichnogenera (namely: Bergaueria, Catenichnus, Cochlichnus, Cruziana, Diplichnites, Dimorphichnus, Diplocraterion, Gordia, Guanshanichnus, Lockeia, Merostomichnites, Monomorphichnus, Palaeophycus, Phycodes, Planolites, Psammichnites, Rusophycus, Skolithos and Treptichnus) are recocered from the Tal Group (Cambrian) of the Nigalidhar Syncline. The stratigraphic occurrences of these ichnogenera represent alternating proximal Cruziana and Skolithos ichnofacies along the contact of Sankholi and Koti-Dhaman formations of the Tal Group. Five ichnogenera namely Catenichnus, Guanshanichnus, Lockeia, Merostomichnites and Psammichnites are recorded for the first time from the Nigalidhar Syncline. Cruziana ichnofacies is found in the upper part of the Sankholi Formation to the lower part of the Koti Dhaman Formation in the NigaliDhar Syncline. The preservational characters here indicate a subtidal environmental condition with poorly sorted, unconsolidated substrate. Depositional condition ranging from moderate to high energy levels below the fair weather base but above the storm wave base under nearshore to foreshore setting in a wave dominated shallow water environment is also indicated. The proximal Cruziana-ichnofacies is interrupted by the Skolithos ichnofacies in the Tal Group of the Nigalidhar Syncline which indicate fluctuating high energy condition which was unfavorable for the opportunistic organism which were dominant during the proximal Cruziana ichnofacies. The excursion of Skolithos ichnofacies (as a pipe rock in the upper part of Sankholi Formation) into the proximal Cruziana ichnofacies in the Tal Group indicate that increased energy and allied parameters attributed to the high rate of sedimentation near the proximal part of the basin. The level bearing the Skolithos ichnofacies in the Nigalidhar Syncline at the juncture of Sankholi and Koti-Dhaman formations can be correlated to the level marked as unconformity in between the Deo-Ka-Tibba and the Dhaulagiri formations by the conglomeratic horizon in the Mussoorie Syncline, Lesser Himalaya, India. Thus, the Tal Group of the Nigalidhar syncline at this stratigraphic level represent slightly deeper water condition than the Mussoorie Syncline, where in the later the aerial exposure dominated which leads to the deposition of conglomeratic horizon and subsequent formation of unconformity. The overall ichnological and sedimentological dataset allow us to infer that the Cambrian successions of Nigalidhar Syncline were deposited in a wave-dominated proximal part of the basin under the foreshore to close to upper shoreface regimes of the shallow marine setting.

Keywords: Cambrian, Ichnofacies, Lesser Himalaya, Nigalidhar, Tal Group

Procedia PDF Downloads 224
95 Unequal Traveling: How School District System and School District Housing Characteristics Shape the Duration of Families Commuting

Authors: Geyang Xia

Abstract:

In many countries, governments have responded to the growing demand for educational resources through school district systems, and there is substantial evidence that school district systems have been effective in promoting inter-district and inter-school equity in educational resources. However, the scarcity of quality educational resources has brought about varying levels of education among different school districts, making it a common choice for many parents to buy a house in the school district where a quality school is located, and they are even willing to bear huge commuting costs for this purpose. Moreover, this is evidenced by the fact that parents of families in school districts with quality education resources have longer average commute lengths and longer average commute distances than parents in average school districts. This "unequal traveling" under the influence of the school district system is more common in school districts at the primary level of education. This further reinforces the differential hierarchy of educational resources and raises issues of inequitable educational public services, education-led residential segregation, and gentrification of school district housing. Against this background, this paper takes Nanjing, a famous educational city in China, as a case study and selects the school districts where the top 10 public elementary schools are located. The study first identifies the spatio-temporal behavioral trajectory dataset of these high-quality school district households by using spatial vector data, decrypted cell phone signaling data, and census data. Then, by constructing a "house-school-work (HSW)" commuting pattern of the population in the school district where the high-quality educational resources are located, and based on the classification of the HSW commuting pattern of the population, school districts with long employment hours were identified. Ultimately, the mechanisms and patterns inherent in this unequal commuting are analyzed in terms of six aspects, including the centrality of school district location, functional diversity, and accessibility. The results reveal that the "unequal commuting" of Nanjing's high-quality school districts under the influence of the school district system occurs mainly in the peripheral areas of the city, and the schools matched with these high-quality school districts are mostly branches of prestigious schools in the built-up areas of the city's core. At the same time, the centrality of school district location and the diversity of functions are the most important influencing factors of unequal commuting in high-quality school districts. Based on the research results, this paper proposes strategies to optimize the spatial layout of high-quality educational resources and corresponding transportation policy measures.

Keywords: school-district system, high quality school district, commuting pattern, unequal traveling

Procedia PDF Downloads 66
94 Principal Well-Being at Hong Kong: A Quantitative Investigation

Authors: Junjun Chen, Yingxiu Li

Abstract:

The occupational well-being of school principals has played a vital role in the pursuit of individual and school wellness and success. However, principals’ well-being worldwide is under increasing threat because of the challenging and complex nature of their work and growing demands for school standardisation and accountability. Pressure is particularly acute in the post-pandemicfuture as principals attempt to deal with the impact of the pandemic on top of more regular demands. This is particularly true in Hong Kong, as school principals are increasingly wedged between unparalleled political, social, and academic responsibilities. Recognizing the semantic breadth of well-being, scholars have not determined a single, mutually agreeable definition but agreed that the concept of well-being has multiple dimensions across various disciplines. The multidimensional approach promises more precise assessments of the relationships between well-being and other concepts than the ‘affect-only’ approach or other single domains for capturing the essence of principal well-being. The multiple-dimension well-being concept is adopted in this project to understand principal well-being in this study. This study aimed to understand the situation of principal well-being and its influential drivers with a sample of 670 principals from Hong Kong and Mainland China. An online survey was sent to the participants after the breakout of COVID-19 by the researchers. All participants were well informed about the purposes and procedure of the project and the confidentiality of the data prior to filling in the questionnaire. Confirmatory factor analysis and structural equation modelling performed with Mplus were employed to deal with the dataset. The data analysis procedure involved the following three steps. First, the descriptive statistics (e.g., mean and standard deviation) were calculated. Second, confirmatory factor analysis (CFA) was used to trim principal well-being measurement performed with maximum likelihood estimation. Third, structural equation modelling (SEM) was employed to test the influential factors of principal well-being. The results of this study indicated that the overall of principal well-being were above the average mean score. The highest ranking in this study given by the principals was to their psychological and social well-being (M = 5.21). This was followed by spiritual (M = 5.14; SD = .77), cognitive (M = 5.14; SD = .77), emotional (M = 4.96; SD = .79), and physical well-being (M = 3.15; SD = .73). Participants ranked their physical well-being the lowest. Moreover, professional autonomy, supervisor and collegial support, school physical conditions, professional networking, and social media have showed a significant impact on principal well-being. The findings of this study will potentially enhance not only principal well-being, but also the functioning of an individual principal and a school without sacrificing principal well-being for quality education in the process. This will eventually move one step forward for a new future - a wellness society advocated by OECD. Importantly, well-being is an inside job that begins with choosing to have wellness, whilst supports to become a wellness principal are also imperative.

Keywords: well-being, school principals, quantitative, influential factors

Procedia PDF Downloads 58
93 Understanding Evidence Dispersal Caused by the Effects of Using Unmanned Aerial Vehicles in Active Indoor Crime Scenes

Authors: Elizabeth Parrott, Harry Pointon, Frederic Bezombes, Heather Panter

Abstract:

Unmanned aerial vehicles (UAV’s) are making a profound effect within policing, forensic and fire service procedures worldwide. These intelligent devices have already proven useful in photographing and recording large-scale outdoor and indoor sites using orthomosaic and three-dimensional (3D) modelling techniques, for the purpose of capturing and recording sites during and post-incident. UAV’s are becoming an established tool as they are extending the reach of the photographer and offering new perspectives without the expense and restrictions of deploying full-scale aircraft. 3D reconstruction quality is directly linked to the resolution of captured images; therefore, close proximity flights are required for more detailed models. As technology advances deployment of UAVs in confined spaces is becoming more common. With this in mind, this study investigates the effects of UAV operation within active crimes scenes with regard to the dispersal of particulate evidence. To date, there has been little consideration given to the potential effects of using UAV’s within active crime scenes aside from a legislation point of view. Although potentially the technology can reduce the likelihood of contamination by replacing some of the roles of investigating practitioners. There is the risk of evidence dispersal caused by the effect of the strong airflow beneath the UAV, from the downwash of the propellers. The initial results of this study are therefore presented to determine the height of least effect at which to fly, and the commercial propeller type to choose to generate the smallest amount of disturbance from the dataset tested. In this study, a range of commercially available 4-inch propellers were chosen as a starting point due to the common availability and their small size makes them well suited for operation within confined spaces. To perform the testing, a rig was configured to support a single motor and propeller powered with a standalone mains power supply and controlled via a microcontroller. This was to mimic a complete throttle cycle and control the device to ensure repeatability. By removing the variances of battery packs and complex UAV structures to allow for a more robust setup. Therefore, the only changing factors were the propeller and operating height. The results were calculated via computer vision analysis of the recorded dispersal of the sample particles placed below the arm-mounted propeller. The aim of this initial study is to give practitioners an insight into the technology to use when operating within confined spaces as well as recognizing some of the issues caused by UAV’s within active crime scenes.

Keywords: dispersal, evidence, propeller, UAV

Procedia PDF Downloads 140
92 A Visualization Classification Method for Identifying the Decayed Citrus Fruit Infected by Fungi Based on Hyperspectral Imaging

Authors: Jiangbo Li, Wenqian Huang

Abstract:

Early detection of fungal infection in citrus fruit is one of the major problems in the postharvest commercialization process. The automatic and nondestructive detection of infected fruits is still a challenge for the citrus industry. At present, the visual inspection of rotten citrus fruits is commonly performed by workers through the ultraviolet induction fluorescence technology or manual sorting in citrus packinghouses to remove fruit subject with fungal infection. However, the former entails a number of problems because exposing people to this kind of lighting is potentially hazardous to human health, and the latter is very inefficient. Orange is used as a research object. This study would focus on this problem and proposed an effective method based on Vis-NIR hyperspectral imaging in the wavelength range of 400-1000 nm with a spectroscopic resolution of 2.8 nm. In this work, three normalization approaches are applied prior to analysis to reduce the effect of sample curvature on spectral profiles, and it is found that mean normalization was the most effective pretreatment for decreasing spectral variability due to curvature. Then, principal component analysis (PCA) was applied to a dataset composing of average spectra from decayed and normal tissue to reduce the dimensionality of data and observe the ability of Vis-NIR hyper-spectra to discriminate data from two classes. In this case, it was observed that normal and decayed spectra were separable along the resultant first principal component (PC1) axis. Subsequently, five wavelengths (band) centered at 577, 702, 751, 808, and 923 nm were selected as the characteristic wavelengths by analyzing the loadings of PC1. A multispectral combination image was generated based on five selected characteristic wavelength images. Based on the obtained multispectral combination image, the intensity slicing pseudocolor image processing method is used to generate a 2-D visual classification image that would enhance the contrast between normal and decayed tissue. Finally, an image segmentation algorithm for detection of decayed fruit was developed based on the pseudocolor image coupled with a simple thresholding method. For the investigated 238 independent set samples including infected fruits infected by Penicillium digitatum and normal fruits, the total success rate is 100% and 97.5%, respectively, and, the proposed algorithm also used to identify the orange infected by penicillium italicum with a 100% identification accuracy, indicating that the proposed multispectral algorithm here is an effective method and it is potential to be applied in citrus industry.

Keywords: citrus fruit, early rotten, fungal infection, hyperspectral imaging

Procedia PDF Downloads 272
91 Learning to Translate by Learning to Communicate to an Entailment Classifier

Authors: Szymon Rutkowski, Tomasz Korbak

Abstract:

We present a reinforcement-learning-based method of training neural machine translation models without parallel corpora. The standard encoder-decoder approach to machine translation suffers from two problems we aim to address. First, it needs parallel corpora, which are scarce, especially for low-resource languages. Second, it lacks psychological plausibility of learning procedure: learning a foreign language is about learning to communicate useful information, not merely learning to transduce from one language’s 'encoding' to another. We instead pose the problem of learning to translate as learning a policy in a communication game between two agents: the translator and the classifier. The classifier is trained beforehand on a natural language inference task (determining the entailment relation between a premise and a hypothesis) in the target language. The translator produces a sequence of actions that correspond to generating translations of both the hypothesis and premise, which are then passed to the classifier. The translator is rewarded for classifier’s performance on determining entailment between sentences translated by the translator to disciple’s native language. Translator’s performance thus reflects its ability to communicate useful information to the classifier. In effect, we train a machine translation model without the need for parallel corpora altogether. While similar reinforcement learning formulations for zero-shot translation were proposed before, there is a number of improvements we introduce. While prior research aimed at grounding the translation task in the physical world by evaluating agents on an image captioning task, we found that using a linguistic task is more sample-efficient. Natural language inference (also known as recognizing textual entailment) captures semantic properties of sentence pairs that are poorly correlated with semantic similarity, thus enforcing basic understanding of the role played by compositionality. It has been shown that models trained recognizing textual entailment produce high-quality general-purpose sentence embeddings transferrable to other tasks. We use stanford natural language inference (SNLI) dataset as well as its analogous datasets for French (XNLI) and Polish (CDSCorpus). Textual entailment corpora can be obtained relatively easily for any language, which makes our approach more extensible to low-resource languages than traditional approaches based on parallel corpora. We evaluated a number of reinforcement learning algorithms (including policy gradients and actor-critic) to solve the problem of translator’s policy optimization and found that our attempts yield some promising improvements over previous approaches to reinforcement-learning based zero-shot machine translation.

Keywords: agent-based language learning, low-resource translation, natural language inference, neural machine translation, reinforcement learning

Procedia PDF Downloads 101
90 Adding a Degree of Freedom to Opinion Dynamics Models

Authors: Dino Carpentras, Alejandro Dinkelberg, Michael Quayle

Abstract:

Within agent-based modeling, opinion dynamics is the field that focuses on modeling people's opinions. In this prolific field, most of the literature is dedicated to the exploration of the two 'degrees of freedom' and how they impact the model’s properties (e.g., the average final opinion, the number of final clusters, etc.). These degrees of freedom are (1) the interaction rule, which determines how agents update their own opinion, and (2) the network topology, which defines the possible interaction among agents. In this work, we show that the third degree of freedom exists. This can be used to change a model's output up to 100% of its initial value or to transform two models (both from the literature) into each other. Since opinion dynamics models are representations of the real world, it is fundamental to understand how people’s opinions can be measured. Even for abstract models (i.e., not intended for the fitting of real-world data), it is important to understand if the way of numerically representing opinions is unique; and, if this is not the case, how the model dynamics would change by using different representations. The process of measuring opinions is non-trivial as it requires transforming real-world opinion (e.g., supporting most of the liberal ideals) to a number. Such a process is usually not discussed in opinion dynamics literature, but it has been intensively studied in a subfield of psychology called psychometrics. In psychometrics, opinion scales can be converted into each other, similarly to how meters can be converted to feet. Indeed, psychometrics routinely uses both linear and non-linear transformations of opinion scales. Here, we analyze how this transformation affects opinion dynamics models. We analyze this effect by using mathematical modeling and then validating our analysis with agent-based simulations. Firstly, we study the case of perfect scales. In this way, we show that scale transformations affect the model’s dynamics up to a qualitative level. This means that if two researchers use the same opinion dynamics model and even the same dataset, they could make totally different predictions just because they followed different renormalization processes. A similar situation appears if two different scales are used to measure opinions even on the same population. This effect may be as strong as providing an uncertainty of 100% on the simulation’s output (i.e., all results are possible). Still, by using perfect scales, we show that scales transformations can be used to perfectly transform one model to another. We test this using two models from the standard literature. Finally, we test the effect of scale transformation in the case of finite precision using a 7-points Likert scale. In this way, we show how a relatively small-scale transformation introduces both changes at the qualitative level (i.e., the most shared opinion at the end of the simulation) and in the number of opinion clusters. Thus, scale transformation appears to be a third degree of freedom of opinion dynamics models. This result deeply impacts both theoretical research on models' properties and on the application of models on real-world data.

Keywords: degrees of freedom, empirical validation, opinion scale, opinion dynamics

Procedia PDF Downloads 102
89 Social Value of Travel Time Savings in Sub-Saharan Africa

Authors: Richard Sogah

Abstract:

The significance of transport infrastructure investments for economic growth and development has been central to the World Bank’s strategy for poverty reduction. Among the conventional surface transport infrastructures, road infrastructure is significant in facilitating the movement of human capital goods and services. When transport projects (i.e., roads, super-highways) are implemented, they come along with some negative social values (costs), such as increased noise and air pollution for local residents living near these facilities, displaced individuals, etc. However, these projects also facilitate better utilization of existing capital stock and generate other observable benefits that can be easily quantified. For example, the improvement or construction of roads creates employment, stimulates revenue generation (toll), reduces vehicle operating costs and accidents, increases accessibility, trade expansion, safety improvement, etc. Aside from these benefits, travel time savings (TTSs) which are the major economic benefits of urban and inter-urban transport projects and therefore integral in the economic assessment of transport projects, are often overlooked and omitted when estimating the benefits of transport projects, especially in developing countries. The absence of current and reliable domestic travel data and the inability of replicated models from the developed world to capture the actual value of travel time savings due to the large unemployment, underemployment, and other labor-induced distortions has contributed to the failure to assign value to travel time savings when estimating the benefits of transport schemes in developing countries. This omission of the value of travel time savings from the benefits of transport projects in developing countries poses problems for investors and stakeholders to either accept or dismiss projects based on schemes that favor reduced vehicular operating costs and other parameters rather than those that ease congestion, increase average speed, facilitate walking and handloading, and thus save travel time. Given the complex reality in the estimation of the value of travel time savings and the presence of widespread informal labour activities in Sub-Saharan Africa, we construct a “nationally ranked distribution of time values” and estimate the value of travel time savings based on the area beneath the distribution. Compared with other approaches, our method captures both formal sector workers and individuals/people who work outside the formal sector and hence changes in their time allocation occur in the informal economy and household production activities. The dataset for the estimations is sourced from the World Bank, the International Labour Organization, etc.

Keywords: road infrastructure, transport projects, travel time savings, congestion, Sub-Sahara Africa

Procedia PDF Downloads 79
88 Homeless Population Modeling and Trend Prediction Through Identifying Key Factors and Machine Learning

Authors: Shayla He

Abstract:

Background and Purpose: According to Chamie (2017), it’s estimated that no less than 150 million people, or about 2 percent of the world’s population, are homeless. The homeless population in the United States has grown rapidly in the past four decades. In New York City, the sheltered homeless population has increased from 12,830 in 1983 to 62,679 in 2020. Knowing the trend on the homeless population is crucial at helping the states and the cities make affordable housing plans, and other community service plans ahead of time to better prepare for the situation. This study utilized the data from New York City, examined the key factors associated with the homelessness, and developed systematic modeling to predict homeless populations of the future. Using the best model developed, named HP-RNN, an analysis on the homeless population change during the months of 2020 and 2021, which were impacted by the COVID-19 pandemic, was conducted. Moreover, HP-RNN was tested on the data from Seattle. Methods: The methodology involves four phases in developing robust prediction methods. Phase 1 gathered and analyzed raw data of homeless population and demographic conditions from five urban centers. Phase 2 identified the key factors that contribute to the rate of homelessness. In Phase 3, three models were built using Linear Regression, Random Forest, and Recurrent Neural Network (RNN), respectively, to predict the future trend of society's homeless population. Each model was trained and tuned based on the dataset from New York City for its accuracy measured by Mean Squared Error (MSE). In Phase 4, the final phase, the best model from Phase 3 was evaluated using the data from Seattle that was not part of the model training and tuning process in Phase 3. Results: Compared to the Linear Regression based model used by HUD et al (2019), HP-RNN significantly improved the prediction metrics of Coefficient of Determination (R2) from -11.73 to 0.88 and MSE by 99%. HP-RNN was then validated on the data from Seattle, WA, which showed a peak %error of 14.5% between the actual and the predicted count. Finally, the modeling results were collected to predict the trend during the COVID-19 pandemic. It shows a good correlation between the actual and the predicted homeless population, with the peak %error less than 8.6%. Conclusions and Implications: This work is the first work to apply RNN to model the time series of the homeless related data. The Model shows a close correlation between the actual and the predicted homeless population. There are two major implications of this result. First, the model can be used to predict the homeless population for the next several years, and the prediction can help the states and the cities plan ahead on affordable housing allocation and other community service to better prepare for the future. Moreover, this prediction can serve as a reference to policy makers and legislators as they seek to make changes that may impact the factors closely associated with the future homeless population trend.

Keywords: homeless, prediction, model, RNN

Procedia PDF Downloads 94
87 Determinants of Never Users of Contraception-Results from Pakistan Demographic and Health Survey 2012-13

Authors: Arsalan Jabbar, Wajiha Javed, Nelofer Mehboob, Zahid Memon

Abstract:

Introduction: There are multiple social, individual and cultural factors that influence an individual’s decision to adopt family planning methods especially among non-users in patriarchal societies like Pakistan.Non-users, if targeted efficiently, can contribute significantly to country’s CPR. A research study showed that non-users if convinced to adopt lactational amenorrhea method can shift to long-term methods in future. Research shows that if non-users are targeted efficiently a 59% reduction in unintended pregnancies in Saharan Africa and South-Central and South-East Asia is anticipated. Methods: We did secondary data analysis on Pakistan Demographic Heath Survey (2012-13) dataset. Use of contraception (never-use/ever-use) was the outcome variable. At univariate level Chi-square/Fisher Exact test was used to assess relationship of baseline covariates with contraception use. Then variables to be incorporated in the model were checked for multi-collinearity, confounding, and interaction. Then binary logistic regression (with an urban-rural stratification) was done to find the relationship between contraception use and baseline demographic and social variables. Results: The multivariate analyses of the study showed that younger women (≤ 29 years) were more prone to be never users as compared to those who were > 30 years and this trend was seen in urban areas (AOR 1.92, CI 1.453-2.536) as well as rural areas (AOR 1.809, CI 1.421-2.303). While looking at regional variation, women from urban Sindh (AOR 1.548, CI 1.142-2.099) and urban Balochistan (AOR 2.403, CI 1.504-3.839) had more never users as compared to other urban regions. Women in the rich wealth quintile were more never users and this was seen both in urban and rural localities (urban (AOR 1.106 CI .753-1.624); rural areas (AOR 1.162, CI .887-1.524)) even though these were not statistically significant. Women idealizing more children(> 4) are more never users as compared to those idealizing less children in both urban (AOR 1.854, CI 1.275-2.697) and rural areas (AOR 2.101, CI 1.514-2.916). Women who never lost a pregnancy were more inclined to be non-users in rural areas (AOR 1.394, CI 1.127-1.723) .Women familiar with only traditional or no method had more never users in rural areas (AOR 1.717, CI 1.127-1.723) but in urban areas it wasn’t significant. Women unaware of Lady Health Worker’s presence in their area were more never users especially in rural areas (AOR 1.276, CI 1.014-1.607). Women who did not visit any care provider were more never users (urban (AOR 11.738, CI 9.112-15.121) rural areas (AOR 7.832, CI 6.243-9.826)). Discussion/Conclusion: This study concluded that government, policy makers and private sector family planning programs should focus on the untapped pool of never users (younger women from underserved provinces, in higher wealth quintiles, who desire more children.). We need to make sure to cover catchment areas where there are less LHWs and less providers as ignorance to modern methods and never been visited by an LHW are important determinants of never use. This all is in sync with previous literate from similar developing countries.

Keywords: contraception, demographic and health survey, family planning, never users

Procedia PDF Downloads 384