Search results for: Imbalanced dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1142

Search results for: Imbalanced dataset

122 Spectrogram Pre-Processing to Improve Isotopic Identification to Discriminate Gamma and Neutrons Sources

Authors: Mustafa Alhamdi

Abstract:

Industrial application to classify gamma rays and neutron events is investigated in this study using deep machine learning. The identification using a convolutional neural network and recursive neural network showed a significant improvement in predication accuracy in a variety of applications. The ability to identify the isotope type and activity from spectral information depends on feature extraction methods, followed by classification. The features extracted from the spectrum profiles try to find patterns and relationships to present the actual spectrum energy in low dimensional space. Increasing the level of separation between classes in feature space improves the possibility to enhance classification accuracy. The nonlinear nature to extract features by neural network contains a variety of transformation and mathematical optimization, while principal component analysis depends on linear transformations to extract features and subsequently improve the classification accuracy. In this paper, the isotope spectrum information has been preprocessed by finding the frequencies components relative to time and using them as a training dataset. Fourier transform implementation to extract frequencies component has been optimized by a suitable windowing function. Training and validation samples of different isotope profiles interacted with CdTe crystal have been simulated using Geant4. The readout electronic noise has been simulated by optimizing the mean and variance of normal distribution. Ensemble learning by combing voting of many models managed to improve the classification accuracy of neural networks. The ability to discriminate gamma and neutron events in a single predication approach using deep machine learning has shown high accuracy using deep learning. The paper findings show the ability to improve the classification accuracy by applying the spectrogram preprocessing stage to the gamma and neutron spectrums of different isotopes. Tuning deep machine learning models by hyperparameter optimization of neural network models enhanced the separation in the latent space and provided the ability to extend the number of detected isotopes in the training database. Ensemble learning contributed significantly to improve the final prediction.

Keywords: machine learning, nuclear physics, Monte Carlo simulation, noise estimation, feature extraction, classification

Procedia PDF Downloads 123
121 Systematic Evaluation of Convolutional Neural Network on Land Cover Classification from Remotely Sensed Images

Authors: Eiman Kattan, Hong Wei

Abstract:

In using Convolutional Neural Network (CNN) for classification, there is a set of hyperparameters available for the configuration purpose. This study aims to evaluate the impact of a range of parameters in CNN architecture i.e. AlexNet on land cover classification based on four remotely sensed datasets. The evaluation tests the influence of a set of hyperparameters on the classification performance. The parameters concerned are epoch values, batch size, and convolutional filter size against input image size. Thus, a set of experiments were conducted to specify the effectiveness of the selected parameters using two implementing approaches, named pertained and fine-tuned. We first explore the number of epochs under several selected batch size values (32, 64, 128 and 200). The impact of kernel size of convolutional filters (1, 3, 5, 7, 10, 15, 20, 25 and 30) was evaluated against the image size under testing (64, 96, 128, 180 and 224), which gave us insight of the relationship between the size of convolutional filters and image size. To generalise the validation, four remote sensing datasets, AID, RSD, UCMerced and RSCCN, which have different land covers and are publicly available, were used in the experiments. These datasets have a wide diversity of input data, such as number of classes, amount of labelled data, and texture patterns. A specifically designed interactive deep learning GPU training platform for image classification (Nvidia Digit) was employed in the experiments. It has shown efficiency in both training and testing. The results have shown that increasing the number of epochs leads to a higher accuracy rate, as expected. However, the convergence state is highly related to datasets. For the batch size evaluation, it has shown that a larger batch size slightly decreases the classification accuracy compared to a small batch size. For example, selecting the value 32 as the batch size on the RSCCN dataset achieves the accuracy rate of 90.34 % at the 11th epoch while decreasing the epoch value to one makes the accuracy rate drop to 74%. On the other extreme, setting an increased value of batch size to 200 decreases the accuracy rate at the 11th epoch is 86.5%, and 63% when using one epoch only. On the other hand, selecting the kernel size is loosely related to data set. From a practical point of view, the filter size 20 produces 70.4286%. The last performed image size experiment shows a dependency in the accuracy improvement. However, an expensive performance gain had been noticed. The represented conclusion opens the opportunities toward a better classification performance in various applications such as planetary remote sensing.

Keywords: CNNs, hyperparamters, remote sensing, land cover, land use

Procedia PDF Downloads 147
120 Cartographic Depiction and Visualization of Wetlands Changes in the North-Western States of India

Authors: Bansal Ashwani

Abstract:

Cartographic depiction and visualization of wetland changes is an important tool to map spatial-temporal information about the wetland dynamics effectively and to comprehend the response of these water bodies in maintaining the groundwater and surrounding ecosystem. This is true for the states of North Western India, i.e., J&K, Himachal, Punjab, and Haryana that are bestowed upon with several natural wetlands in the flood plains or on the courses of its rivers. Thus, the present study documents, analyses and reconstructs the lost wetlands, which existed in the flood plains of the major river basins of these states, i.e., Chenab, Jhelum, Satluj, Beas, Ravi, and Ghagar, in the beginning of the 20th century. To achieve the objective, the study has used multi-temporal datasets since the 1960s using high to medium resolution satellite datasets, e.g., Corona (1960s/70s), Landsat (1990s-2017) and Sentinel (2017). The Sentinel (2017) satellite image has been used for making the wetland inventory owing to its comparatively higher spatial resolution with multi-spectral bands. In addition, historical records, repeated photographs, historical maps, field observations including geomorphological evidence were also used. The water index techniques, i.e., band rationing, normalized difference water index (NDWI), modified NDWI (MNDWI) have been compared and used to map the wetlands. The wetland types found in the north-western states have been categorized under 19 classes suggested by Space Application Centre, India. These enable the researcher to provide with the wetlands inventory and a series of cartographic representation that includes overlaying multiple temporal wetlands extent vectors. A preliminary result shows the general state of wetland shrinkage since the 1960s with varying area shrinkage rate from one wetland to another. In addition, it is observed that majority of wetlands have not been documented so far and even do not have names. Moreover, the purpose is to emphasize their elimination in addition to establishing a baseline dataset that can be a tool for wetland planning and management. Finally, the applicability of cartographic depiction and visualization, historical map sources, repeated photographs and remote sensing data for reconstruction of long term wetlands fluctuations, especially in the northern part of India, will be addressed.

Keywords: cartographic depiction and visualization, wetland changes, NDWI/MDWI, geomorphological evidence and remote sensing

Procedia PDF Downloads 232
119 Escalation of Commitment and Turnover in Top Management Teams

Authors: Dmitriy V. Chulkov

Abstract:

Escalation of commitment is defined as continuation of a project after receiving negative information about it. While literature in management and psychology identified various factors contributing to escalation behavior, this phenomenon has received little analysis in economics, potentially due to the apparent irrationality of escalation. In this study, we present an economic model of escalation with asymmetric information in a principal-agent setup where the agents are responsible for a project selection decision and discover the outcome of the project before the principal. Our theoretical model complements the existing literature on several accounts. First, we link the incentive to escalate commitment to a project with the turnover decision by the manager. When a manager learns the outcome of the project and stops it that reveals that a mistake was made. There is an incentive to continue failing projects and avoid admitting the mistake. This incentive is enhanced when the agent may voluntarily resign from the firm before the outcome of the failing project is revealed, and thus not bear the full extent of reputation damage due to project failure. As long as some successful managers leave the firm for extraneous reasons, outside firms find it difficult to link failing projects with certainty to managers that left a firm. Second, we demonstrate that non-CEO managers have reputation concerns separate from those of the CEO, and thus may escalate commitment to projects they oversee, when such escalation can attenuate damage to reputation from impending project failure. Such incentive for escalation will be present for non-CEO managers if the CEO delegates responsibility for a project to a non-CEO executive. If reputation matters for promotion to the CEO, the incentive for a rising executive to escalate in order to protect reputation is distinct from that of a CEO. Third, our theoretical model is supported by empirical analysis of changes in the firm’s operations measured by the presence of discontinued operations at the time of turnover among the top four members of the top management team. Discontinued operations are indicative of termination of failing projects at a firm. The empirical results demonstrate that in a large dataset of over three thousand publicly traded U.S. firms for a period from 1993 to 2014 turnover by top executives significantly increases the likelihood that the firm discontinues operations. Furthermore, the type of turnover matters as this effect is strongest when at least one non-CEO member of the top management team leaves the firm and when the CEO departure is due to a voluntary resignation and not to a retirement or illness. Empirical results are consistent with the predictions of the theoretical model and suggest that escalation of commitment is primarily observed in decisions by non-CEO members of the top management team.

Keywords: discontinued operations, escalation of commitment, executive turnover, top management teams

Procedia PDF Downloads 346
118 Application of Improved Semantic Communication Technology in Remote Sensing Data Transmission

Authors: Tingwei Shu, Dong Zhou, Chengjun Guo

Abstract:

Semantic communication is an emerging form of communication that realize intelligent communication by extracting semantic information of data at the source and transmitting it, and recovering the data at the receiving end. It can effectively solve the problem of data transmission under the situation of large data volume, low SNR and restricted bandwidth. With the development of Deep Learning, semantic communication further matures and is gradually applied in the fields of the Internet of Things, Uumanned Air Vehicle cluster communication, remote sensing scenarios, etc. We propose an improved semantic communication system for the situation where the data volume is huge and the spectrum resources are limited during the transmission of remote sensing images. At the transmitting, we need to extract the semantic information of remote sensing images, but there are some problems. The traditional semantic communication system based on Convolutional Neural Network cannot take into account the global semantic information and local semantic information of the image, which results in less-than-ideal image recovery at the receiving end. Therefore, we adopt the improved vision-Transformer-based structure as the semantic encoder instead of the mainstream one using CNN to extract the image semantic features. In this paper, we first perform pre-processing operations on remote sensing images to improve the resolution of the images in order to obtain images with more semantic information. We use wavelet transform to decompose the image into high-frequency and low-frequency components, perform bilinear interpolation on the high-frequency components and bicubic interpolation on the low-frequency components, and finally perform wavelet inverse transform to obtain the preprocessed image. We adopt the improved Vision-Transformer structure as the semantic coder to extract and transmit the semantic information of remote sensing images. The Vision-Transformer structure can better train the huge data volume and extract better image semantic features, and adopt the multi-layer self-attention mechanism to better capture the correlation between semantic features and reduce redundant features. Secondly, to improve the coding efficiency, we reduce the quadratic complexity of the self-attentive mechanism itself to linear so as to improve the image data processing speed of the model. We conducted experimental simulations on the RSOD dataset and compared the designed system with a semantic communication system based on CNN and image coding methods such as BGP and JPEG to verify that the method can effectively alleviate the problem of excessive data volume and improve the performance of image data communication.

Keywords: semantic communication, transformer, wavelet transform, data processing

Procedia PDF Downloads 55
117 Applying GIS Geographic Weighted Regression Analysis to Assess Local Factors Impeding Smallholder Farmers from Participating in Agribusiness Markets: A Case Study of Vihiga County, Western Kenya

Authors: Mwehe Mathenge, Ben G. J. S. Sonneveld, Jacqueline E. W. Broerse

Abstract:

Smallholder farmers are important drivers of agriculture productivity, food security, and poverty reduction in Sub-Saharan Africa. However, they are faced with myriad challenges in their efforts at participating in agribusiness markets. How the geographic explicit factors existing at the local level interact to impede smallholder farmers' decision to participates (or not) in agribusiness markets is not well understood. Deconstructing the spatial complexity of the local environment could provide a deeper insight into how geographically explicit determinants promote or impede resource-poor smallholder farmers from participating in agribusiness. This paper’s objective was to identify, map, and analyze local spatial autocorrelation in factors that impede poor smallholders from participating in agribusiness markets. Data were collected using geocoded researcher-administered survey questionnaires from 392 households in Western Kenya. Three spatial statistics methods in geographic information system (GIS) were used to analyze data -Global Moran’s I, Cluster and Outliers Analysis (Anselin Local Moran’s I), and geographically weighted regression. The results of Global Moran’s I reveal the presence of spatial patterns in the dataset that was not caused by spatial randomness of data. Subsequently, Anselin Local Moran’s I result identified spatially and statistically significant local spatial clustering (hot spots and cold spots) in factors hindering smallholder participation. Finally, the geographically weighted regression results unearthed those specific geographic explicit factors impeding market participation in the study area. The results confirm that geographically explicit factors are indispensable in influencing the smallholder farming decisions, and policymakers should take cognizance of them. Additionally, this research demonstrated how geospatial explicit analysis conducted at the local level, using geographically disaggregated data, could help in identifying households and localities where the most impoverished and resource-poor smallholder households reside. In designing spatially targeted interventions, policymakers could benefit from geospatial analysis methods in understanding complex geographic factors and processes that interact to influence smallholder farmers' decision-making processes and choices.

Keywords: agribusiness markets, GIS, smallholder farmers, spatial statistics, disaggregated spatial data

Procedia PDF Downloads 116
116 Real Estate Trend Prediction with Artificial Intelligence Techniques

Authors: Sophia Liang Zhou

Abstract:

For investors, businesses, consumers, and governments, an accurate assessment of future housing prices is crucial to critical decisions in resource allocation, policy formation, and investment strategies. Previous studies are contradictory about macroeconomic determinants of housing price and largely focused on one or two areas using point prediction. This study aims to develop data-driven models to accurately predict future housing market trends in different markets. This work studied five different metropolitan areas representing different market trends and compared three-time lagging situations: no lag, 6-month lag, and 12-month lag. Linear regression (LR), random forest (RF), and artificial neural network (ANN) were employed to model the real estate price using datasets with S&P/Case-Shiller home price index and 12 demographic and macroeconomic features, such as gross domestic product (GDP), resident population, personal income, etc. in five metropolitan areas: Boston, Dallas, New York, Chicago, and San Francisco. The data from March 2005 to December 2018 were collected from the Federal Reserve Bank, FBI, and Freddie Mac. In the original data, some factors are monthly, some quarterly, and some yearly. Thus, two methods to compensate missing values, backfill or interpolation, were compared. The models were evaluated by accuracy, mean absolute error, and root mean square error. The LR and ANN models outperformed the RF model due to RF’s inherent limitations. Both ANN and LR methods generated predictive models with high accuracy ( > 95%). It was found that personal income, GDP, population, and measures of debt consistently appeared as the most important factors. It also showed that technique to compensate missing values in the dataset and implementation of time lag can have a significant influence on the model performance and require further investigation. The best performing models varied for each area, but the backfilled 12-month lag LR models and the interpolated no lag ANN models showed the best stable performance overall, with accuracies > 95% for each city. This study reveals the influence of input variables in different markets. It also provides evidence to support future studies to identify the optimal time lag and data imputing methods for establishing accurate predictive models.

Keywords: linear regression, random forest, artificial neural network, real estate price prediction

Procedia PDF Downloads 80
115 Frequent Pattern Mining for Digenic Human Traits

Authors: Atsuko Okazaki, Jurg Ott

Abstract:

Some genetic diseases (‘digenic traits’) are due to the interaction between two DNA variants. For example, certain forms of Retinitis Pigmentosa (a genetic form of blindness) occur in the presence of two mutant variants, one in the ROM1 gene and one in the RDS gene, while the occurrence of only one of these mutant variants leads to a completely normal phenotype. Detecting such digenic traits by genetic methods is difficult. A common approach to finding disease-causing variants is to compare 100,000s of variants between individuals with a trait (cases) and those without the trait (controls). Such genome-wide association studies (GWASs) have been very successful but hinge on genetic effects of single variants, that is, there should be a difference in allele or genotype frequencies between cases and controls at a disease-causing variant. Frequent pattern mining (FPM) methods offer an avenue at detecting digenic traits even in the absence of single-variant effects. The idea is to enumerate pairs of genotypes (genotype patterns) with each of the two genotypes originating from different variants that may be located at very different genomic positions. What is needed is for genotype patterns to be significantly more common in cases than in controls. Let Y = 2 refer to cases and Y = 1 to controls, with X denoting a specific genotype pattern. We are seeking association rules, ‘X → Y’, with high confidence, P(Y = 2|X), significantly higher than the proportion of cases, P(Y = 2) in the study. Clearly, generally available FPM methods are very suitable for detecting disease-associated genotype patterns. We use fpgrowth as the basic FPM algorithm and built a framework around it to enumerate high-frequency digenic genotype patterns and to evaluate their statistical significance by permutation analysis. Application to a published dataset on opioid dependence furnished results that could not be found with classical GWAS methodology. There were 143 cases and 153 healthy controls, each genotyped for 82 variants in eight genes of the opioid system. The aim was to find out whether any of these variants were disease-associated. The single-variant analysis did not lead to significant results. Application of our FPM implementation resulted in one significant (p < 0.01) genotype pattern with both genotypes in the pattern being heterozygous and originating from two variants on different chromosomes. This pattern occurred in 14 cases and none of the controls. Thus, the pattern seems quite specific to this form of substance abuse and is also rather predictive of disease. An algorithm called Multifactor Dimension Reduction (MDR) was developed some 20 years ago and has been in use in human genetics ever since. This and our algorithms share some similar properties, but they are also very different in other respects. The main difference seems to be that our algorithm focuses on patterns of genotypes while the main object of inference in MDR is the 3 × 3 table of genotypes at two variants.

Keywords: digenic traits, DNA variants, epistasis, statistical genetics

Procedia PDF Downloads 102
114 Investigating the Impacts on Cyclist Casualty Severity at Roundabouts: A UK Case Study

Authors: Nurten Akgun, Dilum Dissanayake, Neil Thorpe, Margaret C. Bell

Abstract:

Cycling has gained a great attention with comparable speeds, low cost, health benefits and reducing the impact on the environment. The main challenge associated with cycling is the provision of safety for the people choosing to cycle as their main means of transport. From the road safety point of view, cyclists are considered as vulnerable road users because they are at higher risk of serious casualty in the urban network but more specifically at roundabouts. This research addresses the development of an enhanced mathematical model by including a broad spectrum of casualty related variables. These variables were geometric design measures (approach number of lanes and entry path radius), speed limit, meteorological condition variables (light, weather, road surface) and socio-demographic characteristics (age and gender), as well as contributory factors. Contributory factors included driver’s behavior related variables such as failed to look properly, sudden braking, a vehicle passing too close to a cyclist, junction overshot, failed to judge other person’s path, restart moving off at the junction, poor turn or manoeuvre and disobeyed give-way. Tyne and Wear in the UK were selected as a case study area. The cyclist casualty data was obtained from UK STATS19 National dataset. The reference categories for the regression model were set to slight and serious cyclist casualties. Therefore, binary logistic regression was applied. Binary logistic regression analysis showed that approach number of lanes was statistically significant at the 95% level of confidence. A higher number of approach lanes increased the probability of severity of cyclist casualty occurrence. In addition, sudden braking statistically significantly increased the cyclist casualty severity at the 95% level of confidence. The result concluded that cyclist casualty severity was highly related to approach a number of lanes and sudden braking. Further research should be carried out an in-depth analysis to explore connectivity of sudden braking and approach number of lanes in order to investigate the driver’s behavior at approach locations. The output of this research will inform investment in measure to improve the safety of cyclists at roundabouts.

Keywords: binary logistic regression, casualty severity, cyclist safety, roundabout

Procedia PDF Downloads 160
113 The Trade Flow of Small Association Agreements When Rules of Origin Are Relaxed

Authors: Esmat Kamel

Abstract:

This paper aims to shed light on the extent to which the Agadir Association agreement has fostered inter regional trade between the E.U_26 and the Agadir_4 countries; once that we control for the evolution of Agadir agreement’s exports to the rest of the world. The next valid question will be regarding any remarkable variation in the spatial/sectoral structure of exports, and to what extent has it been induced by the Agadir agreement itself and precisely after the adoption of rules of origin and the PANEURO diagonal cumulative scheme? The paper’s empirical dataset covering a timeframe from [2000 -2009] was designed to account for sector specific export and intermediate flows and the bilateral structured gravity model was custom tailored to capture sector and regime specific rules of origin and the Poisson Pseudo Maximum Likelihood Estimator was used to calculate the gravity equation. The methodological approach of this work is considered to be a threefold one which starts first by conducting a ‘Hierarchal Cluster Analysis’ to classify final export flows showing a certain degree of linkage between each other. The analysis resulted in three main sectoral clusters of exports between Agadir_4 and E.U_26: cluster 1 for Petrochemical related sectors, cluster 2 durable goods and finally cluster 3 for heavy duty machinery and spare parts sectors. Second step continues by taking export flows resulting from the 3 clusters to be subject to treatment with diagonal Rules of origin through ‘The Double Differences Approach’, versus an equally comparable untreated control group. Third step is to verify results through a robustness check applied by ‘Propensity Score Matching’ to validate that the same sectoral final export and intermediate flows increased when rules of origin were relaxed. Through all the previous analysis, a remarkable and partial significance of the interaction term combining both treatment effects and time for the coefficients of 13 out of the 17 covered sectors turned out to be partially significant and it further asserted that treatment with diagonal rules of origin contributed in increasing Agadir’s_4 final and intermediate exports to the E.U._26 on average by 335% and in changing Agadir_4 exports structure and composition to the E.U._26 countries.

Keywords: agadir association agreement, structured gravity model, hierarchal cluster analysis, double differences estimation, propensity score matching, diagonal and relaxed rules of origin

Procedia PDF Downloads 299
112 Development of a Bi-National Thyroid Cancer Clinical Quality Registry

Authors: Liane J. Ioannou, Jonathan Serpell, Joanne Dean, Cino Bendinelli, Jenny Gough, Dean Lisewski, Julie Miller, Win Meyer-Rochow, Stan Sidhu, Duncan Topliss, David Walters, John Zalcberg, Susannah Ahern

Abstract:

Background: The occurrence of thyroid cancer is increasing throughout the developed world, including Australia and New Zealand, and since the 1990s has become the fastest increasing malignancy. Following the success of a number of institutional databases that monitor outcomes after thyroid surgery, the Australian and New Zealand Endocrine Surgeons (ANZES) agreed to auspice the development of a bi-national thyroid cancer registry. Objectives: To establish a bi-national population-based clinical quality registry with the aim of monitoring and improving the quality of care provided to patients diagnosed with thyroid cancer in Australia and New Zealand. Patients and Methods: The Australian and New Zealand Thyroid Cancer Registry (ANZTCR) captures clinical data for all patients, over the age of 18 years, diagnosed with thyroid cancer, confirmed by histopathology report, that have been diagnosed, assessed or treated at a contributing hospital. Data is collected by endocrine surgeons using a web-based interface, REDCap, primarily via direct data entry. Results: A multi-disciplinary Steering Committee was formed, and with operational support from Monash University the ANZTCR was established in early 2017. The pilot phase of the registry is currently operating in Victoria, New South Wales, Queensland, Western Australia and South Australia, with over 30 sites expected to come on board across Australia and New Zealand in 2018. A modified-Delphi process was undertaken to determine the key quality indicators to be reported by the registry, and a minimum dataset was developed comprising information regarding thyroid cancer diagnosis, pathology, surgery, and 30-day follow up. Conclusion: There are very few established thyroid cancer registries internationally, yet clinical quality registries have shown valuable outcomes and patient benefits in other cancers. The establishment of the ANZTCR provides the opportunity for Australia and New Zealand to further understand the current practice in the treatment of thyroid cancer and reasons for variation in outcomes. The engagement of endocrine surgeons in supporting this initiative is crucial. While the pilot registry has a focus on early clinical outcomes, it is anticipated that future collection of longer-term outcome data particularly for patients with the poor prognostic disease will add significant further value to the registry.

Keywords: thyroid cancer, clinical registry, population health, quality improvement

Procedia PDF Downloads 171
111 Knowledge Graph Development to Connect Earth Metadata and Standard English Queries

Authors: Gabriel Montague, Max Vilgalys, Catherine H. Crawford, Jorge Ortiz, Dava Newman

Abstract:

There has never been so much publicly accessible atmospheric and environmental data. The possibilities of these data are exciting, but the sheer volume of available datasets represents a new challenge for researchers. The task of identifying and working with a new dataset has become more difficult with the amount and variety of available data. Datasets are often documented in ways that differ substantially from the common English used to describe the same topics. This presents a barrier not only for new scientists, but for researchers looking to find comparisons across multiple datasets or specialists from other disciplines hoping to collaborate. This paper proposes a method for addressing this obstacle: creating a knowledge graph to bridge the gap between everyday English language and the technical language surrounding these datasets. Knowledge graph generation is already a well-established field, although there are some unique challenges posed by working with Earth data. One is the sheer size of the databases – it would be infeasible to replicate or analyze all the data stored by an organization like The National Aeronautics and Space Administration (NASA) or the European Space Agency. Instead, this approach identifies topics from metadata available for datasets in NASA’s Earthdata database, which can then be used to directly request and access the raw data from NASA. By starting with a single metadata standard, this paper establishes an approach that can be generalized to different databases, but leaves the challenge of metadata harmonization for future work. Topics generated from the metadata are then linked to topics from a collection of English queries through a variety of standard and custom natural language processing (NLP) methods. The results from this method are then compared to a baseline of elastic search applied to the metadata. This comparison shows the benefits of the proposed knowledge graph system over existing methods, particularly in interpreting natural language queries and interpreting topics in metadata. For the research community, this work introduces an application of NLP to the ecological and environmental sciences, expanding the possibilities of how machine learning can be applied in this discipline. But perhaps more importantly, it establishes the foundation for a platform that can enable common English to access knowledge that previously required considerable effort and experience. By making this public data accessible to the full public, this work has the potential to transform environmental understanding, engagement, and action.

Keywords: earth metadata, knowledge graphs, natural language processing, question-answer systems

Procedia PDF Downloads 125
110 Multicollinearity and MRA in Sustainability: Application of the Raise Regression

Authors: Claudia García-García, Catalina B. García-García, Román Salmerón-Gómez

Abstract:

Much economic-environmental research includes the analysis of possible interactions by using Moderated Regression Analysis (MRA), which is a specific application of multiple linear regression analysis. This methodology allows analyzing how the effect of one of the independent variables is moderated by a second independent variable by adding a cross-product term between them as an additional explanatory variable. Due to the very specification of the methodology, the moderated factor is often highly correlated with the constitutive terms. Thus, great multicollinearity problems arise. The appearance of strong multicollinearity in a model has important consequences. Inflated variances of the estimators may appear, there is a tendency to consider non-significant regressors that they probably are together with a very high coefficient of determination, incorrect signs of our coefficients may appear and also the high sensibility of the results to small changes in the dataset. Finally, the high relationship among explanatory variables implies difficulties in fixing the individual effects of each one on the model under study. These consequences shifted to the moderated analysis may imply that it is not worth including an interaction term that may be distorting the model. Thus, it is important to manage the problem with some methodology that allows for obtaining reliable results. After a review of those works that applied the MRA among the ten top journals of the field, it is clear that multicollinearity is mostly disregarded. Less than 15% of the reviewed works take into account potential multicollinearity problems. To overcome the issue, this work studies the possible application of recent methodologies to MRA. Particularly, the raised regression is analyzed. This methodology mitigates collinearity from a geometrical point of view: the collinearity problem arises because the variables under study are very close geometrically, so by separating both variables, the problem can be mitigated. Raise regression maintains the available information and modifies the problematic variables instead of deleting variables, for example. Furthermore, the global characteristics of the initial model are also maintained (sum of squared residuals, estimated variance, coefficient of determination, global significance test and prediction). The proposal is implemented to data from countries of the European Union during the last year available regarding greenhouse gas emissions, per capita GDP and a dummy variable that represents the topography of the country. The use of a dummy variable as the moderator is a special variant of MRA, sometimes called “subgroup regression analysis.” The main conclusion of this work is that applying new techniques to the field can improve in a substantial way the results of the analysis. Particularly, the use of raised regression mitigates great multicollinearity problems, so the researcher is able to rely on the interaction term when interpreting the results of a particular study.

Keywords: multicollinearity, MRA, interaction, raise

Procedia PDF Downloads 75
109 R Statistical Software Applied in Reliability Analysis: Case Study of Diesel Generator Fans

Authors: Jelena Vucicevic

Abstract:

Reliability analysis represents a very important task in different areas of work. In any industry, this is crucial for maintenance, efficiency, safety and monetary costs. There are ways to calculate reliability, unreliability, failure density and failure rate. This paper will try to introduce another way of calculating reliability by using R statistical software. R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. The R programming environment is a widely used open source system for statistical analysis and statistical programming. It includes thousands of functions for the implementation of both standard and new statistical methods. R does not limit user only to operation related only to these functions. This program has many benefits over other similar programs: it is free and, as an open source, constantly updated; it has built-in help system; the R language is easy to extend with user-written functions. The significance of the work is calculation of time to failure or reliability in a new way, using statistic. Another advantage of this calculation is that there is no need for technical details and it can be implemented in any part for which we need to know time to fail in order to have appropriate maintenance, but also to maximize usage and minimize costs. In this case, calculations have been made on diesel generator fans but the same principle can be applied to any other part. The data for this paper came from a field engineering study of the time to failure of diesel generator fans. The ultimate goal was to decide whether or not to replace the working fans with a higher quality fan to prevent future failures. Seventy generators were studied. For each one, the number of hours of running time from its first being put into service until fan failure or until the end of the study (whichever came first) was recorded. Dataset consists of two variables: hours and status. Hours show the time of each fan working and status shows the event: 1- failed, 0- censored data. Censored data represent cases when we cannot track the specific case, so it could fail or success. Gaining the result by using R was easy and quick. The program will take into consideration censored data and include this into the results. This is not so easy in hand calculation. For the purpose of the paper results from R program have been compared to hand calculations in two different cases: censored data taken as a failure and censored data taken as a success. In all three cases, results are significantly different. If user decides to use the R for further calculations, it will give more precise results with work on censored data than the hand calculation.

Keywords: censored data, R statistical software, reliability analysis, time to failure

Procedia PDF Downloads 379
108 Secondary Prisonization and Mental Health: A Comparative Study with Elderly Parents of Prisoners Incarcerated in Remote Jails

Authors: Luixa Reizabal, Inaki Garcia, Eneko Sansinenea, Ainize Sarrionandia, Karmele Lopez De Ipina, Elsa Fernandez

Abstract:

Although the effects of incarceration in prisons close to prisoners’ and their families’ residences have been studied, little is known about the effects of remote incarceration. The present study shows the impact of secondary prisonization on mental health of elderly parents of Basque prisoners who are incarcerated in prisons located far away from prisoners’ and their families’ residences. Secondary prisonization refers to the effects that imprisonment of a family member has on relatives. In the study, psychological effects are analyzed by means of comparative methodology. Specifically, levels of psychopathology (depression, anxiety, and stress) and positive mental health (psychological, social, and emotional well-being) are studied in a sample of parents over 65 years old of prisoners incarcerated in prisons located a long distance away (concretely, some of them in a distance of less than 400 km, while others farther than 400 km) from the Basque Country. The dataset consists of data collected through a questionnaire and from a spontaneous speech recording. The statistical and automatic analyses show that levels of psychopathology and positive mental health of elderly parents of prisoners incarcerated in remote jails are affected by the incarceration of their sons or daughters. Concretely, these parents show higher levels of depression, anxiety, and stress and lower levels of emotional (but not psychological or social) wellbeing than parents with no imprisoned daughters or sons. These findings suggest that parents with imprisoned sons or daughters suffer the impact of secondary prisonization on their mental health. When comparing parents with sons or daughters incarcerated within 400 kilometers from home and parents whose sons or daughters are incarcerated farther than 400 kilometers from home, the latter present higher levels of psychopathology, but also higher levels of positive mental health (although the difference between the two groups is not statistically significant). These findings might be explained by resilience. In fact, in traumatic situations, people can develop a force to cope with the situation, and even present a posttraumatic growth. Bearing in mind all these findings, it could be concluded that secondary prisonization implies for elderly parents with sons or daughters incarcerated in remote jails suffering and, in consequence, that changes in the penitentiary policy applied to Basque prisoners are required in order to finish this suffering.

Keywords: automatic spontaneous speech analysis, elderly parents, machine learning, positive mental health, psychopathology, remote incarceration, secondary prisonization

Procedia PDF Downloads 256
107 Predicting Resistance of Commonly Used Antimicrobials in Urinary Tract Infections: A Decision Tree Analysis

Authors: Meera Tandan, Mohan Timilsina, Martin Cormican, Akke Vellinga

Abstract:

Background: In general practice, many infections are treated empirically without microbiological confirmation. Understanding susceptibility of antimicrobials during empirical prescribing can be helpful to reduce inappropriate prescribing. This study aims to apply a prediction model using a decision tree approach to predict the antimicrobial resistance (AMR) of urinary tract infections (UTI) based on non-clinical features of patients over 65 years. Decision tree models are a novel idea to predict the outcome of AMR at an initial stage. Method: Data was extracted from the database of the microbiological laboratory of the University Hospitals Galway on all antimicrobial susceptibility testing (AST) of urine specimens from patients over the age of 65 from January 2011 to December 2014. The primary endpoint was resistance to common antimicrobials (Nitrofurantoin, trimethoprim, ciprofloxacin, co-amoxiclav and amoxicillin) used to treat UTI. A classification and regression tree (CART) model was generated with the outcome ‘resistant infection’. The importance of each predictor (the number of previous samples, age, gender, location (nursing home, hospital, community) and causative agent) on antimicrobial resistance was estimated. Sensitivity, specificity, negative predictive (NPV) and positive predictive (PPV) values were used to evaluate the performance of the model. Seventy-five percent (75%) of the data were used as a training set and validation of the model was performed with the remaining 25% of the dataset. Results: A total of 9805 UTI patients over 65 years had their urine sample submitted for AST at least once over the four years. E.coli, Klebsiella, Proteus species were the most commonly identified pathogens among the UTI patients without catheter whereas Sertia, Staphylococcus aureus; Enterobacter was common with the catheter. The validated CART model shows slight differences in the sensitivity, specificity, PPV and NPV in between the models with and without the causative organisms. The sensitivity, specificity, PPV and NPV for the model with non-clinical predictors was between 74% and 88% depending on the antimicrobial. Conclusion: The CART models developed using non-clinical predictors have good performance when predicting antimicrobial resistance. These models predict which antimicrobial may be the most appropriate based on non-clinical factors. Other CART models, prospective data collection and validation and an increasing number of non-clinical factors will improve model performance. The presented model provides an alternative approach to decision making on antimicrobial prescribing for UTIs in older patients.

Keywords: antimicrobial resistance, urinary tract infection, prediction, decision tree

Procedia PDF Downloads 231
106 Evaluation of Soil Erosion Risk and Prioritization for Implementation of Management Strategies in Morocco

Authors: Lahcen Daoudi, Fatima Zahra Omdi, Abldelali Gourfi

Abstract:

In Morocco, as in most Mediterranean countries, water scarcity is a common situation because of low and unevenly distributed rainfall. The expansions of irrigated lands, as well as the growth of urban and industrial areas and tourist resorts, contribute to an increase of water demand. Therefore in the 1960s Morocco embarked on an ambitious program to increase the number of dams to boost water retention capacity. However, the decrease in the capacity of these reservoirs caused by sedimentation is a major problem; it is estimated at 75 million m3/year. Dams and reservoirs became unusable for their intended purposes due to sedimentation in large rivers that result from soil erosion. Soil erosion presents an important driving force in the process affecting the landscape. It has become one of the most serious environmental problems that raised much interest throughout the world. Monitoring soil erosion risk is an important part of soil conservation practices. The estimation of soil loss risk is the first step for a successful control of water erosion. The aim of this study is to estimate the soil loss risk and its spatial distribution in the different fields of Morocco and to prioritize areas for soil conservation interventions. The approach followed is the Revised Universal Soil Loss Equation (RUSLE) using remote sensing and GIS, which is the most popular empirically based model used globally for erosion prediction and control. This model has been tested in many agricultural watersheds in the world, particularly for large-scale basins due to the simplicity of the model formulation and easy availability of the dataset. The spatial distribution of the annual soil loss was elaborated by the combination of several factors: rainfall erosivity, soil erodability, topography, and land cover. The average annual soil loss estimated in several basins watershed of Morocco varies from 0 to 50t/ha/year. Watersheds characterized by high-erosion-vulnerability are located in the North (Rif Mountains) and more particularly in the Central part of Morocco (High Atlas Mountains). This variation of vulnerability is highly correlated to slope variation which indicates that the topography factor is the main agent of soil erosion within these basin catchments. These results could be helpful for the planning of natural resources management and for implementing sustainable long-term management strategies which are necessary for soil conservation and for increasing over the projected economic life of the dam implemented.

Keywords: soil loss, RUSLE, GIS-remote sensing, watershed, Morocco

Procedia PDF Downloads 435
105 Shark Detection and Classification with Deep Learning

Authors: Jeremy Jenrette, Z. Y. C. Liu, Pranav Chimote, Edward Fox, Trevor Hastie, Francesco Ferretti

Abstract:

Suitable shark conservation depends on well-informed population assessments. Direct methods such as scientific surveys and fisheries monitoring are adequate for defining population statuses, but species-specific indices of abundance and distribution coming from these sources are rare for most shark species. We can rapidly fill these information gaps by boosting media-based remote monitoring efforts with machine learning and automation. We created a database of shark images by sourcing 24,546 images covering 219 species of sharks from the web application spark pulse and the social network Instagram. We used object detection to extract shark features and inflate this database to 53,345 images. We packaged object-detection and image classification models into a Shark Detector bundle. We developed the Shark Detector to recognize and classify sharks from videos and images using transfer learning and convolutional neural networks (CNNs). We applied these models to common data-generation approaches of sharks: boosting training datasets, processing baited remote camera footage and online videos, and data-mining Instagram. We examined the accuracy of each model and tested genus and species prediction correctness as a result of training data quantity. The Shark Detector located sharks in baited remote footage and YouTube videos with an average accuracy of 89\%, and classified located subjects to the species level with 69\% accuracy (n =\ eight species). The Shark Detector sorted heterogeneous datasets of images sourced from Instagram with 91\% accuracy and classified species with 70\% accuracy (n =\ 17 species). Data-mining Instagram can inflate training datasets and increase the Shark Detector’s accuracy as well as facilitate archiving of historical and novel shark observations. Base accuracy of genus prediction was 68\% across 25 genera. The average base accuracy of species prediction within each genus class was 85\%. The Shark Detector can classify 45 species. All data-generation methods were processed without manual interaction. As media-based remote monitoring strives to dominate methods for observing sharks in nature, we developed an open-source Shark Detector to facilitate common identification applications. Prediction accuracy of the software pipeline increases as more images are added to the training dataset. We provide public access to the software on our GitHub page.

Keywords: classification, data mining, Instagram, remote monitoring, sharks

Procedia PDF Downloads 93
104 Deep Learning for Image Correction in Sparse-View Computed Tomography

Authors: Shubham Gogri, Lucia Florescu

Abstract:

Medical diagnosis and radiotherapy treatment planning using Computed Tomography (CT) rely on the quantitative accuracy and quality of the CT images. At the same time, requirements for CT imaging include reducing the radiation dose exposure to patients and minimizing scanning time. A solution to this is the sparse-view CT technique, based on a reduced number of projection views. This, however, introduces a new problem— the incomplete projection data results in lower quality of the reconstructed images. To tackle this issue, deep learning methods have been applied to enhance the quality of the sparse-view CT images. A first approach involved employing Mir-Net, a dedicated deep neural network designed for image enhancement. This showed promise, utilizing an intricate architecture comprising encoder and decoder networks, along with the incorporation of the Charbonnier Loss. However, this approach was computationally demanding. Subsequently, a specialized Generative Adversarial Network (GAN) architecture, rooted in the Pix2Pix framework, was implemented. This GAN framework involves a U-Net-based Generator and a Discriminator based on Convolutional Neural Networks. To bolster the GAN's performance, both Charbonnier and Wasserstein loss functions were introduced, collectively focusing on capturing minute details while ensuring training stability. The integration of the perceptual loss, calculated based on feature vectors extracted from the VGG16 network pretrained on the ImageNet dataset, further enhanced the network's ability to synthesize relevant images. A series of comprehensive experiments with clinical CT data were conducted, exploring various GAN loss functions, including Wasserstein, Charbonnier, and perceptual loss. The outcomes demonstrated significant image quality improvements, confirmed through pertinent metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) between the corrected images and the ground truth. Furthermore, learning curves and qualitative comparisons added evidence of the enhanced image quality and the network's increased stability, while preserving pixel value intensity. The experiments underscored the potential of deep learning frameworks in enhancing the visual interpretation of CT scans, achieving outcomes with SSIM values close to one and PSNR values reaching up to 76.

Keywords: generative adversarial networks, sparse view computed tomography, CT image correction, Mir-Net

Procedia PDF Downloads 123
103 Characterization of Mycoplasma Pneumoniae Causing Exacerbation of Asthma: A Prototypical Finding from Sri Lanka

Authors: Lakmini Wijesooriya, Vicki Chalker, Jessica Day, Priyantha Perera, N. P. Sunil-Chandra

Abstract:

M. pneumoniae has been identified as an etiology for exacerbation of asthma (EQA), although viruses play a major role in EOA. M. pneumoniae infection is treated empirically with macrolides, and its antibiotic sensitivity is not detected routinely. Characterization of the organism by genotyping and determination of macrolide resistance is important epidemiologically as it guides the empiric antibiotic treatment. To date, there is no such characterization of M. pneumoniae performed in Sri Lanka. The present study describes the characterization of M. pneumoniae detected from a child with EOA following a screening of 100 children with EOA. Of the hundred children with EOA, M. pneumoniae was identified only in one child by Real-Time polymerase chain reaction (PCR) test for identifying the community-acquired respiratory distress syndrome (CARDS) toxin nucleotide sequences. The M. pneumoniae identified from this patient underwent detection of macrolide resistance via conventional PCR, amplifying and sequencing the region of the 23S rDNA gene that contains single nucleotide polymorphisms that confer resistance. Genotyping of the isolate was performed via nested Multilocus Sequence Typing (MLST) in which eight (8) housekeeping genes (ppa, pgm, gyrB, gmk, glyA, atpA, arcC, and adk) were amplified via nested PCR followed by gene sequencing and analysis. As per MLST analysis, the M. pneumoniae was identified as sequence type 14 (ST14), and no mutations that confer resistance were detected. Resistance to macrolides in M. pneumoniae is an increasing problem globally. Establishing surveillance systems is the key to informing local prescriptions. In the absence of local surveillance data, antibiotics are started empirically. If the relevant microbiological samples are not obtained before antibiotic therapy, as in most occasions in children, the course of antibiotic is completed without a microbiological diagnosis. This happens more frequently in therapy for M. pneumoniae which is treated with a macrolide in most patients. Hence, it is important to understand the macrolide sensitivity of M. pneumoniae in the setting. The M. pneumoniae detected in the present study was macrolide sensitive. Further studies are needed to examine a larger dataset in Sri Lanka to determine macrolide resistance levels to inform the use of macrolides in children with EOA. The MLST type varies in different geographical settings, and it also provides a clue to the existence of macrolide resistance. The present study enhances the database of the global distribution of different genotypes of M. pneumoniae as this is the first such characterization performed with the increased number of samples to determine macrolide resistance level in Sri Lanka. M. pneumoniae detected from a child with exacerbation of asthma in Sri Lanka was characterized as ST14 by MLST and no mutations that confer resistance were detected.

Keywords: mycoplasma pneumoniae, Sri Lanka, characterization, macrolide resistance

Procedia PDF Downloads 158
102 A Narrative Inquiry of Identity Formation of Chinese Fashion Designers

Authors: Lily Ye

Abstract:

The contemporary fashion industry has witnessed the global rise of Chinese fashion designers. China plays more and more important role in this sector globally. One of the key debates in contemporary time is the conception of Chinese fashion. A close look at previous discussions on Chinese fashion reveals that most of them are explored through the lens of cultural knowledge and assumptions, using the dichotomous models of East and West. The results of these studies generate an essentialist and orientalist notion of Chinoiserie and Chinese fashion, which sees individual designers from China as undifferential collective members marked by a unique and fixed set of cultural scripts. This study challenges this essentialist conceptualization and brings fresh insights to the discussion of Chinese fashion identity against the backdrop of globalisation. Different from a culturalist approach to researching Chinese fashion, this paper presents an alternative position to address the research agenda through the mobilisation of Giddens’ (1991) theory of reflexive identity formation, privileging individuals’ agency and reflexivity. This approach to the discussion of identity formation not only challenges the traditional view seeing identity as the distinctive and essential characteristics belonging to any given individual or shared by all members of a particular social category or group but highlights fashion designers’ strategic agency and their role as fashion activist. This study draws evidence from a textual analysis of published stories of a group of established Chinese designers such as Guo Pei, Huishan Zhang, Masha Ma, Uma Wang, and Ma Ke. In line with Giddens’ concept of 'reflexive project of the self', this study uses a narrative methodology. Narratives are verbal accounts or stories relating to experiences of Chinese fashion designers. This approach offers the fashion designers a chance to 'speak' for themselves and show the depths and complexities of their experiences. It also emphasises the nuances of identity formation in fashion designers, whose experiences cannot be captured in neat typologies. Thematic analysis (Braun and Clarke, 2006) is adopted to identify and investigate common themes across the whole dataset. At the centre of the analysis is individuals’ self-articulation of their perceptions, experiences and themselves in relation to culture, fashion and identity. The finding indicates that identity is constructed around anchors such as agency, cultural hybridity, reflexivity and sustainability rather than traditional collective categories such as culture and ethnicity. Thus, the old East-West dichotomy is broken down, and essentialised social categories are challenged by the multiplicity and fragmentation of self and cultural hybridity created within designers’ 'small narratives'.

Keywords: Chinoiserie, fashion identity, fashion activism, narrative inquiry

Procedia PDF Downloads 264
101 Development of a Novel Clinical Screening Tool, Using the BSGE Pain Questionnaire, Clinical Examination and Ultrasound to Predict the Severity of Endometriosis Prior to Laparoscopic Surgery

Authors: Marlin Mubarak

Abstract:

Background: Endometriosis is a complex disabling disease affecting young females in the reproductive period mainly. The aim of this project is to generate a diagnostic model to predict severity and stage of endometriosis prior to Laparoscopic surgery. This will help to improve the pre-operative diagnostic accuracy of stage 3 & 4 endometriosis and as a result, refer relevant women to a specialist centre for complex Laparoscopic surgery. The model is based on the British Society of Gynaecological Endoscopy (BSGE) pain questionnaire, clinical examination and ultrasound scan. Design: This is a prospective, observational, study, in which women completed the BSGE pain questionnaire, a BSGE requirement. Also, as part of the routine preoperative assessment patient had a routine ultrasound scan and when recto-vaginal and deep infiltrating endometriosis was suspected an MRI was performed. Setting: Luton & Dunstable University Hospital. Patients: Symptomatic women (n = 56) scheduled for laparoscopy due to pelvic pain. The age ranged between 17 – 52 years of age (mean 33.8 years, SD 8.7 years). Interventions: None outside the recognised and established endometriosis centre protocol set up by BSGE. Main Outcome Measure(s): Sensitivity and specificity of endometriosis diagnosis predicted by symptoms based on BSGE pain questionnaire, clinical examinations and imaging. Findings: The prevalence of diagnosed endometriosis was calculated to be 76.8% and the prevalence of advanced stage was 55.4%. Deep infiltrating endometriosis in various locations was diagnosed in 32/56 women (57.1%) and some had DIE involving several locations. Logistic regression analysis was performed on 36 clinical variables to create a simple clinical prediction model. After creating the scoring system using variables with P < 0.05, the model was applied to the whole dataset. The sensitivity was 83.87% and specificity 96%. The positive likelihood ratio was 20.97 and the negative likelihood ratio was 0.17, indicating that the model has a good predictive value and could be useful in predicting advanced stage endometriosis. Conclusions: This is a hypothesis-generating project with one operator, but future proposed research would provide validation of the model and establish its usefulness in the general setting. Predictive tools based on such model could help organise the appropriate investigation in clinical practice, reduce risks associated with surgery and improve outcome. It could be of value for future research to standardise the assessment of women presenting with pelvic pain. The model needs further testing in a general setting to assess if the initial results are reproducible.

Keywords: deep endometriosis, endometriosis, minimally invasive, MRI, ultrasound.

Procedia PDF Downloads 328
100 Predicting Football Player Performance: Integrating Data Visualization and Machine Learning

Authors: Saahith M. S., Sivakami R.

Abstract:

In the realm of football analytics, particularly focusing on predicting football player performance, the ability to forecast player success accurately is of paramount importance for teams, managers, and fans. This study introduces an elaborate examination of predicting football player performance through the integration of data visualization methods and machine learning algorithms. The research entails the compilation of an extensive dataset comprising player attributes, conducting data preprocessing, feature selection, model selection, and model training to construct predictive models. The analysis within this study will involve delving into feature significance using methodologies like Select Best and Recursive Feature Elimination (RFE) to pinpoint pertinent attributes for predicting player performance. Various machine learning algorithms, including Random Forest, Decision Tree, Linear Regression, Support Vector Regression (SVR), and Artificial Neural Networks (ANN), will be explored to develop predictive models. The evaluation of each model's performance utilizing metrics such as Mean Squared Error (MSE) and R-squared will be executed to gauge their efficacy in predicting player performance. Furthermore, this investigation will encompass a top player analysis to recognize the top-performing players based on the anticipated overall performance scores. Nationality analysis will entail scrutinizing the player distribution based on nationality and investigating potential correlations between nationality and player performance. Positional analysis will concentrate on examining the player distribution across various positions and assessing the average performance of players in each position. Age analysis will evaluate the influence of age on player performance and identify any discernible trends or patterns associated with player age groups. The primary objective is to predict a football player's overall performance accurately based on their individual attributes, leveraging data-driven insights to enrich the comprehension of player success on the field. By amalgamating data visualization and machine learning methodologies, the aim is to furnish valuable tools for teams, managers, and fans to effectively analyze and forecast player performance. This research contributes to the progression of sports analytics by showcasing the potential of machine learning in predicting football player performance and offering actionable insights for diverse stakeholders in the football industry.

Keywords: football analytics, player performance prediction, data visualization, machine learning algorithms, random forest, decision tree, linear regression, support vector regression, artificial neural networks, model evaluation, top player analysis, nationality analysis, positional analysis

Procedia PDF Downloads 18
99 Covid Medical Imaging Trial: Utilising Artificial Intelligence to Identify Changes on Chest X-Ray of COVID

Authors: Leonard Tiong, Sonit Singh, Kevin Ho Shon, Sarah Lewis

Abstract:

Investigation into the use of artificial intelligence in radiology continues to develop at a rapid rate. During the coronavirus pandemic, the combination of an exponential increase in chest x-rays and unpredictable staff shortages resulted in a huge strain on the department's workload. There is a World Health Organisation estimate that two-thirds of the global population does not have access to diagnostic radiology. Therefore, there could be demand for a program that could detect acute changes in imaging compatible with infection to assist with screening. We generated a conventional neural network and tested its efficacy in recognizing changes compatible with coronavirus infection. Following ethics approval, a deidentified set of 77 normal and 77 abnormal chest x-rays in patients with confirmed coronavirus infection were used to generate an algorithm that could train, validate and then test itself. DICOM and PNG image formats were selected due to their lossless file format. The model was trained with 100 images (50 positive, 50 negative), validated against 28 samples (14 positive, 14 negative), and tested against 26 samples (13 positive, 13 negative). The initial training of the model involved training a conventional neural network in what constituted a normal study and changes on the x-rays compatible with coronavirus infection. The weightings were then modified, and the model was executed again. The training samples were in batch sizes of 8 and underwent 25 epochs of training. The results trended towards an 85.71% true positive/true negative detection rate and an area under the curve trending towards 0.95, indicating approximately 95% accuracy in detecting changes on chest X-rays compatible with coronavirus infection. Study limitations include access to only a small dataset and no specificity in the diagnosis. Following a discussion with our programmer, there are areas where modifications in the weighting of the algorithm can be made in order to improve the detection rates. Given the high detection rate of the program, and the potential ease of implementation, this would be effective in assisting staff that is not trained in radiology in detecting otherwise subtle changes that might not be appreciated on imaging. Limitations include the lack of a differential diagnosis and application of the appropriate clinical history, although this may be less of a problem in day-to-day clinical practice. It is nonetheless our belief that implementing this program and widening its scope to detecting multiple pathologies such as lung masses will greatly assist both the radiology department and our colleagues in increasing workflow and detection rate.

Keywords: artificial intelligence, COVID, neural network, machine learning

Procedia PDF Downloads 67
98 Cross-Validation of the Data Obtained for ω-6 Linoleic and ω-3 α-Linolenic Acids Concentration of Hemp Oil Using Jackknife and Bootstrap Resampling

Authors: Vibha Devi, Shabina Khanam

Abstract:

Hemp (Cannabis sativa) possesses a rich content of ω-6 linoleic and ω-3 linolenic essential fatty acid in the ratio of 3:1, which is a rare and most desired ratio that enhances the quality of hemp oil. These components are beneficial for the development of cell and body growth, strengthen the immune system, possess anti-inflammatory action, lowering the risk of heart problem owing to its anti-clotting property and a remedy for arthritis and various disorders. The present study employs supercritical fluid extraction (SFE) approach on hemp seed at various conditions of parameters; temperature (40 - 80) °C, pressure (200 - 350) bar, flow rate (5 - 15) g/min, particle size (0.430 - 1.015) mm and amount of co-solvent (0 - 10) % of solvent flow rate through central composite design (CCD). CCD suggested 32 sets of experiments, which was carried out. As SFE process includes large number of variables, the present study recommends the application of resampling techniques for cross-validation of the obtained data. Cross-validation refits the model on each data to achieve the information regarding the error, variability, deviation etc. Bootstrap and jackknife are the most popular resampling techniques, which create a large number of data through resampling from the original dataset and analyze these data to check the validity of the obtained data. Jackknife resampling is based on the eliminating one observation from the original sample of size N without replacement. For jackknife resampling, the sample size is 31 (eliminating one observation), which is repeated by 32 times. Bootstrap is the frequently used statistical approach for estimating the sampling distribution of an estimator by resampling with replacement from the original sample. For bootstrap resampling, the sample size is 32, which was repeated by 100 times. Estimands for these resampling techniques are considered as mean, standard deviation, variation coefficient and standard error of the mean. For ω-6 linoleic acid concentration, mean value was approx. 58.5 for both resampling methods, which is the average (central value) of the sample mean of all data points. Similarly, for ω-3 linoleic acid concentration, mean was observed as 22.5 through both resampling. Variance exhibits the spread out of the data from its mean. Greater value of variance exhibits the large range of output data, which is 18 for ω-6 linoleic acid (ranging from 48.85 to 63.66 %) and 6 for ω-3 linoleic acid (ranging from 16.71 to 26.2 %). Further, low value of standard deviation (approx. 1 %), low standard error of the mean (< 0.8) and low variance coefficient (< 0.2) reflect the accuracy of the sample for prediction. All the estimator value of variance coefficients, standard deviation and standard error of the mean are found within the 95 % of confidence interval.

Keywords: resampling, supercritical fluid extraction, hemp oil, cross-validation

Procedia PDF Downloads 122
97 Detailed Ichnofacies and Sedimentological Analysis of the Cambrian Succession (Tal Group) of the Nigalidhar Syncline, Lesser Himalaya, India and the Interpretation of Its Palaeoenvironment

Authors: C. A. Sharma, Birendra P. Singh

Abstract:

Ichnofacies analysis is considered the best paleontological tool for interpreting ancient depositional environments. Nineteen (19) ichnogenera (namely: Bergaueria, Catenichnus, Cochlichnus, Cruziana, Diplichnites, Dimorphichnus, Diplocraterion, Gordia, Guanshanichnus, Lockeia, Merostomichnites, Monomorphichnus, Palaeophycus, Phycodes, Planolites, Psammichnites, Rusophycus, Skolithos and Treptichnus) are recocered from the Tal Group (Cambrian) of the Nigalidhar Syncline. The stratigraphic occurrences of these ichnogenera represent alternating proximal Cruziana and Skolithos ichnofacies along the contact of Sankholi and Koti-Dhaman formations of the Tal Group. Five ichnogenera namely Catenichnus, Guanshanichnus, Lockeia, Merostomichnites and Psammichnites are recorded for the first time from the Nigalidhar Syncline. Cruziana ichnofacies is found in the upper part of the Sankholi Formation to the lower part of the Koti Dhaman Formation in the NigaliDhar Syncline. The preservational characters here indicate a subtidal environmental condition with poorly sorted, unconsolidated substrate. Depositional condition ranging from moderate to high energy levels below the fair weather base but above the storm wave base under nearshore to foreshore setting in a wave dominated shallow water environment is also indicated. The proximal Cruziana-ichnofacies is interrupted by the Skolithos ichnofacies in the Tal Group of the Nigalidhar Syncline which indicate fluctuating high energy condition which was unfavorable for the opportunistic organism which were dominant during the proximal Cruziana ichnofacies. The excursion of Skolithos ichnofacies (as a pipe rock in the upper part of Sankholi Formation) into the proximal Cruziana ichnofacies in the Tal Group indicate that increased energy and allied parameters attributed to the high rate of sedimentation near the proximal part of the basin. The level bearing the Skolithos ichnofacies in the Nigalidhar Syncline at the juncture of Sankholi and Koti-Dhaman formations can be correlated to the level marked as unconformity in between the Deo-Ka-Tibba and the Dhaulagiri formations by the conglomeratic horizon in the Mussoorie Syncline, Lesser Himalaya, India. Thus, the Tal Group of the Nigalidhar syncline at this stratigraphic level represent slightly deeper water condition than the Mussoorie Syncline, where in the later the aerial exposure dominated which leads to the deposition of conglomeratic horizon and subsequent formation of unconformity. The overall ichnological and sedimentological dataset allow us to infer that the Cambrian successions of Nigalidhar Syncline were deposited in a wave-dominated proximal part of the basin under the foreshore to close to upper shoreface regimes of the shallow marine setting.

Keywords: Cambrian, Ichnofacies, Lesser Himalaya, Nigalidhar, Tal Group

Procedia PDF Downloads 228
96 Unequal Traveling: How School District System and School District Housing Characteristics Shape the Duration of Families Commuting

Authors: Geyang Xia

Abstract:

In many countries, governments have responded to the growing demand for educational resources through school district systems, and there is substantial evidence that school district systems have been effective in promoting inter-district and inter-school equity in educational resources. However, the scarcity of quality educational resources has brought about varying levels of education among different school districts, making it a common choice for many parents to buy a house in the school district where a quality school is located, and they are even willing to bear huge commuting costs for this purpose. Moreover, this is evidenced by the fact that parents of families in school districts with quality education resources have longer average commute lengths and longer average commute distances than parents in average school districts. This "unequal traveling" under the influence of the school district system is more common in school districts at the primary level of education. This further reinforces the differential hierarchy of educational resources and raises issues of inequitable educational public services, education-led residential segregation, and gentrification of school district housing. Against this background, this paper takes Nanjing, a famous educational city in China, as a case study and selects the school districts where the top 10 public elementary schools are located. The study first identifies the spatio-temporal behavioral trajectory dataset of these high-quality school district households by using spatial vector data, decrypted cell phone signaling data, and census data. Then, by constructing a "house-school-work (HSW)" commuting pattern of the population in the school district where the high-quality educational resources are located, and based on the classification of the HSW commuting pattern of the population, school districts with long employment hours were identified. Ultimately, the mechanisms and patterns inherent in this unequal commuting are analyzed in terms of six aspects, including the centrality of school district location, functional diversity, and accessibility. The results reveal that the "unequal commuting" of Nanjing's high-quality school districts under the influence of the school district system occurs mainly in the peripheral areas of the city, and the schools matched with these high-quality school districts are mostly branches of prestigious schools in the built-up areas of the city's core. At the same time, the centrality of school district location and the diversity of functions are the most important influencing factors of unequal commuting in high-quality school districts. Based on the research results, this paper proposes strategies to optimize the spatial layout of high-quality educational resources and corresponding transportation policy measures.

Keywords: school-district system, high quality school district, commuting pattern, unequal traveling

Procedia PDF Downloads 69
95 Principal Well-Being at Hong Kong: A Quantitative Investigation

Authors: Junjun Chen, Yingxiu Li

Abstract:

The occupational well-being of school principals has played a vital role in the pursuit of individual and school wellness and success. However, principals’ well-being worldwide is under increasing threat because of the challenging and complex nature of their work and growing demands for school standardisation and accountability. Pressure is particularly acute in the post-pandemicfuture as principals attempt to deal with the impact of the pandemic on top of more regular demands. This is particularly true in Hong Kong, as school principals are increasingly wedged between unparalleled political, social, and academic responsibilities. Recognizing the semantic breadth of well-being, scholars have not determined a single, mutually agreeable definition but agreed that the concept of well-being has multiple dimensions across various disciplines. The multidimensional approach promises more precise assessments of the relationships between well-being and other concepts than the ‘affect-only’ approach or other single domains for capturing the essence of principal well-being. The multiple-dimension well-being concept is adopted in this project to understand principal well-being in this study. This study aimed to understand the situation of principal well-being and its influential drivers with a sample of 670 principals from Hong Kong and Mainland China. An online survey was sent to the participants after the breakout of COVID-19 by the researchers. All participants were well informed about the purposes and procedure of the project and the confidentiality of the data prior to filling in the questionnaire. Confirmatory factor analysis and structural equation modelling performed with Mplus were employed to deal with the dataset. The data analysis procedure involved the following three steps. First, the descriptive statistics (e.g., mean and standard deviation) were calculated. Second, confirmatory factor analysis (CFA) was used to trim principal well-being measurement performed with maximum likelihood estimation. Third, structural equation modelling (SEM) was employed to test the influential factors of principal well-being. The results of this study indicated that the overall of principal well-being were above the average mean score. The highest ranking in this study given by the principals was to their psychological and social well-being (M = 5.21). This was followed by spiritual (M = 5.14; SD = .77), cognitive (M = 5.14; SD = .77), emotional (M = 4.96; SD = .79), and physical well-being (M = 3.15; SD = .73). Participants ranked their physical well-being the lowest. Moreover, professional autonomy, supervisor and collegial support, school physical conditions, professional networking, and social media have showed a significant impact on principal well-being. The findings of this study will potentially enhance not only principal well-being, but also the functioning of an individual principal and a school without sacrificing principal well-being for quality education in the process. This will eventually move one step forward for a new future - a wellness society advocated by OECD. Importantly, well-being is an inside job that begins with choosing to have wellness, whilst supports to become a wellness principal are also imperative.

Keywords: well-being, school principals, quantitative, influential factors

Procedia PDF Downloads 60
94 Understanding Evidence Dispersal Caused by the Effects of Using Unmanned Aerial Vehicles in Active Indoor Crime Scenes

Authors: Elizabeth Parrott, Harry Pointon, Frederic Bezombes, Heather Panter

Abstract:

Unmanned aerial vehicles (UAV’s) are making a profound effect within policing, forensic and fire service procedures worldwide. These intelligent devices have already proven useful in photographing and recording large-scale outdoor and indoor sites using orthomosaic and three-dimensional (3D) modelling techniques, for the purpose of capturing and recording sites during and post-incident. UAV’s are becoming an established tool as they are extending the reach of the photographer and offering new perspectives without the expense and restrictions of deploying full-scale aircraft. 3D reconstruction quality is directly linked to the resolution of captured images; therefore, close proximity flights are required for more detailed models. As technology advances deployment of UAVs in confined spaces is becoming more common. With this in mind, this study investigates the effects of UAV operation within active crimes scenes with regard to the dispersal of particulate evidence. To date, there has been little consideration given to the potential effects of using UAV’s within active crime scenes aside from a legislation point of view. Although potentially the technology can reduce the likelihood of contamination by replacing some of the roles of investigating practitioners. There is the risk of evidence dispersal caused by the effect of the strong airflow beneath the UAV, from the downwash of the propellers. The initial results of this study are therefore presented to determine the height of least effect at which to fly, and the commercial propeller type to choose to generate the smallest amount of disturbance from the dataset tested. In this study, a range of commercially available 4-inch propellers were chosen as a starting point due to the common availability and their small size makes them well suited for operation within confined spaces. To perform the testing, a rig was configured to support a single motor and propeller powered with a standalone mains power supply and controlled via a microcontroller. This was to mimic a complete throttle cycle and control the device to ensure repeatability. By removing the variances of battery packs and complex UAV structures to allow for a more robust setup. Therefore, the only changing factors were the propeller and operating height. The results were calculated via computer vision analysis of the recorded dispersal of the sample particles placed below the arm-mounted propeller. The aim of this initial study is to give practitioners an insight into the technology to use when operating within confined spaces as well as recognizing some of the issues caused by UAV’s within active crime scenes.

Keywords: dispersal, evidence, propeller, UAV

Procedia PDF Downloads 142
93 A Visualization Classification Method for Identifying the Decayed Citrus Fruit Infected by Fungi Based on Hyperspectral Imaging

Authors: Jiangbo Li, Wenqian Huang

Abstract:

Early detection of fungal infection in citrus fruit is one of the major problems in the postharvest commercialization process. The automatic and nondestructive detection of infected fruits is still a challenge for the citrus industry. At present, the visual inspection of rotten citrus fruits is commonly performed by workers through the ultraviolet induction fluorescence technology or manual sorting in citrus packinghouses to remove fruit subject with fungal infection. However, the former entails a number of problems because exposing people to this kind of lighting is potentially hazardous to human health, and the latter is very inefficient. Orange is used as a research object. This study would focus on this problem and proposed an effective method based on Vis-NIR hyperspectral imaging in the wavelength range of 400-1000 nm with a spectroscopic resolution of 2.8 nm. In this work, three normalization approaches are applied prior to analysis to reduce the effect of sample curvature on spectral profiles, and it is found that mean normalization was the most effective pretreatment for decreasing spectral variability due to curvature. Then, principal component analysis (PCA) was applied to a dataset composing of average spectra from decayed and normal tissue to reduce the dimensionality of data and observe the ability of Vis-NIR hyper-spectra to discriminate data from two classes. In this case, it was observed that normal and decayed spectra were separable along the resultant first principal component (PC1) axis. Subsequently, five wavelengths (band) centered at 577, 702, 751, 808, and 923 nm were selected as the characteristic wavelengths by analyzing the loadings of PC1. A multispectral combination image was generated based on five selected characteristic wavelength images. Based on the obtained multispectral combination image, the intensity slicing pseudocolor image processing method is used to generate a 2-D visual classification image that would enhance the contrast between normal and decayed tissue. Finally, an image segmentation algorithm for detection of decayed fruit was developed based on the pseudocolor image coupled with a simple thresholding method. For the investigated 238 independent set samples including infected fruits infected by Penicillium digitatum and normal fruits, the total success rate is 100% and 97.5%, respectively, and, the proposed algorithm also used to identify the orange infected by penicillium italicum with a 100% identification accuracy, indicating that the proposed multispectral algorithm here is an effective method and it is potential to be applied in citrus industry.

Keywords: citrus fruit, early rotten, fungal infection, hyperspectral imaging

Procedia PDF Downloads 277