Search results for: STS benchmark dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1465

Search results for: STS benchmark dataset

85 A Versatile Data Processing Package for Ground-Based Synthetic Aperture Radar Deformation Monitoring

Authors: Zheng Wang, Zhenhong Li, Jon Mills

Abstract:

Ground-based synthetic aperture radar (GBSAR) represents a powerful remote sensing tool for deformation monitoring towards various geohazards, e.g. landslides, mudflows, avalanches, infrastructure failures, and the subsidence of residential areas. Unlike spaceborne SAR with a fixed revisit period, GBSAR data can be acquired with an adjustable temporal resolution through either continuous or discontinuous operation. However, challenges arise from processing high temporal-resolution continuous GBSAR data, including the extreme cost of computational random-access-memory (RAM), the delay of displacement maps, and the loss of temporal evolution. Moreover, repositioning errors between discontinuous campaigns impede the accurate measurement of surface displacements. Therefore, a versatile package with two complete chains is developed in this study in order to process both continuous and discontinuous GBSAR data and address the aforementioned issues. The first chain is based on a small-baseline subset concept and it processes continuous GBSAR images unit by unit. Images within a window form a basic unit. By taking this strategy, the RAM requirement is reduced to only one unit of images and the chain can theoretically process an infinite number of images. The evolution of surface displacements can be detected as it keeps temporarily-coherent pixels which are present only in some certain units but not in the whole observation period. The chain supports real-time processing of the continuous data and the delay of creating displacement maps can be shortened without waiting for the entire dataset. The other chain aims to measure deformation between discontinuous campaigns. Temporal averaging is carried out on a stack of images in a single campaign in order to improve the signal-to-noise ratio of discontinuous data and minimise the loss of coherence. The temporal-averaged images are then processed by a particular interferometry procedure integrated with advanced interferometric SAR algorithms such as robust coherence estimation, non-local filtering, and selection of partially-coherent pixels. Experiments are conducted using both synthetic and real-world GBSAR data. Displacement time series at the level of a few sub-millimetres are achieved in several applications (e.g. a coastal cliff, a sand dune, a bridge, and a residential area), indicating the feasibility of the developed GBSAR data processing package for deformation monitoring of a wide range of scientific and practical applications.

Keywords: ground-based synthetic aperture radar, interferometry, small baseline subset algorithm, deformation monitoring

Procedia PDF Downloads 141
84 Preparation of Papers - Developing a Leukemia Diagnostic System Based on Hybrid Deep Learning Architectures in Actual Clinical Environments

Authors: Skyler Kim

Abstract:

An early diagnosis of leukemia has always been a challenge to doctors and hematologists. On a worldwide basis, it was reported that there were approximately 350,000 new cases in 2012, and diagnosing leukemia was time-consuming and inefficient because of an endemic shortage of flow cytometry equipment in current clinical practice. As the number of medical diagnosis tools increased and a large volume of high-quality data was produced, there was an urgent need for more advanced data analysis methods. One of these methods was the AI approach. This approach has become a major trend in recent years, and several research groups have been working on developing these diagnostic models. However, designing and implementing a leukemia diagnostic system in real clinical environments based on a deep learning approach with larger sets remains complex. Leukemia is a major hematological malignancy that results in mortality and morbidity throughout different ages. We decided to select acute lymphocytic leukemia to develop our diagnostic system since acute lymphocytic leukemia is the most common type of leukemia, accounting for 74% of all children diagnosed with leukemia. The results from this development work can be applied to all other types of leukemia. To develop our model, the Kaggle dataset was used, which consists of 15135 total images, 8491 of these are images of abnormal cells, and 5398 images are normal. In this paper, we design and implement a leukemia diagnostic system in a real clinical environment based on deep learning approaches with larger sets. The proposed diagnostic system has the function of detecting and classifying leukemia. Different from other AI approaches, we explore hybrid architectures to improve the current performance. First, we developed two independent convolutional neural network models: VGG19 and ResNet50. Then, using both VGG19 and ResNet50, we developed a hybrid deep learning architecture employing transfer learning techniques to extract features from each input image. In our approach, fusing the features from specific abstraction layers can be deemed as auxiliary features and lead to further improvement of the classification accuracy. In this approach, features extracted from the lower levels are combined into higher dimension feature maps to help improve the discriminative capability of intermediate features and also overcome the problem of network gradient vanishing or exploding. By comparing VGG19 and ResNet50 and the proposed hybrid model, we concluded that the hybrid model had a significant advantage in accuracy. The detailed results of each model’s performance and their pros and cons will be presented in the conference.

Keywords: acute lymphoblastic leukemia, hybrid model, leukemia diagnostic system, machine learning

Procedia PDF Downloads 175
83 Self-Supervised Learning for Hate-Speech Identification

Authors: Shrabani Ghosh

Abstract:

Automatic offensive language detection in social media has become a stirring task in today's NLP. Manual Offensive language detection is tedious and laborious work where automatic methods based on machine learning are only alternatives. Previous works have done sentiment analysis over social media in different ways such as supervised, semi-supervised, and unsupervised manner. Domain adaptation in a semi-supervised way has also been explored in NLP, where the source domain and the target domain are different. In domain adaptation, the source domain usually has a large amount of labeled data, while only a limited amount of labeled data is available in the target domain. Pretrained transformers like BERT, RoBERTa models are fine-tuned to perform text classification in an unsupervised manner to perform further pre-train masked language modeling (MLM) tasks. In previous work, hate speech detection has been explored in Gab.ai, which is a free speech platform described as a platform of extremist in varying degrees in online social media. In domain adaptation process, Twitter data is used as the source domain, and Gab data is used as the target domain. The performance of domain adaptation also depends on the cross-domain similarity. Different distance measure methods such as L2 distance, cosine distance, Maximum Mean Discrepancy (MMD), Fisher Linear Discriminant (FLD), and CORAL have been used to estimate domain similarity. Certainly, in-domain distances are small, and between-domain distances are expected to be large. The previous work finding shows that pretrain masked language model (MLM) fine-tuned with a mixture of posts of source and target domain gives higher accuracy. However, in-domain performance of the hate classifier on Twitter data accuracy is 71.78%, and out-of-domain performance of the hate classifier on Gab data goes down to 56.53%. Recently self-supervised learning got a lot of attention as it is more applicable when labeled data are scarce. Few works have already been explored to apply self-supervised learning on NLP tasks such as sentiment classification. Self-supervised language representation model ALBERTA focuses on modeling inter-sentence coherence and helps downstream tasks with multi-sentence inputs. Self-supervised attention learning approach shows better performance as it exploits extracted context word in the training process. In this work, a self-supervised attention mechanism has been proposed to detect hate speech on Gab.ai. This framework initially classifies the Gab dataset in an attention-based self-supervised manner. On the next step, a semi-supervised classifier trained on the combination of labeled data from the first step and unlabeled data. The performance of the proposed framework will be compared with the results described earlier and also with optimized outcomes obtained from different optimization techniques.

Keywords: attention learning, language model, offensive language detection, self-supervised learning

Procedia PDF Downloads 90
82 The Misuse of Free Cash and Earnings Management: An Analysis of the Extent to Which Board Tenure Mitigates Earnings Management

Authors: Michael McCann

Abstract:

Managerial theories propose that, in joint stock companies, executives may be tempted to waste excess free cash on unprofitable projects to keep control of resources. In order to conceal their projects' poor performance, they may seek to engage in earnings management. On the one hand, managers may manipulate earnings upwards in order to post ‘good’ performances and safeguard their position. On the other, since managers pursuit of unrewarding investments are likely to lead to low long-term profitability, managers will use negative accruals to reduce current year’s earnings, smoothing earnings over time in order to conceal the negative effects. Agency models argue that boards of directors are delegated by shareholders to ensure that companies are governed properly. Part of that responsibility is ensuring the reliability of financial information. Analyses of the impact of board characteristics, particularly board independence on the misuse of free cash flow and earnings management finds conflicting evidence. However, existing characterizations of board independence do not account for such directors gaining firm-specific knowledge over time, influencing their monitoring ability. Further, there is little analysis of the influence of the relative experience of independent directors and executives on decisions surrounding the use of free cash. This paper contributes to this literature regarding the heterogeneous characteristics of boards by investigating the influence of independent director tenure on earnings management and the relative tenures of independent directors and Chief Executives. A balanced panel dataset comprising 51 companies across 11 annual periods from 2005 to 2015 is used for the analysis. In each annual period, firms were classified as conducting earnings management if they had discretionary accruals in the bottom quartile (downwards) and top quartile (upwards) of the distributed values for the sample. Logistical regressions were conducted to determine the marginal impact of independent board tenure and a number of control variables on the probability of conducting earnings management. The findings indicate that both absolute and relative measures of board independence and experience do not have a significant impact on the likelihood of earnings management. It is the level of free cash flow which is the major influence on the probability of earnings management. Higher free cash flow increases the probability of earnings management significantly. The research also investigates whether board monitoring of earnings management is contingent on the level of free cash flow. However, the results suggest that board monitoring is not amplified when free cash flow is higher. This suggests that the extent of earnings management in companies is determined by a range of company, industry and situation-specific factors.

Keywords: corporate governance, boards of directors, agency theory, earnings management

Procedia PDF Downloads 208
81 Applying Miniaturized near Infrared Technology for Commingled and Microplastic Waste Analysis

Authors: Monika Rani, Claudio Marchesi, Stefania Federici, Laura E. Depero

Abstract:

Degradation of the aquatic environment by plastic litter, especially microplastics (MPs), i.e., any water-insoluble solid plastic particle with the longest dimension in the range 1µm and 1000 µm (=1 mm) size, is an unfortunate indication of the advancement of the Anthropocene age on Earth. Microplastics formed due to natural weathering processes are termed as secondary microplastics, while when these are synthesized in industries, they are called primary microplastics. Their presence from the highest peaks to the deepest points in oceans explored and their resistance to biological and chemical decay has adversely affected the environment, especially marine life. Even though the presence of MPs in the marine environment is well-reported, a legitimate and authentic analytical technique to sample, analyze, and quantify the MPs is still under progress and testing stages. Among the characterization techniques, vibrational spectroscopic techniques are largely adopted in the field of polymers. And the ongoing miniaturization of these methods is on the way to revolutionize the plastic recycling industry. In this scenario, the capability and the feasibility of a miniaturized near-infrared (MicroNIR) spectroscopy combined with chemometrics tools for qualitative and quantitative analysis of urban plastic waste collected from a recycling plant and microplastic mixture fragmented in the lab were investigated. Based on the Resin Identification Code, 250 plastic samples were used for macroplastic analysis and to set up a library of polymers. Subsequently, MicroNIR spectra were analysed through the application of multivariate modelling. Principal Components Analysis (PCA) was used as an unsupervised tool to find trends within the data. After the exploratory PCA analysis, a supervised classification tool was applied in order to distinguish the different plastic classes, and a database containing the NIR spectra of polymers was made. For the microplastic analysis, the three most abundant polymers in the plastic litter, PE, PP, PS, were mechanically fragmented in the laboratory to micron size. The distinctive arrangement of blends of these three microplastics was prepared in line with a designed ternary composition plot. After the PCA exploratory analysis, a quantitative model Partial Least Squares Regression (PLSR) allowed to predict the percentage of microplastics in the mixtures. With a complete dataset of 63 compositions, PLS was calibrated with 42 data-points. The model was used to predict the composition of 21 unknown mixtures of the test set. The advantage of the consolidated NIR Chemometric approach lies in the quick evaluation of whether the sample is macro or micro, contaminated, coloured or not, and with no sample pre-treatment. The technique can be utilized with bigger example volumes and even considers an on-site evaluation and in this manner satisfies the need for a high-throughput strategy.

Keywords: chemometrics, microNIR, microplastics, urban plastic waste

Procedia PDF Downloads 139
80 Potential Impacts of Climate Change on Hydrological Droughts in the Limpopo River Basin

Authors: Nokwethaba Makhanya, Babatunde J. Abiodun, Piotr Wolski

Abstract:

Climate change possibly intensifies hydrological droughts and reduces water availability in river basins. Despite this, most research on climate change effects in southern Africa has focused exclusively on meteorological droughts. This thesis projects the potential impact of climate change on the future characteristics of hydrological droughts in the Limpopo River Basin (LRB). The study uses regional climate model (RCM) measurements (from the Coordinated Regional Climate Downscaling Experiment, CORDEX) and a combination of hydrological simulations (using the Soil and Water Assessment Tool Plus model, SWAT+) to predict the impacts at four global warming levels (GWLs: 1.5℃, 2.0℃, 2.5℃, and 3.0℃) under the RCP8.5 future climate scenario. The SWAT+ model was calibrated and validated with a streamflow dataset observed over the basin, and the sensitivity of model parameters was investigated. The performance of the SWAT+LRB model was verified using the Nash-Sutcliffe efficiency (NSE), Percent Bias (PBIAS), Root Mean Square Error (RMSE), and coefficient of determination (R²). The Standardized Precipitation Evapotranspiration Index (SPEI) and the Standardized Precipitation Index (SPI) have been used to detect meteorological droughts. The Soil Water Index (SSI) has been used to define agricultural drought, while the Water Yield Drought Index (WYLDI), the Surface Run-off Index (SRI), and the Streamflow Index (SFI) have been used to characterise hydrological drought. The performance of the SWAT+ model simulations over LRB is sensitive to the parameters CN2 (initial SCS runoff curve number for moisture condition II) and ESCO (soil evaporation compensation factor). The best simulation generally performed better during the calibration period than the validation period. In calibration and validation periods, NSE is ≤ 0.8, while PBIAS is ≥ ﹣80.3%, RMSE ≥ 11.2 m³/s, and R² ≤ 0.9. The simulations project a future increase in temperature and potential evapotranspiration over the basin, but they do not project a significant future trend in precipitation and hydrological variables. However, the spatial distribution of precipitation reveals a projected increase in precipitation in the southern part of the basin and a decline in the northern part of the basin, with the region of reduced precipitation projected to increase with GWLs. A decrease in all hydrological variables is projected over most parts of the basin, especially over the eastern part of the basin. The simulations predict meteorological droughts (i.e., SPEI and SPI), agricultural droughts (i.e., SSI), and hydrological droughts (i.e., WYLDI, SRI) would become more intense and severe across the basin. SPEI-drought has a greater magnitude of increase than SPI-drought, and agricultural and hydrological droughts have a magnitude of increase between the two. As a result, this research suggests that future hydrological droughts over the LRB could be more severe than the SPI-drought projection predicts but less severe than the SPEI-drought projection. This research can be used to mitigate the effects of potential climate change on basin hydrological drought.

Keywords: climate change, CORDEX, drought, hydrological modelling, Limpopo River Basin

Procedia PDF Downloads 109
79 Calculation of Organ Dose for Adult and Pediatric Patients Undergoing Computed Tomography Examinations: A Software Comparison

Authors: Aya Al Masri, Naima Oubenali, Safoin Aktaou, Thibault Julien, Malorie Martin, Fouad Maaloul

Abstract:

Introduction: The increased number of performed 'Computed Tomography (CT)' examinations raise public concerns regarding associated stochastic risk to patients. In its Publication 102, the ‘International Commission on Radiological Protection (ICRP)’ emphasized the importance of managing patient dose, particularly from repeated or multiple examinations. We developed a Dose Archiving and Communication System that gives multiple dose indexes (organ dose, effective dose, and skin-dose mapping) for patients undergoing radiological imaging exams. The aim of this study is to compare the organ dose values given by our software for patients undergoing CT exams with those of another software named "VirtualDose". Materials and methods: Our software uses Monte Carlo simulations to calculate organ doses for patients undergoing computed tomography examinations. The general calculation principle consists to simulate: (1) the scanner machine with all its technical specifications and associated irradiation cases (kVp, field collimation, mAs, pitch ...) (2) detailed geometric and compositional information of dozens of well identified organs of computational hybrid phantoms that contain the necessary anatomical data. The mass as well as the elemental composition of the tissues and organs that constitute our phantoms correspond to the recommendations of the international organizations (namely the ICRP and the ICRU). Their body dimensions correspond to reference data developed in the United States. Simulated data was verified by clinical measurement. To perform the comparison, 270 adult patients and 150 pediatric patients were used, whose data corresponds to exams carried out in France hospital centers. The comparison dataset of adult patients includes adult males and females for three different scanner machines and three different acquisition protocols (Head, Chest, and Chest-Abdomen-Pelvis). The comparison sample of pediatric patients includes the exams of thirty patients for each of the following age groups: new born, 1-2 years, 3-7 years, 8-12 years, and 13-16 years. The comparison for pediatric patients were performed on the “Head” protocol. The percentage of the dose difference were calculated for organs receiving a significant dose according to the acquisition protocol (80% of the maximal dose). Results: Adult patients: for organs that are completely covered by the scan range, the maximum percentage of dose difference between the two software is 27 %. However, there are three organs situated at the edges of the scan range that show a slightly higher dose difference. Pediatric patients: the percentage of dose difference between the two software does not exceed 30%. These dose differences may be due to the use of two different generations of hybrid phantoms by the two software. Conclusion: This study shows that our software provides a reliable dosimetric information for patients undergoing Computed Tomography exams.

Keywords: adult and pediatric patients, computed tomography, organ dose calculation, software comparison

Procedia PDF Downloads 140
78 A Comprehensive Key Performance Indicators Dashboard for Emergency Medical Services

Authors: Giada Feletti, Daniela Tedesco, Paolo Trucco

Abstract:

The present study aims to develop a dashboard of Key Performance Indicators (KPI) to enhance information and predictive capabilities in Emergency Medical Services (EMS) systems, supporting both operational and strategic decisions of different actors. The employed research methodology consists of the first phase of revision of the technical-scientific literature concerning the indicators currently used for the performance measurement of EMS systems. From this literature analysis, it emerged that current studies focus on two distinct perspectives: the ambulance service, a fundamental component of pre-hospital health treatment, and the patient care in the Emergency Department (ED). The perspective proposed by this study is to consider an integrated view of the ambulance service process and the ED process, both essential to ensure high quality of care and patient safety. Thus, the proposal focuses on the entire healthcare service process and, as such, allows considering the interconnection between the two EMS processes, the pre-hospital and hospital ones, connected by the assignment of the patient to a specific ED. In this way, it is possible to optimize the entire patient management. Therefore, attention is paid to the dependency of decisions that in current EMS management models tend to be neglected or underestimated. In particular, the integration of the two processes enables the evaluation of the advantage of an ED selection decision having visibility on EDs’ saturation status and therefore considering the distance, the available resources and the expected waiting times. Starting from a critical review of the KPIs proposed in the extant literature, the design of the dashboard was carried out: the high number of analyzed KPIs was reduced by eliminating the ones firstly not in line with the aim of the study and then the ones supporting a similar functionality. The KPIs finally selected were tested on a realistic dataset, which draws us to exclude additional indicators due to the unavailability of data required for their computation. The final dashboard, which was discussed and validated by experts in the field, includes a variety of KPIs able to support operational and planning decisions, early warning, and citizens’ awareness of EDs accessibility in real-time. By associating each KPI to the EMS phase it refers to, it was also possible to design a well-balanced dashboard covering both efficiency and effective performance of the entire EMS process. Indeed, just the initial phases related to the interconnection between ambulance service and patient’s care are covered by traditional KPIs compared to the subsequent phases taking place in the hospital ED. This could be taken into consideration for the potential future development of the dashboard. Moreover, the research could proceed by building a multi-layer dashboard composed of the first level with a minimal set of KPIs to measure the basic performance of the EMS system at an aggregate level and further levels with KPIs that can bring additional and more detailed information.

Keywords: dashboard, decision support, emergency medical services, key performance indicators

Procedia PDF Downloads 92
77 Application of Deep Learning Algorithms in Agriculture: Early Detection of Crop Diseases

Authors: Manaranjan Pradhan, Shailaja Grover, U. Dinesh Kumar

Abstract:

Farming community in India, as well as other parts of the world, is one of the highly stressed communities due to reasons such as increasing input costs (cost of seeds, fertilizers, pesticide), droughts, reduced revenue leading to farmer suicides. Lack of integrated farm advisory system in India adds to the farmers problems. Farmers need right information during the early stages of crop’s lifecycle to prevent damage and loss in revenue. In this paper, we use deep learning techniques to develop an early warning system for detection of crop diseases using images taken by farmers using their smart phone. The research work leads to building a smart assistant using analytics and big data which could help the farmers with early diagnosis of the crop diseases and corrective actions. The classical approach for crop disease management has been to identify diseases at crop level. Recently, ImageNet Classification using the convolutional neural network (CNN) has been successfully used to identify diseases at individual plant level. Our model uses convolution filters, max pooling, dense layers and dropouts (to avoid overfitting). The models are built for binary classification (healthy or not healthy) and multi class classification (identifying which disease). Transfer learning is used to modify the weights of parameters learnt through ImageNet dataset and apply them on crop diseases, which reduces number of epochs to learn. One shot learning is used to learn from very few images, while data augmentation techniques are used to improve accuracy with images taken from farms by using techniques such as rotation, zoom, shift and blurred images. Models built using combination of these techniques are more robust for deploying in the real world. Our model is validated using tomato crop. In India, tomato is affected by 10 different diseases. Our model achieves an accuracy of more than 95% in correctly classifying the diseases. The main contribution of our research is to create a personal assistant for farmers for managing plant disease, although the model was validated using tomato crop, it can be easily extended to other crops. The advancement of technology in computing and availability of large data has made possible the success of deep learning applications in computer vision, natural language processing, image recognition, etc. With these robust models and huge smartphone penetration, feasibility of implementation of these models is high resulting in timely advise to the farmers and thus increasing the farmers' income and reducing the input costs.

Keywords: analytics in agriculture, CNN, crop disease detection, data augmentation, image recognition, one shot learning, transfer learning

Procedia PDF Downloads 104
76 Gender Gap in Returns to Social Entrepreneurship

Authors: Saul Estrin, Ute Stephan, Suncica Vujic

Abstract:

Background and research question: Gender differences in pay are present at all organisational levels, including at the very top. One possible way for women to circumvent organizational norms and discrimination is to engage in entrepreneurship because, as CEOs of their own organizations, entrepreneurs largely determine their own pay. While commercial entrepreneurship plays an important role in job creation and economic growth, social entrepreneurship has come to prominence because of its promise of addressing societal challenges such as poverty, social exclusion, or environmental degradation through market-based rather than state-sponsored activities. This opens the research question whether social entrepreneurship might be a form of entrepreneurship in which the pay of men and women is the same, or at least more similar; that is to say there is little or no gender pay gap. If the gender gap in pay persists also at the top of social enterprises, what are the factors, which might explain these differences? Methodology: The Oaxaca-Blinder Decomposition (OBD) is the standard approach of decomposing the gender pay gap based on the linear regression model. The OBD divides the gender pay gap into the ‘explained’ part due to differences in labour market characteristics (education, work experience, tenure, etc.), and the ‘unexplained’ part due to differences in the returns to those characteristics. The latter part is often interpreted as ‘discrimination’. There are two issues with this approach. (i) In many countries there is a notable convergence in labour market characteristics across genders; hence the OBD method is no longer revealing, since the largest portion of the gap remains ‘unexplained’. (ii) Adding covariates to a base model sequentially either to test a particular coefficient’s ‘robustness’ or to account for the ‘effects’ on this coefficient of adding covariates might be problematic, due to sequence-sensitivity when added covariates are correlated. Gelbach’s decomposition (GD) addresses latter by using the omitted variables bias formula, which constructs a conditional decomposition thus accounting for sequence-sensitivity when added covariates are correlated. We use GD to decompose the differences in gaps of pay (annual and hourly salary), size of the organisation (revenues), effort (weekly hours of work), and sources of finances (fees and sales, grants and donations, microfinance and loans, and investors’ capital) between men and women leading social enterprises. Database: Our empirical work is made possible by our collection of a unique dataset using respondent driven sampling (RDS) methods to address the problem that there is as yet no information on the underlying population of social entrepreneurs. The countries that we focus on are the United Kingdom, Spain, Romania and Hungary. Findings and recommendations: We confirm the existence of a gender pay gap between men and women leading social enterprises. This gap can be explained by differences in the accumulation of human capital, psychological and social factors, as well as cross-country differences. The results of this study contribute to a more rounded perspective, highlighting that although social entrepreneurship may be a highly satisfying occupation, it also perpetuates gender pay inequalities.

Keywords: Gelbach’s decomposition, gender gap, returns to social entrepreneurship, values and preferences

Procedia PDF Downloads 225
75 COVID-19 Laws and Policy: The Use of Policy Surveillance For Better Legal Preparedness

Authors: Francesca Nardi, Kashish Aneja, Katherine Ginsbach

Abstract:

The COVID-19 pandemic has demonstrated both a need for evidence-based and rights-based public health policy and how challenging it can be to make effective decisions with limited information, evidence, and data. The O’Neill Institute, in conjunction with several partners, has been working since the beginning of the pandemic to collect, analyze, and distribute critical data on public health policies enacted in response to COVID-19 around the world in the COVID-19 Law Lab. Well-designed laws and policies can help build strong health systems, implement necessary measures to combat viral transmission, enforce actions that promote public health and safety for everyone, and on the individual level have a direct impact on health outcomes. Poorly designed laws and policies, on the other hand, can fail to achieve the intended results and/or obstruct the realization of fundamental human rights, further disease spread, or cause unintended collateral harms. When done properly, laws can provide the foundation that brings clarity to complexity, embrace nuance, and identifies gaps of uncertainty. However, laws can also shape the societal factors that make disease possible. Law is inseparable from the rest of society, and COVID-19 has exposed just how much laws and policies intersects all facets of society. In the COVID-19 context, evidence-based and well-informed law and policy decisions—made at the right time and in the right place—can and have meant the difference between life or death for many. Having a solid evidentiary base of legal information can promote the understanding of what works well and where, and it can drive resources and action to where they are needed most. We know that legal mechanisms can enable nations to reduce inequities and prepare for emerging threats, like novel pathogens that result in deadly disease outbreaks or antibiotic resistance. The collection and analysis of data on these legal mechanisms is a critical step towards ensuring that legal interventions and legal landscapes are effectively incorporated into more traditional kinds of health science data analyses. The COVID-19 Law Labs see a unique opportunity to collect and analyze this kind of non-traditional data to inform policy using laws and policies from across the globe and across diseases. This global view is critical to assessing the efficacy of policies in a wide range of cultural, economic, and demographic circumstances. The COVID-19 Law Lab is not just a collection of legal texts relating to COVID-19; it is a dataset of concise and actionable legal information that can be used by health researchers, social scientists, academics, human rights advocates, law and policymakers, government decision-makers, and others for cross-disciplinary quantitative and qualitative analysis to identify best practices from this outbreak, and previous ones, to be better prepared for potential future public health events.

Keywords: public health law, surveillance, policy, legal, data

Procedia PDF Downloads 128
74 Abuse against Elderly Widows in India and Selected States: An Exploration

Authors: Rasmita Mishra, Chander Shekher

Abstract:

Background: Population ageing is an inevitable outcome of demographic transition. Due to increased life expectancy, the old age population in India and worldwide has increased, and it will continue to grow more alarmingly in the near future. There are redundant austerity that has been bestowed upon the widows, thus, the life of widows is never been easy in India. The loss of spouse along with other disadvantaged socioeconomic intermediaries like illiteracy and poverty often make the life of widows more difficult to live. Methodology: Ethical statement: The study used secondary data available in the public domain for its wider use in social research. Thus, there was no requirement of ethical consent in the present study. Data source: Building a Knowledge Base on Population Aging in India (BKPAI), 2011 dataset is used to fulfill the objectives of this study. It was carried out in seven states – Himachal Pradesh, Kerala, Maharashtra, Odisha, Punjab, Tamil Nadu, and West Bengal – having a higher percentage of the population in the age group 60 years and above compared to the national average. Statistical analysis: Descriptive and inferential statistics were used to understand the level of elderly widows and incidence of abuse against them in India and selected states. Bivariate and Trivariate analysis were carried out to check the pattern of abuse by selected covariates. Chi-Square test is used to verify the significance of the association. Further, Discriminant Analysis (DA) is carried out to understand which factor can separate out group of neglect and non-neglect elderly. Result: With the addition of 27 million from 2001 to 2011, the total elderly population in India is more than 100 million. Elderly females aged 60+ were more widows than their counterpart elderly males. This pattern was observed across selected states and at national level. At national level, more than one tenth (12 percent) of elderly experienced abuse in their lifetime. Incidence of abuse against elderly widows within family was considerably higher than the outside the family. This pattern was observed across the selected place and abuse in the study. In discriminant analysis, the significant difference between neglected and non-neglected elderly on each of the independent variables was examined using group mean and ANOVA. Discussion: The study is the first of its kind to assess the incidence of abuse against elderly widows using large-scale survey data. Another novelty of this study is that it has assessed for those states in India whereby the proportion of elderly is higher than the national average. Place and perpetrators involved in the abuse against elderly widows certainly envisaged the safeness in the present living arrangement of elderly widows. Conclusion: Due to the increasing life expectancy it is expected that the number of elderly will increase much faster than before. As biologically women live longer than men, there will be more women elderly than men. With respect to the living arrangement, after the demise of the spouse, elderly widows are more likely to live with their children who emerged as the main perpetrator of abuse.

Keywords: elderly abuse, emotional abuse physical abuse, material abuse, psychological abuse, quality of life

Procedia PDF Downloads 396
73 Linguistic Cyberbullying, a Legislative Approach

Authors: Simona Maria Ignat

Abstract:

Bullying online has been an increasing studied topic during the last years. Different approaches, psychological, linguistic, or computational, have been applied. To our best knowledge, a definition and a set of characteristics of phenomenon agreed internationally as a common framework are still waiting for answers. Thus, the objectives of this paper are the identification of bullying utterances on Twitter and their algorithms. This research paper is focused on the identification of words or groups of words, categorized as “utterances”, with bullying effect, from Twitter platform, extracted on a set of legislative criteria. This set is the result of analysis followed by synthesis of law documents on bullying(online) from United States of America, European Union, and Ireland. The outcome is a linguistic corpus with approximatively 10,000 entries. The methods applied to the first objective have been the following. The discourse analysis has been applied in identification of keywords with bullying effect in texts from Google search engine, Images link. Transcription and anonymization have been applied on texts grouped in CL1 (Corpus linguistics 1). The keywords search method and the legislative criteria have been used for identifying bullying utterances from Twitter. The texts with at least 30 representations on Twitter have been grouped. They form the second corpus linguistics, Bullying utterances from Twitter (CL2). The entries have been identified by using the legislative criteria on the the BoW method principle. The BoW is a method of extracting words or group of words with same meaning in any context. The methods applied for reaching the second objective is the conversion of parts of speech to alphabetical and numerical symbols and writing the bullying utterances as algorithms. The converted form of parts of speech has been chosen on the criterion of relevance within bullying message. The inductive reasoning approach has been applied in sampling and identifying the algorithms. The results are groups with interchangeable elements. The outcomes convey two aspects of bullying: the form and the content or meaning. The form conveys the intentional intimidation against somebody, expressed at the level of texts by grammatical and lexical marks. This outcome has applicability in the forensic linguistics for establishing the intentionality of an action. Another outcome of form is a complex of graphemic variations essential in detecting harmful texts online. This research enriches the lexicon already known on the topic. The second aspect, the content, revealed the topics like threat, harassment, assault, or suicide. They are subcategories of a broader harmful content which is a constant concern for task forces and legislators at national and international levels. These topic – outcomes of the dataset are a valuable source of detection. The analysis of content revealed algorithms and lexicons which could be applied to other harmful contents. A third outcome of content are the conveyances of Stylistics, which is a rich source of discourse analysis of social media platforms. In conclusion, this corpus linguistics is structured on legislative criteria and could be used in various fields.

Keywords: corpus linguistics, cyberbullying, legislation, natural language processing, twitter

Procedia PDF Downloads 67
72 Integrating Natural Language Processing (NLP) and Machine Learning in Lung Cancer Diagnosis

Authors: Mehrnaz Mostafavi

Abstract:

The assessment and categorization of incidental lung nodules present a considerable challenge in healthcare, often necessitating resource-intensive multiple computed tomography (CT) scans for growth confirmation. This research addresses this issue by introducing a distinct computational approach leveraging radiomics and deep-learning methods. However, understanding local services is essential before implementing these advancements. With diverse tracking methods in place, there is a need for efficient and accurate identification approaches, especially in the context of managing lung nodules alongside pre-existing cancer scenarios. This study explores the integration of text-based algorithms in medical data curation, indicating their efficacy in conjunction with machine learning and deep-learning models for identifying lung nodules. Combining medical images with text data has demonstrated superior data retrieval compared to using each modality independently. While deep learning and text analysis show potential in detecting previously missed nodules, challenges persist, such as increased false positives. The presented research introduces a Structured-Query-Language (SQL) algorithm designed for identifying pulmonary nodules in a tertiary cancer center, externally validated at another hospital. Leveraging natural language processing (NLP) and machine learning, the algorithm categorizes lung nodule reports based on sentence features, aiming to facilitate research and assess clinical pathways. The hypothesis posits that the algorithm can accurately identify lung nodule CT scans and predict concerning nodule features using machine-learning classifiers. Through a retrospective observational study spanning a decade, CT scan reports were collected, and an algorithm was developed to extract and classify data. Results underscore the complexity of lung nodule cohorts in cancer centers, emphasizing the importance of careful evaluation before assuming a metastatic origin. The SQL and NLP algorithms demonstrated high accuracy in identifying lung nodule sentences, indicating potential for local service evaluation and research dataset creation. Machine-learning models exhibited strong accuracy in predicting concerning changes in lung nodule scan reports. While limitations include variability in disease group attribution, the potential for correlation rather than causality in clinical findings, and the need for further external validation, the algorithm's accuracy and potential to support clinical decision-making and healthcare automation represent a significant stride in lung nodule management and research.

Keywords: lung cancer diagnosis, structured-query-language (SQL), natural language processing (NLP), machine learning, CT scans

Procedia PDF Downloads 60
71 Predicting Loss of Containment in Surface Pipeline using Computational Fluid Dynamics and Supervised Machine Learning Model to Improve Process Safety in Oil and Gas Operations

Authors: Muhammmad Riandhy Anindika Yudhy, Harry Patria, Ramadhani Santoso

Abstract:

Loss of containment is the primary hazard that process safety management is concerned within the oil and gas industry. Escalation to more serious consequences all begins with the loss of containment, starting with oil and gas release from leakage or spillage from primary containment resulting in pool fire, jet fire and even explosion when reacted with various ignition sources in the operations. Therefore, the heart of process safety management is avoiding loss of containment and mitigating its impact through the implementation of safeguards. The most effective safeguard for the case is an early detection system to alert Operations to take action prior to a potential case of loss of containment. The detection system value increases when applied to a long surface pipeline that is naturally difficult to monitor at all times and is exposed to multiple causes of loss of containment, from natural corrosion to illegal tapping. Based on prior researches and studies, detecting loss of containment accurately in the surface pipeline is difficult. The trade-off between cost-effectiveness and high accuracy has been the main issue when selecting the traditional detection method. The current best-performing method, Real-Time Transient Model (RTTM), requires analysis of closely positioned pressure, flow and temperature (PVT) points in the pipeline to be accurate. Having multiple adjacent PVT sensors along the pipeline is expensive, hence generally not a viable alternative from an economic standpoint.A conceptual approach to combine mathematical modeling using computational fluid dynamics and a supervised machine learning model has shown promising results to predict leakage in the pipeline. Mathematical modeling is used to generate simulation data where this data is used to train the leak detection and localization models. Mathematical models and simulation software have also been shown to provide comparable results with experimental data with very high levels of accuracy. While the supervised machine learning model requires a large training dataset for the development of accurate models, mathematical modeling has been shown to be able to generate the required datasets to justify the application of data analytics for the development of model-based leak detection systems for petroleum pipelines. This paper presents a review of key leak detection strategies for oil and gas pipelines, with a specific focus on crude oil applications, and presents the opportunities for the use of data analytics tools and mathematical modeling for the development of robust real-time leak detection and localization system for surface pipelines. A case study is also presented.

Keywords: pipeline, leakage, detection, AI

Procedia PDF Downloads 167
70 Evaluation of Modern Natural Language Processing Techniques via Measuring a Company's Public Perception

Authors: Burak Oksuzoglu, Savas Yildirim, Ferhat Kutlu

Abstract:

Opinion mining (OM) is one of the natural language processing (NLP) problems to determine the polarity of opinions, mostly represented on a positive-neutral-negative axis. The data for OM is usually collected from various social media platforms. In an era where social media has considerable control over companies’ futures, it’s worth understanding social media and taking actions accordingly. OM comes to the fore here as the scale of the discussion about companies increases, and it becomes unfeasible to gauge opinion on individual levels. Thus, the companies opt to automize this process by applying machine learning (ML) approaches to their data. For the last two decades, OM or sentiment analysis (SA) has been mainly performed by applying ML classification algorithms such as support vector machines (SVM) and Naïve Bayes to a bag of n-gram representations of textual data. With the advent of deep learning and its apparent success in NLP, traditional methods have become obsolete. Transfer learning paradigm that has been commonly used in computer vision (CV) problems started to shape NLP approaches and language models (LM) lately. This gave a sudden rise to the usage of the pretrained language model (PTM), which contains language representations that are obtained by training it on the large datasets using self-supervised learning objectives. The PTMs are further fine-tuned by a specialized downstream task dataset to produce efficient models for various NLP tasks such as OM, NER (Named-Entity Recognition), Question Answering (QA), and so forth. In this study, the traditional and modern NLP approaches have been evaluated for OM by using a sizable corpus belonging to a large private company containing about 76,000 comments in Turkish: SVM with a bag of n-grams, and two chosen pre-trained models, multilingual universal sentence encoder (MUSE) and bidirectional encoder representations from transformers (BERT). The MUSE model is a multilingual model that supports 16 languages, including Turkish, and it is based on convolutional neural networks. The BERT is a monolingual model in our case and transformers-based neural networks. It uses a masked language model and next sentence prediction tasks that allow the bidirectional training of the transformers. During the training phase of the architecture, pre-processing operations such as morphological parsing, stemming, and spelling correction was not used since the experiments showed that their contribution to the model performance was found insignificant even though Turkish is a highly agglutinative and inflective language. The results show that usage of deep learning methods with pre-trained models and fine-tuning achieve about 11% improvement over SVM for OM. The BERT model achieved around 94% prediction accuracy while the MUSE model achieved around 88% and SVM did around 83%. The MUSE multilingual model shows better results than SVM, but it still performs worse than the monolingual BERT model.

Keywords: BERT, MUSE, opinion mining, pretrained language model, SVM, Turkish

Procedia PDF Downloads 121
69 A Quality Index Optimization Method for Non-Invasive Fetal ECG Extraction

Authors: Lucia Billeci, Gennaro Tartarisco, Maurizio Varanini

Abstract:

Fetal cardiac monitoring by fetal electrocardiogram (fECG) can provide significant clinical information about the healthy condition of the fetus. Despite this potentiality till now the use of fECG in clinical practice has been quite limited due to the difficulties in its measuring. The recovery of fECG from the signals acquired non-invasively by using electrodes placed on the maternal abdomen is a challenging task because abdominal signals are a mixture of several components and the fetal one is very weak. This paper presents an approach for fECG extraction from abdominal maternal recordings, which exploits the characteristics of pseudo-periodicity of fetal ECG. It consists of devising a quality index (fQI) for fECG and of finding the linear combinations of preprocessed abdominal signals, which maximize these fQI (quality index optimization - QIO). It aims at improving the performances of the most commonly adopted methods for fECG extraction, usually based on maternal ECG (mECG) estimating and canceling. The procedure for the fECG extraction and fetal QRS (fQRS) detection is completely unsupervised and based on the following steps: signal pre-processing; maternal ECG (mECG) extraction and maternal QRS detection; mECG component approximation and canceling by weighted principal component analysis; fECG extraction by fQI maximization and fetal QRS detection. The proposed method was compared with our previously developed procedure, which obtained the highest at the Physionet/Computing in Cardiology Challenge 2013. That procedure was based on removing the mECG from abdominal signals estimated by a principal component analysis (PCA) and applying the Independent component Analysis (ICA) on the residual signals. Both methods were developed and tuned using 69, 1 min long, abdominal measurements with fetal QRS annotation of the dataset A provided by PhysioNet/Computing in Cardiology Challenge 2013. The QIO-based and the ICA-based methods were compared in analyzing two databases of abdominal maternal ECG available on the Physionet site. The first is the Abdominal and Direct Fetal Electrocardiogram Database (ADdb) which contains the fetal QRS annotations thus allowing a quantitative performance comparison, the second is the Non-Invasive Fetal Electrocardiogram Database (NIdb), which does not contain the fetal QRS annotations so that the comparison between the two methods can be only qualitative. In particular, the comparison on NIdb was performed defining an index of quality for the fetal RR series. On the annotated database ADdb the QIO method, provided the performance indexes Sens=0.9988, PPA=0.9991, F1=0.9989 overcoming the ICA-based one, which provided Sens=0.9966, PPA=0.9972, F1=0.9969. The comparison on NIdb was performed defining an index of quality for the fetal RR series. The index of quality resulted higher for the QIO-based method compared to the ICA-based one in 35 records out 55 cases of the NIdb. The QIO-based method gave very high performances with both the databases. The results of this study foresees the application of the algorithm in a fully unsupervised way for the implementation in wearable devices for self-monitoring of fetal health.

Keywords: fetal electrocardiography, fetal QRS detection, independent component analysis (ICA), optimization, wearable

Procedia PDF Downloads 264
68 Development of a Core Set of Clinical Indicators to Measure Quality of Care for Thyroid Cancer: A Modified-Delphi Approach

Authors: Liane J. Ioannou, Jonathan Serpell, Cino Bendinelli, David Walters, Jenny Gough, Dean Lisewski, Win Meyer-Rochow, Julie Miller, Duncan Topliss, Bill Fleming, Stephen Farrell, Andrew Kiu, James Kollias, Mark Sywak, Adam Aniss, Linda Fenton, Danielle Ghusn, Simon Harper, Aleksandra Popadich, Kate Stringer, David Watters, Susannah Ahern

Abstract:

BACKGROUND: There are significant variations in the management, treatment and outcomes of thyroid cancer, particularly in the role of: diagnostic investigation and pre-treatment scanning; optimal extent of surgery (total or hemi-thyroidectomy); use of active surveillance for small low-risk cancers; central lymph node dissections (therapeutic or prophylactic); outcomes following surgery (e.g. recurrent laryngeal nerve palsy, hypocalcaemia, hypoparathyroidism); post-surgical hormone, calcium and vitamin D therapy; and provision and dosage of radioactive iodine treatment. A proven strategy to reduce variations in the outcome and to improve survival is to measure and compare it using high-quality clinical registry data. Clinical registries provide the most effective means of collecting high-quality data and are a tool for quality improvement. Where they have been introduced at a state or national level, registries have become one of the most clinically valued tools for quality improvement. To benchmark clinical care, clinical quality registries require systematic measurement at predefined intervals and the capacity to report back information to participating clinical units. OBJECTIVE: The aim of this study was to develop a core set clinical indicators that enable measurement and reporting of quality of care for patients with thyroid cancer. We hypothesise that measuring clinical quality indicators, developed to identify differences in quality of care across sites, will reduce variation and improve patient outcomes and survival, thereby lessening costs and healthcare burden to the Australian community. METHOD: Preparatory work and scoping was conducted to identify existing high quality, clinical guidelines and best practice for thyroid cancer both nationally and internationally, as well as relevant literature. A bi-national panel was invited to participate in a modified Delphi process. Panelists were asked to rate each proposed indicator on a Likert scale of 1–9 in a three-round iterative process. RESULTS: A total of 236 potential quality indicators were identified. One hundred and ninety-two indicators were removed to reflect the data capture by the Australian and New Zealand Thyroid Cancer Registry (ANZTCR) (from diagnosis to 90-days post-surgery). The remaining 44 indicators were presented to the panelists for voting. A further 21 indicators were later added by the panelists bringing the total potential quality indicators to 65. Of these, 21 were considered the most important and feasible indicators to measure quality of care in thyroid cancer, of which 12 were recommended for inclusion in the final set. The consensus indicator set spans the spectrum of care, including: preoperative; surgery; surgical complications; staging and post-surgical treatment planning; and post-surgical treatment. CONCLUSIONS: This study provides a core set of quality indicators to measure quality of care in thyroid cancer. This indicator set can be applied as a tool for internal quality improvement, comparative quality reporting, public reporting and research. Inclusion of these quality indicators into monitoring databases such as clinical quality registries will enable opportunities for benchmarking and feedback on best practice care to clinicians involved in the management of thyroid cancer.

Keywords: clinical registry, Delphi survey, quality indicators, quality of care

Procedia PDF Downloads 158
67 Computational Approaches to Study Lineage Plasticity in Human Pancreatic Ductal Adenocarcinoma

Authors: Almudena Espin Perez, Tyler Risom, Carl Pelz, Isabel English, Robert M. Angelo, Rosalie Sears, Andrew J. Gentles

Abstract:

Pancreatic ductal adenocarcinoma (PDAC) is one of the most deadly malignancies. The role of the tumor microenvironment (TME) is gaining significant attention in cancer research. Despite ongoing efforts, the nature of the interactions between tumors, immune cells, and stromal cells remains poorly understood. The cell-intrinsic properties that govern cell lineage plasticity in PDAC and extrinsic influences of immune populations require technically challenging approaches due to the inherently heterogeneous nature of PDAC. Understanding the cell lineage plasticity of PDAC will improve the development of novel strategies that could be translated to the clinic. Members of the team have demonstrated that the acquisition of ductal to neuroendocrine lineage plasticity in PDAC confers therapeutic resistance and is a biomarker of poor outcomes in patients. Our approach combines computational methods for deconvolving bulk transcriptomic cancer data using CIBERSORTx and high-throughput single-cell imaging using Multiplexed Ion Beam Imaging (MIBI) to study lineage plasticity in PDAC and its relationship to the infiltrating immune system. The CIBERSORTx algorithm uses signature matrices from immune cells and stroma from sorted and single-cell data in order to 1) infer the fractions of different immune cell types and stromal cells in bulked gene expression data and 2) impute a representative transcriptome profile for each cell type. We studied a unique set of 300 genomically well-characterized primary PDAC samples with rich clinical annotation. We deconvolved the PDAC transcriptome profiles using CIBERSORTx, leveraging publicly available single-cell RNA-seq data from normal pancreatic tissue and PDAC to estimate cell type proportions in PDAC, and digitally reconstruct cell-specific transcriptional profiles from our study dataset. We built signature matrices and optimized by simulations and comparison to ground truth data. We identified cell-type-specific transcriptional programs that contribute to cancer cell lineage plasticity, especially in the ductal compartment. We also studied cell differentiation hierarchies using CytoTRACE and predict cell lineage trajectories for acinar and ductal cells that we believe are pinpointing relevant information on PDAC progression. Collaborators (Angelo lab, Stanford University) has led the development of the Multiplexed Ion Beam Imaging (MIBI) platform for spatial proteomics. We will use in the very near future MIBI from tissue microarray of 40 PDAC samples to understand the spatial relationship between cancer cell lineage plasticity and stromal cells focused on infiltrating immune cells, using the relevant markers of PDAC plasticity identified from the RNA-seq analysis.

Keywords: deconvolution, imaging, microenvironment, PDAC

Procedia PDF Downloads 105
66 Identification of Clinical Characteristics from Persistent Homology Applied to Tumor Imaging

Authors: Eashwar V. Somasundaram, Raoul R. Wadhwa, Jacob G. Scott

Abstract:

The use of radiomics in measuring geometric properties of tumor images such as size, surface area, and volume has been invaluable in assessing cancer diagnosis, treatment, and prognosis. In addition to analyzing geometric properties, radiomics would benefit from measuring topological properties using persistent homology. Intuitively, features uncovered by persistent homology may correlate to tumor structural features. One example is necrotic cavities (corresponding to 2D topological features), which are markers of very aggressive tumors. We develop a data pipeline in R that clusters tumors images based on persistent homology is used to identify meaningful clinical distinctions between tumors and possibly new relationships not captured by established clinical categorizations. A preliminary analysis was performed on 16 Magnetic Resonance Imaging (MRI) breast tissue segments downloaded from the 'Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging and Molecular Analysis' (I-SPY TRIAL or ISPY1) collection in The Cancer Imaging Archive. Each segment represents a patient’s breast tumor prior to treatment. The ISPY1 dataset also provided the estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status data. A persistent homology matrix up to 2-dimensional features was calculated for each of the MRI segmentation. Wasserstein distances were then calculated between all pairwise tumor image persistent homology matrices to create a distance matrix for each feature dimension. Since Wasserstein distances were calculated for 0, 1, and 2-dimensional features, three hierarchal clusters were constructed. The adjusted Rand Index was used to see how well the clusters corresponded to the ER/PR/HER2 status of the tumors. Triple-negative cancers (negative status for all three receptors) significantly clustered together in the 2-dimensional features dendrogram (Adjusted Rand Index of .35, p = .031). It is known that having a triple-negative breast tumor is associated with aggressive tumor growth and poor prognosis when compared to non-triple negative breast tumors. The aggressive tumor growth associated with triple-negative tumors may have a unique structure in an MRI segmentation, which persistent homology is able to identify. This preliminary analysis shows promising results in the use of persistent homology on tumor imaging to assess the severity of breast tumors. The next step is to apply this pipeline to other tumor segment images from The Cancer Imaging Archive at different sites such as the lung, kidney, and brain. In addition, whether other clinical parameters, such as overall survival, tumor stage, and tumor genotype data are captured well in persistent homology clusters will be assessed. If analyzing tumor MRI segments using persistent homology consistently identifies clinical relationships, this could enable clinicians to use persistent homology data as a noninvasive way to inform clinical decision making in oncology.

Keywords: cancer biology, oncology, persistent homology, radiomics, topological data analysis, tumor imaging

Procedia PDF Downloads 114
65 Multilevel Regression Model - Evaluate Relationship Between Early Years’ Activities of Daily Living and Alzheimer’s Disease Onset Accounting for Influence of Key Sociodemographic Factors Using a Longitudinal Household Survey Data

Authors: Linyi Fan, C.J. Schumaker

Abstract:

Background: Biomedical efforts to treat Alzheimer’s disease (AD) have typically produced mixed to poor results, while more lifestyle-focused treatments such as exercise may fare better than existing biomedical treatments. A few promising studies have indicated that activities of daily life (ADL) may be a useful way of predicting AD. However, the existing cross-sectional studies fail to show how functional-related issues such as ADL in early years predict AD and how social factors influence health either in addition to or in interaction with individual risk factors. This study would helpbetterscreening and early treatments for the elderly population and healthcare practice. The findings have significance academically and practically in terms of creating positive social change. Methodology: The purpose of this quantitative historical, correlational study was to examine the relationship between early years’ ADL and the development of AD in later years. The studyincluded 4,526participantsderived fromRAND HRS dataset. The Health and Retirement Study (HRS) is a longitudinal household survey data set that is available forresearchof retirement and health among the elderly in the United States. The sample was selected by the completion of survey questionnaire about AD and dementia. The variablethat indicates whether the participant has been diagnosed with AD was the dependent variable. The ADL indices and changes in ADL were the independent variables. A four-step multilevel regression model approach was utilized to address the research questions. Results: Amongst 4,526 patients who completed the AD and dementia questionnaire, 144 (3.1%) were diagnosed with AD. Of the 4,526 participants, 3,465 (76.6%) have high school and upper education degrees,4,074 (90.0%) were above poverty threshold. The model evaluatedthe effect of ADL and change in ADL on onset of AD in late years while allowing the intercept of the model to vary by level of education. The results suggested that the only significant predictor of the onset of AD was changes in early years’ ADL (b = 20.253, z = 2.761, p < .05). However, the result of the sensitivity analysis (b = 7.562, z = 1.900, p =.058), which included more control variables and increased the observation period of ADL, are not supported this finding. The model also estimated whether the variances of random effect vary by Level-2 variables. The results suggested that the variances associated with random slopes were approximately zero, suggesting that the relationship between early years’ ADL were not influenced bysociodemographic factors. Conclusion: The finding indicated that an increase in changes in ADL leads to an increase in the probability of onset AD in the future. However, this finding is not support in a broad observation period model. The study also failed to reject the hypothesis that the sociodemographic factors explained significant amounts of variance in random effect. Recommendations were then made for future research and practice based on these limitations and the significance of the findings.

Keywords: alzheimer’s disease, epidemiology, moderation, multilevel modeling

Procedia PDF Downloads 117
64 Spatiotemporal Changes in Drought Sensitivity Captured by Multiple Tree-Ring Parameters of Central European Conifers

Authors: Krešimir Begović, Miloš Rydval, Jan Tumajer, Kristyna Svobodová, Thomas Langbehn, Yumei Jiang, Vojtech Čada, Vaclav Treml, Ryszard Kaczka, Miroslav Svoboda

Abstract:

Environmental changes have increased the frequency and intensity of climatic extremes, particularly hotter droughts, leading to altered tree growth patterns and multi-year lags in tree recovery. The effects of shifting climatic conditions on tree growth are inhomogeneous across species’ natural distribution ranges, with large spatial heterogeneity and inter-population variability, but generally have significant consequences for contemporary forest dynamics and future ecosystem functioning. Despite numerous studies on the impacts of regional drought effects, large uncertainties remain regarding the mechanistic basis of drought legacy effects on wood formation and the ability of individual species to cope with increasingly drier growing conditions and rising year-to-year climatic variability. To unravel the complexity of climate-growth interactions and assess species-specific responses to severe droughts, we combined forward modeling of tree growth (VS-lite model) with correlation analyses against climate (temperature, precipitation, and the SPEI-3 moisture index) and growth responses to extreme drought events from multiple tree-ring parameters (tree-width and blue intensity parameters). We used an extensive dataset with over 1000 tree-ring samples from 23 nature forest reserves across an altitudinal range in Czechia and Slovakia. Our results revealed substantial spatiotemporal variability in growth responses to summer season temperature and moisture availability across species and tree-ring parameters. However, a general trend of increasing spring moisture-growth sensitivity in recent decades was observed in the Scots pine mountain forests and lowland forests of both species. The VS-lite model effectively captured nonstationary climate-growth relationships and accurately estimated high-frequency growth variability, indicating a significant incidence of regional drought events and growth reductions. Notably, growth reductions during extreme drought years and discrete legacy effects identified in individual wood components were most pronounced in the lowland forests. Together with the observed growth declines in recent decades, these findings suggest an increasing vulnerability of Norway spruce and Scots pine in dry lowlands under intensifying climatic constraints.

Keywords: dendroclimatology, Vaganova–Shashkin lite, conifers, central Europe, drought, blue intensity

Procedia PDF Downloads 44
63 Prosodic Transfer in Foreign Language Learning: A Phonetic Crosscheck of Intonation and F₀ Range between Italian and German Native and Non-Native Speakers

Authors: Violetta Cataldo, Renata Savy, Simona Sbranna

Abstract:

Background: Foreign Language Learning (FLL) is characterised by prosodic transfer phenomena regarding pitch accents placement, intonation patterns, and pitch range excursion from the learners’ mother tongue to their Foreign Language (FL) which suggests that the gradual development of general linguistic competence in FL does not imply an equally correspondent improvement of the prosodic competence. Topic: The present study aims to monitor the development of prosodic competence of learners of Italian and German throughout the FLL process. The primary object of this study is to investigate the intonational features and the f₀ range excursion of Italian and German from a cross-linguistic perspective; analyses of native speakers’ productions point out the differences between this pair of languages and provide models for the Target Language (TL). A following crosscheck compares the L2 productions in Italian and German by non-native speakers to the Target Language models, in order to verify the occurrence of prosodic interference phenomena, i.e., type, degree, and modalities. Methodology: The subjects of the research are university students belonging to two groups: Italian native speakers learning German as FL and German native speakers learning Italian as FL. Both of them have been divided into three subgroups according to the FL proficiency level (beginners, intermediate, advanced). The dataset consists of wh-questions placed in situational contexts uttered in both speakers’ L1 and FL. Using a phonetic approach, analyses have considered three domains of intonational contours (Initial Profile, Nuclear Accent, and Terminal Contour) and two dimensions of the f₀ range parameter (span and level), which provide a basis for comparison between L1 and L2 productions. Findings: Results highlight a strong presence of prosodic transfer phenomena affecting L2 productions in the majority of both Italian and German learners, irrespective of their FL proficiency level; the transfer concerns all the three domains of the contour taken into account, although with different modalities and characteristics. Currently, L2 productions of German learners show a pitch span compression on the domain of the Terminal Contour compared to their L1 towards the TL; furthermore, German learners tend to use lower pitch range values in deviation from their L1 when improving their general linguistic competence in Italian FL proficiency level. Results regarding pitch range span and level in L2 productions by Italian learners are still in progress. At present, they show a similar tendency to expand the pitch span and to raise the pitch level, which also reveals a deviation from the L1 possibly in the direction of German TL. Conclusion: Intonational features seem to be 'resistant' parameters to which learners appear not to be particularly sensitive. By contrast, they show a certain sensitiveness to FL pitch range dimensions. Making clear which the most resistant and the most sensitive parameters are when learning FL prosody could lay groundwork for the development of prosodic trainings thanks to which learners could finally acquire a clear and natural pronunciation and intonation.

Keywords: foreign language learning, German, Italian, L2 prosody, pitch range, transfer

Procedia PDF Downloads 273
62 Employing Remotely Sensed Soil and Vegetation Indices and Predicting ‎by Long ‎Short-Term Memory to Irrigation Scheduling Analysis

Authors: Elham Koohikerade, Silvio Jose Gumiere

Abstract:

In this research, irrigation is highlighted as crucial for improving both the yield and quality of ‎potatoes due to their high sensitivity to soil moisture changes. The study presents a hybrid Long ‎Short-Term Memory (LSTM) model aimed at optimizing irrigation scheduling in potato fields in ‎Quebec City, Canada. This model integrates model-based and satellite-derived datasets to simulate ‎soil moisture content, addressing the limitations of field data. Developed under the guidance of the ‎Food and Agriculture Organization (FAO), the simulation approach compensates for the lack of direct ‎soil sensor data, enhancing the LSTM model's predictions. The model was calibrated using indices ‎like Surface Soil Moisture (SSM), Normalized Vegetation Difference Index (NDVI), Enhanced ‎Vegetation Index (EVI), and Normalized Multi-band Drought Index (NMDI) to effectively forecast ‎soil moisture reductions. Understanding soil moisture and plant development is crucial for assessing ‎drought conditions and determining irrigation needs. This study validated the spectral characteristics ‎of vegetation and soil using ECMWF Reanalysis v5 (ERA5) and Moderate Resolution Imaging ‎Spectrometer (MODIS) data from 2019 to 2023, collected from agricultural areas in Dolbeau and ‎Peribonka, Quebec. Parameters such as surface volumetric soil moisture (0-7 cm), NDVI, EVI, and ‎NMDI were extracted from these images. A regional four-year dataset of soil and vegetation moisture ‎was developed using a machine learning approach combining model-based and satellite-based ‎datasets. The LSTM model predicts soil moisture dynamics hourly across different locations and ‎times, with its accuracy verified through cross-validation and comparison with existing soil moisture ‎datasets. The model effectively captures temporal dynamics, making it valuable for applications ‎requiring soil moisture monitoring over time, such as anomaly detection and memory analysis. By ‎identifying typical peak soil moisture values and observing distribution shapes, irrigation can be ‎scheduled to maintain soil moisture within Volumetric Soil Moisture (VSM) values of 0.25 to 0.30 ‎m²/m², avoiding under and over-watering. The strong correlations between parcels suggest that a ‎uniform irrigation strategy might be effective across multiple parcels, with adjustments based on ‎specific parcel characteristics and historical data trends. The application of the LSTM model to ‎predict soil moisture and vegetation indices yielded mixed results. While the model effectively ‎captures the central tendency and temporal dynamics of soil moisture, it struggles with accurately ‎predicting EVI, NDVI, and NMDI.‎

Keywords: irrigation scheduling, LSTM neural network, remotely sensed indices, soil and vegetation ‎monitoring

Procedia PDF Downloads 20
61 Predicting Provider Service Time in Outpatient Clinics Using Artificial Intelligence-Based Models

Authors: Haya Salah, Srinivas Sharan

Abstract:

Healthcare facilities use appointment systems to schedule their appointments and to manage access to their medical services. With the growing demand for outpatient care, it is now imperative to manage physician's time effectively. However, high variation in consultation duration affects the clinical scheduler's ability to estimate the appointment duration and allocate provider time appropriately. Underestimating consultation times can lead to physician's burnout, misdiagnosis, and patient dissatisfaction. On the other hand, appointment durations that are longer than required lead to doctor idle time and fewer patient visits. Therefore, a good estimation of consultation duration has the potential to improve timely access to care, resource utilization, quality of care, and patient satisfaction. Although the literature on factors influencing consultation length abound, little work has done to predict it using based data-driven approaches. Therefore, this study aims to predict consultation duration using supervised machine learning algorithms (ML), which predicts an outcome variable (e.g., consultation) based on potential features that influence the outcome. In particular, ML algorithms learn from a historical dataset without explicitly being programmed and uncover the relationship between the features and outcome variable. A subset of the data used in this study has been obtained from the electronic medical records (EMR) of four different outpatient clinics located in central Pennsylvania, USA. Also, publicly available information on doctor's characteristics such as gender and experience has been extracted from online sources. This research develops three popular ML algorithms (deep learning, random forest, gradient boosting machine) to predict the treatment time required for a patient and conducts a comparative analysis of these algorithms with respect to predictive performance. The findings of this study indicate that ML algorithms have the potential to predict the provider service time with superior accuracy. While the current approach of experience-based appointment duration estimation adopted by the clinic resulted in a mean absolute percentage error of 25.8%, the Deep learning algorithm developed in this study yielded the best performance with a MAPE of 12.24%, followed by gradient boosting machine (13.26%) and random forests (14.71%). Besides, this research also identified the critical variables affecting consultation duration to be patient type (new vs. established), doctor's experience, zip code, appointment day, and doctor's specialty. Moreover, several practical insights are obtained based on the comparative analysis of the ML algorithms. The machine learning approach presented in this study can serve as a decision support tool and could be integrated into the appointment system for effectively managing patient scheduling.

Keywords: clinical decision support system, machine learning algorithms, patient scheduling, prediction models, provider service time

Procedia PDF Downloads 105
60 Harnessing Emerging Creative Technology for Knowledge Discovery of Multiwavelenght Datasets

Authors: Basiru Amuneni

Abstract:

Astronomy is one domain with a rise in data. Traditional tools for data management have been employed in the quest for knowledge discovery. However, these traditional tools become limited in the face of big. One means of maximizing knowledge discovery for big data is the use of scientific visualisation. The aim of the work is to explore the possibilities offered by emerging creative technologies of Virtual Reality (VR) systems and game engines to visualize multiwavelength datasets. Game Engines are primarily used for developing video games, however their advanced graphics could be exploited for scientific visualization which provides a means to graphically illustrate scientific data to ease human comprehension. Modern astronomy is now in the era of multiwavelength data where a single galaxy for example, is captured by the telescope several times and at different electromagnetic wavelength to have a more comprehensive picture of the physical characteristics of the galaxy. Visualising this in an immersive environment would be more intuitive and natural for an observer. This work presents a standalone VR application that accesses galaxy FITS files. The application was built using the Unity Game Engine for the graphics underpinning and the OpenXR API for the VR infrastructure. The work used a methodology known as Design Science Research (DSR) which entails the act of ‘using design as a research method or technique’. The key stages of the galaxy modelling pipeline are FITS data preparation, Galaxy Modelling, Unity 3D Visualisation and VR Display. The FITS data format cannot be read by the Unity Game Engine directly. A DLL (CSHARPFITS) which provides a native support for reading and writing FITS files was used. The Galaxy modeller uses an approach that integrates cleaned FITS image pixels into the graphics pipeline of the Unity3d game Engine. The cleaned FITS images are then input to the galaxy modeller pipeline phase, which has a pre-processing script that extracts, pixel, galaxy world position, and colour maps the FITS image pixels. The user can visualise image galaxies in different light bands, control the blend of the image with similar images from different sources or fuse images for a holistic view. The framework will allow users to build tools to realise complex workflows for public outreach and possibly scientific work with increased scalability, near real time interactivity with ease of access. The application is presented in an immersive environment and can use all commercially available headset built on the OpenXR API. The user can select galaxies in the scene, teleport to the galaxy, pan, zoom in/out, and change colour gradients of the galaxy. The findings and design lessons learnt in the implementation of different use cases will contribute to the development and design of game-based visualisation tools in immersive environment by enabling informed decisions to be made.

Keywords: astronomy, visualisation, multiwavelenght dataset, virtual reality

Procedia PDF Downloads 69
59 Forecasting Thermal Energy Demand in District Heating and Cooling Systems Using Long Short-Term Memory Neural Networks

Authors: Kostas Kouvaris, Anastasia Eleftheriou, Georgios A. Sarantitis, Apostolos Chondronasios

Abstract:

To achieve the objective of almost zero carbon energy solutions by 2050, the EU needs to accelerate the development of integrated, highly efficient and environmentally friendly solutions. In this direction, district heating and cooling (DHC) emerges as a viable and more efficient alternative to conventional, decentralized heating and cooling systems, enabling a combination of more efficient renewable and competitive energy supplies. In this paper, we develop a forecasting tool for near real-time local weather and thermal energy demand predictions for an entire DHC network. In this fashion, we are able to extend the functionality and to improve the energy efficiency of the DHC network by predicting and adjusting the heat load that is distributed from the heat generation plant to the connected buildings by the heat pipe network. Two case-studies are considered; one for Vransko, Slovenia and one for Montpellier, France. The data consists of i) local weather data, such as humidity, temperature, and precipitation, ii) weather forecast data, such as the outdoor temperature and iii) DHC operational parameters, such as the mass flow rate, supply and return temperature. The external temperature is found to be the most important energy-related variable for space conditioning, and thus it is used as an external parameter for the energy demand models. For the development of the forecasting tool, we use state-of-the-art deep neural networks and more specifically, recurrent networks with long-short-term memory cells, which are able to capture complex non-linear relations among temporal variables. Firstly, we develop models to forecast outdoor temperatures for the next 24 hours using local weather data for each case-study. Subsequently, we develop models to forecast thermal demand for the same period, taking under consideration past energy demand values as well as the predicted temperature values from the weather forecasting models. The contributions to the scientific and industrial community are three-fold, and the empirical results are highly encouraging. First, we are able to predict future thermal demand levels for the two locations under consideration with minimal errors. Second, we examine the impact of the outdoor temperature on the predictive ability of the models and how the accuracy of the energy demand forecasts decreases with the forecast horizon. Third, we extend the relevant literature with a new dataset of thermal demand and examine the performance and applicability of machine learning techniques to solve real-world problems. Overall, the solution proposed in this paper is in accordance with EU targets, providing an automated smart energy management system, decreasing human errors and reducing excessive energy production.

Keywords: machine learning, LSTMs, district heating and cooling system, thermal demand

Procedia PDF Downloads 123
58 Computational Fluid Dynamics Simulation of a Nanofluid-Based Annular Solar Collector with Different Metallic Nano-Particles

Authors: Sireetorn Kuharat, Anwar Beg

Abstract:

Motivation- Solar energy constitutes the most promising renewable energy source on earth. Nanofluids are a very successful family of engineered fluids, which contain well-dispersed nanoparticles suspended in a stable base fluid. The presence of metallic nanoparticles (e.g. gold, silver, copper, aluminum etc) significantly improves the thermo-physical properties of the host fluid and generally results in a considerable boost in thermal conductivity, density, and viscosity of nanofluid compared with the original base (host) fluid. This modification in fundamental thermal properties has profound implications in influencing the convective heat transfer process in solar collectors. The potential for improving solar collector direct absorber efficiency is immense and to gain a deeper insight into the impact of different metallic nanoparticles on efficiency and temperature enhancement, in the present work, we describe recent computational fluid dynamics simulations of an annular solar collector system. The present work studies several different metallic nano-particles and compares their performance. Methodologies- A numerical study of convective heat transfer in an annular pipe solar collector system is conducted. The inner tube contains pure water and the annular region contains nanofluid. Three-dimensional steady-state incompressible laminar flow comprising water- (and other) based nanofluid containing a variety of metallic nanoparticles (copper oxide, aluminum oxide, and titanium oxide nanoparticles) is examined. The Tiwari-Das model is deployed for which thermal conductivity, specific heat capacity and viscosity of the nanofluid suspensions is evaluated as a function of solid nano-particle volume fraction. Radiative heat transfer is also incorporated using the ANSYS solar flux and Rosseland radiative models. The ANSYS FLUENT finite volume code (version 18.1) is employed to simulate the thermo-fluid characteristics via the SIMPLE algorithm. Mesh-independence tests are conducted. Validation of the simulations is also performed with a computational Harlow-Welch MAC (Marker and Cell) finite difference method and excellent correlation achieved. The influence of volume fraction on temperature, velocity, pressure contours is computed and visualized. Main findings- The best overall performance is achieved with copper oxide nanoparticles. Thermal enhancement is generally maximized when water is utilized as the base fluid, although in certain cases ethylene glycol also performs very efficiently. Increasing nanoparticle solid volume fraction elevates temperatures although the effects are less prominent in aluminum and titanium oxide nanofluids. Significant improvement in temperature distributions is achieved with copper oxide nanofluid and this is attributed to the superior thermal conductivity of copper compared to other metallic nano-particles studied. Important fluid dynamic characteristics are also visualized including circulation and temperature shoots near the upper region of the annulus. Radiative flux is observed to enhance temperatures significantly via energization of the nanofluid although again the best elevation in performance is attained consistently with copper oxide. Conclusions-The current study generalizes previous investigations by considering multiple metallic nano-particles and furthermore provides a good benchmark against which to calibrate experimental tests on a new solar collector configuration currently being designed at Salford University. Important insights into the thermal conductivity and viscosity with metallic nano-particles is also provided in detail. The analysis is also extendable to other metallic nano-particles including gold and zinc.

Keywords: heat transfer, annular nanofluid solar collector, ANSYS FLUENT, metallic nanoparticles

Procedia PDF Downloads 128
57 Evaluating the Impact of Early Maternal Incarceration on Male Delinquent Behavior during Emerging Adulthood through the Mediating Mechanism of Mastery

Authors: Richard Abel

Abstract:

In the United States, increased incarceration rates have caused many adolescents to feel the strain of parental absence. This absence is then manifest through adolescent feelings of parental rejection. Additionally, upon reentry maternal incarceration may be related to adolescents experienced perceived excessive disciple. It is possible parents engage in this manner of discipline attempting to prevent the child from taking the same path to incarceration as the parent. According to General Strain Theory, adolescents encountering strain are likely to experience negative emotions. The emotion that is most likely to lead to delinquency is anger through reduced inhibitions and motivation to act. Additionally, males are more likely to engage in delinquent behavior, regardless of experiencing strain. This is not the case for every male who experiences maternal incarceration, parental rejection, excessive discipline, or anger. There are protective factors that enable agency within individuals. One such protective factor is mastery, or the perception that one is in control of his or her own future. The model proposed in this research suggests maternal incarceration is associated with increased parental rejection and excessive discipline in males. Males experiencing parental rejection and excessive discipline are likely to experience increased anger, which is then associated with increases in delinquent behavior. This model explores whether agency, in the form of mastery, mediates the relationship between strains and negative emotions, or between negative emotions and delinquent behavior. The Kaplan Longitudinal and Multigenerational Study (KLAMS) dataset is uniquely situated to analyze this model providing longitudinal data collected from both parents and their offspring. Maternal incarceration is constructed using parental responses such that the mother was incarcerated after the child’s birth, and any incarceration that happened prior to birth is excluded. The remaining variables of the study are all constructed from varying waves of the adolescent survey. Parental rejection, along with control variables for age, race, parental socioeconomic status, neighborhood effects, delinquent peers, and prior delinquent behavior are all constructed using Wave I data. To increase causal inference, the negative emotion of anger and the mediating variable of mastery are measured during Wave II. Lastly, delinquent behavior is measured at Wave III. Results of the analysis show expected relationships such that adolescent males encountering maternal incarceration show increased perception of parental rejection and excessive discipline. Additionally, there is a positive relationship between parental rejection and excessive discipline at Wave I and feelings of anger at Wave II for males. For males experiencing either of these strains in Wave I, feelings of anger in Wave II are found to be associated with increased delinquent behavior in Wave III. Mastery was found to mediate the relationship between both parental rejection and excessive discipline and anger, but no such mediation occurs in the relationship between anger and delinquency, regardless of the strain being experienced. These findings suggest adolescent males who feel they are in control of their own lives are less likely to experience negative emotions produced by the occurrence of strain, thereby decreasing male engagement in delinquent behavior later in life.

Keywords: delinquency, mastery, maternal incarceration, strain

Procedia PDF Downloads 114
56 Self-Organizing Maps for Exploration of Partially Observed Data and Imputation of Missing Values in the Context of the Manufacture of Aircraft Engines

Authors: Sara Rejeb, Catherine Duveau, Tabea Rebafka

Abstract:

To monitor the production process of turbofan aircraft engines, multiple measurements of various geometrical parameters are systematically recorded on manufactured parts. Engine parts are subject to extremely high standards as they can impact the performance of the engine. Therefore, it is essential to analyze these databases to better understand the influence of the different parameters on the engine's performance. Self-organizing maps are unsupervised neural networks which achieve two tasks simultaneously: they visualize high-dimensional data by projection onto a 2-dimensional map and provide clustering of the data. This technique has become very popular for data exploration since it provides easily interpretable results and a meaningful global view of the data. As such, self-organizing maps are usually applied to aircraft engine condition monitoring. As databases in this field are huge and complex, they naturally contain multiple missing entries for various reasons. The classical Kohonen algorithm to compute self-organizing maps is conceived for complete data only. A naive approach to deal with partially observed data consists in deleting items or variables with missing entries. However, this requires a sufficient number of complete individuals to be fairly representative of the population; otherwise, deletion leads to a considerable loss of information. Moreover, deletion can also induce bias in the analysis results. Alternatively, one can first apply a common imputation method to create a complete dataset and then apply the Kohonen algorithm. However, the choice of the imputation method may have a strong impact on the resulting self-organizing map. Our approach is to address simultaneously the two problems of computing a self-organizing map and imputing missing values, as these tasks are not independent. In this work, we propose an extension of self-organizing maps for partially observed data, referred to as missSOM. First, we introduce a criterion to be optimized, that aims at defining simultaneously the best self-organizing map and the best imputations for the missing entries. As such, missSOM is also an imputation method for missing values. To minimize the criterion, we propose an iterative algorithm that alternates the learning of a self-organizing map and the imputation of missing values. Moreover, we develop an accelerated version of the algorithm by entwining the iterations of the Kohonen algorithm with the updates of the imputed values. This method is efficiently implemented in R and will soon be released on CRAN. Compared to the standard Kohonen algorithm, it does not come with any additional cost in terms of computing time. Numerical experiments illustrate that missSOM performs well in terms of both clustering and imputation compared to the state of the art. In particular, it turns out that missSOM is robust to the missingness mechanism, which is in contrast to many imputation methods that are appropriate for only a single mechanism. This is an important property of missSOM as, in practice, the missingness mechanism is often unknown. An application to measurements on one type of part is also provided and shows the practical interest of missSOM.

Keywords: imputation method of missing data, partially observed data, robustness to missingness mechanism, self-organizing maps

Procedia PDF Downloads 133