Search results for: HRV metrics
160 Transformer-Driven Multi-Category Classification for an Automated Academic Strand Recommendation Framework
Authors: Ma Cecilia Siva
Abstract:
This study introduces a Bidirectional Encoder Representations from Transformers (BERT)-based machine learning model aimed at improving educational counseling by automating the process of recommending academic strands for students. The framework is designed to streamline and enhance the strand selection process by analyzing students' profiles and suggesting suitable academic paths based on their interests, strengths, and goals. Data was gathered from a sample of 200 grade 10 students, which included personal essays and survey responses relevant to strand alignment. After thorough preprocessing, the text data was tokenized, label-encoded, and input into a fine-tuned BERT model set up for multi-label classification. The model was optimized for balanced accuracy and computational efficiency, featuring a multi-category classification layer with sigmoid activation for independent strand predictions. Performance metrics showed an F1 score of 88%, indicating a well-balanced model with precision at 80% and recall at 100%, demonstrating its effectiveness in providing reliable recommendations while reducing irrelevant strand suggestions. To facilitate practical use, the final deployment phase created a recommendation framework that processes new student data through the trained model and generates personalized academic strand suggestions. This automated recommendation system presents a scalable solution for academic guidance, potentially enhancing student satisfaction and alignment with educational objectives. The study's findings indicate that expanding the data set, integrating additional features, and refining the model iteratively could improve the framework's accuracy and broaden its applicability in various educational contexts.Keywords: tokenized, sigmoid activation, transformer, multi category classification
Procedia PDF Downloads 8159 Load Forecasting in Microgrid Systems with R and Cortana Intelligence Suite
Authors: F. Lazzeri, I. Reiter
Abstract:
Energy production optimization has been traditionally very important for utilities in order to improve resource consumption. However, load forecasting is a challenging task, as there are a large number of relevant variables that must be considered, and several strategies have been used to deal with this complex problem. This is especially true also in microgrids where many elements have to adjust their performance depending on the future generation and consumption conditions. The goal of this paper is to present a solution for short-term load forecasting in microgrids, based on three machine learning experiments developed in R and web services built and deployed with different components of Cortana Intelligence Suite: Azure Machine Learning, a fully managed cloud service that enables to easily build, deploy, and share predictive analytics solutions; SQL database, a Microsoft database service for app developers; and PowerBI, a suite of business analytics tools to analyze data and share insights. Our results show that Boosted Decision Tree and Fast Forest Quantile regression methods can be very useful to predict hourly short-term consumption in microgrids; moreover, we found that for these types of forecasting models, weather data (temperature, wind, humidity and dew point) can play a crucial role in improving the accuracy of the forecasting solution. Data cleaning and feature engineering methods performed in R and different types of machine learning algorithms (Boosted Decision Tree, Fast Forest Quantile and ARIMA) will be presented, and results and performance metrics discussed.
Keywords: time-series, features engineering methods for forecasting, energy demand forecasting, Azure Machine Learning
Procedia PDF Downloads 297158 Project Progress Prediction in Software Devlopment Integrating Time Prediction Algorithms and Large Language Modeling
Authors: Dong Wu, Michael Grenn
Abstract:
Managing software projects effectively is crucial for meeting deadlines, ensuring quality, and managing resources well. Traditional methods often struggle with predicting project timelines accurately due to uncertain schedules and complex data. This study addresses these challenges by combining time prediction algorithms with Large Language Models (LLMs). It makes use of real-world software project data to construct and validate a model. The model takes detailed project progress data such as task completion dynamic, team Interaction and development metrics as its input and outputs predictions of project timelines. To evaluate the effectiveness of this model, a comprehensive methodology is employed, involving simulations and practical applications in a variety of real-world software project scenarios. This multifaceted evaluation strategy is designed to validate the model's significant role in enhancing forecast accuracy and elevating overall management efficiency, particularly in complex software project environments. The results indicate that the integration of time prediction algorithms with LLMs has the potential to optimize software project progress management. These quantitative results suggest the effectiveness of the method in practical applications. In conclusion, this study demonstrates that integrating time prediction algorithms with LLMs can significantly improve the predictive accuracy and efficiency of software project management. This offers an advanced project management tool for the industry, with the potential to improve operational efficiency, optimize resource allocation, and ensure timely project completion.Keywords: software project management, time prediction algorithms, large language models (LLMS), forecast accuracy, project progress prediction
Procedia PDF Downloads 79157 Predicting Low Birth Weight Using Machine Learning: A Study on 53,637 Ethiopian Birth Data
Authors: Kehabtimer Shiferaw Kotiso, Getachew Hailemariam, Abiy Seifu Estifanos
Abstract:
Introduction: Despite the highest share of low birth weight (LBW) for neonatal mortality and morbidity, predicting births with LBW for better intervention preparation is challenging. This study aims to predict LBW using a dataset encompassing 53,637 birth cohorts collected from 36 primary hospitals across seven regions in Ethiopia from February 2022 to June 2024. Methods: We identified ten explanatory variables related to maternal and neonatal characteristics, including maternal education, age, residence, history of miscarriage or abortion, history of preterm birth, type of pregnancy, number of livebirths, number of stillbirths, antenatal care frequency, and sex of the fetus to predict LBW. Using WEKA 3.8.2, we developed and compared seven machine learning algorithms. Data preprocessing included handling missing values, outlier detection, and ensuring data integrity in birth weight records. Model performance was evaluated through metrics such as accuracy, precision, recall, F1-score, and area under the Receiver Operating Characteristic curve (ROC AUC) using 10-fold cross-validation. Results: The results demonstrated that the decision tree, J48, logistic regression, and gradient boosted trees model achieved the highest accuracy (94.5% to 94.6%) with a precision of 93.1% to 93.3%, F1-score of 92.7% to 93.1%, and ROC AUC of 71.8% to 76.6%. Conclusion: This study demonstrates the effectiveness of machine learning models in predicting LBW. The high accuracy and recall rates achieved indicate that these models can serve as valuable tools for healthcare policymakers and providers in identifying at-risk newborns and implementing timely interventions to achieve the sustainable developmental goal (SDG) related to neonatal mortality.Keywords: low birth weight, machine learning, classification, neonatal mortality, Ethiopia
Procedia PDF Downloads 21156 Evaluation of IMERG Performance at Estimating the Rainfall Properties through Convective and Stratiform Rain Events in a Semi-Arid Region of Mexico
Authors: Eric Muñoz de la Torre, Julián González Trinidad, Efrén González Ramírez
Abstract:
Rain varies greatly in its duration, intensity, and spatial coverage, it is important to have sub-daily rainfall data for various applications, including risk prevention. However, the ground measurements are limited by the low and irregular density of rain gauges. An alternative to this problem are the Satellite Precipitation Products (SPPs) that use passive microwave and infrared sensors to estimate rainfall, as IMERG, however, these SPPs have to be validated before their application. The aim of this study is to evaluate the performance of the IMERG: Integrated Multi-satellitE Retrievals for Global Precipitation Measurament final run V06B SPP in a semi-arid region of Mexico, using 4 automatic rain gauges (pluviographs) sub-daily data of October 2019 and June to September 2021, using the Minimum inter-event Time (MIT) criterion to separate unique rain events with a dry period of 10 hrs. for the purpose of evaluating the rainfall properties (depth, duration and intensity). Point to pixel analysis, continuous, categorical, and volumetric statistical metrics were used. Results show that IMERG is capable to estimate the rainfall depth with a slight overestimation but is unable to identify the real duration and intensity of the rain events, showing large overestimations and underestimations, respectively. The study zone presented 80 to 85 % of convective rain events, the rest were stratiform rain events, classified by the depth magnitude variation of IMERG pixels and pluviographs. IMERG showed poorer performance at detecting the first ones but had a good performance at estimating stratiform rain events that are originated by Cold Fronts.Keywords: IMERG, rainfall, rain gauge, remote sensing, statistical evaluation
Procedia PDF Downloads 69155 AI-Driven Solutions for Optimizing Master Data Management
Authors: Srinivas Vangari
Abstract:
In the era of big data, ensuring the accuracy, consistency, and reliability of critical data assets is crucial for data-driven enterprises. Master Data Management (MDM) plays a crucial role in this endeavor. This paper investigates the role of Artificial Intelligence (AI) in enhancing MDM, focusing on how AI-driven solutions can automate and optimize various stages of the master data lifecycle. By integrating AI (Quantitative and Qualitative Analysis) into processes such as data creation, maintenance, enrichment, and usage, organizations can achieve significant improvements in data quality and operational efficiency. Quantitative analysis is employed to measure the impact of AI on key metrics, including data accuracy, processing speed, and error reduction. For instance, our study demonstrates an 18% improvement in data accuracy and a 75% reduction in duplicate records across multiple systems post-AI implementation. Furthermore, AI’s predictive maintenance capabilities reduced data obsolescence by 22%, as indicated by statistical analyses of data usage patterns over a 12-month period. Complementing this, a qualitative analysis delves into the specific AI-driven strategies that enhance MDM practices, such as automating data entry and validation, which resulted in a 28% decrease in manual errors. Insights from case studies highlight how AI-driven data cleansing processes reduced inconsistencies by 25% and how AI-powered enrichment strategies improved data relevance by 24%, thus boosting decision-making accuracy. The findings demonstrate that AI significantly enhances data quality and integrity, leading to improved enterprise performance through cost reduction, increased compliance, and more accurate, real-time decision-making. These insights underscore the value of AI as a critical tool in modern data management strategies, offering a competitive edge to organizations that leverage its capabilities.Keywords: artificial intelligence, master data management, data governance, data quality
Procedia PDF Downloads 17154 Optimizing Energy Efficiency: Leveraging Big Data Analytics and AWS Services for Buildings and Industries
Authors: Gaurav Kumar Sinha
Abstract:
In an era marked by increasing concerns about energy sustainability, this research endeavors to address the pressing challenge of energy consumption in buildings and industries. This study delves into the transformative potential of AWS services in optimizing energy efficiency. The research is founded on the recognition that effective management of energy consumption is imperative for both environmental conservation and economic viability. Buildings and industries account for a substantial portion of global energy use, making it crucial to develop advanced techniques for analysis and reduction. This study sets out to explore the integration of AWS services with big data analytics to provide innovative solutions for energy consumption analysis. Leveraging AWS's cloud computing capabilities, scalable infrastructure, and data analytics tools, the research aims to develop efficient methods for collecting, processing, and analyzing energy data from diverse sources. The core focus is on creating predictive models and real-time monitoring systems that enable proactive energy management. By harnessing AWS's machine learning and data analytics capabilities, the research seeks to identify patterns, anomalies, and optimization opportunities within energy consumption data. Furthermore, this study aims to propose actionable recommendations for reducing energy consumption in buildings and industries. By combining AWS services with metrics-driven insights, the research strives to facilitate the implementation of energy-efficient practices, ultimately leading to reduced carbon emissions and cost savings. The integration of AWS services not only enhances the analytical capabilities but also offers scalable solutions that can be customized for different building and industrial contexts. The research also recognizes the potential for AWS-powered solutions to promote sustainable practices and support environmental stewardship.Keywords: energy consumption analysis, big data analytics, AWS services, energy efficiency
Procedia PDF Downloads 64153 Semi-Automatic Segmentation of Mitochondria on Transmission Electron Microscopy Images Using Live-Wire and Surface Dragging Methods
Authors: Mahdieh Farzin Asanjan, Erkan Unal Mumcuoglu
Abstract:
Mitochondria are cytoplasmic organelles of the cell, which have a significant role in the variety of cellular metabolic functions. Mitochondria act as the power plants of the cell and are surrounded by two membranes. Significant morphological alterations are often due to changes in mitochondrial functions. A powerful technique in order to study the three-dimensional (3D) structure of mitochondria and its alterations in disease states is Electron microscope tomography. Detection of mitochondria in electron microscopy images due to the presence of various subcellular structures and imaging artifacts is a challenging problem. Another challenge is that each image typically contains more than one mitochondrion. Hand segmentation of mitochondria is tedious and time-consuming and also special knowledge about the mitochondria is needed. Fully automatic segmentation methods lead to over-segmentation and mitochondria are not segmented properly. Therefore, semi-automatic segmentation methods with minimum manual effort are required to edit the results of fully automatic segmentation methods. Here two editing tools were implemented by applying spline surface dragging and interactive live-wire segmentation tools. These editing tools were applied separately to the results of fully automatic segmentation. 3D extension of these tools was also studied and tested. Dice coefficients of 2D and 3D for surface dragging using splines were 0.93 and 0.92. This metric for 2D and 3D for live-wire method were 0.94 and 0.91 respectively. The root mean square symmetric surface distance values of 2D and 3D for surface dragging was measured as 0.69, 0.93. The same metrics for live-wire tool were 0.60 and 2.11. Comparing the results of these editing tools with the results of automatic segmentation method, it shows that these editing tools, led to better results and these results were more similar to ground truth image but the required time was higher than hand-segmentation timeKeywords: medical image segmentation, semi-automatic methods, transmission electron microscopy, surface dragging using splines, live-wire
Procedia PDF Downloads 169152 Towards an Enhanced Quality of IPTV Media Server Architecture over Software Defined Networking
Authors: Esmeralda Hysenbelliu
Abstract:
The aim of this paper is to present the QoE (Quality of Experience) IPTV SDN-based media streaming server enhanced architecture for configuring, controlling, management and provisioning the improved delivery of IPTV service application with low cost, low bandwidth, and high security. Furthermore, it is given a virtual QoE IPTV SDN-based topology to provide an improved IPTV service based on QoE Control and Management of multimedia services functionalities. Inside OpenFlow SDN Controller there are enabled in high flexibility and efficiency Service Load-Balancing Systems; based on the Loading-Balance module and based on GeoIP Service. This two Load-balancing system improve IPTV end-users Quality of Experience (QoE) with optimal management of resources greatly. Through the key functionalities of OpenFlow SDN controller, this approach produced several important features, opportunities for overcoming the critical QoE metrics for IPTV Service like achieving incredible Fast Zapping time (Channel Switching time) < 0.1 seconds. This approach enabled Easy and Powerful Transcoding system via FFMPEG encoder. It has the ability to customize streaming dimensions bitrates, latency management and maximum transfer rates ensuring delivering of IPTV streaming services (Audio and Video) in high flexibility, low bandwidth and required performance. This QoE IPTV SDN-based media streaming architecture unlike other architectures provides the possibility of Channel Exchanging between several IPTV service providers all over the word. This new functionality brings many benefits as increasing the number of TV channels received by end –users with low cost, decreasing stream failure time (Channel Failure time < 0.1 seconds) and improving the quality of streaming services.Keywords: improved quality of experience (QoE), OpenFlow SDN controller, IPTV service application, softwarization
Procedia PDF Downloads 147151 Quality of Service Based Routing Algorithm for Real Time Applications in MANETs Using Ant Colony and Fuzzy Logic
Authors: Farahnaz Karami
Abstract:
Routing is an important, challenging task in mobile ad hoc networks due to node mobility, lack of central control, unstable links, and limited resources. An ant colony has been found to be an attractive technique for routing in Mobile Ad Hoc Networks (MANETs). However, existing swarm intelligence based routing protocols find an optimal path by considering only one or two route selection metrics without considering correlations among such parameters making them unsuitable lonely for routing real time applications. Fuzzy logic combines multiple route selection parameters containing uncertain information or imprecise data in nature, but does not have multipath routing property naturally in order to provide load balancing. The objective of this paper is to design a routing algorithm using fuzzy logic and ant colony that can solve some of routing problems in mobile ad hoc networks, such as nodes energy consumption optimization to increase network lifetime, link failures rate reduction to increase packet delivery reliability and providing load balancing to optimize available bandwidth. In proposed algorithm, the path information will be given to fuzzy inference system by ants. Based on the available path information and considering the parameters required for quality of service (QoS), the fuzzy cost of each path is calculated and the optimal paths will be selected. NS2.35 simulation tools are used for simulation and the results are compared and evaluated with the newest QoS based algorithms in MANETs according to packet delivery ratio, end-to-end delay and routing overhead ratio criterions. The simulation results show significant improvement in the performance of these networks in terms of decreasing end-to-end delay, and routing overhead ratio, and also increasing packet delivery ratio.Keywords: mobile ad hoc networks, routing, quality of service, ant colony, fuzzy logic
Procedia PDF Downloads 64150 Improving Chest X-Ray Disease Detection with Enhanced Data Augmentation Using Novel Approach of Diverse Conditional Wasserstein Generative Adversarial Networks
Authors: Malik Muhammad Arslan, Muneeb Ullah, Dai Shihan, Daniyal Haider, Xiaodong Yang
Abstract:
Chest X-rays are instrumental in the detection and monitoring of a wide array of diseases, including viral infections such as COVID-19, tuberculosis, pneumonia, lung cancer, and various cardiac and pulmonary conditions. To enhance the accuracy of diagnosis, artificial intelligence (AI) algorithms, particularly deep learning models like Convolutional Neural Networks (CNNs), are employed. However, these deep learning models demand a substantial and varied dataset to attain optimal precision. Generative Adversarial Networks (GANs) can be employed to create new data, thereby supplementing the existing dataset and enhancing the accuracy of deep learning models. Nevertheless, GANs have their limitations, such as issues related to stability, convergence, and the ability to distinguish between authentic and fabricated data. In order to overcome these challenges and advance the detection and classification of CXR normal and abnormal images, this study introduces a distinctive technique known as DCWGAN (Diverse Conditional Wasserstein GAN) for generating synthetic chest X-ray (CXR) images. The study evaluates the effectiveness of this Idiosyncratic DCWGAN technique using the ResNet50 model and compares its results with those obtained using the traditional GAN approach. The findings reveal that the ResNet50 model trained on the DCWGAN-generated dataset outperformed the model trained on the classic GAN-generated dataset. Specifically, the ResNet50 model utilizing DCWGAN synthetic images achieved impressive performance metrics with an accuracy of 0.961, precision of 0.955, recall of 0.970, and F1-Measure of 0.963. These results indicate the promising potential for the early detection of diseases in CXR images using this Inimitable approach.Keywords: CNN, classification, deep learning, GAN, Resnet50
Procedia PDF Downloads 88149 Role of Organizational Culture in Building Sustainable Employee’s Performance in Organizations: A Case Study of Zenith Bank PLC Jalingo Taraba State Nigeria
Authors: Jerome Nyameh
Abstract:
The most valuable asset in the existence of organization is the employees and their ability in maintain appreciable level of performance which support the goal of the organization and the ability to do that depend largely on the organizational culture and culture has been considered most currently as the factor that relate positively to organizational excellence and sustainable employee’s performance over the period of time An employee engagement program will not go far without first establishing the organizational culture that is required to support sustainability. This means integrating sustainability into the overall employee’s performance, with clear vision, goals and metrics. It means having strong culture and a collaborative governance structure that has been develop as a ways of doing things in the organization for decision making and resource allocation. It requires a rewards and recognition program to support and reinforce sustainability behaviors. With such a culture in place, organization will be able to develop a strategy that fully engages employees, while fully realizing the benefits of their contributions. The study investigated empirically the role of organizational culture building sustainable employee’s performance using Zenith bank PLC a model where organizational culture will build sustainable employees performance strategy for a lasting actualization of organizational was developed. In order to achieve the research objectives of (i) to assess how organizational culture can build sustainable employee’s performance (ii) to analyze the gap that exists between organizational culture and sustainable employee’s performance in the organization, a survey questionnaires of 20 items was administered to sixty respondents. The findings of this study have practical implications for organizational leaders, managers and employees, and their organizations, particularly commercial banks in Nigeria, besides offering scope for further research in the area of organizational culture and sustainable employee’s performance. It will also show a significance and positive relationship that exist between organizational culture and sustainable employee’s performance, as means of building viable organization with cultural uniqueness and excellence performance in the world of competition.Keywords: organizational culture, sustainable employee’s performance, organizations, Zenith Bank PLC Nigeria
Procedia PDF Downloads 514148 Multi-Sensor Image Fusion for Visible and Infrared Thermal Images
Authors: Amit Kumar Happy
Abstract:
This paper is motivated by the importance of multi-sensor image fusion with a specific focus on infrared (IR) and visual image (VI) fusion for various applications, including military reconnaissance. Image fusion can be defined as the process of combining two or more source images into a single composite image with extended information content that improves visual perception or feature extraction. These images can be from different modalities like visible camera & IR thermal imager. While visible images are captured by reflected radiations in the visible spectrum, the thermal images are formed from thermal radiation (infrared) that may be reflected or self-emitted. A digital color camera captures the visible source image, and a thermal infrared camera acquires the thermal source image. In this paper, some image fusion algorithms based upon multi-scale transform (MST) and region-based selection rule with consistency verification have been proposed and presented. This research includes the implementation of the proposed image fusion algorithm in MATLAB along with a comparative analysis to decide the optimum number of levels for MST and the coefficient fusion rule. The results are presented, and several commonly used evaluation metrics are used to assess the suggested method's validity. Experiments show that the proposed approach is capable of producing good fusion results. While deploying our image fusion algorithm approaches, we observe several challenges from the popular image fusion methods. While high computational cost and complex processing steps of image fusion algorithms provide accurate fused results, they also make it hard to become deployed in systems and applications that require a real-time operation, high flexibility, and low computation ability. So, the methods presented in this paper offer good results with minimum time complexity.Keywords: image fusion, IR thermal imager, multi-sensor, multi-scale transform
Procedia PDF Downloads 115147 Investigating the Effectiveness of Multilingual NLP Models for Sentiment Analysis
Authors: Othmane Touri, Sanaa El Filali, El Habib Benlahmar
Abstract:
Natural Language Processing (NLP) has gained significant attention lately. It has proved its ability to analyze and extract insights from unstructured text data in various languages. It is found that one of the most popular NLP applications is sentiment analysis which aims to identify the sentiment expressed in a piece of text, such as positive, negative, or neutral, in multiple languages. While there are several multilingual NLP models available for sentiment analysis, there is a need to investigate their effectiveness in different contexts and applications. In this study, we aim to investigate the effectiveness of different multilingual NLP models for sentiment analysis on a dataset of online product reviews in multiple languages. The performance of several NLP models, including Google Cloud Natural Language API, Microsoft Azure Cognitive Services, Amazon Comprehend, Stanford CoreNLP, spaCy, and Hugging Face Transformers are being compared. The models based on several metrics, including accuracy, precision, recall, and F1 score, are being evaluated and compared to their performance across different categories of product reviews. In order to run the study, preprocessing of the dataset has been performed by cleaning and tokenizing the text data in multiple languages. Then training and testing each model has been applied using a cross-validation approach where randomly dividing the dataset into training and testing sets and repeating the process multiple times has been used. A grid search approach to optimize the hyperparameters of each model and select the best-performing model for each category of product reviews and language has been applied. The findings of this study provide insights into the effectiveness of different multilingual NLP models for Multilingual Sentiment Analysis and their suitability for different languages and applications. The strengths and limitations of each model were identified, and recommendations for selecting the most performant model based on the specific requirements of a project were provided. This study contributes to the advancement of research methods in multilingual NLP and provides a practical guide for researchers and practitioners in the field.Keywords: NLP, multilingual, sentiment analysis, texts
Procedia PDF Downloads 102146 Data Mining Model for Predicting the Status of HIV Patients during Drug Regimen Change
Authors: Ermias A. Tegegn, Million Meshesha
Abstract:
Human Immunodeficiency Virus and Acquired Immunodeficiency Syndrome (HIV/AIDS) is a major cause of death for most African countries. Ethiopia is one of the seriously affected countries in sub Saharan Africa. Previously in Ethiopia, having HIV/AIDS was almost equivalent to a death sentence. With the introduction of Antiretroviral Therapy (ART), HIV/AIDS has become chronic, but manageable disease. The study focused on a data mining technique to predict future living status of HIV/AIDS patients at the time of drug regimen change when the patients become toxic to the currently taking ART drug combination. The data is taken from University of Gondar Hospital ART program database. Hybrid methodology is followed to explore the application of data mining on ART program dataset. Data cleaning, handling missing values and data transformation were used for preprocessing the data. WEKA 3.7.9 data mining tools, classification algorithms, and expertise are utilized as means to address the research problem. By using four different classification algorithms, (i.e., J48 Classifier, PART rule induction, Naïve Bayes and Neural network) and by adjusting their parameters thirty-two models were built on the pre-processed University of Gondar ART program dataset. The performances of the models were evaluated using the standard metrics of accuracy, precision, recall, and F-measure. The most effective model to predict the status of HIV patients with drug regimen substitution is pruned J48 decision tree with a classification accuracy of 98.01%. This study extracts interesting attributes such as Ever taking Cotrim, Ever taking TbRx, CD4 count, Age, Weight, and Gender so as to predict the status of drug regimen substitution. The outcome of this study can be used as an assistant tool for the clinician to help them make more appropriate drug regimen substitution. Future research directions are forwarded to come up with an applicable system in the area of the study.Keywords: HIV drug regimen, data mining, hybrid methodology, predictive model
Procedia PDF Downloads 142145 Safeguarding Product Quality through Pre-Qualification of Material Manufacturers: A Ship and Offshore Classification Society's Perspective
Authors: Sastry Y. Kandukuri, Isak Andersen
Abstract:
Despite recent advances in the manufacturing sector, quality issues remain a frequent occurrence, and can result in fatal accidents, equipment downtime, and loss of life. Adequate quality is of high importance in high-risk industries such as sea-going vessels and offshore installations in which third party quality assurance and product control play an important essential role in ensuring manufacturing quality of critical components. Classification societies play a vital role in mitigating risk in these industries by making sure that all the stakeholders i.e. manufacturers, builders, and end users are provided with adequate rules and standards that effectively ensures components produced at a high level of quality based on the area of application and risk of its failure. Quality issues have also been linked to the lack of competence or negligence of stakeholders in supply value chain. However, continued actions and regulatory reforms through modernization of rules and requirements has provided additional tools for purchasers and manufacturers to confront these issues. Included among these tools are updated ‘approval of manufacturer class programs’ aimed at developing and implementing a set of standardized manufacturing quality metrics for use by the manufacturer and verified by the classification society. The establishment and collection of manufacturing and testing requirements described in these programs could provide various stakeholders – from industry to vessel owners – with greater insight into the state of quality at a given manufacturing facility, and allow stakeholders to anticipate better and address quality issues while simultaneously reducing unnecessary failures that are costly to the industry. The publication introduces, explains and discusses critical manufacturing and testing requirements set in a leading class society’s approval of manufacturer regime and its rationale and some case studies.Keywords: classification society, manufacturing, materials processing, materials testing, quality control
Procedia PDF Downloads 355144 The Detection of Implanted Radioactive Seeds on Ultrasound Images Using Convolution Neural Networks
Authors: Edward Holupka, John Rossman, Tye Morancy, Joseph Aronovitz, Irving Kaplan
Abstract:
A common modality for the treatment of early stage prostate cancer is the implantation of radioactive seeds directly into the prostate. The radioactive seeds are positioned inside the prostate to achieve optimal radiation dose coverage to the prostate. These radioactive seeds are positioned inside the prostate using Transrectal ultrasound imaging. Once all of the planned seeds have been implanted, two dimensional transaxial transrectal ultrasound images separated by 2 mm are obtained through out the prostate, beginning at the base of the prostate up to and including the apex. A common deep neural network, called DetectNet was trained to automatically determine the position of the implanted radioactive seeds within the prostate under ultrasound imaging. The results of the training using 950 training ultrasound images and 90 validation ultrasound images. The commonly used metrics for successful training were used to evaluate the efficacy and accuracy of the trained deep neural network and resulted in an loss_bbox (train) = 0.00, loss_coverage (train) = 1.89e-8, loss_bbox (validation) = 11.84, loss_coverage (validation) = 9.70, mAP (validation) = 66.87%, precision (validation) = 81.07%, and a recall (validation) = 82.29%, where train and validation refers to the training image set and validation refers to the validation training set. On the hardware platform used, the training expended 12.8 seconds per epoch. The network was trained for over 10,000 epochs. In addition, the seed locations as determined by the Deep Neural Network were compared to the seed locations as determined by a commercial software based on a one to three months after implant CT. The Deep Learning approach was within \strikeout off\uuline off\uwave off2.29\uuline default\uwave default mm of the seed locations determined by the commercial software. The Deep Learning approach to the determination of radioactive seed locations is robust, accurate, and fast and well within spatial agreement with the gold standard of CT determined seed coordinates.Keywords: prostate, deep neural network, seed implant, ultrasound
Procedia PDF Downloads 198143 Phase Composition Analysis of Ternary Alloy Materials for Gas Turbine Applications
Authors: Mayandi Ramanathan
Abstract:
Gas turbine blades see the most aggressive thermal stress conditions within the engine, due to high Turbine Entry Temperatures in the range of 1500 to 1600°C. The blades rotate at very high rotation rates and remove a significant amount of thermal power from the gas stream. At high temperatures, the major component failure mechanism is a creep. During its service over time under high thermal loads, the blade will deform, lengthen and rupture. High strength and stiffness in the longitudinal direction up to elevated service temperatures are certainly the most needed properties of turbine blades and gas turbine components. The proposed advanced Ti alloy material needs a process that provides a strategic orientation of metallic ordering, uniformity in composition and high metallic strength. The chemical composition of the proposed Ti alloy material (25% Ta/(Al+Ta) ratio), unlike Ti-47Al-2Cr-2Nb, has less excess Al that could limit the service life of turbine blades. Properties and performance of Ti-47Al-2Cr-2Nb and Ti-6Al-4V materials will be compared with that of the proposed Ti alloy material to generalize the performance metrics of various gas turbine components. This paper will involve the summary of the effects of additive manufacturing and heat treatment process conditions on the changes in the phase composition, grain structure, lattice structure of the material, tensile strength, creep strain rate, thermal expansion coefficient and fracture toughness at different temperatures. Based on these results, additive manufacturing and heat treatment process conditions will be optimized to fabricate turbine blade with Ti-43Al matrix alloyed with an optimized amount of refractory Ta metal. Improvement in service temperature of the turbine blades and corrosion resistance dependence on the coercivity of the alloy material will be reported. A correlation of phase composition and creep strain rate will also be discussed.Keywords: high temperature materials, aerospace, specific strength, creep strain, phase composition
Procedia PDF Downloads 115142 Advancements in Predicting Diabetes Biomarkers: A Machine Learning Epigenetic Approach
Authors: James Ladzekpo
Abstract:
Background: The urgent need to identify new pharmacological targets for diabetes treatment and prevention has been amplified by the disease's extensive impact on individuals and healthcare systems. A deeper insight into the biological underpinnings of diabetes is crucial for the creation of therapeutic strategies aimed at these biological processes. Current predictive models based on genetic variations fall short of accurately forecasting diabetes. Objectives: Our study aims to pinpoint key epigenetic factors that predispose individuals to diabetes. These factors will inform the development of an advanced predictive model that estimates diabetes risk from genetic profiles, utilizing state-of-the-art statistical and data mining methods. Methodology: We have implemented a recursive feature elimination with cross-validation using the support vector machine (SVM) approach for refined feature selection. Building on this, we developed six machine learning models, including logistic regression, k-Nearest Neighbors (k-NN), Naive Bayes, Random Forest, Gradient Boosting, and Multilayer Perceptron Neural Network, to evaluate their performance. Findings: The Gradient Boosting Classifier excelled, achieving a median recall of 92.17% and outstanding metrics such as area under the receiver operating characteristics curve (AUC) with a median of 68%, alongside median accuracy and precision scores of 76%. Through our machine learning analysis, we identified 31 genes significantly associated with diabetes traits, highlighting their potential as biomarkers and targets for diabetes management strategies. Conclusion: Particularly noteworthy were the Gradient Boosting Classifier and Multilayer Perceptron Neural Network, which demonstrated potential in diabetes outcome prediction. We recommend future investigations to incorporate larger cohorts and a wider array of predictive variables to enhance the models' predictive capabilities.Keywords: diabetes, machine learning, prediction, biomarkers
Procedia PDF Downloads 55141 A Comparative Study of the Proposed Models for the Components of the National Health Information System
Authors: M. Ahmadi, Sh. Damanabi, F. Sadoughi
Abstract:
National Health Information System plays an important role in ensuring timely and reliable access to Health information which is essential for strategic and operational decisions that improve health, quality and effectiveness of health care. In other words, by using the National Health information system you can improve the quality of health data, information and knowledge used to support decision making at all levels and areas of the health sector. Since full identification of the components of this system for better planning and management influential factors of performance seems necessary, therefore, in this study, different attitudes towards components of this system are explored comparatively. Methods: This is a descriptive and comparative kind of study. The society includes printed and electronic documents containing components of the national health information system in three parts: input, process, and output. In this context, search for information using library resources and internet search were conducted and data analysis was expressed using comparative tables and qualitative data. Results: The findings showed that there are three different perspectives presenting the components of national health information system, Lippeveld, Sauerborn, and Bodart Model in 2000, Health Metrics Network (HMN) model from World Health Organization in 2008 and Gattini’s 2009 model. All three models outlined above in the input (resources and structure) require components of management and leadership, planning and design programs, supply of staff, software and hardware facilities, and equipment. In addition, in the ‘process’ section from three models, we pointed up the actions ensuring the quality of health information system and in output section, except Lippeveld Model, two other models consider information products, usage and distribution of information as components of the national health information system. Conclusion: The results showed that all the three models have had a brief discussion about the components of health information in input section. However, Lippeveld model has overlooked the components of national health information in process and output sections. Therefore, it seems that the health measurement model of network has a comprehensive presentation for the components of health system in all three sections-input, process, and output.Keywords: National Health Information System, components of the NHIS, Lippeveld Model
Procedia PDF Downloads 420140 Analysis of Biomarkers Intractable Epileptogenic Brain Networks with Independent Component Analysis and Deep Learning Algorithms: A Comprehensive Framework for Scalable Seizure Prediction with Unimodal Neuroimaging Data in Pediatric Patients
Authors: Bliss Singhal
Abstract:
Epilepsy is a prevalent neurological disorder affecting approximately 50 million individuals worldwide and 1.2 million Americans. There exist millions of pediatric patients with intractable epilepsy, a condition in which seizures fail to come under control. The occurrence of seizures can result in physical injury, disorientation, unconsciousness, and additional symptoms that could impede children's ability to participate in everyday tasks. Predicting seizures can help parents and healthcare providers take precautions, prevent risky situations, and mentally prepare children to minimize anxiety and nervousness associated with the uncertainty of a seizure. This research proposes a comprehensive framework to predict seizures in pediatric patients by evaluating machine learning algorithms on unimodal neuroimaging data consisting of electroencephalogram signals. The bandpass filtering and independent component analysis proved to be effective in reducing the noise and artifacts from the dataset. Various machine learning algorithms’ performance is evaluated on important metrics such as accuracy, precision, specificity, sensitivity, F1 score and MCC. The results show that the deep learning algorithms are more successful in predicting seizures than logistic Regression, and k nearest neighbors. The recurrent neural network (RNN) gave the highest precision and F1 Score, long short-term memory (LSTM) outperformed RNN in accuracy and convolutional neural network (CNN) resulted in the highest Specificity. This research has significant implications for healthcare providers in proactively managing seizure occurrence in pediatric patients, potentially transforming clinical practices, and improving pediatric care.Keywords: intractable epilepsy, seizure, deep learning, prediction, electroencephalogram channels
Procedia PDF Downloads 84139 Personalization of Context Information Retrieval Model via User Search Behaviours for Ranking Document Relevance
Authors: Kehinde Agbele, Longe Olumide, Daniel Ekong, Dele Seluwa, Akintoye Onamade
Abstract:
One major problem of most existing information retrieval systems (IRS) is that they provide even access and retrieval results to individual users specially based on the query terms user issued to the system. When using IRS, users often present search queries made of ad-hoc keywords. It is then up to IRS to obtain a precise representation of user’s information need, and the context of the information. In effect, the volume and range of the Internet documents is growing exponentially and consequently causes difficulties for a user to obtain information that precisely matches the user interest. Diverse combination techniques are used to achieve the specific goal. This is due, firstly, to the fact that users often do not present queries to IRS that optimally represent the information they want, and secondly, the measure of a document's relevance is highly subjective between diverse users. In this paper, we address the problem by investigating the optimization of IRS to individual information needs in order of relevance. The paper addressed the development of algorithms that optimize the ranking of documents retrieved from IRS. This paper addresses this problem with a two-fold approach in order to retrieve domain-specific documents. Firstly, the design of context of information. The context of a query determines retrieved information relevance using personalization and context-awareness. Thus, executing the same query in diverse contexts often leads to diverse result rankings based on the user preferences. Secondly, the relevant context aspects should be incorporated in a way that supports the knowledge domain representing users’ interests. In this paper, the use of evolutionary algorithms is incorporated to improve the effectiveness of IRS. A context-based information retrieval system that learns individual needs from user-provided relevance feedback is developed whose retrieval effectiveness is evaluated using precision and recall metrics. The results demonstrate how to use attributes from user interaction behavior to improve the IR effectiveness.Keywords: context, document relevance, information retrieval, personalization, user search behaviors
Procedia PDF Downloads 463138 Hypertension and Obesity: A Cross-National Comparison of BMI and Waist-Height Ratio
Authors: Adam M. Yates, Julie E. Byles
Abstract:
Hypertension has been identified as a prominent co-morbidity of obesity. To improve clinical intervention of hypertension, it is critical to identify metrics that most accurately reflect risk for increased morbidity. Two of the most relevant and accurate measures for increased risk of hypertension due to excess adipose tissue are Body Mass Index (BMI) and Waist-Height Ratio (WHtR). Previous research has examined these measures in cross-national and cross-ethnic studies, but has most often relied on secondary means such as meta-analysis to identify and evaluate the efficacy of individual body mass measures. In this study, we instead use cross-sectional analysis to assess the cross-ethnic discriminative power of BMI and WHtR to predict risk of hypertension. Using the WHO SAGE survey, which collected anthropometric and biometric data from respondents in six middle-income countries (China, Ghana, India, Mexico, Russia, South Africa), we implement logistic regression to examine the discriminative power of measured BMI and WHtR with a known population of hypertensive and non-hypertensive respondents. We control for gender and age to identify whether optimum cut-off points that are adequately sensitive as tests for risk of hypertension may be different between groups. We report results for OR, RR, and ROC curves for each of the six SAGE countries. As seen in existing literature, results demonstrate that both WHtR and BMI are significant predictors of hypertension (p < .01). For these six countries, we find that cut-off points for WHtR may be dependent upon gender, age and ethnicity. While an optimum omnibus cut-point for WHtR may be 0.55, results also suggest that the gender and age relationship with WHtR may warrant the development of individual cut-offs to optimize health outcomes. Trends through multiple countries show that the optimum cut-point for WHtR increases with age while the area under the curve (AUROC) decreases for both men and women. Comparison between BMI and WHtR indicate that BMI may remain more robust than WHtR. Implications for public health policy are discussed.Keywords: hypertension, obesity, Waist-Height ratio, SAGE
Procedia PDF Downloads 478137 Leveraging Automated and Connected Vehicles with Deep Learning for Smart Transportation Network Optimization
Authors: Taha Benarbia
Abstract:
The advent of automated and connected vehicles has revolutionized the transportation industry, presenting new opportunities for enhancing the efficiency, safety, and sustainability of our transportation networks. This paper explores the integration of automated and connected vehicles into a smart transportation framework, leveraging the power of deep learning techniques to optimize the overall network performance. The first aspect addressed in this paper is the deployment of automated vehicles (AVs) within the transportation system. AVs offer numerous advantages, such as reduced congestion, improved fuel efficiency, and increased safety through advanced sensing and decisionmaking capabilities. The paper delves into the technical aspects of AVs, including their perception, planning, and control systems, highlighting the role of deep learning algorithms in enabling intelligent and reliable AV operations. Furthermore, the paper investigates the potential of connected vehicles (CVs) in creating a seamless communication network between vehicles, infrastructure, and traffic management systems. By harnessing real-time data exchange, CVs enable proactive traffic management, adaptive signal control, and effective route planning. Deep learning techniques play a pivotal role in extracting meaningful insights from the vast amount of data generated by CVs, empowering transportation authorities to make informed decisions for optimizing network performance. The integration of deep learning with automated and connected vehicles paves the way for advanced transportation network optimization. Deep learning algorithms can analyze complex transportation data, including traffic patterns, demand forecasting, and dynamic congestion scenarios, to optimize routing, reduce travel times, and enhance overall system efficiency. The paper presents case studies and simulations demonstrating the effectiveness of deep learning-based approaches in achieving significant improvements in network performance metricsKeywords: automated vehicles, connected vehicles, deep learning, smart transportation network
Procedia PDF Downloads 78136 Feature Engineering Based Detection of Buffer Overflow Vulnerability in Source Code Using Deep Neural Networks
Authors: Mst Shapna Akter, Hossain Shahriar
Abstract:
One of the most important challenges in the field of software code audit is the presence of vulnerabilities in software source code. Every year, more and more software flaws are found, either internally in proprietary code or revealed publicly. These flaws are highly likely exploited and lead to system compromise, data leakage, or denial of service. C and C++ open-source code are now available in order to create a largescale, machine-learning system for function-level vulnerability identification. We assembled a sizable dataset of millions of opensource functions that point to potential exploits. We developed an efficient and scalable vulnerability detection method based on deep neural network models that learn features extracted from the source codes. The source code is first converted into a minimal intermediate representation to remove the pointless components and shorten the dependency. Moreover, we keep the semantic and syntactic information using state-of-the-art word embedding algorithms such as glove and fastText. The embedded vectors are subsequently fed into deep learning networks such as LSTM, BilSTM, LSTM-Autoencoder, word2vec, BERT, and GPT-2 to classify the possible vulnerabilities. Furthermore, we proposed a neural network model which can overcome issues associated with traditional neural networks. Evaluation metrics such as f1 score, precision, recall, accuracy, and total execution time have been used to measure the performance. We made a comparative analysis between results derived from features containing a minimal text representation and semantic and syntactic information. We found that all of the deep learning models provide comparatively higher accuracy when we use semantic and syntactic information as the features but require higher execution time as the word embedding the algorithm puts on a bit of complexity to the overall system.Keywords: cyber security, vulnerability detection, neural networks, feature extraction
Procedia PDF Downloads 89135 Optimizing Wind Turbine Blade Geometry for Enhanced Performance and Durability: A Computational Approach
Authors: Nwachukwu Ifeanyi
Abstract:
Wind energy is a vital component of the global renewable energy portfolio, with wind turbines serving as the primary means of harnessing this abundant resource. However, the efficiency and stability of wind turbines remain critical challenges in maximizing energy output and ensuring long-term operational viability. This study proposes a comprehensive approach utilizing computational aerodynamics and aeromechanics to optimize wind turbine performance across multiple objectives. The proposed research aims to integrate advanced computational fluid dynamics (CFD) simulations with structural analysis techniques to enhance the aerodynamic efficiency and mechanical stability of wind turbine blades. By leveraging multi-objective optimization algorithms, the study seeks to simultaneously optimize aerodynamic performance metrics such as lift-to-drag ratio and power coefficient while ensuring structural integrity and minimizing fatigue loads on the turbine components. Furthermore, the investigation will explore the influence of various design parameters, including blade geometry, airfoil profiles, and turbine operating conditions, on the overall performance and stability of wind turbines. Through detailed parametric studies and sensitivity analyses, valuable insights into the complex interplay between aerodynamics and structural dynamics will be gained, facilitating the development of next-generation wind turbine designs. Ultimately, this research endeavours to contribute to the advancement of sustainable energy technologies by providing innovative solutions to enhance the efficiency, reliability, and economic viability of wind power generation systems. The findings have the potential to inform the design and optimization of wind turbines, leading to increased energy output, reduced maintenance costs, and greater environmental benefits in the transition towards a cleaner and more sustainable energy future.Keywords: computation, robotics, mathematics, simulation
Procedia PDF Downloads 58134 Dynamic Web-Based 2D Medical Image Visualization and Processing Software
Authors: Abdelhalim. N. Mohammed, Mohammed. Y. Esmail
Abstract:
In the course of recent decades, medical imaging has been dominated by the use of costly film media for review and archival of medical investigation, however due to developments in networks technologies and common acceptance of a standard digital imaging and communication in medicine (DICOM) another approach in light of World Wide Web was produced. Web technologies successfully used in telemedicine applications, the combination of web technologies together with DICOM used to design a web-based and open source DICOM viewer. The Web server allowance to inquiry and recovery of images and the images viewed/manipulated inside a Web browser without need for any preinstalling software. The dynamic site page for medical images visualization and processing created by using JavaScript and HTML5 advancements. The XAMPP ‘apache server’ is used to create a local web server for testing and deployment of the dynamic site. The web-based viewer connected to multiples devices through local area network (LAN) to distribute the images inside healthcare facilities. The system offers a few focal points over ordinary picture archiving and communication systems (PACS): easy to introduce, maintain and independently platforms that allow images to display and manipulated efficiently, the system also user-friendly and easy to integrate with an existing system that have already been making use of web technologies. The wavelet-based image compression technique on which 2-D discrete wavelet transform used to decompose the image then wavelet coefficients are transmitted by entropy encoding after threshold to decrease transmission time, stockpiling cost and capacity. The performance of compression was estimated by using images quality metrics such as mean square error ‘MSE’, peak signal to noise ratio ‘PSNR’ and compression ratio ‘CR’ that achieved (83.86%) when ‘coif3’ wavelet filter is used.Keywords: DICOM, discrete wavelet transform, PACS, HIS, LAN
Procedia PDF Downloads 160133 A Review of Benefit-Risk Assessment over the Product Lifecycle
Authors: M. Miljkovic, A. Urakpo, M. Simic-Koumoutsaris
Abstract:
Benefit-risk assessment (BRA) is a valuable tool that takes place in multiple stages during a medicine's lifecycle, and this assessment can be conducted in a variety of ways. The aim was to summarize current BRA methods used during approval decisions and in post-approval settings and to see possible future directions. Relevant reviews, recommendations, and guidelines published in medical literature and through regulatory agencies over the past five years have been examined. BRA implies the review of two dimensions: the dimension of benefits (determined mainly by the therapeutic efficacy) and the dimension of risks (comprises the safety profile of a drug). Regulators, industry, and academia have developed various approaches, ranging from descriptive textual (qualitative) to decision-analytic (quantitative) models, to facilitate the BRA of medicines during the product lifecycle (from Phase I trials, to authorization procedure, post-marketing surveillance and health technology assessment for inclusion in public formularies). These approaches can be classified into the following categories: stepwise structured approaches (frameworks); measures for benefits and risks that are usually endpoint specific (metrics), simulation techniques and meta-analysis (estimation techniques), and utility survey techniques to elicit stakeholders’ preferences (utilities). All these approaches share the following two common goals: to assist this analysis and to improve the communication of decisions, but each is subject to its own specific strengths and limitations. Before using any method, its utility, complexity, the extent to which it is established, and the ease of results interpretation should be considered. Despite widespread and long-time use, BRA is subject to debate, suffers from a number of limitations, and currently is still under development. The use of formal, systematic structured approaches to BRA for regulatory decision-making and quantitative methods to support BRA during the product lifecycle is a standard practice in medicine that is subject to continuous improvement and modernization, not only in methodology but also in cooperation between organizations.Keywords: benefit-risk assessment, benefit-risk profile, product lifecycle, quantitative methods, structured approaches
Procedia PDF Downloads 154132 Threat Modeling Methodology for Supporting Industrial Control Systems Device Manufacturers and System Integrators
Authors: Raluca Ana Maria Viziteu, Anna Prudnikova
Abstract:
Industrial control systems (ICS) have received much attention in recent years due to the convergence of information technology (IT) and operational technology (OT) that has increased the interdependence of safety and security issues to be considered. These issues require ICS-tailored solutions. That led to the need to creation of a methodology for supporting ICS device manufacturers and system integrators in carrying out threat modeling of embedded ICS devices in a way that guarantees the quality of the identified threats and minimizes subjectivity in the threat identification process. To research, the possibility of creating such a methodology, a set of existing standards, regulations, papers, and publications related to threat modeling in the ICS sector and other sectors was reviewed to identify various existing methodologies and methods used in threat modeling. Furthermore, the most popular ones were tested in an exploratory phase on a specific PLC device. The outcome of this exploratory phase has been used as a basis for defining specific characteristics of ICS embedded devices and their deployment scenarios, identifying the factors that introduce subjectivity in the threat modeling process of such devices, and defining metrics for evaluating the minimum quality requirements of identified threats associated to the deployment of the devices in existing infrastructures. Furthermore, the threat modeling methodology was created based on the previous steps' results. The usability of the methodology was evaluated through a set of standardized threat modeling requirements and a standardized comparison method for threat modeling methodologies. The outcomes of these verification methods confirm that the methodology is effective. The full paper includes the outcome of research on different threat modeling methodologies that can be used in OT, their comparison, and the results of implementing each of them in practice on a PLC device. This research is further used to build a threat modeling methodology tailored to OT environments; a detailed description is included. Moreover, the paper includes results of the evaluation of created methodology based on a set of parameters specifically created to rate threat modeling methodologies.Keywords: device manufacturers, embedded devices, industrial control systems, threat modeling
Procedia PDF Downloads 80131 Improving Cell Type Identification of Single Cell Data by Iterative Graph-Based Noise Filtering
Authors: Annika Stechemesser, Rachel Pounds, Emma Lucas, Chris Dawson, Julia Lipecki, Pavle Vrljicak, Jan Brosens, Sean Kehoe, Jason Yap, Lawrence Young, Sascha Ott
Abstract:
Advances in technology make it now possible to retrieve the genetic information of thousands of single cancerous cells. One of the key challenges in single cell analysis of cancerous tissue is to determine the number of different cell types and their characteristic genes within the sample to better understand the tumors and their reaction to different treatments. For this analysis to be possible, it is crucial to filter out background noise as it can severely blur the downstream analysis and give misleading results. In-depth analysis of the state-of-the-art filtering methods for single cell data showed that they do, in some cases, not separate noisy and normal cells sufficiently. We introduced an algorithm that filters and clusters single cell data simultaneously without relying on certain genes or thresholds chosen by eye. It detects communities in a Shared Nearest Neighbor similarity network, which captures the similarities and dissimilarities of the cells by optimizing the modularity and then identifies and removes vertices with a weak clustering belonging. This strategy is based on the fact that noisy data instances are very likely to be similar to true cell types but do not match any of these wells. Once the clustering is complete, we apply a set of evaluation metrics on the cluster level and accept or reject clusters based on the outcome. The performance of our algorithm was tested on three datasets and led to convincing results. We were able to replicate the results on a Peripheral Blood Mononuclear Cells dataset. Furthermore, we applied the algorithm to two samples of ovarian cancer from the same patient before and after chemotherapy. Comparing the standard approach to our algorithm, we found a hidden cell type in the ovarian postchemotherapy data with interesting marker genes that are potentially relevant for medical research.Keywords: cancer research, graph theory, machine learning, single cell analysis
Procedia PDF Downloads 112