Search results for: categorical datasets
448 Dynamic Log Parsing and Intelligent Anomaly Detection Method Combining Retrieval Augmented Generation and Prompt Engineering
Authors: Liu Linxin
Abstract:
As system complexity increases, log parsing and anomaly detection become increasingly important for ensuring system stability. However, traditional methods often suffer from insufficient adaptability and declining accuracy when dealing with rapidly changing log contents and unknown domains. To this end, this paper proposes LogRAG, an approach that combines Retrieval Augmented Generation (RAG) with prompt engineering for large language models (LLMs) and applies them to log analysis tasks to achieve dynamic log parsing and intelligent anomaly detection. By combining real-time information retrieval and prompt optimisation, this study significantly improves the adaptive capability of log analysis and the interpretability of results. Experimental results show that the method performs well on several public datasets, especially in the absence of training data, and significantly outperforms traditional methods. This paper provides a technical path for log parsing and anomaly detection, demonstrating significant theoretical value and application potential.
Keywords: log parsing, anomaly detection, retrieval-augmented generation, prompt engineering, LLMs
Procedia PDF Downloads 29
447 Prognosis of Patients with COVID-19 and Hematologic Malignancies
Authors: Elizabeth Behrens, Anne Timmermann, Alexander Yerkan, Joshua Thomas, Deborah Katz, Agne Paner, Melissa Larson, Shivi Jain, Seo-Hyun Kim, Celalettin Ustun, Ankur Varma, Parameswaran Venugopal, Jamile Shammo
Abstract:
Coronavirus Disease-2019 (COVID-19) causes persistent concern for poor outcomes in vulnerable populations. Patients with hematologic malignancies (HM) have been found to have higher COVID-19 case fatality rates compared to those without malignancy. While cytopenias are common in patients with HM, especially in those undergoing chemotherapy treatment, hemoglobin (Hgb) and platelet count have not yet been studied, to the best of our knowledge, as potential prognostic indicators for patients with HM and COVID-19. The goal of this study is to identify factors that may increase the risk of mortality in patients with HM and COVID-19. In this single-center, retrospective, observational study, 65 patients with HM and laboratory-confirmed COVID-19 were identified between March 2020 and January 2021. Information on demographics, laboratory data on the day of COVID-19 diagnosis, and prognosis was extracted from the electronic medical record (EMR), chart reviewed, and analyzed using the statistical software SAS version 9.4. Chi-square testing was used for categorical variable analyses. Risk factors associated with mortality were established by logistic regression models. Non-Hodgkin lymphoma (37%), chronic lymphocytic leukemia (20%), and plasma cell dyscrasia (15%) were the most common HM. Higher Hgb level upon COVID-19 diagnosis was related to decreased mortality, odds ratio = 0.704 (95% confidence interval [CI]: 0.511-0.969; P = .0263). Platelet count on the day of COVID-19 diagnosis was lower in patients who ultimately died (mean 127 ± 72K/uL, n=10) compared to patients who survived (mean 197 ± 92K/uL, n=55) (P = .0258). Female sex was related to decreased mortality, odds ratio = 0.143 (95% CI: 0.026-0.785; P = .0353). There was no mortality difference between the patients who were on treatment for HM on the day of COVID-19 diagnosis compared to those who were not (P = 1.000).
Lower Hgb and male sex are independent risk factors associated with increased mortality of HM patients with COVID-19. Clinicians should be especially attentive to patients with HM and COVID-19 who present with cytopenias. Larger multi-center studies are urgently needed to further investigate the impact of anemia, thrombocytopenia, and demographics on outcomes of patients with hematologic malignancies diagnosed with COVID-19.
Keywords: anemia, COVID-19, hematologic malignancy, prognosis
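The abstract above reports chi-square tests for categorical variables and odds ratios. As a minimal sketch of those two quantities for a single 2x2 table (the counts in the example are made up for illustration and are not the study's data; the function names are hypothetical, and the study's odds ratios came from logistic regression rather than from this direct calculation):

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Closed-form shortcut that is valid only for 2x2 tables.
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def odds_ratio(a, b, c, d):
    """Unadjusted odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

# Illustrative counts only (NOT from the study):
# rows = exposure group, columns = outcome present / absent.
table = (12, 8, 6, 14)
stat = chi_square_2x2(*table)
print(stat, p_value_df1(stat), odds_ratio(*table))
```

With small expected cell counts, an exact test (such as Fisher's) is usually preferred over the chi-square approximation.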
Procedia PDF Downloads 149
446 Private and Public Health Sector Difference on Client Satisfaction: Results from Secondary Data Analysis in Sindh, Pakistan
Authors: Wajiha Javed, Arsalan Jabbar, Nelofer Mehboob, Muhammad Tafseer, Zahid Memon
Abstract:
Introduction: Researchers globally have strived to explore diverse factors that augment the continuation and uptake of family planning methods. Clients’ satisfaction is one of the core determinants facilitating continuation of family planning methods. There is a major debate yet scant evidence contrasting the public and private sectors with respect to client satisfaction. The objective of this study is to compare the quality of care provided by the public and private sectors of Pakistan through a client satisfaction lens. Methods: We used the Pakistan Demographic Health Survey 2012-13 dataset (Sindh province) on a total of 3133 Married Women of Reproductive Age (MWRA) aged 15-49 years. Source of family planning (public/private sector) was the main exposure variable. The outcome variable was client satisfaction, judged by ten different dimensions of client satisfaction. Means and standard deviations were calculated for continuous variables, while frequencies and percentages were computed for categorical variables. For univariate analysis, the Chi-square/Fisher Exact test was used to find an association between clients’ satisfaction in the public and private sectors. Ten different multivariate models were made. Variables were checked for multi-collinearity, confounding, and interaction, and then advanced logistic regression was used to explore the relationship between client satisfaction and dependent outcome after adjusting for all known confounding factors; results are presented as OR and AOR (95% CI). Results: Multivariate analyses showed that clients were less satisfied with contraceptive provision from the private sector as compared to the public sector (AOR 0.92, 95% CI 0.63-1.68), even though the result was not statistically significant. 
Clients were more satisfied with the private sector as compared to the public sector with respect to other determinants of quality of care: follow-up care (AOR 3.29, 95% CI 1.95-5.55), infection prevention (AOR 2.41, 95% CI 1.60-3.62), counseling services (AOR 2.01, 95% CI 1.27-3.18), timely treatment (AOR 3.37, 95% CI 2.20-5.15), attitude of staff (AOR 2.23, 95% CI 1.50-3.33), punctuality of staff (AOR 2.28, 95% CI 1.92-4.13), timely referring (AOR 2.34, 95% CI 1.63-3.35), staff cooperation (AOR 1.75, 95% CI 1.22-2.51), and complications handling (AOR 2.27, 95% CI 1.56-3.29).
Keywords: client satisfaction, family planning, public private partnership, quality of care
Procedia PDF Downloads 419
445 Malaria Parasite Detection Using Deep Learning Methods
Authors: Kaustubh Chakradeo, Michael Delves, Sofya Titarenko
Abstract:
Malaria is a serious disease that affects hundreds of millions of people around the world each year. If not treated in time, it can be fatal. Despite recent developments in malaria diagnostics, microscopy remains the most common method of detecting malaria. Unfortunately, the accuracy of microscopic diagnostics depends on the skill of the microscopist, which limits the throughput of malaria diagnosis. With the development of Artificial Intelligence tools, and Deep Learning techniques in particular, it is possible to lower the cost while achieving higher overall accuracy. In this paper, we present a VGG-based model and compare it with previously developed models for identifying infected cells. Our model surpasses most previously developed models on a range of accuracy metrics. The model has the advantage of being constructed from a relatively small number of layers, which reduces the computing resources and computational time required. Moreover, we test our model on two types of datasets and argue that the currently developed deep-learning-based methods cannot efficiently distinguish between infected and contaminated cells. A more precise study of suspicious regions is required.
Keywords: convolution neural network, deep learning, malaria, thin blood smears
Procedia PDF Downloads 130
444 A Generative Adversarial Framework for Bounding Confounded Causal Effects
Authors: Yaowei Hu, Yongkai Wu, Lu Zhang, Xintao Wu
Abstract:
Causal inference from observational data is receiving wide applications in many fields. However, unidentifiable situations, where causal effects cannot be uniquely computed from observational data, pose critical barriers to applying causal inference to complicated real applications. In this paper, we develop a bounding method for estimating the average causal effect (ACE) under unidentifiable situations due to hidden confounders. We propose to parameterize the unknown exogenous random variables and structural equations of a causal model using neural networks and implicit generative models. Then, with an adversarial learning framework, we search the parameter space to explicitly traverse causal models that agree with the given observational distribution and find those that minimize or maximize the ACE to obtain its lower and upper bounds. The proposed method does not make any assumption about the data generating process and the type of the variables. Experiments using both synthetic and real-world datasets show the effectiveness of the method.
Keywords: average causal effect, hidden confounding, bound estimation, generative adversarial learning
Procedia PDF Downloads 191
443 Identifying Factors Contributing to the Spread of Lyme Disease: A Regression Analysis of Virginia’s Data
Authors: Fatemeh Valizadeh Gamchi, Edward L. Boone
Abstract:
This research focuses on Lyme disease, a widespread infectious condition in the United States caused by the bacterium Borrelia burgdorferi sensu stricto. It is critical to identify the environmental and economic factors that contribute to the spread of the disease. This study examined data from Virginia to identify a subset of explanatory variables significant for Lyme disease case numbers. To identify relevant variables and avoid overfitting, a Poisson generalized linear model and regularized regression methods with ridge, lasso, and elastic net penalties were employed. Cross-validation was performed to select the tuning parameters. The proposed methods can automatically identify relevant covariates for disease counts. The efficacy of the techniques was assessed using four criteria on three simulated datasets. Finally, using the Virginia Department of Health’s Lyme disease dataset, the study successfully identified key factors, and the results were consistent with previous studies.
Keywords: lyme disease, Poisson generalized linear model, ridge regression, lasso regression, elastic net regression
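The abstract names the penalties but does not state the objective. One common way to write an elastic-net-penalized Poisson regression is given below; this is the textbook form, assumed here for illustration, and not necessarily the authors' exact formulation. Here $y_i$ is the case count for unit $i$, $x_i$ its covariate vector, $\lambda \ge 0$ the tuning parameter chosen by cross-validation, and $\alpha \in [0,1]$ the mixing weight.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \exp(\beta_0 + x_i^{\top}\beta) \Bigr]
  + \lambda\Bigl[\, \alpha\,\lVert\beta\rVert_1
      + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \Bigr]
```

Setting $\alpha = 1$ gives the lasso penalty and $\alpha = 0$ the ridge penalty; the $\ell_1$ term is what drives some coefficients exactly to zero and so performs variable selection.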
Procedia PDF Downloads 137
442 Artificial Neural Network-Based Short-Term Load Forecasting for Mymensingh Area of Bangladesh
Authors: S. M. Anowarul Haque, Md. Asiful Islam
Abstract:
Electrical load forecasting is considered to be one of the most indispensable parts of a modern-day electrical power system. To ensure a reliable and efficient supply of electric energy, special emphasis should be placed on the predictive aspect of electricity supply. Artificial Neural Network-based approaches have emerged as a significant area of interest for electric load forecasting research. This paper proposes an Artificial Neural Network model based on the particle swarm optimization algorithm for improved electric load forecasting for Mymensingh, Bangladesh. The forecasting model is developed and simulated in the MATLAB environment with a large number of training datasets. The model is trained on eight input parameters, including historical load and weather data. The predicted load data are then compared with an available dataset for validation. The proposed neural network model proves to be more reliable in terms of day-wise load forecasting for Mymensingh, Bangladesh.
Keywords: load forecasting, artificial neural network, particle swarm optimization
Procedia PDF Downloads 171
441 A Single Cell Omics Experiments as Tool for Benchmarking Bioinformatics Oncology Data Analysis Tools
Authors: Maddalena Arigoni, Maria Luisa Ratto, Raffaele A. Calogero, Luca Alessandri
Abstract:
The presence of tumor heterogeneity, where distinct cancer cells exhibit diverse morphological and phenotypic profiles, including gene expression, metabolism, and proliferation, poses challenges for molecular prognostic markers and patient classification for targeted therapies. Understanding the causes and progression of cancer requires research efforts aimed at characterizing heterogeneity, which can be facilitated by evolving single-cell sequencing technologies. However, analyzing single-cell data necessitates computational methods that often lack objective validation. Therefore, the establishment of benchmarking datasets is necessary to provide a controlled environment for validating bioinformatics tools in the field of single-cell oncology. Benchmarking bioinformatics tools for single-cell experiments can be costly due to the high expense involved. Therefore, datasets used for benchmarking are typically sourced from publicly available experiments, which often lack a comprehensive cell annotation. This limitation can affect the accuracy and effectiveness of such experiments as benchmarking tools. To address this issue, we introduce omics benchmark experiments designed to evaluate bioinformatics tools to depict the heterogeneity in single-cell tumor experiments. We conducted single-cell RNA sequencing on six lung cancer tumor cell lines that display resistant clones upon treatment of EGFR mutated tumors and are characterized by driver genes, namely ROS1, ALK, HER2, MET, KRAS, and BRAF. These driver genes are associated with downstream networks controlled by EGFR mutations, such as JAK-STAT, PI3K-AKT-mTOR, and MEK-ERK. The experiment also featured an EGFR-mutated cell line. Using 10XGenomics platform with cellplex technology, we analyzed the seven cell lines together with a pseudo-immunological microenvironment consisting of PBMC cells labeled with the Biolegend TotalSeq™-B Human Universal Cocktail (CITEseq). 
This technology allowed for independent labeling of each cell line and single-cell analysis of the pooled seven cell lines and the pseudo-microenvironment. The data generated from the aforementioned experiments are available as part of an online tool, which allows users to define cell heterogeneity and generates count tables as an output. The tool provides the cell line derivation for each cell and cell annotations for the pseudo-microenvironment based on CITEseq data by an experienced immunologist. Additionally, we created a range of pseudo-tumor tissues using different ratios of the aforementioned cells embedded in matrigel. These tissues were analyzed using 10XGenomics (FFPE samples) and Curio Bioscience (fresh frozen samples) platforms for spatial transcriptomics, further expanding the scope of our benchmark experiments. The benchmark experiments we conducted provide a unique opportunity to evaluate the performance of bioinformatics tools for detecting and characterizing tumor heterogeneity at the single-cell level. Overall, our experiments provide a controlled and standardized environment for assessing the accuracy and robustness of bioinformatics tools for studying tumor heterogeneity at the single-cell level, which can ultimately lead to more precise and effective cancer diagnosis and treatment.
Keywords: single cell omics, benchmark, spatial transcriptomics, CITEseq
Procedia PDF Downloads 117
440 The Impact on the Composition of Survey Refusals΄ Demographic Profile When Implementing Different Classifications
Authors: Eva Tsouparopoulou, Maria Symeonaki
Abstract:
The internationally documented decline in survey response rates over the last two decades is mainly attributed to refusals. In fieldwork, a refusal may be obtained not only from the respondent himself/herself, but also from other sources on the respondent’s behalf, such as other household members, apartment building residents or administrator(s), and neighborhood residents. In this paper, we investigate how the composition of the demographic profile of survey refusals changes when different classifications are implemented, and the classification issues arising from that. The analysis is based on the 2002-2018 European Social Survey (ESS) datasets for Belgium, Germany, and the United Kingdom. For these three countries, the number of selected sample units coded as a type of refusal across all nine rounds under investigation was large enough to meet the purposes of the analysis. The results indicate the existence of four different possible classifications that can be implemented and the significance of choosing the one that strengthens the contrasts between the demographic profiles of the different types of respondents. Since the foundation of social quantitative research lies in the triptych of definition, classification, and measurement, this study aims to identify the multiplicity of the definition of survey refusals as a methodological tool for the continually growing research on non-response.
Keywords: non-response, refusals, European social survey, classification
Procedia PDF Downloads 85
439 Bag of Words Representation Based on Fusing Two Color Local Descriptors and Building Multiple Dictionaries
Authors: Fatma Abdedayem
Abstract:
We propose an extension to the well-known Bag of Words (BOW) method, which has proved successful in image categorization. In practice, this method represents an image with visual words. In this work, we first extract features from images using Spatial Pyramid Representation (SPR) and two different color descriptors, opponent-SIFT and transformed-color-SIFT. Second, we fuse the color local features by joining the two histograms produced by these descriptors. Third, after collecting all features, we generate multiple dictionaries from n random feature subsets obtained by dividing all features into n random groups. Using these dictionaries separately, each image can then be represented by n histograms, which are concatenated horizontally to form the final histogram; this allows Multiple Dictionaries to be combined (MDBoW). In the final step, we classify images by applying a Support Vector Machine (SVM) to the generated histograms. Experimentally, we tested our proposal on two different image datasets: Caltech 256 and PASCAL VOC 2007.
Keywords: bag of words (BOW), color descriptors, multi-dictionaries, MDBoW
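The multi-dictionary step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionaries are hard-coded toy centroids rather than learned from random feature subsets, descriptors are 2-D tuples rather than SIFT vectors, and all function names are hypothetical.

```python
def nearest_word(descriptor, dictionary):
    """Index of the visual word (centroid) closest to the descriptor,
    using squared Euclidean distance."""
    return min(
        range(len(dictionary)),
        key=lambda k: sum((x - c) ** 2 for x, c in zip(descriptor, dictionary[k])),
    )

def bow_histogram(descriptors, dictionary):
    """Count how many descriptors fall on each visual word of one dictionary."""
    hist = [0] * len(dictionary)
    for d in descriptors:
        hist[nearest_word(d, dictionary)] += 1
    return hist

def mdbow_histogram(descriptors, dictionaries):
    """Build one histogram per dictionary and concatenate them in order."""
    final = []
    for dictionary in dictionaries:
        final.extend(bow_histogram(descriptors, dictionary))
    return final

# Toy dictionaries (assumed; in practice each would be learned from
# one random subset of the training features).
dictionaries = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5)],
]
image_descriptors = [(0.1, 0.0), (0.9, 1.0), (0.8, 0.1)]
print(mdbow_histogram(image_descriptors, dictionaries))  # [2, 1, 0, 1, 2]
```

The concatenated vector has one entry per visual word across all dictionaries, and it is this vector that would be passed to the classifier.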
Procedia PDF Downloads 297
438 Convergence Analysis of Training Two-Hidden-Layer Partially Over-Parameterized ReLU Networks via Gradient Descent
Authors: Zhifeng Kong
Abstract:
Over-parameterized neural networks have attracted a great deal of attention in recent deep learning theory research, as they challenge the classic perspective of over-fitting when the model has excessive parameters and have gained empirical success in various settings. While a number of theoretical works have been presented to demystify properties of such models, their convergence properties are still far from being thoroughly understood. In this work, we study the convergence properties of training two-hidden-layer partially over-parameterized fully connected networks with the Rectified Linear Unit activation via gradient descent. To our knowledge, this is the first theoretical work to understand convergence properties of deep over-parameterized networks without the equally-wide-hidden-layer assumption and other unrealistic assumptions. We provide a probabilistic lower bound on the widths of the hidden layers and prove a linear convergence rate for gradient descent. We also conduct experiments on synthetic and real-world datasets to validate our theory.
Keywords: over-parameterization, rectified linear units ReLU, convergence, gradient descent, neural networks
Procedia PDF Downloads 142
437 Development of Fake News Model Using Machine Learning through Natural Language Processing
Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini
Abstract:
Fake news detection research is still at an early stage, as the phenomenon has only recently attracted society's interest. Machine learning helps to solve complex problems and to build AI systems, especially in cases where the knowledge is tacit or not known. We used machine learning algorithms to identify fake news, applying three classifiers: Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not entirely adequate for fake news detection because classification methods are not specialized for fake news. By integrating machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of the text and then incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake news due to the unavailability of corpora. We applied three different machine learning classifiers to two publicly available datasets. Experimental analysis based on the existing datasets indicates very encouraging and improved performance.
Keywords: fake news detection, natural language processing, machine learning, classification techniques.
Procedia PDF Downloads 167
436 A Study on Sentiment Analysis Using Various ML/NLP Models on Historical Data of Indian Leaders
Authors: Sarthak Deshpande, Akshay Patil, Pradip Pandhare, Nikhil Wankhede, Rushali Deshmukh
Abstract:
Sentiment analysis is one of the most significant tasks for any language and a key area of NLP that has recently made impressive strides. Several models and datasets are available for this task in popular and commonly used languages such as English, Russian, and Spanish. Although sentiment analysis research is extensive, it lags behind for low-resource regional languages such as Hindi and Marathi. Marathi is one of the languages included in the 8th Schedule of the Indian Constitution; it is the third most widely spoken language in the country and is primarily spoken in the Deccan region, which encompasses Maharashtra and Goa. There is insufficient research on sentiment analysis methods for Marathi text due to the lack of available resources and information. Therefore, this project proposes the use of different ML/NLP models to analyze Marathi data from comments on YouTube content, tweets, or Instagram posts. We aim to achieve a short and precise analysis and summary of the related data using our dataset (dates, names, root words) and lexicons to locate exact information.
Keywords: multilingual sentiment analysis, Marathi, natural language processing, text summarization, lexicon-based approaches
Procedia PDF Downloads 74
435 Product Features Extraction from Opinions According to Time
Authors: Kamal Amarouche, Houda Benbrahim, Ismail Kassou
Abstract:
Nowadays, e-commerce shopping websites have experienced noticeable growth and have gained consumers’ trust. After purchasing a product, many consumers share comments in which opinions about the product are usually embedded. Research on the automatic management of opinions, which gives suggestions to potential consumers and portrays an image of the product to manufacturers, has been growing recently. Just after a product is launched, the reviews generated around it usually contain only generic opinions about the product (e.g., telephone: great phone...) rather than helpful information, since the product is still in its launch phase. Over time, the product ages, and consumers come to perceive the advantages and disadvantages of each specific product feature. They therefore write comments that contain their sentiments about these features. In this paper, we present an unsupervised method to extract the product features hidden in the opinions that influence its purchase; the method combines Time Weighting (TW), which depends on the time at which opinions were expressed, with Term Frequency-Inverse Document Frequency (TF-IDF). We conduct several experiments using two different datasets about cell phones and hotels. The results show the effectiveness of our automatic feature extraction, as well as its domain independence.
Keywords: opinion mining, product feature extraction, sentiment analysis, SentiWordNet
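The abstract does not give the Time Weighting formula, so the sketch below is only one possible reading: it assumes that opinions written later after launch should count more, and uses a hypothetical linear weight that saturates after a fixed horizon. The function names, the `horizon` parameter, and the toy reviews are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: tf-idf} dict per document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return scores

def time_weight(days_since_launch, horizon=365.0):
    """Hypothetical weight in (0, 1]: grows linearly with time since launch
    and saturates at `horizon` days (an assumed form, not from the paper)."""
    return min(1.0, max(days_since_launch, 1.0) / horizon)

def rank_terms(reviews):
    """reviews: list of (tokens, days_since_launch).
    Returns terms sorted by their summed time-weighted TF-IDF score."""
    docs = [tokens for tokens, _ in reviews]
    totals = Counter()
    for (_, days), scores in zip(reviews, tf_idf(docs)):
        w = time_weight(days)
        for term, s in scores.items():
            totals[term] += w * s
    return totals.most_common()

# Toy reviews (made up): an early generic opinion and two later, feature-specific ones.
reviews = [
    (["great", "phone"], 5),
    (["battery", "drains", "fast"], 300),
    (["battery", "screen", "great"], 360),
]
ranked = rank_terms(reviews)
print(ranked[0][0], ranked[-1][0])
```

In this toy run the generic term from the early review ends up at the bottom of the ranking, while terms from later reviews rise, which is the behavior the assumed weighting is meant to produce.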
Procedia PDF Downloads 410
434 A Dynamic Neural Network Model for Accurate Detection of Masked Faces
Authors: Oladapo Tolulope Ibitoye
Abstract:
Neural networks have become prominent and widely used in algorithm-based machine learning. They are well suited to solving day-to-day problems to a certain extent. Neural networks are computing systems with several interconnected nodes. One of the numerous areas of application of neural networks is object detection, an area that has gained prominence due to the coronavirus disease pandemic and the post-pandemic phases. According to experts, wearing a face mask in public slows the spread of the virus. This calls for the development of a reliable and effective model for detecting face masks on people's faces during compliance checks. The existing neural network models for face mask detection are characterized by their black-box nature and large dataset requirements, and these challenges have compromised their performance. The proposed model uses a Faster R-CNN model on an Inception V3 backbone to reduce system complexity and dataset requirements. The model was trained and validated with very few datasets, and evaluation results show an overall accuracy of 96% regardless of skin tone.
Keywords: convolutional neural network, face detection, face mask, masked faces
Procedia PDF Downloads 68
433 Artificial Reproduction System and Imbalanced Dataset: A Mendelian Classification
Authors: Anita Kushwaha
Abstract:
We propose a new evolutionary computational model called Artificial Reproduction System, which is based on the complex process of meiotic reproduction occurring between male and female cells of living organisms. Artificial Reproduction System is an attempt at a new computational intelligence approach inspired by the theoretical reproduction mechanism and by observed reproduction functions, principles, and mechanisms. A reproductive organism is programmed by genes and can be viewed as an automaton, mapping and reducing so as to create copies of those genes in its offspring. In Artificial Reproduction System, the binding mechanism between male and female cells is studied, parameters are chosen, a network is constructed, and a feedback system for self-regularization is established. The model then applies Mendel’s law of inheritance and allele-allele associations, and can be used to analyze imbalanced, multivariate, multiclass, and big data. In the experimental study, Artificial Reproduction System is compared with other state-of-the-art classifiers such as SVM, Radial Basis Function, neural networks, and K-Nearest Neighbor on some benchmark datasets, and the comparison results indicate good performance.
Keywords: bio-inspired computation, nature-inspired computation, natural computing, data mining
Procedia PDF Downloads 272
432 Investigations of Protein Aggregation Using Sequence and Structure Based Features
Authors: M. Michael Gromiha, A. Mary Thangakani, Sandeep Kumar, D. Velmurugan
Abstract:
The main cause of several neurodegenerative diseases, such as Alzheimer's disease, Parkinson's disease, and spongiform encephalopathies, is the formation of amyloid fibrils and plaques in proteins. We have analyzed different sets of proteins and peptides to understand the influence of sequence-based features on the protein aggregation process. The comparison of 373 pairs of homologous mesophilic and thermophilic proteins showed that aggregation-prone regions (APRs) are present in both. However, the thermophilic protein monomers show a greater ability to ‘stow away’ the APRs in their hydrophobic cores and protect them from solvent exposure. The comparison of amyloid-forming and amorphous β-aggregating hexapeptides suggested distinct preferences for specific residues at the six positions as well as all possible combinations of nine residue pairs. The compositions of residues at different positions and residue pairs have been converted into energy potentials and utilized for distinguishing between amyloid-forming and amorphous β-aggregating peptides. Our method could correctly identify the amyloid-forming peptides with an accuracy of 95-100% in different datasets of peptides.
Keywords: aggregation, amyloids, thermophilic proteins, amino acid residues, machine learning techniques
Procedia PDF Downloads 614
431 Analysis of Diabetes Patients Using Pearson, Cost Optimization, Control Chart Methods
Authors: Devatha Kalyan Kumar, R. Poovarasan
Abstract:
In this paper, we consider certain important factors and health parameters of diabetes patients, especially children with diabetes from birth (pediatric congenital). Using three methods, we assess the importance of each attribute in the dataset and thereby determine the attribute most responsible for, and most correlated with, diabetes among young patients. We use cost optimization, control chart, and Spearman methodologies for the real-time application of finding the data efficiency in this diabetes dataset. The Spearman methodology is a correlation methodology used in the software development process to identify the complexity between the various modules of the software. Identifying the complexity is important because higher complexity means a higher chance of risk in the software. With the control chart, the mean, variance, and standard deviation of the data are calculated. With the cost optimization model, we optimize the variables. Hence, we choose the Spearman, control chart, and cost optimization methods to assess the data efficiency in diabetes datasets.
Keywords: correlation, congenital diabetics, linear relationship, monotonic function, ranking samples, pediatric
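For reference, Spearman's rank correlation, which the abstract relies on, is the Pearson correlation computed on the ranks of the two variables, with tied values sharing their average rank. The sketch below is a plain-Python illustration; the function names are hypothetical and no data from the paper is used.

```python
import math

def average_ranks(values):
    """1-based ranks of `values`; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# A monotonic but non-linear relationship still gives a coefficient of 1.
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```

Because only ranks enter the calculation, the coefficient measures monotonic rather than strictly linear association, which matches the "monotonic function" and "ranking samples" keywords.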
Procedia PDF Downloads 256
430 Intrusion Detection System Using Linear Discriminant Analysis
Authors: Zyad Elkhadir, Khalid Chougdali, Mohammed Benattou
Abstract:
Most existing intrusion detection systems (IDS) work on quantitative network traffic data with many irrelevant and redundant features, which makes the detection process more time-consuming and less accurate. Several feature extraction methods, such as linear discriminant analysis (LDA), have been proposed. However, LDA suffers from the small sample size (SSS) problem, which occurs when the number of training samples is small compared with the sample dimension. Hence, classical LDA cannot be applied directly to high-dimensional data such as network traffic data. In this paper, we propose two solutions to the SSS problem for LDA and apply them to a network IDS. The first method reduces the dimension of the original data using principal component analysis (PCA) and then applies LDA. In the second solution, we propose to use the pseudo-inverse to avoid the singularity of the within-class scatter matrix caused by the SSS problem. After that, the KNN algorithm is used for classification. We chose two well-known datasets, KDDcup99 and NSL-KDD, to test the proposed approaches. Results showed that the classification accuracy of the PCA+LDA method clearly outperforms that of the pseudo-inverse LDA method when the training data is large.
Keywords: LDA, Pseudoinverse, PCA, IDS, NSL-KDD, KDDcup99
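The singularity mentioned above can be made concrete with the usual LDA definitions. The notation below is the standard textbook one and is assumed here, since the abstract gives no formulas: $\mu_c$ and $n_c$ are the mean and size of class $c$, and $\mu$ is the overall mean.

```latex
S_w = \sum_{c}\sum_{x \in c} (x - \mu_c)(x - \mu_c)^{\top}, \qquad
S_b = \sum_{c} n_c\,(\mu_c - \mu)(\mu_c - \mu)^{\top}

\text{classical LDA:}\quad S_w^{-1} S_b\, w = \lambda\, w
\qquad\longrightarrow\qquad
\text{pseudo-inverse LDA:}\quad S_w^{+} S_b\, w = \lambda\, w
```

When there are fewer training samples than dimensions, $S_w$ is rank-deficient and $S_w^{-1}$ does not exist; replacing it with the Moore-Penrose pseudo-inverse $S_w^{+}$ keeps the eigenproblem well defined. The PCA+LDA alternative instead projects the data to a lower dimension in which $S_w$ becomes invertible.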
Procedia PDF Downloads 226
429 Predication Model for Leukemia Diseases Based on Data Mining Classification Algorithms with Best Accuracy
Authors: Fahd Sabry Esmail, M. Badr Senousy, Mohamed Ragaie
Abstract:
In recent years, there has been an explosion in the use of technology that helps to discover diseases. For example, DNA microarrays allow us for the first time to obtain a "global" view of the cell. They have great potential to provide accurate medical diagnosis and to help find the right treatment and cure for many diseases. Various classification algorithms can be applied to such microarray datasets to devise methods that can predict the occurrence of leukemia. In this study, we compared the classification accuracy and response time of eleven decision tree methods and six rule classifier methods using five performance criteria. The experimental results show that, among the tree classifiers, Random Tree produces better results and takes the least time to build a model. Among the classification rule algorithms, the nearest-neighbor-like algorithm (NNge) is the best owing to its high accuracy, and it takes the least time to build a model.
Keywords: data mining, classification techniques, decision tree, classification rule, leukemia diseases, microarray data
Procedia PDF Downloads 320
428 Big Data Applications for Transportation Planning
Authors: Antonella Falanga, Armando Cartenì
Abstract:
"Big data" refers to extremely vast and complex sets of data, encompassing extraordinarily large and intricate datasets that require specific tools for meaningful analysis and processing. These datasets can stem from diverse origins like sensors, mobile devices, online transactions, social media platforms, and more. The utilization of big data is pivotal, offering the chance to leverage vast information for substantial advantages across diverse fields, thereby enhancing comprehension, decision-making, efficiency, and fostering innovation in various domains. Big data, distinguished by its remarkable attributes of enormous volume, high velocity, diverse variety, and significant value, represent a transformative force reshaping the industry worldwide. Their pervasive impact continues to unlock new possibilities, driving innovation and advancements in technology, decision-making processes, and societal progress in an increasingly data-centric world. The use of these technologies is becoming more widespread, facilitating and accelerating operations that were once much more complicated. In particular, big data impacts across multiple sectors such as business and commerce, healthcare and science, finance, education, geography, agriculture, media and entertainment and also mobility and logistics. Within the transportation sector, which is the focus of this study, big data applications encompass a wide variety, spanning across optimization in vehicle routing, real-time traffic management and monitoring, logistics efficiency, reduction of travel times and congestion, enhancement of the overall transportation systems, but also mitigation of pollutant emissions contributing to environmental sustainability. Meanwhile, in public administration and the development of smart cities, big data aids in improving public services, urban planning, and decision-making processes, leading to more efficient and sustainable urban environments. 
Access to vast data reservoirs enables deeper insights, revealing hidden patterns and facilitating more precise and timely decision-making. Additionally, advancements in cloud computing and artificial intelligence (AI) have further amplified the potential of big data, enabling more sophisticated and comprehensive analyses. Utilizing big data presents various advantages but also entails several challenges: data privacy and security, ensuring data quality, managing and storing large volumes of data effectively, integrating data from diverse sources, the need for specialized skills to interpret analysis results, ethical considerations in data use, and evaluating costs against benefits. Addressing these difficulties requires well-structured strategies and policies to balance the benefits of big data with privacy, security, and efficient data management concerns. Building upon these premises, the current research investigates the efficacy and influence of big data by conducting an overview of the primary and recent implementations of big data in transportation systems. Overall, this research allows us to conclude that big data helps to enhance rational decision-making for mobility choices and is imperative for adeptly planning and allocating investments in transportation infrastructures and services.
Keywords: big data, public transport, sustainable mobility, transport demand, transportation planning
Procedia PDF Downloads 60
427 Utilizing Google Earth for Internet GIS
Authors: Alireza Derambakhsh
Abstract:
The objective of this study is to explore the capability of using Google Earth for Internet GIS applications. The study particularly analyzes the use of vector and attribute data and the capability of displaying and processing this data in new ways using the Google Earth platform. It has increasingly been recognized that future developments in GIS will center on Internet GIS, in three major areas: GIS data access, spatial information dissemination, and GIS modeling/processing. Google Earth is one of a group of geobrowsers that offer a free and easy-to-use service that enables data with a spatial component to be overlaid on top of a 3-D model of the Earth. This study creates a methodological framework to achieve its objective that consists of three major parts: a database level, an application level and a client level. As a proof of concept, a web prototype has been developed, which incorporates a diverse range of datasets and lets users direct queries and create visualizations of this custom data. The results revealed that both vector and attribute data can be effectively represented and visualized using Google Earth. In addition, the functionality to query custom data and visualize results has been added to the Google Earth platform.
Keywords: Google earth, internet GIS, vector, characteristic information
Procedia PDF Downloads 308
426 A Deep Learning-Based Pedestrian Trajectory Prediction Algorithm
Authors: Haozhe Xiang
Abstract:
With the rise of the Internet of Things era, intelligent products are gradually integrating into people's lives. Pedestrian trajectory prediction has become a key issue, which is crucial for the motion path planning of intelligent agents such as autonomous vehicles, robots, and drones. In the current technological context, deep learning technology is becoming increasingly sophisticated and gradually replacing traditional models. The pedestrian trajectory prediction algorithm combining neural networks and attention mechanisms has significantly improved prediction accuracy. Based on in-depth research on deep learning and pedestrian trajectory prediction algorithms, this article focuses on physical environment modeling and learning of historical trajectory time dependence. At the same time, social interaction between pedestrians and scene interaction between pedestrians and the environment were handled. An improved pedestrian trajectory prediction algorithm is proposed by analyzing the existing model architecture. With the help of these improvements, acceptable predicted trajectories were successfully obtained. Experiments on public datasets have demonstrated the algorithm's effectiveness and achieved acceptable results.
Keywords: deep learning, graph convolutional network, attention mechanism, LSTM
Procedia PDF Downloads 70
425 Meta Mask Correction for Nuclei Segmentation in Histopathological Image
Authors: Jiangbo Shi, Zeyu Gao, Chen Li
Abstract:
Nuclei segmentation is a fundamental task in digital pathology analysis and can be automated by deep learning-based methods. However, the development of such an automated method requires a large amount of data with precisely annotated masks, which are hard to obtain. Training with weakly labeled data is a popular solution for reducing the workload of annotation. In this paper, we propose a novel meta-learning-based nuclei segmentation method which follows the label correction paradigm to leverage data with noisy masks. Specifically, we design a fully convolutional meta-model that can correct noisy masks by using a small amount of clean meta-data. Then the corrected masks are used to supervise the training of the segmentation model. Meanwhile, a bi-level optimization method is adopted to alternately update the parameters of the main segmentation model and the meta-model. Extensive experimental results on two nuclei segmentation datasets show that our method achieves the state-of-the-art result. In particular, in some noise scenarios, it even exceeds the performance of training on supervised data.
Keywords: deep learning, histopathological image, meta-learning, nuclei segmentation, weak annotations
Procedia PDF Downloads 140
424 Deep Learning Based Unsupervised Sport Scene Recognition and Highlights Generation
Authors: Ksenia Meshkova
Abstract:
With the increasing amount of multimedia data, it is very important to automate and speed up the process of obtaining metadata. This process means not just recognition of some object or its movement, but recognition of the entire scene rather than separate frames, with timeline segmentation as the final result. Labeling datasets is time-consuming; besides, attributing characteristics to particular scenes is difficult due to their nature. In this article, we consider the application of autoencoders to unsupervised scene recognition and clustering based on interpretable features. Further, we focus on the particular types of autoencoders that are relevant to our study. We look at the specifics of deep learning related to information theory and rate-distortion theory and describe solutions that address the poor interpretability of deep learning in media content processing. In conclusion, we present the results of a custom framework, based on autoencoders, that is capable of the scene recognition studied above, with highlights generated from this recognition. We do not describe the mathematics of neural networks in detail but clarify the necessary concepts and pay attention to important nuances.
Keywords: neural networks, computer vision, representation learning, autoencoders
Procedia PDF Downloads 127
423 Semi-Supervised Hierarchical Clustering Given a Reference Tree of Labeled Documents
Authors: Ying Zhao, Xingyan Bin
Abstract:
Semi-supervised clustering algorithms have been shown to improve the clustering process effectively, even with limited supervision. However, semi-supervised hierarchical clustering remains challenging due to the complexities of expressing constraints for agglomerative clustering algorithms. This paper proposes novel semi-supervised agglomerative clustering algorithms to build a hierarchy based on a known reference tree. We prove that by enforcing distance constraints defined by a reference tree during the process of hierarchical clustering, the resultant tree is guaranteed to be consistent with the reference tree. We also propose a framework that allows the hierarchical tree generation to be aware of the levels of the agglomerative tree under creation, so that metric weights can be learned and adopted at each level in a recursive fashion. The experimental evaluation shows that the additional cost of our constraint-based semi-supervised hierarchical agglomerative clustering (HAC) algorithm is negligible, and our combined semi-supervised HAC algorithm outperforms the state-of-the-art algorithms on real-world datasets. The experiments also show that our proposed methods can improve clustering performance even with a small number of unevenly distributed labeled data.
Keywords: semi-supervised clustering, hierarchical agglomerative clustering, reference trees, distance constraints
Procedia PDF Downloads 547
422 A Neuron Model of Facial Recognition and Detection of an Authorized Entity Using Machine Learning System
Authors: J. K. Adedeji, M. O. Oyekanmi
Abstract:
This paper critically examines the use of machine learning procedures in curbing unauthorized access to valuable areas of an organization. The use of passwords, PIN codes and user identification in recent times has been only partially successful in curbing crimes involving identities, hence the need for a system that incorporates biometric characteristics such as DNA and pattern recognition of variations in facial expressions. The facial model uses the OpenCV library, which is based on certain physiological features. A Raspberry Pi 3 module is used to compile the OpenCV library, which extracts the detected faces and stores them in the datasets directory through the use of a camera. The model is trained with 50 epoch runs on the database, and faces are recognized by the Local Binary Pattern Histogram (LBPH) recognizer contained in OpenCV. The training algorithm used by the neural network is backpropagation, coded in Python, with 200 epoch runs to identify specific resemblance in the exclusive OR (XOR) output neurons. The research confirmed that physiological parameters are more effective measures for curbing crimes relating to identities.
Keywords: biometric characters, facial recognition, neural network, OpenCV
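The Local Binary Pattern Histogram idea mentioned in this abstract can be sketched in a few lines. This is a simplified illustration, not OpenCV's implementation and not the authors' code: it assumes the basic 3 x 3 operator, a clockwise neighbour order starting at the top-left pixel, and a single histogram for the whole image (recognisers of this kind typically split the image into cells and concatenate one histogram per cell). All names and pixel values are hypothetical.

```python
# Clockwise neighbour offsets (row, col), starting at the top-left pixel.
# The ordering is an assumption; other conventions give different codes.
NEIGHBOUR_OFFSETS = [
    (-1, -1), (-1, 0), (-1, 1), (0, 1),
    (1, 1), (1, 0), (1, -1), (0, -1),
]


def lbp_code(image, row, col):
    """8-bit code: one bit per neighbour, set when the neighbour >= the centre."""
    center = image[row][col]
    code = 0
    for d_row, d_col in NEIGHBOUR_OFFSETS:
        bit = 1 if image[row + d_row][col + d_col] >= center else 0
        code = (code << 1) | bit
    return code


def lbp_histogram(image):
    """Count the LBP codes of all interior pixels (border pixels are skipped)."""
    histogram = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, row, col)] += 1
    return histogram


# Hypothetical 4 x 4 grayscale patch (values 0..255).
patch = [
    [52, 55, 61, 59],
    [62, 59, 55, 104],
    [63, 65, 66, 113],
    [67, 70, 75, 120],
]
histogram = lbp_histogram(patch)  # 4 interior pixels, so the counts sum to 4
```

Two face images can then be compared by a distance between their histograms, which is the general principle behind histogram-based recognisers of this type.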
Procedia PDF Downloads 256
421 Urban Logistics Dynamics: A User-Centric Approach to Traffic Modelling and Kinetic Parameter Analysis
Authors: Emilienne Lardy, Eric Ballot, Mariam Lafkihi
Abstract:
Efficient urban logistics requires a comprehensive understanding of traffic dynamics, particularly as it pertains to kinetic parameters influencing energy consumption and trip duration estimations. While real-time traffic information is increasingly accessible, current high-precision forecasting services embedded in route planning often function as opaque 'black boxes' for users. These services, typically relying on AI-processed counting data, fall short in accommodating open design parameters essential for management studies, notably within Supply Chain Management. This work revisits the modelling of traffic conditions in the context of city logistics, emphasizing its significance from the user’s point of view, with two focuses. Firstly, the focus is not on the vehicle flow but on the vehicles themselves and the impact of the traffic conditions on their driving behaviour. This means opening the range of studied indicators beyond vehicle speed, to describe extensively the kinetic and dynamic aspects of the driving behaviour. To achieve this, we leverage the Art. Kinema parameters, which are designed to characterize driving cycles. Secondly, this study examines how the driving context (i.e., exogenous factors to the traffic flow) determines the mentioned driving behaviour. Specifically, we explore how accurately the kinetic behaviour of a vehicle can be predicted based on a limited set of exogenous factors, such as time, day, road type, orientation, slope, and weather conditions. To answer this question, statistical analysis was conducted on real-world driving data, which includes high-frequency measurements of vehicle speed. A Factor Analysis and a Generalized Linear Model have been established to link kinetic parameters with independent categorical contextual variables.
The results include an assessment of the adjustment quality and the robustness of the models, as well as an overview of the model’s outputs.
Keywords: factor analysis, generalised linear model, real world driving data, traffic congestion, urban logistics, vehicle kinematics
Procedia PDF Downloads 65
420 Quantifying User-Related, System-Related, and Context-Related Patterns of Smartphone Use
Authors: Andrew T. Hendrickson, Liven De Marez, Marijn Martens, Gytha Muller, Tudor Paisa, Koen Ponnet, Catherine Schweizer, Megan Van Meer, Mariek Vanden Abeele
Abstract:
Quantifying and understanding the myriad ways people use their phones and how that impacts their relationships, cognitive abilities, mental health, and well-being is increasingly important in our phone-centric society. However, most studies on the patterns of phone use have focused on theory-driven tests of specific usage hypotheses using self-report questionnaires or analyses of smaller datasets. In this work we present a series of analyses from a large corpus of over 3000 users that combine data-driven and theory-driven analyses to identify reliable smartphone usage patterns and clusters of similar users. Furthermore, we compare the stability of user clusters across user- and system-initiated sessions, as well as during the hypothesized ritualized behavior times directly before and after sleeping. Our results indicate support for some hypothesized usage patterns but present a more complete and nuanced view of how people use smartphones.
Keywords: data mining, experience sampling, smartphone usage, health and well being
Procedia PDF Downloads 163
419 Reviewing Image Recognition and Anomaly Detection Methods Utilizing GANs
Authors: Agastya Pratap Singh
Abstract:
This review paper examines the emerging applications of generative adversarial networks (GANs) in the fields of image recognition and anomaly detection. With the rapid growth of digital image data, the need for efficient and accurate methodologies to identify and classify images has become increasingly critical. GANs, known for their ability to generate realistic data, have gained significant attention for their potential to enhance traditional image recognition systems and improve anomaly detection performance. The paper systematically analyzes various GAN architectures and their modifications tailored for image recognition tasks, highlighting their strengths and limitations. Additionally, it delves into the effectiveness of GANs in detecting anomalies in diverse datasets, including medical imaging, industrial inspection, and surveillance. The review also discusses the challenges faced in training GANs, such as mode collapse and stability issues, and presents recent advancements aimed at overcoming these obstacles.
Keywords: generative adversarial networks, image recognition, anomaly detection, synthetic data generation, deep learning, computer vision, unsupervised learning, pattern recognition, model evaluation, machine learning applications
Procedia PDF Downloads 25