Search results for: categorical datasets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 869

569 Big Data for Local Decision-Making: Indicators Identified at International Conference on Urban Health 2017

Authors: Dana R. Thomson, Catherine Linard, Sabine Vanhuysse, Jessica E. Steele, Michal Shimoni, Jose Siri, Waleska Caiaffa, Megumi Rosenberg, Eleonore Wolff, Tais Grippa, Stefanos Georganos, Helen Elsey

Abstract:

The Sustainable Development Goals (SDGs) and Urban Health Equity Assessment and Response Tool (Urban HEART) identify dozens of key indicators to help local decision-makers prioritize and track inequalities in health outcomes. However, presentations and discussions at the International Conference on Urban Health (ICUH) 2017 suggested that additional indicators are needed to make decisions and policies. A local decision-maker may realize that malaria or road accidents are a top priority. However, s/he needs additional health determinant indicators, for example about standing water or traffic, to address the priority and reduce inequalities. Health determinants reflect the physical and social environments that influence health outcomes often at community- and societal-levels and include such indicators as access to quality health facilities, access to safe parks, traffic density, location of slum areas, air pollution, social exclusion, and social networks. Indicator identification and disaggregation are necessarily constrained by available datasets – typically collected about households and individuals in surveys, censuses, and administrative records. Continued advancements in earth observation, data storage, computing and mobile technologies mean that new sources of health determinants indicators derived from 'big data' are becoming available at fine geographic scale. Big data includes high-resolution satellite imagery and aggregated, anonymized mobile phone data. While big data are themselves not representative of the population (e.g., satellite images depict the physical environment), they can provide information about population density, wealth, mobility, and social environments with tremendous detail and accuracy when combined with population-representative survey, census, administrative and health system data. 
The aim of this paper is to (1) flag to data scientists important indicators needed by health decision-makers at the city and sub-city scale - ideally free and publicly available, and (2) summarize for local decision-makers new datasets that can be generated from big data, with layperson descriptions of difficulties in generating them. We include SDGs and Urban HEART indicators, as well as indicators mentioned by decision-makers attending ICUH 2017.

Keywords: health determinant, health outcome, mobile phone, remote sensing, satellite imagery, SDG, urban HEART

Procedia PDF Downloads 209
568 MULTI-FLGANs: Multi-Distributed Adversarial Networks for Non-Independent and Identically Distributed Distribution

Authors: Akash Amalan, Rui Wang, Yanqi Qiao, Emmanouil Panaousis, Kaitai Liang

Abstract:

Federated learning is an emerging concept in the domain of distributed machine learning. This concept has enabled Generative Adversarial Networks (GANs) to benefit from rich distributed training data while preserving privacy. However, in a non-IID setting, current federated GAN architectures are unstable, struggle to learn distinct features, and are vulnerable to mode collapse. In this paper, we propose an architecture, MULTI-FLGAN, to solve the problems of low-quality images, mode collapse, and instability for non-IID datasets. Our results show that MULTI-FLGAN is four times as stable and performant (i.e., high inception score) on average over 20 clients compared to baseline FLGAN.

Keywords: federated learning, generative adversarial network, inference attack, non-IID data distribution

Procedia PDF Downloads 158
567 Recognition of New Biomarkers in the Epigenetic Pathway of Breast Cancer

Authors: Fatemeh Zeinali Sehrig

Abstract:

This study aimed to evaluate the expression of miR-299-3p, DNMT1, DNMT3A, and DNMT3B in breast cancer samples and investigate their diagnostic significance. Using the GSE40525 and GSE45666 datasets, the miR-299-3p expression level was studied in breast cancer tissues. The expression levels of DNMT1, DNMT3A, and DNMT3B were investigated by analyzing the GSE61725, GSE86374, and GSE37751 datasets. The target genes were studied in terms of biological processes, molecular functions, and cellular components. Consistent with the in silico results, miR-299-3p expression was substantially decreased in breast cancer tissues, and the expression levels of DNMT1, DNMT3A, and DNMT3B were considerably upregulated in breast cancer samples. The expression levels of miR-299-3p, DNMT1, DNMT3A, and DNMT3B could be valuable diagnostic tools for detecting breast cancer. Also, miR-299-3p downregulation may play a role in DNMT1, DNMT3A, and DNMT3B upregulation in breast cancer.

Keywords: breast cancer, miR-299-3p, DNMTs, GEO database

Procedia PDF Downloads 38
566 Real Time Multi Person Action Recognition Using Pose Estimates

Authors: Aishrith Rao

Abstract:

Human activity recognition is an important aspect of video analytics, and many approaches have been recommended to enable action recognition. In this approach, the model is used to identify the actions of multiple people in the frame and classify them accordingly. A few approaches use RNNs and 3D CNNs, which are computationally expensive and cannot be trained with the small datasets that are currently available. Multi-person action recognition has been performed in order to understand the positions and actions of the people present in the video frame. The size of the video frame can be adjusted as a hyper-parameter depending on the hardware resources available. OpenPose has been used to calculate pose estimates, using a CNN to produce heat-maps, one of which provides skeleton features, which are essentially joint features. The features are then extracted, and a classification algorithm can be applied to classify the action.

Keywords: human activity recognition, computer vision, pose estimates, convolutional neural networks

Procedia PDF Downloads 141
565 Self-reported Acute Pesticide Intoxication in Ethiopia

Authors: Amare Nigatu, Mågne Bratveit, Bente E. Moen

Abstract:

Background: Pesticide exposure is an important public health concern in Ethiopia, but there is limited information on pesticide intoxications. Residents may have an increased risk of pesticide exposure through the proximity of their homes to farms using pesticides. Pesticide exposure might also be related to employment at these farms. This study investigated the prevalence of acute pesticide intoxications (API) by residence proximity to a nearby flower farm and assessed whether intoxications were related to working there. Methods: A cross-sectional survey involving 516 persons was conducted. Participants were grouped according to the proximity of their residence to a large flower farm: living within 5 kilometers or 5-12 kilometers away. In a structured interview, participants were asked if they had had health symptoms within 48 hours of pesticide exposure in the past year. Those who had experienced this and reported two or more typical pesticide intoxication symptoms were considered as having had API. Chi-square and independent t-tests were used to compare categorical and continuous variables, respectively. Confounding variables were adjusted for using a binomial regression model. Results: The prevalence of API in the past year among the residents in the study area was 26%, and it was higher in the population living close to the flower farm (42%) than in those living far away (11%), prevalence ratio (PR) = 3.2, 95% CI: 2.2-4.8, adjusted for age, gender, and education. A subgroup living close to the farm and working there had significantly more API (56%) than those living close who did not work there (16%), adjusted PR = 3.0, 95% CI: 1.8-4.9. Flower farm workers reported more API (56%) than those not working there (13%), adjusted PR = 4.0, 95% CI: 2.9-5.6. Conclusion: The residents living closer than 5 kilometers to the flower farm reported a significantly higher prevalence of API than those living 5-12 kilometers away. This increased risk of API was associated with work at the flower farm.
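The prevalence ratios quoted in this abstract can be illustrated with a minimal sketch. It computes an unadjusted PR with a Wald-type 95% confidence interval on the log scale. The function name and the counts are hypothetical, and the sketch does not reproduce the adjusted binomial regression estimates reported by the authors.

```python
import math


def prevalence_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Unadjusted prevalence ratio with a Wald-type CI computed on the log scale."""
    p_exposed = cases_exposed / n_exposed
    p_unexposed = cases_unexposed / n_unexposed
    pr = p_exposed / p_unexposed
    # Standard error of ln(PR) for two independent proportions.
    se_log = math.sqrt(
        1 / cases_exposed - 1 / n_exposed + 1 / cases_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(pr) - z * se_log)
    upper = math.exp(math.log(pr) + z * se_log)
    return pr, lower, upper


# Hypothetical counts for illustration only (not the study's data).
pr, lower, upper = prevalence_ratio(40, 100, 10, 100)
print(f"PR = {pr:.2f}, 95% CI: {lower:.2f}-{upper:.2f}")
```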

Keywords: acute pesticide intoxications, self-reported symptoms, flower farm workers, living proximity

Procedia PDF Downloads 292
564 Generating Product Description with Generative Pre-Trained Transformer 2

Authors: Minh-Thuan Nguyen, Phuong-Thai Nguyen, Van-Vinh Nguyen, Quang-Minh Nguyen

Abstract:

Research on automatically generating descriptions for e-commerce products has gained increasing attention in recent years. However, the descriptions generated by existing systems are often less informative and attractive because of a lack of training datasets or the limitations of these approaches, which often use templates or statistical methods. In this paper, we explore a method to generate product descriptions by using the GPT-2 model. In addition, we apply text paraphrasing and task-adaptive pretraining techniques to improve the quality of descriptions generated by the GPT-2 model. Experimental results show that our models outperform the baseline model in both automatic evaluation and human evaluation. In particular, our methods achieve promising results not only on the seen test set but also on the unseen test set.

Keywords: GPT-2, product description, transformer, task-adaptive, language model, pretraining

Procedia PDF Downloads 197
563 Dissimilarity-Based Coloring for Symbolic and Multivariate Data Visualization

Authors: K. Umbleja, M. Ichino, H. Yaguchi

Abstract:

In this paper, we propose a coloring method for multivariate data visualization using parallel coordinates, based on dissimilarity and tree structure information gathered during hierarchical clustering. The proposed method is an extension of proximity-based coloring, which suffers from a few undesired side effects if the hierarchical tree structure is not a balanced tree. We describe the algorithm for assigning colors based on dissimilarity information, show the application of the proposed method to three commonly used datasets, and compare the results with proximity-based coloring. We found our proposed method to be especially beneficial for symbolic data visualization, where many individual objects have already been aggregated into a single symbolic object.

Keywords: data visualization, dissimilarity-based coloring, proximity-based coloring, symbolic data

Procedia PDF Downloads 170
562 Mining Scientific Literature to Discover Potential Research Data Sources: An Exploratory Study in the Field of Haemato-Oncology

Authors: A. Anastasiou, K. S. Tingay

Abstract:

Background: Discovering suitable datasets is an important part of health research, particularly for projects working with clinical data from patients organized in cohorts (cohort data), but with the proliferation of so many national and international initiatives, it is becoming increasingly difficult for research teams to locate real world datasets that are most relevant to their project objectives. We present a method for identifying healthcare institutes in the European Union (EU) which may hold haemato-oncology (HO) data. A key enabler of this research was the bibInsight platform, a scientometric data management and analysis system developed by the authors at Swansea University. Method: A PubMed search was conducted using HO clinical terms taken from previous work. The resulting XML file was processed using the bibInsight platform, linking affiliations to the Global Research Identifier Database (GRID). GRID is an international, standardized list of institutions, including the city and country in which the institution exists, as well as a category of the main business type, e.g., Academic, Healthcare, Government, Company. Countries were limited to the 28 current EU members, and institute type to 'Healthcare'. An article was considered valid if at least one author was affiliated with an EU-based healthcare institute. Results: The PubMed search produced 21,310 articles, consisting of 9,885 distinct affiliations with correspondence in GRID. Of these articles, 760 were from EU countries, and 390 of these were healthcare institutes. One affiliation was excluded as being a veterinary hospital. Two EU countries did not have any publications in our analysis dataset. The results were analysed by country and by individual healthcare institute. Networks both within the EU and internationally show institutional collaborations, which may suggest a willingness to share data for research purposes. Geographical mapping can ensure that data has broad population coverage. 
Collaborations with industry or government may exclude healthcare institutes that may have embargos or additional costs associated with data access. Conclusions: Data reuse is becoming increasingly important both for ensuring the validity of results, and economy of available resources. The ability to identify potential, specific data sources from over twenty thousand articles in less than an hour could assist in improving knowledge of, and access to, data sources. As our method has not yet specified if these healthcare institutes are holding data, or merely publishing on that topic, future work will involve text mining of data-specific concordant terms to identify numbers of participants, demographics, study methodologies, and sub-topics of interest.

Keywords: data reuse, data discovery, data linkage, journal articles, text mining

Procedia PDF Downloads 115
561 River's Bed Level Changing Pattern Due to Sedimentation, Case Study: Gash River, Kassala, Sudan

Authors: Faisal Ali, Hasssan Saad Mohammed Hilmi, Mustafa Mohamed, Shamseddin Musa

Abstract:

The Gash River is an ephemeral river that usually flows from July to September. It has a braided pattern with a high sediment content of 15200 ppm in suspension and 360 kg/sec as bed load. The Gash River bed has an average slope of 1.3 m/km. The objectives of this study were: assessing the Gash River bed level patterns; quantifying the annual variations in the Gash bed level; and recommending a suitable method to reduce sediment accumulation on the Gash River bed. The study covered the period 1905-2013, using datasets that included the Gash River flows and cross sections. The results showed that there is an increasing trend in the river bed of 5 cm3 per year. This has changed the behavior of the flood routing, and consequently the flood hazard has increased tremendously in Kassala city.

Keywords: bed level, cross section, gash river, sedimentation

Procedia PDF Downloads 542
560 Decision Trees Constructing Based on K-Means Clustering Algorithm

Authors: Loai Abdallah, Malik Yousef

Abstract:

A domain space for the data should reflect the actual similarity between objects, since objects belonging to the same cluster usually share some common traits even though their geometric distance might be relatively large. In general, the Euclidean distance between data points that are represented by a large number of features does not capture the actual relation between those points. In this study, we propose a new method to construct a different space that is based on clustering to form a new distance metric. The new distance space is based on ensemble clustering (EC). The EC distance space is defined by tracking the membership of the points over multiple runs of a clustering algorithm. Over this distance, we train the decision trees classifier (DT-EC). The results obtained by applying DT-EC to 10 datasets confirm our hypothesis that embedding the EC space as a distance metric would improve the performance.
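One way to read "tracking the membership of the points over multiple runs" is sketched below. The distance between two points is taken as the fraction of k-means runs in which they land in different clusters. This definition, the function names, and the parameters `ks` and `runs_per_k` are assumptions made for illustration, not the authors' exact construction, and the decision tree training step is omitted.

```python
import random


def kmeans_labels(points, k, rng, iterations=20):
    """Plain Lloyd's k-means on tuples of floats; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def ec_distance_matrix(points, ks=(2, 3), runs_per_k=5, seed=0):
    """Assumed EC distance: share of clustering runs that separate each pair of points."""
    rng = random.Random(seed)
    n = len(points)
    separated = [[0] * n for _ in range(n)]
    total_runs = 0
    for k in ks:
        for _ in range(runs_per_k):
            labels = kmeans_labels(points, k, rng)
            total_runs += 1
            for i in range(n):
                for j in range(n):
                    if labels[i] != labels[j]:
                        separated[i][j] += 1
    return [[count / total_runs for count in row] for row in separated]


# Two well-separated hypothetical groups of 2-D points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
distances = ec_distance_matrix(data)
print(distances[0][1], distances[0][3])
```

Points that repeatedly fall in the same cluster end up close in this space regardless of their Euclidean distance, which is the property the abstract motivates.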

Keywords: ensemble clustering, decision trees, classification, K nearest neighbors

Procedia PDF Downloads 191
559 A Comparison of YOLO Family for Apple Detection and Counting in Orchards

Authors: Yuanqing Li, Changyi Lei, Zhaopeng Xue, Zhuo Zheng, Yanbo Long

Abstract:

In agricultural production and breeding, implementing automatic picking robots in orchard farming to reduce human labour and error is challenging. Their core function is automatic identification based on machine vision. This paper focuses on apple detection and counting in orchards and implements several deep learning methods. Extensive datasets are used, and a semi-automatic annotation method is proposed. The proposed deep learning models belong to the state-of-the-art YOLO family. Given that the models use various backbones, a detailed multi-dimensional comparison is made in terms of counting accuracy, mAP, and model memory, laying the foundation for realising automatic precision agriculture.

Keywords: agricultural object detection, deep learning, machine vision, YOLO family

Procedia PDF Downloads 198
558 Monitoring Land Productivity Dynamics of Gombe State, Nigeria

Authors: Ishiyaku Abdulkadir, Satish Kumar J

Abstract:

Land productivity is a measure of the greenness of above-ground biomass in health and potential gain and is not related to agricultural productivity. Monitoring land productivity dynamics is essential for identifying when and where the trend indicates degradation, so that mitigation measures can be taken. This research aims to monitor the land productivity trend of Gombe State between 2001 and 2015. QGIS was used to compute NDVI from AVHRR/MODIS datasets in a cloud-based method. The results show that land with improving productivity accounts for 773 sq.km (4.31%), stable productivity for 4,195.6 sq.km (23.40%), stable but stressed productivity for 18.7 sq.km (0.10%), early signs of declining productivity for 5,203.1 sq.km (29%), and declining productivity for 7,019.7 sq.km (39.2%), while water bodies occupy 718.7 sq.km (4% of the state’s area).
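For readers unfamiliar with NDVI, the index is the normalized difference between near-infrared and red reflectance, (NIR - Red) / (NIR + Red). The sketch below applies that standard formula to small hypothetical reflectance grids. It is not the authors' QGIS workflow, and returning 0.0 for pixels with no signal is an assumption of this example.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: pixels with no signal are reported as 0
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply NDVI pixel by pixel to two equally sized 2-D lists of reflectances."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical reflectance values for a 2x2 tile.
nir_tile = [[0.5, 0.4], [0.3, 0.1]]
red_tile = [[0.1, 0.2], [0.3, 0.1]]
print(ndvi_grid(nir_tile, red_tile))
```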

Keywords: above-ground biomass, dynamics, land productivity, man-environment relationship

Procedia PDF Downloads 145
557 Robust Variable Selection Based on Schwarz Information Criterion for Linear Regression Models

Authors: Shokrya Saleh A. Alshqaq, Abdullah Ali H. Ahmadini

Abstract:

The Schwarz information criterion (SIC) is a popular tool for selecting the best variables in regression datasets. However, SIC is defined using an unbounded estimator, namely, the least-squares (LS), which is highly sensitive to outlying observations, especially bad leverage points. A method for robust variable selection based on SIC for linear regression models is thus needed. This study investigates the robustness properties of SIC by deriving its influence function and proposes a robust SIC based on the MM-estimation scale. The aim of this study is to produce a criterion that can effectively select accurate models in the presence of vertical outliers and high leverage points. The advantages of the proposed robust SIC are demonstrated through a simulation study and an analysis of a real dataset.
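As background, the classical least-squares form of the criterion for a Gaussian linear model is commonly written, up to an additive constant, as SIC = n ln(RSS/n) + k ln(n). The sketch below evaluates that form for two candidate models on hypothetical data. The function names are illustrative, and the robust MM-based variant proposed in the abstract is not implemented here.

```python
import math


def sic(rss, n_obs, n_params):
    """Classical least-squares SIC, up to an additive constant; lower is better."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


def rss_intercept_only(y):
    """Residual sum of squares when predicting every value by the mean."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)


def rss_simple_linear(x, y):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((u - mean_x) ** 2 for u in x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((v - (intercept + slope * u)) ** 2 for u, v in zip(x, y))


# Hypothetical data with a clear linear trend.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
print(sic(rss_intercept_only(y), len(y), 1), sic(rss_simple_linear(x, y), len(y), 2))
```

Because RSS comes from an unbounded estimator, a single gross outlier in `y` can change which model attains the smaller value, which is the sensitivity the abstract sets out to address.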

Keywords: influence function, robust variable selection, robust regression, Schwarz information criterion

Procedia PDF Downloads 140
556 Assessment of Nurses’ Knowledge of the Glasgow Coma Scale in a Saudi Tertiary Care Hospital: A Cross-Sectional Study

Authors: Roaa Al Sharif, Salsabil Abo Al-Azayem, Nimah Alsomali, Wjoud Alsaeed, Nawal Alshammari, Abdulaziz Alwatban, Yaseen Alrabae, Razan Orfali, Faisal Alqarni, Ahmad Alrasheedi

Abstract:

Background: Studies from various countries have revealed that nurses possess only a basic understanding of the GCS. Limited knowledge is available about the situation in Saudi Arabia. Overall, the available research suggests that there is room for improvement in the knowledge of the GCS among nurses in Saudi Arabia. Further training and education programs may be beneficial in enhancing nurses' understanding and application of the GCS in clinical practice. Objective: To determine the level of knowledge and competence in assessing the GCS among staff nurses and to identify factors that might influence their knowledge at King Fahd Medical City (KFMC) in Riyadh, Saudi Arabia. Methods: A descriptive, cross-sectional survey involving 199 KFMC staff nurses was conducted. Nurses were provided with a structured questionnaire, and data were collected and analyzed using SPSS version 16, employing descriptive statistics and Chi-square tests. Results: The majority of nurses (81.4%) had an average level of knowledge in assessing the Glasgow Coma Scale (GCS). The mean score for the level of knowledge among staff nurses in GCS assessment was 8.8 ± 1.826. Overall, 13.6% of respondents demonstrated good knowledge of the GCS, scoring between 11 and 15 points, while only 5% of nurses exhibited poor knowledge of the GCS assessment. There was a significant association between knowledge and nurses' departments (χ2(2) = 19.184, p < 0.001), where χ2(2) denotes a Chi-square statistic with 2 degrees of freedom used to test the association between categorical variables. Conclusion: The findings indicate that knowledge of GCS assessment among staff nurses in a single center in Saudi Arabia is moderate. Therefore, there is a need for continuous education programs to enhance their competence in using this assessment.
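To make the reported statistic concrete, the sketch below computes Pearson's Chi-square statistic and its degrees of freedom for a contingency table. The table is hypothetical, since the abstract does not give the underlying counts. For the special case of 2 degrees of freedom, the upper-tail probability has the closed form exp(-χ2/2), which is used here to check the reported p < 0.001.

```python
import math


def chi_square_statistic(table):
    """Pearson's chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_two_dof(statistic):
    """Upper-tail probability of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# Hypothetical 3 x 2 table: three departments by two knowledge levels.
counts = [[10, 20], [20, 10], [15, 15]]
stat, dof = chi_square_statistic(counts)
print(stat, dof, p_value_two_dof(stat))

# The statistic reported in the abstract, with 2 degrees of freedom.
print(p_value_two_dof(19.184))
```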

Keywords: Glasgow Coma Scale, brain injury, nurses’ knowledge assessment, continuous education programs

Procedia PDF Downloads 23
555 The Use of Respiratory Index of Severity in Children (RISC) for Predicting Clinical Outcomes for 3 Months-59 Months Old Patients Hospitalized with Community-Acquired Pneumonia in Visayas Community Medical Center, Cebu City from January 2013 - June 2

Authors: Karl Owen L. Suan, Juliet Marie S. Lambayan, Floramay P. Salo-Curato

Abstract:

Objective: To predict the outcome among patients aged 3 months to 59 months admitted with community-acquired pneumonia to Visayas Community Medical Center, using the Respiratory Index of Severity in Children (RISC). Design: A cross-sectional study design was used. Setting: The study was done from January to June 2013 in Visayas Community Medical Center, a private tertiary-level facility in Cebu City. Patients/Participants: A total of 72 patients were initially enrolled in the study. However, 1 patient transferred to another institution; thus, 71 patients were included in this study. Within 24 hours of admission, patients were assigned a RISC score. Statistical Analysis: Cohen’s kappa coefficient was used to assess inter-rater agreement for categorical data. This study used frequency and percentage distributions for qualitative data. Mean, standard deviation, and range were used for quantitative data. To determine the relationship of each RISC score parameter and the total RISC score with the outcome, the Mann-Whitney U test and a 2x2 Fisher's exact test were used. A p-value of less than 0.05 was considered significant. Results: There was a statistically significant relationship between RISC score and clinical outcome. A RISC score greater than 4 was correlated with intubation and/or mortality. Conclusion: The RISC scoring system is a simple combination of clinical parameters and a reliable tool that will help stratify patients aged 3 months to 59 months in predicting clinical outcome.
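Cohen's kappa, mentioned in the statistical analysis, corrects the observed agreement between two raters for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch with hypothetical ratings follows. The labels are invented for illustration, and returning 1.0 when chance agreement is already perfect is a convention chosen for this example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items with categorical labels."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1:
        return 1.0  # assumption: both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical severity labels assigned by two raters to four patients.
first = ["mild", "mild", "severe", "severe"]
second = ["mild", "mild", "severe", "mild"]
print(cohens_kappa(first, second))
```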

Keywords: RISC, clinical outcome, community-acquired pneumonia, patients

Procedia PDF Downloads 302
554 Combining the Dynamic Conditional Correlation and Range-GARCH Models to Improve Covariance Forecasts

Authors: Piotr Fiszeder, Marcin Fałdziński, Peter Molnár

Abstract:

The dynamic conditional correlation model of Engle (2002) is one of the most popular multivariate volatility models. However, this model is based solely on closing prices. It has been documented in the literature that the high and low price of the day can be used in an efficient volatility estimation. We, therefore, suggest a model which incorporates high and low prices into the dynamic conditional correlation framework. Empirical evaluation of this model is conducted on three datasets: currencies, stocks, and commodity exchange-traded funds. The utilisation of realized variances and covariances as proxies for true variances and covariances allows us to reach a strong conclusion that our model outperforms not only the standard dynamic conditional correlation model but also a competing range-based dynamic conditional correlation model.
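The efficiency gain from high and low prices can be illustrated with the well-known Parkinson range estimator, whose per-period variance estimate is (ln(H/L))^2 / (4 ln 2). The sketch below averages it over several days of hypothetical prices. It illustrates range-based estimation in general and is not the Range-GARCH or DCC specification evaluated by the authors.

```python
import math


def parkinson_variance(highs, lows):
    """Parkinson range-based variance estimate averaged over the given periods."""
    n = len(highs)
    squared_log_ranges = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return squared_log_ranges / (4 * n * math.log(2))


# Hypothetical daily high and low prices.
daily_highs = [110.0, 108.0, 112.0]
daily_lows = [100.0, 101.0, 104.0]
variance = parkinson_variance(daily_highs, daily_lows)
print(variance, math.sqrt(variance))
```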

Keywords: volatility, DCC model, high and low prices, range-based models, covariance forecasting

Procedia PDF Downloads 183
553 Distorted Document Images Dataset for Text Detection and Recognition

Authors: Ilia Zharikov, Philipp Nikitin, Ilia Vasiliev, Vladimir Dokholyan

Abstract:

With the increasing popularity of document analysis and recognition systems, text detection (TD) and optical character recognition (OCR) in document images have become challenging tasks. However, to the best of our knowledge, no publicly available datasets for these particular problems exist. In this paper, we introduce a Distorted Document Images dataset (DDI-100) and provide a detailed analysis of DDI-100 in its current state. To create the dataset, we collected 7000 unique document pages and extended it by applying different types of distortions and geometric transformations. In total, DDI-100 contains more than 100,000 document images together with binary text masks and text and character locations in terms of bounding boxes. We also present an analysis of several state-of-the-art TD and OCR approaches on the presented dataset. Lastly, we demonstrate the usefulness of DDI-100 for improving the accuracy and stability of the considered TD and OCR models.

Keywords: document analysis, open dataset, optical character recognition, text detection

Procedia PDF Downloads 173
552 Base Deficit Profiling in Patients with Isolated Blunt Traumatic Brain Injury – Correlation with Severity and Outcomes

Authors: Shahan Waheed, Muhammad Waqas, Asher Feroz

Abstract:

Objectives: To determine the utility of base deficit in assessing the severity of traumatic brain injury and to correlate it with the conventional computed tomography scales for grading the severity of head injury. Methodology: An observational cross-sectional study was conducted in a tertiary care facility from 1st January 2010 to 31st December 2012. All patients with isolated traumatic brain injury presenting to the emergency department within 24 hours of the injury were included in the study. Initial Glasgow Coma Scale (GCS) and base deficit values were taken at presentation. The patients were followed during their hospital stay, and CT scan brain findings were recorded and graded as per the Rotterdam scale; the findings were cross-checked by a radiologist. The Glasgow Outcome Scale (GOS) was taken at the last follow-up. Outcomes were dichotomized into favorable and unfavorable outcomes. Continuous variables with normal and non-normal distributions are reported as mean ± SD. Categorical variables are presented as frequencies and percentages. The relationship of the base deficit with GCS, GOS, CT scan brain findings, and length of stay was calculated using Spearman's correlation. Results: 154 patients were enrolled in the study. The mean age of the patients was 30 years, and 137 were males. As per the GCS, 34 brain injuries were moderate and 109 were severe. 34 percent of the total had an unfavorable outcome, with a mean of 18±14. The correlation between GCS on presentation and the base deficit was significant at the 0.01 level (0.004). The correlation between the base deficit and the Rotterdam CT scan brain findings or length of stay was not significant. Conclusion: The base deficit was found to be a good predictor of the severity of brain injury. There was no association between the severity of injuries on the CT scan brain as per the Rotterdam scale and the base deficit. Further studies with larger sample sizes are needed to further evaluate the associations.
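Spearman's correlation, used above, is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below implements it for hypothetical paired measurements. The variable names and values are invented for illustration and are not the study's data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired measurements, e.g. a score and a laboratory value.
scores = [3, 5, 8, 8, 12]
lab_values = [-9.0, -7.5, -4.0, -5.0, -1.0]
print(spearman(scores, lab_values))
```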

Keywords: base deficit, traumatic brain injury, Rotterdam, GCS

Procedia PDF Downloads 444
551 FPGA Implementation of Adaptive Clock Recovery for TDMoIP Systems

Authors: Semih Demir, Anil Celebi

Abstract:

Circuit-switched networks, widely used until the end of the 20th century, have been transformed into packet-switched networks. Time Division Multiplexing over Internet Protocol (TDMoIP) is a system that enables Time Division Multiplexing (TDM) traffic to be carried over packet-switched networks (PSN). In TDMoIP systems, devices that send TDM data to the PSN and receive it from the network must operate with the same clock frequency. This study aimed to implement the clock synchronization process in Field Programmable Gate Array (FPGA) chips using time information attached to the packets received from the PSN. The designed hardware is verified using datasets obtained for different carrier types and by comparing the results with the software model. Field tests are also performed using the real-time TDMoIP system.

Keywords: clock recovery on TDMoIP, FPGA, MATLAB reference model, clock synchronization

Procedia PDF Downloads 278
550 The Choosing the Right Projects With Multi-Criteria Decision Making to Ensure the Sustainability of the Projects

Authors: Saniye Çeşmecioğlu

Abstract:

The importance of project sustainability and success has become increasingly significant due to the proliferation of external environmental factors that have decreased project resistance in contemporary times. The primary approach to forestalling project failure is to ensure long-term viability through strategic project selection, by creating a judicious project selection framework within the organization. Decision-makers require precise decision contexts (models) that conform to the company's business objectives and sustainability expectations during the project selection process. The establishment of a rational model for project selection enables organizations to create a distinctive and objective framework for the selection process. Additionally, for the optimal implementation of this decision-making model, it is crucial to establish a Project Management Office (PMO) team and Project Steering Committee within the organizational structure to oversee the framework. These teams enable updating project selection criteria and weights in response to changing conditions, ensuring alignment with the company's business goals, and facilitating the selection of potentially viable projects. This paper presents a multi-criteria decision model for selecting project sustainability and project success criteria that ensures timely project completion and retention. The model was developed using MACBETH (Measuring Attractiveness by a Categorical Based Evaluation Technique) and was based on broadcaster companies’ expectations. The results of this study provide organizations with a model that supports selecting the appropriate project objectively by utilizing project selection and sustainability criteria along with their respective weights. Additionally, the study offers suggestions that may prove helpful in future endeavors.

Keywords: project portfolio management, project selection, multi-criteria decision making, project sustainability and success criteria, MACBETH

Procedia PDF Downloads 62
549 Person Re-Identification using Siamese Convolutional Neural Network

Authors: Sello Mokwena, Monyepao Thabang

Abstract:

In this study, we propose a comprehensive approach to address the challenges in person re-identification models. By combining a centroid tracking algorithm with a Siamese convolutional neural network model, our method excels in detecting, tracking, and capturing robust person features across non-overlapping camera views. The algorithm efficiently identifies individuals in the camera network, while the neural network extracts fine-grained global features for precise cross-image comparisons. The approach's effectiveness is further accentuated by leveraging the camera network topology for guidance. Our empirical analysis on benchmark datasets highlights its competitive performance, particularly evident when background subtraction techniques are selectively applied, underscoring its potential in advancing person re-identification techniques.

Keywords: camera network, convolutional neural network topology, person tracking, person re-identification, siamese

Procedia PDF Downloads 73
548 Healthcare Data Mining Innovations

Authors: Eugenia Jilinguirian

Abstract:

In the healthcare industry, data mining is essential because it transforms the field by extracting useful information from large datasets. Data mining is the process of applying advanced analytical methods to large collections of patient records and medical histories in order to identify patterns, correlations, and trends. Healthcare professionals can improve diagnostic accuracy, uncover hidden linkages, and predict disease outcomes by carefully examining these data. Additionally, data mining supports personalized medicine by tailoring treatment to the unique attributes of each patient. This proactive strategy helps allocate resources more efficiently, enhances patient care, and streamlines operations. However, to apply data mining effectively and ensure the proper use of private healthcare information, issues like data privacy and security must be carefully considered. Data mining continues to be vital in the search for more effective, efficient, and individualized healthcare solutions as technology evolves.

Keywords: data mining, healthcare, big data, individualised healthcare, healthcare solutions, database

Procedia PDF Downloads 66
547 Estimating Estimators: An Empirical Comparison of Non-Invasive Analysis Methods

Authors: Yan Torres, Fernanda Simoes, Francisco Petrucci-Fonseca, Freddie-Jeanne Richard

Abstract:

Non-invasive samples are an alternative to collecting genetic samples directly. They are collected without manipulation of the animal (e.g., scats, feathers, and hairs). Nevertheless, the use of non-invasive samples has some limitations. The main issue is degraded DNA, which leads to poorer extraction efficiency and genotyping errors. Those errors delayed the widespread use of non-invasive genetic information for some years. Genotyping errors can be limited by using analysis methods that accommodate the errors and singularities of non-invasive samples. Genotype matching and population estimation algorithms stand out as important analysis tools that have been adapted to deal with those errors. Despite this recent development of analysis methods, empirical comparisons of their performance are still lacking. A comparison of methods on datasets that differ in size and structure can be useful for future studies, since non-invasive samples are a powerful tool for obtaining information, especially for endangered and rare populations. To compare the analysis methods, four different datasets obtained from the Dryad digital repository were used. Three matching algorithms (Cervus, Colony, and Error Tolerant Likelihood Matching - ETLM) were used for matching genotypes, and two others (Capwire and BayesN) for population estimation. The three matching algorithms showed different patterns of results. ETLM produced fewer unique individuals and recaptures. A similarity in the matched genotypes between Colony and Cervus was observed. That is not surprising, given the similarity of those methods' pairwise likelihood and clustering algorithms. The ETLM matches showed almost no similarity with the genotypes matched by the other methods.
The different clustering algorithm and error model of ETLM seem to lead to a more rigorous selection, although ETLM had the worst processing time and the least user-friendly interface of the compared methods. The population estimators performed differently across the datasets. The estimators agreed for only one dataset. BayesN produced both higher and lower estimates than Capwire. Unlike Capwire, BayesN does not consider the total number of recaptures, only the recapture events. This makes the estimator sensitive to data heterogeneity, which here means different capture rates between individuals. In those examples, homogeneity seems to be crucial for BayesN to work properly. Both methods are user-friendly and have reasonable processing times. An extended analysis with simulated genotype data could clarify the sensitivity of the algorithms. The present comparison of the matching methods indicates that Colony seems more appropriate for general use, considering the balance of time, interface, and robustness. The heterogeneity of the recaptures strongly affected the BayesN estimates, leading to over- and underestimation of population numbers. Capwire is therefore advisable for general use, since it performs better in a wide range of situations.

Keywords: algorithms, genetics, matching, population

Procedia PDF Downloads 143
546 A Machine Learning Approach to Detecting Evasive PDF Malware

Authors: Vareesha Masood, Ammara Gul, Nabeeha Areej, Muhammad Asif Masood, Hamna Imran

Abstract:

The universal use of PDF files has prompted hackers to exploit them for malicious purposes by hiding malicious code in PDF files that reach their victims' machines. Machine learning has proven to be highly efficient at distinguishing benign files from files containing PDF malware. This paper proposes an approach that uses a decision tree classifier with parameters. A modern, inclusive dataset, CIC-Evasive-PDFMal2022, produced by Lockheed Martin's Cyber Security wing, is used. It is one of the most reliable datasets in this field. We designed a PDF malware detection system that achieved 99.2%. Compared with other cutting-edge models in the same field of study, the suggested model performs very well in detecting PDF malware. Accordingly, we provide a fast, reliable, and efficient PDF malware detection approach in this paper.

Keywords: PDF, PDF malware, decision tree classifier, random forest classifier

Procedia PDF Downloads 91
545 Improvement of Ground Truth Data for Eye Location on Infrared Driver Recordings

Authors: Sorin Valcan, Mihail Gaianu

Abstract:

Labeling is a very costly and time-consuming process that aims to generate datasets for training neural networks across several functionalities and projects. For driver monitoring system projects, the need for labeled images has a significant impact on the budget and distribution of effort. This paper presents the modifications made to an algorithm used to generate ground truth data for 2D eye location on infrared images of drivers, in order to improve the quality of the data and the performance of the trained neural networks. The algorithm's restrictions become tougher, which makes it more accurate but also less constant. The resulting dataset becomes smaller and should not be altered by any kind of manual label adjustment before being used in the neural network training process. These changes resulted in much better performance of the trained neural networks.

Keywords: labeling automation, infrared camera, driver monitoring, eye detection, convolutional neural networks

Procedia PDF Downloads 118
544 Designing Emergency Response Network for Rail Hazmat Shipments

Authors: Ali Vaezi, Jyotirmoy Dalal, Manish Verma

Abstract:

The railroad is one of the primary transportation modes for hazardous materials (hazmat) shipments in North America. Installing an emergency response network capable of providing a commensurate response is one of the primary levers to contain (or mitigate) the adverse consequences from rail hazmat incidents. To this end, we propose a two-stage stochastic program to determine the location of and equipment packages to be stockpiled at each response facility. The raw input data collected from publicly available reports were processed, fed into the proposed optimization program, and then tested on a realistic railroad network in Ontario (Canada). From the resulting analyses, we conclude that the decisions based only on empirical datasets would undermine the effectiveness of the resulting network; coverage can be improved by redistributing equipment in the network, purchasing equipment with higher containment capacity, and making use of a disutility multiplier factor.

Keywords: hazmat, rail network, stochastic programming, emergency response

Procedia PDF Downloads 182
543 Understanding and Improving Neural Network Weight Initialization

Authors: Diego Aguirre, Olac Fuentes

Abstract:

In this paper, we present a taxonomy of weight initialization schemes used in deep learning. We survey the most representative techniques in each class and compare them in terms of overhead cost, convergence rate, and applicability. We also introduce a new weight initialization scheme. In this technique, we perform an initial feedforward pass through the network using an initialization mini-batch. Using statistics obtained from this pass, we initialize the weights of the network so that the following properties are met: 1) weight matrices are orthogonal; 2) ReLU layers produce a predetermined number of non-zero activations; 3) the output produced by each internal layer has unit variance; 4) weights in the last layer are chosen to minimize the error in the initial mini-batch. We evaluate our method on three popular architectures, and faster convergence rates are achieved on the MNIST, CIFAR-10/100, and ImageNet datasets when compared to state-of-the-art initialization techniques.
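As a rough illustration of property 3 only (unit-variance layer output), the sketch below rescales the weights of a single bias-free linear layer using statistics from an initialization mini-batch. It is not the authors' scheme: the function names, layer sizes, and Gaussian data are assumptions chosen for illustration, and the orthogonality, ReLU activation, and last-layer properties are not addressed.

```python
import random
import statistics


def layer_outputs(weights, batch):
    """Apply a bias-free linear layer to every input vector in the batch."""
    return [
        [sum(w * x for w, x in zip(row, sample)) for row in weights]
        for sample in batch
    ]


def rescale_to_unit_variance(weights, batch):
    """Return weights scaled so the pooled layer output on `batch` has unit variance."""
    flat = [value for out in layer_outputs(weights, batch) for value in out]
    std = statistics.pstdev(flat)
    if std == 0:
        raise ValueError("layer output is constant on this mini-batch")
    # The layer is linear, so dividing the weights by std divides the outputs by std.
    return [[w / std for w in row] for row in weights]


# Hypothetical initialization mini-batch (64 samples, 8 features) and a 4-unit layer.
rng = random.Random(0)
batch = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(64)]
weights = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]

scaled = rescale_to_unit_variance(weights, batch)
flat_scaled = [value for out in layer_outputs(scaled, batch) for value in out]
variance_after = statistics.pvariance(flat_scaled)
```

The variance here is pooled over all units and samples; a per-unit variant would compute one scale factor per weight row instead.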

Keywords: deep learning, image classification, supervised learning, weight initialization

Procedia PDF Downloads 135
542 Sentiment Analysis of Consumers’ Perceptions on Social Media about the Main Mobile Providers in Jamaica

Authors: Sherrene Bogle, Verlia Bogle, Tyrone Anderson

Abstract:

In recent years, organizations have become increasingly interested in the possibility of analyzing social media as a means of gaining meaningful feedback about their products and services. The aspect-based sentiment analysis approach is used to predict the sentiment for Twitter datasets for Digicel and Lime, the main mobile companies in Jamaica, using supervised learning classification techniques. The results indicate an average of 82.2 percent accuracy in classifying tweets when comparing three separate classification algorithms against the purported baseline of 70 percent and an average root mean squared error of 0.31. These results indicate that the analysis of sentiment on social media in order to gain customer feedback can be a viable solution for mobile companies looking to improve business performance.
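The two metrics quoted above can be computed as in the following minimal sketch. The abstract does not say how its root mean squared error was obtained; this example assumes one common convention, in which the error is measured between the true binary label and the classifier's predicted probability. The labels and probabilities are made up for illustration.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_score):
    """Root mean squared error between true values and numeric predictions."""
    if len(y_true) != len(y_score) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - s) ** 2 for t, s in zip(y_true, y_score))
    return math.sqrt(squared / len(y_true))


# Hypothetical tweets: 1 = positive sentiment, 0 = negative sentiment.
true_labels = [1, 0, 1, 1, 0]
positive_prob = [0.9, 0.2, 0.4, 0.8, 0.1]  # assumed classifier outputs
predicted = [1 if p >= 0.5 else 0 for p in positive_prob]

acc = accuracy(true_labels, predicted)
err = rmse(true_labels, positive_prob)
```

Averaging `acc` and `err` over several classifiers would give figures comparable in kind to those reported, under the stated assumption about how the error is defined.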

Keywords: machine learning, sentiment analysis, social media, supervised learning

Procedia PDF Downloads 444
541 Opening up Government Datasets for Big Data Analysis to Support Policy Decisions

Authors: K. Hardy, A. Maurushat

Abstract:

Policy makers are increasingly looking to make evidence-based decisions. Evidence-based decisions have historically relied on the rigorous methodologies of empirical studies by research institutes, as well as on less reliable immediate surveys and polls, often with limited sample sizes. As we move into the era of Big Data analytics, policy makers are looking to different methodologies to deliver reliable empirics in real time. The question is not why people did something over the last 10 years, but why they are doing it now, whether this is undesirable, and how we can have an impact to promote change immediately. Big data analytics rely heavily on government data that has been released into the public domain. The open data movement promises greater productivity and more efficient delivery of services; however, Australian government agencies remain reluctant to release their data to the general public. This paper considers the barriers to releasing government data as open data, and how these barriers might be overcome.

Keywords: big data, open data, productivity, data governance

Procedia PDF Downloads 371
540 MarginDistillation: Distillation for Face Recognition Neural Networks with Margin-Based Softmax

Authors: Svitov David, Alyamkin Sergey

Abstract:

The use of convolutional neural networks (CNNs) in conjunction with the margin-based softmax approach demonstrates state-of-the-art performance on the face recognition problem. Recently, lightweight neural network models trained with the margin-based softmax have been introduced for the face identification task on edge devices. In this paper, we propose a distillation method for lightweight neural network architectures that outperforms other known methods for the face recognition task on the LFW, AgeDB-30, and Megaface datasets. The idea of the proposed method is to use class centers from the teacher network for the student network. The student network is then trained to reproduce the angles between the class centers and the face embeddings predicted by the teacher network.
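The quantity being matched, the angle between a class center and a face embedding, can be sketched as follows. This is not the paper's loss function: the helper names and the two-dimensional vectors are assumptions for illustration, and the example only measures how far a student embedding's angle is from the teacher's for the same class center.

```python
import math


def angle_between(u, v):
    """Angle in radians between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def angle_mismatch(class_center, teacher_embedding, student_embedding):
    """Absolute difference between the teacher's and the student's angle to one class center."""
    teacher_angle = angle_between(class_center, teacher_embedding)
    student_angle = angle_between(class_center, student_embedding)
    return abs(teacher_angle - student_angle)


# Hypothetical 2-D vectors; real face embeddings have far more dimensions.
center = [1.0, 0.0]
teacher = [1.0, 1.0]   # 45 degrees from the class center
student = [0.0, 1.0]   # 90 degrees from the class center

gap = angle_mismatch(center, teacher, student)
```

A training objective built on this idea would drive `gap` toward zero for every sample; how the paper folds the angles into the margin-based softmax is not specified in the abstract.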

Keywords: ArcFace, distillation, face recognition, margin-based softmax

Procedia PDF Downloads 146