Search results for: imputation method of missing data
38534 Data Challenges Facing Implementation of Road Safety Management Systems in Egypt
Authors: A. Anis, W. Bekheet, A. El Hakim
Abstract:
Implementing a Road Safety Management System (SMS) in a crowded developing country such as Egypt is a necessity. Beginning a sustainable SMS requires a comprehensive, reliable data system for all information pertinent to road crashes. This paper surveys the data available in Egypt and validates it for use in an SMS. The research provides some of the missing data and identifies the data that remain unavailable in Egypt, looking forward to the contribution of the scientific community, the authorities, and the public in solving the problem of missing or unreliable crash data. The data required for implementing an SMS in Egypt are divided into three categories: the first is available data, such as fatality and injury rates, which this research shows may be inconsistent and unreliable; the second category is not available but may be estimated, and an example of estimating vehicle cost is provided in this research; the third is not available and can be measured case by case, such as the functional and geometric properties of a facility. Some inquiries are posed in this research to the scientific community, such as how to improve the links among stakeholders of road safety in order to obtain a consistent, non-biased, and reliable data system.
Keywords: road safety management system, road crash, road fatality, road injury
Procedia PDF Downloads 152
38533 A Reasoning Method of Cyber-Attack Attribution Based on Threat Intelligence
Authors: Li Qiang, Yang Ze-Ming, Liu Bao-Xu, Jiang Zheng-Wei
Abstract:
With the increasing complexity of cyberspace security, cyber-attack attribution has become an important challenge for security protection systems. The difficulty of cyber-attack attribution centers on the problems of handling huge volumes of data and missing key data. In view of this situation, this paper presents a reasoning method of cyber-attack attribution based on threat intelligence. The method utilizes the intrusion kill chain model and Bayesian networks to build the attack chain and evidence chain of a cyber-attack on a threat intelligence platform through data calculation, analysis, and reasoning. We then used a number of cyber-attack events that we have observed and analyzed to test the reasoning method and demo system; the test results indicate that the reasoning method can provide some help in cyber-attack attribution.
Keywords: reasoning, Bayesian networks, cyber-attack attribution, kill chain, threat intelligence
Procedia PDF Downloads 452
38532 Linkage Disequilibrium and Haplotype Blocks Study from Two High-Density Panels and a Combined Panel in Nelore Beef Cattle
Authors: Priscila A. Bernardes, Marcos E. Buzanskas, Luciana C. A. Regitano, Ricardo V. Ventura, Danisio P. Munari
Abstract:
Genotype imputation has been used to reduce genomic selection costs. In order to increase haplotype detection accuracy in methods that consider linkage disequilibrium, another approach can be used, such as combining genotype data from different panels. Therefore, this study aimed to evaluate the linkage disequilibrium and haplotype blocks in two high-density panels before and after imputation to a combined panel in Nelore beef cattle. A total of 814 animals were genotyped with the Illumina BovineHD BeadChip (IHD), wherein 93 animals (23 bulls and 70 progenies) were also genotyped with the Affymetrix Axiom Genome-Wide BOS 1 Array Plate (AHD). After quality control, 809 IHD animals (509,107 SNPs) and 93 AHD (427,875 SNPs) remained for analyses. The combined genotype panel (CP) was constructed by merging both panels after quality control, resulting in 880,336 SNPs. Imputation analysis was conducted using the software FImpute v.2.2b. The reference (CP) and target (IHD) populations consisted of 23 bulls and 786 animals, respectively. The linkage disequilibrium and haplotype block studies were carried out for IHD, AHD, and the imputed CP. Two linkage disequilibrium measures were considered: the correlation coefficient between alleles from two loci (r²) and |D’|. Both measures were calculated using the software PLINK. The haplotype blocks were estimated using the software Haploview. The r² measure presented a different decay when compared to |D’|, wherein AHD and IHD had almost the same decay. For r², even with possible overestimation due to the sample size for AHD (93 animals), IHD presented higher values than AHD for shorter distances, but with increasing distance, both panels presented similar values. The r² measure is influenced by the minor allele frequency of the pair of SNPs, which can cause the observed difference between the r² decay and the |D’| decay. As a sum of the combinations between the Illumina and Affymetrix panels, the CP presented a decay equivalent to the mean of these combinations. The estimated haplotype blocks detected for IHD, AHD, and CP were 84,529, 63,967, and 140,336, respectively. The IHD blocks had a mean of 137.70 ± 219.05 kb, the AHD blocks a mean of 102.10 ± 155.47 kb, and the CP blocks a mean of 107.10 ± 169.14 kb. The majority of the haplotype blocks of these three panels were composed of fewer than 10 SNPs, with only 3,882 (IHD), 193 (AHD) and 8,462 (CP) haplotype blocks composed of 10 SNPs or more. There was an increase in the number of chromosomes covered with long haplotypes when CP was used, as well as an increase in haplotype coverage for short chromosomes (23-29), which can contribute to studies that explore haplotype blocks. In general, using the CP could be an alternative for increasing density and the number of haplotype blocks, increasing the probability of obtaining a marker close to a quantitative trait locus of interest.
Keywords: Bos taurus indicus, decay, genotype imputation, single nucleotide polymorphism
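The r² measure used above can be illustrated with a short computation. The sketch below uses synthetic haplotype data and illustrative variable names; it is not the paper's PLINK pipeline, only a demonstration of how r² follows from allele and haplotype frequencies at two biallelic loci.

```python
# Minimal sketch of the r^2 linkage-disequilibrium measure on synthetic data.
import numpy as np

def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic loci, given 0/1 allele calls per haplotype."""
    p_a = hap_a.mean()              # frequency of allele 1 at locus A
    p_b = hap_b.mean()              # frequency of allele 1 at locus B
    p_ab = (hap_a * hap_b).mean()   # frequency of the A1-B1 haplotype
    d = p_ab - p_a * p_b            # linkage-disequilibrium coefficient D
    return d**2 / (p_a * (1 - p_a) * p_b * (1 - p_b))

rng = np.random.default_rng(0)
locus_a = rng.integers(0, 2, 1000)
# A nearby locus that usually carries the same allele (strong LD):
locus_b = np.where(rng.random(1000) < 0.9, locus_a, 1 - locus_a)
print(f"r^2 = {ld_r2(locus_a, locus_b):.3f}")
```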
Procedia PDF Downloads 281
38531 Prediction Modeling of Alzheimer’s Disease and Its Prodromal Stages from Multimodal Data with Missing Values
Authors: M. Aghili, S. Tabarestani, C. Freytes, M. Shojaie, M. Cabrerizo, A. Barreto, N. Rishe, R. E. Curiel, D. Loewenstein, R. Duara, M. Adjouadi
Abstract:
A major challenge in medical studies, especially those that are longitudinal, is the problem of missing measurements, which hinders the effective application of many machine learning algorithms. Furthermore, recent Alzheimer's disease studies have focused on the delineation of Early Mild Cognitive Impairment (EMCI) and Late Mild Cognitive Impairment (LMCI) from cognitively normal controls (CN), which is essential for developing effective and early treatment methods. To address the aforementioned challenges, this paper explores the potential of using the eXtreme Gradient Boosting (XGBoost) algorithm to handle missing values in multiclass classification. We seek a generalized classification scheme where all prodromal stages of the disease are considered simultaneously in the classification and decision-making processes. Given the large number of subjects (1631) included in this study and the presence of almost 28% missing values, we investigated the performance of XGBoost on the classification of the four classes AD, CN, EMCI, and LMCI. Using a 10-fold cross-validation technique, XGBoost is shown to outperform other state-of-the-art classification algorithms by 3% in terms of accuracy and F-score. Our model achieved an accuracy of 80.52%, a precision of 80.62% and a recall of 80.51%, supporting the more natural and promising multiclass classification.
Keywords: eXtreme gradient boosting, missing data, Alzheimer's disease, early mild cognitive impairment, late mild cognitive impairment, multiclass classification, ADNI, support vector machine, random forest
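A minimal sketch of the classification setup follows, with synthetic data standing in for the ADNI features. XGBoost handles NaN inputs natively by learning a default branch direction per split, so no separate imputation step is required; the parameters below are illustrative, not the paper's tuned values.

```python
# Sketch: 4-class classification with ~28% missing values, no imputation step.
import numpy as np
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(1631, 20))            # synthetic multimodal features
X[rng.random(X.shape) < 0.28] = np.nan     # ~28% missing, as in the study
y = rng.integers(0, 4, size=1631)          # 4 classes: CN, EMCI, LMCI, AD

clf = XGBClassifier(objective="multi:softprob", n_estimators=200,
                    max_depth=4, learning_rate=0.1)
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")  # 10-fold CV
print(f"mean accuracy: {scores.mean():.3f}")
```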
Procedia PDF Downloads 189
38530 Optimizing Nitrogen Fertilizer Application in Rice Cultivation: A Decision Model for Top and Ear Dressing Dosages
Authors: Ya-Li Tsai
Abstract:
Nitrogen is a vital element crucial for crop growth, significantly influencing crop yield. In rice cultivation, farmers often apply substantial nitrogen fertilizer to maximize yields. However, excessive nitrogen application increases the risk of lodging and pest infestation, leading to yield losses. Additionally, conventional flooded irrigation methods consume significant water resources, necessitating precision agriculture and intelligent water management systems. This study leveraged physiological data and field images captured by unmanned aerial vehicles, considering fertilizer treatment and irrigation as key factors. Statistical models incorporating rice physiological data, yield, and vegetation indices from image data were developed. Missing physiological data were addressed using multiple imputation and regression methods, and regression models were established using principal component analysis and stepwise regression. Target nitrogen accumulation at key growth stages was identified to optimize fertilizer application, with the difference between actual and target nitrogen accumulation guiding recommendations for ear dressing dosage. Field experiments conducted in 2022 validated the recommended ear dressing dosage, demonstrating no significant difference in final yield compared to traditional fertilizer levels under alternate wetting and drying irrigation. These findings highlight the efficacy of applying recommended dosages based on fertilizer decision models, offering the potential for reduced fertilizer use while maintaining yield in rice cultivation.
Keywords: intelligent fertilizer management, nitrogen top and ear dressing fertilizer, rice, yield optimization
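A hedged sketch of the imputation-then-regression step described above, using scikit-learn's IterativeImputer and PCA as stand-ins for the paper's multiple imputation and principal component regression; the feature set and data are hypothetical.

```python
# Sketch: impute missing physiological records, then regress on principal
# components. Feature meanings (SPAD, height, NDVI, ...) are invented here.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 6))                 # plot-level physiological data
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=120)  # e.g. N accumulation
X[rng.random(X.shape) < 0.1] = np.nan         # some records are missing

model = make_pipeline(IterativeImputer(max_iter=10, random_state=0),
                      PCA(n_components=3),
                      LinearRegression())
model.fit(X, y)
print("R^2 on training data:", model.score(X, y))
```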
Procedia PDF Downloads 84
38529 A Hybrid Data-Handler Module Based Approach for Prioritization in Quality Function Deployment
Authors: P. Venu, Joeju M. Issac
Abstract:
Quality Function Deployment (QFD) is a systematic technique that creates a platform where customer responses can be positively converted to design attributes. The accuracy of a QFD process heavily depends on the data that it handles, which are captured from customers or QFD team members. Customized computer programs that perform Quality Function Deployment within a stipulated time have been used by various companies across the globe. These programs rely heavily on the storage and retrieval of data in a common database. This database must act as a perfect source with minimal missing or erroneous values in order to perform actual prioritization. This paper introduces a missing/error data handler module which uses a Genetic Algorithm and fuzzy numbers. The prioritization of customer requirements for sesame oil is illustrated, and a comparison is made between the proposed data handler module-based deployment and manual deployment.
Keywords: hybrid data handler, QFD, prioritization, module-based deployment
Procedia PDF Downloads 297
38528 General Architecture for Automation of Machine Learning Practices
Authors: U. Borasi, Amit Kr. Jain, Rakesh, Piyush Jain
Abstract:
Data collection, data preparation, model training, model evaluation, and deployment are all processes in a typical machine learning workflow. Training data needs to be gathered and organised. This often entails collecting a sizable dataset and cleaning it to remove or correct any inaccurate or missing information. Preparing the data for use in the machine learning model requires pre-processing it after it has been acquired. This often entails actions like scaling or normalising the data, handling outliers, selecting appropriate features, reducing dimensionality, etc. This pre-processed data is then used to train a model with some machine learning algorithm. After the model has been trained, it needs to be assessed by determining metrics like accuracy, precision, and recall, utilising a test dataset. Every time a new model is built, both data pre-processing and model training—two crucial processes in the machine learning (ML) workflow—must be carried out. Thus, various machine learning algorithms can be employed with every single approach to data pre-processing, generating a large set of combinations to choose from. For example: for every method of handling missing values (dropping records, replacing with the mean, etc.), for every scaling technique, and for every combination of features selected, a different algorithm can be used. As a result, in order to get the optimum outcomes, these tasks are frequently repeated in different combinations. This paper suggests a simple architecture for organizing this large “combination set of pre-processing steps and algorithms” into an automated workflow which simplifies the task of carrying out all possibilities.
Keywords: machine learning, automation, AutoML, architecture, operator pool, configuration, scheduler
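A minimal sketch of the "operator pool" idea under stated assumptions: enumerate every combination of missing-value handler, scaler, and learning algorithm as one automated workflow, then keep the best-scoring pipeline. The operators and dataset below are illustrative choices, not the paper's architecture.

```python
# Sketch: exhaustively evaluate (imputer, scaler, model) combinations.
from itertools import product
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)  # iris has no NaNs; imputers shown as pool members
imputers = [SimpleImputer(strategy="mean"), SimpleImputer(strategy="median")]
scalers = [StandardScaler(), MinMaxScaler()]
models = [LogisticRegression(max_iter=1000), DecisionTreeClassifier()]

results = [
    (cross_val_score(make_pipeline(i, s, m), X, y, cv=5).mean(), i, s, m)
    for i, s, m in product(imputers, scalers, models)
]
best = max(results, key=lambda t: t[0])
print(f"best accuracy {best[0]:.3f} with {best[1]}, {best[2]}, {best[3]}")
```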
Procedia PDF Downloads 58
38527 Global Solar Irradiance: Data Imputation to Analyze Complementarity Studies of Energy in Colombia
Authors: Jeisson A. Estrella, Laura C. Herrera, Cristian A. Arenas
Abstract:
The Colombian electricity sector has been transforming through the insertion of new energy sources to generate electricity, one of them being solar energy, which is being promoted by companies interested in photovoltaic technology. The study of this technology is important for electricity generation in general and for the planning of the sector from the perspective of energy complementarity. It is precisely in this last approach that the project is located; we are interested in answering concerns about the reliability of the electrical system when climatic phenomena such as El Niño occur, and in defining whether it is viable to replace or expand thermoelectric plants with renewable electricity generation systems. In this regard, some difficulties related to the basic information on renewable energy sources from measured data must first be solved, since these data come from automatic weather stations administered by the Institute of Hydrology, Meteorology and Environmental Studies (IDEAM) and, over the study period (2005-2019), contain significant amounts of missing data. For this reason, the overall objective of the project is to complete the global solar irradiance datasets to obtain time series that will allow the development of energy complementarity analyses in a subsequent project. The filling of the databases will be done through numerical and statistical methods, which are basic techniques for undergraduate students in technical areas who are starting out as researchers.
Keywords: time series, global solar irradiance, imputed data, energy complementarity
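One simple numerical gap-filling technique of the kind the project describes is time-weighted interpolation for short gaps. The sketch below uses a hypothetical daily irradiance series; the station, frequency, and values are invented, not IDEAM data.

```python
# Sketch: bridge short gaps in a daily global-solar-irradiance series.
import numpy as np
import pandas as pd

idx = pd.date_range("2005-01-01", periods=10, freq="D")
ghi = pd.Series([4.1, 4.3, np.nan, np.nan, 5.0, 4.8, np.nan, 4.5, 4.4, 4.6],
                index=idx, name="GHI_kWh_m2")   # daily irradiance, made up

filled = ghi.interpolate(method="time", limit=3)  # only fill gaps <= 3 days
print(pd.concat([ghi, filled], axis=1, keys=["raw", "imputed"]))
```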
Procedia PDF Downloads 71
38526 University Students’ Fear of Missing Out and Night Eating Syndrome. A Descriptive Correlational Study
Authors: Mohammed Qutishat, Omar Al-Omari, Kholoud Al-Damery, Mohammed Al-Qadiri
Abstract:
Objective: The current study aims to explore the relationship between Night Eating Syndrome and experiences of Fear of Missing Out (FOMO) among college students in Oman. Methods: The study adopted a descriptive correlational design. The total sample was 366, based on defined inclusion criteria. The questionnaires were distributed over one month during the spring semester of 2020. A self-report instrument was used as the measurement tool to investigate the extent of the research phenomena; it consists of two major sections: the Fear of Missing Out Questionnaire and the Night Eating Questionnaire. Results: The respondents' age ranged between 18 and 30. The majority of the participants were female (76.7%, 204), single (97.7%, 266), in their third academic year (28.6%, 76), and living on campus (57.1%, 152). The findings of this study showed that fear of missing out experiences are significantly correlated with age (P=.010), gender (P=.005), and daily sleeping hours (P=.007). However, night eating experiences are significantly associated with age (P=.018), living arrangement (P=.017), and sleeping hours (P=.000). Conclusion: This article can define a limiting aspect of the relationship between fear of missing out and night eating behaviors. During academic life, students may find themselves overloaded and use their smartphones to do the simplest tasks they have, leading them to skip meals frequently and interfering with their eating patterns and psychological function. Health awareness programs, or the implementation of standards for healthy eating and technology use, can be introduced for undergraduates.
Keywords: fear of missing out, night eating syndrome, smartphone, addiction
Procedia PDF Downloads 231
38525 Attribute Analysis of Quick Response Code Payment Users Using Discriminant Non-negative Matrix Factorization
Authors: Hironori Karachi, Haruka Yamashita
Abstract:
Recently, quick response (QR) code payment systems have been getting popular. Many companies introduce new QR code payment services, and the services compete with each other to increase their number of users. To increase the number of users, it is necessary to grasp how the demographic information, usage information, and value of users differ between services. In this study, we conduct an analysis of real-world data provided by Nomura Research Institute, including the demographic data of users and information on their usage of two services: LINE Pay and PayPay. Non-negative Matrix Factorization (NMF) is widely used to analyze such data and interpret its features; however, the target data suffer from the problem of missing values. We use EM-algorithm NMF (EMNMF) to complete the unknown values and understand the features of the given data presented in matrix form. Moreover, for comparing the results of the NMF analysis of two matrices, Discriminant NMF (DNMF) shows the difference in user features between the two matrices. In this study, we combine EMNMF and DNMF and analyze the target data. As the interpretation, we show the differences in the features of users between LINE Pay and PayPay.
Keywords: data science, non-negative matrix factorization, missing data, quality of services
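A sketch of the EMNMF idea under stated assumptions: multiplicative-update NMF in which a binary mask confines the reconstruction error to observed entries, so missing values are effectively completed during factorization. The data, rank, and iteration count below are invented for illustration.

```python
# Sketch: masked (EM-style) NMF that completes missing matrix entries.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 8))                         # users x attributes (synthetic)
M = (rng.random(X.shape) > 0.2).astype(float)   # 1 = observed, 0 = missing

k = 3
W, H = rng.random((50, k)), rng.random((k, 8))
eps = 1e-9
for _ in range(200):  # Lee-Seung multiplicative updates restricted by the mask
    W *= ((M * X) @ H.T) / ((M * (W @ H)) @ H.T + eps)
    H *= (W.T @ (M * X)) / (W.T @ (M * (W @ H)) + eps)

X_completed = W @ H  # estimates for the missing entries live here
print("masked reconstruction error:", np.linalg.norm(M * (X - X_completed)))
```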
Procedia PDF Downloads 131
38524 Hormone Replacement Therapy (HRT) and Its Impact on the All-Cause Mortality of UK Women: A Matched Cohort Study 1984-2017
Authors: Nurunnahar Akter, Elena Kulinskaya, Nicholas Steel, Ilyas Bakbergenuly
Abstract:
Although Hormone Replacement Therapy (HRT) is an effective treatment for ameliorating menopausal symptoms, it has mixed effects on different health outcomes, increasing, for instance, the risk of breast cancer. Because of this, many symptomatic women are left untreated. Untreated menopausal symptoms may result in other health issues, which eventually place an extra burden and cost on the health care system. All-cause mortality analysis may explain the net benefits and risks of HRT; however, it has received far less attention in HRT studies. This study investigated the impact of HRT on all-cause mortality using electronically recorded primary care data from The Health Improvement Network (THIN), which broadly represents the female population of the United Kingdom (UK). The study entry date was the record of the first HRT prescription from 1984 onwards, and patients were followed up until death, transfer to another GP practice, or the study end date, which was January 2017. 112,354 HRT users (cases) were matched with 245,320 non-users by age at HRT initiation and general practice (GP). The hazards of all-cause mortality associated with HRT were estimated by a parametric Weibull-Cox model adjusting for a wide range of important medical, lifestyle, and socio-demographic factors. Multilevel multiple imputation techniques were used to deal with missing data. This study found that during 32 years of follow-up, combined HRT reduced the hazard ratio (HR) of all-cause mortality by 9% (HR: 0.91; 95% Confidence Interval, 0.88-0.94) in women aged between 46 and 65 at first treatment compared to non-users of the same age. Age-specific mortality analyses found that combined HRT decreased mortality by 13% (HR: 0.87; 95% CI, 0.82-0.92), 12% (HR: 0.88; 95% CI, 0.82-0.93), and 8% (HR: 0.92; 95% CI, 0.85-0.98) in the 51 to 55, 56 to 60, and 61 to 65 age groups at first treatment, respectively. There was no association between estrogen-only HRT and women's all-cause mortality. The findings from this study may help to inform the choices of women at menopause and to further educate clinicians and resource planners.
Keywords: hormone replacement therapy, multiple imputation, primary care data, The Health Improvement Network (THIN)
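For illustration only, the sketch below estimates a hazard ratio for a binary treatment indicator with a Cox proportional-hazards model via the `lifelines` package. This is a simplified stand-in for the paper's parametric Weibull-Cox model with multilevel multiple imputation; all column names and data are synthetic.

```python
# Sketch: hazard-ratio estimation for a treatment flag on synthetic data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(7)
n = 2000
df = pd.DataFrame({
    "hrt_user": rng.integers(0, 2, n),
    "age_at_entry": rng.uniform(46, 65, n),
    "smoker": rng.integers(0, 2, n),
})
risk = 0.02 * np.exp(-0.1 * df["hrt_user"] + 0.3 * df["smoker"])  # true hazards
df["years_followed"] = rng.exponential(1 / risk)
df["died"] = (df["years_followed"] < 32).astype(int)
df["years_followed"] = df["years_followed"].clip(upper=32)  # censor at 32 years

cph = CoxPHFitter()
cph.fit(df, duration_col="years_followed", event_col="died")
print(np.exp(cph.params_))  # hazard ratios; expect HR < 1 for hrt_user
```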
Procedia PDF Downloads 171
38523 Genome-Wide Association Study Identifies COL2A1 as a Susceptibility Gene for the Hand Development Failure of Kashin-Beck Disease
Authors: Feng Zhang
Abstract:
Kashin-Beck disease (KBD) is a chronic osteochondropathy. The mechanism of the hand growth and development failure of KBD remains elusive. In this study, we conducted a two-stage genome-wide association study (GWAS) of the palmar length-width ratio (LWR) in KBD, involving a total of 493 Chinese Han KBD patients. The Affymetrix Genome-Wide Human SNP Array 6.0 was applied for SNP genotyping. Association analysis was conducted with the PLINK software. Imputation analysis was performed with IMPUTE against the reference panel of the 1000 Genomes Project. In the GWAS, the most significant association was observed between palmar LWR and rs2071358 of the COL2A1 gene (P value = 4.68×10-8). Imputation analysis identified 3 SNPs surrounding rs2071358 with significant or suggestive association signals. The replication study observed additional significant association signals at both rs2071358 (P value = 0.017) and rs4760608 (P value = 0.002) of the COL2A1 gene after Bonferroni correction. Our results suggest that COL2A1 is a novel susceptibility gene involved in the growth and development failure of the hand in KBD.
Keywords: Kashin-Beck disease, genome-wide association study, COL2A1, hand
Procedia PDF Downloads 220
38522 Analytical Study of Data Mining Techniques for Software Quality Assurance
Authors: Mariam Bibi, Rubab Mehboob, Mehreen Sirshar
Abstract:
Satisfying customer requirements is the ultimate goal of producing or developing any product. The quality of the product is decided on the basis of the level of customer satisfaction. Different techniques reported in this survey enhance the quality of the product through software defect prediction and by locating missing software requirements. Some mining techniques have been proposed to assess individual performance indicators in collaborative environments to reduce errors at the individual level. The basic intention is to produce a product with zero or few defects, thereby producing the best product quality-wise. In this survey, techniques such as genetic algorithms, artificial neural networks, classification and clustering techniques, and decision trees are studied. The analysis shows that these techniques have contributed much to the improvement and enhancement of product quality.
Keywords: data mining, defect prediction, missing requirements, software quality
Procedia PDF Downloads 469
38521 Influence of Atmospheric Pollutants on Child Respiratory Disease in Cartagena De Indias, Colombia
Authors: Jose A. Alvarez Aldegunde, Adrian Fernandez Sanchez, Matthew D. Menden, Bernardo Vila Rodriguez
Abstract:
Up to five statistical pre-processing steps have been carried out on the pollutant records of the stations present in Cartagena de Indias, Colombia, also taking into account the childhood asthma incidence surveys conducted in the city's hospitals by the Health Ministry of Colombia for this study. These pre-processing steps have consisted of different techniques, such as the determination of the quality of data collection, determination of the quality of the registration network, identification and debugging of errors in data collection, completion of missing data and purified data, as well as the improvement of the time scale of the records. The characterization of data quality has been conducted by means of a density analysis of the pollutant registration stations using ArcGIS software and through mass balance techniques, making it possible to determine inconsistencies in the records by relating the registration data between stations through linear regression. The results obtained in this process have highlighted the positive quality of the pollutant registration process. The debugging of errors has subsequently allowed us to identify certain data as statistically non-significant in the incidence and contamination series. These data, together with certain missing records in the series recorded by the measuring stations, have been completed by statistical imputation equations. Following the application of these prior processes, the basic series of incidence data for respiratory disease and the pollutant records have allowed the characterization of the influence of pollutants on respiratory diseases such as, for example, childhood asthma. This characterization has been carried out using statistical correlation methods, including visual correlation, simple linear regression correlation, and spectral analysis with the PAST software, which identifies maximum and minimum periodicity cycles using the Lomb periodogram. Among the results obtained, up to eleven maximums and minimums considered contemporary between the incidence records and the particulate records have been identified by visual comparison. The spectral analyses performed on the incidence and the PM2.5 have returned a series of similar maximum periods in both registers, with one maximum over a period of one year and another every 25 days (0.9 and 0.07 years). The bivariate analysis characterized the variable "Daily Vehicular Flow" in the ninth position of importance out of a total of 55 variables. However, the statistical correlation did not obtain a favorable result, yielding a low value of the R² coefficient. The series of analyses conducted has demonstrated the importance of the influence of pollutants such as PM2.5 on the development of childhood asthma in Cartagena. The quantification of the influence of the variables determined that there is a 56% probability of dependence between PM2.5 and childhood respiratory asthma in Cartagena. On this basis, the study was completed through the application of the BenMap software, yielding a series of spatial results of interpolated values of the pollutant records that exceeded the established legal limits (represented by homogeneous units down to the neighborhood level) and results of the impact on the exacerbation of pediatric asthma. As a final result, an economic estimate (in Colombian pesos) of the monthly and individual savings derived from the percentage reduction of the influence of pollutants, in relation to visits to the hospital emergency room due to asthma exacerbation in pediatric patients, has been provided.
Keywords: asthma incidence, BenMap, PM2.5, statistical analysis
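The spectral step can be sketched with SciPy's Lomb-Scargle periodogram, which, like the PAST implementation cited above, handles unevenly sampled series. The incidence-like series below is synthetic, constructed to contain an annual cycle.

```python
# Sketch: recover a dominant ~365-day period from irregular samples.
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(3)
t = np.sort(rng.uniform(0, 10 * 365, 400))           # irregular sampling, days
y = np.sin(2 * np.pi * t / 365) + 0.5 * rng.normal(size=t.size)
y -= y.mean()                                        # remove the mean first

periods = np.linspace(20, 800, 2000)                 # candidate periods, days
power = lombscargle(t, y, 2 * np.pi / periods)       # angular frequencies
print(f"dominant period ~ {periods[power.argmax()]:.0f} days")  # expect ~365
```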
Procedia PDF Downloads 116
38520 Multiscale Connected Component Labelling and Applications to Scientific Microscopy Image Processing
Authors: Yayun Hsu, Henry Horng-Shing Lu
Abstract:
In this paper, a new method is proposed to extend connected component labeling from the processing of binary images to multi-scale modeling of images. By using adaptive thresholds over multi-scale attributes, this approach minimizes the possibility of missing important components with weak intensities. In addition, the computational cost of this approach remains similar to that of the typical approach to component labeling. The methodology is then applied to grain boundary detection and Drosophila Brainbow neuron segmentation. These applications demonstrate the feasibility of the proposed approach in the analysis of challenging microscopy images for scientific discovery.
Keywords: microscopic image processing, scientific data mining, multi-scale modeling, data mining
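The core idea can be sketched as labeling connected components at several intensity thresholds, so weak-intensity components missed by a strict global threshold are still recovered at a more permissive scale. The image and threshold sweep below are synthetic stand-ins, not the paper's adaptive scheme.

```python
# Sketch: connected component labeling across a sweep of thresholds.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(5)
img = rng.random((64, 64)) * 0.2
img[10:20, 10:20] += 0.8      # strong component
img[40:48, 40:48] += 0.35     # weak component, lost at a high threshold

for thr in (0.7, 0.5, 0.3):   # strict to permissive
    labels, n = ndimage.label(img > thr)
    print(f"threshold {thr}: {n} component(s) found")
```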
Procedia PDF Downloads 435
38519 Validation of a Placebo Method with Potential for Blinding in Ultrasound-Guided Dry Needling
Authors: Johnson C. Y. Pang, Bo Peng, Kara K. L. Reeves, Allan C. L. Fu
Abstract:
Objective: Dry needling (DN) has long been used as a treatment method for various musculoskeletal pain conditions. However, the evidence level of the studies has been low due to limitations of the methodology; lack of randomization and inappropriate blinding are potentially the main sources of bias. A method that can differentiate clinical results due to the targeted experimental procedure from its placebo effect is needed to enhance the validity of the trial. Therefore, this study aimed to validate a method as a placebo ultrasound (US)-guided DN for patients with knee osteoarthritis (KOA). Design: This is a randomized controlled trial (RCT). Ninety subjects (25 males and 65 females) aged between 51 and 80 (61.26 ± 5.57) with radiological KOA were recruited and randomly assigned into three groups with a computer program. Group 1 (G1) received real US-guided DN, Group 2 (G2) received placebo US-guided DN, and Group 3 (G3) was the control group. Both G1 and G2 subjects received the same procedure of US-guided DN, except that the US monitor was turned off in G2, blinding the G2 subjects to the incorporation of faux US guidance. This arrangement created the placebo effect intended to permit comparison of their results to those who received actual US-guided DN. Outcome measures, including the visual analog scale (VAS) and the Knee injury and Osteoarthritis Outcome Score (KOOS) subscales of pain, symptoms, and quality of life (QOL), were analyzed by repeated measures analysis of covariance (ANCOVA) for time effects and group effects. The data regarding the perception of receiving real US-guided DN or placebo US-guided DN were analyzed by the chi-squared test. Missing data were to be analyzed with the intention-to-treat (ITT) approach if more than 5% of the data were missing. Results: The placebo US-guided DN (G2) subjects had the same perceptions as with the use of real US guidance in the advancement of DN (p<0.128). G1 had significantly higher pain reduction (VAS and KOOS-pain) than G2 and G3 at 8 weeks (both p<0.05) only. There was no significant difference between G2 and G3 at 8 weeks (both p>0.05). Conclusion: The method with the US monitor turned off during the application of DN is credible for blinding the participants and allowing researchers to incorporate faux US guidance. The validated placebo US-guided DN technique can aid investigations of the effects of US-guided DN on short-term pain reduction for patients with KOA. Acknowledgment: This work was supported by the Caritas Institute of Higher Education [grant number IDG200101].
Keywords: ultrasound-guided dry needling, dry needling, knee osteoarthritis, physiotherapy
Procedia PDF Downloads 120
38518 Higher Freshwater Fish and Sea Fish Intake Is Inversely Associated with Liver Cancer in Patients with Hepatitis B
Authors: Maomao Cao
Abstract:
Background and aims: While the association between higher fish consumption and lower liver cancer risk has been confirmed, the association between intake of specific fish types and liver cancer risk remains unknown. We aimed to identify the association between specific fish consumption and the risk of liver cancer. Methods: Based on a community-based seropositive hepatitis B cohort involving 18,404 individuals, face-to-face interviews were conducted with a standardized questionnaire to acquire baseline information. Three common fish types in this study were analyzed: freshwater fish, sea fish, and small fish (shrimp, crab, conch, and shellfish). All participants received liver cancer screening, and possible cases were identified by CT or MRI. Multivariable logistic models were applied to estimate odds ratios (OR) and 95% confidence intervals (CI). Multivariate multiple imputation was utilized to impute observations with missing values. Results: 179 liver cancer cases were identified. Consumption of freshwater fish and sea fish at least once a week had a strong inverse association with liver cancer risk compared with the lowest intake level, with adjusted ORs of 0.53 (95% CI, 0.38-0.75) and 0.38 (95% CI, 0.19-0.73), respectively. This inverse association was also observed after imputation. There was no statistically significant association between intake of small fish and liver cancer risk (OR=0.58, 95% CI, 0.32-1.08). Conclusions: Our findings suggest that consumption of freshwater fish and sea fish at least once a week could reduce liver cancer risk.
Keywords: cross-sectional study, fish intake, liver cancer, risk factor
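The kind of model behind such adjusted odds ratios can be sketched as a multivariable logistic regression whose exponentiated coefficients give ORs with 95% CIs. The covariates and data below are synthetic, not the cohort's.

```python
# Sketch: adjusted odds ratios from a multivariable logistic regression.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 5000
df = pd.DataFrame({
    "freshwater_fish_weekly": rng.integers(0, 2, n),
    "age": rng.uniform(30, 75, n),
    "male": rng.integers(0, 2, n),
})
logit_p = -4 + 0.03 * (df["age"] - 50) - 0.6 * df["freshwater_fish_weekly"]
p = 1 / (1 + np.exp(-logit_p))
df["liver_cancer"] = (rng.random(n) < p).astype(int)

X = sm.add_constant(df[["freshwater_fish_weekly", "age", "male"]])
res = sm.Logit(df["liver_cancer"], X).fit(disp=0)
print(np.exp(res.params))       # adjusted odds ratios
print(np.exp(res.conf_int()))   # 95% confidence intervals
```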
Procedia PDF Downloads 275
38517 Examining the Missing Feedback Link in Environmental Kuznets Curve Hypothesis
Authors: Apra Sinha
Abstract:
The inverted U-shaped Environmental Kuznets Curve (EKC) describes a pollution-income relationship in which pollution and environmental degradation initially rise with income per capita; this trend reverses at higher income levels, where economic growth initiates environmental upgrading. However, the effect that increased environmental degradation has on growth is the missing feedback link that has not been addressed in the EKC hypothesis. This paper examines the missing feedback link in the EKC hypothesis in the Indian context by examining the causal association between fossil fuel consumption, carbon dioxide emissions, and economic growth in India. Fossil fuel consumption has here been taken as a proxy for the driver of economic growth. The causal association between the aforementioned variables has been analyzed using five interventions, namely: 1) urban development, for which urbanization has been taken as a proxy; 2) industrial development, for which industrial value added has been taken as a proxy; 3) trade liberalization, for which the sum of exports and imports as a share of GDP has been taken as a proxy; and 4) financial development, for which a) domestic credit to the private sector and b) net foreign assets have been taken as proxies. The choice of interventions for this study has been made keeping in view the economic liberalization perspective of India. The main aim of the paper is to investigate the missing feedback link of the Environmental Kuznets Curve hypothesis before and after incorporating the intervening variables. The period of study is from 1971 to 2011, as it covers the pre- and post-liberalization eras in India. All data have been taken from World Bank country-level indicators. The Johansen and Juselius cointegration testing methodology and error-correction-based Granger causality have been applied to all the variables. The results clearly show that out of the five interventions, the missing feedback link is addressed in only two. This paper can put forward significant policy implications for environmental protection and sustainable development.
Keywords: environmental Kuznets curve hypothesis, fossil fuel consumption, industrialization, trade liberalization, urbanization
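The feedback-link test can be sketched with a Granger causality check: does the lagged emissions series help predict growth? The series below are synthetic (constructed so that lagged CO2 does drive growth), and the sketch omits the cointegration and error-correction steps the paper uses.

```python
# Sketch: test whether the second column (co2) Granger-causes the first (growth).
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(2)
n = 41  # annual observations, e.g. 1971-2011
co2 = np.cumsum(rng.normal(0.5, 1.0, n))
gdp = np.zeros(n)
for t in range(1, n):                       # growth partly driven by lagged CO2
    gdp[t] = 0.6 * gdp[t - 1] + 0.3 * co2[t - 1] + rng.normal()

data = pd.DataFrame({"gdp_growth": gdp, "co2": co2})
grangercausalitytests(data[["gdp_growth", "co2"]], maxlag=2)  # prints F-tests
```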
Procedia PDF Downloads 252
38515 Cleaning of Scientific References in Large Patent Databases Using Rule-Based Scoring and Clustering
Authors: Emiel Caron
Abstract:
Patent databases contain patent-related data, organized in a relational data model, and are used to produce various patent statistics. These databases store raw data about scientific references cited by patents. For example, Patstat holds references to tens of millions of scientific journal publications and conference proceedings. These references might be used to connect patent databases with bibliographic databases, e.g. to study the relation between science, technology, and innovation in various domains. Problematic in such studies is the low data quality of the references, i.e. they are often ambiguous, unstructured, and incomplete. Moreover, a complete bibliographic reference is stored in only one attribute. Therefore, a computerized cleaning and disambiguation method for large patent databases is developed in this work. The method uses rule-based scoring and clustering. The rules are based on bibliographic metadata, retrieved from the raw data by regular expressions, and are transparent and adaptable. The rules, in combination with string similarity measures, are used to detect pairs of records that are potential duplicates. Due to the scoring, different rules can be combined to join scientific references, i.e. the rules reinforce each other. The scores are based on expert knowledge and initial method evaluation. After the scoring, pairs of scientific references that are above a certain threshold are clustered by means of a single-linkage clustering algorithm to form connected components. The method is designed to disambiguate all the scientific references in the Patstat database. The performance evaluation of the clustering method, on a large golden set with highly cited papers, shows on average a 99% precision and a 95% recall. The method is therefore accurate but careful, i.e. it weighs precision over recall. Consequently, separate clusters of high precision are sometimes formed when there is not enough evidence for connecting scientific references, e.g. in the case of missing year and journal information for a reference. The clusters produced by the method can be used to directly link the Patstat database with bibliographic databases such as the Web of Science or Scopus.
Keywords: clustering, data cleaning, data disambiguation, data mining, patent analysis, scientometrics
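A condensed sketch of the disambiguation logic: transparent metadata rules each add to a pair score, high-scoring pairs become edges, and single-linkage connected components are formed with a union-find. The rules, weights, threshold, and records below are invented for illustration.

```python
# Sketch: rule-based pair scoring, then single-linkage connected components.
from difflib import SequenceMatcher

refs = [
    {"title": "deep learning for patents", "year": "2015", "journal": "jasist"},
    {"title": "deep learning for patents.", "year": "2015", "journal": ""},
    {"title": "survey of olive cultivars", "year": "1998", "journal": "hortsci"},
]

def pair_score(a, b):
    s = 5 * SequenceMatcher(None, a["title"], b["title"]).ratio()   # similarity rule
    s += 2 if a["year"] == b["year"] else 0                         # year rule
    s += 1 if a["journal"] and a["journal"] == b["journal"] else 0  # journal rule
    return s

parent = list(range(len(refs)))
def find(i):                      # union-find with path compression
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

THRESHOLD = 6.0
for i in range(len(refs)):
    for j in range(i + 1, len(refs)):
        if pair_score(refs[i], refs[j]) >= THRESHOLD:
            parent[find(i)] = find(j)        # single-linkage merge

print([find(i) for i in range(len(refs))])   # cluster label per reference
```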
Procedia PDF Downloads 194
38514 Study of Evapotranspiration for Pune District
Authors: Ranjeet Sable, Mahotsavi Patil, Aadesh Nimbalkar, Prajakta Palaskar, Ritu Sagar
Abstract:
Knowing the exact amount of water used by various crops in different climatic conditions is a necessary step in the design, planning, and management of irrigation schemes and water resources and in the scheduling of irrigation systems. Evaporation and transpiration are jointly called evapotranspiration. Water loss from trees during photosynthesis is called transpiration, and the conversion of water into its gaseous state is called evaporation. To calculate evapotranspiration correctly, the chosen method should be suitable, require minimal climatic data, and be applicable over a wide range of climatic conditions. In hydrology, multiple correlation and regression are generally used to develop relationships between three or more hydrological variables by knowing the dependence between them. This research work studies various methods for the calculation of evapotranspiration and selects a reasonable and suitable one for the Pune region (Maharashtra state). Field methods are very costly and time-consuming and do not give appropriate results if suitable climatic conditions are not maintained. Observations recorded at Pune meteorological stations are used to calculate evapotranspiration with the help of the Radiation Method (RAD), Modified Penman Method (MPM), Thornthwaite Method (THW), Blaney-Criddle (BCL), Christiansen Equation (CNM), and Hargreaves Method (HGM), of which Hargreaves and Thornthwaite are temperature-based methods. The performance of all these methods is compared with the Modified Penman method, and the method showing the least variation from the standard Modified Penman method (MPM) is selected as the suitable one. Evapotranspiration values are estimated on a monthly basis. The comparative analysis in this research is used to select raw-data-dependent methods in case of missing data.
Keywords: Blaney-Criddle, Christiansen equation, evapotranspiration, Hargreaves method, precipitation, Penman method, water use efficiency
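As a small sketch of one temperature-based method compared in the study, the Hargreaves equation (in its common FAO-56 form) needs only temperatures and extraterrestrial radiation. The input values below are made up for Pune-like conditions, not station data.

```python
# Sketch: Hargreaves reference evapotranspiration (FAO-56 form).
def hargreaves_et0(t_mean, t_max, t_min, ra):
    """ET0 in mm/day; ra is extraterrestrial radiation in mm/day equivalent."""
    return 0.0023 * ra * (t_mean + 17.8) * (t_max - t_min) ** 0.5

et0 = hargreaves_et0(t_mean=27.0, t_max=33.0, t_min=21.0, ra=15.0)
print(f"Hargreaves ET0 ~ {et0:.2f} mm/day")
# Comparing methods then reduces to computing |ET0_method - ET0_MPM| per month
# and selecting the method with the smallest average deviation.
```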
Procedia PDF Downloads 271
38513 Improving the Statistics Nature in Research Information System
Authors: Rajbir Cheema
Abstract:
An integrated research information system provides scientific institutions with the necessary information on research activities and research results in assured quality. Since problems such as duplication, missing values, incorrect formatting, and inconsistencies can arise when research data are collected in different research information systems, and these can have a wide range of negative effects on data quality, the subject of data quality must be addressed. This paper examines the data quality problems in research information systems and presents new techniques that enable organizations to improve the quality of their research information.
Keywords: research information systems (RIS), research information, heterogeneous sources, data quality, data cleansing, science system, standardization
Procedia PDF Downloads 158
38512 On Deterministic Chaos: Disclosing the Missing Mathematics from the Lorenz-Haken Equations
Authors: Meziane Belkacem
Abstract:
We aim at converting the original 3D Lorenz-Haken equations, which describe laser dynamics in terms of self-pulsing and chaos, into two second-order differential equations, out of which we extract the so-far-missing mathematics and corroborations with respect to nonlinear interactions. Leaning on basic trigonometry, we pull out important outcomes; a fundamental result attributes chaos to forbidden periodic solutions inside some precisely delimited region of the control parameter space that governs the bewildering dynamics.
Keywords: physics, optics, nonlinear dynamics, chaos
Procedia PDF Downloads 158
38511 Crop Recommendation System Using Machine Learning
Authors: Prathik Ranka, Sridhar K, Vasanth Daniel, Mithun Shankar
Abstract:
With growing global food needs and climate uncertainties, informed crop choices are critical for increasing agricultural productivity. Here we propose a machine learning-based crop recommendation system to help farmers choose the most appropriate crops according to their geographical region and soil properties. We deploy algorithms like Decision Trees, Random Forests, and Support Vector Machines on a broad dataset that consists of climatic factors, soil characteristics, and historical crop yields to predict the best choice of crops. The approach first assesses the data for missing values and preprocesses them accordingly, since a model cannot perform reliably on unhandled missing data, and then normalizes the features so that the machine learning models give their best results. Model effectiveness is measured through performance metrics like accuracy, precision, and recall. The resultant app provides a farmer-friendly dashboard through which farmers can enter their local conditions and receive individualized crop suggestions.
Keywords: crop recommendation, precision agriculture, crop, machine learning
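A compact sketch of the described pipeline follows: impute, normalize, then fit a classifier that maps soil and climate features to a recommended crop. The feature meanings, crop names, and data are hypothetical.

```python
# Sketch: impute -> scale -> Random Forest crop recommendation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
X = rng.normal(size=(500, 5))            # e.g. N, P, K, rainfall, temperature
X[rng.random(X.shape) < 0.05] = np.nan   # a few missing soil-test values
y = rng.choice(["rice", "maize", "cotton"], size=500)

model = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(),
                      RandomForestClassifier(n_estimators=200, random_state=0))
model.fit(X, y)
print(model.predict([[0.2, -0.1, 0.5, 1.2, 0.3]]))  # recommendation for one farm
```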
Procedia PDF Downloads 19
38510 Improving Temporal Correlations in Empirical Orthogonal Function Expansions for Data Interpolating Empirical Orthogonal Function Algorithm
Authors: Ping Bo, Meng Yunshan
Abstract:
Satellite-derived sea surface temperature (SST) is a key parameter for many operational and scientific applications. However, the disadvantage of SST data is a high percentage of missing data, mainly caused by cloud coverage. The Data Interpolating Empirical Orthogonal Function (DINEOF) algorithm is an EOF-based technique for reconstructing missing data and has been widely used in the oceanographic field. The reconstruction of SST images within a long time series using DINEOF can cause large discontinuities, and one solution to this problem is to filter the temporal covariance matrix to reduce the spurious variability. Based on previous research, an algorithm is presented in this paper to improve the temporal correlations in the EOF expansion. As in previous research, a filter, such as a Laplacian filter, is applied to the temporal covariance matrix, but the presented algorithm also considers the temporal relationship between the two images used in the filter: for example, two images in the same season are more likely to be correlated than those in different seasons, hence the latter are weighted less in the filter. The presented approach is tested on the monthly nighttime 4-km Advanced Very High Resolution Radiometer (AVHRR) Pathfinder SST for the long-term period spanning from 1989 to 2006. The results obtained from the presented algorithm are compared to those from the original DINEOF algorithm without filtering and from the DINEOF algorithm with filtering but without taking the temporal relationship into account.
Keywords: data interpolating empirical orthogonal function, image reconstruction, sea surface temperature, temporal filter
Procedia PDF Downloads 325
38509 Evaluation of DNA Paternity Testing Accuracy of Child Trafficking Cases
Authors: Wing Kam Fung, Kexin Yu
Abstract:
Child trafficking has been a serious problem in modern China. The Chinese government has established a national anti-trafficking DNA database to help reunite missing children with their families. The database collects DNA information from missing children's parents and from trafficked and homeless children, then conducts paternity tests to find matched pairs. This paper considers the matching accuracy in such cases by looking into the exclusion probability in paternity testing. First, the situation of child trafficking in China is introduced. Next, derivations of the exclusion probability for both the one-parent and two-parent cases are given, followed by an extension to allow for 1 or 2 mutations. The accuracy of paternity testing in child trafficking cases is then assessed using the exclusion probabilities and available data. Finally, the number of loci that should be used to ensure a correct match is investigated.
Keywords: child trafficking, DNA database, exclusion probability, paternity testing
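The multi-locus arithmetic referred to above can be sketched with the standard combined probability of exclusion across independent loci; the per-locus values below are illustrative, not real database figures.

```python
# Sketch: combined probability of exclusion, CPE = 1 - prod(1 - PE_i).
def combined_pe(per_locus_pe):
    prob_no_exclusion = 1.0
    for pe in per_locus_pe:           # loci assumed independent
        prob_no_exclusion *= (1.0 - pe)
    return 1.0 - prob_no_exclusion

pe_values = [0.55, 0.62, 0.48, 0.70, 0.58]   # five hypothetical loci
print(f"CPE over 5 loci: {combined_pe(pe_values):.4f}")
# Adding loci drives CPE toward 1, which is why the paper asks how many loci
# are needed to ensure a correct match.
```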
Procedia PDF Downloads 459
38508 Identifying Missing Component in the Bechdel Test Using Principal Component Analysis Method
Authors: Raghav Lakhotia, Chandra Kanth Nagesh, Krishna Madgula
Abstract:
A lot has been said and discussed regarding the rationale and significance of the Bechdel score. It became a digital sensation in 2013, when Swedish cinemas began to showcase the Bechdel test score of a film alongside its rating. The test has drawn criticism from experts and the film fraternity regarding its use to rate the female presence in a movie. The pundits believe that the score is too simplified and that the underlying criteria for a film to pass the test, namely that it must include 1) at least two women, 2) who have at least one dialogue, 3) about something other than a man, are egregiously simplistic. In this research, we have considered a few more parameters which highlight how females are represented in film, like the number of female dialogues in a movie, the dialogue genre, and the part-of-speech tags in the dialogue. These parameters are missing from the existing criteria used to calculate the Bechdel score. The research analyzes 342 movie scripts to test the hypothesis that these extra parameters, together with the current Bechdel criteria, are significant in calculating the female representation score. The result of the Principal Component Analysis method concludes that the female dialogue content is a key component and should be considered while measuring the representation of women in a work of fiction.
Keywords: Bechdel test, dialogue genre, parts of speech tags, principal component analysis
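An illustrative sketch of the analysis: run PCA over per-movie features and inspect the first component's loadings to see which parameters carry the most variance. The feature names and data below are hypothetical stand-ins for the paper's script-derived features.

```python
# Sketch: PCA loadings as a measure of which features dominate PC1.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(13)
features = ["num_women", "num_female_dialogues", "dialogue_genre_score",
            "noun_ratio", "verb_ratio"]
X = StandardScaler().fit_transform(rng.normal(size=(342, len(features))))

pca = PCA(n_components=2).fit(X)
for name, loading in zip(features, pca.components_[0]):
    print(f"{name:25s} PC1 loading = {loading:+.3f}")
print("explained variance ratio:", pca.explained_variance_ratio_)
```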
Procedia PDF Downloads 144
38507 RGB-D SLAM Algorithm Based on Pixel Level Dense Depth Map
Authors: Hao Zhang, Hongyang Yu
Abstract:
Scale uncertainty is a well-known challenging problem in visual SLAM. Because RGB-D sensors provide depth information, RGB-D SLAM alleviates this scale uncertainty problem. However, due to the limitations of the physical hardware, the depth map output by an RGB-D sensor usually contains large areas of missing depth values. This missing depth information affects the accuracy and robustness of RGB-D SLAM. In order to reduce these effects, this paper completes the missing areas of the depth map output by the RGB-D sensor and then fuses the completed dense depth map into ORB SLAM2. By adding this process of obtaining pixel-level dense depth maps, a better RGB-D visual SLAM algorithm is finally obtained. In the process of obtaining dense depth maps, a deep learning model of indoor scenes is adopted. Experiments are conducted on public datasets and real-world indoor environments. Experimental results show that the proposed SLAM algorithm has better robustness than ORB SLAM2.
Keywords: RGB-D, SLAM, dense depth, depth map
Procedia PDF Downloads 143
38506 Robust Heart Rate Estimation from Multiple Cardiovascular and Non-Cardiovascular Physiological Signals Using Signal Quality Indices and Kalman Filter
Authors: Shalini Rankawat, Mansi Rankawat, Rahul Dubey, Mazad Zaveri
Abstract:
Physiological signals such as the electrocardiogram (ECG) and arterial blood pressure (ABP) in the intensive care unit (ICU) are often seriously corrupted by noise, artifacts, and missing data, which lead to errors in the estimation of heart rate (HR) and incidences of false alarms from ICU monitors. Clinical support in the ICU requires the most reliable heart rate estimation. Cardiac activity, because of its relatively high electrical energy, may introduce artifacts into electroencephalogram (EEG), electrooculogram (EOG), and electromyogram (EMG) recordings. This paper presents a robust heart rate estimation method based on the detection of R-peaks of ECG artifacts in EEG, EMG, and EOG signals, using an energy-based function and a novel signal quality index (SQI) assessment technique. SQIs of the physiological signals (EEG, EMG, and EOG) were obtained by correlating the nonlinear energy operator (Teager energy) of these signals with either the ECG or ABP signal. HR is estimated from the ECG, ABP, EEG, EMG, and EOG signals by separate Kalman filters based upon the individual SQIs. Data fusion of the HR estimates was then performed by weighting each estimate by the Kalman filters' SQI-modified innovations. The fused HR estimate is more accurate and robust than any of the individual HR estimates. This method was evaluated on the MIMIC II database of PhysioNet, from bedside monitors of ICU patients. The method provides an accurate HR estimate even in the presence of noise and artifacts.
Keywords: ECG, ABP, EEG, EMG, EOG, ECG artifacts, Teager-Kaiser energy, heart rate, signal quality index, Kalman filter, data fusion
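A simplified sketch of the fusion step follows: each channel's HR estimate is combined with a weight derived from its signal quality index, standing in for the Kalman innovation weighting used in the paper. All numbers are synthetic.

```python
# Sketch: SQI-weighted fusion of per-channel heart-rate estimates.
import numpy as np

hr_estimates = {"ECG": 78.0, "ABP": 80.0, "EEG": 95.0, "EMG": 77.5, "EOG": 79.0}
sqi = {"ECG": 0.95, "ABP": 0.90, "EEG": 0.10, "EMG": 0.60, "EOG": 0.70}

weights = np.array([sqi[k] for k in hr_estimates])
values = np.array([hr_estimates[k] for k in hr_estimates])
hr_fused = float(np.sum(weights * values) / np.sum(weights))
print(f"fused HR = {hr_fused:.1f} bpm")  # the noisy EEG estimate is down-weighted
```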
Procedia PDF Downloads 696
38505 Reusing Assessment Tests by Generating Arborescent Test Groups Using a Genetic Algorithm
Authors: Ovidiu Domşa, Nicolae Bold
Abstract:
Using Information and Communication Technologies (ICT) notions in education and in the three basic processes of education (teaching, learning, and assessment) can bring benefits to pupils and to the professional development of teachers. In this matter, we refer to these notions as concepts taken from the informatics area and apply them to the domain of education. These notions refer to genetic algorithms and arborescent structures, used in the specific process of assessment or evaluation. This paper uses these kinds of notions to generate subtrees from a main tree of tests related to one another by their degree of difficulty. These subtrees must contain the highest number of connections between the nodes and the lowest number of missing edges (the candidate structures being subtrees of the main tree) and, in the particular case where no subtree with zero missing edges exists, the subtrees with the lowest (minimal) number of missing edges between the nodes are chosen, where a node is a test and an edge is a direct connection between two tests that differ by one degree of difficulty. The subtrees are represented as sequences. The tests are the same (a number coding a test represents that test in every sequence), and they are reused for each sequence of tests.
Keywords: chromosome, genetic algorithm, subtree, test
Procedia PDF Downloads 325