Search results for: locally weighted regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4104


3984 A Critical Study of the Performance of Self Compacting Concrete (SCC) Using Locally Supplied Materials in Bahrain

Authors: A. Umar, A. Tamimi

Abstract:

Development of new types of concrete with improved performance is a very important issue for the whole building industry. The development is based on optimizing the concrete mix design, with an emphasis not only on workability and mechanical properties but also on the durability and reliability of the concrete structure in general. Self-compacting concrete (SCC) is a high-performance material designed to flow into formwork under its own weight, without the aid of mechanical vibration. At the same time, it is cohesive enough to fill spaces of almost any size and shape without segregation or bleeding. Construction time is shorter, and SCC production is environmentally friendly (no noise, no vibration). Furthermore, SCC produces a good surface finish. Despite these advantages, SCC has not gained much local acceptance, even though it has been promoted in the Middle East for the last ten to twelve years. The reluctance to utilize SCC in Bahrain may be due to a lack of research or published data on locally produced SCC. There is therefore a need for studies of SCC made with locally available materials. The literature shows that viscosity modifying admixtures (VMA), micro silica and glass fibers are very effective in stabilizing the fresh (rheological) and hardened (strength) properties of SCC. The present study therefore proposes to investigate SCC made with various dosages of VMA, with and without micro silica and glass fibers, and to study their influence on the properties of fresh and hardened concrete.

Keywords: self-compacting concrete, viscosity modifying admixture, micro silica, glass fibers

3983 Predicting Bridge Pier Scour Depth with SVM

Authors: Arun Goel

Abstract:

Prediction of maximum local scour is necessary for the safe and economical design of bridges. A number of equations have been developed over the years to predict local scour depth from laboratory data, and a few pier equations have also been proposed using field data. As past publications indicate, most of these equations are empirical. In this paper, local scour depth around bridge piers is computed in dimensional and non-dimensional form using linear regression, simple regression and SVM (Poly and Rbf kernel) techniques, along with a few conventional empirical equations. The outcome of this study suggests that SVM (Poly and Rbf) based modeling can be employed as an alternative to linear regression, simple regression and the conventional empirical equations for predicting the scour depth of bridge piers. The results also indicate that SVM (Poly and Rbf) performs better with the non-dimensional form of bridge pier scour than with the dimensional form.

Keywords: modeling, pier scour, regression, prediction, SVM (Poly and Rbf kernels)

3982 CAG Repeat Polymorphism of Androgen Receptor and Female Sexual Functions in Egyptian Female Population

Authors: Azza Gaber Farag, Yasser Atta Shehata, Sara Elsayed Elghazouly, Mustafa Elsayed Elshaib, Nesreen Gamal Elden Elhelbawy

Abstract:

Background: Androgen receptor (AR) polymorphism in the cytosine-adenine-guanine (CAG) repeat affects the functional capacity of AR in males. However, little research in this field is available on female sexual function. Aim: To investigate the possible link between polymorphism in the CAG repeat of the AR gene and female sexual function in a sample of the Egyptian population. Materials and methods: 500 married Egyptian women completed a questionnaire on sociodemographic, reproductive, and sexual data. AR CAG repeat length was analyzed by real-time PCR in those with female sexual dysfunction (FSD). Results: The domain most sensitive to AR CAG repeat length was orgasm, which showed significant positive correlations with the short allele (p=.001), long allele (p=.015), biallelic mean (p<.001), and X-weighted biallelic mean (p<.001). The satisfaction domain had significant positive correlations with the biallelic mean (p=.035) and the X-weighted biallelic mean (p=.032). However, the pain domain showed significant negative correlations with the short allele (p=.002), biallelic mean (p=.013), and X-weighted biallelic mean (p=.011). Conclusions: AR polymorphism could be a non-negligible factor in female sexual function. Lower AR CAG repeat lengths had a significant impact on FSD, mainly affecting orgasm, followed by pain disorders, which ultimately affected sexual satisfaction.

Keywords: female sexual dysfunction, androgen receptor, CAG repeat polymorphism, androgen

3981 Arabic Character Recognition Using Regression Curves with the Expectation Maximization Algorithm

Authors: Abdullah A. AlShaher

Abstract:

In this paper, we demonstrate how regression curves can be used to recognize 2D non-rigid handwritten shapes. Each shape is represented by a set of non-overlapping, uniformly distributed landmarks. The underlying models use second-order polynomials to model the shapes within a training set. To estimate the regression models, we extract the coefficients that describe the variations within each shape class, using a least-squares method. We then train these coefficients with the Expectation Maximization (EM) algorithm. Recognition is carried out by finding the least landmark displacement error with respect to the model curves. Handwritten isolated Arabic characters are used to evaluate our approach.

Keywords: character recognition, regression curves, handwritten Arabic letters, expectation maximization algorithm
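
The least-squares step described in the abstract above can be sketched as follows. This is a simplified illustration, not the author's implementation: each class is modelled by a single quadratic y = f(x) fitted to all of its landmarks with NumPy's `polyfit` (the EM training stage is omitted), and recognition picks the class with the smallest sum of squared vertical landmark displacements. The function names and the two toy "classes" are hypothetical.

```python
import numpy as np

def fit_class_curve(shapes):
    """Least-squares fit of one quadratic y = a*x**2 + b*x + c to a class.

    shapes: list of (N, 2) arrays of (x, y) landmark coordinates, all from
    the same character class. Returns the coefficients [a, b, c].
    """
    pts = np.vstack(shapes)
    return np.polyfit(pts[:, 0], pts[:, 1], deg=2)

def landmark_error(shape, coeffs):
    """Sum of squared vertical displacements of landmarks from a class curve."""
    predicted = np.polyval(coeffs, shape[:, 0])
    return float(np.sum((shape[:, 1] - predicted) ** 2))

def classify(shape, class_curves):
    """Return the class label whose curve gives the least displacement error."""
    return min(class_curves, key=lambda label: landmark_error(shape, class_curves[label]))

# Toy usage: two synthetic "classes" traced by ten uniformly spaced landmarks.
x = np.linspace(0.0, 1.0, 10)
curves = {
    "cup": fit_class_curve([np.column_stack([x, x ** 2])]),
    "cap": fit_class_curve([np.column_stack([x, 1.0 - x ** 2])]),
}
query = np.column_stack([x, x ** 2 + 0.01])
print(classify(query, curves))  # cup
```

Real handwritten strokes are not always functions of x, so a parametric curve (x(s), y(s)) per class would be the natural next step.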

3980 Reminiscence Therapy for Alzheimer’s Disease Restrained on Logistic Regression Based Linear Bootstrap Aggregating

Authors: P. S. Jagadeesh Kumar, Mingmin Pan, Xianpei Li, Yanmin Yuan, Tracy Lin Huan

Abstract:

Researchers are actively investigating the inherited features of Alzheimer’s disease and potential therapies. In Alzheimer’s disease, memories are lost in reverse order: recently formed memories are more transient than older ones. Reminiscence therapy involves discussing past activities, events and experiences with another person or group, frequently with the help of tangible prompts such as photographs, household and other familiar items from the past, music, and tape collections. In this manuscript, the effectiveness of reminiscence therapy for Alzheimer’s disease is measured using logistic regression based linear bootstrap aggregating. Logistic regression is used to predict the experiential features of the patient’s memory under various therapies. Linear bootstrap aggregating shows better stability and accuracy for reminiscence therapy in the statistical classification and regression of memories, compared with validation therapy, supportive psychotherapy, sensory integration and simulated presence therapy.

Keywords: Alzheimer’s disease, linear bootstrap aggregating, logistic regression, reminiscence therapy

3979 Predicting Survival in Cancer: How Cox Regression Model Compares to Artificial Neural Networks?

Authors: Dalia Rimawi, Walid Salameh, Amal Al-Omari, Hadeel AbdelKhaleq

Abstract:

Prediction of survival time for patients with cancer is a core factor in oncologists’ decisions on aspects such as treatment plans, patients’ quality of life and medication development. For a long time, Cox proportional hazards regression (PH Cox) has been the best-known statistical method for predicting survival outcomes, and it still is. With the rise of data science, however, new prediction models have been employed and have proved more flexible and more accurate in this type of study. The artificial neural network is one such model, suited to time-to-event prediction. In this study, we aim to compare PH Cox regression with an artificial neural network in terms of data handling and the accuracy of each model.

Keywords: Cox regression, neural networks, survival, cancer.

3978 Survival and Hazard Maximum Likelihood Estimator with Covariate Based on Right Censored Data of Weibull Distribution

Authors: Al Omari Mohammed Ahmed

Abstract:

This paper focuses on maximum likelihood estimation with covariates, which are incorporated into the Weibull model. Under this regression model, maximum likelihood is used to estimate the covariate parameters, the shape parameter, and the survival function and hazard rate of the Weibull regression distribution with right-censored data. The mean square error (MSE) and absolute bias are used to assess the performance of the Weibull regression estimates. For the simulation comparison, the study used various sample sizes and several specific values of the Weibull shape parameter.

Keywords: weibull regression distribution, maximum likelihood estimator, survival function, hazard rate, right censoring
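
A minimal sketch of the censored-data likelihood, assuming the proportional-hazards parameterization h(t|x) = (k/lam)(t/lam)^(k-1) exp(x.beta). The abstract does not state which Weibull regression form the paper uses, and all names below are illustrative. Each observed event contributes log h + log S, and each right-censored time contributes log S only.

```python
import numpy as np

def weibull_hazard(t, x, beta, shape, scale):
    """h(t|x) = (k/lam) * (t/lam)**(k-1) * exp(x @ beta)  (proportional-hazards form)."""
    return (shape / scale) * (t / scale) ** (shape - 1) * np.exp(x @ beta)

def weibull_survival(t, x, beta, shape, scale):
    """S(t|x) = exp(-(t/lam)**k * exp(x @ beta))."""
    return np.exp(-((t / scale) ** shape) * np.exp(x @ beta))

def log_likelihood(params, t, x, delta):
    """Right-censored Weibull regression log-likelihood.

    params = [log k, log lam, beta_1, ..., beta_p]; the logs keep k, lam > 0.
    t: (n,) times; x: (n, p) covariates; delta: (n,) 1 = event, 0 = censored.
    Each subject contributes delta*log h(t|x) + log S(t|x).
    """
    shape, scale = np.exp(params[0]), np.exp(params[1])
    beta = np.asarray(params[2:], dtype=float)
    log_h = np.log(weibull_hazard(t, x, beta, shape, scale))
    log_s = -((t / scale) ** shape) * np.exp(x @ beta)  # log S(t|x), computed directly
    return float(np.sum(delta * log_h + log_s))

# Exponential special case (k = 1, lam = 2, beta = 0): one event at t = 2.
ll = log_likelihood([0.0, np.log(2.0), 0.0], np.array([2.0]), np.array([[0.0]]), np.array([1]))
# ll equals log(0.5) - 1 here
```

To obtain the MLEs, one could pass `lambda p: -log_likelihood(p, t, x, delta)` to a general-purpose optimizer such as `scipy.optimize.minimize`, then plug the estimates into `weibull_survival` and `weibull_hazard`.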

3977 Local Revenue Generation: Its Contribution to the Development of the Municipality of Bacolod, Lanao Del Norte

Authors: Louvill Manangan Ozarraga

Abstract:

This study was designed to examine the revenue generation system of Bacolod, Lanao del Norte, through a complete enumeration of elected officials and permanent employees as respondents. The pertinent data were obtained through a structured questionnaire and with the help of key informants. The study used a cross-sectional survey design, and the data were analyzed and interpreted using frequency counts, percentage distributions, and weighted means. Among the major findings, the Municipality’s local revenue increased by Php 4,465,394.21, roughly 73.52%, from 2018 to 2020. Administrative activities, namely the issuance of ordinances, personnel augmentation and collection strategies, help the Municipality cope with development. Moreover, respondents were undecided whether revenue generation contributed to infrastructure and asset purchases. The majority of the respondents agreed that the municipality’s local revenue generation contributes to the social welfare of its constituents. Also, the respondents disagreed that locally generated revenue augments the 20% development fund. The study revealed a large difference between the 2018 and 2020 Real Property Tax (RPT) collections. No committee was created to monitor and supervise the municipal revenue generation system. Through a partnership with TESDA, the Municipality provides skilled-job opportunities to its constituents and participants.

Keywords: contribution, development, Bacolod Lanao del Norte, revenue generation system

3976 Machine Vision System for Measuring the Quality of Bulk Sun-dried Organic Raisins

Authors: Navab Karimi, Tohid Alizadeh

Abstract:

An intelligent vision-based system was designed to measure the quality and purity of raisins. A machine vision setup was used to capture images of bulk raisins in pure-impure berry mixtures ranging from 5% to 50%. Textural features of the bulk raisins were extracted using grey-level histograms, the grey-level co-occurrence matrix (GLCM), and local binary patterns (108 features in total). A genetic algorithm and neural network regression were used to select and rank the best features (21 features). The GLCM feature set gave the highest accuracy (92.4%) of the three sets. Next, multiple feature combinations from the previous stage were fed into a second regression (linear regression) to increase accuracy, and a combination of 16 features was found to be optimal. Finally, a support vector machine (SVM) classifier was used to differentiate the mixtures, achieving the best efficiency and accuracy of 96.2% and 97.35%, respectively.

Keywords: sun-dried organic raisin, genetic algorithm, feature extraction, ann regression, linear regression, support vector machine, south azerbaijan.
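
To make the GLCM feature family concrete, the sketch below builds a normalised co-occurrence matrix for a single pixel offset and computes one Haralick feature (contrast). The offsets, number of grey levels and full feature list used in the paper are not given in the abstract; the tiny image and the function names here are illustrative assumptions.

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=None):
    """Normalised grey-level co-occurrence matrix for one pixel offset (dx, dy).

    Entry [i, j] is the fraction of pixel pairs (p, p + offset) in which p has
    grey level i and its neighbour has grey level j.
    """
    img = np.asarray(image, dtype=int)
    if levels is None:
        levels = int(img.max()) + 1
    counts = np.zeros((levels, levels), dtype=float)
    rows, cols = img.shape
    # Only visit pixels whose offset neighbour is still inside the image.
    for r in range(max(0, -dy), min(rows, rows - dy)):
        for c in range(max(0, -dx), min(cols, cols - dx)):
            counts[img[r, c], img[r + dy, c + dx]] += 1
    return counts / counts.sum()

def glcm_contrast(p):
    """Haralick contrast: sum over i, j of p[i, j] * (i - j)**2."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

img = [[0, 0, 1],
       [0, 1, 1]]
p = glcm(img)              # horizontal neighbour pairs
print(glcm_contrast(p))    # 0.5
```

In practice, several offsets and many such statistics (energy, homogeneity, correlation, and so on) are stacked into the feature vector that the selection stage then ranks.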

3975 An EWMA P-Chart Based on Improved Square Root Transformation

Authors: Saowanit Sukparungsee

Abstract:

The traditional Shewhart p chart was developed for charting binomial data. It is based on the normal approximation, under the conditions of a low defect level and small to moderate sample sizes. In real applications, however, the data often depart from these assumptions because the exact distribution is skewed. This paper proposes a modified Exponentially Weighted Moving Average (EWMA) control chart for detecting a change in binomial data using an improved square root transformation, namely the ISRT p EWMA control chart. The numerical results show that the ISRT p EWMA chart is superior to the ISRT p chart for small to moderate shifts, whereas the latter is better for large shifts.

Keywords: number of defects, exponentially weighted moving average, average run length, square root transformations
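
The EWMA recursion z_t = lam*y_t + (1 - lam)*z_(t-1) with time-varying control limits can be sketched as below. The abstract does not give the form of the improved square root transformation (ISRT), so this sketch substitutes the standard arcsine-square-root transform, whose variance is approximately 1/(4n). The values lam = 0.1 and L = 2.7 are illustrative defaults, not values from the paper.

```python
import math

def arcsine_sqrt(p_hat):
    """Variance-stabilising transform for a binomial proportion (stand-in for ISRT)."""
    return math.asin(math.sqrt(p_hat))

def ewma_p_chart(defects, n, p0, lam=0.1, L=2.7):
    """EWMA chart on transformed sample proportions.

    defects: defect counts, one per sample of size n; p0: in-control defect rate.
    Returns one (z_t, lower, upper, signal) tuple per sample.
    """
    mu = arcsine_sqrt(p0)
    sigma = 1.0 / (2.0 * math.sqrt(n))  # approx. s.d. of asin(sqrt(p_hat))
    z = mu                              # EWMA starts at the in-control mean
    results = []
    for t, d in enumerate(defects, start=1):
        z = lam * arcsine_sqrt(d / n) + (1.0 - lam) * z
        # Exact time-varying EWMA limits: sigma * sqrt(lam/(2-lam) * (1-(1-lam)^(2t)))
        width = L * sigma * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        lower, upper = mu - width, mu + width
        results.append((z, lower, upper, not lower <= z <= upper))
    return results
```

In practice, lam and L are tuned so that the in-control average run length hits a target; smaller lam makes the chart more sensitive to small sustained shifts, which matches the trade-off reported in the abstract.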

3974 Impact of a Locally-Prepared Fermented Alcoholic Beverage from Jaggery on the Gut Bacterial Profile of the Tea-Tribal Populations of Assam, India

Authors: Rupamoni Thakur, Madhusmita Dehingia, Narayan C. Talukdar, Mojibur R. Khan

Abstract:

The human gut is an extremely active fermentation site and is inhabited by diverse bacterial species. Consumption of alcoholic beverages has been shown to substantially modulate an individual’s gut bacterial profile (GBP). Assam, a major north-eastern state of India, is home to a number of tribal populations, of which the tea tribes form a major community. These tea-tribal communities are known to prepare and consume a local alcoholic beverage made from fermented jaggery, whose chemical composition is unknown. In this study, we demonstrate the effect of daily intake of this locally prepared beverage on the GBP of the tea-tribal communities and correlate it with changes in the population’s biochemical biomarkers. The fecal bacterial diversity of 40 drinkers and 35 healthy non-drinkers was analyzed by polymerase chain reaction (PCR)–denaturing gradient gel electrophoresis (DGGE). The results suggested that the GBP was significantly modulated in subjects consuming the fermented beverage. Significant differences were also observed in serum biochemical parameters such as triglycerides, total cholesterol and the liver marker enzymes (ASAT/ALAT and GGT). Further studies are underway to identify the GBP of drinkers vs. non-drinkers through next-generation sequencing (NGS) and to correlate the changes with the population’s biochemical biomarkers.

Keywords: alcoholic beverage, gut bacterial profile, PCR-DGGE analysis, tea-tribes of India

3973 Determining the Causality Variables in Female Genital Mutilation: A Factor Screening Approach

Authors: Ekele Alih, Enejo Jalija

Abstract:

Female genital mutilation (FGM) is classified into three types: clitoridectomy, excision and infibulation. In this study, we examine the factors responsible for FGM in order to identify the causal variables with a logistic regression approach. Using the results of a survey conducted by the Public Health Division, Nigeria Institute of Medical Research, Yaba, Lagos State, the tau statistic (τ) was used to screen nine factors that cause FGM and to select a few of the predictors before the multiple regression equation was obtained. This screening may be needed because the sample size may not sustain a regression with all the predictors, or to avoid multicollinearity. A total of 300 respondents, comprising 150 adult males and 150 adult females, were selected for the household survey using a multi-stage sampling procedure. The tau statistic,

Keywords: female genital mutilation, logistic regression, tau statistic, African society

3972 A Monte Carlo Fuzzy Logistic Regression Framework against Imbalance and Separation

Authors: Georgios Charizanos, Haydar Demirhan, Duygu Icen

Abstract:

Two of the most impactful issues in classical logistic regression are class imbalance and complete separation. These can cause model predictions to lean heavily towards the majority class of the binary response variable, or lead to over-fitting. Fuzzy methodology offers key solutions for handling these problems. However, most studies propose transforming the binary responses into a continuous format bounded within [0,1]. This is called the possibilistic approach to fuzzy logistic regression. This approach is closer to ordinary regression, since no logit-link function is used and no fuzzy probabilities are generated. In contrast, we propose a method of fuzzifying binary response variables that allows the logit-link function to be used, yielding a probabilistic fuzzy logistic regression model based on the Monte Carlo method. The fuzzy probabilities are then classified by selecting a fuzzy threshold. Different combinations of fuzzy and crisp inputs, outputs, and coefficients are explored to understand which perform better under different conditions of imbalance and separation. We conduct numerical experiments on both synthetic and real datasets to compare the fuzzy logistic regression framework with seven crisp machine learning methods. The proposed framework performs better irrespective of the degree of imbalance and the presence of separation in the data, whereas the machine learning methods considered are significantly affected.

Keywords: fuzzy logistic regression, fuzzy, logistic, machine learning

3971 Statistical Convergence of the Szasz-Mirakjan-Kantorovich-Type Operators

Authors: Rishikesh Yadav, Ramakanta Meher, Vishnu Narayan Mishra

Abstract:

The main aim of this article is to investigate the statistical convergence of summation-integral-type operators and to obtain their weighted statistical convergence. The rate of statistical convergence is also studied by means of the modulus of continuity and for functions belonging to the Lipschitz class. We illustrate the convergence of the defined operators graphically and show that they have a better rate of convergence than the Szasz-Mirakjan-Kantorovich operators. In the last section, we extend these operators to bivariate operators and study their rate of convergence in terms of the modulus of continuity and the Lipschitz class for functions of two variables.

Keywords: The Szasz-Mirakjan-Kantorovich operators, statistical convergence, modulus of continuity, Peetre K-functional, weighted modulus of continuity
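
For reference, the classical Szasz-Mirakjan-Kantorovich operator that the abstract compares against is K_n(f; x) = n * sum_{k>=0} e^(-nx) (nx)^k / k! * integral_{k/n}^{(k+1)/n} f(t) dt. The sketch below evaluates it numerically by truncating the series and using Simpson's rule. Both numerical choices are ours, and the summation-integral modification studied in the paper is not specified in the abstract, so it is not implemented here. The known moments K_n(1; x) = 1 and K_n(t; x) = x + 1/(2n) give a quick check.

```python
import math

def poisson_weight(k, lam):
    """Szasz-Mirakjan basis e^(-lam) * lam^k / k!, computed in log space."""
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def simpson(f, a, b, m=8):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def smk(f, n, x):
    """K_n(f; x) = n * sum_k p_{n,k}(x) * integral of f over [k/n, (k+1)/n].

    The infinite series is truncated well past the Poisson mean n*x.
    """
    lam = n * x
    k_max = int(lam + 10.0 * math.sqrt(lam) + 20)
    return n * sum(poisson_weight(k, lam) * simpson(f, k / n, (k + 1) / n)
                   for k in range(k_max + 1))

n, x = 20, 0.5
print(smk(lambda t: 1.0, n, x))  # ~1.0
print(smk(lambda t: t, n, x))    # ~x + 1/(2n) = 0.525
```

The 1/(2n) bias in the first moment is what limits the classical operator's rate of convergence, which is the baseline the paper's modified operators are said to improve on.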

3970 Relationship between Reproduction Performances and Coat Characteristics of Montbeliarde Cows during Hot Season in Algeria

Authors: Sara Lamari, Toufik Madani

Abstract:

This study aimed to explore the relationship between reproductive performance and coat characteristics of Montbéliarde cows, born in Algeria or imported from Europe, during the hot season in Algeria. Hair coat traits (coat color, hair weight, hair length, number of hairs per unit area, total hair diameter and hair medulla diameter) were estimated in 18 imported and 49 locally born cows. These traits were measured on an area 20 cm below the dorsal line at the center of the thorax. The results showed that hair coats differed significantly between locally born and imported cows: imported Montbéliarde cows had whiter coats than locally born ones. A significant effect of total hair diameter on the interval from calving to conception (IC) was observed in imported Montbéliarde cows, suggesting that heat stress has less impact on the reproductive efficiency of cows with thin-diameter hair. Montbéliarde cows with short hair coats had significantly more matings per conception (2.28 ± 1.93 vs. 1.67 ± 0.92) and a longer IC (98.04 ± 78.81 vs. 74.53 ± 35.60 days) than cows with long hair. Hair works as a temperature regulator in association with the muscles of the skin and may affect reproductive performance during the heat stress season. It can be assumed that hair length and total hair diameter in the Montbéliarde breed are related to reproductive efficiency.

Keywords: hair coat, reproduction, Montbeliarde cow, hot season

3969 Landslide Susceptibility Mapping: A Comparison between Logistic Regression and Multivariate Adaptive Regression Spline Models in the Municipality of Oudka, Northern of Morocco

Authors: S. Benchelha, H. C. Aoudjehane, M. Hakdaoui, R. El Hamdouni, H. Mansouri, T. Benchelha, M. Layelmam, M. Alaoui

Abstract:

Logistic regression (LR) and multivariate adaptive regression splines (MarSpline) are applied and validated for landslide susceptibility mapping in Oudka, Morocco, using a geographical information system (GIS). From a spatial database containing landslide mapping, topography, soil, hydrology and lithology data, eight landslide-related factors were calculated or extracted: elevation, slope, aspect, distance to streams, distance to roads, distance to faults, lithology, and the Normalized Difference Vegetation Index (NDVI). Using these factors, landslide susceptibility indexes were calculated with the two methods. Beforehand, the database was divided into two parts, one for training the models and the other for validation. The landslide susceptibility results were verified using success and prediction rates to evaluate the quality of these probabilistic models. This verification showed that MarSpline is the better model, with a success rate (AUC = 0.963) and prediction rate (AUC = 0.951) higher than those of the LR model (success rate AUC = 0.918, prediction rate AUC = 0.901).

Keywords: landslide susceptibility mapping, regression logistic, multivariate adaptive regression spline, Oudka, Taounate
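
The success and prediction rates above are reported as AUC values. One common way to compute an AUC, shown below as a sketch, is the Mann-Whitney formulation over mapping cells. The abstract does not say exactly how the paper built its success- and prediction-rate curves, and the function name and scores are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    scores: susceptibility index per mapping cell; labels: 1 = landslide, 0 = none.
    Returns the probability that a random landslide cell scores higher than a
    random non-landslide cell (ties count one half).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one landslide and one non-landslide cell")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))

# Success rate: AUC on the training cells; prediction rate: AUC on held-out cells.
print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

The double loop is quadratic in the number of cells; for full rasters, a rank-based version (sort once, sum the ranks of the positives) gives the same value much faster.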

3968 Variation of Compressive Strength of Hollow Sand Crate Block (6”) with Mix Ratio Using Locally Made Cement (Sokoto Cement)

Authors: Idris Adamu Idris

Abstract:

The Nigerian construction industry faces problems of structural and building failures. These failures are attributed to the use of low-quality construction materials, including sand crate blocks. This research determined the compressive strength of hollow sand crate blocks (6”) made with locally produced cement (Sokoto cement). Samples with mix ratios from 1:3 to 1:12 were tested at 7, 14, 21 and 28 days. Based on the laboratory results, a mix ratio of 1:10, corresponding to a minimum compressive strength of 1.9 N/mm² at 7 days, should be adopted. This satisfies BS 2028, 1364 1986, which specifies a minimum compressive strength of 1.8 N/mm² at 7 days. At 28 days of curing, the same mix ratio meets the BS minimum of 2.5 N/mm².

Keywords: buildings, cement, construction, hollow sand crate block, Nigeria

3967 Modeling Karachi Dengue Outbreak and Exploration of Climate Structure

Authors: Syed Afrozuddin Ahmed, Junaid Saghir Siddiqi, Sabah Quaiser

Abstract:

Various studies have reported that global warming causes climate instability and many serious impacts on the physical environment and public health. The increasing incidence of dengue is now a priority health issue and has become a health burden in Pakistan. This study investigates whether the spatial pattern of the environment causes the emergence or increasing rate of dengue fever incidence, which affects the population and its health. The climatic (environmental structure) data and the dengue fever (DF) data were processed by coding, editing, tabulating, recoding and restructuring (re-tabulating), and various statistical methods, techniques and procedures were then applied for the evaluation. The five climatic variables studied, collected from 1980 to 2012, are precipitation (P), maximum temperature (Mx), minimum temperature (Mn), humidity (H) and wind speed (W). Dengue cases in Karachi from 2010 to 2012 are reported on a weekly basis. Principal component analysis is applied to explore the climatic variables and/or the climatic structure that may influence the increase or decrease in the number of dengue fever cases in Karachi. For the whole period, PC1 represents the general atmospheric condition. For the dengue period, PC2 is the contrast between precipitation and wind speed. PC3 is the weighted difference between maximum temperature and wind speed. For the dengue period, PC4 is the contrast between maximum temperature and wind speed. Negative binomial and Poisson regression models are used to relate dengue fever incidence to the climatic variables and principal component scores. Relative humidity is estimated to increase the chances of dengue occurrence by 1.71%, maximum temperature by 19.48%, and minimum temperature by 11.51%, whereas wind speed decreases the weekly occurrence of dengue fever by 7.41%.

Keywords: principal component analysis, dengue fever, negative binomial regression model, poisson regression model
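
A sketch of the two computations described in the abstract above, under stated assumptions: PCA on the correlation matrix of the climate variables (the scores can then enter a Poisson or negative binomial model as covariates), and the usual conversion of a log-link coefficient beta into a per-unit percent change, 100*(e^beta - 1). We assume the percentages quoted in the abstract are of this kind. The data below are random stand-ins, not the Karachi series, and the function names are hypothetical.

```python
import numpy as np

def pca_correlation(X):
    """PCA of the variables' correlation matrix.

    X: (n_obs, n_vars) array, e.g. weekly precipitation, max/min temperature,
    humidity and wind speed. Returns (eigenvalues, loadings, scores) sorted by
    decreasing explained variance; each loading column's sign is arbitrary.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardise each variable
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)               # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, Z @ eigvecs               # scores feed the count model

def percent_change(beta):
    """Per-unit percent change in the expected count for a log-link model."""
    return 100.0 * (np.exp(beta) - 1.0)

# Hypothetical stand-in data: 156 weeks x 5 climate variables (P, Mx, Mn, H, W).
rng = np.random.default_rng(1)
X = rng.normal(size=(156, 5))
X[:, 1] += X[:, 2]   # make max and min temperature correlated
eigvals, loadings, scores = pca_correlation(X)
print(eigvals / eigvals.sum())   # proportion of variance per component
```

Interpretations such as "contrast between precipitation and wind speed" come from reading the signs and magnitudes in each loading column.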

3966 Hybrid Artificial Bee Colony and Least Squares Method for Rule-Based Systems Learning

Authors: Ahcene Habbi, Yassine Boudouaoui

Abstract:

This paper deals with the problem of automatic rule generation for fuzzy systems design. The proposed approach is based on a hybrid of artificial bee colony (ABC) optimization and the weighted least squares (LS) method, and aims to find the structure and parameters of fuzzy systems simultaneously. More precisely, two ABC-based fuzzy modeling strategies are presented and compared. The first strategy uses global optimization to learn fuzzy models; the second hybridizes ABC with the weighted least-squares estimation method. The performance of the proposed ABC and ABC-LS fuzzy modeling strategies is evaluated on complex modeling problems and compared with other advanced modeling methods.

Keywords: automatic design, learning, fuzzy rules, hybrid, swarm optimization
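
In many ABC-LS hybrids, the swarm searches the rule premises (membership centres and widths) while weighted least squares fits each rule's linear consequent, with the rule's firing strength as the sample weight. The sketch below shows only that weighted least-squares step for a one-input Takagi-Sugeno model. Whether the paper uses exactly this split is an assumption, and the Gaussian memberships, function names and toy data are hypothetical.

```python
import numpy as np

def gaussian_membership(x, center, width):
    """Firing strength of a one-input rule with a Gaussian membership function."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def weighted_least_squares(X, y, w):
    """Minimise sum_i w_i * (y_i - [1, x_i] @ theta)**2 over theta.

    X: (n, d) inputs; y: (n,) targets; w: (n,) non-negative weights.
    Returns theta = [intercept, slope_1, ..., slope_d].
    """
    A = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)  # scaling rows by sqrt(w) turns WLS into ordinary LS
    theta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return theta

# Toy TSK model: two rules centred at 0 and 1; ABC would tune centres/widths.
x = np.linspace(0.0, 1.0, 21)
y = 2.0 * x + 1.0
for center in (0.0, 1.0):
    w = gaussian_membership(x, center, 0.3)
    print(center, weighted_least_squares(x[:, None], y, w))  # intercept 1, slope 2 for both rules
```

Because the consequents enter the model linearly, solving them in closed form leaves the swarm a smaller, purely non-linear search space.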

3965 Impact of Locally Synthesized Carbon Nanotubes against Some Local Clinical Bacterial Isolates

Authors: Abdul Matin, Muazzama Akhtar, Shahid Nisar, Saddaf Mazzar, Umer Rashid

Abstract:

Antibiotic resistance is an increasing concern worldwide. Neisseria gonorrhoeae and Staphylococcus aureus are known to cause major human sexually transmitted and respiratory diseases, respectively. Nanotechnology is an emerging discipline, and its applications in various fields, especially the medical sciences, are enormous. In the present study, we synthesized multi-walled carbon nanotubes (MWNTs) using the acid oxidation method; the solubilized MWNTs were predominantly >500 nm in length, with diameters ranging from 40 to 50 nm. The locally synthesized MWNTs were tested against Gram-positive and Gram-negative bacteria to determine their impact on bacterial growth. Clinical isolates of Neisseria gonorrhoeae (isolate: 4C-11) and Staphylococcus aureus (isolate: 38541) were obtained from a local hospital and routinely cultured in LB broth at 37°C. Both clinical strains can be obtained on request from the University of Gujarat. A spectrophotometric assay was performed to determine the impact of MWNTs on bacterial growth in vitro. To determine the effect of MWNTs on the test organisms, various concentrations of MWNTs were used, and observations were recorded at various time intervals to understand the growth inhibition pattern. Our results demonstrated that MWNTs exhibited toxic effects on Staphylococcus aureus but showed very limited growth inhibition of Neisseria gonorrhoeae, which suggests that Neisseria is potentially resistant to the nanoparticles. Our results clearly show a gradual decrease in bacterial numbers over time compared with the control. Maximum bacterial inhibition was observed at the maximum concentration (50 µg/ml). Our future work will include further characterization of our locally synthesized MWNTs and study of their mode of action. In conclusion, we investigated and reported, for the first time, the inhibitory potential of locally synthesized MWNTs on local clinical isolates of Staphylococcus aureus and Neisseria gonorrhoeae.

Keywords: antibacterial activity, multi-walled carbon nanotubes, Neisseria gonorrhoeae, spectrophotometric assay, Staphylococcus aureus

3964 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni

Abstract:

The problem of entity-relation discovery, a well-covered topic in the literature, consists of searching unstructured sources (typically text) to find connections among entities. The entities can be a whole dictionary or a specific collection of named items. In many cases, machine learning and/or text mining techniques are used for this goal. These approaches might be infeasible for computationally challenging problems, such as processing massive data streams. A faster approach consists of collecting the cooccurrences of any two words (entities) to create a graph of relations, a cooccurrence graph. Each cooccurrence indicates some degree of semantic correlation between the words, because related words are more commonly close to each other than at opposite ends of the text. Some authors have used sliding windows for this problem: they count all the cooccurrences within a window sliding over the whole text. In this paper, we generalise this technique into a Weighted-Distance Sliding Window, in which each cooccurrence of two named items within the window is counted with a weight that depends on the distance between them: a shorter distance implies stronger evidence of a relationship. We support this intuition with an experiment that applies the technique to a dataset consisting of the text of the Bible, split into verses.

Keywords: cooccurrence graph, entity relation graph, unstructured text, weighted distance
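
The weighting idea can be sketched as follows. The abstract does not give the weight function, so this sketch assumes weight(d) = 1/d for two entity mentions d tokens apart; passing a constant weight recovers plain unweighted window counting. Function and argument names are illustrative, and the one-line text is a toy example rather than a verse from the paper's dataset.

```python
from collections import defaultdict

def weighted_cooccurrence(tokens, entities, window=5, weight=lambda d: 1.0 / d):
    """Weighted-distance sliding-window cooccurrence scores.

    Every pair of distinct entity mentions at most `window` tokens apart adds
    weight(d) to their edge, where d >= 1 is the token distance between them.
    Returns {frozenset({a, b}): score}: an undirected weighted cooccurrence graph.
    """
    mentions = [(i, tok) for i, tok in enumerate(tokens) if tok in entities]
    graph = defaultdict(float)
    for a, (i, u) in enumerate(mentions):
        for j, v in mentions[a + 1:]:
            d = j - i
            if d > window:
                break          # mentions are sorted, so later ones are farther away
            if u != v:         # ignore an entity co-occurring with itself
                graph[frozenset((u, v))] += weight(d)
    return dict(graph)

tokens = "Moses spoke to Aaron and Moses went to Pharaoh".split()
g = weighted_cooccurrence(tokens, {"Moses", "Aaron", "Pharaoh"}, window=5)
print(round(g[frozenset({"Moses", "Aaron"})], 4))  # 0.8333
```

Calling the function once per verse and summing the resulting edge weights keeps windows from crossing verse boundaries.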

3963 Robust Variable Selection Based on Schwarz Information Criterion for Linear Regression Models

Authors: Shokrya Saleh A. Alshqaq, Abdullah Ali H. Ahmadini

Abstract:

The Schwarz information criterion (SIC) is a popular tool for selecting the best variables in regression datasets. However, SIC is defined using an unbounded estimator, namely least squares (LS), which is highly sensitive to outlying observations, especially bad leverage points. A method for robust variable selection based on SIC for linear regression models is therefore needed. This study investigates the robustness properties of SIC by deriving its influence function and proposes a robust SIC based on the MM-estimation scale. The aim is to produce a criterion that can effectively select accurate models in the presence of vertical outliers and high leverage points. The advantages of the proposed robust SIC are demonstrated through a simulation study and an analysis of a real dataset.

Keywords: influence function, robust variable selection, robust regression, Schwarz information criterion

Procedia PDF Downloads 113
3962 Generalized Additive Model for Estimating Propensity Score

Authors: Tahmidul Islam

Abstract:

The Propensity Score Matching (PSM) technique has been widely used to estimate causal effects of treatment in observational studies. One major step in implementing PSM is estimating the propensity score (PS). A logistic regression model with additive linear terms of the covariates is the most commonly used technique in many studies. Logistic regression is also used with cubic splines to retain flexibility in the model. However, choosing the functional form of the logistic regression model remains an open question, since the effectiveness of PSM depends on how accurately the PS has been estimated. In many situations, the linearity assumption of linear logistic regression may not hold, and a non-linear relation between the logit and the covariates may be appropriate. For greater accuracy in non-linear situations, one can estimate the PS using machine learning techniques such as random forests or neural networks. In this study, the efficacy of the Generalized Additive Model (GAM) is assessed in various linear and non-linear settings and compared with that of the usual logistic regression. GAM is a non-parametric technique in which the functional form of the covariates can be left unspecified, so that a flexible regression model can be fitted. Various simple and complex treatment models were considered under several situations (small/large sample, low/high number of treatment units) to examine which method leads to better covariate balance in the matched dataset. The logistic regression model was found to be impressively robust against the inclusion of quadratic and interaction terms, reducing the mean difference between the treatment and control sets as efficiently as GAM does. GAM provided no significantly better covariate balance than logistic regression in either simple or complex models. The analysis also suggests that a larger proportion of controls relative to treatment units leads to better balance for both methods.
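
To make the comparison concrete, the sketch below shows the logistic-regression baseline of a PSM pipeline: fit a propensity model by Newton-Raphson, match each treated unit to its nearest control on the estimated score (1:1, with replacement), and measure balance as a standardised mean difference (SMD). The simulated data, the matching scheme, and the helper names are assumptions for illustration. A GAM would replace only the `fit_logistic` step with a smooth additive fit, which is not implemented here.

```python
import numpy as np


def fit_logistic(X, t, iters=25):
    """Newton-Raphson fit of a logistic model; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hessian, X.T @ (t - p))
    return beta


def match_nearest(ps, t):
    """1:1 nearest-neighbour matching on the propensity score, with replacement."""
    treated = np.flatnonzero(t == 1)
    controls = np.flatnonzero(t == 0)
    gaps = np.abs(ps[treated][:, None] - ps[controls][None, :])
    return treated, controls[gaps.argmin(axis=1)]


def smd(x, treated, controls, scale):
    """Standardised mean difference, using a fixed pre-matching pooled SD."""
    return (x[treated].mean() - x[controls].mean()) / scale


# Simulated confounding: larger x makes treatment more likely
rng = np.random.default_rng(42)
x = rng.normal(size=2000)
t = (rng.random(2000) < 1 / (1 + np.exp(-(-1 + 1.5 * x)))).astype(float)

X = np.column_stack([np.ones_like(x), x])
ps = 1 / (1 + np.exp(-X @ fit_logistic(X, t)))

all_t, all_c = np.flatnonzero(t == 1), np.flatnonzero(t == 0)
scale = np.sqrt((x[all_t].var(ddof=1) + x[all_c].var(ddof=1)) / 2)
before = smd(x, all_t, all_c, scale)
matched_t, matched_c = match_nearest(ps, t)
after = smd(x, matched_t, matched_c, scale)
print(f"SMD before matching: {before:.3f}, after: {after:.3f}")
```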

Keywords: accuracy, covariate balances, generalized additive model, logistic regression, non-linearity, propensity score matching

Procedia PDF Downloads 332
3961 Breast Cancer Survivability Prediction via Classifier Ensemble

Authors: Mohamed Al-Badrashiny, Abdelghani Bellaachia

Abstract:

This paper presents a classifier ensemble approach for predicting the survivability of breast cancer patients using the latest database version of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The system consists of two main components: feature selection and classifier ensemble. The feature selection component divides the features in the SEER database into four groups. It then tries to find the most important features among the four groups, namely those that maximize the weighted average F-score of a given classification algorithm. The ensemble component uses three different classifiers, each of which models a different set of SEER features chosen by the feature selection module. On top of them, another classifier makes the final decision based on the output decisions and confidence scores of the underlying classifiers. Different classification algorithms were examined; the best setup found uses decision tree, Bayesian network, and Naïve Bayes algorithms for the underlying classifiers and Naïve Bayes for the ensemble step. The system outperforms all published systems to date when evaluated on exactly the same SEER data (period 1973-2002), giving a weighted average F-score of 87.39% compared with 85.82% and 81.34% for the other published systems. When the data is extended to cover the whole database (period 1973-2014), the overall weighted average F-score rises to 92.4% on the held-out unseen test set.
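
Both the feature selection step and the evaluation rely on the weighted average F-score. Assuming the usual definition (per-class F1 averaged with weights equal to each class's support in the true labels, as in scikit-learn's `average="weighted"`), it can be computed as follows; the function name is hypothetical.

```python
def weighted_f_score(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's true support."""
    total = len(y_true)
    score = 0.0
    for c in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * (tp + fn) / total  # tp + fn is the class's support
    return score


# Toy labels: 1 = survived, 0 = did not survive
print(weighted_f_score([1, 1, 1, 0], [1, 1, 0, 0]))  # 23/30, about 0.7667
```

Here class 1 has F1 = 0.8 with support 3 and class 0 has F1 = 2/3 with support 1, so the weighted average is (3 x 0.8 + 2/3) / 4 = 23/30.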

Keywords: classifier ensemble, breast cancer survivability, data mining, SEER

Procedia PDF Downloads 294
3960 Using Scale Invariant Feature Transform Features to Recognize Characters in Natural Scene Images

Authors: Belaynesh Chekol, Numan Çelebi

Abstract:

The main purpose of this work is to recognize individual characters extracted from natural scene images, using scale invariant feature transform (SIFT) features as input to K-nearest neighbor (KNN), a classification algorithm. For this task, 1,068 and 78 images of English alphabet characters taken from the Chars74k data set are used to train and test the classifier, respectively. For each character image, we generated descriptive features using the SIFT algorithm. This set of features is fed to the learner so that it can recognize and label new images of English characters. Two types of KNN (fine KNN and weighted KNN) were trained, and the resulting classification accuracies are 56.9% and 56.5%, respectively. Training took the same time for both fine and weighted KNN.
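
A minimal sketch of the two KNN variants, assuming each character image has already been summarised as a fixed-length SIFT-based vector (the abstract does not say how the variable number of SIFT descriptors per image is aggregated). The settings follow what MATLAB's Classification Learner presets are generally documented to use, 1 neighbour for fine KNN and 10 neighbours with squared-inverse distance weights for weighted KNN; treat these values, and the helper name `knn_predict`, as assumptions.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=1, weighted=False):
    """Predict labels by (optionally distance-weighted) majority vote of k neighbours."""
    preds = []
    for q in X_query:
        dist = np.linalg.norm(X_train - q, axis=1)
        votes = {}
        for i in np.argsort(dist)[:k]:
            # Squared-inverse distance weight; a small epsilon avoids division by zero
            w = 1.0 / (dist[i] ** 2 + 1e-12) if weighted else 1.0
            votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
        preds.append(max(votes, key=votes.get))
    return preds


# Toy 2-D vectors standing in for SIFT-based image descriptors
X_train = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = ["a", "a", "b", "b", "b"]
query = np.array([[0.1, 0.2]])
print(knn_predict(X_train, y_train, query, k=1))                 # ['a']
print(knn_predict(X_train, y_train, query, k=5))                 # ['b']
print(knn_predict(X_train, y_train, query, k=5, weighted=True))  # ['a']
```

With k = 5 the plain vote is dominated by the three distant "b" points, whereas distance weighting lets the two nearby "a" points win.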

Keywords: character recognition, KNN, natural scene image, SIFT

Procedia PDF Downloads 254
3959 A Comparison of Neural Network and DOE-Regression Analysis for Predicting Resource Consumption of Manufacturing Processes

Authors: Frank Kuebler, Rolf Steinhilper

Abstract:

Artificial neural networks (ANN) and Design of Experiments (DOE) based regression analysis (RA) are both used mainly for modeling complex systems. Both methodologies are commonly applied in process and quality control of manufacturing processes. Because resource efficiency has become a critical concern for manufacturing companies, these models need to be extended to predict the resource consumption of manufacturing processes. This paper describes an approach that uses neural networks as well as DOE-based regression analysis to predict the resource consumption of manufacturing processes, and compares the achievable results in an industrial case study of a turning process.
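
A DOE-based regression model of the kind compared here can be sketched as a two-level full-factorial design with a linear model containing main effects and two-factor interactions, fitted by least squares. The three factors (for a turning process, e.g. cutting speed, feed rate and depth of cut) and the response coefficients are hypothetical, and the response is generated without noise so that the fit can be checked exactly.

```python
import itertools
import numpy as np


def full_factorial(k):
    """All 2**k runs of a two-level design in coded units (-1 = low, +1 = high)."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=k)))


def model_matrix(D):
    """Columns: intercept, main effects, then all two-factor interactions."""
    cols = [np.ones(len(D))] + [D[:, j] for j in range(D.shape[1])]
    cols += [D[:, a] * D[:, b] for a, b in itertools.combinations(range(D.shape[1]), 2)]
    return np.column_stack(cols)


def fit_effects(D, y):
    """Ordinary least-squares estimates of the model coefficients."""
    coef, *_ = np.linalg.lstsq(model_matrix(D), y, rcond=None)
    return coef


design = full_factorial(3)                                # 8 runs, 3 coded factors
true_coef = np.array([50.0, 8.0, 5.0, 3.0, 1.5, 0.0, 0.5])
energy = model_matrix(design) @ true_coef                 # synthetic resource consumption
coef = fit_effects(design, energy)
print(np.round(coef, 3))

# Predict at a new coded setting: mid speed, high feed, low depth of cut
print(model_matrix(np.array([[0.0, 1.0, -1.0]])) @ coef)
```

With measured responses instead of synthetic ones, the ANN would be trained on the same runs for comparison. Note that a 2^3 design with seven model terms leaves only one residual degree of freedom.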

Keywords: artificial neural network, design of experiments, regression analysis, resource efficiency, manufacturing process

Procedia PDF Downloads 492
3958 Logistic Regression Model versus Additive Model for Recurrent Event Data

Authors: Entisar A. Elgmati

Abstract:

Recurrent infant diarrhea is studied using daily data collected in Salvador, Brazil, over one year and three months. A logistic regression model is fitted instead of Aalen's additive model, using the same covariates as in the analysis with the additive model. The logistic model gives results reasonably similar to those of the additive regression model. In addition, it resolves the additive model's problem of estimated conditional probabilities not being constrained between zero and one. Martingale residuals, which have been used to judge the goodness of fit of the additive model, are also shown to be useful for judging the goodness of fit of the logistic model.
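
A sketch of the discrete-time idea: each child-day is one row, a logistic model gives the probability of an episode on that day (always strictly between zero and one), and a child's martingale-type residual is the sum over days of (observed event - fitted probability). The simulated data and covariates (a child-level indicator `z` and the previous day's status as a hypothetical recurrence term) are assumptions, not the Salvador data or the paper's covariate set.

```python
import numpy as np


def fit_logistic(X, y, iters=25):
    """Newton-Raphson fit of a logistic model; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hessian, X.T @ (y - p))
    return beta


def martingale_residuals(child, y, p):
    """Observed minus expected number of episode-days for each child."""
    ids = np.unique(child)
    return ids, np.array([np.sum(y[child == c] - p[child == c]) for c in ids])


# Simulate 50 children followed for 90 days; an episode raises next-day risk
rng = np.random.default_rng(3)
rows, outcomes, child_ids = [], [], []
for c in range(50):
    z = float(rng.random() < 0.5)  # hypothetical child-level covariate
    prev = 0.0
    for _ in range(90):
        p_true = 1 / (1 + np.exp(-(-3.0 + 0.8 * z + 2.0 * prev)))
        event = float(rng.random() < p_true)
        rows.append([1.0, z, prev])
        outcomes.append(event)
        child_ids.append(c)
        prev = event

X, y, child = np.array(rows), np.array(outcomes), np.array(child_ids)
beta = fit_logistic(X, y)
p_hat = 1 / (1 + np.exp(-X @ beta))
ids, resid = martingale_residuals(child, y, p_hat)
print("coefficients:", np.round(beta, 2))
print("largest |residual|:", np.abs(resid).max())
```

Because the model includes an intercept, the residuals sum to zero at the maximum-likelihood fit; large individual residuals flag children with more (or fewer) episodes than the model expects.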

Keywords: additive model, cumulative probabilities, infant diarrhoea, recurrent event

Procedia PDF Downloads 604
3957 Cognitive Weighted Polymorphism Factor: A New Cognitive Complexity Metric

Authors: T. Francis Thamburaj, A. Aloysius

Abstract:

Polymorphism is one of the main pillars of the object-oriented paradigm. It induces hidden forms of class dependencies that may impact software quality, raising the cost of comprehending, debugging, testing, and maintaining the software. In this paper, a new cognitive complexity metric called Cognitive Weighted Polymorphism Factor (CWPF) is proposed. In addition to structural complexity, it captures cognitive complexity based on the type of polymorphism. The cognitive weights are calibrated based on 27 empirical studies with 120 persons. A case study and experiments with the new metric show positive results. Further, a comparative study was made, and a correlation test showed that CWPF is a better, more comprehensive, and more realistic indicator of software complexity than Abreu's Polymorphism Factor (PF) complexity metric.
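
For reference, the baseline PF metric from Abreu's MOOD suite is commonly stated as PF = sum of M_o(C_i) / sum of [M_n(C_i) x DC(C_i)], i.e. the number of actual method overrides divided by the maximum possible number. The sketch below computes it from per-class counts. CWPF's calibrated cognitive weights are not given in this abstract, so they are not reproduced; the `ClassInfo` type and the example hierarchy are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClassInfo:
    name: str
    new_methods: int   # M_n: methods first declared in this class
    overrides: int     # M_o: inherited methods redefined in this class
    descendants: int   # DC: number of descendant classes


def polymorphism_factor(classes):
    """Actual overrides divided by the maximum number of possible overrides."""
    possible = sum(c.new_methods * c.descendants for c in classes)
    actual = sum(c.overrides for c in classes)
    return actual / possible if possible else 0.0


# Shape declares 2 methods and has two subclasses; Circle overrides one, Square both
hierarchy = [
    ClassInfo("Shape", new_methods=2, overrides=0, descendants=2),
    ClassInfo("Circle", new_methods=0, overrides=1, descendants=0),
    ClassInfo("Square", new_methods=0, overrides=2, descendants=0),
]
print(polymorphism_factor(hierarchy))  # 0.75
```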

Keywords: cognitive complexity metric, object-oriented metrics, polymorphism factor, software metrics

Procedia PDF Downloads 411
3956 Extended Constraint Mask Based One-Bit Transform for Low-Complexity Fast Motion Estimation

Authors: Oğuzhan Urhan

Abstract:

In this paper, an improved motion estimation (ME) approach based on the weighted constrained one-bit transform is proposed for the block-based ME employed in video encoders. Binary ME approaches use a low bit-depth representation of the original image frames together with a hardware-efficient matching criterion based on Boolean exclusive-OR to reduce the computational burden of the ME stage. The weighted constrained one-bit transform (WC-1BT) approach improves on conventional constrained one-bit transform (C-1BT) based ME by employing a 2-bit depth constraint mask instead of a 1-bit depth mask. In this work, the range of the constraint mask is further extended to increase the ME performance of the WC-1BT approach. Experiments show that the proposed method provides better ME accuracy than existing similar ME methods in the literature.
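
A minimal sketch of constrained one-bit transform (C-1BT) block matching, the baseline that WC-1BT extends. Each frame is compared with a filtered copy of itself: the bit plane is 1 where a pixel is at least its local mean, and the constraint mask keeps only pixels that differ from that mean by at least a threshold D. Candidate blocks are scored by the number of non-matching points (NNMP), counted with AND/XOR operations. The 5x5 box filter (the 1BT literature typically uses a larger band-pass kernel), D = 8, the block size and the search range are assumptions, and the 2-bit WC-1BT mask and its extended range are not reproduced.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_mean(img, k=5):
    """Local k x k mean, with edge padding so the output keeps the input shape."""
    padded = np.pad(img, k // 2, mode="edge")
    return sliding_window_view(padded, (k, k)).mean(axis=(2, 3))


def c1bt(img, k=5, d=8.0):
    """Return the one-bit plane and the constraint mask of a frame."""
    img = img.astype(float)
    local = box_mean(img, k)
    bits = img >= local
    mask = np.abs(img - local) >= d
    return bits, mask


def search_block(cur, ref, top, left, size=8, search=4, k=5, d=8.0):
    """Full search minimising the number of non-matching points (NNMP)."""
    bc, mc = c1bt(cur, k, d)
    br, mr = c1bt(ref, k, d)
    blk_b = bc[top:top + size, left:left + size]
    blk_m = mc[top:top + size, left:left + size]
    best_cost, best_mv = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > ref.shape[0] or x + size > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            cand_b = br[y:y + size, x:x + size]
            cand_m = mr[y:y + size, x:x + size]
            # Count pixels valid in both masks whose bits disagree (AND + XOR)
            cost = int(np.count_nonzero(blk_m & cand_m & (blk_b ^ cand_b)))
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost


# Synthetic test: the current frame is the reference shifted by a known amount
gen = np.random.default_rng(0)
ref = gen.integers(0, 256, size=(64, 64)).astype(float)
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))  # cur[y, x] == ref[y - 2, x + 3]
print(search_block(cur, ref, top=24, left=24))   # ((-2, 3), 0)
```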

Keywords: fast motion estimation, low-complexity motion estimation, video coding

Procedia PDF Downloads 292
3955 Identifying Factors Contributing to the Spread of Lyme Disease: A Regression Analysis of Virginia’s Data

Authors: Fatemeh Valizadeh Gamchi, Edward L. Boone

Abstract:

This research focuses on Lyme disease, a widespread infectious disease in the United States caused by the bacterium Borrelia burgdorferi sensu stricto. It is critical to identify the environmental and economic factors that contribute to the spread of the disease. This study examined data from Virginia to identify a subset of explanatory variables significant for Lyme disease case numbers. To identify relevant variables and avoid overfitting, linear and Poisson regression, along with regularized regression using ridge, lasso, and elastic net penalties, were employed. Cross-validation was used to select the tuning parameters. The proposed methods can automatically identify relevant covariates for disease counts. The efficacy of the techniques was assessed using four criteria on three simulated datasets. Finally, using the Virginia Department of Health's Lyme disease data set, the study identified key factors, and the results were consistent with previous studies.
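
The regularized Poisson models mentioned above can be sketched with proximal gradient descent on the mean Poisson negative log-likelihood plus an elastic-net penalty, where alpha = 1 gives the lasso and alpha = 0 gives ridge. The simulated covariates, the fixed step size, and the fixed lambda are assumptions; the study selects tuning parameters by cross-validation, which this sketch leaves out.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def poisson_elastic_net(X, y, lam, alpha=0.5, step=0.05, iters=3000):
    """Proximal gradient for a Poisson GLM with an elastic-net penalty.

    Minimises mean(exp(eta) - y * eta) + lam * (alpha * ||b||_1
    + (1 - alpha) / 2 * ||b||_2^2), where eta = b0 + X @ b and the
    intercept b0 is left unpenalised.
    """
    n, p = X.shape
    b0, b = np.log(y.mean()), np.zeros(p)
    for _ in range(iters):
        mu = np.exp(b0 + X @ b)
        grad0 = np.mean(mu - y)
        grad = X.T @ (mu - y) / n + lam * (1 - alpha) * b  # smooth part only
        b0 -= step * grad0
        b = soft_threshold(b - step * grad, step * lam * alpha)
    return b0, b


# Simulated counts: only the first two of six covariates matter
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 6))
y = rng.poisson(np.exp(1.0 + X @ np.array([0.8, -0.5, 0, 0, 0, 0])))
b0, b = poisson_elastic_net(X, y, lam=0.1, alpha=1.0)  # alpha=1: lasso
print(round(b0, 2), np.round(b, 2))
```

Coefficients driven exactly to zero by the soft-thresholding step correspond to covariates the lasso drops; sweeping `lam` over a grid and scoring each value on held-out folds would give a cross-validated choice.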

Keywords: Lyme disease, Poisson generalized linear model, ridge regression, lasso regression, elastic net regression

Procedia PDF Downloads 96