Search results for: multivariate regression tree
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4441

4171 Interference among Lambsquarters and Oil Rapeseed Cultivars

Authors: Reza Siyami, Bahram Mirshekari

Abstract:

Seed and oil yield of rapeseed is considerably affected by weed interference, including mustard (Sinapis arvensis L.), lambsquarters (Chenopodium album L.) and redroot pigweed (Amaranthus retroflexus L.), throughout the East Azerbaijan province in Iran. To formulate the relationship between four independent growth variables measured in our experiment and a dependent variable, multiple regression analysis was carried out with the weed leaf number per plant (X1), green cover percentage (X2), LAI (X3) and leaf area per plant (X4) as independent variables and rapeseed oil yield as the dependent variable. The multiple regression equation is as follows: Seed essential oil yield (kg/ha) = 0.156 + 0.0325 (X1) + 0.0489 (X2) + 0.0415 (X3) + 0.133 (X4). Furthermore, stepwise regression analysis was carried out on the data to test the significance of the independent variables affecting oil yield. The resulting stepwise regression equation is as follows: Oil yield = 4.42 + 0.0841 (X2) + 0.0801 (X3); R2 = 81.5. The stepwise regression analysis verified that the green cover percentage and LAI of weed had a marked increasing effect on the oil yield of rapeseed.
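
As a reading aid, the stepwise equation quoted in this abstract can be evaluated directly. The following is a minimal sketch: the function name and the input values are hypothetical, and only the three coefficients come from the abstract.

```python
def predicted_oil_yield(green_cover_pct: float, lai: float) -> float:
    """Evaluate the stepwise equation quoted in the abstract:

        Oil yield = 4.42 + 0.0841 * X2 + 0.0801 * X3

    X2 is the weed green cover percentage and X3 is the weed LAI.
    The abstract does not state units for this equation, so none are assumed.
    """
    return 4.42 + 0.0841 * green_cover_pct + 0.0801 * lai


# Hypothetical inputs, chosen only to show the call.
example = predicted_oil_yield(green_cover_pct=10.0, lai=2.0)
print(example)
```

With both predictors at zero, the function returns the intercept, 4.42.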

Keywords: green cover percentage, independent variable, interference, regression

Procedia PDF Downloads 420
4170 Prognostic Impact of Pre-transplant Ferritinemia: A Survival Analysis Among Allograft Patients

Authors: Mekni Sabrine, Nouira Mariem

Abstract:

Background and aim: Allogeneic hematopoietic stem cell transplantation is a curative treatment for several hematological diseases; however, it has a non-negligible morbidity and mortality depending on several prognostic factors, including pre-transplant hyperferritinemia. The aim of our study was to estimate the impact of hyperferritinemia on survival and on the occurrence of post-transplant complications. Methods: This was a longitudinal study conducted over 8 years and including all patients who had a first allograft. The impact of pre-transplant hyperferritinemia (ferritinemia ≥1500) on survival was studied using the Kaplan-Meier method and the Cox model for uni- and multivariate analysis. The chi-square test and binary logistic regression were used to study the association between pre-transplant ferritinemia and post-transplant complications. Results: One hundred forty patients were included, with an average age of 26.6 years and a sex ratio (M/F) of 1.4. Hyperferritinemia was found in 33% of patients. It had no significant impact on either overall survival (p=0.9) or event-free survival (p=0.6). In multivariate analysis, only the type of disease was independently associated with overall survival (p=0.04) and event-free survival (p=0.002). For post-allograft complications: the occurrence of early documented infections was independently associated with pre-transplant hyperferritinemia (p=0.02) and the presence of acute graft versus host disease (GVHD) (p<10⁻³). The occurrence of acute GVHD was associated with early documented infection (p=0.002) and Cytomegalovirus reactivation (p<10⁻³). The occurrence of chronic GVHD was associated with the presence of Cytomegalovirus reactivation (p=0.006) and graft source (p=0.009). Conclusion: Our study showed a significant impact of pre-transplant hyperferritinemia on the occurrence of early infections but not on survival. Earlier and more accurate assessment of iron overload by other tests, such as liver magnetic resonance imaging, with initiation of chelating treatment could prevent the occurrence of such complications after transplantation.
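
The Kaplan-Meier method named in this abstract estimates survival as a running product of (1 - events / number at risk) over the distinct event times. The sketch below is a generic product-limit estimator, not the authors' analysis; the function name and the toy follow-up data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    times:  follow-up duration of each patient
    events: 1 if the event was observed at that time, 0 if censored
    """
    # Number of observed events at each time.
    deaths = Counter(t for t, e in zip(times, events) if e)
    # Number of patients leaving the risk set (event or censoring) at each time.
    leaving = Counter(times)

    n_at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving[t]
    return curve


# Toy data (hypothetical): five patients, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Patients censored at a given time are counted as still at risk for events at that same time, which is the usual convention.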

Keywords: allogeneic, transplants, ferritin, survival

Procedia PDF Downloads 64
4169 Multivariate Analytical Insights into Spatial and Temporal Variation in Water Quality of a Major Drinking Water Reservoir

Authors: Azadeh Golshan, Craig Evans, Phillip Geary, Abigail Morrow, Zoe Rogers, Marcel Maeder

Abstract:

22 physicochemical variables have been determined in water samples collected weekly from January to December in 2013 from three sampling stations located within a major drinking water reservoir. Classical Multivariate Curve Resolution Alternating Least Squares (MCR-ALS) analysis was used to investigate the environmental factors associated with the physico-chemical variability of the water samples at each of the sampling stations. Matrix augmentation MCR-ALS (MA-MCR-ALS) was also applied, and the two sets of results were compared for interpretative clarity. Links between these factors, reservoir inflows and catchment land-uses were investigated and interpreted in relation to chemical composition of the water and their resolved geographical distribution profiles. The results suggested that the major factors affecting reservoir water quality were those associated with agricultural runoff, with evidence of influence on algal photosynthesis within the water column. Water quality variability within the reservoir was also found to be strongly linked to physical parameters such as water temperature and the occurrence of thermal stratification. The two methods applied (MCR-ALS and MA-MCR-ALS) led to similar conclusions; however, MA-MCR-ALS appeared to provide results more amenable to interpretation of temporal and geological variation than those obtained through classical MCR-ALS.

Keywords: drinking water reservoir, multivariate analysis, physico-chemical parameters, water quality

Procedia PDF Downloads 291
4168 A Multivariate Statistical Approach for Water Quality Assessment of River Hindon, India

Authors: Nida Rizvi, Deeksha Katyal, Varun Joshi

Abstract:

River Hindon is an important river catering to the demand of the highly populated rural and industrial clusters of western Uttar Pradesh, India. The water quality of river Hindon is deteriorating at an alarming rate due to various industrial, municipal and agricultural activities. The present study aimed at identifying the pollution sources and quantifying the degree to which these sources are responsible for the deteriorating water quality of the river. Various water quality parameters, such as pH, temperature, electrical conductivity, total dissolved solids, total hardness, calcium, chloride, nitrate, sulphate, biological oxygen demand, chemical oxygen demand and total alkalinity, were assessed. Water quality data obtained from eight study sites over one year were subjected to two multivariate techniques, namely principal component analysis and cluster analysis. Principal component analysis was applied to find spatial variability and to identify the sources responsible for the water quality of the river. Three varifactors were obtained after varimax rotation of the initial principal components. Cluster analysis was carried out to classify sampling stations by similarity, and it grouped the eight sites into two clusters. The study reveals that anthropogenic influence (municipal, industrial, waste water and agricultural runoff) was the major source of river water pollution. Thus, this study illustrates the utility of multivariate statistical techniques for analysis and elucidation of multifaceted data sets, recognition of pollution sources/factors and understanding temporal/spatial variations in water quality for effective river water quality management.

Keywords: cluster analysis, multivariate statistical techniques, river Hindon, water quality

Procedia PDF Downloads 461
4167 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Model

Authors: Alam Ali, Ashok Kumar Pathak

Abstract:

Path analysis is a statistical technique used to evaluate the strength of the direct and indirect effects of variables. One or more structural regression equations are used to estimate a series of parameters in order to find the best fit to the data. Sometimes, exogenous variables do not show a significant direct or indirect effect when the assumptions of classical regression (ordinary least squares (OLS)) are violated by the nature of the data. The main aim of this article is to investigate the efficacy of the copula-based regression approach over the classical regression approach and to calculate the direct and indirect effects of variables when the data violate the OLS assumptions and the variables are linked through an elliptical copula. We perform this study using a well-organized numerical scheme. Finally, a real data application is presented to demonstrate the superiority of the copula approach.

Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique

Procedia PDF Downloads 69
4166 Transcendental Birth of the Column from the Full Jar Expressed at the Notre Dame of Paris and Saint Germain-des-Pres

Authors: Kang Woobang

Abstract:

The base of the column is not only a support but also the embodiment of profound symbolism full of cosmic energy. The author was struck to find, at the Notre Dame of Paris and Saint-Germain-des-Pres in France, full jars from which various energies emanate. As the column is a cosmic tree, the cosmic tree, composed of shaft and capital, emerges from the Full Jar filled with cosmic energy.

Keywords: full pitcher or jar, transcendental or supernatural birth from yonggi, yonggimun, yonggissak

Procedia PDF Downloads 411
4165 Historical Tree Height Growth Associated with Climate Change in Western North America

Authors: Yassine Messaoud, Gordon Nigh, Faouzi Messaoud, Han Chen

Abstract:

The effect of climate change on tree growth in boreal and temperate forests has received increased interest in the context of global warming. However, most studies were conducted in small areas and with a limited number of tree species. Here, we examined the height growth responses of seventeen tree species to climate change in Western North America. 37009 stands from forest inventory databases in Canada and USA with varying establishment date were selected. Dominant and co-dominant trees from each stand were sampled to determine top tree height at 50 years breast height age. Height was related to historical mean annual and summer temperatures, annual and summer Palmer Drought Severity Index, tree establishment date, slope, aspect, soil fertility as determined by the rate of carbon organic matter decomposition (carbon/nitrogen), geographic locations (latitude, longitude, and elevation), species range (coastal, interior, and both ranges), shade tolerance and leaf form (needle leaves, deciduous needle leaves, and broadleaves). Climate change had mostly a positive effect on tree height growth. The results explained 62.4% of the height growth variance. Since 1880, height growth increase was greater for coastal, high shade tolerant, and broadleaf species. Height growth increased more on steep slopes and high soil fertility soils. Greater height growth was mostly observed at the leading range and upward. Conversely, some species showed the opposite pattern probably due to the increase of drought (coastal Mediterranean area), precipitation and cloudiness (Alaska and British Columbia) and peculiarity (higher latitudes-lower elevations and vice versa) of western North America topography. This study highlights the role of the species ecological amplitude and traits, and geographic locations as the main factors determining the growth response and its magnitude to the recent global climate change.

Keywords: Height growth, global climate change, species range, species characteristics, species ecological amplitude, geographic locations, western North America

Procedia PDF Downloads 185
4164 Performance Analysis of Proprietary and Non-Proprietary Tools for Regression Testing Using Genetic Algorithm

Authors: K. Hema Shankari, R. Thirumalaiselvi, N. V. Balasubramanian

Abstract:

The present paper addresses research in the area of regression testing, with emphasis on automated tools as well as prioritization of test cases. The uniqueness of regression testing and its cyclic nature are pointed out. The difference in approach between industry, with business model as basis, and academia, with focus on data mining, is highlighted. Test metrics are discussed as a prelude to our formula for prioritization; a case study is further discussed to illustrate this methodology. An industrial case study is also described in the paper, where the number of test cases is so large that they have to be grouped as Test Suites. In such situations, a genetic algorithm proposed by us can be used to reconfigure these Test Suites in each cycle of regression testing. A comparison is made between a proprietary tool and an open source tool using the above-mentioned metrics. Our approach is clarified through several tables.

Keywords: APFD metric, genetic algorithm, regression testing, RFT tool, test case prioritization, selenium tool

Procedia PDF Downloads 434
4163 Association of Musculoskeletal and Radiological Features with Clinical and Serological Findings in Systemic Sclerosis: A Single-Centre Registry Study

Authors: Rezvan Hosseinian

Abstract:

Aim: Systemic sclerosis (SSc) is a chronic connective tissue disease with the clinical hallmark of skin thickening and tethering. The correlation of musculoskeletal features with other parameters should be considered in SSc patients. Methods: We reviewed the records of all patients who had more than one visit and standard anteroposterior radiography of hand. We used univariate analysis, and factors with p<0.05 were included in logistic regression to find out dependent factors. Results: Overall, 180 SSc patients were enrolled in our study, 161 (89.4%) of whom were women. The median age (IQR) was 47.0 years (16), and 52% had a diffuse subtype of the disease. In multivariate analysis, tendon friction rubs (TFRs) were associated with the presence of calcinosis, muscle tenderness, and flexion contracture (FC) on physical examination (p<0.05). Arthritis showed no differences in the two subtypes of the disease (p=0.98), and in multivariate analysis, there were no correlations between radiographic arthritis and serological and clinical features. The radiographic results indicated that disease duration correlated with joint erosion, acro-osteolysis, resorption of the distal ulna, calcinosis and radiologic FC (p< 0.05). Acro-osteolysis was more frequent in the dcSSc subtype, TFRs, and anti-TOPO I antibody. Radiologic FC showed an association with skin score, calcinosis and haematocrit <30% (p<0.05). Joint flexion on radiography was associated with disease duration, modified Rodnan skin score, calcinosis, and low hematocrit (P<0.01). Conclusion: Disease duration was a main dependent factor for developing joint erosion, acro-osteolysis, bone resorption, calcinosis, and flexion contracture on hand radiography. Acro-osteolysis presented in the severe form of the disease. Acro-osteolysis was the only dependent variable associated with bone demineralization.

Keywords: disease subsets, hand radiography, joint erosion, sclerosis

Procedia PDF Downloads 88
4162 A Regression Model for Residual-State Creep Failure

Authors: Deepak Raj Bhat, Ryuichi Yatabe

Abstract:

In this study, a residual-state creep failure model was developed based on the residual-state creep test results of clayey soils. To develop the proposed model, regression analyses were performed using R. The model results for the failure time (tf) and critical displacement (δc) were compared with experimental results and found to be in close agreement. It is expected that the proposed regression model for residual-state creep failure will be useful for predicting the displacement of different clayey soils in the future.

Keywords: regression model, residual-state creep failure, displacement prediction, clayey soils

Procedia PDF Downloads 405
4161 Association of Musculoskeletal and Radiological Features with Clinical and Serological Findings in Systemic Sclerosis: A Single-Centre Registry Study

Authors: Nasrin Azarbani

Abstract:

Aim: Systemic sclerosis (SSc) is a chronic connective tissue disease with the clinical hallmark of skin thickening and tethering. The correlation of musculoskeletal features with other parameters should be considered in SSc patients. Methods: We reviewed the records of all patients who had more than one visit and standard anteroposterior radiography of hand. We used univariate analysis, and factors with p<0.05 were included in logistic regression to find out dependent factors. Results: Overall, 180 SSc patients were enrolled in our study, 161 (89.4%) of whom were women. The median age (IQR) was 47.0 years (16), and 52% had a diffuse subtype of the disease. In multivariate analysis, tendon friction rubs (TFRs) were associated with the presence of calcinosis, muscle tenderness, and flexion contracture (FC) on physical examination (p<0.05). Arthritis showed no differences in the two subtypes of the disease (p=0.98), and in multivariate analysis, there were no correlations between radiographic arthritis and serological and clinical features. The radiographic results indicated that disease duration correlated with joint erosion, acro-osteolysis, resorption of the distal ulna, calcinosis and radiologic FC (p<0.05). Acro-osteolysis was more frequent in the dcSSc subtype, TFRs, and anti-TOPO I antibody. Radiologic FC showed an association with skin score, calcinosis and haematocrit <30% (p<0.05). Joint flexion on radiography was associated with disease duration, modified Rodnan skin score, calcinosis, and low haematocrit (p<0.01). Conclusion: Disease duration was a main dependent factor for developing joint erosion, acro-osteolysis, bone resorption, calcinosis, and flexion contracture on hand radiography. Acro-osteolysis presented in the severe form of the disease. Acro-osteolysis was the only dependent variable associated with bone demineralization.

Keywords: sclerosis, disease subsets, joint erosion, musculoskeletal

Procedia PDF Downloads 64
4160 Optimised Path Recommendation for a Real Time Process

Authors: Likewin Thomas, M. V. Manoj Kumar, B. Annappa

Abstract:

A traditional execution process follows the path of execution drawn by the process analyst without observing the behaviour of resources and other real-time constraints. Identifying the process model, predicting the behaviour of resources and recommending the optimal path of execution for a real-time process is challenging. The proposed AlfyMiner (αyMiner) gives a new dimension to process execution with the novel techniques Process Model Analyser (PMAMiner) and Resource Behaviour Analyser (RBAMiner) for recommending the probable path of execution. PMAMiner discovers the next probable activity for the currently executing activity in an online process: a variant matching technique identifies the set of next probable activities, among which the next probable activity is discovered using a decision tree model. RBAMiner identifies the resource suitable for performing the discovered next probable activity and observes its behaviour based on load and performance, using a polynomial regression model, and on waiting time, using queueing theory. Based on the observed behaviour, αyMiner recommends the probable path of execution with the next probable activity and the most suitable resource for performing it. Experiments were conducted on process logs of the CoSeLoG project; 72% accuracy was obtained in identifying and recommending the next probable activity, and the efficiency of resource performance was optimised by 59% by decreasing their load.

Keywords: cross-organization process mining, process behaviour, path of execution, polynomial regression model

Procedia PDF Downloads 333
4159 A Fuzzy Nonlinear Regression Model for Interval Type-2 Fuzzy Sets

Authors: O. Poleshchuk, E. Komarov

Abstract:

This paper presents a regression model for interval type-2 fuzzy sets based on the least squares estimation technique. Unknown coefficients are assumed to be triangular fuzzy numbers. The basic idea is to determine aggregation intervals for the type-1 fuzzy sets whose membership functions are the lower and upper membership functions of an interval type-2 fuzzy set. These aggregation intervals are called weighted intervals. The lower and upper membership functions of the input and output interval type-2 fuzzy sets for the developed regression models are considered to be piecewise linear functions.

Keywords: interval type-2 fuzzy sets, fuzzy regression, weighted interval

Procedia PDF Downloads 372
4158 Comparison Study of Machine Learning Classifiers for Speech Emotion Recognition

Authors: Aishwarya Ravindra Fursule, Shruti Kshirsagar

Abstract:

At the intersection of artificial intelligence and human-centered computing, this paper delves into speech emotion recognition (SER). It presents a comparative analysis of machine learning models such as K-Nearest Neighbors (KNN), logistic regression, support vector machines (SVM), decision trees, ensemble classifiers, and random forests, applied to SER. The research employs four datasets: Crema D, SAVEE, TESS, and RAVDESS. It focuses on extracting salient audio signal features such as Zero Crossing Rate (ZCR), Chroma_stft, Mel Frequency Cepstral Coefficients (MFCC), root mean square (RMS) value, and Mel spectrogram. These features are used to train and evaluate the models’ ability to recognize eight types of emotions from speech: happy, sad, neutral, angry, calm, disgust, fear, and surprise. Among the models, the Random Forest algorithm demonstrated superior performance, achieving approximately 79% accuracy. This suggests its suitability for SER within the parameters of this study. The research contributes to SER by showcasing the effectiveness of various machine learning algorithms and feature extraction techniques. The findings hold promise for the development of more precise emotion recognition systems in the future. This abstract provides a succinct overview of the paper’s content, methods, and results.
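
Two of the features listed in this abstract, zero crossing rate and RMS value, are simple enough to compute by hand on a single audio frame. The sketch below uses plain Python and hypothetical function names; it is not the paper's extraction pipeline. Normalisation conventions for ZCR also vary between libraries, so the divisor used here is an assumption.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Assumes the frame holds at least two samples. Zero is treated as
    non-negative; other conventions exist.
    """
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def rms(frame):
    """Root mean square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


# Hypothetical frame: a signal that flips sign on every sample.
frame = [1.0, -1.0, 1.0, -1.0]
print(zero_crossing_rate(frame), rms(frame))
```

In a classifier pipeline such per-frame values would typically be averaged over an utterance before being passed to a model.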

Keywords: comparison, ML classifiers, KNN, decision tree, SVM, random forest, logistic regression, ensemble classifiers

Procedia PDF Downloads 42
4157 Hybrid Anomaly Detection Using Decision Tree and Support Vector Machine

Authors: Elham Serkani, Hossein Gharaee Garakani, Naser Mohammadzadeh, Elaheh Vaezpour

Abstract:

Intrusion detection systems (IDS) are the main components of network security. These systems analyze network events for intrusion detection. An IDS is designed by training on normal or attack traffic data. Machine learning methods are the best ways to design IDSs. In the method presented in this article, the pruning algorithm of the C5.0 decision tree is used to reduce the features of the traffic data, and the IDS is trained by the least squares support vector machine (LS-SVM). Then, the remaining features are arranged according to the predictor importance criterion. The least important features are eliminated in order. The remaining features of this stage, which have produced the highest level of accuracy in LS-SVM, are selected as the final features. Compared to other similar articles that have examined selected features in the least squares support vector machine model, the features obtained are better in accuracy, true positive rate, and false positive rate. The results are tested on the UNSW-NB15 dataset.

Keywords: decision tree, feature selection, intrusion detection system, support vector machine

Procedia PDF Downloads 262
4156 Climate Related Variability and Stock-Recruitment Relationship of the North Pacific Albacore Tuna

Authors: Ashneel Ajay Singh, Naoki Suzuki, Kazumi Sakuramoto

Abstract:

The North Pacific albacore (Thunnus alalunga) is a temperate tuna species distributed in the North Pacific which is of significant economic importance to the Pacific Island Nations and Territories. Despite its importance, there are still gaps in knowledge of the stock dynamics and ecological characteristics of albacore. The stock-recruitment relationship of the North Pacific stock of albacore tuna was investigated for different density-dependent effects and a regime shift in the stock characteristics in response to changes in environmental and climatic conditions. Linear regression analyses for recruit per spawning biomass (RPS) and recruitment (R) against the female spawning stock biomass (SSB) were significant for the presence of different density-dependent effects and positive for a regime shift in the stock time series. Application of Deming regression to RPS against SSB, under the assumption of observation and process errors in both the dependent and independent variables, confirmed the results of simple regression. However, the results for R against SSB disagreed with the linear regression results given a variance level of < 3 and agreed given a variance level of ≥ 3. Assuming the presence of different density-dependent effects in the albacore tuna time series, environmental and climatic condition variables were compared with R, RPS, and SSB. Significant relationships of R, RPS and SSB were found with the sea surface temperature (SST), Pacific Decadal Oscillation (PDO) and multivariate El Niño Southern Oscillation (ENSO), with SST being the principal variable exhibiting a significantly similar trend to R and RPS. Recruitment is significantly influenced by the dynamics of the SSB as well as environmental conditions, which demonstrates that the stock-recruitment relationship is multidimensional. Further investigation of the North Pacific albacore tuna age-class and structure is necessary to further support the results presented here. It is important for fishery managers and decision makers to be vigilant of regime shifts in environmental conditions relating to albacore tuna, as they may cause regime shifts in the albacore R and RPS, which should be taken into account to formulate harvesting plans and management of the species in the North Pacific oceanic region effectively and sustainably.
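
Deming regression, used in this abstract, differs from ordinary least squares in allowing measurement error in both variables; the fit depends on an assumed ratio of the two error variances. The sketch below implements the standard closed-form slope. The function name is hypothetical, and `delta` (error variance in y divided by error variance in x) is an assumed parameterisation; the abstract does not define its "variance level", so no link to its threshold of 3 is claimed.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Return (intercept, slope) of a Deming regression of y on x.

    delta: assumed ratio of the error variance in y to the error
           variance in x (delta = 1 gives orthogonal regression).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample variances and covariance.
    s_xx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("slope is undefined when the covariance is zero")

    diff = s_yy - delta * s_xx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * delta * s_xy ** 2)) / (2.0 * s_xy)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical, noise-free data lying exactly on y = 2x + 1.
intercept, slope = deming_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(intercept, slope)
```

On noise-free data the Deming line coincides with the ordinary least squares line for any positive `delta`; the two diverge only when the points scatter around the line.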

Keywords: Albacore tuna, Thunnus alalunga, recruitment, spawning stock biomass, recruits per spawning biomass, sea surface temperature, pacific decadal oscillation, El Niño southern oscillation, density-dependent effects, regime shift

Procedia PDF Downloads 306
4155 Formulating a Flexible-Spread Fuzzy Regression Model Based on Dissemblance Index

Authors: Shih-Pin Chen, Shih-Syuan You

Abstract:

This study proposes a regression model with flexible spreads for fuzzy input-output data to cope with the situation that the existing measures cannot reflect the actual estimation error. The main idea is that a dissemblance index (DI) is carefully identified and defined for precisely measuring the actual estimation error. Moreover, the graded mean integration (GMI) representation is adopted for determining more representative numeric regression coefficients. Notably, to comprehensively compare the performance of the proposed model with other ones, three different criteria are adopted. The results from commonly used test numerical examples and an application to Taiwan's business monitoring indicator illustrate that the proposed dissemblance index method not only produces valid fuzzy regression models for fuzzy input-output data, but also has satisfactory and stable performance in terms of the total estimation error based on these three criteria.

Keywords: dissemblance index, forecasting, fuzzy sets, linear regression

Procedia PDF Downloads 360
4154 Image Compression Based on Regression SVM and Biorthogonal Wavelets

Authors: Zikiou Nadia, Lahdir Mourad, Ameur Soltane

Abstract:

In this paper, we propose an effective method for image compression based on SVM Regression (SVR), with three different kernels, and the biorthogonal 2D Discrete Wavelet Transform. SVM regression can learn dependencies from the training data and compress them by using fewer training points (support vectors) to represent the original data, eliminating redundancy. A biorthogonal wavelet is used to transform the image, and the resulting coefficients are then trained with SVMs using different kernels (Gaussian, Polynomial, and Linear). Run-length and arithmetic coders are used to encode the support vectors and their corresponding weights, obtained from the SVM regression. The peak signal-to-noise ratio (PSNR) and compression ratios of several test images, compressed with our algorithm using different kernels, are presented. Compared with the other kernels, the Gaussian kernel achieves better image quality. Experimental results show that our method considerably improves compression performance.
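
The PSNR reported in this abstract is conventionally defined as 10·log10(MAX² / MSE), where MSE is the mean squared error between the original and reconstructed pixels. The sketch below computes it over flat pixel lists. The function name is hypothetical, and the peak value of 255 assumes 8-bit images, which the abstract does not state.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    max_value: largest possible pixel value (255 assumes 8-bit images).
    Returns infinity when the two images are identical.
    """
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel images differing by one grey level everywhere.
print(psnr([10, 20, 30, 40], [11, 21, 31, 41]))
```

Higher values mean the reconstruction is closer to the original, which is the sense in which the abstract compares kernels.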

Keywords: image compression, 2D discrete wavelet transform (DWT-2D), support vector regression (SVR), SVM Kernels, run-length, arithmetic coding

Procedia PDF Downloads 380
4153 A Machine Learning Model for Predicting Students’ Academic Performance in Higher Institutions

Authors: Emmanuel Osaze Oshoiribhor, Adetokunbo MacGregor John-Otumu

Abstract:

There has been a need in recent years to predict student academic achievement prior to graduation. This is to assist students in improving their grades, especially those who have struggled in the past. The purpose of this research is to use supervised learning techniques to create a model that predicts student academic progress. Many scholars have developed models that predict student academic achievement based on characteristics including smoking, demography, culture, social media, parent educational background, parent finances, and family background, to mention a few. These characteristics, as well as the models used, could have misclassified students in terms of their academic achievement. To predict whether a student will perform well in the future on related courses, this model is built using a logistic regression classifier with basic features such as the previous semester's course score, class attendance, class participation, and the total number of course materials or resources the student is able to cover per semester. With 96.7 percent accuracy, the model outperformed other classifiers such as Naive Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, and AdaBoost. This model is offered as a desktop application with user-friendly interfaces for forecasting student academic progress for both teachers and students. As a result, both students and professors are encouraged to use this technique to predict outcomes better.

Keywords: artificial intelligence, ML, logistic regression, performance, prediction

Procedia PDF Downloads 108
4152 EnumTree: An Enumerative Biclustering Algorithm for DNA Microarray Data

Authors: Haifa Ben Saber, Mourad Elloumi

Abstract:

In a number of domains, such as DNA microarray data analysis, we need to cluster simultaneously the rows (genes) and columns (conditions) of a data matrix to identify groups of constant rows with a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. We introduce a new algorithm called Enumerative tree (EnumTree) for biclustering of binary microarray data. EnumTree is an algorithm adopting the approach of enumerating biclusters. This algorithm extracts all biclusters of consistently good quality. The main idea of EnumTree is the construction of a new tree structure to represent adequately the different biclusters discovered during the process of enumeration. This algorithm adopts the strategy of all biclusters at a time. The performance of the proposed algorithm is assessed using both synthetic and real DNA microarray data; our algorithm outperforms other biclustering algorithms for binary microarray data. Biclusters with different numbers of rows. Moreover, we test the biological significance using a gene annotation web tool to show that our proposed method is able to produce biologically relevant biclusters.

Keywords: DNA microarray, biclustering, gene expression data, tree, data mining

Procedia PDF Downloads 369
4151 A Comparative Study of Additive and Nonparametric Regression Estimators and Variable Selection Procedures

Authors: Adriano Z. Zambom, Preethi Ravikumar

Abstract:

One of the biggest challenges in nonparametric regression is the curse of dimensionality. Additive models are known to overcome this problem by estimating only the individual additive effects of each covariate. However, if the model is misspecified, the accuracy of the estimator compared to the fully nonparametric one is unknown. In this work, the efficiency of fully nonparametric regression estimators such as Loess is compared with that of estimators that assume additivity in several situations, including additive and non-additive regression scenarios. The comparison is made by computing the oracle mean square error of the estimators with regard to the true nonparametric regression function. Then, a backward elimination selection procedure based on the Akaike Information Criterion is proposed, which is computed from either the additive or the nonparametric model. Simulations show that if the additive model is misspecified, the percentage of times it fails to select important variables can be higher than that of the fully nonparametric approach. A dimension reduction step is included when the nonparametric estimator cannot be computed due to the curse of dimensionality. Finally, the Boston housing dataset is analyzed using the proposed backward elimination procedure and the selected variables are identified.

Keywords: additive model, nonparametric regression, variable selection, Akaike Information Criterion

Procedia PDF Downloads 263
4150 Extraction of Forest Plantation Resources in Selected Forest of San Manuel, Pangasinan, Philippines Using LiDAR Data for Forest Status Assessment

Authors: Mark Joseph Quinto, Roan Beronilla, Guiller Damian, Eliza Camaso, Ronaldo Alberto

Abstract:

Forest inventories are essential to assess the composition, structure and distribution of forest vegetation, which can be used as baseline information for management decisions. Classical forest inventory is labor intensive, time-consuming and sometimes even dangerous. The use of Light Detection and Ranging (LiDAR) in forest inventory can overcome these restrictions. This study was conducted to determine the possibility of using LiDAR-derived data to extract high-accuracy forest biophysical parameters and as a non-destructive method for forest status analysis of San Manuel, Pangasinan. Forest resources extraction was carried out using LAS tools, GIS, ENVI and .bat scripts with the available LiDAR data. The process includes the generation of derivatives such as the Digital Terrain Model (DTM), Canopy Height Model (CHM) and Canopy Cover Model (CCM) in .bat scripts, followed by the generation of 17 composite bands to be used in the extraction of forest cover classes using ENVI 4.8 and GIS software. The Diameter at Breast Height (DBH), Above Ground Biomass (AGB) and Carbon Stock (CS) were estimated for each classified forest cover, and tree count extraction was carried out using GIS. Subsequently, field validation was conducted for accuracy assessment. Results showed that the forest of San Manuel has 73% forest cover, which is well above the 10% canopy cover requirement. According to the extracted canopy heights, 80% of the trees are between 12 m and 17 m tall. The CS values of the three forest covers based on the AGB were: 20819.59 kg/20x20 m for closed broadleaf, 8609.82 kg/20x20 m for broadleaf plantation and 15545.57 kg/20x20 m for open broadleaf. The average tree count for the forest plantation was 413 trees/ha. As such, the forest of San Manuel has a high percentage of forest cover and high CS.

Keywords: carbon stock, forest inventory, LiDAR, tree count

Procedia PDF Downloads 387
4149 The Best Prediction Data Mining Model for Breast Cancer Probability in Women Residents in Kabul

Authors: Mina Jafari, Kobra Hamraee, Saied Hossein Hosseini

Abstract:

The prediction of breast cancer is one of the challenges in medicine. In this paper we collected 528 records of women living in Kabul, including demographic, lifestyle, diet and pregnancy data. There are many classification algorithms for breast cancer prediction, and we tried to find the best model, with the most accurate result and the lowest error rate. We evaluated several common supervised data mining algorithms to find the best model for predicting breast cancer among Afghan women living in Kabul, using the mammography result as the target variable. To evaluate these algorithms we used cross-validation, which is a reliable method for measuring the performance of models. After comparing the error rate and accuracy of three models (Decision Tree, Naive Bayes and Rule Induction), Decision Tree, with an accuracy of 94.06% and an error rate of 15%, was found to be the best model for predicting breast cancer based on the health care records.
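
The cross-validation procedure mentioned above can be sketched as follows. This is a generic k-fold routine in plain Python, not the authors' tool or settings: the number of folds, the random seed, the function names and the majority-class baseline used as a stand-in classifier are all assumptions, and the error rate is defined here as one minus accuracy.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Return (accuracy, error rate) pooled over the k held-out folds."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Train only on the records outside the current fold
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    accuracy = correct / len(X)
    return accuracy, 1.0 - accuracy


def majority_fit(X_train, y_train):
    """Baseline stand-in for a real classifier: remember the most common label."""
    return max(set(y_train), key=y_train.count)


def majority_predict(model, x):
    """Predict the remembered label regardless of the input record."""
    return model
```

Any classifier can be plugged in by passing its own `fit` and `predict` functions in place of the baseline.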

Keywords: decision tree, breast cancer, probability, data mining

Procedia PDF Downloads 136
4148 Variations in Wood Traits across Major Gymnosperm and Angiosperm Tree Species and the Driving Factors in China

Authors: Meixia Zhang, Chengjun Ji, Wenxuan Han

Abstract:

Many wood traits are important functional attributes for tree species, connected with resource competition among species, community dynamics, and ecosystem functions. Large variations in these traits exist among taxonomic categories, but variation in these traits between gymnosperms and angiosperms is still poorly documented. This paper explores the systematic differences in 12 traits between the two tree categories and the potential effects of environmental factors and life form. Based on a database of wood traits for major gymnosperm and angiosperm tree species across China, the values of 12 wood traits and their driving factors in gymnosperms vs. angiosperms were compared. The results are summarized below: i) Means of wood traits were all significantly lower in gymnosperms than in angiosperms. ii) Air-dried density (ADD) and tangential shrinkage coefficient (TSC) reflect the basic information of wood traits for gymnosperms, while ADD and radial shrinkage coefficient (RSC) represent those for angiosperms, providing higher explanatory power when used as the evaluation index of wood traits. iii) For both gymnosperm and angiosperm species, life form explains the most variation in the large-scale spatial patterns of ADD and TSC (RSC), followed by climatic factors, with edaphic factors having the least effect, suggesting that life form is the dominant factor controlling spatial patterns of wood traits. Variations in the magnitude and key traits between gymnosperms and angiosperms, together with the same dominant factors, might indicate evolutionary divergence and convergence in key functional traits among woody plants.

Keywords: allometry, functional traits, phylogeny, shrinkage coefficient, wood density

Procedia PDF Downloads 273
4147 Sexual Behaviours among Iranian Men and Women Aged 15 to 49 Years in Metropolitan Tehran, Iran: A Cross-Sectional Study

Authors: Mahnaz Motamedi, Mohammad Shahbazi, Shahrzad Rahimi-Naghani, Mehrdad Salehi

Abstract:

Introduction and Aim: This study assessed sexual behaviours among men and women aged 15 to 49 years in Tehran. Material and Methods: This was a cross-sectional study conducted on 755 men and women aged 15 to 49 years who were residents of Tehran. Participants were selected from different regions of Tehran using a multistage cluster random sampling method. The data were collected using the WHO-endorsed Questionnaire of Sexual and Reproductive Health. Descriptive, bivariate, and multivariate analyses were conducted using SPSS version 20. Sexual and reproductive health (SRH) behaviour was a scale variable constructed from the items of six sections: sexual experiences, characteristics of the first sexual partner, characteristics of the first intercourse, next sexual contact and the consequences of the first sexual contact, homosexual experiences, and the causes of sexual abstinence. Results: The mean age at first penetrative sexual intercourse (vaginal, anal) was 19.88 in men and 21.82 in women. Multivariate analysis using linear regression showed that, after controlling for other variables, gender had a significant relationship with having sexual experience, mean age at first sexual intercourse, and having multiple partners. Sexual experience was 0.158 units lower in women than in men, the mean age at first intercourse was 1.57 units higher in women than in men, and having multiple partners was 0.247 units lower in women than in men (P < 0.001). Sexual experience in very religious and relatively religious individuals was 0.332 and 0.218 units lower, respectively, than in those for whom religion did not matter (P < 0.001). Of those who had no sexual experience at the time of the study, 25.6% of men and 40.7% of women stated that their reason for abstinence was unwillingness to have sex (P < 0.05), while 35.9% of men and 16.5% of women stated that the reason was the lack of a suitable opportunity (P < 0.001). 4.7% of men and 1.7% of women reported sexual attraction to the same sex; the difference between men and women was significant (P < 0.001). Conclusion: Sexual relations also occur among single people and younger age groups and are not limited to married people or candidates for marriage. Therefore, further evaluation is needed in national research, and interventions for sexual and reproductive health services should be implemented at the macro level of policy making.

Keywords: sexual behaviours, Iranian men and women, Iran, cross-sectional study

Procedia PDF Downloads 154
4146 Lifestyle Factors Associated With Overweight/obesity Status In Croatian Adolescents: A Population-Based Study

Authors: Lovro Štefan

Abstract:

The main purpose of the present study was to investigate the associations between overweight/obesity status and lifestyle factors. In this cross-sectional study, participants were 1950 urban secondary-school students (54.7% female) aged 17-18 years. The dependent variable was body-mass index status derived from self-reported height and weight. The outcome was binarised: participants with a value <25 kg/m2 were collapsed into the "normal" category, and those with a value ≥25 kg/m2 into the "overweight/obesity" category. Independent variables were gender, type of school, physical activity, sedentary behaviour, self-rated health, self-perceived socioeconomic status and psychological distress. The associations between the dependent and independent variables were analyzed using multiple logistic regression analysis. In the univariate model, being overweight/obese was significantly associated with being a male student (OR 0.31; 95% CI 0.23 to 0.42), attending a vocational school (OR 1.87; 95% CI 1.42 to 2.48), not meeting the recommendations for moderate-to-vigorous physical activity (OR 0.44; 95% CI 0.22 to 0.88), more time spent in sedentary behaviour (OR 1.53; 95% CI 1.07 to 2.19), poor self-rated health (OR 0.35; 95% CI 0.20 to 0.56) and lower socioeconomic status (OR 0.63; 95% CI 0.48 to 0.84). In the multivariate model, the same associations occurred between the dependent and independent variables. In both models, psychological distress was not associated with being overweight/obese. In conclusion, our findings suggest that lifestyle factors are independently associated with body-mass index.
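
The binarised outcome and the odds ratios reported above can be illustrated with a short sketch. The 25 kg/m2 cut-off comes from the abstract; everything else is an assumption for illustration: the function names, the 2x2 table layout and the Wald confidence interval. It gives an unadjusted odds ratio from a single 2x2 table, which is simpler than the multiple logistic regression used in the study.

```python
import math


def bmi_category(weight_kg, height_m):
    """Binarise body-mass index (kg/m2) at the 25 kg/m2 cut-off."""
    bmi = weight_kg / height_m ** 2
    return "overweight/obesity" if bmi >= 25.0 else "normal"


def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald 95% confidence interval.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases (all counts must be > 0).
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper
```

An interval that excludes 1 corresponds to a statistically significant association at the 5% level under this approximation.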

Keywords: body mass index, secondary-school students, Croatia, physical activity, sedentary behaviour, logistic regression

Procedia PDF Downloads 88
4145 Application and Verification of Regression Model to Landslide Susceptibility Mapping

Authors: Masood Beheshtirad

Abstract:

Identification of regions with potential for landslide occurrence is one of the basic measures in natural resources management. Different landslide hazard mapping models have been proposed based on environmental conditions and goals. In this research, a landslide hazard map was produced using a multiple regression model, and the applicability of this model was investigated in the Baghdasht watershed. The dependent variable is the landslide inventory map, and the independent variables consist of information layers such as geology, slope, aspect, distance from river, distance from road, fault and land use. To this end, existing landslides were identified and an inventory map was made. The landslide hazard map was then derived from the multiple regression model. The potential hazard classes and figures of this model were compared with the landslide inventory map in SPSS to assess their level of similarity. The results showed a significant correlation between the potential hazard classes and figures and the area of the landslides. The multiple regression model is suitable for application in the Baghdasht watershed.

Keywords: landslide, mapping, multiple model, regression

Procedia PDF Downloads 322
4144 Impacts of Aquaculture Farms on the Mangroves Forests of Sundarbans, India (2010-2018): Temporal Changes of NDVI

Authors: Sandeep Thakur, Ismail Mondal, Phani Bhusan Ghosh, Papita Das, Tarun Kumar De

Abstract:

The Sundarbans Reserve Forest of India has been undergoing major transformations in the recent past owing to population pressure and related changes. This has brought about major changes in the spatial landscape of the region, especially in the western parts. This study attempts to assess the impacts of land cover changes on the mangrove habitats. Time series Landsat imagery was used to analyze Normalized Difference Vegetation Index (NDVI) patterns over the western parts of the Indian Sundarbans forest in order to assess the health of the mangroves in the region. The images were subjected to land use/land cover (LULC) classification using sub-pixel classification techniques in ERDAS Imagine software, and the changes were mapped. The spatial proliferation of aquaculture farms during the study period was also mapped. A multivariate regression analysis was carried out between the obtained NDVI values and the LULC classes. Similarly, the observed meteorological data sets (time series of rainfall and of minimum and maximum temperature) were also statistically correlated using regression. The study demonstrated the application of NDVI in assessing the environmental status of mangroves, as the relationship between changes in the environmental variables and the remote-sensing-based indices facilitates an efficient evaluation of environmental variables, which can be used in coastal zone monitoring and development processes.
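
For readers unfamiliar with the index, NDVI is computed per pixel from near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red). The sketch below applies that standard definition to plain Python lists. The function names and the handling of pixels where both bands are zero are assumptions, and the abstract does not say which software or band combination the authors used.

```python
def ndvi(nir, red):
    """Return (NIR - Red) / (NIR + Red), or None when both bands are zero."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def mean_ndvi(nir_band, red_band):
    """Average NDVI over paired pixel values, skipping undefined pixels."""
    values = [ndvi(n, r) for n, r in zip(nir_band, red_band)]
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None
```

Values near 1 indicate dense green vegetation, while values near zero or below indicate bare surfaces or water, which is why a falling mean NDVI over time can signal declining vegetation health.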

Keywords: aquaculture farms, LULC, Mangrove, NDVI

Procedia PDF Downloads 180
4143 Hybrid Thresholding Lifting Dual Tree Complex Wavelet Transform with Wiener Filter for Quality Assurance of Medical Image

Authors: Hilal Naimi, Amelbahahouda Adamou-Mitiche, Lahcene Mitiche

Abstract:

Image denoising has long been a central problem in medical imaging. Its main challenge is to preserve information-carrying structures such as surfaces and edges in order to achieve good visual quality. Different algorithms with different denoising performance have been proposed in previous decades. More recently, models based on deep learning have shown great promise of outperforming all traditional approaches. However, these techniques are limited by the need for large training samples and by high computational costs. This research proposes a denoising approach based on the LDTCWT (Lifting Dual Tree Complex Wavelet Transform) using hybrid thresholding with a Wiener filter to enhance image quality. The LDTCWT is described as a type of lifting wavelet transform that produces complex coefficients by employing a dual tree of lifting wavelet filters to obtain its real and imaginary parts. This allows the transform to provide approximate shift invariance and directionally selective filters while reducing computation time (properties lacking in the classical wavelet transform). To develop this approach, a hybrid thresholding function is modeled by integrating the Wiener filter into the thresholding function.
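
The abstract does not give the form of the hybrid thresholding function, so the sketch below only shows the standard building blocks that such a function would typically combine: hard thresholding, soft thresholding and an empirical Wiener-style shrinkage gain applied to a single real-valued wavelet coefficient. The function names and the specific Wiener gain formula are assumptions for illustration, not the authors' method.

```python
def hard_threshold(w, t):
    """Keep a coefficient unchanged if its magnitude exceeds t, else zero it."""
    return w if abs(w) > t else 0.0


def soft_threshold(w, t):
    """Zero small coefficients and shrink the remaining ones toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def wiener_shrink(w, noise_var):
    """Scale w by the empirical Wiener gain max(w^2 - noise_var, 0) / w^2."""
    power = w * w
    if power <= noise_var:
        return 0.0
    return w * (power - noise_var) / power
```

Hard thresholding preserves large coefficients exactly but is discontinuous at the threshold, whereas soft thresholding is continuous but biases large coefficients; this trade-off is the usual motivation for hybrid schemes.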

Keywords: lifting wavelet transform, image denoising, dual tree complex wavelet transform, wavelet shrinkage, wiener filter

Procedia PDF Downloads 162
4142 EWMA and MEWMA Control Charts for Monitoring Mean and Variance in Industrial Processes

Authors: L. A. Toro, N. Prieto, J. J. Vargas

Abstract:

There are many control charts for monitoring mean and variance. Among these, the X̄ and R, X̄ and S, S², Hotelling and Shewhart control charts, to mention some, are widely used for monitoring mean and variance in industrial processes. In particular, Shewhart charts are based only on the information about the process contained in the current observation and ignore any information given by the entire sequence of points. In other words, the Shewhart chart is a control chart without memory. Consequently, Shewhart control charts are found to be less sensitive in detecting smaller shifts, particularly shifts smaller than 1.5 times the standard deviation. Such small shifts are important in many industrial applications. In this study, an effective alternative to the Shewhart control chart was implemented. For univariate processes an Exponentially Weighted Moving Average (EWMA) control chart was developed, and for multivariate processes a Multivariate Exponentially Weighted Moving Average (MEWMA) control chart. Both of these charts have memory and perform better than the Shewhart chart in detecting smaller shifts. In these charts, information from past samples is accumulated up to the current sample, and then the decision about process control is taken. This characteristic of EWMA and MEWMA charts is of paramount importance when it is necessary to control an industrial process, because it makes it possible to correct or predict problems in the process before they reach a dangerous limit.
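
The memory property of the EWMA chart can be seen in a short sketch. It uses the standard recursion z_t = λ·x_t + (1 − λ)·z_(t−1), started at the in-control mean, with time-varying control limits. The smoothing constant λ = 0.2, the limit width of 3, the assumption of a known in-control mean and standard deviation, and the function name are illustrative choices, not values taken from the study.

```python
import math


def ewma_chart(samples, target, sigma, lam=0.2, width=3.0):
    """Return one (z, lcl, ucl, out_of_control) tuple per observation."""
    rows = []
    z = target  # the EWMA statistic starts at the in-control mean
    for t, x in enumerate(samples, start=1):
        # Blend the new observation with the accumulated history
        z = lam * x + (1.0 - lam) * z
        # Exact standard deviation of z at time t gives time-varying limits
        half_width = width * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        )
        lcl, ucl = target - half_width, target + half_width
        rows.append((z, lcl, ucl, z < lcl or z > ucl))
    return rows
```

With a target of 10 and a standard deviation of 1, a sustained shift to 11.5 stays inside the 3-sigma Shewhart limits of 7 to 13 at every point, but the EWMA statistic drifts toward 11.5 and crosses its narrower limits after a few shifted samples.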

Keywords: control charts, multivariate exponentially weighted moving average (MEWMA), exponentially weighted moving average (EWMA), industrial control process

Procedia PDF Downloads 351