Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 41074

Search results for: multivariate data analysis

40744 Big Data: Concepts, Technologies and Applications in the Public Sector

Authors: A. Alexandru, C. A. Alexandru, D. Coardos, E. Tudora

Abstract:

Big Data (BD) is associated with a new generation of technologies and architectures which can harness the value of extremely large volumes of very varied data through real time processing and analysis. It involves changes in (1) data types, (2) accumulation speed, and (3) data volume. This paper presents the main concepts related to the BD paradigm, and introduces architectures and technologies for BD and BD sets. The integration of BD with the Hadoop Framework is also underlined. BD has attracted a lot of attention in the public sector due to the newly emerging technologies that allow the availability of network access. The volume of different types of data has exponentially increased. Some applications of BD in the public sector in Romania are briefly presented.

Keywords: big data, big data analytics, Hadoop, cloud

Procedia PDF Downloads 291

40743 Probabilistic Approach to the Spatial Identification of the Environmental Sources behind Mortality Rates in Europe

Authors: Alina Svechkina, Boris A. Portnov

Abstract:

In line with a rapid increase in pollution sources and enforcement of stricter air pollution regulation, which lowers pollution levels, it becomes more difficult to identify actual risk sources behind the observed morbidity patterns, and new approaches are required to identify potential risks and take preventive actions. In the present study, we discuss a probabilistic approach to the spatial identification of a priori unidentified environmental health hazards. The underlying assumption behind the tested approach is that the observed adverse health patterns (morbidity, mortality) can become a source of information on the geographic location of environmental risk factors that stand behind them. Using this approach, we analyzed sources of environmental exposure using data on mortality rates available for the year 2015 for NUTS 3 (Nomenclature of Territorial Units for Statistics) subdivisions of the European Union. We identified several areas in the southwestern part of Europe as primary risk sources for the observed mortality patterns. Multivariate regressions, controlled by geographical location, climate conditions, GDP (gross domestic product) per capita, dependency ratios, population density, and the level of road freight revealed that mortality rates decline as a function of distance from the identified hazard location. We recommend the proposed approach an exploratory analysis tool for initial investigation of regional patterns of population morbidity patterns and factors behind it.

Keywords: mortality, environmental hazards, air pollution, distance decay gradient, multi regression analysis, Europe, NUTS3

Procedia PDF Downloads 148

40742 Impact of Diabetes Mellitus Type 2 on Clinical In-Stent Restenosis in First Elective Percutaneous Coronary Intervention Patients

Authors: Leonard Simoni, Ilir Alimehmeti, Ervina Shirka, Endri Hasimi, Ndricim Kallashi, Verona Beka, Suerta Kabili, Artan Goda

Abstract:

Background: Diabetes Mellitus type 2, small vessel calibre, stented length of vessel, complex lesion morphology, and prior bypass surgery have resulted risk factors for In-Stent Restenosis (ISR). However, there are some contradictory results about body mass index (BMI) as a risk factor for ISR. Purpose: We want to identify clinical, lesional and procedural factors that can predict clinical ISR in our patients. Methods: Were enrolled 759 patients who underwent first-time elective PCI with Bare Metal Stents (BMS) from September 2011 to December 2013 in our Department of Cardiology and followed them for at least 1.5 years with a median of 862 days (2 years and 4 months). Only the patients re-admitted with ischemic heart disease underwent control coronary angiography but no routine angiographic control was performed. Patients were categorized in ISR and non-ISR groups and compared between them. Multivariate analysis - Binary Logistic Regression: Forward Conditional Method was used to identify independent predictive risk factors. P was considered statistically significant when <0.05. Results: ISR compared to non-ISR individuals had a significantly lower BMI (25.7±3.3 vs. 26.9±3.7, p=0.004), higher risk anatomy (LM + 3-vessel CAD) (23% vs. 14%, p=0.03), higher number of stents/person used (2.1±1.1 vs. 1.75±0.96, p=0.004), greater length of stents/person used (39.3±21.6 vs. 33.3±18.5, p=0.01), and a lower use of clopidogrel and ASA (together) (95% vs. 99%, p=0.012). They also had a higher, although not statistically significant, prevalence of Diabetes Mellitus (42% vs. 32%, p=0.072) and a greater number of treated vessels (1.36±0.5 vs. 1.26±0.5, p=0.08). In the multivariate analysis, Diabetes Mellitus type 2 and multiple stents used were independent predictors risk factors for In-Stent Restenosis, OR 1.66 [1.03-2.68], p=0.039, and OR 1.44 [1.16-1.78,] p=0.001, respectively. On the other side higher BMI and use of clopidogrel and ASA together resulted protective factors OR 0.88 [0.81-0.95], p=0.001 and OR 0.2 [0.06-0.72] p=0.013, respectively. Conclusion: Diabetes Mellitus and multiple stents are strong predictive risk factors, whereas the use of clopidogrel and ASA together are protective factors for clinical In-Stent Restenosis. Paradoxically High BMI is a protective factor for In-stent Restenosis, probably related to a larger diameter of vessels and consequently a larger diameter of stents implanted in these patients. Further studies are needed to clarify this finding.

Keywords: body mass index, diabetes mellitus, in-stent restenosis, percutaneous coronary intervention

Procedia PDF Downloads 191

40741 The Non-Stationary BINARMA(1,1) Process with Poisson Innovations: An Application on Accident Data

Authors: Y. Sunecher, N. Mamode Khan, V. Jowaheer

Abstract:

This paper considers the modelling of a non-stationary bivariate integer-valued autoregressive moving average of order one (BINARMA(1,1)) with correlated Poisson innovations. The BINARMA(1,1) model is specified using the binomial thinning operator and by assuming that the cross-correlation between the two series is induced by the innovation terms only. Based on these assumptions, the non-stationary marginal and joint moments of the BINARMA(1,1) are derived iteratively by using some initial stationary moments. As regards to the estimation of parameters of the proposed model, the conditional maximum likelihood (CML) estimation method is derived based on thinning and convolution properties. The forecasting equations of the BINARMA(1,1) model are also derived. A simulation study is also proposed where BINARMA(1,1) count data are generated using a multivariate Poisson R code for the innovation terms. The performance of the BINARMA(1,1) model is then assessed through a simulation experiment and the mean estimates of the model parameters obtained are all efficient, based on their standard errors. The proposed model is then used to analyse a real-life accident data on the motorway in Mauritius, based on some covariates: policemen, daily patrol, speed cameras, traffic lights and roundabouts. The BINARMA(1,1) model is applied on the accident data and the CML estimates clearly indicate a significant impact of the covariates on the number of accidents on the motorway in Mauritius. The forecasting equations also provide reliable one-step ahead forecasts.

Keywords: non-stationary, BINARMA(1, 1) model, Poisson innovations, conditional maximum likelihood, CML

Procedia PDF Downloads 112

40740 Early Indications of the Success of Rehabilitating Degraded Lands through the Green Legacy Project Implemented in Ethiopia

Authors: Tamirat Solomon, Aberash Yohannis, Efrem Gulfo

Abstract:

The plantation of trees, which harmonizes the agroecology of the environment, has been implemented in Ethiopia with great concern for a noticeably degraded environment. This study was designed to evaluate the effectiveness of green legacy, species selection and, the rate of survival, and the management status in the study areas. A systematic sampling method was employed to collect the required data from 144 quadrants measuring a 15m radius with an interval of 40m apart. Additionally, 244 sample households were selected for the socioeconomic study in addition to secondary data collected from office recordings. The data collected was analyzed using multivariate analysis, considering exposure and outcome variables. The findings of this study indicated that four exotic tree species, namely; A. salgina, C. fistula, A. indica, and G. robusta, were commonly selected tree species for degraded land restoration in the study areas. Among the seedlings planted at the four study sites, a total of 79.9% survived, and A. salgina was the dominant and best performed species, A. indica was the least survived species in the entire study area. The age of the seedling before planting significantly (p = 0.05) affected the survival potential of most seedlings of species, and the majority (82%) of local communities expressed their positive attitudes and willingness to manage the restoration works in the study areas. It was recommended to consider the inclusion of native species in the restoration effort and evaluate the co-existence of native flora with exotic and its competition for nutrients, water, and light in addition to the invading potentials in the ecosystem. In general, before embarking on degraded land restoration, species selection, adequate preparation of seedlings, and species diversity composition that exactly fit the socioeconomic and ecological demands of the areas must get the attention for the success of the restoration.

Keywords: plantation forest, degraded land, forest restoration, plantation survival, species selection

Procedia PDF Downloads 59

40739 Profitability Assessment of Granite Aggregate Production and the Development of a Profit Assessment Model

Authors: Melodi Mbuyi Mata, Blessing Olamide Taiwo, Afolabi Ayodele David

Abstract:

The purpose of this research is to create empirical models for assessing the profitability of granite aggregate production in Akure, Ondo state aggregate quarries. In addition, an artificial neural network (ANN) model and multivariate predicting models for granite profitability were developed in the study. A formal survey questionnaire was used to collect data for the study. The data extracted from the case study mine for this study includes granite marketing operations, royalty, production costs, and mine production information. The following methods were used to achieve the goal of this study: descriptive statistics, MATLAB 2017, and SPSS16.0 software in analyzing and modeling the data collected from granite traders in the study areas. The ANN and Multi Variant Regression models' prediction accuracy was compared using a coefficient of determination (R²), Root mean square error (RMSE), and mean square error (MSE). Due to the high prediction error, the model evaluation indices revealed that the ANN model was suitable for predicting generated profit in a typical quarry. More quarries in Nigeria's southwest region and other geopolitical zones should be considered to improve ANN prediction accuracy.

Keywords: national development, granite, profitability assessment, ANN models

Procedia PDF Downloads 82

40738 Using SNAP and RADTRAD to Establish the Analysis Model for Maanshan PWR Plant

Authors: J. R. Wang, H. C. Chen, C. Shih, S. W. Chen, J. H. Yang, Y. Chiang

Abstract:

In this study, we focus on the establishment of the analysis model for Maanshan PWR nuclear power plant (NPP) by using RADTRAD and SNAP codes with the FSAR, manuals, and other data. In order to evaluate the cumulative dose at the Exclusion Area Boundary (EAB) and Low Population Zone (LPZ) outer boundary, Maanshan NPP RADTRAD/SNAP model was used to perform the analysis of the DBA LOCA case. The analysis results of RADTRAD were similar to FSAR data. These analysis results were lower than the failure criteria of 10 CFR 100.11 (a total radiation dose to the whole body, 250 mSv; a total radiation dose to the thyroid from iodine exposure, 3000 mSv).

Keywords: RADionuclide, transport, removal, and dose estimation (RADTRAD), symbolic nuclear analysis package (SNAP), dose, PWR

Procedia PDF Downloads 442

40737 Correlates of Comprehensive HIV/AIDS Knowledge and Acceptance Attitude Towards People Living with HIV/AIDS: A Cross-Sectional Study among Unmarried Young Women in Uganda

Authors: Tesfaldet Mekonnen Estifanos, Chen Hui, Afewerki Weldezgi

Abstract:

Background: Youth in general and young females in particular, remain at the center of the HIV/AIDS epidemic. Sexual risk-taking among young unmarried women is relatively high and are the most vulnerable and highly exposed to HIV/AIDS. Improvements in the status of HIV/AIDS knowledge and acceptance attitude towards people living with HIV (PLWHIV) plays a great role in averting the incidence of HIV/AIDS. Thus, the aim of the study was to explore the level and correlates of HIV/AIDS knowledge and accepting attitude toward PLWHIV. Methods: A cross-sectional study was conducted using data from the Uganda Demographic Health Survey 2016 (UDHS-2016). National level representative household surveys using a multistage cluster probability sampling method, face to face interviews with standard questionnaires were performed. Unmarried women aged 15-24 years with a sample size of 2019 were selected from the total sample of 8674 women aged 15-49 years and were analyzed using SPSS version 23. Independent variables such as age, religion, educational level, residence, and wealth index were included. Two binary outcome variables (comprehensive HIV/AIDS knowledge and acceptance attitude toward PLWHIV) were utilized. We used the chi-square test as well as multivariate regression analysis to explore correlations of explanatory variables with the outcome variables. The results were reported by odds ratios (OR) with 95% confidence interval (95% CI), taking a p-value less than 0.05 as significant. Results: Almost all (99.3%) of the unmarried women aged 15-24 years were aware of HIV/AIDS, but only 51.2% had adequate comprehensive knowledge on HIV/AIDS. Only 69.4% knew both methods: using a condom every time had sex, and having only one faithful uninfected partner can prevent HIV/AIDS transmission. About 66.6% of the unmarried women reject at least two common local misconceptions about HIV/AIDS. Moreover, an alarmingly few (20.3%) of the respondents had a positive acceptance attitude to PLWHIV. On multivariate analysis, age (20-24 years), living in urban, being educated and wealthier, were predictors of having adequate comprehensive HIV/AIDS knowledge. On the other hand, research participants with adequate comprehensive knowledge about HIV/AIDS were highly likely (OR, 1.94 95% CI, 1.52-2.46) to have a positive acceptance attitude to PLWHIV than those with inadequate knowledge. Respondents with no education, Muslim, and Pentecostal religion were emerged less likely to have a positive acceptance attitude to PLWHIV. Conclusion: This study found out the highly accepted level of awareness, but the knowledge and positive acceptance attitude are not encouraging. Thus, expanding access to comprehensive sexuality and strengthening educational campaigns on HIV/AIDS in communities, health facilities, and schools is needed with a greater focus on disadvantaged women having low educational level, poor socioeconomic status, and those residing in rural areas. Sexual risk behaviors among the most affected people - young women have also a role in the spread of HIV/AIDS. Hence, further research assessing the significant contributing factors for sexual risk-taking might have a positive impact on the fight against HIV/AIDS.

Keywords: acceptance attitude, HIV/AIDS, knowledge, unmarried women

Procedia PDF Downloads 126

40736 Automated Process Quality Monitoring and Diagnostics for Large-Scale Measurement Data

Authors: Hyun-Woo Cho

Abstract:

Continuous monitoring of industrial plants is one of necessary tasks when it comes to ensuring high-quality final products. In terms of monitoring and diagnosis, it is quite critical and important to detect some incipient abnormal events of manufacturing processes in order to improve safety and reliability of operations involved and to reduce related losses. In this work a new multivariate statistical online diagnostic method is presented using a case study. For building some reference models an empirical discriminant model is constructed based on various past operation runs. When a fault is detected on-line, an on-line diagnostic module is initiated. Finally, the status of the current operating conditions is compared with the reference model to make a diagnostic decision. The performance of the presented framework is evaluated using a dataset from complex industrial processes. It has been shown that the proposed diagnostic method outperforms other techniques especially in terms of incipient detection of any faults occurred.

Keywords: data mining, empirical model, on-line diagnostics, process fault, process monitoring

Procedia PDF Downloads 387

40735 Data Stream Association Rule Mining with Cloud Computing

Authors: B. Suraj Aravind, M. H. M. Krishna Prasad

Abstract:

There exist emerging applications of data streams that require association rule mining, such as network traffic monitoring, web click streams analysis, sensor data, data from satellites etc. Data streams typically arrive continuously in high speed with huge amount and changing data distribution. This raises new issues that need to be considered when developing association rule mining techniques for stream data. This paper proposes to introduce an improved data stream association rule mining algorithm by eliminating the limitation of resources. For this, the concept of cloud computing is used. Inclusion of this may lead to additional unknown problems which needs further research.

Keywords: data stream, association rule mining, cloud computing, frequent itemsets

Procedia PDF Downloads 482

40734 Efficiency of the Slovak Commercial Banks Applying the DEA Window Analysis

Authors: Iveta Řepková

Abstract:

The aim of this paper is to estimate the efficiency of the Slovak commercial banks employing the Data Envelopment Analysis (DEA) window analysis approach during the period 2003-2012. The research is based on unbalanced panel data of the Slovak commercial banks. Undesirable output was included into analysis of banking efficiency. It was found that most efficient banks were Postovabanka, UniCredit Bank and Istrobanka in CCR model and the most efficient banks were Slovenskasporitelna, Istrobanka and UniCredit Bank in BCC model. On contrary, the lowest efficient banks were found Privatbanka and CitiBank. We found that the largest banks in the Slovak banking market were lower efficient than medium-size and small banks. Results of the paper is that during the period 2003-2008 the average efficiency was increasing and then during the period 2010-2011 the average efficiency decreased as a result of financial crisis.

Keywords: data envelopment analysis, efficiency, Slovak banking sector, window analysis

Procedia PDF Downloads 337

40733 Spatial Variability of Brahmaputra River Flow Characteristics

Authors: Hemant Kumar

Abstract:

Brahmaputra River is known according to the Hindu mythology the son of the Lord Brahma. According to this name, the river Brahmaputra creates mass destruction during the monsoon season in Assam, India. It is a state situated in North-East part of India. This is one of the essential states out of the seven countries of eastern India, where almost all entire Brahmaputra flow carried out. The other states carry their tributaries. In the present case study, the spatial analysis performed in this specific case the number of MODIS data are acquired. In the method of detecting the change, the spray content was found during heavy rainfall and in the flooded monsoon season. By this method, particularly the analysis over the Brahmaputra outflow determines the flooded season. The charged particle-associated in aerosol content genuinely verifies the heavy water content below the ground surface, which is validated by trend analysis through rainfall spectrum data. This is confirmed by in-situ sampled view data from a different position of Brahmaputra River. Further, a Hyperion Hyperspectral 30 m resolution data were used to scan the sediment deposits, which is also confirmed by in-situ sampled view data from a different position.

Keywords: aerosol, change detection, spatial analysis, trend analysis

Procedia PDF Downloads 131

40732 Attribute Analysis of Quick Response Code Payment Users Using Discriminant Non-negative Matrix Factorization

Authors: Hironori Karachi, Haruka Yamashita

Abstract:

Recently, the system of quick response (QR) code is getting popular. Many companies introduce new QR code payment services and the services are competing with each other to increase the number of users. For increasing the number of users, we should grasp the difference of feature of the demographic information, usage information, and value of users between services. In this study, we conduct an analysis of real-world data provided by Nomura Research Institute including the demographic data of users and information of users’ usages of two services; LINE Pay, and PayPay. For analyzing such data and interpret the feature of them, Nonnegative Matrix Factorization (NMF) is widely used; however, in case of the target data, there is a problem of the missing data. EM-algorithm NMF (EMNMF) to complete unknown values for understanding the feature of the given data presented by matrix shape. Moreover, for comparing the result of the NMF analysis of two matrices, there is Discriminant NMF (DNMF) shows the difference of users features between two matrices. In this study, we combine EMNMF and DNMF and also analyze the target data. As the interpretation, we show the difference of the features of users between LINE Pay and Paypay.

Keywords: data science, non-negative matrix factorization, missing data, quality of services

Procedia PDF Downloads 113

40731 Frequent Itemset Mining Using Rough-Sets

Authors: Usman Qamar, Younus Javed

Abstract:

Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data. What products were often purchased together? Its applications include basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that as the data increase the amount of time and resources required to mining the data increases at an exponential rate. In this investigation a new algorithm is proposed which can be uses as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm which utilizes entropy and rough-sets to carry out record reduction and feature (attribute) selection respectively. FASTER for frequent itemset mining can produce a speed up of 3.1 times when compared to original algorithm while maintaining an accuracy of 71%.

Keywords: rough-sets, classification, feature selection, entropy, outliers, frequent itemset mining

Procedia PDF Downloads 417

40730 Curve Fitting by Cubic Bezier Curves Using Migrating Birds Optimization Algorithm

Authors: Mitat Uysal

Abstract:

A new met heuristic optimization algorithm called as Migrating Birds Optimization is used for curve fitting by rational cubic Bezier Curves. This requires solving a complicated multivariate optimization problem. In this study, the solution of this optimization problem is achieved by Migrating Birds Optimization algorithm that is a powerful met heuristic nature-inspired algorithm well appropriate for optimization. The results of this study show that the proposed method performs very well and being able to fit the data points to cubic Bezier Curves with a high degree of accuracy.

Keywords: algorithms, Bezier curves, heuristic optimization, migrating birds optimization

Procedia PDF Downloads 319

40729 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course

Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu

Abstract:

This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. Data having 140 rows pertaining to the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data was used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the ML model with the best fit, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all the ten classes of the target variable (Grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, but the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each Grade value separately. The LR model with Gaussian data showed consistently better AUC scores compared to the ADA model with CTGAN data, except in two cases of the Grade value, C- and A-.

Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN

Procedia PDF Downloads 28

40728 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi

Abstract:

The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data to be ready for data mining exploration take up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data have been collected from an actual harmonic monitoring system in a distribution system in Australia for three years. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes to better understanding of the data have been presented. Underlying classes in this data has then been identified using clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by the engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

Procedia PDF Downloads 230

40727 Simulation Data Summarization Based on Spatial Histograms

Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura

Abstract:

In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.

Keywords: simulation data, data summarization, spatial histograms, exploration, visualization

Procedia PDF Downloads 162

40726 Analysis of Lead Time Delays in Supply Chain: A Case Study

Authors: Abdel-Aziz M. Mohamed, Nermeen Coutry

Abstract:

Lead time is an important measure of supply chain performance. It impacts both customer satisfactions as well as the total cost of inventory. This paper presents the result of a study on the analysis of the customer order lead-time for a multinational company. In the study, the lead time was divided into three stages: order entry, order fulfillment, and order delivery. A sample of size 2,425 order lines from the company records were considered for this study. The sample data includes information regarding customer orders from the time of order entry until order delivery. Data regarding the lead time of each sage for different orders were also provided. Summary statistics on lead time data reveals that about 30% of the orders were delivered after the scheduled due date. The result of the multiple linear regression analysis technique revealed that component type, logistics parameter, order size and the customer type have significant impact on lead time. Data analysis on the stages of lead time indicates that stage 2 consumes over 50% of the lead time. Pareto analysis was made to study the reasons for the customer order delay in each of the 3 stages. Recommendation was given to resolve the problem.

Keywords: lead time reduction, customer satisfaction, service quality, statistical analysis

Procedia PDF Downloads 708

40725 Differentiation between Different Rangeland Sites Using Principal Component Analysis in Semi-Arid Areas of Sudan

Authors: Nancy Ibrahim Abdalla, Abdelaziz Karamalla Gaiballa

Abstract:

Rangelands in semi-arid areas provide a good source for feeding huge numbers of animals and serving environmental, economic and social importance; therefore, these areas are considered economically very important for the pastoral sector in Sudan. This paper investigates the means of differentiating between different rangelands sites according to soil types using principal component analysis to assist in monitoring and assessment purposes. Three rangeland sites were identified in the study area as flat sandy sites, sand dune site, and hard clay site. Principal component analysis (PCA) was used to reduce the number of factors needed to distinguish between rangeland sites and produce a new set of data including the most useful spectral information to run satellite image processing. It was performed using selected types of data (two vegetation indices, topographic data and vegetation surface reflectance within the three bands of MODIS data). Analysis with PCA indicated that there is a relatively high correspondence between vegetation and soil of the total variance in the data set. The results showed that the use of the principal component analysis (PCA) with the selected variables showed a high difference, reflected in the variance and eigenvalues and it can be used for differentiation between different range sites.

Keywords: principal component analysis, PCA, rangeland sites, semi-arid areas, soil types

Procedia PDF Downloads 161

40724 Interpretation and Clustering Framework for Analyzing ECG Survey Data

Authors: Irum Matloob, Shoab Ahmad Khan, Fahim Arif

Abstract:

As Indo-Pak has been the victim of heart diseases since many decades. Many surveys showed that percentage of cardiac patients is increasing in Pakistan day by day, and special attention is needed to pay on this issue. The framework is proposed for performing detailed analysis of ECG survey data which is conducted for measuring prevalence of heart diseases statistics in Pakistan. The ECG survey data is evaluated or filtered by using automated Minnesota codes and only those ECGs are used for further analysis which is fulfilling the standardized conditions mentioned in the Minnesota codes. Then feature selection is performed by applying proposed algorithm based on discernibility matrix, for selecting relevant features from the database. Clustering is performed for exposing natural clusters from the ECG survey data by applying spectral clustering algorithm using fuzzy c means algorithm. The hidden patterns and interesting relationships which have been exposed after this analysis are useful for further detailed analysis and for many other multiple purposes.

Keywords: arrhythmias, centroids, ECG, clustering, discernibility matrix

Procedia PDF Downloads 452

40723 Combining Diffusion Maps and Diffusion Models for Enhanced Data Analysis

Authors: Meng Su

Abstract:

High-dimensional data analysis often presents challenges in capturing the complex, nonlinear relationships and manifold structures inherent to the data. This article presents a novel approach that leverages the strengths of two powerful techniques, Diffusion Maps and Diffusion Probabilistic Models (DPMs), to address these challenges. By integrating the dimensionality reduction capability of Diffusion Maps with the data modeling ability of DPMs, the proposed method aims to provide a comprehensive solution for analyzing and generating high-dimensional data. The Diffusion Map technique preserves the nonlinear relationships and manifold structure of the data by mapping it to a lower-dimensional space using the eigenvectors of the graph Laplacian matrix. Meanwhile, DPMs capture the dependencies within the data, enabling effective modeling and generation of new data points in the low-dimensional space. The generated data points can then be mapped back to the original high-dimensional space, ensuring consistency with the underlying manifold structure. Through a detailed example implementation, the article demonstrates the potential of the proposed hybrid approach to achieve more accurate and effective modeling and generation of complex, high-dimensional data. Furthermore, it discusses possible applications in various domains, such as image synthesis, time-series forecasting, and anomaly detection, and outlines future research directions for enhancing the scalability, performance, and integration with other machine learning techniques. By combining the strengths of Diffusion Maps and DPMs, this work paves the way for more advanced and robust data analysis methods.

Keywords: diffusion maps, diffusion probabilistic models (DPMs), manifold learning, high-dimensional data analysis

Procedia PDF Downloads 84

40722 Cigarette Smoking and Alcohol Use among Mauritian Adolescents: Analysis of 2017 WHO Global School-Based Student Health Survey

Authors: Iyanujesu Adereti, Tajudeen Basiru, Ayodamola Olanipekun

Abstract:

Background: Substance abuse among adolescents is of public health concern globally. Despite being the most abused by adolescents, there are limited studies on the prevalence of alcohol use and cigarette smoking among adolescents in Mauritius. Objectives: To determine the prevalence of cigarette smoking, alcohol use and associated correlates among school-going adolescents in Mauritius. Methodology: Data obtained from 2017 WHO Global School-based Student Health Survey (GSHS) survey of 3,012 school-going adolescents in Mauritius was analyzed using STATA. Descriptive statistics were used to obtain prevalence. Bivariate and multivariate logistic regression analysis was used to evaluate predictors of cigarette smoking and alcohol use. Results: Prevalence of alcohol consumption and cigarette smoking were 26.0% and 17.1%, respectively. Smoking and alcohol use was more prevalent among males, younger adolescents, and those in higher school grades (p-value <.000). In multivariable logistic regression, male gender was associated with a higher risk of cigarette smoking (adjusted Odds Ratio (aOR) [95%Confidence Interval (CI)]= 1.51[1.06-2.14]) but lower risk of alcohol use (aOR[95%CI]= 0.69[0.53-0.90]) while older age (mid and late adolescence) and parental smoking were found to be associated with increased risk of alcohol use (aOR[95%CI]= 1.94[1.34-2.99] and 1.36[1.05-1.78] respectively). Marijuana use, truancy, being in a fight and suicide ideation were associated with increased odds of alcohol use (aOR[95%CI]= 3.82[3.39-6.09]; 2.15[1.62-2.87]; 1.83[1.34-2.49] and 1.93[1.38-2.69] respectively) and cigarette smoking (aOR[95%CI]= 17.28[10.4 - 28.51]; 1.73[1.21-2. 49]; 1.67[1.14-2.45] and 2.17[1.43-3.28] respectively) while involvement in sexual activity was associated with reduced risk of alcohol use (aOR[95%CI]= 0.50[0.37-0.68]) and cigarette smoking (aOR[95%CI]= 0.47[0.33-0.69]). Parental support and parental monitoring were uniquely associated with lower risk of cigarette smoking (aOR[95%CI]= 0.69[0.47-0.99] and 0.62[0.43-0.91] respectively). Conclusion: The high prevalence of alcohol use and cigarette smoking in this study shows the need for the government of Mauritius to enhance policies that will help address this issue putting into accounts the various risk and protective factors.

Keywords: adolescent health, alcohol use, cigarette smoking, global school-based student health survey

Procedia PDF Downloads 224

40721 Industrial Process Mining Based on Data Pattern Modeling and Nonlinear Analysis

Authors: Hyun-Woo Cho

Abstract:

Unexpected events may occur with serious impacts on industrial process. This work utilizes a data representation technique to model and to analyze process data pattern for the purpose of diagnosis. In this work, the use of triangular representation of process data is evaluated using simulation process. Furthermore, the effect of using different pre-treatment techniques based on such as linear or nonlinear reduced spaces was compared. This work extracted the fault pattern in the reduced space, not in the original data space. The results have shown that the non-linear technique based diagnosis method produced more reliable results and outperforms linear method.

Keywords: process monitoring, data analysis, pattern modeling, fault, nonlinear techniques

Procedia PDF Downloads 372

40720 A Study on Sentiment Analysis Using Various ML/NLP Models on Historical Data of Indian Leaders

Authors: Sarthak Deshpande, Akshay Patil, Pradip Pandhare, Nikhil Wankhede, Rushali Deshmukh

Abstract:

Among the highly significant duties for any language most effective is the sentiment analysis, which is also a key area of NLP, that recently made impressive strides. There are several models and datasets available for those tasks in popular and commonly used languages like English, Russian, and Spanish. While sentiment analysis research is performed extensively, however it is lagging behind for the regional languages having few resources such as Hindi, Marathi. Marathi is one of the languages that included in the Indian Constitution’s 8th schedule and is the third most widely spoken language in the country and primarily spoken in the Deccan region, which encompasses Maharashtra and Goa. There isn’t sufficient study on sentiment analysis methods based on Marathi text due to lack of available resources, information. Therefore, this project proposes the use of different ML/NLP models for the analysis of Marathi data from the comments below YouTube content, tweets or Instagram posts. We aim to achieve a short and precise analysis and summary of the related data using our dataset (Dates, names, root words) and lexicons to locate exact information.

Keywords: multilingual sentiment analysis, Marathi, natural language processing, text summarization, lexicon-based approaches

Procedia PDF Downloads 49

40719 Facility Anomaly Detection with Gaussian Mixture Model

Authors: Sunghoon Park, Hank Kim, Jinwon An, Sungzoon Cho

Abstract:

Internet of Things allows one to collect data from facilities which are then used to monitor them and even predict malfunctions in advance. Conventional quality control methods focus on setting a normal range on a sensor value defined between a lower control limit and an upper control limit, and declaring as an anomaly anything falling outside it. However, interactions among sensor values are ignored, thus leading to suboptimal performance. We propose a multivariate approach which takes into account many sensor values at the same time. In particular Gaussian Mixture Model is used which is trained to maximize likelihood value using Expectation-Maximization algorithm. The number of Gaussian component distributions is determined by Bayesian Information Criterion. The negative Log likelihood value is used as an anomaly score. The actual usage scenario goes like a following. For each instance of sensor values from a facility, an anomaly score is computed. If it is larger than a threshold, an alarm will go off and a human expert intervenes and checks the system. A real world data from Building energy system was used to test the model.

Keywords: facility anomaly detection, gaussian mixture model, anomaly score, expectation maximization algorithm

Procedia PDF Downloads 254

40718 Longitudinal Analysis of Internet Speed Data in the Gulf Cooperation Council Region

Authors: Musab Isah

Abstract:

This paper presents a longitudinal analysis of Internet speed data in the Gulf Cooperation Council (GCC) region, focusing on the most populous cities of each of the six countries – Riyadh, Saudi Arabia; Dubai, UAE; Kuwait City, Kuwait; Doha, Qatar; Manama, Bahrain; and Muscat, Oman. The study utilizes data collected from the Measurement Lab (M-Lab) infrastructure over a five-year period from January 1, 2019, to December 31, 2023. The analysis includes downstream and upstream throughput data for the cities, covering significant events such as the launch of 5G networks in 2019, COVID-19-induced lockdowns in 2020 and 2021, and the subsequent recovery period and return to normalcy. The results showcase substantial increases in Internet speeds across the cities, highlighting improvements in both download and upload throughput over the years. All the GCC countries have achieved above-average Internet speeds that can conveniently support various online activities and applications with excellent user experience.

Keywords: internet data science, internet performance measurement, throughput analysis, internet speed, measurement lab, network diagnostic tool

Procedia PDF Downloads 34

40717 Extreme Temperature Forecast in Mbonge, Cameroon Through Return Level Analysis of the Generalized Extreme Value (GEV) Distribution

Authors: Nkongho Ayuketang Arreyndip, Ebobenow Joseph

Abstract:

In this paper, temperature extremes are forecast by employing the block maxima method of the generalized extreme value (GEV) distribution to analyse temperature data from the Cameroon Development Corporation (CDC). By considering two sets of data (raw data and simulated data) and two (stationary and non-stationary) models of the GEV distribution, return levels analysis is carried out and it was found that in the stationary model, the return values are constant over time with the raw data, while in the simulated data the return values show an increasing trend with an upper bound. In the non-stationary model, the return levels of both the raw data and simulated data show an increasing trend with an upper bound. This clearly shows that although temperatures in the tropics show a sign of increase in the future, there is a maximum temperature at which there is no exceedance. The results of this paper are very vital in agricultural and environmental research.

Keywords: forecasting, generalized extreme value (GEV), meteorology, return level

Procedia PDF Downloads 453

40716 Analysis of Spatial and Temporal Data Using Remote Sensing Technology

Authors: Kapil Pandey, Vishnu Goyal

Abstract:

Spatial and temporal data analysis is very well known in the field of satellite image processing. When spatial data are correlated with time, series analysis it gives the significant results in change detection studies. In this paper the GIS and Remote sensing techniques has been used to find the change detection using time series satellite imagery of Uttarakhand state during the years of 1990-2010. Natural vegetation, urban area, forest cover etc. were chosen as main landuse classes to study. Landuse/ landcover classes within several years were prepared using satellite images. Maximum likelihood supervised classification technique was adopted in this work and finally landuse change index has been generated and graphical models were used to present the changes.

Keywords: GIS, landuse/landcover, spatial and temporal data, remote sensing

Procedia PDF Downloads 415

40715 State and Determinant of Caregiver’s Mental Health in Thailand: A Household Level Analysis

Authors: Ruttana Phetsitong, Patama Vapattanawong, Malee Sunpuwan, Marc Voelker

Abstract:

The majority of care for older people at home in Thai society falls upon caregivers resulting in caregiver’s mental health problem. Beyond individual characteristics, household factors might have a profound effect on the caregiver’s mental health. But reliable data capturing this at the household level have been limited to date. The objectives of the present study were to explore the levels of Thai caregiver’s mental health and to investigate the factors affecting the mental health at household level. Data were obtained from the 2011 National Survey of Thai Older Persons conducted by the National Statistical Office of Thailand. Caregiver’s mental health was measured by using the 15- items-short version of the Thai Mental Health Indicator (TMHI-15) developed by the Department of Mental Health, the Ministry of Public Health. Multivariate logistic regression models were used to explore the impact of potential factors on caregiver’s mental health. The THMI-15 produced an overall average caregiver mental health score of 30.9 out of 45 (SD 5.3). The score can be categorized into good (34.02-45), fair (27.01-34), and poor (0-27). Duration of care for older people, household wealth, and functional dependency of the older people significantly predicted total caregiver’s mental health. Household economic factor was key in predicting better mental health. Compared to those poorest households, the adjusted effect of the fifth quintile household wealth was high (OR=2.34; 95%CI=1.47-3.73). The findings of this study provide a fuller picture to a better understanding of the level and factors that cause the mental health of Thai caregivers. Health care providers and policymakers should consider these factors when designing interventions aimed at alleviating caregiver’s psychological burden when provided care for older people at home.

Keywords: caregiver’s mental health, household, older people, Thailand

Procedia PDF Downloads 125

‹
1
2
...
9
10
11
12
13
14
15
...
1369
1370
›