Search results for: statistical data analysis
42818 Identification of Hepatocellular Carcinoma Using Supervised Learning Algorithms
Authors: Sagri Sharma
Abstract:
Analyzing diseases that involve multiple interacting factors increases the complexity of the problem, and the development of frameworks for such analysis is therefore a topic of intense research. Due to the inter-dependence of the various parameters, traditional methodologies have not been very effective. Consequently, newer methodologies are being sought to deal with the problem. Supervised learning algorithms are commonly used to make predictions on previously unseen data. These algorithms are applied in fields ranging from image analysis to protein structure and function prediction; they are trained on a known dataset to produce a predictor model that generates reasonable predictions for the response to new data. Gene expression profiles generated by DNA analysis experiments can be quite complex, since these experiments can involve hypotheses spanning entire genomes. A well-known machine learning algorithm, the Support Vector Machine, is therefore applied to analyze the expression levels of thousands of genes simultaneously in a timely, automated, and cost-effective way. The objectives of the presented work are to develop a methodology for identifying genes relevant to Hepatocellular Carcinoma (HCC) from a gene expression dataset using supervised learning algorithms and statistical evaluations, and to develop a predictive framework that can perform classification tasks on new, unseen data.
Keywords: artificial intelligence, biomarker, gene expression datasets, hepatocellular carcinoma, machine learning, supervised learning algorithms, support vector machine
Procedia PDF Downloads 431
42817 Collision Theory Based Sentiment Detection Using Discourse Analysis in Hadoop
Authors: Anuta Mukherjee, Saswati Mukherjee
Abstract:
Data is growing every day. Social networking sites such as Twitter are becoming an integral part of our daily lives and contribute substantially to the growth of data. Twitter is a rich source for sentiment detection or mining in particular, since people often express honest opinions through tweets. However, although sentiment analysis is a well-researched topic for text, applying it to Twitter data poses additional challenges, since tweets are unstructured, contain abbreviations, and lack strict grammatical correctness. We have employed collision theory to perform sentiment analysis on Twitter data. We have also incorporated discourse analysis into the collision theory based model to detect sentiment from tweets accurately. In addition, we have used the retweet field to assign weights to certain tweets and obtained the overall weightage of a topic provided in the form of a query. Hadoop has been exploited for speed. Our experiments show effective results.
Keywords: sentiment analysis, twitter, collision theory, discourse analysis
Procedia PDF Downloads 540
42816 Dissecting Big Trajectory Data to Analyse Road Network Travel Efficiency
Authors: Rania Alshikhe, Vinita Jindal
Abstract:
Digital innovation has played a crucial role in managing smart transportation. For this, big trajectory data collected from traveling vehicles, such as taxis, through installed global positioning system (GPS)-enabled devices can be utilized. Such data offer an unprecedented opportunity to trace the movements of vehicles at fine spatiotemporal granularity. This paper explores big trajectory data to measure the travel efficiency of road networks across an entire city using the proposed statistical travel efficiency measure (STEM). Further, it identifies the cause of low travel efficiency using the proposed least square approximation network-based causality exploration (LANCE). Finally, the resulting data analysis reveals the causes of low travel efficiency, along with the road segments that need to be optimized to improve traffic conditions and thus minimize the average travel time from a given point A to point B in the road network. The results show that the proposed approach outperforms the baseline algorithms for measuring the travel efficiency of the road network.
Keywords: GPS trajectory, road network, taxi trips, digital map, big data, STEM, LANCE
Procedia PDF Downloads 160
42815 Risk of Heatstroke Occurring in Indoor Built Environment Determined with Nationwide Sports and Health Database and Meteorological Outdoor Data
Authors: Go Iwashita
Abstract:
The paper uses large-scale statistical data to describe how the frequency of heatstroke occurring in the indoor built environment is related to the outdoor thermal environment. The nationwide heatstroke accident data were obtained from the National Agency for the Advancement of Sports and Health (NAASH). The meteorological database of the Japanese Meteorological Agency supplied data on 1-hour average temperature, humidity, wind speed, solar radiation, and so forth. Each heatstroke data point from the NAASH database was linked to the meteorological data point acquired from the meteorological station nearest to where the heatstroke accident occurred. This analysis was performed for a 10-year period (2005–2014). During the 10-year period, 3,819 cases of heatstroke were reported in the NAASH database for the investigated secondary/high schools of the nine representative Japanese cities. Heatstroke most commonly occurred in the outdoor schoolyard at a wet-bulb globe temperature (WBGT) of 31°C and in the indoor gymnasium during athletic club activities at a WBGT > 31°C. The accident ratio (number of accidents during each club activity divided by the club’s population) was highest in the gymnasium during female badminton club activities. Although badminton is played in a gymnasium, these WBGT results show that the risk level during badminton under hot and humid conditions is equal to that of baseball or rugby played in the schoolyard. Apart from sports, a high risk of heatstroke was observed in school houses during cultural activities. Based on the above WBGT results, the risk level for the indoor environment under hot and humid conditions would be equal to that for the outdoor environment. Therefore, control measures against hot and humid indoor conditions, such as installing air conditioners, are needed not only in schools but also in residences.
Keywords: accidents in schools, club activity, gymnasium, heatstroke
Procedia PDF Downloads 219
42814 Drug Therapy Problem and Its Contributing Factors among Pediatric Patients with Infectious Diseases Admitted to Jimma University Medical Center, South West Ethiopia: Prospective Observational Study
Authors: Desalegn Feyissa Desu
Abstract:
Drug therapy problems are a significant challenge to providing high-quality health care to patients. They are associated with morbidity, mortality, increased hospital stay, and reduced quality of life. Moreover, pediatric patients are quite susceptible to drug therapy problems. This study therefore aimed to assess drug therapy problems and their contributing factors among pediatric patients diagnosed with infectious disease and admitted to the pediatric ward of Jimma University Medical Center. A prospective observational study was conducted among pediatric patients with infectious disease admitted from April 1 to June 30, 2018. Drug therapy problems were identified using Cipolle’s and Strand’s drug related problem classification method. Written informed consent was obtained after the purpose of the study had been explained. Patient-specific data were collected using a structured questionnaire. Data were entered into Epi data version 4.0.2 and then exported to statistical software package version 21.0 for analysis. To identify predictors of drug therapy problem occurrence, multiple stepwise backward logistic regression analysis was performed. The 95% CI was used to show the accuracy of data analysis, and statistical significance was considered at p-value < 0.05. A total of 304 pediatric patients were included in the study. Of these, 226 (74.3%) patients had at least one drug therapy problem during their hospital stay. A total of 356 drug therapy problems were identified among these 226 patients. Non-compliance (28.65%) and dose too low (27.53%) were the most common types of drug related problem, while disease comorbidity [AOR=3.39, 95% CI= (1.89-6.08)], polypharmacy [AOR=3.16, 95% CI= (1.61-6.20)], and a hospital stay of more than six days [AOR=3.37, 95% CI= (1.71-6.64)] were independent predictors of drug therapy problem occurrence. Drug therapy problems were common in pediatric patients with infectious disease in the study area. Presence of comorbidity, polypharmacy, and prolonged hospital stay were the predictors of drug therapy problems in the study area. Therefore, to overcome the significant gaps in pediatric pharmaceutical care, clinical pharmacists, pediatricians, and other health care professionals have to work in collaboration.
Keywords: drug therapy problem, pediatric, infectious disease, Ethiopia
Procedia PDF Downloads 157
42813 Anxiety and Depression in Caregivers of Autistic Children
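The reported odds ratios can be sanity-checked with a short calculation. The sketch below assumes the intervals are Wald intervals computed on the log-odds scale (the usual output of logistic regression software, but an assumption here, since the abstract does not say); under that assumption the point estimate is the geometric mean of the interval bounds. The helper name `or_from_ci` is hypothetical.

```python
import math

def or_from_ci(lower, upper, z=1.96):
    """Recover the odds ratio and log-scale standard error implied by a CI.

    Assumes a Wald interval that is symmetric on the log-odds scale, so the
    point estimate is the geometric mean of the two bounds.
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    point = math.exp((log_lo + log_hi) / 2)
    se = (log_hi - log_lo) / (2 * z)
    return point, se

# Values copied from the abstract: (reported AOR, CI lower, CI upper).
reported = {
    "comorbidity": (3.39, 1.89, 6.08),
    "polypharmacy": (3.16, 1.61, 6.20),
    "stay > 6 days": (3.37, 1.71, 6.64),
}

for name, (aor, lo, hi) in reported.items():
    point, se = or_from_ci(lo, hi)
    print(f"{name}: reported AOR {aor}, implied {point:.2f}, log-scale SE {se:.3f}")

# Share of patients with at least one drug therapy problem (226 of 304).
prevalence_pct = 100 * 226 / 304
print(f"prevalence: {prevalence_pct:.1f}%")
```

Each implied estimate rounds to the reported AOR, and 226 of 304 rounds to the reported 74.3%, so the figures in the abstract are internally consistent under the stated assumption.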
Authors: Mou Juliet Rebeiro, S. M. Abul Kalam Azad
Abstract:
This study was carried out to examine anxiety and depression in caregivers of autistic children. The objectives of the research were to assess depression and anxiety among caregivers of autistic children and to explore the caregivers’ experience. For this purpose, the research was conducted on a sample of 39 caregivers of autistic children. Participants were recruited from a special school. To collect data, each caregiver was administered a questionnaire comprising scales to measure anxiety and depression, and some responses were gathered through interviews based on a topic guide. The quantitative data were analyzed using statistical analysis, and the qualitative data were analyzed according to themes. The mean anxiety score (55.85) and the mean depression score (108.33) were above the cutoff points. The results showed that anxiety and depression are clinically present in caregivers of autistic children. Most of the caregivers experienced behavioral, emotional, cognitive, and social problems in their child that are linked with anxiety and depression.
Keywords: anxiety, autism, caregiver, depression
Procedia PDF Downloads 308
42812 Monitoring Blood Pressure Using Regression Techniques
Authors: Qasem Qananwah, Ahmad Dagamseh, Hiam AlQuran, Khalid Shaker Ibrahim
Abstract:
Blood pressure gives physicians deep insight into the cardiovascular system. The determination of individual blood pressure is a standard clinical procedure for assessing cardiovascular system problems. The conventional techniques for measuring blood pressure (e.g. the cuff method) allow only a limited number of readings in a given period (e.g. every 5-10 minutes). Additionally, these systems disturb blood flow, impeding continuous blood pressure monitoring, especially in emergency cases or critically ill persons. In this paper, the most important statistical features of photoplethysmogram (PPG) signals were extracted to estimate blood pressure noninvasively. PPG signals from more than 40 subjects were measured and analyzed, and 12 features were extracted. The features were fed to principal component analysis (PCA) to find the most important independent features that have the highest correlation with blood pressure. The results show that the stiffness index means and standard deviation for the beat-to-beat heart rate were the most important features. A model representing both features for Systolic Blood Pressure (SBP) and Diastolic Blood Pressure (DBP) was obtained using a statistical regression technique. Surface fitting was used to best fit the series of data, and the results show that the error in estimating the SBP is 4.95% and in estimating the DBP is 3.99%.
Keywords: blood pressure, noninvasive optical system, principal component analysis, PCA, continuous monitoring
Procedia PDF Downloads 164
42811 Attributes That Influence Respondents When Choosing a Mate in Internet Dating Sites: An Innovative Matching Algorithm
Authors: Moti Zwilling, Srečko Natek
Abstract:
This paper presents an innovative predictive analytics approach for finding the best match between two consumers who are seeking a partner on internet sites. The methodology is based on an analysis of consumer preferences and involves data mining and machine learning search techniques. The study is composed of two parts. The first part uses descriptive statistics to examine the correlations between a set of parameters collected from men and women who intend to meet each other through social media, usually the internet. In this part, several hypotheses were examined and statistical analyses were carried out. The results show a strong correlation between the affiliated attributes of men and women with respect to how they present themselves on social media such as "Facebook". One interesting finding is the strong desire among most respondents to develop a serious relationship. In the second part, the authors used common data mining algorithms to search for and classify the most important and effective attributes that affect the response rate of the other side. The results show that personal presentation and educational background are the most effective in eliciting a positive attitude toward one's profile from the other party.
Keywords: dating sites, social networks, machine learning, decision trees, data mining
Procedia PDF Downloads 299
42810 Information Extraction Based on Search Engine Results
Authors: Mohammed R. Elkobaisi, Abdelsalam Maatuk
Abstract:
Search engines are large-scale tools for retrieving information from the Web that are currently freely available to all. This paper explains how to convert the raw result counts returned by search engines into useful information. This represents a new method of data gathering compared with traditional methods. Submitting queries for multiple keywords one at a time takes considerable time and effort; hence, we developed a user interface program that searches automatically, taking multiple keywords at the same time and collecting the wanted data without further intervention. The collected raw data are processed using mathematical and statistical theories to eliminate unwanted data and convert the rest into usable data.
Keywords: search engines, information extraction, agent system
Procedia PDF Downloads 433
42809 Assessing the Prevalence of Taste Loss Among Adults Who Have Contracted SARS-CoV-2
Authors: Alketa Qafmolla, Mimoza Canga, Edit Xhajanka, Vergjini Mulo, Ramazan Isufi, Vito Antonio Malagnino
Abstract:
COVID-19 is threatening the lives of people all over the world. A number of health problems, including oral health problems, have been linked to SARS-CoV-2 infection. Loss of taste is one of the initial symptoms presented by patients who have COVID-19. Purpose: The aim of the current study is to determine the prevalence of taste loss in young adults aged 18 to 26 who have contracted SARS-CoV-2. Materials and methods: This study is analytical cross-sectional research conducted in Albania from March 2023 to September 2023. Our research included a total of 157 students, of which 100 (63.7%) were female and 57 (36.3%) were male. They were divided into three age groups: 18-20, 21-23, and 24-26 years old. Students willingly agreed to participate in the current study and were assured that their participation would be kept anonymous. The study recorded no dropouts and was conducted in accordance with the Declaration of Helsinki. Statistical analysis was performed using IBM SPSS Statistics Version 23.0 on Microsoft Windows Linux, Chicago, IL, USA. The evaluation of data was done using analysis of variance (ANOVA), with a significance level set at P ≤ 0.05. Results: 113 (72%) of the participants reported loss of taste, while 44 (28%) did not experience any loss of taste. According to the study's data analysis, taste problems typically manifest over three days, with the lowest frequency occurring on the second day and the highest frequency occurring on the fifteenth. 68.7% of participants reported experiencing taste recovery after three weeks. The present study's findings demonstrated a substantial correlation between the duration of the individuals' COVID-19 infection and taste loss (P <0.0003). Based on the statistical analysis of the data, this study shows that there is no association between gender and loss of taste (P = 0.218). 
The participants reported having undergone the following treatments: prednisolone sodium phosphate (15 mg/5 mL daily), vitamin C (1000 mg), azithromycin (500 mg daily), oral vitamin D3 supplementation of 5000 IU daily, vitamin B12 (2.4 mcg daily), zinc 20 mg daily, Augmentin tablets (625 mg), and magnesium sulfate (4 g/100 mL). Conclusion: Within the limitations of this study conducted in Albania, it can be concluded that loss of taste was present in 72% of participants infected with COVID-19 and recovery was evident after three weeks.
Keywords: adult, Albania, COVID-19, cross-sectional study, loss of taste
Procedia PDF Downloads 36
42808 Adaptive Process Monitoring for Time-Varying Situations Using Statistical Learning Algorithms
Authors: Seulki Lee, Seoung Bum Kim
Abstract:
Statistical process control (SPC) is a practical and effective method for quality control. The most important and widely used technique in SPC is the control chart. The main goal of a control chart is to detect any assignable changes that affect the quality output. Most conventional control charts, such as Hotelling’s T² charts, are based on the assumption that the quality characteristics follow a multivariate normal distribution. However, modern complicated manufacturing systems require control chart techniques that can efficiently handle nonnormal processes. To overcome the shortcomings of conventional control charts for nonnormal processes, several methods have been proposed that combine statistical learning algorithms with multivariate control charts. Statistical learning-based control charts, such as support vector data description (SVDD)-based charts and k-nearest neighbors-based charts, have shown improved performance in nonnormal situations compared with the T² chart. Besides nonnormality, time-varying operations are also quite common in real manufacturing because of factors such as product and set-point changes, seasonal variations, catalyst degradation, and sensor drift. However, traditional control charts cannot accommodate future condition changes of the process because they are formulated from data recorded in the early stage of the process. In the present paper, we propose an SVDD algorithm-based control chart that is capable of adaptively monitoring time-varying and nonnormal processes. We reformulated the SVDD algorithm into a time-adaptive SVDD algorithm by adding a weighting factor that reflects time-varying situations. Moreover, we defined the updating region for the efficient model-updating structure of the control chart. The proposed control chart simultaneously allows efficient model updates and timely detection of out-of-control signals. The effectiveness and applicability of the proposed chart were demonstrated through experiments with simulated data and with real data from the metal frame process in mobile device manufacturing.
Keywords: multivariate control chart, nonparametric method, support vector data description, time-varying process
Procedia PDF Downloads 303
42807 Modeling the Demand for the Healthcare Services Using Data Analysis Techniques
Authors: Elizaveta S. Prokofyeva, Svetlana V. Maltseva, Roman D. Zaitsev
Abstract:
Rapidly evolving data analysis technologies play a large role in understanding the operation of the healthcare system and its characteristics. One of the key tasks in urban healthcare today is to optimize resource allocation, and the application of data analysis to such optimization problems in medical institutions determines the significance of this study. The purpose of this research was to establish the dependence between the performance indicators of a medical institution and its resources. Hospital discharges by diagnosis, hospital days of in-patients, and in-patient average length of stay were selected as the indicators of performance and of demand for the medical facility. Hospital beds by type of care, medical technology (magnetic resonance tomography, gamma cameras, angiographic complexes, and lithotripters), and physicians characterized the resource provision of medical institutions in the developed models. The data source for the research was an open database of the statistical service Eurostat. This source was chosen because its databases contain the complete and open information needed for research tasks in the field of public health. In addition, the statistical database has a user-friendly interface that makes it possible to build analytical reports quickly. The study covers 28 European countries for the period from 2007 to 2016. For all countries included in the study with the most accurate and complete data for the period under review, predictive models were developed based on historical panel data. Cluster analysis of the investigated set of countries was used in an attempt to improve the quality and interpretability of the models. The main idea was to assess the similarity of the joint behavior of the variables throughout the period under consideration in order to identify groups of similar countries and to construct separate regression models for them. Therefore, the original time series were used as the objects of clustering. The hierarchical agglomerate algorithm k-medoids was used. Sampled objects were used as the centers of the resulting clusters, since determining a centroid when working with time series involves additional difficulties. The number of clusters was chosen using the silhouette coefficient. After the cluster analysis, it was possible to improve the predictive power of the models significantly: for example, in one of the clusters the MAPE was only 0.82%, which makes it possible to conclude that this forecast is highly reliable in the short term. The predicted values of the developed models have a relatively low level of error and can be used to make decisions on the resource provision of the hospital by medical personnel. The research shows strong dependencies between the demand for medical services and the modern medical equipment variable, which highlights the importance of the technological component for the successful development of a medical facility. Data analysis currently has huge potential, which makes it possible to improve health services significantly. Medical institutions that are the first to introduce these technologies will certainly have a competitive advantage.
Keywords: data analysis, demand modeling, healthcare, medical facilities
Procedia PDF Downloads 147
42806 The Influence of the Vocational Teachers Empowerment toward the Vocational High Schools’ Performance Based on the Education National Standards of Indonesia
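The clustering and error-measurement steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance between equal-length time series, finds medoids by brute-force search (practical only for small sets such as the 28 countries in the study), and omits the silhouette-based choice of the number of clusters. The function names and the toy series are hypothetical.

```python
import math
from itertools import combinations

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def euclidean(x, y):
    """Euclidean distance between two equal-length time series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def k_medoids(series, k):
    """Brute-force k-medoids: try every set of k series as medoids.

    Medoids are actual series from the data, so no centroid has to be
    computed. Returns the medoid indices and, for each series, the index
    of its nearest medoid.
    """
    best_cost, best_medoids = math.inf, None
    for medoids in combinations(range(len(series)), k):
        cost = sum(min(euclidean(s, series[m]) for m in medoids) for s in series)
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    labels = [min(best_medoids, key=lambda m: euclidean(s, series[m])) for s in series]
    return best_medoids, labels

# Toy data: two obviously different groups of short time series.
toy_series = [[1, 2, 3], [1.1, 2.1, 3.1], [10, 11, 12], [10.2, 11.1, 12.3]]
medoids, labels = k_medoids(toy_series, k=2)
print(medoids, labels)
print(mape([100, 200], [99, 202]))
```

Because medoids are drawn from the data, the approach sidesteps the difficulty of averaging time series that the abstract mentions; a separate regression model would then be fitted per cluster and scored with `mape`.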
Authors: Abdul Haris Setiawan
Abstract:
Teacher empowerment is considered one of the important factors contributing significantly to the achievement of national education goals. This study was conducted to determine the influence of vocational teachers’ empowerment on the performance of vocational high schools based on the Education National Standards of Indonesia. The population of the study was all vocational teachers at the State Vocational High Schools in Surakarta, Central Java Province, Indonesia. The sample was drawn using proportional random sampling. The study used quantitative descriptive statistical analysis techniques. The data were collected using questionnaires and then checked with analysis requirements tests. After passing these tests, the data were processed using regression analysis between the independent and dependent variables to determine the effect and the regression equation. The study found that the level of vocational high schools’ performance based on the Education National Standards of Indonesia was 74.29%, which falls in the high category; that the level of vocational teachers’ empowerment was 76.20%, which also falls in the high category; and that vocational teachers’ empowerment had a positive influence on the vocational high schools’ performance based on the Education National Standards of Indonesia, with a correlation coefficient of 0.886, a contribution of 78.50%, and the regression equation Y = 79.431 + 0.534 X.
Keywords: vocational teachers, empowerment, vocational high school, the education national standards
Procedia PDF Downloads 397
42805 The Impact of Artificial Intelligence on Qualty Conrol and Quality
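The reported figures fit together as a simple linear regression, where the contribution is the squared correlation coefficient. The sketch below uses only the numbers given in the abstract; the function name is hypothetical, and the scale on which X (empowerment) and Y (school performance) were measured is not stated, so the example input is purely illustrative.

```python
# Values reported in the abstract.
INTERCEPT = 79.431   # constant term of the regression equation
SLOPE = 0.534        # coefficient of X (teacher empowerment)
R = 0.886            # correlation coefficient

def predict_performance(empowerment_score):
    """Evaluate the reported regression equation Y = 79.431 + 0.534 X."""
    return INTERCEPT + SLOPE * empowerment_score

# In simple linear regression the coefficient of determination is r squared.
r_squared_pct = 100 * R ** 2
print(f"contribution: {r_squared_pct:.2f}%")

# Illustrative input only; the measurement scale of X is an assumption.
print(predict_performance(10))
```

The squared correlation reproduces the reported 78.50% contribution, which suggests that the contribution in the abstract is the coefficient of determination.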
Authors: Mary Moner Botros Fanawel
Abstract:
Many companies use the statistical tool known as statistical quality control, which can be costly for the companies interested in it. Evaluating the quality of products and services is an important topic, but reducing the cost of implementing statistical quality control also brings important benefits for companies. For this reason, it is important to apply an economic design to the various steps involved in statistical quality control. In this paper, we describe some relevant aspects of the economic design of a quality control chart for the proportion of defective items. These aspects are important because they can reduce the cost of implementing such a chart. Note that the main purpose of this chart is to evaluate and control the proportion of defective items in a production process.
Keywords: model predictive control, hierarchical control structure, genetic algorithm, water quality with DBPs objectives proportion, type I error, economic plan, distribution function bootstrap control limit, p-value method, out-of-control signals, p-value, quality characteristics
Procedia PDF Downloads 66
42804 Analysis of Expression Data Using Unsupervised Techniques
Authors: M. A. I Perera, C. R. Wijesinghe, A. R. Weerasinghe
Abstract:
This study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to better identify subtypes of tumors. Identifying subtypes of cancer helps improve the efficacy and reduce the toxicity of treatments by providing clues for finding targeted therapeutics. The process of gene expression data analysis is described in three steps: preprocessing, clustering, and cluster validation. Feature selection is important because genomic data are high dimensional, with a large number of features compared to samples. Hierarchical clustering and K Means are often used in the analysis of gene expression data. Several cluster validation techniques are used to validate the clusters. Heatmaps are an effective external validation method that allows the identified classes to be compared with clinical variables and analyzed visually.
Keywords: cancer subtypes, gene expression data analysis, clustering, cluster validation
Procedia PDF Downloads 152
42803 Estimating Knowledge Flow Patterns of Business Method Patents with a Hidden Markov Model
Authors: Yoonjung An, Yongtae Park
Abstract:
Knowledge flows are a critical source of faster technological progress and stronger economic growth. Knowledge flows have accelerated dramatically with the establishment of a patent system in which each patent is required by law to disclose sufficient technical information for the invention to be recreated. Patent analysis has thus been widely used to investigate technological knowledge flows. However, the existing research is limited in terms of both subject and approach. In particular, most previous studies did not cover business method (BM) patents although, like other patents, they are important drivers of knowledge flows. In addition, these studies usually focus on the static analysis of knowledge flows. Some use approaches that incorporate the time dimension, yet they still fail to trace the truly dynamic process of knowledge flows. Therefore, we investigate dynamic patterns of knowledge flows driven by BM patents using a Hidden Markov Model (HMM). An HMM is a popular statistical tool for modeling a wide range of time series data, with no general theoretical limit in regard to statistical pattern classification. Accordingly, it enables characterizing knowledge patterns that may differ by patent, sector, country, and so on. We run the model on sets of backward citations and forward citations to compare the patterns of knowledge utilization and knowledge dissemination.
Keywords: business method patents, dynamic pattern, Hidden-Markov Model, knowledge flow
Procedia PDF Downloads 333
42802 Vehicles Analysis, Assessment and Redesign Related to Ergonomics and Human Factors
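To make the HMM idea concrete, the sketch below implements the standard forward algorithm, which computes the likelihood of an observed sequence under a discrete HMM. Everything specific in it is an assumption for illustration: the two hidden states, the two observation symbols (standing in for, say, low and high yearly citation counts of a patent), and all probabilities are invented, and the abstract does not describe the model structure the authors actually used.

```python
def forward_likelihood(observations, start, trans, emit):
    """Likelihood of an observation sequence under a discrete HMM.

    start[s]    : probability of starting in hidden state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of emitting symbol o while in state s
    """
    n_states = len(start)
    # Initialisation: joint probability of each state and the first symbol.
    alpha = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    # Recursion: propagate through the transitions, then weight by emission.
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)

# Hypothetical two-state model; symbols 0 and 1 stand for low/high citation years.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]

print(forward_likelihood([0, 0, 1, 1], START, TRANS, EMIT))
```

Comparing such likelihoods across models fitted to different groups is one way sequences could be assigned to patterns, for example backward versus forward citation sequences.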
Authors: Susana Aragoneses Garrido
Abstract:
Every day, the roads are the scene of numerous accidents involving vehicles, producing thousands of deaths and serious injuries all over the world. Investigations have revealed that Human Factors (HF) are one of the main causes of road accidents in modern societies. Distracted driving (arising from aspects external or internal to the vehicle), which is considered a human factor, is a serious and emergent risk to road safety. Consequently, further analysis of this issue is essential because of its importance to today’s society. The objectives of this investigation are the detection and assessment of HF in order to provide solutions (including a better vehicle design) that might mitigate road accidents. The methodology of the project is divided into phases. First, a statistical analysis of public databases is provided for Spain and the UK. Second, the data are classified in order to analyse the major causes involved in road accidents. Third, a simulation of different paths and vehicles is presented. The causes related to HF are assessed by Failure Mode and Effects Analysis (FMEA). Fourth, different car models are evaluated using the Rapid Upper Body Assessment (RULA). Additionally, the JACK SIEMENS PLM tool is used to evaluate the Human Factor causes and to support the redesign of the vehicles. Finally, improvements in car design are proposed with the intention of reducing the involvement of HF in traffic accidents. The results from the statistical analysis, the simulations, and the evaluations confirm that accidents are an important issue in today’s society, especially accidents caused by HF such as distractions. The results explore the reduction of external and internal HF through a global risk analysis of vehicle accidents. Moreover, the evaluation of the different car models using the RULA method and JACK SIEMENS PLM shows the importance of a well-adjusted driver’s seat in avoiding harmful postures and therefore distractions. For this reason, a car redesign is proposed that lets the driver adopt the optimum position and consequently reduces the human factors in road accidents.
Keywords: analysis vehicles, assessment, ergonomics, car redesign
Procedia PDF Downloads 342
42801 Prevalance and Factors Associated with Domestic Violence among Preganant Women in Southwest Ethiopia
Authors: Bediru Abamecha
Abstract:
Background: Domestic violence is a global problem that occurs regardless of culture, ethnicity or socio-economic class. It is known to be responsible for numerous hospital visits by women. Violence against pregnant women is a health and social problem that poses particular risks to the woman and her unborn child. Objective: The objective of this study is to assess the prevalence of domestic violence and its correlates among pregnant women in Manna Woreda of Jimma Zone. Methods: Simple random sampling will be used to select 12 kebeles (48% of the study area), and systematic sampling will be used to reach households in the selected kebeles in Manna Woreda of Jimma Zone, southwest Ethiopia, from February 15-25, 2011. In-depth interviews will be conducted with the women's affairs office, the police office and working nurses, and a minimum of 4 focus group discussions (FGDs) with 6-8 members each will be held with pregnant women and selected men from the community. SPSS version 16.0 will be used to enter, clean and analyze the data. Descriptive statistics will be computed, such as the mean or median for continuous variables and percentages for categorical variables. Bivariate analysis will be used to check the association between independent variables and domestic violence. Variables found to be associated with domestic violence will be entered into multiple logistic regression to control for the possible effect of confounders, and finally the variables with a significant association will be identified on the basis of odds ratios (OR) with 95% confidence intervals (CI). Statistical significance will be considered at p<0.05. The qualitative data will be summarized manually and analyzed thematically, and finally both sets of results will be triangulated.
Keywords: antenatal care, Ethiopian demographic and health survey, domestic violence, statistical package for social science
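To illustrate the "OR with 95% CI" step named in the abstract above, here is a minimal sketch of a crude (unadjusted) odds ratio for a 2x2 table with a Wald-type confidence interval. It is not the study's SPSS procedure, it does not adjust for confounders as logistic regression would, and the function name and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    z=1.96 corresponds to an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (not data from the study).
or_value, ci_low, ci_high = odds_ratio_ci(30, 70, 15, 85)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that excludes 1 corresponds to significance at roughly the 5% level, which matches the p<0.05 threshold stated in the abstract.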
Procedia PDF Downloads 524
42800 Predictive Analytics for Theory Building
Authors: Ho-Won Jung, Donghun Lee, Hyung-Jin Kim
Abstract:
Predictive analytics (data analysis) uses a subset of measurements (the features, predictors, or independent variables) to predict another measurement (the outcome, target, or dependent variable) on a single person or unit. It applies empirical methods from statistics, operations research, and machine learning to predict future or otherwise unknown events or outcomes for a single person or unit, based on patterns in data. Most analyses of metabolic syndrome are not predictive analytics but statistical explanatory studies that build a proposed model (theory building) and then validate the hypothesized metabolic syndrome predictors (theory testing). A proposed theoretical model is formed from causal hypotheses that specify how and why certain empirical phenomena occur. Predictive analytics and explanatory modeling have their own territories in analysis. However, predictive analytics can play vital roles in explanatory studies, i.e., in scientific activities such as theory building, theory testing, and relevance assessment. In this context, this study demonstrates how to use predictive analytics to support theory building (i.e., hypothesis generation). For this purpose, the study used a big data predictive analytics platform TM based on a co-occurrence graph. The co-occurrence graph is depicted with nodes (e.g., items in a basket) and arcs (direct connections between two nodes), where items in a basket are fully connected. A cluster is a collection of fully connected items, where the specific group of items has co-occurred in several rows of a data set. Clusters can be ranked using importance metrics such as node size (number of items), frequency, and surprise (observed frequency vs. expected), among others. The size of a graph can be represented by the numbers of nodes and arcs. Since the size of a co-occurrence graph does not depend directly on the number of observations (transactions), huge numbers of transactions can be represented and processed efficiently.
For a demonstration, a total of 13,254 metabolic syndrome training observations are fed into the analytics platform to generate rules (potential hypotheses). Each observation includes 31 predictors associated with, for example, sociodemographics, habits, and activities. Some, such as cancer examination, house type, and vaccination, are intentionally included to gain predictive analytics insights on variable selection. The platform automatically generates plausible hypotheses (rules) without statistical modeling. The rules are then validated with an external testing dataset of 4,090 observations. The results, as a kind of inductive reasoning, show potential hypotheses extracted as a set of association rules. Most statistical models generate just one estimated equation. By contrast, the set of rules in this study (many estimated equations from a statistical perspective) may imply heterogeneity in a population (i.e., different subpopulations with unique features are aggregated). The next step of theory development, i.e., theory testing, statistically tests whether a proposed theoretical model is a plausible explanation of the phenomenon of interest. If the generated hypotheses are tested statistically with several thousand observations, most of the variables will become significant as the p-values approach zero. Thus, theory validation needs statistical methods that use only part of the observations, such as bootstrap resampling with an appropriate sample size.
Keywords: explanatory modeling, metabolic syndrome, predictive analytics, theory building
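The co-occurrence graph described in the abstract above can be sketched roughly as follows. This is only an illustration, not the platform's implementation: the function names, the toy rows, and the exact definition of "surprise" (observed pair frequency divided by the frequency expected if the two items were independent) are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(rows):
    """Count items (nodes) and unordered item pairs (arcs) across rows."""
    item_counts = Counter()
    pair_counts = Counter()
    for row in rows:
        items = sorted(set(row))  # items in one row are fully connected
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return item_counts, pair_counts


def surprise(pair, item_counts, pair_counts, n_rows):
    """Observed pair frequency over the frequency expected under independence.

    `pair` must list its two items in sorted order, as stored by cooccurrence().
    """
    first, second = pair
    observed = pair_counts[pair] / n_rows
    expected = (item_counts[first] / n_rows) * (item_counts[second] / n_rows)
    return observed / expected


# Hypothetical rows for illustration only (not data from the study).
rows = [
    ["smoker", "low_activity", "mets"],
    ["smoker", "mets"],
    ["low_activity"],
    ["smoker", "low_activity", "mets"],
]
items, pairs = cooccurrence(rows)
print(len(items), "nodes,", len(pairs), "arcs")
print(surprise(("mets", "smoker"), items, pairs, len(rows)))
```

Adding more rows that repeat the same items changes the counts but not the number of nodes or arcs, which is the property the abstract relies on when it says that graph size does not depend directly on the number of transactions.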
Procedia PDF Downloads 283
42799 Time Series Modelling and Prediction of River Runoff: Case Study of Karkheh River, Iran
Authors: Karim Hamidi Machekposhti, Hossein Sedghi, Abdolrasoul Telvari, Hossein Babazadeh
Abstract:
The rainfall-runoff phenomenon is a chaotic and complex outcome of nature that requires sophisticated modelling and simulation methods for explanation and use. Time series modelling allows runoff data analysis and can be used as a forecasting tool. This paper attempts to model river runoff data and predict the future behaviour of the river based on past annual observations of river runoff. The river runoff analysis and prediction are performed using an ARIMA model. To evaluate the accuracy of predictions for hydrological events such as rainfall and runoff, we use the applicable statistical formulae. The good agreement between predicted and observed river runoff, as measured by the coefficient of determination (R²), shows that ARIMA(4,1,1) is a suitable model for predicting Karkheh River runoff in Iran.
Keywords: time series modelling, ARIMA model, river runoff, Karkheh River, CLS method
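Two ingredients named in the abstract above are easy to show in isolation: the differencing implied by the middle term of ARIMA(4,1,1), and the coefficient of determination used to compare predictions with observations. The sketch below does not fit an ARIMA model; the function names are hypothetical, and R² is taken here as 1 - SS_res/SS_tot, which is one common definition and may differ from the one the authors used.

```python
def difference(series, d=1):
    """Apply first differencing d times (the 'I' part of ARIMA(p, d, q))."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def r_squared(observed, predicted):
    """Coefficient of determination, defined as 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical annual values for illustration only.
runoff = [1, 4, 9, 16]
print(difference(runoff))        # once-differenced series
print(difference(runoff, d=2))   # twice-differenced series
```

With d = 1, the autoregressive and moving-average terms are fitted to the differenced series rather than to the raw annual runoff.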
Procedia PDF Downloads 345
42798 Analysis of Patient No-Shows According to Health Conditions
Authors: Sangbok Lee
Abstract:
Much effort has gone into process improvement for outpatient clinics to provide quality and acute care to patients. One such effort is no-show analysis or prediction. This work analyzes patient no-shows in relation to patient health conditions. The health conditions refer to the clinical symptoms that each patient has, out of the following: hyperlipidemia, diabetes, metastatic solid tumor, dementia, chronic obstructive pulmonary disease, hypertension, coronary artery disease, myocardial infarction, congestive heart failure, atrial fibrillation, stroke, drug dependence abuse, schizophrenia, major depression, and pain. A dataset from a regional hospital is used to find the relationship between the number of symptoms and no-show probabilities. Additional analysis reveals how each symptom or combination of symptoms affects no-shows. In these analyses, patients are cross-classified by age and gender. The findings will be used to take extra care of patients with particular health conditions. They will be strongly encouraged to visit clinics by being informed more clearly about their health conditions and possible consequences. Moreover, this work will be used in preparing institutional guidelines for patient reminder systems.
Keywords: healthcare system, no show analysis, process improvement, statistical data analysis
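The relationship between the number of symptoms and the no-show probability mentioned in the abstract above could be estimated, in its simplest form, as a grouped proportion. The sketch below assumes each record is a pair of (list of conditions, no-show flag); that record layout, the function name and the sample records are hypothetical.

```python
from collections import defaultdict


def no_show_rate_by_condition_count(records):
    """Empirical no-show rate for each number of recorded conditions."""
    totals = defaultdict(int)
    misses = defaultdict(int)
    for conditions, no_show in records:
        count = len(conditions)
        totals[count] += 1
        misses[count] += int(no_show)
    return {count: misses[count] / totals[count] for count in sorted(totals)}


# Hypothetical records for illustration only (not data from the hospital).
records = [
    (["diabetes"], False),
    (["diabetes", "hypertension"], True),
    (["stroke", "pain"], False),
    ([], False),
]
rates = no_show_rate_by_condition_count(records)
print(rates)
```

The cross-classification by age and gender described in the abstract would amount to grouping on those attributes as well as on the condition count.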
Procedia PDF Downloads 235
42797 The Effect of Core Training on Physical Fitness Characteristics in Male Volleyball Players
Authors: Sibel Karacaoglu, Fatma Ç. Kayapinar
Abstract:
The aim of the study is to investigate the effect of a core training program on physical fitness characteristics and body composition in male volleyball players. Twenty-six male university volleyball team players aged 19 to 24 years, with no health problems or injuries, participated in the study. Subjects were randomly divided into a training group (TG) and a control group (CG). Data from the twenty-one players who completed all training sessions were used for statistical analysis (TG, n=11; CG, n=10). A core training program was applied to the training group three days a week for 10 weeks. The control group did not receive any training. Before and after the 10-week training program, pre- and post-testing was performed, comprising body composition measurements (weight, BMI, bioelectrical impedance analysis) and physical fitness measurements including flexibility (sit and reach test), muscle strength (back, leg and grip strength by dynamometer), muscle endurance (sit-up and push-up tests), power (one-legged jump and vertical jump tests), speed (20m sprint, 30m sprint) and balance (one-legged standing test). Changes between pre- and post-test values within each group were assessed using the dependent t test. According to the statistical analysis, no significant difference in body composition was found between pre- and post-test values in either group. In the training group, all physical fitness measurements improved significantly after the core training program (p<0.05) except 30m speed and handgrip strength (p>0.05). On the other hand, in the control group only the 20m speed test values improved between pre- and post-test (p<0.05), while the other physical fitness test values did not differ (p>0.05). The results of the study suggest that the core training program has a positive effect on physical fitness characteristics in male volleyball players.
Keywords: body composition, core training, physical fitness, volleyball
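The dependent (paired) t test used in the abstract above compares each player's pre- and post-test values. A minimal sketch of the test statistic follows; the function name and the sample values are hypothetical, and the p-value lookup against a t distribution with n - 1 degrees of freedom is left out.

```python
import math


def paired_t_statistic(pre, post):
    """t statistic and degrees of freedom for a dependent (paired) t test."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / math.sqrt(var_diff / n)
    return t, n - 1


# Hypothetical pre/post scores for illustration only.
pre_scores = [10, 12, 9, 11]
post_scores = [12, 13, 11, 14]
t_value, dof = paired_t_statistic(pre_scores, post_scores)
print(f"t = {t_value:.3f}, df = {dof}")
```

Because the test works on within-subject differences, it is applied separately to the training group and the control group, as the abstract describes.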
Procedia PDF Downloads 349
42796 Part of Speech Tagging Using Statistical Approach for Nepali Text
Authors: Archit Yajnik
Abstract:
Part of Speech (POS) tagging has always been a challenging task in the field of Natural Language Processing. This article presents POS tagging for Nepali text using a Hidden Markov Model (HMM) and the Viterbi algorithm. Training and testing data sets are randomly separated from an annotated corpus of Nepali text. Both methods are applied to the data sets. The Viterbi algorithm is found to be computationally faster and more accurate than HMM. An accuracy of 95.43% is achieved using the Viterbi algorithm. An error analysis of where the mismatches occurred is discussed in detail.
Keywords: hidden markov model, natural language processing, POS tagging, viterbi algorithm
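As a rough illustration of the Viterbi algorithm named in the abstract above, the sketch below finds the most probable tag sequence given start, transition and emission probabilities. The two-tag model, its probabilities and the English toy tokens are hypothetical and stand in for values that would be estimated from the annotated Nepali corpus.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state (tag) sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t,
    #               previous state on that path)
    best = [{
        s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(obs, 0.0), p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical toy model for illustration only.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.8, "VERB": 0.2}
trans_p = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.6, "VERB": 0.4},
}
emit_p = {
    "NOUN": {"dogs": 0.6, "bark": 0.1},
    "VERB": {"dogs": 0.05, "bark": 0.7},
}
tags = viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p)
print(tags)
```

For long sentences, implementations usually work with log probabilities to avoid numerical underflow; plain products are kept here for readability.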
Procedia PDF Downloads 331
42795 Development of Automatic Laser Scanning Measurement Instrument
Authors: Chien-Hung Liu, Yu-Fen Chen
Abstract:
This study used a triangulation laser probe and a three-axis mobile platform for surface measurement, programmed the system, and applied it to real-time statistical analysis of the measured data. This structure was used to design a system integration program: the triangulation laser probe performs non-contact measurement by scattering or reflection, the captured signals are transferred to the computer through RS-232, and RS-485 is used to control the three-axis platform for a wide range of measurement. The data captured by the laser probe are formed into a 3D surface. This study constructed an optical measurement application program based on the concept of a visual programming language. First, the signals are transmitted to the computer through RS-232/RS-485, and then the signals are stored and recorded in the graphical interface in a timely manner. The program analyzes the various messages and produces suitable graphs and data processing, providing users with friendly graphical interfaces and data processing state monitoring, and indicates graphically whether the present data are normal. The major functions of the measurement system developed in this study are thickness measurement, SPC, surface smoothness analysis, and analytical calculation of a trend line. A result report can be generated and printed promptly. This study measured different heights and surfaces successfully, performed on-line data analysis and processing effectively, and developed a man-machine interface for users to operate.
Keywords: laser probe, non-contact measurement, triangulation measurement principle, statistical process control, LabVIEW
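Two of the functions listed in the abstract above, SPC and trend-line calculation, can be sketched in a few lines. This is not the authors' LabVIEW program: the function names and sample readings are hypothetical, and the control limits use the sample standard deviation as a simplification (control charts in practice often estimate sigma from moving ranges or subgroup ranges).

```python
import statistics


def control_limits(samples, k=3.0):
    """Lower limit, center line and upper limit at k standard deviations."""
    center = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return center - k * sigma, center, center + k * sigma


def trend_line(ys):
    """Least-squares slope and intercept for ys measured at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical thickness readings for illustration only.
readings = [9.0, 10.0, 11.0]
lcl, center, ucl = control_limits(readings)
out_of_control = [r for r in readings if not lcl <= r <= ucl]
print(lcl, center, ucl, out_of_control)
print(trend_line([1.0, 3.0, 5.0, 7.0]))
```

A reading outside the limits would be the kind of abnormal value the abstract says the interface flags graphically.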
Procedia PDF Downloads 366
42794 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands
Authors: Julio Albuja, David Zaldumbide
Abstract:
Data analysis is an important step before making a decision about money. The aim of this work is to analyze the factors that influence the final price of houses through data mining algorithms. To the best of our knowledge, previous work was reviewed only to compare results. Before the data set was used, the Z-transformation was applied to standardize the data to a common scale. The data were then classified into two groups so that they could be visualized in a readable format. A decision tree was built, and the data are displayed graphically so that the results and the influence of each factor are easy to see. The definitions of these methods are described, as well as the results. Finally, conclusions and recommendations based on the results of our research are presented, making it easier to apply these algorithms to a customized data set.
Keywords: algorithms, data, decision tree, transformation
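The Z-transformation mentioned in the abstract above rescales each variable to mean 0 and standard deviation 1 via z = (x - mean) / sd. A minimal sketch follows; the function name and the sample prices are hypothetical, and the population standard deviation is used here, whereas some tools use the sample standard deviation instead.

```python
import statistics


def z_transform(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("constant column cannot be standardized")
    return [(v - mean) / sd for v in values]


# Hypothetical prices for illustration only.
prices = [2.0, 4.0, 6.0]
z_scores = z_transform(prices)
print(z_scores)
```

Standardizing each column this way puts variables measured in different units on a comparable scale before they are compared or plotted.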
Procedia PDF Downloads 379
42793 Exploratory Study of the Influencing Factors for Hotels' Competitors
Authors: Asma Ameur, Dhafer Malouche
Abstract:
Hotel competitiveness research is an essential phase of the marketing strategy for any hotel. Knowing a hotel's competitors helps the hotelier to grasp the hotel's position in the market and helps the customer to make the right choice in picking a hotel. Competitiveness is thus an important indicator that can be influenced by various factors. In fact, competitiveness, the ability to cope with competition, remains a difficult and complex concept to define and to exploit. Therefore, the purpose of this article is to carry out an exploratory study to calculate a competitiveness indicator for hotels. The paper also makes it possible to determine the criteria that directly or indirectly affect the image and perception of a hotel. The present research looks for the right model of hotel competitiveness. For this reason, we draw on different theoretical contributions in the field of machine learning. We use statistical techniques such as Principal Component Analysis (PCA) to reduce the dimensions, as well as other statistical modeling techniques. This paper presents a survey of the techniques and methods used in hotel competitiveness research. Furthermore, this study allows us to deduce the significant variables that influence the determination of a hotel's competitors. Lastly, the experiments discussed in this article found that hotel competitors are influenced by several factors to different degrees.
Keywords: competitiveness, e-reputation, hotels' competitors, online hotel review, principal component analysis, statistical modeling
Procedia PDF Downloads 122
42792 Artificial Intelligence Approach to Water Treatment Processes: Case Study of Daspoort Treatment Plant, South Africa
Authors: Olumuyiwa Ojo, Masengo Ilunga
Abstract:
The artificial neural network (ANN) has broken the bounds of conventional programming, which is essentially a matter of garbage in, garbage out, through its ability to mimic the human brain. Its ability to adopt, adapt, adjust, evaluate, learn and recognize the relationships, behavior, and patterns in a series of data sets presented to it is modeled on the human reasoning and learning mechanism. This study therefore aimed at modeling the wastewater treatment process in order to accurately diagnose water control problems for effective treatment. For this study, a staged ANN model development and evaluation methodology was employed. The source data analysis stage involved a statistical analysis of the data used in modeling; in the model development stage, candidate ANN architectures were developed and then evaluated using a historical data set. The model was developed using historical data obtained from the Daspoort Wastewater Treatment plant, South Africa. The resulting design dimensions and model for the wastewater treatment plant provided good results. The parameters considered were temperature, pH value, colour, turbidity, amount of solids and acidity. Others were total hardness, Ca hardness, Mg hardness, and chloride. This enables the ANN to handle and represent more complex problems than conventional programming is capable of.
Keywords: ANN, artificial neural network, wastewater treatment, model, development
Procedia PDF Downloads 154
42791 Assessing the Self-Directed Learning Skills of the Undergraduate Nursing Students in a Medical University in Bahrain: A Quantitative Study
Authors: Catherine Mary Abou-Zaid
Abstract:
This quantitative study discusses concerns about the self-directed learning (SDL) skills of undergraduate nursing students at a medical university in Bahrain. The study covered all 4 years of the undergraduate nursing programme and compiled data collected from the students themselves by survey questionnaire. The aim of the study is to understand and change attitudes towards self-directed learning among the undergraduate students. The SDL of the undergraduate student nurses has been observed to be lacking, and their motivation to work without supervision outside the classroom is very low. Their use of the resources available on the virtual learning environment and within the university is not as good as it should be for university students at this level. They do not use them to their own advantage. They are not prepared for the transition from high school to an academic environment such as a university or college. For some students it is the first time in their academic lives that they have shared a classroom with the opposite sex. For some this is a major issue, and we as academics need to be aware of all the issues that they bring with them to higher education. Design Methodology: The methodology chosen was a quantitative design using convenience sampling of the students, who were asked to complete a survey questionnaire. This sampling method was chosen because of time constraints. The questionnaire was completed by the undergraduate students themselves while in class. The questionnaire was analyzed with the Statistical Package for Social Sciences (SPSS), the results were interpreted by the researcher, and the findings are published in the paper. The analyzed data will also be reported, and from this information we as educators will be able to see the students’ weaknesses regarding self-directed learning.
The aims and objectives of the research will be used as recommendations for improving the resources available to students to develop their SDL skills. Conclusion: The results will give educators an insight into how we can change the self-directed learning techniques of the students and enable them to embrace these skills and to focus more on being self-directed in their studies, rather than having to be put on an SDL pathway by the educators themselves. This evidence will come from the analysis of the statistical data. It may even change the way in which students are selected for the nursing programme. These recommendations will be reported to the head of school and to the nursing faculty.
Keywords: self-directed learning, undergraduate students, transition, statistical package for social sciences (SPSS), higher education
Procedia PDF Downloads 320
42790 In-Depth Analysis of Involved Factors to Car-Motorcycle Accidents in Budapest City
Authors: Danish Farooq, Janos Juhasz
Abstract:
The number of car-motorcycle accidents has risen in recent years, mainly causing rider fatalities and serious injuries. In-depth crash investigation methods aim to identify the main factors likely to be involved in fatal road accidents and injury outcomes. The main objective of this study is to investigate the factors involved in car-motorcycle accidents in Budapest city. The procedure included statistical analysis and data sampling to identify car-motorcycle accidents by dominant accident types based on collision configurations. The police report was used as the data source for the specified accidents, and simulation models were plotted to scale (M 1:200). Car-motorcycle accidents were simulated in Virtual Crash software for the 5 seconds before the collision. The simulation results showed that the main factors involved in car-motorcycle accidents were human behavior and view obstructions. The comprehensive, in-depth analysis also found that most of the car drivers and riders were unable to perform collision avoidance manoeuvres before the collision. This study can help traffic safety authorities to focus on the simulated factors in order to address road safety issues in car-motorcycle accidents. The study also proposes safety measures to improve safe movement among road users.
Keywords: car motorcycle accidents, in-depth analysis, microscopic simulation, safety measures
Procedia PDF Downloads 154
42789 Regional Hydrological Extremes Frequency Analysis Based on Statistical and Hydrological Models
Authors: Hadush Kidane Meresa
Abstract:
Frequency analysis of hydrological extremes is the foundation for hydraulic engineering design, flood protection, drought management, and water resources management and planning, so that the available water resources can be used to meet the objectives of different organizations and sectors in a country. The spatial variation of the statistical characteristics of extreme flood and drought events is key to regional flood and drought analysis and mitigation management. For regions with different hydro-climates where data sets are short, scarce, of poor quality or otherwise insufficient, regionalization methods are applied to transfer at-site data to a region. This study aims at regional high- and low-flow frequency analysis for river basins in Poland. The frequent occurrence of hydrological extremes in the region and rapid water resources development in these basins have caused serious concern over the magnitude and frequency of floods and droughts of rivers in Poland. The magnitude and frequency of high and low flows in the basins are needed for flood and drought planning, management and protection, now and in the future. Hydrologically homogeneous high- and low-flow regions are formed by cluster analysis of site characteristics, using hierarchical and C-means clustering and the PCA method. Regional homogeneity is tested statistically using discordancy and heterogeneity measures. In accordance with the results of the tests, the river basin region has been divided into ten homogeneous regions. In this study, frequency analysis of high and low flows, using the annual maximum (AM) series for high flows and the 7-day minimum series for low flows, is conducted using six statistical distributions.
The L-moment and LL-moment methods indicated a homogeneous region over the entire province, with the Generalized logistic (GLOG), Generalized extreme value (GEV), Pearson type III (P-III), Generalized Pareto (GPAR), Weibull (WEI) and Power (PR) distributions as the regional drought and flood frequency distributions. The 95th percentile and flow duration curves for 1, 7, 10 and 30 days have been plotted for 10 stations. However, the cluster analysis produced two regions, in the west and east of the province, for which the L-moment and LL-moment methods demonstrated homogeneity, with the GLOG and Pearson type III (P-III) distributions as the regional frequency distributions for each region, respectively. The spatial variation and regional frequency distribution of flood and drought characteristics were analysed for the 10 best catchments selected from the whole region. Besides the main variable (streamflow: high and low), variables more closely related to physiographic and drainage characteristics were used to identify and delineate homogeneous pools and to derive the best regression models for ungauged sites. These are mean annual rainfall, seasonal flow, average slope, NDVI, aspect, flow length, flow direction, maximum soil moisture, elevation, and drainage order. The regional high-flow or low-flow relationship between a streamflow characteristic (AM or 7-day mean annual low flow) and selected basin characteristics is developed using a Generalized Linear Mixed Model (GLMM) and a Generalized Least Squares (GLS) regression model, providing a simple and effective method for estimating floods and droughts of desired return periods for ungauged catchments.
Keywords: flood, drought, frequency, magnitude, regionalization, stochastic, ungauged, Poland
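To make the L-moment step in the abstract above more concrete, the sketch below computes the first three sample L-moments from unbiased probability-weighted moments, together with a 7-day low-flow statistic. It is an illustration only: the function names and sample values are hypothetical, the LL-moment variant is not shown, and "7-day minimum" is assumed here to mean the minimum of the 7-day moving-average flow.

```python
def sample_l_moments(values):
    """First three sample L-moments (l1, l2, l3) from unbiased PWMs."""
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    # Probability-weighted moments b0, b1, b2 with i running from 1 to n.
    b0 = sum(x) / n
    b1 = sum((i - 1) / (n - 1) * xi for i, xi in enumerate(x, start=1)) / n
    b2 = sum(
        (i - 1) * (i - 2) / ((n - 1) * (n - 2)) * xi
        for i, xi in enumerate(x, start=1)
    ) / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    return l1, l2, l3


def min_7day_mean(daily_flows):
    """Minimum of the 7-day moving-average flow (assumed low-flow index)."""
    if len(daily_flows) < 7:
        raise ValueError("need at least seven daily values")
    return min(
        sum(daily_flows[i:i + 7]) / 7
        for i in range(len(daily_flows) - 6)
    )


# Hypothetical annual maxima for illustration only.
l1, l2, l3 = sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0])
print("L-CV =", l2 / l1, "L-skewness =", l3 / l2)
print(min_7day_mean(list(range(1, 11))))
```

Ratios such as L-CV (l2/l1) and L-skewness (l3/l2) are the site statistics on which discordancy and heterogeneity measures of the kind mentioned in the abstract are typically based.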
Procedia PDF Downloads 605