Search results for: Correlated Binary data

7253 A System for Analyzing and Eliciting Public Grievances Using Cache Enabled Big Data

Abstract:

The system for analyzing and eliciting public grievances serves its main purpose to receive and process all sorts of complaints from the public and respond to users. Due to the more number of complaint data becomes big data which is difficult to store and process. The proposed system uses HDFS to store the big data and uses MapReduce to process the big data. The concept of cache was applied in the system to provide immediate response and timely action using big data analytics. Cache enabled big data increases the response time of the system. The unstructured data provided by the users are efficiently handled through map reduce algorithm. The processing of complaints takes place in the order of the hierarchy of the authority. The drawbacks of the traditional database system used in the existing system are set forth by our system by using Cache enabled Hadoop Distributed File System. MapReduce framework codes have the possible to leak the sensitive data through computation process. We propose a system that add noise to the output of the reduce phase to avoid signaling the presence of sensitive data. If the complaints are not processed in the ample time, then automatically it is forwarded to the higher authority. Hence it ensures assurance in processing. A copy of the filed complaint is sent as a digitally signed PDF document to the user mail id which serves as a proof. The system report serves to be an essential data while making important decisions based on legislation.

Keywords: Big Data, Hadoop, HDFS, Caching, MapReduce, web personalization, e-governance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1587

7252 Computational Method for Annotation of Protein Sequence According to Gene Ontology Terms

Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias

Abstract:

Annotation of a protein sequence is pivotal for the understanding of its function. Accuracy of manual annotation provided by curators is still questionable by having lesser evidence strength and yet a hard task and time consuming. A number of computational methods including tools have been developed to tackle this challenging task. However, they require high-cost hardware, are difficult to be setup by the bioscientists, or depend on time intensive and blind sequence similarity search like Basic Local Alignment Search Tool. This paper introduces a new method of assigning highly correlated Gene Ontology terms of annotated protein sequences to partially annotated or newly discovered protein sequences. This method is fully based on Gene Ontology data and annotations. Two problems had been identified to achieve this method. The first problem relates to splitting the single monolithic Gene Ontology RDF/XML file into a set of smaller files that can be easy to assess and process. Thus, these files can be enriched with protein sequences and Inferred from Electronic Annotation evidence associations. The second problem involves searching for a set of semantically similar Gene Ontology terms to a given query. The details of macro and micro problems involved and their solutions including objective of this study are described. This paper also describes the protein sequence annotation and the Gene Ontology. The methodology of this study and Gene Ontology based protein sequence annotation tool namely extended UTMGO is presented. Furthermore, its basic version which is a Gene Ontology browser that is based on semantic similarity search is also introduced.

Keywords: automatic clustering, bioinformatics tool, gene ontology, protein sequence annotation, semantic similarity search

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3122

7251 Cobalamin, Folate and Metabolic Syndrome Parameters in Pediatric Morbid Obesity and Metabolic Syndrome

Authors: Mustafa M. Donma, Orkide Donma

Abstract:

Obesity is known to be associated with many clinically important diseases including metabolic syndrome (MetS). Vitamin B₁₂ plays essential roles in fat and protein metabolisms and its cooperation with vitamin B₉is well-known. The aim of this study is to investigate the possible contributions as well as associations of these micronutrients upon obesity and MetS during childhood. A total of 128 children admitted to Namik Kemal University, Medical Faculty, Department of Pediatrics Outpatient Clinics were included into the scope of this study. The mean age±SEM of 92 morbid obese (MO) children and 36 with MetS were 118.3±3.8 months and 129.5±6.4 months, respectively (p > 0.05). The study was approved by Namık Kemal University, Medical Faculty Ethics Committee. Written informed consent forms were obtained from the parents. Demographic features and anthropometric measurements were recorded. WHO BMI-for age percentiles were used. The values above 99 percentile were defined as MO. Components of MetS [waist circumference (WC), fasting blood glucose (FBG), triacylglycerol (TRG), high density lipoprotein cholesterol (HDL-Chol), systolic pressure (SP), diastolic pressure (DP)] were determined. Routine laboratory tests were performed. Serum vitamin B₁₂ concentrations were measured using electrochemiluminescence immunoassay. Vitamin B₉ was analyzed by an immunoassay analyzer. Values for vitamin B₁₂ < 148 pmol/L, 148-221 pmol/L, > 221 pmol/L were accepted as low, borderline and normal, respectively. Vitamin B₉ levels ≤ 4 mcg/L defined deficiency state. Statistical evaluations were performed by SPSSx Version 16.0. p≤0.05 was accepted as statistical significance level. Statistically higher body mass index (BMI), WC, hip circumference (C) and neck C were calculated in MetS group compared to children with MO. No difference was noted for head C. All MetS components differed between the groups (SP, DP p < 0.001; WC, FBG, TRG p < 0.01; HDL-Chol p < 0.05). Significantly decreased vitamin B₉ and vitamin B₁₂ levels were detected (p < 0.05) in children with MetS. In both groups percentage of folate deficiency was 5.5%. No cases were below < 148 pmol/L. However, in MO group 14.3% and in MetS group 22.2% of the cases were of borderline status. In MO group B₁₂ levels were negatively correlated with BMI, WC, hip C and head C, but not with neck C. WC, hip C, head C and neck C were all negatively correlated with HDL-Chol. None of these correlations were observed in the group of children with MetS. Strong positive correlation between FBG and insulin as well as strong negative correlation between TRG and HDL-Chol detected in MO children were lost in MetS group. Deficiency state end-products of both B₉and B₁₂ may interfere with the expected profiles of MetS components. In this study, the alterations in MetS components affected vitamin B₁₂ metabolism and also its associations with anthropometric body measurements. Further increases in vitamin B₁₂and vitamin B₉deficiency in MetS associated with the increased vitamin B₁₂as well as vitamin B₉ deficiency metabolites may add to MetS parameters.

Keywords: Children, cobalamin, folate, metabolic syndrome, obesity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1131

7250 Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure

Authors: S.Aranganayagi, K.Thangavel

Abstract:

K-Modes is an extension of K-Means clustering algorithm, developed to cluster the categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching or mismatching measure. Weight of attribute values contribute much in clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency of attribute values in the cluster and in the data set. The new weighted measure is experimented with the data sets obtained from the UCI data repository. The results are compared with K-Modes and K-representative, which show that the new measure generates clusters with high purity.

Keywords: Clustering, categorical data, K-Modes, weighted dissimilarity measure

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3684

7249 Mobile Phone as a Tool for Data Collection in Field Research

Authors: Sandro Mourão, Karla Okada

Abstract:

The necessity of accurate and timely field data is shared among organizations engaged in fundamentally different activities, public services or commercial operations. Basically, there are three major components in the process of the qualitative research: data collection, interpretation and organization of data, and analytic process. Representative technological advancements in terms of innovation have been made in mobile devices (mobile phone, PDA-s, tablets, laptops, etc). Resources that can be potentially applied on the data collection activity for field researches in order to improve this process. This paper presents and discuss the main features of a mobile phone based solution for field data collection, composed of basically three modules: a survey editor, a server web application and a client mobile application. The data gathering process begins with the survey creation module, which enables the production of tailored questionnaires. The field workforce receives the questionnaire(s) on their mobile phones to collect the interviews responses and sending them back to a server for immediate analysis.

Keywords: Data Gathering, Field Research, Mobile Phone, Survey.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2049

7248 On Pooling Different Levels of Data in Estimating Parameters of Continuous Meta-Analysis

Authors: N. R. N. Idris, S. Baharom

Abstract:

A meta-analysis may be performed using aggregate data (AD) or an individual patient data (IPD). In practice, studies may be available at both IPD and AD level. In this situation, both the IPD and AD should be utilised in order to maximize the available information. Statistical advantages of combining the studies from different level have not been fully explored. This study aims to quantify the statistical benefits of including available IPD when conducting a conventional summary-level meta-analysis. Simulated meta-analysis were used to assess the influence of the levels of data on overall meta-analysis estimates based on IPD-only, AD-only and the combination of IPD and AD (mixed data, MD), under different study scenario. The percentage relative bias (PRB), root mean-square-error (RMSE) and coverage probability were used to assess the efficiency of the overall estimates. The results demonstrate that available IPD should always be included in a conventional meta-analysis using summary level data as they would significantly increased the accuracy of the estimates.On the other hand, if more than 80% of the available data are at IPD level, including the AD does not provide significant differences in terms of accuracy of the estimates. Additionally, combining the IPD and AD has moderating effects on the biasness of the estimates of the treatment effects as the IPD tends to overestimate the treatment effects, while the AD has the tendency to produce underestimated effect estimates. These results may provide some guide in deciding if significant benefit is gained by pooling the two levels of data when conducting meta-analysis.

Keywords: Aggregate data, combined-level data, Individual patient data, meta analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1737

7247 Measurement of Greenhouse Gas Emissions from Sugarcane Plantation Soil in Thailand

Authors: Wilaiwan Sornpoon, Sébastien Bonnet, Savitri Garivait

Abstract:

Continuous measurements of greenhouse gases (GHGs) emitted from soils are required to understand diurnal and seasonal variations in soil emissions and related mechanism. This understanding plays an important role in appropriate quantification and assessment of the overall change in soil carbon flow and budget. This study proposes to monitor GHGs emissions from soil under sugarcane cultivation in Thailand. The measurements were conducted over 379 days. The results showed that the total net amount of GHGs emitted from sugarcane plantation soil amounts to 36 Mg CO_2eqha^-1. Carbon dioxide (CO₂) and nitrous oxide (N₂O) were found to be the main contributors to the emissions. For methane (CH₄), the net emission was found to be almost zero. The measurement results also confirmed that soil moisture content and GHGs emissions are positively correlated.

Keywords: Soil, GHG emission, Sugarcane, Agriculture, Thailand.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2436

7246 CAGE Questionnaire as a Screening Tool for Hazardous Drinking in an Acute Admissions Ward: Frequency of Application and Comparison with AUDIT-C Questionnaire

Authors: Ammar Ayad Issa Al-Rifaie, Zuhreya Muazu, Maysam Ali Abdulwahid, Dermot Gleeson

Abstract:

The aim of this audit was to examine the efficiency of alcohol history documentation and screening for hazardous drinkers at the Medical Admission Unit (MAU) of Northern General Hospital (NGH), Sheffield, to identify any potential for enhancing clinical practice. Data were collected from medical clerking sheets, ICE system and directly from 82 patients by three junior medical doctors using both CAGE questionnaire and AUDIT-C tool for newly admitted patients to MAU in NGH, in the period between January and March 2015. Alcohol consumption was documented in around two-third of the patient sample and this was documented fairly accurately by health care professionals. Some used subjective words such as 'social drinking' in the alcohol units’ section of the history. CAGE questionnaire was applied to only four patients and none of the patients had documented advice, education or referral to an alcohol liaison team. AUDIT-C tool had identified 30.4%, while CAGE 10.9%, of patients admitted to the NGH MAU as hazardous drinkers. The amount of alcohol the patient consumes positively correlated with the score of AUDIT-C (Pearson correlation 0.83). Re-audit is planned to be carried out after integrating AUDIT-C tool as labels in the notes and presenting a brief teaching session to junior doctors. Alcohol misuse screening is not adequately undertaken and no appropriate action is being offered to hazardous drinkers. CAGE questionnaire is poorly applied to patients and when satisfactory and adequately used has low sensitivity to detect hazardous drinkers in comparison with AUDIT-C tool. Re-audit of alcohol screening practice after introducing AUDIT-C tool in clerking sheets (as labels) is required to compare the findings and conclude the audit cycle.

Keywords: Alcohol screening, AUDIT-C, CAGE, Hazardous drinking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1903

7245 Multivariate Assessment of Mathematics Test Scores of Students in Qatar

Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski

Abstract:

Data on various aspects of education are collected at the institutional and government level regularly. In Australia, for example, students at various levels of schooling undertake examinations in numeracy and literacy as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another source of educational data collected internationally is via the PISA study which collects data from several countries when students are approximately 15 years of age and enables comparisons in the performance of science, mathematics and English between countries as well as ranking of countries based on performance in these standardised tests. As well as student and school outcomes based on the tests taken as part of the PISA study, there is a wealth of other data collected in the study including parental demographics data and data related to teaching strategies used by educators. Overall, an abundance of educational data is available which has the potential to be used to help improve educational attainment and teaching of content in order to improve learning outcomes. A multivariate assessment of such data enables multiple variables to be considered simultaneously and will be used in the present study to help develop profiles of students based on performance in mathematics using data obtained from the PISA study.

Keywords: Cluster analysis, education, mathematics, profiles.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 887

7244 DIVAD: A Dynamic and Interactive Visual Analytical Dashboard for Exploring and Analyzing Transport Data

Authors: Tin Seong Kam, Ketan Barshikar, Shaun Tan

Abstract:

The advances in location-based data collection technologies such as GPS, RFID etc. and the rapid reduction of their costs provide us with a huge and continuously increasing amount of data about movement of vehicles, people and goods in an urban area. This explosive growth of geospatially-referenced data has far outpaced the planner-s ability to utilize and transform the data into insightful information thus creating an adverse impact on the return on the investment made to collect and manage this data. Addressing this pressing need, we designed and developed DIVAD, a dynamic and interactive visual analytics dashboard to allow city planners to explore and analyze city-s transportation data to gain valuable insights about city-s traffic flow and transportation requirements. We demonstrate the potential of DIVAD through the use of interactive choropleth and hexagon binning maps to explore and analyze large taxi-transportation data of Singapore for different geographic and time zones.

Keywords: Geographic Information System (GIS), MovementData, GeoVisual Analytics, Urban Planning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2384

7243 Gene Expression Data Classification Using Discriminatively Regularized Sparse Subspace Learning

Authors: Chunming Xu

Abstract:

Sparse representation which can represent high dimensional data effectively has been successfully used in computer vision and pattern recognition problems. However, it doesn-t consider the label information of data samples. To overcome this limitation, we develop a novel dimensionality reduction algorithm namely dscriminatively regularized sparse subspace learning(DR-SSL) in this paper. The proposed DR-SSL algorithm can not only make use of the sparse representation to model the data, but also can effective employ the label information to guide the procedure of dimensionality reduction. In addition,the presented algorithm can effectively deal with the out-of-sample problem.The experiments on gene-expression data sets show that the proposed algorithm is an effective tool for dimensionality reduction and gene-expression data classification.

Keywords: sparse representation, dimensionality reduction, labelinformation, sparse subspace learning, gene-expression data classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1441

7242 Cost Effective Real-Time Image Processing Based Optical Mark Reader

Authors: Amit Kumar, Himanshu Singal, Arnav Bhavsar

Abstract:

In this modern era of automation, most of the academic exams and competitive exams are Multiple Choice Questions (MCQ). The responses of these MCQ based exams are recorded in the Optical Mark Reader (OMR) sheet. Evaluation of the OMR sheet requires separate specialized machines for scanning and marking. The sheets used by these machines are special and costs more than a normal sheet. Available process is non-economical and dependent on paper thickness, scanning quality, paper orientation, special hardware and customized software. This study tries to tackle the problem of evaluating the OMR sheet without any special hardware and making the whole process economical. We propose an image processing based algorithm which can be used to read and evaluate the scanned OMR sheets with no special hardware required. It will eliminate the use of special OMR sheet. Responses recorded in normal sheet is enough for evaluation. The proposed system takes care of color, brightness, rotation, little imperfections in the OMR sheet images.

Keywords: OMR, image processing, hough circle transform, interpolation, detection, Binary Thresholding.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1534

7241 On the Reduction of Side Effects in Tomography

Authors: V. Masilamani, C. Vanniarajan, Kamala Krithivasan

Abstract:

As the Computed Tomography(CT) requires normally hundreds of projections to reconstruct the image, patients are exposed to more X-ray energy, which may cause side effects such as cancer. Even when the variability of the particles in the object is very less, Computed Tomography requires many projections for good quality reconstruction. In this paper, less variability of the particles in an object has been exploited to obtain good quality reconstruction. Though the reconstructed image and the original image have same projections, in general, they need not be the same. In addition to projections, if a priori information about the image is known, it is possible to obtain good quality reconstructed image. In this paper, it has been shown by experimental results why conventional algorithms fail to reconstruct from a few projections, and an efficient polynomial time algorithm has been given to reconstruct a bi-level image from its projections along row and column, and a known sub image of unknown image with smoothness constraints by reducing the reconstruction problem to integral max flow problem. This paper also discusses the necessary and sufficient conditions for uniqueness and extension of 2D-bi-level image reconstruction to 3D-bi-level image reconstruction.

Keywords: Discrete Tomography, Image Reconstruction, Projection, Computed Tomography, Integral Max Flow Problem, Smooth Binary Image.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1365

7240 Slope Effect in Emission Evaluation to Assess Real Pollutant Factors

Authors: G. Meccariello, L. Della Ragione

Abstract:

The exposure to outdoor air pollution causes lung cancer and increases the risk of bladder cancer. Because air pollution in urban areas is mainly caused by transportation, it is necessary to evaluate pollutant exhaust emissions from vehicles during their realworld use. Nevertheless their evaluation and reduction is a key problem, especially in the cities, that account for more than 50% of world population. A particular attention was given to the slope variability along the streets during each journey performed by the instrumented vehicle. In this paper we dealt with the problem of describing a quantitatively approach for the reconstruction of GPS coordinates and altitude, in the context of correlation study between driving cycles / emission / geographical location, during an experimental campaign realized with some instrumented cars. Finally the slope analysis can be correlated to the emission and consumption values in a specific road position, and it could be evaluated its influence on their behaviour.

Keywords: Air pollution, Driving cycles, GPS signal, Slope, Emission factor, fuel consumption.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1895

7239 Determining Cluster Boundaries Using Particle Swarm Optimization

Authors: Anurag Sharma, Christian W. Omlin

Abstract:

Self-organizing map (SOM) is a well known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOMs, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The application of our method to unlabeled call data for a mobile phone operator demonstrates its feasibility. PSO algorithm utilizes U-matrix of SOMs to determine cluster boundaries; the results of this novel automatic method correspond well to boundary detection through visual inspection of code vectors and k-means algorithm.

Keywords: Particle swarm optimization, self-organizing maps, clustering, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1710

7238 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm

Authors: Ameur Abdelkader, Abed Bouarfa Hafida

Abstract:

Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis in case of a large amount of data. In fact, because of their volumes, their nature (semi or unstructured) and their variety, it is impossible to analyze efficiently big data via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm, Classification And Regression Trees (CART), in order to adapt it for big data analysis. The major changes of this algorithm are presented and then a version of the extended algorithm is defined in order to make it applicable for a huge quantity of data.

Keywords: Predictive analysis, big data, predictive analysis algorithms. CART algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1067

7237 A Business-to-Business Collaboration System That Promotes Data Utilization While Encrypting Information on the Blockchain

Authors: Hiroaki Nasu, Ryota Miyamoto, Yuta Kodera, Yasuyuki Nogami

Abstract:

To promote Industry 4.0 and Society 5.0 and so on, it is important to connect and share data so that every member can trust it. Blockchain (BC) technology is currently attracting attention as the most advanced tool and has been used in the financial field and so on. However, the data collaboration using BC has not progressed sufficiently among companies on the supply chain of the manufacturing industry that handle sensitive data such as product quality, manufacturing conditions, etc. There are two main reasons why data utilization is not sufficiently advanced in the industrial supply chain. The first reason is that manufacturing information is top secret and a source for companies to generate profits. It is difficult to disclose data even between companies with transactions in the supply chain. Blockchain mechanism such as Bitcoin using Public Key Infrastructure (PKI) requires plaintext to be shared between companies in order to verify the identity of the company that sent the data. Another reason is that the merits (scenarios) of collaboration data between companies are not specifically specified in the industrial supply chain. For these problems, this paper proposes a Business to Business (B2B) collaboration system using homomorphic encryption and BC technique. Using the proposed system, each company on the supply chain can exchange confidential information on encrypted data and utilize the data for their own business. In addition, this paper considers a scenario focusing on quality data, which was difficult to collaborate because it is top-secret. In this scenario, we show an implementation scheme and a benefit of concrete data collaboration by proposing a comparison protocol that can grasp the change in quality while hiding the numerical value of quality data.

Keywords: Business to business data collaboration, industrial supply chain, blockchain, homomorphic encryption.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 802

7236 An Approximation of Daily Rainfall by Using a Pixel Value Data Approach

Authors: Sarisa Pinkham, Kanyarat Bussaban

Abstract:

The research aims to approximate the amount of daily rainfall by using a pixel value data approach. The daily rainfall maps from the Thailand Meteorological Department in period of time from January to December 2013 were the data used in this study. The results showed that this approach can approximate the amount of daily rainfall with RMSE=3.343.

Keywords: Daily rainfall, Image processing, Approximation, Pixel value data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1752

7235 Automatic Generation of Ontology from Data Source Directed by Meta Models

Authors: Widad Jakjoud, Mohamed Bahaj, Jamal Bakkas

Abstract:

Through this paper we present a method for automatic generation of ontological model from any data source using Model Driven Architecture (MDA), this generation is dedicated to the cooperation of the knowledge engineering and software engineering. Indeed, reverse engineering of a data source generates a software model (schema of data) that will undergo transformations to generate the ontological model. This method uses the meta-models to validate software and ontological models.

Keywords: Meta model, model, ontology, data source.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1993

7234 Steps towards the Development of National Health Data Standards in Developing Countries: An Exploratory Qualitative Study in Saudi Arabia

Authors: Abdullah I. Alkraiji, Thomas W. Jackson, Ian R. Murray

Abstract:

The proliferation of health data standards today is somewhat overlapping and conflicting, resulting in market confusion and leading to increasing proprietary interests. The government role and support in standardization for health data are thought to be crucial in order to establish credible standards for the next decade, to maximize interoperability across the health sector, and to decrease the risks associated with the implementation of non-standard systems. The normative literature missed out the exploration of the different steps required to be undertaken by the government towards the development of national health data standards. Based on the lessons learned from a qualitative study investigating the different issues to the adoption of health data standards in the major tertiary hospitals in Saudi Arabia and the opinions and feedback from different experts in the areas of data exchange and standards and medical informatics in Saudi Arabia and UK, a list of steps required towards the development of national health data standards was constructed. Main steps are the existence of: a national formal reference for health data standards, an agreed national strategic direction for medical data exchange, a national medical information management plan and a national accreditation body, and more important is the change management at the national and organizational level. The outcome of this study can be used by academics and practitioners to develop the planning of health data standards, and in particular those in developing countries.

Keywords: Interoperability, Case Study, Health Data Standards, Medical Data Exchange, Saudi Arabia.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1996

7233 A Similarity Function for Global Quality Assessment of Retinal Vessel Segmentations

Authors: Arturo Aquino, Manuel Emilio Gegundez, Jose Manuel Bravo, Diego Marin

Abstract:

Retinal vascularity assessment plays an important role in diagnosis of ophthalmic pathologies. The employment of digital images for this purpose makes possible a computerized approach and has motivated development of many methods for automated vascular tree segmentation. Metrics based on contingency tables for binary classification have been widely used for evaluating performance of these algorithms and, concretely, the accuracy has been mostly used as measure of global performance in this topic. However, this metric shows very poor matching with human perception as well as other notable deficiencies. Here, a new similarity function for measuring quality of retinal vessel segmentations is proposed. This similarity function is based on characterizing the vascular tree as a connected structure with a measurable area and length. Tests made indicate that this new approach shows better behaviour than the current one does. Generalizing, this concept of measuring descriptive properties may be used for designing functions for measuring more successfully segmentation quality of other complex structures.

Keywords: Retinal vessel segmentation, quality assessment, performanceevaluation, similarity function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1494

7232 Test Data Compression Using a Hybrid of Bitmask Dictionary and 2n Pattern Runlength Coding Methods

Authors: C. Kalamani, K. Paramasivam

Abstract:

In VLSI, testing plays an important role. Major problem in testing are test data volume and test power. The important solution to reduce test data volume and test time is test data compression. The Proposed technique combines the bit maskdictionary and 2n pattern run length-coding method and provides a substantial improvement in the compression efficiency without introducing any additional decompression penalty. This method has been implemented using Mat lab and HDL Language to reduce test data volume and memory requirements. This method is applied on various benchmark test sets and compared the results with other existing methods. The proposed technique can achieve a compression ratio up to 86%.

Keywords: Bit Mask dictionary, 2n pattern run length code, system-on-chip, SOC, test data compression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1915

7231 A Hybrid Data Mining Method for the Medical Classification of Chest Pain

Authors: Sung Ho Ha, Seong Hyeon Joo

Abstract:

Data mining techniques have been used in medical research for many years and have been known to be effective. In order to solve such problems as long-waiting time, congestion, and delayed patient care, faced by emergency departments, this study concentrates on building a hybrid methodology, combining data mining techniques such as association rules and classification trees. The methodology is applied to real-world emergency data collected from a hospital and is evaluated by comparing with other techniques. The methodology is expected to help physicians to make a faster and more accurate classification of chest pain diseases.

Keywords: Data mining, medical decisions, medical domainknowledge, chest pain.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2215

7230 Factors Affecting the Work Efficiency of Employees of Suan Sunandha Rajabhat University

Authors: Unnop Panpuang

Abstract:

The objectives of this project are to study on the work efficiency of the employees, sorted by their profiles, and to study on the relation between job attributes and work efficiency of employees of Suan Sunandha Rajabhat University. The samples used for this study are 292 employees. The statistics used in this study are frequencies, standard deviations, One-way ANOVA and Pearson’s correlation coefficient. Majority of respondent were male with an undergraduate degree, married and lives together. The average age of respondents was between 31-41 years old, married and the educational background are higher than bachelor’s degree. The job attribute is correlated to the work efficiency with the statistical significance level of.o1. This concurs with the predetermined hypothesis. The correlation between the two main factors is in the moderate level. All the categories of job attributes such as the variety of skills, job clarity, job importance, freedom to do work are considered separately.

Keywords: Employees, Job Attributes, Work Efficiency.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3644

7229 Knowledge Discovery and Data Mining Techniques in Textile Industry

Authors: Filiz Ersoz, Taner Ersoz, Erkin Guler

Abstract:

This paper addresses the issues and technique for textile industry using data mining techniques. Data mining has been applied to the stitching of garments products that were obtained from a textile company. Data mining techniques were applied to the data obtained from the CHAID algorithm, CART algorithm, Regression Analysis and, Artificial Neural Networks. Classification technique based analyses were used while data mining and decision model about the production per person and variables affecting about production were found by this method. In the study, the results show that as the daily working time increases, the production per person also decreases. In addition, the relationship between total daily working and production per person shows a negative result and the production per person show the highest and negative relationship.

Keywords: Data mining, textile production, decision trees, classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1532

7228 Application and Limitation of Parallel Modelingin Multidimensional Sequential Pattern

Authors: Mahdi Esmaeili, Mansour Tarafdar

Abstract:

The goal of data mining algorithms is to discover useful information embedded in large databases. One of the most important data mining problems is discovery of frequently occurring patterns in sequential data. In a multidimensional sequence each event depends on more than one dimension. The search space is quite large and the serial algorithms are not scalable for very large datasets. To address this, it is necessary to study scalable parallel implementations of sequence mining algorithms. In this paper, we present a model for multidimensional sequence and describe a parallel algorithm based on data parallelism. Simulation experiments show good load balancing and scalable and acceptable speedup over different processors and problem sizes and demonstrate that our approach can works efficiently in a real parallel computing environment.

Keywords: Sequential Patterns, Data Mining, ParallelAlgorithm, Multidimensional Sequence Data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1470

7227 Generator of Hypotheses an Approach of Data Mining Based on Monotone Systems Theory

Authors: Rein Kuusik, Grete Lind

Abstract:

Generator of hypotheses is a new method for data mining. It makes possible to classify the source data automatically and produces a particular enumeration of patterns. Pattern is an expression (in a certain language) describing facts in a subset of facts. The goal is to describe the source data via patterns and/or IF...THEN rules. Used evaluation criteria are deterministic (not probabilistic). The search results are trees - form that is easy to comprehend and interpret. Generator of hypotheses uses very effective algorithm based on the theory of monotone systems (MS) named MONSA (MONotone System Algorithm).

Keywords: data mining, monotone systems, pattern, rule.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1253

7226 The Evaluation of Subclinical Hypothyroidism in Children with Morbid Obesity

Authors: Mustafa M. Donma, Orkide Donma

Abstract:

Cardiovascular (CV) pathology is one of the expected consequences of excessive fat gain. The role of zinc (Zn) in thyroid hormone metabolism (THM) is a matter of debate. Both thyroid stimulating hormone (TSH) and Zn levels are subject to variation in obese individuals. Zn participates in THM. It is closely related to TSH. Since thyroid hormones are required for Zn absorption, hypothyroidism can lead to Zn deficiency and vice versa. Zn exhibits protective effects on CV health and it is inversely correlated with CV markers in childhood obesity. The association between subclinical hypothyroidism (SCHT) and metabolic disorders is under investigation due to its clinical importance. SCHT is defined as the elevated serum TSH levels in the presence of normal free thyroxin (T4) concentrations. The aim of this study is to evaluate the associations between TSH levels and Zn concentrations in SCHT cases detected in morbid obese (MO) children with and without metabolic syndrome (MetS) [(MOMetS+ and MOMetS-)], respectively. 42 children were present in each study group. Informed consent forms were obtained. Tekrdag Namik Kemal University Faculty of Medicine Non-Interventional Clinical Investigations Ethical Committee approved the study protocol. World Health Organization criteria were used for obesity classification. Children with age and sex-dependent body mass index percentile values above 99 were defined as MO. Children exhibiting at least two of MetS criteria were included in MOMetS+ group. Elevated fasting blood glucose, elevated triglycerides (TRG)/decreased high density lipoprotein-cholesterol (HDL-C) concentrations, elevated blood pressure values in addition to central obesity were listed as MetS criteria. Anthropometric measures were recorded. Routine biochemical analyses were performed. In MOMetS- group 13, in MOMetS+ group 15 children were with SCHT. Statistical analyses were performed. p < 0.05 was accepted as statistically significant. In MOMetS- and MOMetS+ groups, TSH levels were 4.1 ± 2.9 mU/L and 4.6 ± 3.1 mU/L, respectively. Corresponding values for SCHT cases were 7.3 ± 3.1 mU/L and 8.0 ± 2.7 mU/L. Free T4 levels were within normal limits. Zn concentrations were negatively correlated with TSH levels in both groups. Significant negative correlation calculated in MOMetS+ group (r = -0.909; p < 0.001) was much stronger than that found in MOMetS- group (r = -0.706; p < 0.05). This strong correlation (r = -0.909; p < 0.001) calculated for cases with SCHT in MOMetS+ group was much lower in the same group (r = -0.793; p < 0.001) when all cases were considered. In conclusion, the presence of strong correlations between TSH and Zn in SCHT in both MOMetS- and MOMetS+ groups have pointed out that MO children were under the threat of CV pathologies. The detection of the much stronger correlation in MOMetS+ group in comparison with the correlation found in MOMetS- group was the indicator of greater CV risk due to the presence of MetS. In MOMetS+ group, correlation in SCHT cases found higher than correlation calculated for all cases confirmed much higher CV risk due to the contribution of SCHT.

Keywords: Cardiovascular risk, child morbid obesity, subclinical hypothyroidism, zinc.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 228

7225 The Relation between Social Capital and Trust with Social Network Analysis

Authors: Safak Baykal

Abstract:

The purpose of this study is analyzing the relationship between trust and social capital of people with using Social Network Analysis. In this study, two aspects of social capital will be focused: Bonding, homophilous social capital (BoSC), and Bridging, heterophilous social capital (BrSC). These two aspects diverge each other regarding to the social theories. The other concept of the study is Trust (Tr), namely interpersonal trust, willing to ascribe good intentions to and have confidence in the words and actions of other people. In this study, the sample group, 61 people, was selected from a private firm from the defense industry. The relation between BoSC/BrSC and Tr is shown by using Social Network Analysis (SNA) and statistical analysis with Likert type-questionnaire. The results of the analysis show the Cronbach’s alpha value is 0.756 and social capital values (BoSC/BrSC) is not correlated with Tr values of the people.

Keywords: Social capital, interpersonal trust, social network analysis (SNA).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2582

7224 Categorical Data Modeling: Logistic Regression Software

Authors: Abdellatif Tchantchane

Abstract:

A Matlab based software for logistic regression is developed to enhance the process of teaching quantitative topics and assist researchers with analyzing wide area of applications where categorical data is involved. The software offers an option of performing stepwise logistic regression to select the most significant predictors. The software includes a feature to detect influential observations in data, and investigates the effect of dropping or misclassifying an observation on a predictor variable. The input data may consist either as a set of individual responses (yes/no) with the predictor variables or as grouped records summarizing various categories for each unique set of predictor variables' values. Graphical displays are used to output various statistical results and to assess the goodness of fit of the logistic regression model. The software recognizes possible convergence constraints when present in data, and the user is notified accordingly.

Keywords: Logistic regression, Matlab, Categorical data, Influential observation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1877