Search results for: data validation
7269 The Resource Description Framework (RDF) as a Modern Structure for Medical Data
Authors: Gabriela Lindemann, Danilo Schmidt, Thomas Schrader, Dietmar Keune
Abstract:
The amount and heterogeneity of data in biomedical research, notably in interdisciplinary fields, require new methods for the collection, presentation and analysis of information. Important data from laboratory experiments as well as patient trials are available but come from distributed resources. The Charité - University Hospital Berlin has established, together with the German Research Foundation (DFG), a new information service centre for kidney diseases and transplantation (Open European Nephrology Science Centre - OpEN.SC). Besides the collaborative aim of creating new research groups, every partner or institution of this science information centre that makes its own data available is allowed to search the whole data pool of the participating centres. A core task is the implementation of a non-restricting open data structure for the various data sources. We decided to use a modern RDF model and, in a first phase, transformed original data coming from the web-based Electronic Patient Record database TBase©.
Keywords: Medical databases, Resource Description Framework (RDF), metadata repository.
Downloads 2032
7268 An Empirical Validation of the Linear-Hyperbolic Approximation of the I-V Characteristic of a Solar Cell Generator
Authors: A. A. Penin
Abstract:
An empirical linear-hyperbolic approximation of the I-V characteristic of a solar cell is presented. This approximation is based on the hyperbolic dependence of p-n junction current on voltage at large currents. The empirical approximation is compared with the previously proposed formal linear-hyperbolic approximation of a solar cell. Expressions are obtained that define how the parameters of the formal approximation change as the photocurrent of a family of characteristics changes. This simplifies finding the approximation parameters from actual curves and refining their values. Analytical calculation of the load regime for the linear-hyperbolic model leads to a quadratic equation. The model also allows the deviation from the maximum power regime to be defined on a sound basis and the efficiency of regimes of solar cells with different parameters to be compared.
Keywords: solar cell generator, I-V characteristic, p-n junction, approximation
Downloads 1420
7267 XML Data Management in Compressed Relational Database
Authors: Hongzhi Wang, Jianzhong Li, Hong Gao
Abstract:
XML is an important standard for data exchange and representation. Because relational databases are mature systems, using them to support XML data may bring some advantages. However, storing XML in a relational database introduces obvious redundancy that wastes disk space, bandwidth and disk I/O when querying XML data. For efficient storage and querying of XML, it is necessary to use compressed XML data in the relational database. In this paper, a compressed relational database technology supporting XML data is presented. The original relational storage structure is adapted to XPath query processing, and the compression method preserves this feature. Besides traditional relational database techniques, additional query processing techniques for compressed relations and for the special structure of XML are presented. Techniques for XQuery processing in the compressed relational database are also presented.
Keywords: XML, compression, query processing
Downloads 1806
7266 A System for Analyzing and Eliciting Public Grievances Using Cache Enabled Big Data
Authors: P. Kaladevi, N. Giridharan
Abstract:
The system for analyzing and eliciting public grievances serves to receive and process all sorts of complaints from the public and to respond to users. Because of the large number of complaints, the complaint data becomes big data, which is difficult to store and process. The proposed system uses HDFS to store the big data and MapReduce to process it. A cache is applied in the system to provide immediate response and timely action using big data analytics. Cache-enabled big data improves the response time of the system. The unstructured data provided by users is handled efficiently through the MapReduce algorithm. Complaints are processed in the order of the hierarchy of authority. The drawbacks of the traditional database system used in the existing system are addressed by our system through a cache-enabled Hadoop Distributed File System. MapReduce framework code can leak sensitive data during computation. We propose a system that adds noise to the output of the reduce phase to avoid signaling the presence of sensitive data. If a complaint is not processed within the allotted time, it is automatically forwarded to the higher authority, which ensures that it is processed. A copy of the filed complaint is sent as a digitally signed PDF document to the user's email address, which serves as proof. The system report serves as essential data when making important decisions based on legislation.
Keywords: Big Data, Hadoop, HDFS, Caching, MapReduce, web personalization, e-governance.
Downloads 1592
7265 Design and Validation of an Aerodynamic Model of the Cessna Citation X Horizontal Stabilizer Using both OpenVSP and Digital Datcom
Authors: Marine Segui, Matthieu Mantilla, Ruxandra Mihaela Botez
Abstract:
This research is part of a major project at the Research Laboratory in Active Controls, Avionics and Aeroservoelasticity (LARCASE) aiming to improve the cruise performance of a Cessna Citation X aircraft by applying morphing wing technology to its horizontal tail. The horizontal stabilizer of the Cessna Citation X rotates around its span axis through an angle between -8 and 2 degrees. Within this range, the horizontal stabilizer certainly generates some unwanted drag. To cancel this drag, LARCASE proposes to trim the aircraft with a horizontal stabilizer equipped with morphing wing technology. This technology aims to optimize aerodynamic performance by changing the conventional horizontal tail shape during flight. As a consequence, it will be able to generate enough lift on the horizontal tail to balance the aircraft without generating unwanted drag. To conduct this project, an accurate aerodynamic model of the horizontal tail is first required. This model will ultimately allow a precise comparison between the results for a conventional horizontal tail and a morphed horizontal tail. This paper presents how the aerodynamic model was designed. It shows how the 2D geometry of the horizontal tail was collected and how the unknown airfoil shape of the horizontal tail was recovered. Finally, the complete horizontal tail airfoil shape was found, and a comparison between the aerodynamic polars of the real horizontal tail and of the horizontal tail found in this paper shows a maximum difference of 0.04 in the lift or drag coefficient, which is very good. Aerodynamic polar data of the aircraft horizontal tail were obtained from the CAE Inc. level D research aircraft flight simulator of the Cessna Citation X.
Keywords: Aerodynamic, Cessna, Citation X, coefficient, Datcom, drag, lift, longitudinal, model, OpenVSP.
Downloads 1512
7264 Application of “Multiple Risk Communicator” to the Personal Information Leakage Problem
Authors: Mitsuhiro Taniyama, Yuu Hidaka, Masato Arai, Satoshi Kai, Hiromi Igawa, Hiroshi Yajima, Ryoichi Sasaki
Abstract:
Along with the progress of our information society, various risks are becoming increasingly common, causing multiple social problems. For this reason, risk communications for establishing consensus among stakeholders who have different priorities have become important. However, it is not always easy for the decision makers to agree on measures to reduce risks based on opposing concepts, such as security, privacy and cost. Therefore, we previously developed and proposed the “Multiple Risk Communicator” (MRC) with the following functions: (1) modeling the support role of the risk specialist, (2) an optimization engine, and (3) displaying the computed results. In this paper, MRC program version 1.0 is applied to the personal information leakage problem. The application process and validation of the results are discussed.
Keywords: Decision Making, Personal Information Leakage Problem, Risk Communication, Risk Management
Downloads 1608
7263 Learning Theories within Coaching Process
Authors: P. Fazel
Abstract:
These days we encounter many advertisements in magazines that present coaching as a pragmatic specialty which helps people make changes in their lives. Up to now, specialty coaches are not necessarily therapists, consultants or psychologists, and thus they may not know psychological theories. The International Coach Federation identifies "facilitating learning and results" as one of its four core coach competencies; without an understanding of learning theories, coaching practice hangs in a theoretical abyss. The aim of this article is therefore to investigate learning theories within the coaching process. I reviewed some cognitive and behavioral learning theories and analyzed their contribution to the coaching process as introduced in the papers and books of mentor coaches and ICF-certified coaches. The results demonstrate that the coaching profession is strongly grounded in learning theories, and that it will be strengthened by the validation of theories and evidence-based research as we move forward. More research is thus needed in order to apply effective theoretical frameworks.
Keywords: Coaching, learning theories, cognitive learning theories, behavioral learning theories.
Downloads 16427
7262 Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure
Authors: S. Aranganayagi, K. Thangavel
Abstract:
K-Modes is an extension of the K-Means clustering algorithm, developed to cluster categorical data, in which the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching or mismatching measure. The weights of attribute values contribute much to clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of the frequency of attribute values in the cluster to that in the data set. The new weighted measure is tested on data sets obtained from the UCI data repository. The results are compared with K-Modes and K-representative and show that the new measure generates clusters with high purity.
Keywords: Clustering, categorical data, K-Modes, weighted dissimilarity measure
Downloads 3690
7261 Mobile Phone as a Tool for Data Collection in Field Research
Authors: Sandro Mourão, Karla Okada
Abstract:
The need for accurate and timely field data is shared among organizations engaged in fundamentally different activities, whether public services or commercial operations. There are three major components in the process of qualitative research: data collection, interpretation and organization of data, and the analytic process. Significant technological advances have been made in mobile devices (mobile phones, PDAs, tablets, laptops, etc.). These resources can potentially be applied to data collection in field research in order to improve this process. This paper presents and discusses the main features of a mobile-phone-based solution for field data collection, composed of three modules: a survey editor, a server web application and a client mobile application. The data gathering process begins with the survey creation module, which enables the production of tailored questionnaires. The field workforce receives the questionnaire(s) on their mobile phones, collects the interview responses and sends them back to a server for immediate analysis.
Keywords: Data Gathering, Field Research, Mobile Phone, Survey.
Downloads 2059
7260 FEA Modeling of Material Removal Rate in Electrical Discharge Machining of Al6063/SiC Composites
Authors: U. K. Vishwakarma, A. Dvivedi, P. Kumar
Abstract:
Metal matrix composites (MMC) are generating extensive interest in diverse fields like defense, aerospace, electronics and automotive industries. In the present investigation, material removal rate (MRR) modeling has been carried out using an axisymmetric model of Al-SiC composite during electrical discharge machining (EDM). A FEA model of single spark EDM was developed to calculate the temperature distribution. Further, the single spark model was extended to simulate the second discharge. For multi-discharge machining, material removal was calculated from the number of pulses. Validation of the model was done by comparing the experimental results obtained under the same process parameters with the analytical results. Good agreement was found between the experimental results and the theoretical values.
Keywords: Electrical Discharge Machining, FEA, Metal matrix composites, Multi-discharge
Downloads 3733
7259 On Pooling Different Levels of Data in Estimating Parameters of Continuous Meta-Analysis
Authors: N. R. N. Idris, S. Baharom
Abstract:
A meta-analysis may be performed using aggregate data (AD) or individual patient data (IPD). In practice, studies may be available at both the IPD and AD level. In this situation, both the IPD and AD should be utilised in order to maximize the available information. The statistical advantages of combining studies from different levels have not been fully explored. This study aims to quantify the statistical benefits of including available IPD when conducting a conventional summary-level meta-analysis. Simulated meta-analyses were used to assess the influence of the levels of data on overall meta-analysis estimates based on IPD only, AD only and the combination of IPD and AD (mixed data, MD), under different study scenarios. The percentage relative bias (PRB), root mean-square-error (RMSE) and coverage probability were used to assess the efficiency of the overall estimates. The results demonstrate that available IPD should always be included in a conventional meta-analysis using summary-level data, as they significantly increase the accuracy of the estimates. On the other hand, if more than 80% of the available data are at the IPD level, including the AD does not provide significant differences in terms of accuracy of the estimates. Additionally, combining the IPD and AD has a moderating effect on the bias of the estimates of the treatment effects, as the IPD tends to overestimate the treatment effects, while the AD tends to produce underestimated effect estimates. These results may provide some guidance in deciding whether significant benefit is gained by pooling the two levels of data when conducting a meta-analysis.
Keywords: Aggregate data, combined-level data, Individual patient data, meta analysis.
Downloads 1741
7258 Intragenic MicroRNAs Binding Sites in MRNAs of Genes Involved in Carcinogenesis
Authors: Olga A. Berillo, Assel S. Issabekova, Anatoly T. Ivashchenko
Abstract:
MiRNAs participate in gene regulation of translation. Some studies have investigated the interactions between genes and intragenic miRNAs. It is important to study the miRNA binding sites of genes involved in carcinogenesis. RNAHybrid 2.1 and ERNAhybrid programmes were used to compute the hybridization free energy of miRNA binding sites. Of these 54 mRNAs, 22.6%, 37.7%, and 39.7% of miRNA binding sites were present in the 5'UTRs, CDSs, and 3'UTRs, respectively. The density of the binding sites for miRNAs in the 5'UTR ranged from 1.6 to 43.2 times and from 1.8 to 8.0 times greater than in the CDS and 3'UTR, respectively. Three types of miRNA interactions with mRNAs have been revealed: 5'-dominant canonical, 3'-compensatory, and complementary binding sites. MiRNAs regulate gene expression, and information on the interactions between miRNAs and mRNAs could be useful in molecular medicine. We recommend that newly described sites undergo validation by experimental investigation.
Keywords: Exon, intron, miRNA, oncogene.
Downloads 2006
7257 Multivariate Assessment of Mathematics Test Scores of Students in Qatar
Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski
Abstract:
Data on various aspects of education are collected at the institutional and government level regularly. In Australia, for example, students at various levels of schooling undertake examinations in numeracy and literacy as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another source of educational data collected internationally is via the PISA study which collects data from several countries when students are approximately 15 years of age and enables comparisons in the performance of science, mathematics and English between countries as well as ranking of countries based on performance in these standardised tests. As well as student and school outcomes based on the tests taken as part of the PISA study, there is a wealth of other data collected in the study including parental demographics data and data related to teaching strategies used by educators. Overall, an abundance of educational data is available which has the potential to be used to help improve educational attainment and teaching of content in order to improve learning outcomes. A multivariate assessment of such data enables multiple variables to be considered simultaneously and will be used in the present study to help develop profiles of students based on performance in mathematics using data obtained from the PISA study.
Keywords: Cluster analysis, education, mathematics, profiles.
Downloads 894
7256 DIVAD: A Dynamic and Interactive Visual Analytical Dashboard for Exploring and Analyzing Transport Data
Authors: Tin Seong Kam, Ketan Barshikar, Shaun Tan
Abstract:
The advances in location-based data collection technologies such as GPS, RFID etc. and the rapid reduction of their costs provide us with a huge and continuously increasing amount of data about the movement of vehicles, people and goods in an urban area. This explosive growth of geospatially-referenced data has far outpaced the planner's ability to utilize and transform the data into insightful information, thus creating an adverse impact on the return on the investment made to collect and manage this data. Addressing this pressing need, we designed and developed DIVAD, a dynamic and interactive visual analytics dashboard that allows city planners to explore and analyze a city's transportation data to gain valuable insights about the city's traffic flow and transportation requirements. We demonstrate the potential of DIVAD through the use of interactive choropleth and hexagon binning maps to explore and analyze large taxi-transportation data of Singapore for different geographic and time zones.
Keywords: Geographic Information System (GIS), Movement Data, GeoVisual Analytics, Urban Planning.
Downloads 2389
7255 Gene Expression Data Classification Using Discriminatively Regularized Sparse Subspace Learning
Authors: Chunming Xu
Abstract:
Sparse representation, which can represent high-dimensional data effectively, has been successfully used in computer vision and pattern recognition problems. However, it doesn't consider the label information of data samples. To overcome this limitation, we develop a novel dimensionality reduction algorithm, namely discriminatively regularized sparse subspace learning (DR-SSL), in this paper. The proposed DR-SSL algorithm can not only make use of the sparse representation to model the data, but can also effectively employ the label information to guide the procedure of dimensionality reduction. In addition, the presented algorithm can effectively deal with the out-of-sample problem. The experiments on gene-expression data sets show that the proposed algorithm is an effective tool for dimensionality reduction and gene-expression data classification.
Keywords: sparse representation, dimensionality reduction, label information, sparse subspace learning, gene-expression data classification.
Downloads 1447
7254 Numerical Analysis and Experimental Validation of Detector Pressure Housing Subject to HPHT
Authors: Hafeez Syed, Harit Naik
Abstract:
Reservoirs with high pressures and temperatures (HPHT) that were considered to be atypical in the past are now frequent targets for exploration. For downhole oilfield drilling tools and components, the temperature and pressure affect the mechanical strength. To address this issue, a finite element analysis (FEA) for 206.84 MPa (30 ksi) pressure and 165°C has been performed on the pressure housing of the measurement-while-drilling/logging-while-drilling (MWD/LWD) density tool. The density tool is a MWD/LWD sensor that measures the density of the formation. One of the components of the density tool is the pressure housing that is positioned in the tool. The FEA results are compared with the experimental test performed on the pressure housing of the density tool. Past results show a close match between the numerical results and the experimental test. This FEA model can be used for extreme HPHT and ultra HPHT analyses, and/or optimal design changes.
Keywords: FEA, HPHT, M/LWD, Oil & Gas
Downloads 1642
7253 Mechanical Qualification Test Campaign on the Demise Observation Capsule
Authors: B. Tiseo, V. Quaranta, G. Bruno, R. Gardi, T. Watts, S. Dussy
Abstract:
This paper describes the qualification test campaign performed on the Demise Observation Capsule DOC-EQM as part of the Future Launch Preparatory Program FLPP3. The mechanical environment experienced during the launch ascent and separation phases was first identified and then replicated in terms of sine, random and shock vibration. The loads were identified by selecting the worst possible case. The vibration and shock qualification tests performed at the CIRA Space Qualification laboratory are described herein. The design and validation of the mechanical fixtures, carried out by means of FEM, are also addressed because of their fundamental role in the vibration test campaign. The Demise Observation Capsule (DOC) successfully passed the qualification test campaign. Functional tests and resonance searches did not reveal any faults or damage to the capsule.
Keywords: Capsule, demise, DOC, launch environment, Re-Entry, qualification.
Downloads 579
7252 IntelligentLogger: A Heavy-Duty Vehicles Fleet Management System Based on IoT and Smart Prediction Techniques
Authors: D. Goustouridis, A. Sideris, I. Sdrolias, G. Loizos, N.-Alexander Tatlas, S. M. Potirakis
Abstract:
Both daily and long-term management of a fleet of heavy-duty vehicles and construction machinery is an extremely complicated and hard-to-solve issue. This is mainly due to the diversity of the fleet vehicles and machinery, which concerns not only the vehicle types but also their age and efficiency, as well as the fleet volume, which is often of the order of hundreds or even thousands of vehicles and machines. In the present paper we present “InteligentLogger”, a holistic heavy-duty fleet management system covering a wide range of diverse fleet vehicles. It is based on specifically designed hardware and software for automated monitoring of vehicle health status and operational cost, for smart maintenance. InteligentLogger is characterized by high adaptability that permits it to be tailored to practically any heavy-duty vehicle or machine (of different technologies, modern or legacy, and of dissimilar uses). Contrary to conventional logistic systems, which are characterized by raised operational costs and frequent errors, InteligentLogger provides a cost-effective and reliable integrated solution for the e-management and e-maintenance of the fleet members. 
The InteligentLogger system offers the following unique features that guarantee successful heavy-duty vehicles/machineries fleet management: (a) Recording and storage of operating data of motorized construction machinery, in a reliable way and in real time, using specifically designed Internet of Things (IoT) sensor nodes that communicate through the available network infrastructures, e.g., 3G/LTE; (b) Use on any machine, regardless of its age, in a universal way; (c) Flexibility and complete customization in terms of data collection and integration with 3rd party systems, as well as in terms of processing and drawing conclusions; (d) Validation, error reporting and correction, as well as update of the system’s database; (e) Artificial intelligence (AI) software, for processing information in real time, identifying out-of-normal behavior and generating alerts; (f) A MicroStrategy-based enterprise BI, for modeling information and producing reports, dashboards, and alerts focusing on optimal usage of vehicles and machinery, as well as maintenance and scrapping policies; (g) Modular structure that allows low implementation costs in the basic fully functional version, but offers scalability without requiring a complete system upgrade.
Keywords: E-maintenance, predictive maintenance, IoT sensor nodes, cost optimization, artificial intelligence, heavy-duty vehicles.
Downloads 771
7251 Assessment of Aminopolyether on 18F-FDG Samples
Authors: Renata L. C. Leão, João E. Nascimento, Natalia C. E. S. Nascimento, Elaine S. Vasconcelos, Mércia L. Oliveira
Abstract:
The quality control procedures of a radiopharmaceutical include the assessment of its chemical purity. The method suggested by international pharmacopeias consists of a thin layer chromatographic run. In this paper, the method proposed by the United States Pharmacopeia (USP) is compared to a direct method to determine the final concentration of aminopolyether in Fludeoxyglucose (18F-FDG) preparations. The approach (no chromatographic run) was achieved by placing the thin-layer chromatography (TLC) plate directly on an iodine vapor chamber. Both methods were validated and they showed adequate results to determine the concentration of aminopolyether in 18F-FDG preparations. However, the direct method is more sensitive, faster and simpler when compared to the reference method (with chromatographic run), and it may be chosen for use in routine quality control of 18F-FDG.
Keywords: Chemical purity, Kryptofix 222, thin layer chromatography, validation.
Downloads 839
7250 Determining Cluster Boundaries Using Particle Swarm Optimization
Authors: Anurag Sharma, Christian W. Omlin
Abstract:
Self-organizing map (SOM) is a well known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOMs, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The application of our method to unlabeled call data for a mobile phone operator demonstrates its feasibility. PSO algorithm utilizes U-matrix of SOMs to determine cluster boundaries; the results of this novel automatic method correspond well to boundary detection through visual inspection of code vectors and k-means algorithm.
Keywords: Particle swarm optimization, self-organizing maps, clustering, data mining.
Downloads 1720
7249 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm
Authors: Ameur Abdelkader, Abed Bouarfa Hafida
Abstract:
Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis in case of a large amount of data. In fact, because of their volumes, their nature (semi or unstructured) and their variety, it is impossible to analyze efficiently big data via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm, Classification And Regression Trees (CART), in order to adapt it for big data analysis. The major changes of this algorithm are presented and then a version of the extended algorithm is defined in order to make it applicable for a huge quantity of data.
Keywords: Predictive analysis, big data, predictive analysis algorithms, CART algorithm.
Downloads 1076
7248 A Business-to-Business Collaboration System That Promotes Data Utilization While Encrypting Information on the Blockchain
Authors: Hiroaki Nasu, Ryota Miyamoto, Yuta Kodera, Yasuyuki Nogami
Abstract:
To promote Industry 4.0 and Society 5.0 and so on, it is important to connect and share data so that every member can trust it. Blockchain (BC) technology is currently attracting attention as the most advanced tool and has been used in the financial field and so on. However, the data collaboration using BC has not progressed sufficiently among companies on the supply chain of the manufacturing industry that handle sensitive data such as product quality, manufacturing conditions, etc. There are two main reasons why data utilization is not sufficiently advanced in the industrial supply chain. The first reason is that manufacturing information is top secret and a source for companies to generate profits. It is difficult to disclose data even between companies with transactions in the supply chain. Blockchain mechanism such as Bitcoin using Public Key Infrastructure (PKI) requires plaintext to be shared between companies in order to verify the identity of the company that sent the data. Another reason is that the merits (scenarios) of collaboration data between companies are not specifically specified in the industrial supply chain. For these problems, this paper proposes a Business to Business (B2B) collaboration system using homomorphic encryption and BC technique. Using the proposed system, each company on the supply chain can exchange confidential information on encrypted data and utilize the data for their own business. In addition, this paper considers a scenario focusing on quality data, which was difficult to collaborate because it is top-secret. In this scenario, we show an implementation scheme and a benefit of concrete data collaboration by proposing a comparison protocol that can grasp the change in quality while hiding the numerical value of quality data.
Keywords: Business to business data collaboration, industrial supply chain, blockchain, homomorphic encryption.
Downloads 820
7247 An Approximation of Daily Rainfall by Using a Pixel Value Data Approach
Authors: Sarisa Pinkham, Kanyarat Bussaban
Abstract:
The research aims to approximate the amount of daily rainfall by using a pixel value data approach. The data used in this study were the daily rainfall maps from the Thailand Meteorological Department for the period from January to December 2013. The results showed that this approach can approximate the amount of daily rainfall with an RMSE of 3.343.
Keywords: Daily rainfall, Image processing, Approximation, Pixel value data.
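For readers unfamiliar with the error measure quoted in the abstract above, RMSE (root-mean-square error) can be computed as in the following minimal sketch. The function name and the sample values are illustrative assumptions and are not taken from the paper; the abstract does not state the units or the data behind its reported figure.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between two equally long sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    # Mean of the squared differences, then the square root.
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical daily rainfall values, for illustration only.
observed_rain = [0.0, 12.5, 3.0, 7.5]
estimated_rain = [1.0, 10.5, 3.0, 9.5]
print(rmse(observed_rain, estimated_rain))
```

A lower RMSE means the estimates lie closer to the observed values on average, with large individual errors penalized more heavily than small ones.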
Downloads 1758
7246 Automatic Generation of Ontology from Data Source Directed by Meta Models
Authors: Widad Jakjoud, Mohamed Bahaj, Jamal Bakkas
Abstract:
In this paper we present a method for the automatic generation of an ontological model from any data source using Model Driven Architecture (MDA). This generation is dedicated to cooperation between knowledge engineering and software engineering. Reverse engineering of a data source generates a software model (a data schema), which then undergoes transformations to generate the ontological model. The method uses meta-models to validate the software and ontological models.
Keywords: Meta model, model, ontology, data source.
Downloads 1998
7245 Steps towards the Development of National Health Data Standards in Developing Countries: An Exploratory Qualitative Study in Saudi Arabia
Authors: Abdullah I. Alkraiji, Thomas W. Jackson, Ian R. Murray
Abstract:
The health data standards proliferating today are somewhat overlapping and conflicting, resulting in market confusion and increasing proprietary interests. Government involvement in and support for health data standardization are thought to be crucial in order to establish credible standards for the next decade, to maximize interoperability across the health sector, and to decrease the risks associated with implementing non-standard systems. The normative literature has not explored the steps that the government must undertake towards the development of national health data standards. A list of such steps was constructed from the lessons learned in a qualitative study of the issues affecting the adoption of health data standards in the major tertiary hospitals in Saudi Arabia, together with the opinions and feedback of experts in data exchange, standards, and medical informatics in Saudi Arabia and the UK. The main steps are the existence of a national formal reference for health data standards, an agreed national strategic direction for medical data exchange, a national medical information management plan, and a national accreditation body; more important still is change management at the national and organizational levels. The outcome of this study can be used by academics and practitioners to plan health data standards, particularly in developing countries.
Keywords: Interoperability, Case Study, Health Data Standards, Medical Data Exchange, Saudi Arabia.
Downloads 2002
7244 Test Data Compression Using a Hybrid of Bitmask Dictionary and 2n Pattern Runlength Coding Methods
Authors: C. Kalamani, K. Paramasivam
Abstract:
In VLSI, testing plays an important role. The major problems in testing are test data volume and test power. An important solution for reducing test data volume and test time is test data compression. The proposed technique combines the bitmask dictionary and 2n pattern run-length coding methods and provides a substantial improvement in compression efficiency without introducing any additional decompression penalty. The method has been implemented using MATLAB and HDL to reduce test data volume and memory requirements. It is applied to various benchmark test sets, and the results are compared with other existing methods. The proposed technique can achieve a compression ratio of up to 86%.
Keywords: Bitmask dictionary, 2n pattern run-length code, system-on-chip, SOC, test data compression.
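As background for the abstract above, the sketch below shows plain run-length coding on a bit string and one common way to express a compression ratio. It is not the paper's hybrid bitmask dictionary and 2n pattern scheme. The function names, the sample bit string, and the ratio definition (percentage reduction in size) are assumptions made for illustration; the abstract does not say how its ratio is defined.

```python
from itertools import groupby
from typing import List, Tuple


def run_length_encode(bits: str) -> List[Tuple[str, int]]:
    """Collapse each run of identical symbols into a (symbol, run_length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]


def compression_ratio(original_bits: int, compressed_bits: int) -> float:
    """Percentage reduction in size (assumed definition, see lead-in)."""
    if original_bits <= 0:
        raise ValueError("original size must be positive")
    return 100.0 * (original_bits - compressed_bits) / original_bits


# Hypothetical test pattern with long runs, which run-length coding favors.
pattern = "0000011100"
print(run_length_encode(pattern))
```

Run-length coding pays off only when the data contain long runs of repeated symbols, which is why it is often paired with other methods, as in the hybrid approach the abstract describes.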
Downloads 1921
7243 Parameter Sensitivity Analysis of Artificial Neural Network for Predicting Water Turbidity
Authors: Chia-Ling Chang, Chung-Sheng Liao
Abstract:
The present study discusses the parameters of an Artificial Neural Network (ANN). Sensitivity analysis is applied to assess the effect of the ANN parameters on the prediction of raw water turbidity in a water treatment plant. The results show that the transfer function of the hidden layer is a critical parameter of the ANN: when the transfer function changes, the reliability of the turbidity prediction differs greatly. Moreover, the estimated water turbidity is less sensitive to training times and learning velocity than to the number of neurons in the hidden layer. Therefore, it is important to select an appropriate transfer function and a suitable number of neurons in the hidden layer during parameter training and validation.
Keywords: Artificial Neural Network (ANN), sensitivity analysis, turbidity.
Downloads 2814
7242 A Hybrid Data Mining Method for the Medical Classification of Chest Pain
Authors: Sung Ho Ha, Seong Hyeon Joo
Abstract:
Data mining techniques have been used in medical research for many years and are known to be effective. To address problems faced by emergency departments, such as long waiting times, congestion, and delayed patient care, this study builds a hybrid methodology that combines data mining techniques such as association rules and classification trees. The methodology is applied to real-world emergency data collected from a hospital and is evaluated by comparison with other techniques. It is expected to help physicians classify chest pain diseases faster and more accurately.
Keywords: Data mining, medical decisions, medical domain knowledge, chest pain.
Downloads 2222
7241 Knowledge Discovery and Data Mining Techniques in Textile Industry
Authors: Filiz Ersoz, Taner Ersoz, Erkin Guler
Abstract:
This paper addresses issues in the textile industry using data mining techniques. Data mining was applied to data on the stitching of garment products obtained from a textile company. The techniques applied were the CHAID algorithm, the CART algorithm, regression analysis, and artificial neural networks. Classification-based analyses were used, and a decision model for production per person and the variables affecting production was obtained. The results show that as daily working time increases, production per person decreases. In addition, the relationship between total daily working time and production per person is negative, and production per person shows the strongest negative relationship.
Keywords: Data mining, textile production, decision trees, classification.
Downloads 1539
7240 Application and Limitation of Parallel Modeling in Multidimensional Sequential Pattern
Authors: Mahdi Esmaeili, Mansour Tarafdar
Abstract:
The goal of data mining algorithms is to discover useful information embedded in large databases. One of the most important data mining problems is the discovery of frequently occurring patterns in sequential data. In a multidimensional sequence, each event depends on more than one dimension. The search space is quite large, and serial algorithms do not scale to very large datasets. To address this, it is necessary to study scalable parallel implementations of sequence mining algorithms. In this paper, we present a model for multidimensional sequences and describe a parallel algorithm based on data parallelism. Simulation experiments show good load balancing and scalable, acceptable speedup across different numbers of processors and problem sizes, and demonstrate that our approach can work efficiently in a real parallel computing environment.
Keywords: Sequential Patterns, Data Mining, Parallel Algorithm, Multidimensional Sequence Data.
Downloads 1477