Search results for: Predictive Data Mining
6900 Sensitivity Analysis for Determining Priority of Factors Controlling SOC Content in Semiarid Condition of West of Iran
Authors: Y. Parvizi, M. Gorji, M.H. Mahdian, M. Omid
Abstract:
Soil organic carbon (SOC) plays a key role in soil fertility, hydrology, contaminants control and acts as a sink or source of terrestrial carbon content that can affect the concentration of atmospheric CO2. SOC supports the sustainability and quality of ecosystems, especially in semi-arid region. This study was conducted to determine relative importance of 13 different exploratory climatic, soil and geometric factors on the SOC contents in one of the semiarid watershed zones in Iran. Two methods canonical discriminate analysis (CDA) and feed-forward back propagation neural networks were used to predict SOC. Stepwise regression and sensitivity analysis were performed to identify relative importance of exploratory variables. Results from sensitivity analysis showed that 7-2-1 neural networks and 5 inputs in CDA models output have highest predictive ability that explains %70 and %65 of SOC variability. Since neural network models outperformed CDA model, it should be preferred for estimating SOC.Keywords: Soil organic carbon, modeling, neural networks, CDA.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14356899 Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure
Authors: S.Aranganayagi, K.Thangavel
Abstract:
K-Modes is an extension of K-Means clustering algorithm, developed to cluster the categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching or mismatching measure. Weight of attribute values contribute much in clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency of attribute values in the cluster and in the data set. The new weighted measure is experimented with the data sets obtained from the UCI data repository. The results are compared with K-Modes and K-representative, which show that the new measure generates clusters with high purity.
Keywords: Clustering, categorical data, K-Modes, weighted dissimilarity measure
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 36906898 WebGD: A CORBA-based Document Classification and Retrieval System on the Web
Authors: Fuyang Peng, Bo Deng, Chao Qi, Mou Zhan
Abstract:
This paper presents the design and implementation of the WebGD, a CORBA-based document classification and retrieval system on Internet. The WebGD makes use of such techniques as Web, CORBA, Java, NLP, fuzzy technique, knowledge-based processing and database technology. Unified classification and retrieval model, classifying and retrieving with one reasoning engine and flexible working mode configuration are some of its main features. The architecture of WebGD, the unified classification and retrieval model, the components of the WebGD server and the fuzzy inference engine are discussed in this paper in detail.Keywords: Text Mining, document classification, knowledgeprocessing, fuzzy logic, Web, CORBA
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18486897 Mobile Phone as a Tool for Data Collection in Field Research
Authors: Sandro Mourão, Karla Okada
Abstract:
The necessity of accurate and timely field data is shared among organizations engaged in fundamentally different activities, public services or commercial operations. Basically, there are three major components in the process of the qualitative research: data collection, interpretation and organization of data, and analytic process. Representative technological advancements in terms of innovation have been made in mobile devices (mobile phone, PDA-s, tablets, laptops, etc). Resources that can be potentially applied on the data collection activity for field researches in order to improve this process. This paper presents and discuss the main features of a mobile phone based solution for field data collection, composed of basically three modules: a survey editor, a server web application and a client mobile application. The data gathering process begins with the survey creation module, which enables the production of tailored questionnaires. The field workforce receives the questionnaire(s) on their mobile phones to collect the interviews responses and sending them back to a server for immediate analysis.Keywords: Data Gathering, Field Research, Mobile Phone, Survey.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20596896 An Overview of Construction and Demolition Waste as Coarse Aggregate in Concrete
Authors: S. R. Shamili, J. Karthikeyan
Abstract:
Fast development of the total populace and far and wide urbanization has surprisingly expanded the advancement of the construction industry. As a result of these activities, old structures are being demolished to make new buildings. Due to these large-scale demolitions, a huge amount of debris is generated all over the world, which results in a landfill. The use of construction and demolition waste as landfill causes groundwater contamination, which is hazardous. Using construction and demolition waste as aggregate can reduce the use of natural aggregates and the problem of mining. The objective of this study is to provide a detailed overview on how the construction and demolition waste material has been used as aggregate in structural concrete. In this study, the preparation, classification, and composition of construction and demolition wastes are also discussed.
Keywords: Aggregate, construction and demolition waste, landfill, large scale demolition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6436895 On Pooling Different Levels of Data in Estimating Parameters of Continuous Meta-Analysis
Authors: N. R. N. Idris, S. Baharom
Abstract:
A meta-analysis may be performed using aggregate data (AD) or an individual patient data (IPD). In practice, studies may be available at both IPD and AD level. In this situation, both the IPD and AD should be utilised in order to maximize the available information. Statistical advantages of combining the studies from different level have not been fully explored. This study aims to quantify the statistical benefits of including available IPD when conducting a conventional summary-level meta-analysis. Simulated meta-analysis were used to assess the influence of the levels of data on overall meta-analysis estimates based on IPD-only, AD-only and the combination of IPD and AD (mixed data, MD), under different study scenario. The percentage relative bias (PRB), root mean-square-error (RMSE) and coverage probability were used to assess the efficiency of the overall estimates. The results demonstrate that available IPD should always be included in a conventional meta-analysis using summary level data as they would significantly increased the accuracy of the estimates.On the other hand, if more than 80% of the available data are at IPD level, including the AD does not provide significant differences in terms of accuracy of the estimates. Additionally, combining the IPD and AD has moderating effects on the biasness of the estimates of the treatment effects as the IPD tends to overestimate the treatment effects, while the AD has the tendency to produce underestimated effect estimates. These results may provide some guide in deciding if significant benefit is gained by pooling the two levels of data when conducting meta-analysis.
Keywords: Aggregate data, combined-level data, Individual patient data, meta analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17406894 Physiological Effects on Scientist Astronaut Candidates: Hypobaric Training Assessment
Authors: Pedro Llanos, Diego García
Abstract:
This paper is addressed to expanding our understanding of the effects of hypoxia training on our bodies to better model its dynamics and leverage some of its implications and effects on human health. Hypoxia training is a recommended practice for military and civilian pilots that allow them to recognize their early hypoxia signs and symptoms, and Scientist Astronaut Candidates (SACs) who underwent hypobaric hypoxia (HH) exposure as part of a training activity for prospective suborbital flight applications. This observational-analytical study describes physiologic responses and symptoms experienced by a SAC group before, during and after HH exposure and proposes a model for assessing predicted versus observed physiological responses. A group of individuals with diverse Science Technology Engineering Mathematics (STEM) backgrounds conducted a hypobaric training session to an altitude up to 22,000 ft (FL220) or 6,705 meters, where heart rate (HR), breathing rate (BR) and core temperature (Tc) were monitored with the use of a chest strap sensor pre and post HH exposure. A pulse oximeter registered levels of saturation of oxygen (SpO2), number and duration of desaturations during the HH chamber flight. Hypoxia symptoms as described by the SACs during the HH training session were also registered. This data allowed to generate a preliminary predictive model of the oxygen desaturation and O2 pressure curve for each subject, which consists of a sixth-order polynomial fit during exposure, and a fifth or fourth-order polynomial fit during recovery. Data analysis showed that HR and BR showed no significant differences between pre and post HH exposure in most of the SACs, while Tc measures showed slight but consistent decrement changes. All subjects registered SpO2 greater than 94% for the majority of their individual HH exposures, but all of them presented at least one clinically significant desaturation (SpO2 < 85% for more than 5 seconds) and half of the individuals showed SpO2 below 87% for at least 30% of their HH exposure time. Finally, real time collection of HH symptoms presented temperature somatosensory perceptions (SP) for 65% of individuals, and task-focus issues for 52.5% of individuals as the most common HH indications. 95% of the subjects experienced HH onset symptoms below FL180; all participants achieved full recovery of HH symptoms within 1 minute of donning their O2 mask. The current HH study performed on this group of individuals suggests a rapid and fully reversible physiologic response after HH exposure as expected and obtained in previous studies. Our data showed consistent results between predicted versus observed SpO2 curves during HH suggesting a mathematical function that may be used to model HH performance deficiencies. During the HH study, real-time HH symptoms were registered providing evidenced SP and task focusing as the earliest and most common indicators. Finally, an assessment of HH signs of symptoms in a group of heterogeneous, non-pilot individuals showed similar results to previous studies in homogeneous populations of pilots.
Keywords: Altitude sickness, cabin pressure, hypobaric chamber training, symptoms and altitude, slow onset hypoxia.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4166893 Multivariate Assessment of Mathematics Test Scores of Students in Qatar
Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski
Abstract:
Data on various aspects of education are collected at the institutional and government level regularly. In Australia, for example, students at various levels of schooling undertake examinations in numeracy and literacy as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another source of educational data collected internationally is via the PISA study which collects data from several countries when students are approximately 15 years of age and enables comparisons in the performance of science, mathematics and English between countries as well as ranking of countries based on performance in these standardised tests. As well as student and school outcomes based on the tests taken as part of the PISA study, there is a wealth of other data collected in the study including parental demographics data and data related to teaching strategies used by educators. Overall, an abundance of educational data is available which has the potential to be used to help improve educational attainment and teaching of content in order to improve learning outcomes. A multivariate assessment of such data enables multiple variables to be considered simultaneously and will be used in the present study to help develop profiles of students based on performance in mathematics using data obtained from the PISA study.
Keywords: Cluster analysis, education, mathematics, profiles.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8926892 DIVAD: A Dynamic and Interactive Visual Analytical Dashboard for Exploring and Analyzing Transport Data
Authors: Tin Seong Kam, Ketan Barshikar, Shaun Tan
Abstract:
The advances in location-based data collection technologies such as GPS, RFID etc. and the rapid reduction of their costs provide us with a huge and continuously increasing amount of data about movement of vehicles, people and goods in an urban area. This explosive growth of geospatially-referenced data has far outpaced the planner-s ability to utilize and transform the data into insightful information thus creating an adverse impact on the return on the investment made to collect and manage this data. Addressing this pressing need, we designed and developed DIVAD, a dynamic and interactive visual analytics dashboard to allow city planners to explore and analyze city-s transportation data to gain valuable insights about city-s traffic flow and transportation requirements. We demonstrate the potential of DIVAD through the use of interactive choropleth and hexagon binning maps to explore and analyze large taxi-transportation data of Singapore for different geographic and time zones.Keywords: Geographic Information System (GIS), MovementData, GeoVisual Analytics, Urban Planning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23896891 Predictive Modelling Techniques in Sediment Yield and Hydrological Modelling
Authors: Adesoji T. Jaiyeola, Josiah Adeyemo
Abstract:
This paper presents an extensive review of literature relevant to the modelling techniques adopted in sediment yield and hydrological modelling. Several studies relating to sediment yield are discussed. Many research areas of sedimentation in rivers, runoff and reservoirs are presented. Different types of hydrological models, different methods employed in selecting appropriate models for different case studies are analysed. Applications of evolutionary algorithms and artificial intelligence techniques are discussed and compared especially in water resources management and modelling. This review concentrates on Genetic Programming (GP) and fully discusses its theories and applications. The successful applications of GP as a soft computing technique were reviewed in sediment modelling. Some fundamental issues such as benchmark, generalization ability, bloat, over-fitting and other open issues relating to the working principles of GP are highlighted. This paper concludes with the identification of some research gaps in hydrological modelling and sediment yield.Keywords: Artificial intelligence, evolutionary algorithm, genetic programming, sediment yield.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18616890 Gene Expression Data Classification Using Discriminatively Regularized Sparse Subspace Learning
Authors: Chunming Xu
Abstract:
Sparse representation which can represent high dimensional data effectively has been successfully used in computer vision and pattern recognition problems. However, it doesn-t consider the label information of data samples. To overcome this limitation, we develop a novel dimensionality reduction algorithm namely dscriminatively regularized sparse subspace learning(DR-SSL) in this paper. The proposed DR-SSL algorithm can not only make use of the sparse representation to model the data, but also can effective employ the label information to guide the procedure of dimensionality reduction. In addition,the presented algorithm can effectively deal with the out-of-sample problem.The experiments on gene-expression data sets show that the proposed algorithm is an effective tool for dimensionality reduction and gene-expression data classification.Keywords: sparse representation, dimensionality reduction, labelinformation, sparse subspace learning, gene-expression data classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14476889 Predicting Mortality among Acute Burn Patients Using BOBI Score vs. FLAMES Score
Authors: S. Moustafa El Shanawany, I. Labib Salem, F. Mohamed Magdy Badr El Dine, H. Tag El Deen Abd Allah
Abstract:
Thermal injuries remain a global health problem and a common issue encountered in forensic pathology. They are a devastating cause of morbidity and mortality in children and adults especially in developing countries, causing permanent disfigurement, scarring and grievous hurt. Burns have always been a matter of legal concern in cases of suicidal burns, self-inflicted burns for false accusation and homicidal attempts. Assessment of burn injuries as well as rating permanent disabilities and disfigurement following thermal injuries for the benefit of compensation claims represents a challenging problem. This necessitates the development of reliable scoring systems to yield an expected likelihood of permanent disability or fatal outcome following burn injuries. The study was designed to identify the risk factors of mortality in acute burn patients and to evaluate the applicability of FLAMES (Fatality by Longevity, APACHE II score, Measured Extent of burn, and Sex) and BOBI (Belgian Outcome in Burn Injury) model scores in predicting the outcome. The study was conducted on 100 adult patients with acute burn injuries admitted to the Burn Unit of Alexandria Main University Hospital, Egypt from October 2014 to October 2015. Victims were examined after obtaining informed consent and the data were collected in specially designed sheets including demographic data, burn details and any associated inhalation injury. Each burn patient was assessed using both BOBI and FLAMES scoring systems. The results of the study show the mean age of patients was 35.54±12.32 years. Males outnumbered females (55% and 45%, respectively). Most patients were accidently burnt (95%), whereas suicidal burns accounted for the remaining 5%. Flame burn was recorded in 82% of cases. As well, 8% of patients sustained more than 60% of total burn surface area (TBSA) burns, 19% of patients needed mechanical ventilation, and 19% of burnt patients died either from wound sepsis, multi-organ failure or pulmonary embolism. The mean length of hospital stay was 24.91±25.08 days. The mean BOBI score was 1.07±1.27 and that of the FLAMES score was -4.76±2.92. The FLAMES score demonstrated an area under the receiver operating characteristic (ROC) curve of 0.95 which was significantly higher than that of the BOBI score (0.883). A statistically significant association was revealed between both predictive models and the outcome. The study concluded that both scoring systems were beneficial in predicting mortality in acutely burnt patients. However, the FLAMES score could be applied with a higher level of accuracy.Keywords: BOBI, Burns, FLAMES, scoring systems, outcome.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11736888 Comparative Study Using Weka for Red Blood Cells Classification
Authors: Jameela Ali Alkrimi, Hamid A. Jalab, Loay E. George, Abdul Rahim Ahmad, Azizah Suliman, Karim Al-Jashamy
Abstract:
Red blood cells (RBC) are the most common types of blood cells and are the most intensively studied in cell biology. The lack of RBCs is a condition in which the amount of hemoglobin level is lower than normal and is referred to as “anemia”. Abnormalities in RBCs will affect the exchange of oxygen. This paper presents a comparative study for various techniques for classifying the RBCs as normal or abnormal (anemic) using WEKA. WEKA is an open source consists of different machine learning algorithms for data mining applications. The algorithms tested are Radial Basis Function neural network, Support vector machine, and K-Nearest Neighbors algorithm. Two sets of combined features were utilized for classification of blood cells images. The first set, exclusively consist of geometrical features, was used to identify whether the tested blood cell has a spherical shape or non-spherical cells. While the second set, consist mainly of textural features was used to recognize the types of the spherical cells. We have provided an evaluation based on applying these classification methods to our RBCs image dataset which were obtained from Serdang Hospital - Malaysia, and measuring the accuracy of test results. The best achieved classification rates are 97%, 98%, and 79% for Support vector machines, Radial Basis Function neural network, and K-Nearest Neighbors algorithm respectively.
Keywords: K-Nearest Neighbors, Neural Network, Radial Basis Function, Red blood cells, Support vector machine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 29966887 A Business-to-Business Collaboration System That Promotes Data Utilization While Encrypting Information on the Blockchain
Authors: Hiroaki Nasu, Ryota Miyamoto, Yuta Kodera, Yasuyuki Nogami
Abstract:
To promote Industry 4.0 and Society 5.0 and so on, it is important to connect and share data so that every member can trust it. Blockchain (BC) technology is currently attracting attention as the most advanced tool and has been used in the financial field and so on. However, the data collaboration using BC has not progressed sufficiently among companies on the supply chain of the manufacturing industry that handle sensitive data such as product quality, manufacturing conditions, etc. There are two main reasons why data utilization is not sufficiently advanced in the industrial supply chain. The first reason is that manufacturing information is top secret and a source for companies to generate profits. It is difficult to disclose data even between companies with transactions in the supply chain. Blockchain mechanism such as Bitcoin using Public Key Infrastructure (PKI) requires plaintext to be shared between companies in order to verify the identity of the company that sent the data. Another reason is that the merits (scenarios) of collaboration data between companies are not specifically specified in the industrial supply chain. For these problems, this paper proposes a Business to Business (B2B) collaboration system using homomorphic encryption and BC technique. Using the proposed system, each company on the supply chain can exchange confidential information on encrypted data and utilize the data for their own business. In addition, this paper considers a scenario focusing on quality data, which was difficult to collaborate because it is top-secret. In this scenario, we show an implementation scheme and a benefit of concrete data collaboration by proposing a comparison protocol that can grasp the change in quality while hiding the numerical value of quality data.
Keywords: Business to business data collaboration, industrial supply chain, blockchain, homomorphic encryption.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8196886 An Approximation of Daily Rainfall by Using a Pixel Value Data Approach
Authors: Sarisa Pinkham, Kanyarat Bussaban
Abstract:
The research aims to approximate the amount of daily rainfall by using a pixel value data approach. The daily rainfall maps from the Thailand Meteorological Department in period of time from January to December 2013 were the data used in this study. The results showed that this approach can approximate the amount of daily rainfall with RMSE=3.343.
Keywords: Daily rainfall, Image processing, Approximation, Pixel value data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17586885 Automatic Generation of Ontology from Data Source Directed by Meta Models
Authors: Widad Jakjoud, Mohamed Bahaj, Jamal Bakkas
Abstract:
Through this paper we present a method for automatic generation of ontological model from any data source using Model Driven Architecture (MDA), this generation is dedicated to the cooperation of the knowledge engineering and software engineering. Indeed, reverse engineering of a data source generates a software model (schema of data) that will undergo transformations to generate the ontological model. This method uses the meta-models to validate software and ontological models.
Keywords: Meta model, model, ontology, data source.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19986884 Development of Predictive Model for Surface Roughness in End Milling of Al-SiCp Metal Matrix Composites using Fuzzy Logic
Authors: M. Chandrasekaran, D. Devarasiddappa
Abstract:
Metal matrix composites have been increasingly used as materials for components in automotive and aerospace industries because of their improved properties compared with non-reinforced alloys. During machining the selection of appropriate machining parameters to produce job for desired surface roughness is of great concern considering the economy of manufacturing process. In this study, a surface roughness prediction model using fuzzy logic is developed for end milling of Al-SiCp metal matrix composite component using carbide end mill cutter. The surface roughness is modeled as a function of spindle speed (N), feed rate (f), depth of cut (d) and the SiCp percentage (S). The predicted values surface roughness is compared with experimental result. The model predicts average percentage error as 4.56% and mean square error as 0.0729. It is observed that surface roughness is most influenced by feed rate, spindle speed and SiC percentage. Depth of cut has least influence.Keywords: End milling, fuzzy logic, metal matrix composites, surface roughness
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21716883 Steps towards the Development of National Health Data Standards in Developing Countries: An Exploratory Qualitative Study in Saudi Arabia
Authors: Abdullah I. Alkraiji, Thomas W. Jackson, Ian R. Murray
Abstract:
The proliferation of health data standards today is somewhat overlapping and conflicting, resulting in market confusion and leading to increasing proprietary interests. The government role and support in standardization for health data are thought to be crucial in order to establish credible standards for the next decade, to maximize interoperability across the health sector, and to decrease the risks associated with the implementation of non-standard systems. The normative literature missed out the exploration of the different steps required to be undertaken by the government towards the development of national health data standards. Based on the lessons learned from a qualitative study investigating the different issues to the adoption of health data standards in the major tertiary hospitals in Saudi Arabia and the opinions and feedback from different experts in the areas of data exchange and standards and medical informatics in Saudi Arabia and UK, a list of steps required towards the development of national health data standards was constructed. Main steps are the existence of: a national formal reference for health data standards, an agreed national strategic direction for medical data exchange, a national medical information management plan and a national accreditation body, and more important is the change management at the national and organizational level. The outcome of this study can be used by academics and practitioners to develop the planning of health data standards, and in particular those in developing countries.
Keywords: Interoperability, Case Study, Health Data Standards, Medical Data Exchange, Saudi Arabia.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20026882 Test Data Compression Using a Hybrid of Bitmask Dictionary and 2n Pattern Runlength Coding Methods
Authors: C. Kalamani, K. Paramasivam
Abstract:
In VLSI, testing plays an important role. Major problem in testing are test data volume and test power. The important solution to reduce test data volume and test time is test data compression. The Proposed technique combines the bit maskdictionary and 2n pattern run length-coding method and provides a substantial improvement in the compression efficiency without introducing any additional decompression penalty. This method has been implemented using Mat lab and HDL Language to reduce test data volume and memory requirements. This method is applied on various benchmark test sets and compared the results with other existing methods. The proposed technique can achieve a compression ratio up to 86%.Keywords: Bit Mask dictionary, 2n pattern run length code, system-on-chip, SOC, test data compression.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19216881 Issue Reorganization Using the Measure of Relevance
Authors: William Wong Xiu Shun, Yoonjin Hyun, Mingyu Kim, Seongi Choi, Namgyu Kim
Abstract:
The need to extract R&D keywords from issues and use them to retrieve R&D information is increasing rapidly. However, it is difficult to identify related issues or distinguish them. Although the similarity between issues cannot be identified, with an R&D lexicon, issues that always share the same R&D keywords can be determined. In detail, the R&D keywords that are associated with a particular issue imply the key technology elements that are needed to solve a particular issue. Furthermore, the relationship among issues that share the same R&D keywords can be shown in a more systematic way by clustering them according to keywords. Thus, sharing R&D results and reusing R&D technology can be facilitated. Indirectly, redundant investment in R&D can be reduced as the relevant R&D information can be shared among corresponding issues and the reusability of related R&D can be improved. Therefore, a methodology to cluster issues from the perspective of common R&D keywords is proposed to satisfy these demands.
Keywords: Clustering, Social Network Analysis, Text Mining, Topic Analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20386880 Categorical Data Modeling: Logistic Regression Software
Authors: Abdellatif Tchantchane
Abstract:
A Matlab based software for logistic regression is developed to enhance the process of teaching quantitative topics and assist researchers with analyzing wide area of applications where categorical data is involved. The software offers an option of performing stepwise logistic regression to select the most significant predictors. The software includes a feature to detect influential observations in data, and investigates the effect of dropping or misclassifying an observation on a predictor variable. The input data may consist either as a set of individual responses (yes/no) with the predictor variables or as grouped records summarizing various categories for each unique set of predictor variables' values. Graphical displays are used to output various statistical results and to assess the goodness of fit of the logistic regression model. The software recognizes possible convergence constraints when present in data, and the user is notified accordingly.
Keywords: Logistic regression, Matlab, Categorical data, Influential observation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18826879 A Survey on Data-Centric and Data-Aware Techniques for Large Scale Infrastructures
Authors: Silvina Caíno-Lores, Jesús Carretero
Abstract:
Large scale computing infrastructures have been widely developed with the core objective of providing a suitable platform for high-performance and high-throughput computing. These systems are designed to support resource-intensive and complex applications, which can be found in many scientific and industrial areas. Currently, large scale data-intensive applications are hindered by the high latencies that result from the access to vastly distributed data. Recent works have suggested that improving data locality is key to move towards exascale infrastructures efficiently, as solutions to this problem aim to reduce the bandwidth consumed in data transfers, and the overheads that arise from them. There are several techniques that attempt to move computations closer to the data. In this survey we analyse the different mechanisms that have been proposed to provide data locality for large scale high-performance and high-throughput systems. This survey intends to assist scientific computing community in understanding the various technical aspects and strategies that have been reported in recent literature regarding data locality. As a result, we present an overview of locality-oriented techniques, which are grouped in four main categories: application development, task scheduling, in-memory computing and storage platforms. Finally, the authors include a discussion on future research lines and synergies among the former techniques.Keywords: Co-scheduling, data-centric, data-intensive, data locality, in-memory storage, large scale.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14926878 Correction of Infrared Data for Electrical Components on a Board
Authors: Seong-Ho Song, Ki-Seob Kim, Seop-Hyeong Park, Seon-Woo Lee
Abstract:
In this paper, the data correction algorithm is suggested when the environmental air temperature varies. To correct the infrared data in this paper, the initial temperature or the initial infrared image data is used so that a target source system may not be necessary. The temperature data obtained from infrared detector show nonlinear property depending on the surface temperature. In order to handle this nonlinear property, Taylor series approach is adopted. It is shown that the proposed algorithm can reduce the influence of environmental temperature on the components in the board. The main advantage of this algorithm is to use only the initial temperature of the components on the board rather than using other reference device such as black body sources in order to get reference temperatures.Keywords: Infrared camera, Temperature Data compensation, Environmental Ambient Temperature, Electric Component
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15276877 Predictive Models for Compressive Strength of High Performance Fly Ash Cement Concrete for Pavements
Authors: S. M. Gupta, Vanita Aggarwal, Som Nath Sachdeva
Abstract:
The work reported through this paper is an experimental work conducted on High Performance Concrete (HPC) with super plasticizer with the aim to develop some models suitable for prediction of compressive strength of HPC mixes. In this study, the effect of varying proportions of fly ash (0% to 50% @ 10% increment) on compressive strength of high performance concrete has been evaluated. The mix designs studied were M30, M40 and M50 to compare the effect of fly ash addition on the properties of these concrete mixes. In all eighteen concrete mixes that have been designed, three were conventional concretes for three grades under discussion and fifteen were HPC with fly ash with varying percentages of fly ash. The concrete mix designing has been done in accordance with Indian standard recommended guidelines. All the concrete mixes have been studied in terms of compressive strength at 7 days, 28 days, 90 days, and 365 days. All the materials used have been kept same throughout the study to get a perfect comparison of values of results. The models for compressive strength prediction have been developed using Linear Regression method (LR), Artificial Neural Network (ANN) and Leave-One-Out Validation (LOOV) methods.
Keywords: ANN, concrete mixes, compressive strength, fly ash, high performance concrete, linear regression, strength prediction models.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20786876 A Generalised Relational Data Model
Authors: Georgia Garani
Abstract:
A generalised relational data model is formalised for the representation of data with nested structure of arbitrary depth. A recursive algebra for the proposed model is presented. All the operations are formally defined. The proposed model is proved to be a superset of the conventional relational model (CRM). The functionality and validity of the model is shown by a prototype implementation that has been undertaken in the functional programming language Miranda.Keywords: nested relations, recursive algebra, recursive nested operations, relational data model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15596875 WiFi Data Offloading: Bundling Method in a Canvas Business Model
Authors: Majid Mokhtarnia, Alireza Amini
Abstract:
Mobile operators deal with increasing in the data traffic as a critical issue. As a result, a vital responsibility of the operators is to deal with such a trend in order to create added values. This paper addresses a bundling method in a Canvas business model in a WiFi Data Offloading (WDO) strategy by which some elements of the model may be affected. In the proposed method, it is supposed to sell a number of data packages for subscribers in which there are some packages with a free given volume of data-offloaded WiFi complimentary. The paper on hands analyses this method in the views of attractiveness and profitability. The results demonstrate that the quality of implementation of the WDO strongly affects the final result and helps the decision maker to make the best one.
Keywords: Bundling, canvas business model, telecommunication, WiFi Data Offloading.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8906874 Big Data Analytics and Data Security in the Cloud via Fully Homomorphic Encryption
Authors: Victor Onomza Waziri, John K. Alhassan, Idris Ismaila, Moses Noel Dogonyaro
Abstract:
This paper describes the problem of building secure computational services for encrypted information in the Cloud Computing without decrypting the encrypted data; therefore, it meets the yearning of computational encryption algorithmic aspiration model that could enhance the security of big data for privacy, confidentiality, availability of the users. The cryptographic model applied for the computational process of the encrypted data is the Fully Homomorphic Encryption Scheme. We contribute a theoretical presentations in a high-level computational processes that are based on number theory and algebra that can easily be integrated and leveraged in the Cloud computing with detail theoretic mathematical concepts to the fully homomorphic encryption models. This contribution enhances the full implementation of big data analytics based cryptographic security algorithm.
Keywords: Data Analytics, Security, Privacy, Bootstrapping, and Fully Homomorphic Encryption Scheme.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 34586873 Ozone Decomposition over Silver-Loaded Perlite
Authors: Krassimir Genov, Vladimir Georgiev, Todor Batakliev, Dipak K. Sarker
Abstract:
The Bulgarian natural expanded mineral obtained from Bentonite AD perlite (A deposit of "The Broken Mountain" for perlite mining, near by the village of Vodenicharsko, in the municipality of Djebel), was loaded with silver (as ion form - Ag+ 2 and 5 wt% by the incipient wetness impregnation method), and as atomic silver - Ag0 using Tollen-s reagent (silver mirror reaction). Some physicochemical characterization of the samples are provided via: DC arc-AES, XRD, DR-IR and UV-VIS. The aim of this work was to obtain and test the silver-loaded catalyst for ozone decomposition. So the samples loaded with atomic silver show ca. 80% conversion of ozone 20 minutes after the reaction start. Then conversion decreases to ca. 20 % but stay stable during the prolongation of time.
Keywords: aluminum-silicates, Ag/perlite expanded glass, ozone decomposition
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22686872 Effective Software-Based Solution for Processing Mass Downstream Data in Interactive Push VOD System
Authors: Ni Hong, Wu Guobin, Wu Gang, Pan Liang
Abstract:
Interactive push VOD system is a new kind of system that incorporates push technology and interactive technique. It can push movies to users at high speeds at off-peak hours for optimal network usage so as to save bandwidth. This paper presents effective software-based solution for processing mass downstream data at terminals of interactive push VOD system, where the service can download movie according to a viewer-s selection. The downstream data is divided into two catalogs: (1) the carousel data delivered according to DSM-CC protocol; (2) IP data delivered according to Euro-DOCSIS protocol. In order to accelerate download speed and reduce data loss rate at terminals, this software strategy introduces caching, multi-thread and resuming mechanisms. The experiments demonstrate advantages of the software-based solution.Keywords: DSM-CC, data carousel, Euro-DOCSIS, push VOD.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14896871 Approaches and Schemes for Storing DTD-Independent XML Data in Relational Databases
Authors: Mehdi Emadi, Masoud Rahgozar, Adel Ardalan, Alireza Kazerani, Mohammad Mahdi Ariyan
Abstract:
The volume of XML data exchange is explosively increasing, and the need for efficient mechanisms of XML data management is vital. Many XML storage models have been proposed for storing XML DTD-independent documents in relational database systems. Benchmarking is the best way to highlight pros and cons of different approaches. In this study, we use a common benchmarking scheme, known as XMark to compare the most cited and newly proposed DTD-independent methods in terms of logical reads, physical I/O, CPU time and duration. We show the effect of Label Path, extracting values and storing in another table and type of join needed for each method's query answering.Keywords: XML Data Management, XPath, DTD-IndependentXML Data
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1881