Search results for: Data security
7380 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-
Authors: Nieto Bernal Wilson, Carmona Suarez Edgar
Abstract:
The organizations have structured and unstructured information in different formats, sources, and systems. Part of these come from ERP under OLTP processing that support the information system, however these organizations in OLAP processing level, presented some deficiencies, part of this problematic lies in that does not exist interesting into extract knowledge from their data sources, as also the absence of operational capabilities to tackle with these kind of projects. Data Warehouse and its applications are considered as non-proprietary tools, which are of great interest to business intelligence, since they are repositories basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. The following paper present a structured methodology, simple, inspired from the agile development models as Scrum, XP and AUP. Also the models object relational, spatial data models, and the base line of data modeling under UML and Big data, from this way sought to deliver an agile methodology for the developing of data warehouses, simple and of easy application. The methodology naturally take into account the application of process for the respectively information analysis, visualization and data mining, particularly for patterns generation and derived models from the objects facts structured.
Keywords: Data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14787379 Distributed Data-Mining by Probability-Based Patterns
Authors: M. Kargar, F. Gharbalchi
Abstract:
In this paper a new method is suggested for distributed data-mining by the probability patterns. These patterns use decision trees and decision graphs. The patterns are cared to be valid, novel, useful, and understandable. Considering a set of functions, the system reaches to a good pattern or better objectives. By using the suggested method we will be able to extract the useful information from massive and multi-relational data bases.Keywords: Data-mining, Decision tree, Decision graph, Pattern, Relationship.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15577378 K-Means for Spherical Clusters with Large Variance in Sizes
Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan
Abstract:
Data clustering is an important data exploration technique with many applications in data mining. The k-means algorithm is well known for its efficiency in clustering large data sets. However, this algorithm is suitable for spherical shaped clusters of similar sizes and densities. The quality of the resulting clusters decreases when the data set contains spherical shaped with large variance in sizes. In this paper, we introduce a competent procedure to overcome this problem. The proposed method is based on shifting the center of the large cluster toward the small cluster, and recomputing the membership of small cluster points, the experimental results reveal that the proposed algorithm produces satisfactory results.Keywords: K-Means, Data Clustering, Cluster Analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 32817377 Representing Data without Lost Compression Properties in Time Series: A Review
Authors: Nabilah Filzah Mohd Radzuan, Zalinda Othman, Azuraliza Abu Bakar, Abdul Razak Hamdan
Abstract:
Uncertain data is believed to be an important issue in building up a prediction model. The main objective in the time series uncertainty analysis is to formulate uncertain data in order to gain knowledge and fit low dimensional model prior to a prediction task. This paper discusses the performance of a number of techniques in dealing with uncertain data specifically those which solve uncertain data condition by minimizing the loss of compression properties.
Keywords: Compression properties, uncertainty, uncertain time series, mining technique, weather prediction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16207376 Are XBRL-based Financial Reports Better than Non-XBRL Reports? A Quality Assessment
Authors: Zhenkun Wang, Simon S. Gao
Abstract:
Using a scoring system, this paper provides a comparative assessment of the quality of data between XBRL formatted financial reports and non-XBRL financial reports. It shows a major improvement in the quality of data of XBRL formatted financial reports. Although XBRL formatted financial reports do not show much advantage in the quality at the beginning, XBRL financial reports lately display a large improvement in the quality of data in almost all aspects. With the improved XBRL web data managing, presentation and analysis applications, XBRL formatted financial reports have a much better accessibility, are more accurate and better in timeliness.Keywords: Data Quality; Financial Report; Information; XBRL
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 25667375 Modeling of Random Variable with Digital Probability Hyper Digraph: Data-Oriented Approach
Authors: A. Habibizad Navin, M. Naghian Fesharaki, M. Mirnia, M. Kargar
Abstract:
In this paper we introduce Digital Probability Hyper Digraph for modeling random variable as the hierarchical data-oriented model.Keywords: Data-Oriented Models, Data Structure, DigitalProbability Hyper Digraph, Random Variable, Statistic andProbability.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12737374 Study of Efficiency and Capability LZW++ Technique in Data Compression
Authors: Yusof. Mohd Kamir, Mat Deris. Mohd Sufian, Abidin. Ahmad Faisal Amri
Abstract:
The purpose of this paper is to show efficiency and capability LZWµ in data compression. The LZWµ technique is enhancement from existing LZW technique. The modification the existing LZW is needed to produce LZWµ technique. LZW read one by one character at one time. Differ with LZWµ technique, where the LZWµ read three characters at one time. This paper focuses on data compression and tested efficiency and capability LZWµ by different data format such as doc type, pdf type and text type. Several experiments have been done by different types of data format. The results shows LZWµ technique is better compared to existing LZW technique in term of file size.
Keywords: Data Compression, Huffman Encoding, LZW, LZWµ, RLL, Size.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20897373 Quality Evaluation of Ready to Eat Potatoes’ Produce in Flexible Packaging
Authors: Sandra Muizniece-Brasava, Aija Ruzaike, Lija Dukalska, Ilze Stokmane, Liene Strauta
Abstract:
Experiments have been carried out at the Latvia University of Agriculture Department of Food Technology. The aim of this work was to assess the effect of thermal treatment in flexible retort pouch packaging on the quality of potatoes’ produce during the storage time. Samples were evaluated immediately after retort thermal treatment; and following 1; 2; 3 and 4 storage months at the ambient temperature of +18±2ºC in vacuum packaging from polyamide/polyethylene (PA/PE) and aluminum/polyethylene (Al/PE) film pouches with barrier properties. Experimentally the quality of the potatoes’ produce in dry butter and mushroom dressings was characterized by measuring pH, hardness, color, microbiological properties and sensory evaluation. The sterilization was effective in protecting the produce from physical, chemical, and microbial quality degradation. According to the study of obtained data, it can be argued that the selected product processing technology and packaging materials could be applied to provide the safety and security during four-month storage period.
Keywords: Potatoes’ produce, shelf life, retort thermal treatment and packaging.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 31117372 Impact of Stack Caches: Locality Awareness and Cost Effectiveness
Authors: Abdulrahman K. Alshegaifi, Chun-Hsi Huang
Abstract:
Treating data based on its location in memory has received much attention in recent years due to its different properties, which offer important aspects for cache utilization. Stack data and non-stack data may interfere with each other’s locality in the data cache. One of the important aspects of stack data is that it has high spatial and temporal locality. In this work, we simulate non-unified cache design that split data cache into stack and non-stack caches in order to maintain stack data and non-stack data separate in different caches. We observe that the overall hit rate of non-unified cache design is sensitive to the size of non-stack cache. Then, we investigate the appropriate size and associativity for stack cache to achieve high hit ratio especially when over 99% of accesses are directed to stack cache. The result shows that on average more than 99% of stack cache accuracy is achieved by using 2KB of capacity and 1-way associativity. Further, we analyze the improvement in hit rate when adding small, fixed, size of stack cache at level1 to unified cache architecture. The result shows that the overall hit rate of unified cache design with adding 1KB of stack cache is improved by approximately, on average, 3.9% for Rijndael benchmark. The stack cache is simulated by using SimpleScalar toolset.
Keywords: Hit rate, Locality of program, Stack cache, and Stack data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15097371 A Second Look at Gesture-Based Passwords: Usability and Vulnerability to Shoulder-Surfing Attacks
Authors: Lakshmidevi Sreeramareddy, Komalpreet Kaur, Nane Pothier
Abstract:
For security purposes, it is important to detect passwords entered by unauthorized users. With traditional alphanumeric passwords, if the content of a password is acquired and correctly entered by an intruder, it is impossible to differentiate the password entered by the intruder from those entered by the authorized user because the password entries contain precisely the same character set. However, no two entries for the gesture-based passwords, even those entered by the person who created the password, will be identical. There are always variations between entries, such as the shape and length of each stroke, the location of each stroke, and the speed of drawing. It is possible that passwords entered by the unauthorized user contain higher levels of variations when compared with those entered by the authorized user (the creator). The difference in the levels of variations may provide cues to detect unauthorized entries. To test this hypothesis, we designed an empirical study, collected and analyzed the data with the help of machine-learning algorithms. The results of the study are significant.
Keywords: Authentication, gesture-based passwords, machine learning algorithms, shoulder-surfing attacks, usability.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6167370 Cross Project Software Fault Prediction at Design Phase
Authors: Pradeep Singh, Shrish Verma
Abstract:
Software fault prediction models are created by using the source code, processed metrics from the same or previous version of code and related fault data. Some company do not store and keep track of all artifacts which are required for software fault prediction. To construct fault prediction model for such company, the training data from the other projects can be one potential solution. Earlier we predicted the fault the less cost it requires to correct. The training data consists of metrics data and related fault data at function/module level. This paper investigates fault predictions at early stage using the cross-project data focusing on the design metrics. In this study, empirical analysis is carried out to validate design metrics for cross project fault prediction. The machine learning techniques used for evaluation is Naïve Bayes. The design phase metrics of other projects can be used as initial guideline for the projects where no previous fault data is available. We analyze seven datasets from NASA Metrics Data Program which offer design as well as code metrics. Overall, the results of cross project is comparable to the within company data learning.Keywords: Software Metrics, Fault prediction, Cross project, Within project.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 25467369 Extreme Temperature Forecast in Mbonge, Cameroon through Return Level Analysis of the Generalized Extreme Value (GEV) Distribution
Authors: Nkongho Ayuketang Arreyndip, Ebobenow Joseph
Abstract:
In this paper, temperature extremes are forecast by employing the block maxima method of the Generalized extreme value(GEV) distribution to analyse temperature data from the Cameroon Development Corporation (C.D.C). By considering two sets of data (Raw data and simulated data) and two (stationary and non-stationary) models of the GEV distribution, return levels analysis is carried out and it was found that in the stationary model, the return values are constant over time with the raw data while in the simulated data, the return values show an increasing trend but with an upper bound. In the non-stationary model, the return levels of both the raw data and simulated data show an increasing trend but with an upper bound. This clearly shows that temperatures in the tropics even-though show a sign of increasing in the future, there is a maximum temperature at which there is no exceedence. The results of this paper are very vital in Agricultural and Environmental research.Keywords: Return level, Generalized extreme value (GEV), Meteorology, Forecasting.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21067368 Mining Multicity Urban Data for Sustainable Population Relocation
Authors: Xu Du, Aparna S. Varde
Abstract:
In this research, we propose to conduct diagnostic and predictive analysis about the key factors and consequences of urban population relocation. To achieve this goal, urban simulation models extract the urban development trends as land use change patterns from a variety of data sources. The results are treated as part of urban big data with other information such as population change and economic conditions. Multiple data mining methods are deployed on this data to analyze nonlinear relationships between parameters. The result determines the driving force of population relocation with respect to urban sprawl and urban sustainability and their related parameters. This work sets the stage for developing a comprehensive urban simulation model for catering to specific questions by targeted users. It contributes towards achieving sustainability as a whole.Keywords: Data Mining, Environmental Modeling, Sustainability, Urban Planning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17837367 An Ant-based Clustering System for Knowledge Discovery in DNA Chip Analysis Data
Authors: Minsoo Lee, Yun-mi Kim, Yearn Jeong Kim, Yoon-kyung Lee, Hyejung Yoon
Abstract:
Biological data has several characteristics that strongly differentiate it from typical business data. It is much more complex, usually large in size, and continuously changes. Until recently business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise in biotechnology, the powerful technology that was used for analyzing business data is now being applied to biological data. With the advanced technology at hand, the main trend in biological research is rapidly changing from structural DNA analysis to understanding cellular functions of the DNA sequences. DNA chips are now being used to perform experiments and DNA analysis processes are being used by researchers. Clustering is one of the important processes used for grouping together similar entities. There are many clustering algorithms such as hierarchical clustering, self-organizing maps, K-means clustering and so on. In this paper, we propose a clustering algorithm that imitates the ecosystem taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm. The system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm with a test data of 100 to1000 genes and 24 samples and show promising results for applying this algorithm to clustering DNA chip data.
Keywords: Ant colony system, biological data, clustering, DNA chip.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19747366 The Resource Description Framework (RDF) as a Modern Structure for Medical Data
Authors: Gabriela Lindemann, Danilo Schmidt, Thomas Schrader, Dietmar Keune
Abstract:
The amount and heterogeneity of data in biomedical research, notably in interdisciplinary fields, requires new methods for the collection, presentation and analysis of information. Important data from laboratory experiments as well as patient trials are available but come out of distributed resources. The Charité - University Hospital Berlin has established together with the German Research Foundation (DFG) a new information service centre for kidney diseases and transplantation (Open European Nephrology Science Centre - OpEN.SC). Beside a collaborative aspect to create new research groups every single partner or institution of this science information centre making his own data available is allowed to search the whole data pool of the various involved centres. A core task is the implementation of a non-restricting open data structure for the various different data sources. We decided to use a modern RDF model and in a first phase transformed original data coming from the web-based Electronic Patient Record database TBase©.
Keywords: Medical databases, Resource Description Framework (RDF), metadata repository.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20327365 XML Data Management in Compressed Relational Database
Authors: Hongzhi Wang, Jianzhong Li, Hong Gao
Abstract:
XML is an important standard of data exchange and representation. As a mature database system, using relational database to support XML data may bring some advantages. But storing XML in relational database has obvious redundancy that wastes disk space, bandwidth and disk I/O when querying XML data. For the efficiency of storage and query XML, it is necessary to use compressed XML data in relational database. In this paper, a compressed relational database technology supporting XML data is presented. Original relational storage structure is adaptive to XPath query process. The compression method keeps this feature. Besides traditional relational database techniques, additional query process technologies on compressed relations and for special structure for XML are presented. In this paper, technologies for XQuery process in compressed relational database are presented..Keywords: XML, compression, query processing
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18067364 A System for Analyzing and Eliciting Public Grievances Using Cache Enabled Big Data
Authors: P. Kaladevi, N. Giridharan
Abstract:
The system for analyzing and eliciting public grievances serves its main purpose to receive and process all sorts of complaints from the public and respond to users. Due to the more number of complaint data becomes big data which is difficult to store and process. The proposed system uses HDFS to store the big data and uses MapReduce to process the big data. The concept of cache was applied in the system to provide immediate response and timely action using big data analytics. Cache enabled big data increases the response time of the system. The unstructured data provided by the users are efficiently handled through map reduce algorithm. The processing of complaints takes place in the order of the hierarchy of the authority. The drawbacks of the traditional database system used in the existing system are set forth by our system by using Cache enabled Hadoop Distributed File System. MapReduce framework codes have the possible to leak the sensitive data through computation process. We propose a system that add noise to the output of the reduce phase to avoid signaling the presence of sensitive data. If the complaints are not processed in the ample time, then automatically it is forwarded to the higher authority. Hence it ensures assurance in processing. A copy of the filed complaint is sent as a digitally signed PDF document to the user mail id which serves as a proof. The system report serves to be an essential data while making important decisions based on legislation.Keywords: Big Data, Hadoop, HDFS, Caching, MapReduce, web personalization, e-governance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15927363 Employee Motivation Factors That Affect Job Performance of Suan Sunandha Rajabhat University Employee
Authors: Orawan Boriban, Phatthanan Chaiyabut
Abstract:
The purpose of this research is to study motivation factors and also to study factors relation to job performance to compare motivation factors under the personal factor classification such as gender, age, income, educational level, marital status, and working duration; and to study the relationship between Motivation Factors and Job Performance with job satisfactions. The sample groups utilized in this research were 400 Suan Sunandha Rajabhat University employees. This research is a quantitative research using questionnaires as research instrument. The statistics applied for data analysis including percentage, mean, and standard deviation. In addition, the difference analysis was conducted by t value computing, one-way analysis of variance and Pearson’s correlation coefficient computing. The findings of the study results were as follows the findings showed that the aspects of job promotion and salary were at the moderate levels. Additionally, the findings also showed that the motivations that affected the revenue branch chiefs’ job performance were job security, job accomplishment, policy and management, job promotion, and interpersonal relation.
Keywords: Motivation Factors, Job Performance, Suan Sunandha Rajabhat University Employee.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 28507362 Association of Smoking with Chest Radiographic and Lung Function Findings in Retired Bauxite Mining Workers
Authors: L. R. Ferreira, R. C. G. Bianchi, L. C.R. Ferreira, C. M. Galhardi, E. P. Baciuk, L. H. Oliveira
Abstract:
Inhalation hazards are associated with potentially injurious exposure and increased risk for lung diseases, within the bauxite mining industry, especially for the smelter workers. Smoking is related to decreased lung function and leads to chronic lung diseases. This study had the objective to evaluate whether smoking is related to functional and radiographic respiratory changes in retired bauxite mining workers. Methods: This was a retrospective and cross-sectional study involving the analysis of database information of 140 retired bauxite mining workers from Poços de Caldas-MG evaluated at Worker’s Health Reference Center and at the Social Security Brazilian National Institute, from July 1st, 2015 until June 30th, 2016. The workers were divided into three groups: non-smokers (n = 47), ex-smokers (n = 46), and smokers (n = 47). The data included: age, gender, spirometry results, and the presence or not of pulmonary pleural and/or parenchymal changes in chest radiographs. Chi-Squared test was used (p < 0,05). Results: In the smokers’ group, 83% of spirometry tests and 64% of chest x-rays were altered. In the non-smokers’ group, 19% of spirometry tests and 13% of chest x-rays were altered. In the ex-smokers’ group, 35% of spirometry tests and 30% of chest x-rays were altered. Most of the results were statistically significant. Results demonstrated a significant difference between smokers’ and non-smokers’ groups in regard to spirometric and radiographic pulmonary alterations. Ex-smokers’ and non-smokers’ group demonstrated better results when compared to the smokers’ group in relation to altered spirometry and radiograph findings. These data may contribute to planning strategies to enhance smoking cessation programs within the bauxite mining industry.
Keywords: Bauxite mining, spirometry, chest radiography, smoking.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7017361 Reduction of Energy Consumption Using Smart Home Techniques in the Household Sector
Authors: Ahmed Al-Adaileh, Souheil Khaddaj
Abstract:
Outcomes of exhaustion of natural resources started influencing each spirit on this planet. Energy is an essential factor in this aspect. To restore the circumstance to the appropriate track, all attempts must focus on two fundamental branches: producing electricity from clean and renewable reserves and decreasing the overall unnecessary consumption of energy. The focal point of this paper will be on lessening the power consumption in the household's segment. This paper is an attempt to give a clear understanding of a framework called Reduction of Energy Consumption in Household Sector (RECHS) and how it should help householders to reduce their power consumption by substituting their household appliances, turning-off the appliances when stand-by modus is detected, and scheduling their appliances operation periods. Technically, the framework depends on utilizing Z-Wave compatible plug-ins which will be connected to the usual house devices to gauge and control them remotely and semi-automatically. The suggested framework underpins numerous quality characteristics, for example, integrability, scalability, security and adaptability.
Keywords: Smart energy management systems, internet of things, wireless mesh networks, microservices, cloud computing, big data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7897360 Trusting Smart Speakers: Analysing the Different Levels of Trust between Technologies
Authors: Alec Wells, Aminu Bello Usman, Justin McKeown
Abstract:
The growing usage of smart speakers raises many privacy and trust concerns compared to other technologies such as smart phones and computers. In this study, a proxy measure of trust is used to gauge users’ opinions on three different technologies based on an empirical study, and to understand which technology most people are most likely to trust. The collected data were analysed using the Kruskal-Wallis H test to determine the statistical differences between the users’ trust level of the three technologies: smart speaker, computer and smart phone. The findings of the study revealed that despite the wide acceptance, ease of use and reputation of smart speakers, people find it difficult to trust smart speakers with their sensitive information via the Direct Voice Input (DVI) and would prefer to use a keyboard or touchscreen offered by computers and smart phones. Findings from this study can inform future work on users’ trust in technology based on perceived ease of use, reputation, perceived credibility and risk of using technologies via DVI.
Keywords: Direct voice input, risk, security, technology and trust.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5957359 Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure
Authors: S.Aranganayagi, K.Thangavel
Abstract:
K-Modes is an extension of K-Means clustering algorithm, developed to cluster the categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching or mismatching measure. Weight of attribute values contribute much in clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency of attribute values in the cluster and in the data set. The new weighted measure is experimented with the data sets obtained from the UCI data repository. The results are compared with K-Modes and K-representative, which show that the new measure generates clusters with high purity.
Keywords: Clustering, categorical data, K-Modes, weighted dissimilarity measure
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 36907358 Mobile Phone as a Tool for Data Collection in Field Research
Authors: Sandro Mourão, Karla Okada
Abstract:
The necessity of accurate and timely field data is shared among organizations engaged in fundamentally different activities, public services or commercial operations. Basically, there are three major components in the process of the qualitative research: data collection, interpretation and organization of data, and analytic process. Representative technological advancements in terms of innovation have been made in mobile devices (mobile phone, PDA-s, tablets, laptops, etc). Resources that can be potentially applied on the data collection activity for field researches in order to improve this process. This paper presents and discuss the main features of a mobile phone based solution for field data collection, composed of basically three modules: a survey editor, a server web application and a client mobile application. The data gathering process begins with the survey creation module, which enables the production of tailored questionnaires. The field workforce receives the questionnaire(s) on their mobile phones to collect the interviews responses and sending them back to a server for immediate analysis.Keywords: Data Gathering, Field Research, Mobile Phone, Survey.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20597357 On Pooling Different Levels of Data in Estimating Parameters of Continuous Meta-Analysis
Authors: N. R. N. Idris, S. Baharom
Abstract:
A meta-analysis may be performed using aggregate data (AD) or an individual patient data (IPD). In practice, studies may be available at both IPD and AD level. In this situation, both the IPD and AD should be utilised in order to maximize the available information. Statistical advantages of combining the studies from different level have not been fully explored. This study aims to quantify the statistical benefits of including available IPD when conducting a conventional summary-level meta-analysis. Simulated meta-analysis were used to assess the influence of the levels of data on overall meta-analysis estimates based on IPD-only, AD-only and the combination of IPD and AD (mixed data, MD), under different study scenario. The percentage relative bias (PRB), root mean-square-error (RMSE) and coverage probability were used to assess the efficiency of the overall estimates. The results demonstrate that available IPD should always be included in a conventional meta-analysis using summary level data as they would significantly increased the accuracy of the estimates.On the other hand, if more than 80% of the available data are at IPD level, including the AD does not provide significant differences in terms of accuracy of the estimates. Additionally, combining the IPD and AD has moderating effects on the biasness of the estimates of the treatment effects as the IPD tends to overestimate the treatment effects, while the AD has the tendency to produce underestimated effect estimates. These results may provide some guide in deciding if significant benefit is gained by pooling the two levels of data when conducting meta-analysis.
Keywords: Aggregate data, combined-level data, Individual patient data, meta analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17417356 Multivariate Assessment of Mathematics Test Scores of Students in Qatar
Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski
Abstract:
Data on various aspects of education are collected at the institutional and government level regularly. In Australia, for example, students at various levels of schooling undertake examinations in numeracy and literacy as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another source of educational data collected internationally is via the PISA study which collects data from several countries when students are approximately 15 years of age and enables comparisons in the performance of science, mathematics and English between countries as well as ranking of countries based on performance in these standardised tests. As well as student and school outcomes based on the tests taken as part of the PISA study, there is a wealth of other data collected in the study including parental demographics data and data related to teaching strategies used by educators. Overall, an abundance of educational data is available which has the potential to be used to help improve educational attainment and teaching of content in order to improve learning outcomes. A multivariate assessment of such data enables multiple variables to be considered simultaneously and will be used in the present study to help develop profiles of students based on performance in mathematics using data obtained from the PISA study.
Keywords: Cluster analysis, education, mathematics, profiles.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8947355 DIVAD: A Dynamic and Interactive Visual Analytical Dashboard for Exploring and Analyzing Transport Data
Authors: Tin Seong Kam, Ketan Barshikar, Shaun Tan
Abstract:
The advances in location-based data collection technologies such as GPS, RFID etc. and the rapid reduction of their costs provide us with a huge and continuously increasing amount of data about movement of vehicles, people and goods in an urban area. This explosive growth of geospatially-referenced data has far outpaced the planner-s ability to utilize and transform the data into insightful information thus creating an adverse impact on the return on the investment made to collect and manage this data. Addressing this pressing need, we designed and developed DIVAD, a dynamic and interactive visual analytics dashboard to allow city planners to explore and analyze city-s transportation data to gain valuable insights about city-s traffic flow and transportation requirements. We demonstrate the potential of DIVAD through the use of interactive choropleth and hexagon binning maps to explore and analyze large taxi-transportation data of Singapore for different geographic and time zones.Keywords: Geographic Information System (GIS), MovementData, GeoVisual Analytics, Urban Planning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23897354 Gene Expression Data Classification Using Discriminatively Regularized Sparse Subspace Learning
Authors: Chunming Xu
Abstract:
Sparse representation which can represent high dimensional data effectively has been successfully used in computer vision and pattern recognition problems. However, it doesn-t consider the label information of data samples. To overcome this limitation, we develop a novel dimensionality reduction algorithm namely dscriminatively regularized sparse subspace learning(DR-SSL) in this paper. The proposed DR-SSL algorithm can not only make use of the sparse representation to model the data, but also can effective employ the label information to guide the procedure of dimensionality reduction. In addition,the presented algorithm can effectively deal with the out-of-sample problem.The experiments on gene-expression data sets show that the proposed algorithm is an effective tool for dimensionality reduction and gene-expression data classification.Keywords: sparse representation, dimensionality reduction, labelinformation, sparse subspace learning, gene-expression data classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14477353 Determining Cluster Boundaries Using Particle Swarm Optimization
Authors: Anurag Sharma, Christian W. Omlin
Abstract:
Self-organizing map (SOM) is a well known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOMs, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The application of our method to unlabeled call data for a mobile phone operator demonstrates its feasibility. PSO algorithm utilizes U-matrix of SOMs to determine cluster boundaries; the results of this novel automatic method correspond well to boundary detection through visual inspection of code vectors and k-means algorithm.
Keywords: Particle swarm optimization, self-organizing maps, clustering, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17207352 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm
Authors: Ameur Abdelkader, Abed Bouarfa Hafida
Abstract:
Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis in case of a large amount of data. In fact, because of their volumes, their nature (semi or unstructured) and their variety, it is impossible to analyze efficiently big data via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm, Classification And Regression Trees (CART), in order to adapt it for big data analysis. The major changes of this algorithm are presented and then a version of the extended algorithm is defined in order to make it applicable for a huge quantity of data.
Keywords: Predictive analysis, big data, predictive analysis algorithms. CART algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10767351 The U.S. Missile Defense Shield and Global Security Destabilization: An Inconclusive Link
Authors: Michael A. Unbehauen, Gregory D. Sloan, Alberto J. Squatrito
Abstract:
Missile proliferation and global stability are intrinsically linked. Missile threats continually appear at the forefront of global security issues. North Korea’s recently demonstrated nuclear and intercontinental ballistic missile (ICBM) capabilities, for the first time since the Cold War, renewed public interest in strategic missile defense capabilities. To protect from limited ICBM attacks from so-called rogue actors, the United States developed the Ground-based Midcourse Defense (GMD) system. This study examines if the GMD missile defense shield has contributed to a safer world or triggered a new arms race. Based upon increased missile-related developments and the lack of adherence to international missile treaties, it is generally perceived that the GMD system is a destabilizing factor for global security. By examining the current state of arms control treaties as well as existing missile arsenals and ongoing efforts in technologies to overcome U.S. missile defenses, this study seeks to analyze the contribution of GMD to global stability. A thorough investigation cannot ignore that, through the establishment of this limited capability, the U.S. violated longstanding, successful weapons treaties and caused concern among states that possess ICBMs. GMD capability contributes to the perception that ICBM arsenals could become ineffective, creating an imbalance in favor of the United States, leading to increased global instability and tension. While blame for the deterioration of global stability and non-adherence to arms control treaties is often placed on U.S. missile defense, the facts do not necessarily support this view. The notion of a renewed arms race due to GMD is supported neither by current missile arsenals nor by the inevitable development of new and enhanced missile technology, to include multiple independently targeted reentry vehicles (MIRVs), maneuverable reentry vehicles (MaRVs), and hypersonic glide vehicles (HGVs). The methodology in this study encapsulates a period of time, pre- and post-GMD introduction, while analyzing international treaty adherence, missile counts and types, and research in new missile technologies. The decline in international treaty adherence, coupled with a measurable increase in the number and types of missiles or research in new missile technologies during the period after the introduction of GMD, could be perceived as a clear indicator of GMD contributing to global instability. However, research into improved technology (MIRV, MaRV and HGV) prior to GMD, as well as a decline of various global missile inventories and testing of systems during this same period, would seem to invalidate this theory. U.S. adversaries have exploited the perception of the U.S. missile defense shield as a destabilizing factor as a pretext to strengthen and modernize their militaries and justify their policies. As a result, it can be concluded that global stability has not significantly decreased due to GMD; but rather, the natural progression of technological and missile development would inherently include innovative and dynamic approaches to target engagement, deterrence, and national defense.
Keywords: Arms control, arms race, global security, GMD, ICBM, missile defense, proliferation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1155