Search results for: data stream mining
7123 Big Brain: A Single Database System for a Federated Data Warehouse Architecture
Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf
Abstract:
Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and explain why a custom federated approach was the recommendation taken forward to implementation. Although the data warehouses are logically federated, the implementation uses a single database system, which presented many advantages, such as cost reduction and improved data access for global users, giving consumers of the data a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.
Keywords: Data integration, data warehousing, federated architecture, online analytical processing.
Downloads 710
7122 An In-Depth Analysis of Open Data Portals as an Emerging Public E-Service
Authors: Martin Lnenicka
Abstract:
Governments collect and produce large amounts of data. Increasingly, governments worldwide have started to implement open data initiatives and to launch open data portals to enable the release of these data in open and reusable formats. Therefore, a large number of open data repositories, catalogues and portals have been emerging in the world. The greater availability of interoperable and linkable open government data catalyzes secondary use of such data, so they can be used for building useful applications which leverage their value, allow insight, provide access to government services, and support transparency. The efficient development of successful open data portals makes it necessary to evaluate them systematically, in order to understand them better, assess the various types of value they generate, and identify the improvements required to increase this value. Thus, the attention of this paper is directed particularly to the field of open data portals. The main aim of this paper is to compare selected open data portals at the national level using content analysis and to propose a new evaluation framework, which further improves the quality of these portals. It also establishes a set of considerations for involving businesses and citizens in creating e-services and applications that leverage the datasets available from these portals.
Keywords: Big data, content analysis, criteria comparison, data quality, open data, open data portals, public sector.
Downloads 3082
7121 Secure peerTalk Using PEERT System
Authors: Nebu Tom John, N. Dhinakaran
Abstract:
Multiparty voice over IP (MVoIP) systems allow a group of people to communicate freely with each other via the internet and have many applications, such as online gaming, teleconferencing and online stock trading. Peertalk is a peer-to-peer MVoIP system which is more feasible than existing approaches such as p2p overlay multicast and coupled distributed processing. Since the stream mixing and distribution are done by the peers, it is vulnerable to major security threats such as node misbehavior, eavesdropping, Sybil attacks, Denial of Service (DoS), call tampering and Man in the Middle attacks. To thwart these threats, a security framework called PEERTS (PEEred Reputed Trustworthy System for peertalk) is implemented so that efficient and secure communication can be carried out between peers.
Keywords: Key management system, peer-to-peer voice streaming, reputed trust management system, voice-over-IP.
Downloads 1882
7120 Unsteady Water Boundary Layer Flow with Non-Uniform Mass Transfer
Authors: G. Revathi, P. Saikrishnan
Abstract:
In the present analysis, an unsteady laminar forced convection water boundary layer flow is considered. The fluid properties, such as viscosity and Prandtl number, are taken as variables that are inversely proportional to temperature. Using a quasi-linearization technique, the nonlinear coupled partial differential equations are linearized, and the numerical solutions are obtained using an implicit finite difference scheme with an appropriate selection of step sizes. Non-similar solutions have been obtained from the starting point of the stream-wise coordinate to the point where the skin friction value vanishes. The effect of non-uniform mass transfer along the surface of the cylinder through a slot on the skin friction and heat transfer coefficients is studied.
Keywords: Boundary layer, heat transfer, non-similar solution, non-uniform mass, unsteady flow.
Downloads 1968
7119 File System-Based Data Protection Approach
Authors: Jaechun No
Abstract:
As the amount of data to be stored in storage subsystems increases tremendously, data protection techniques have become more important than ever for providing data availability and reliability. In this paper, we present a file system-based data protection approach (WOWSnap) that has been implemented using a WORM (Write-Once-Read-Many) scheme. In WOWSnap, once WORM files have been created, only privileged read requests to them are allowed, to protect data against any intentional or accidental intrusions. Furthermore, all WORM files are associated with a protection cycle, a time period during which the WORM files should be securely protected. Once their protection cycle has expired, the WORM files are automatically moved to the general-purpose data section without any user intervention. This prevents the WORM data section from being consumed by unnecessary files. We evaluated the performance of WOWSnap on a Linux cluster.
Keywords: Data protection, Protection cycle, WORM
Downloads 1679
7118 Numerical Evaluation of the Aerodynamic Efficiency of the Stevens and Jolly Vertical-Axis Windmill (1895)
Authors: M. Raciti Castelli, E. Benini
Abstract:
This paper presents a numerical investigation of the unsteady flow around an American 19th century vertical-axis windmill: the Stevens & Jolly rotor, patented on April 16, 1895. The computational approach used is based on solving the complete transient Reynolds-Averaged Navier-Stokes (t-RANS) equations: a full campaign of numerical simulation has been performed using the k-ω SST turbulence model. Flow field characteristics have been investigated for several values of tip speed ratio and for a constant unperturbed free-stream wind velocity of 6 m/s, enabling the study of some unsteady flow phenomena in the rotor wake. Finally, the global power generated from the windmill has been determined for each simulated angular velocity, allowing the calculation of the rotor power-curve.
Keywords: CFD, vertical-axis rotor, windmill.
Downloads 1445
7117 A Text Clustering System based on k-means Type Subspace Clustering and Ontology
Authors: Liping Jing, Michael K. Ng, Xinhua Yang, Joshua Zhexue Huang
Abstract:
This paper presents a text clustering system based on a k-means type subspace clustering algorithm for clustering large, high-dimensional and sparse text data. In this algorithm, a new step is added to the k-means clustering process to automatically calculate the weights of keywords in each cluster, so that the important words of a cluster can be identified by their weight values. To support understanding and interpretation of the clustering results, a few keywords that best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights as calculated by our new algorithm. Then, the candidates are fed to WordNet to identify the set of noun words and to consolidate synonyms and hyponyms. Experimental results show that the clustering algorithm is superior to other subspace clustering algorithms, such as PROCLUS and HARP, and to k-means type algorithms, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting words to represent the topics of the clusters.
Keywords: Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology
Downloads 2462
7116 Current Status of Nitrogen Saturation in the Upper Reaches of the Kanna River, Japan
Authors: Sakura Yoshii, Masakazu Abe, Akihiro Iijima
Abstract:
Nitrogen saturation has become one of the serious issues in the field of forest environment. The watershed protection forests located in the downwind hinterland of the Tokyo Metropolitan Area are believed to be facing nitrogen saturation. In this study, we focus on the balance of nitrogen between load and runoff. The annual nitrogen load via atmospheric deposition was estimated at 461.1 t-N/year in the upper reaches of the Kanna River. The annual nitrogen runoff to the forested headwater stream of the Kanna River was determined to be 184.9 t-N/year, corresponding to 40.1% of the total nitrogen load. A clear seasonal change in NO3-N concentration was still observed. Therefore, the watershed protection forest of the Kanna River is most likely to be in Stage-1 of nitrogen saturation.
Keywords: Atmospheric deposition, Nitrogen accumulation, Denitrification, Forest ecosystems.
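The runoff share quoted in this abstract follows directly from the two annual figures. A quick check, with variable names chosen here purely for illustration:

```python
# Annual figures taken from the abstract, in t-N/year.
nitrogen_load = 461.1     # load via atmospheric deposition
nitrogen_runoff = 184.9   # runoff to the forested headwater stream

# Share of the deposited nitrogen that leaves the watershed as runoff.
runoff_share = nitrogen_runoff / nitrogen_load * 100

# Load minus runoff; the abstract does not say how this remainder is partitioned.
load_minus_runoff = nitrogen_load - nitrogen_runoff

print(f"runoff share: {runoff_share:.1f}% of the load")
print(f"load minus runoff: {load_minus_runoff:.1f} t-N/year")
```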
Downloads 1714
7115 Landscape Data Transformation: Categorical Descriptions to Numerical Descriptors
Authors: Dennis A. Apuan
Abstract:
Categorical data based on descriptions of the agricultural landscape impose some mathematical and analytical limitations. This problem, however, can be overcome by data transformation through a coding scheme and the use of a non-parametric multivariate approach. The present study describes data transformation from qualitative to numerical descriptors. In a collection of 103 random soil samples over a 60 hectare field, categorical data were obtained from the following variables: levels of nitrogen, phosphorus, potassium, pH, hue, chroma, value, and data on topography, vegetation type, and the presence of rocks. Categorical data were coded, and Spearman's rho correlation was then calculated using PAST software ver. 1.78, on which Principal Component Analysis was based. Results revealed successful data transformation, generating 1030 quantitative descriptors. Visualization based on the new set of descriptors showed clear differences among sites, and the amount of variation was successfully measured. Possible applications of data transformation are discussed.
Keywords: data transformation, numerical descriptors, principal component analysis
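As a rough illustration of the coding-then-correlation step described in this abstract, the sketch below codes ordered categories as integers and computes Spearman's rho as the Pearson correlation of tie-averaged ranks. The coding scheme (`LEVEL_CODES`) and the sample values are invented for the example; the study itself used PAST, not this code.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical coding scheme and made-up observations (not the study's data).
LEVEL_CODES = {"low": 1, "medium": 2, "high": 3}
nitrogen = ["low", "medium", "high", "high", "low"]
phosphorus = ["low", "low", "high", "medium", "medium"]

nitrogen_codes = [LEVEL_CODES[v] for v in nitrogen]
phosphorus_codes = [LEVEL_CODES[v] for v in phosphorus]
rho = spearman_rho(nitrogen_codes, phosphorus_codes)
print(f"Spearman rho between coded nitrogen and phosphorus: {rho:.3f}")

# 103 samples times the 10 variables listed in the abstract matches
# the 1030 descriptors it reports.
descriptor_count = 103 * 10
```

Ranking before correlating is what makes the measure non-parametric: only the order of the codes matters, not the particular integers assigned to each category.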
Downloads 1505
7114 The Development of Decision Support System for Waste Management; a Review
Authors: M. S. Bani, Z. A. Rashid, K. H. K. Hamid, M. E. Harbawi, A. B. Alias, M. J. Aris
Abstract:
Most Decision Support Systems (DSS) constructed for waste management (WM) are not widely marketed and lack practical applications. This is due to the number of variables and the complexity of the mathematical models, which include the assumptions and constraints required in decision making. The approach taken by many researchers in DSS modelling is to isolate a few key factors that have a significant influence on the DSS. This segmented approach does not provide a thorough understanding of the complex relationships among the many elements involved. The various elements in constructing the DSS must be integrated and optimized in order to produce a viable model that is marketable and has practical application. The DSS model used to assist decision makers should be integrated with GIS, be able to give robust predictions despite the inherent uncertainties of waste generation and the plethora of waste characteristics, and give an optimal allocation of the waste stream to recycling, incineration, landfill and composting.
Keywords: Review, decision support system, GIS and waste management.
Downloads 3747
7113 Performance Analysis of Search Medical Imaging Service on Cloud Storage Using Decision Trees
Authors: González A. Julio, Ramírez L. Leonardo, Puerta A. Gabriel
Abstract:
Telemedicine services use a large amount of data, most of which are diagnostic images in Digital Imaging and Communications in Medicine (DICOM) and Health Level Seven (HL7) formats. Metadata is generated from each related image to support their identification. This study presents the use of decision trees for the optimization of information search processes for diagnostic images, hosted on the cloud server. To analyze the performance in the server, the following quality of service (QoS) metrics are evaluated: delay, bandwidth, jitter, latency and throughput in five test scenarios for a total of 26 experiments during the loading and downloading of DICOM images, hosted by the telemedicine group server of the Universidad Militar Nueva Granada, Bogotá, Colombia. By applying decision trees as a data mining technique and comparing it with the sequential search, it was possible to evaluate the search times of diagnostic images in the server. The results show that by using the metadata in decision trees, the search times are substantially improved, the computational resources are optimized and the request management of the telemedicine image service is improved. Based on the experiments carried out, search efficiency increased by 45% in relation to the sequential search, given that, when downloading a diagnostic image, false positives are avoided in management and acquisition processes of said information. It is concluded that, for the diagnostic images services in telemedicine, the technique of decision trees guarantees the accessibility and robustness in the acquisition and manipulation of medical images, in improvement of the diagnoses and medical procedures in patients.
Keywords: Cloud storage, decision trees, diagnostic image, search, telemedicine.
Downloads 948
7112 A Survey of Semantic Integration Approaches in Bioinformatics
Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir
Abstract:
Technological advances in computer science and data analysis are continuously producing huge volumes of biological data, which are available on the web. Such advances require powerful techniques for data integration to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, focusing on those developed in the ontology community.
Keywords: Semantic data integration, biological ontology, linked data, semantic web, OWL, RDF.
Downloads 1819
7111 The Reliability of the Improved e-N Method for Transition Prediction as Checked by PSE Method
Authors: Caihong Su
Abstract:
Transition prediction of boundary layers has always been an important problem in fluid mechanics, both theoretically and practically, yet notwithstanding the great effort made by many investigators, there is no satisfactory answer to this problem. The most popular method available is the so-called e-N method, which is heavily dependent on experiments and experience. The author has proposed improvements to the e-N method so as to reduce its dependence on experiments and experience to a certain extent. One of the key assumptions is that transition occurs whenever the velocity amplitude of the disturbance reaches 1-2% of the free stream velocity. However, the reliability of this assumption needs to be verified. In this paper, transition prediction on a flat plate is investigated using both the improved e-N method and the parabolized stability equations (PSE) method. The results show that the transition locations predicted by both methods agree reasonably well with each other under the above assumption. For the supersonic case, the critical velocity amplitude in the improved e-N method should be taken as 0.013, whereas in the subsonic case it should be 0.018; both are within the range 1-2%.
Keywords: Boundary layer, e-N method, PSE, Transition
Downloads 1508
7110 Effects of Oilfield Water Treated by Electroflocculation and Reverse Osmosis in a Typical Brazilian Semiarid Soil
Authors: P. S. A. Souza, M. R. C. Marques, M. M. Rigo, A. A. Cerqueira, J. L. Paiva, F. Merçon, D. V. Perez
Abstract:
Produced water (PW), which is water extracted along with oil, is the largest waste stream in the oil and gas industry. With proper treatment, this wastewater can be used in agricultural irrigation. This study evaluated the effects of the application of PW treated by electroflocculation (EF) and by combined electroflocculation-reverse osmosis (EF-RO) on soil salinity and sodification parameters. Excessive sodium levels in PW treated by EF may affect soil structural stability and plant growth; the sodium tends to accumulate in upper layers, displacing the nutrient K to deeper layers of the soil profile. PW treated by EF-RO did not promote salinization or soil sodification, indicating that this combined technique may be a viable alternative for treating oily water intended for irrigation use in semiarid regions.
Keywords: Electroflocculation, irrigation, produced water, reverse osmosis, soil.
Downloads 585
7109 A Normalization-based Robust Watermarking Scheme Using Zernike Moments
Authors: Say Wei Foo, Qi Dong
Abstract:
Digital watermarking has become an important technique for copyright protection, but its robustness against attacks remains a major problem. In this paper, we propose a normalization-based robust image watermarking scheme. In the proposed scheme, the original host image is first normalized to a standard form. The Zernike transform is then applied to the normalized image to calculate Zernike moments. Dither modulation is adopted to quantize the magnitudes of the Zernike moments according to the watermark bit stream. The watermark extraction method is blind. Security analysis and false alarm analysis are then performed. The quality degradation of the watermarked image caused by the embedded watermark is visually transparent. Experimental results show that the proposed scheme has very high robustness against various image processing operations and geometric attacks.
Keywords: Image watermarking, Image normalization, Zernike moments, Robustness.
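The dither modulation step named in this abstract can be sketched generically as quantization index modulation: each bit selects one of two interleaved quantization lattices, and extraction picks the lattice nearest to the received value. This is a minimal sketch under stated assumptions, not the authors' scheme. The step size, the magnitudes and the function names are illustrative, and the image normalization and Zernike moment computation are omitted.

```python
def embed_bit(magnitude, bit, step):
    """Quantize magnitude onto the lattice selected by bit.

    Bit 0 uses multiples of step; bit 1 uses the same lattice shifted by step/2.
    """
    dither = bit * step / 2
    return round((magnitude - dither) / step) * step + dither


def extract_bit(magnitude, step):
    """Blind extraction: return the bit whose lattice lies closest to magnitude."""
    distances = [abs(magnitude - embed_bit(magnitude, b, step)) for b in (0, 1)]
    return 0 if distances[0] <= distances[1] else 1


# Made-up moment magnitudes and watermark bits (illustrative only).
STEP = 2.0
magnitudes = [12.3, 7.8, 20.1, 3.4]
watermark_bits = [1, 0, 1, 1]

watermarked = [embed_bit(m, b, STEP) for m, b in zip(magnitudes, watermark_bits)]

# Simulate a mild attack: perturbations smaller than STEP / 4 do not flip bits.
perturbation = [0.3, -0.3, 0.4, -0.4]
attacked = [w + p for w, p in zip(watermarked, perturbation)]
recovered_bits = [extract_bit(a, STEP) for a in attacked]
print(recovered_bits == watermark_bits)
```

A larger step tolerates stronger distortion but changes the host magnitudes more, which is the usual robustness versus transparency trade-off in quantization-based watermarking.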
Downloads 1755
7108 Edible Oil Industry Wastewater Treatment by Microfiltration with Ceramic Membrane
Authors: Zita Šereš, Dragana Šoronja Simović, Ljubica Dokić, Lidietta Giorno, Biljana Pajin, Cecilia Hodur, Nikola Maravić
Abstract:
Membrane technology is convenient for the separation of suspended solids, colloids and high molecular weight materials that are present. The idea is that the waste stream from the edible oil industry, after the separation of oil using skimmers, is subjected to microfiltration, and the obtained permeate can be used again in the production process. Wastewater from the edible oil industry was used for the microfiltration. For the microfiltration of this effluent, a tubular membrane with a pore size of 200 nm was used at transmembrane pressures of up to 3 bar and flow rates of up to 300 L/h. A Box–Behnken design was selected for the experimental work, and the responses considered were permeate flux and chemical oxygen demand (COD) reduction. The reduction of the permeate COD was in the range 40-60% relative to the feed. The highest permeate flux achieved during microfiltration was 160 L/(m²·h).
Keywords: Ceramic membrane, edible oil, microfiltration, wastewater.
Downloads 1628
7107 Efficient Lossless Compression of Weather Radar Data
Authors: Wei-hua Ai, Wei Yan, Xiang Li
Abstract:
Data compression is used operationally to reduce bandwidth and storage requirements. An efficient method for achieving lossless weather radar data compression is presented. The characteristics of the data are taken into account and the optical linear prediction is used for the PPI images in the weather radar data in the proposed method. The next PPI image is identical to the current one and a dramatic reduction in source entropy is achieved by using the prediction algorithm. Some lossless compression methods are used to compress the predicted data. Experimental results show that for the weather radar data, the method proposed in this paper outperforms the other methods.
Keywords: Lossless compression, weather radar data, optical linear prediction, PPI image
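The entropy reduction that prediction brings can be illustrated with the simplest possible predictor: use the previous scan as the prediction of the current one and code only the residual. This is a sketch under that assumption, not the prediction method of the paper, and the scan values are invented.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Empirical zeroth-order entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def residual(previous, current):
    """Prediction error when the previous scan is used as the prediction."""
    return [c - p for p, c in zip(previous, current)]


def reconstruct(previous, res):
    """Invert the prediction step exactly, which keeps the scheme lossless."""
    return [p + r for p, r in zip(previous, res)]


# Made-up flattened reflectivity values for two consecutive scans.
previous_scan = [10, 12, 15, 20, 26, 30, 28, 22, 16, 11, 9, 8]
current_scan = [10, 13, 15, 21, 26, 31, 28, 23, 16, 11, 10, 8]

res = residual(previous_scan, current_scan)
print(f"entropy of the raw scan: {entropy_bits(current_scan):.2f} bits/symbol")
print(f"entropy of the residual: {entropy_bits(res):.2f} bits/symbol")
print(reconstruct(previous_scan, res) == current_scan)
```

Because the residual takes far fewer distinct values than the raw scan, a general-purpose lossless coder applied to it needs fewer bits per symbol, and the exact inverse guarantees that nothing is lost.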
Downloads 2257
7106 Artificial Intelligence Techniques Applications for Power Disturbances Classification
Authors: K. Manimala, Dr. K. Selvi, R. Ahila
Abstract:
Artificial Intelligence (AI) methods are increasingly being used for problem solving. This paper concerns the use of AI-type learning machines for the power quality problem, which is of general interest to power systems that must provide quality power to all appliances. Electrical power of good quality is essential for the proper operation of electronic equipment such as computers and PLCs. Malfunction of such equipment may lead to loss of production or disruption of critical services, resulting in huge financial and other losses. It is therefore necessary that critical loads be supplied with electricity of acceptable quality. Recognizing the presence of a disturbance and classifying it into a particular type is the first step in combating the problem. In this work, two classes of AI methods for power quality data mining are studied: Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs). We show that SVMs are superior to ANNs in two critical respects: SVMs train and run an order of magnitude faster, and SVMs give higher classification accuracy.
Keywords: back propagation network, power quality, probabilistic neural network, radial basis function support vector machine
Downloads 1557
7105 Conceptualizing the Knowledge to Manage and Utilize Data Assets in the Context of Digitization: Case Studies of Multinational Industrial Enterprises
Authors: Martin Böhmer, Agatha Dabrowski, Boris Otto
Abstract:
The trend of digitization significantly changes the role of data for enterprises. Data turn from an enabler to an intangible organizational asset that requires management and qualifies as a tradeable good. The idea of a networked economy has gained momentum in the data domain as collaborative approaches for data management emerge. Traditional organizational knowledge consequently needs to be extended by comprehensive knowledge about data. The knowledge about data is vital for organizations to ensure that data quality requirements are met and data can be effectively utilized and sovereignly governed. As this specific knowledge has been paid little attention to so far by academics, the aim of the research presented in this paper is to conceptualize it by proposing a “data knowledge model”. Relevant model entities have been identified based on a design science research (DSR) approach that iteratively integrates insights of various industry case studies and literature research.
Keywords: Data management, digitization, Industry 4.0, knowledge engineering, metamodel.
Downloads 1458
7104 A Methodology for Data Migration between Different Database Management Systems
Authors: Bogdan Walek, Cyril Klimes
Abstract:
Data migration is currently a very topical area. Current tools for data migration between relational databases have several disadvantages, which are presented in this paper. We propose a methodology for migrating database tables and their data between various types of relational database management systems (RDBMS). The proposed methodology contains an expert system. The expert system contains a knowledge base composed of IF-THEN rules and, based on the input data, suggests appropriate data types for the columns of the database tables. The proposed tool, which contains the expert system, also offers the possibility of optimizing the data types in the target RDBMS database tables based on the processed data of the source RDBMS database tables. The proposed expert system is demonstrated on the migration of a selected database from the source RDBMS to the target RDBMS.
Keywords: Expert system, fuzzy, data migration, database, relational database, data type, relational database management system.
Downloads 3492
7103 Vortex Shedding on Combined Bodies at Incidence to a Uniform Air Stream
Authors: T. Yavuz, Y. E. Akansu, M. Sarıoglu, M. Ozmert
Abstract:
The vortex-shedding phenomenon of the flow around two combined bodies of various geometries and sizes has been investigated experimentally in the Reynolds number range between 4.1×10^3 and 1.75×10^4. To see the effect of the rotation of the bodies on the vortex shedding, the combined bodies were rotated from 0° to 180°. The combined models have a cross section composed of a main circular cylinder and an attached circular or square cylinder. Results have shown that the Strouhal numbers for the two cases changed considerably with the angle of incidence, while they were found to be largely independent of Reynolds number at 150. Characteristics of the vortex formation region and the locations of flow attachments, reattachments, and separations were observed by means of flow visualizations. The effects of flow attachment, separation and reattachment on the vortex-shedding phenomenon, which depend on the inclination angle, are discussed.
Keywords: Bluff body, vortex shedding, flow separation, flow reattachment
Downloads 2123
7102 Water Pollution in Soshanguve Environs of South Africa
Authors: O. I. Nkwonta, G. M. Ochieng
Abstract:
Surface water pollution is one of the serious environmental problems in rural areas of South Africa due to the discharge of household waste into streams, turning them into open sewers. In this study, samples of water were collected from a stream in Soshanguve and analysed. The results showed that pollution in the area was caused by human activities. The water quality in the area was found to have deteriorated significantly after runoff from farms and household wastes. The results show that fertilizer runoff contributes 50% of the pollution in the streams, pesticides and sediments contribute up to 10% each, and household waste contributes up to 30%. This study gives an outline of the sources of water pollution in the area and provides a process for creating a clean and unpolluted environment for the Soshanguve community in Pretoria North in order to achieve the 7th of the Millennium Development Goals, ensuring environmental sustainability, by 2015.
Keywords: Fertilizer, Household waste, Pollution, Roughing filters.
Downloads 3846
7101 Opening up Government Datasets for Big Data Analysis to Support Policy Decisions
Authors: K. Hardy, A. Maurushat
Abstract:
Policy makers are increasingly looking to make evidence-based decisions. Evidence-based decisions have historically used the rigorous methodologies of empirical studies by research institutes, as well as less reliable immediate surveys and polls, often with limited sample sizes. As we move into the era of Big Data analytics, policy makers are looking to different methodologies to deliver reliable empirics in real time. The question is not why these people did this for the last 10 years, but why these people are doing this now and, if this is undesirable, how we can have an impact to promote change immediately. Big data analytics rely heavily on government data that has been released into the public domain. The open data movement promises greater productivity and more efficient delivery of services; however, Australian government agencies remain reluctant to release their data to the general public. This paper considers the barriers to releasing government data as open data, and how these barriers might be overcome.
Keywords: Big data, open data, productivity, transparency.
Downloads 1636
7100 Forthcoming Big Data on Smart Buildings and Cities: An Experimental Study on Correlations among Urban Data
Authors: Yu-Mi Song, Sung-Ah Kim, Dongyoun Shin
Abstract:
Cities are complex systems of diverse and inter-tangled activities. These activities and their complex interrelationships create diverse urban phenomena. And such urban phenomena have considerable influences on the lives of citizens. This research aimed to develop a method to reveal the causes and effects among diverse urban elements in order to enable better understanding of urban activities and, therefrom, to make better urban planning strategies. Specifically, this study was conducted to solve a data-recommendation problem found on a Korean public data homepage. First, a correlation analysis was conducted to find the correlations among random urban data. Then, based on the results of that correlation analysis, the weighted data network of each urban data was provided to people. It is expected that the weights of urban data thereby obtained will provide us with insights into cities and show us how diverse urban activities influence each other and induce feedback.
Keywords: Big data, correlation analysis, data recommendation system, urban data network.
Downloads 1105
7099 On the Combination of Patient-Generated Data with Data from a Secure Clinical Network Environment – A Practical Example
Authors: Jeroen S. de Bruin, Karin Schindler, Christian Schuh
Abstract:
With increasingly more mobile health applications appearing due to the popularity of smartphones, the possibility arises that these data can be used to improve the medical diagnostic process, as well as the overall quality of healthcare, while at the same time lowering costs. However, as of yet there have been no reports of a successful combination of patient-generated data from smartphones with data from clinical routine. In this paper we describe how these two types of data can be combined in a secure way without modification to hospital information systems, and how they can together be used in a medical expert system for automatic nutritional classification and triage.
Keywords: Data integration, disease-related malnutrition, expert systems, mobile health.
Downloads 2200
7098 Evaluation of Water Quality for the Kurtbogazi Dam Outlet and the Streams Feeding the Dam in Ankara, Turkey
Authors: G. Tozsin, F. Bakir, C. Acar, E. Koç
Abstract:
Kurtbogazi Dam has gained special meaning for Ankara, Turkey over the last decade due to the rapid depletion of nearby resources of drinking water. In this study, the results of the analyses of Kurtbogazi Dam outlet water and the rivers flowing into the Kurtbogazi Dam are discussed for the five-year period from 2008 to 2012. Some physical and chemical properties (pH, temperature, biochemical oxygen demand (BOD5), nitrate, phosphate and chlorine) of these water resources were evaluated. They were classified according to the Council Directive (75/440/EEC). Moreover, the properties of these surface waters were assessed to determine the quality of water for drinking and irrigation purposes using Piper, US Salinity Laboratory and Wilcox diagrams. The results showed that all the water resources are at an acceptable level as surface water, except for Pazar Stream in terms of ortho-phosphate and BOD5 concentration for 2008.
Keywords: Kurtbogazi dam, water quality assessment, Ankara water, water supply.
Downloads 1901
7097 Full-genomic Network Inference for Non-model organisms: A Case Study for the Fungal Pathogen Candida albicans
Authors: Jörg Linde, Ekaterina Buyko, Robert Altwasser, Udo Hahn, Reinhard Guthke
Abstract:
Reverse engineering of full-genomic interaction networks based on compendia of expression data has been successfully applied for a number of model organisms. This study adapts these approaches for an important non-model organism: the major human fungal pathogen Candida albicans. During the infection process, the pathogen can adapt to a wide range of environmental niches and reversibly changes its growth form. Given the importance of these processes, it is important to know how they are regulated. This study presents a reverse engineering strategy able to infer full-genomic interaction networks for C. albicans based on a linear regression, utilizing the sparseness criterion (LASSO). To overcome the limited amount of expression data and small number of known interactions, we utilize different prior-knowledge sources guiding the network inference to a knowledge-driven solution. Since no database of known interactions for C. albicans exists, we use a text-mining system which utilizes full-text research papers to identify known regulatory interactions. By comparing with these known regulatory interactions, we find an optimal value for global modelling parameters weighting the influence of the sparseness criterion and the prior knowledge. Furthermore, we show that soft integration of prior knowledge additionally improves the performance. Finally, we compare the performance of our approach to state-of-the-art network inference approaches.
Keywords: Pathogen, network inference, text-mining, Candida albicans, LASSO, mutual information, reverse engineering, linear regression, modelling.
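One common way to combine a sparseness criterion with prior knowledge is a LASSO in which each candidate regulator carries its own penalty weight, so that interactions supported by prior knowledge are penalised less. The sketch below illustrates that general idea with plain coordinate descent. It is an assumption about how such a scheme can look, not the method of the paper above; the function names, the penalty weights, and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(X, y, lam, penalty_weights, n_iter=100):
    """Coordinate descent for
    (1 / 2n) * ||y - Xw||^2 + lam * sum_j penalty_weights[j] * |w_j|.
    A smaller penalty weight encodes stronger prior support for regulator j.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # mean squared norm of column j
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            if z > 0:
                w[j] = soft_threshold(rho, lam * penalty_weights[j]) / z
            else:
                w[j] = 0.0
    return w


# Hypothetical toy data: two candidate regulators (columns), one target gene.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]

# Uniform penalty: the weak second interaction is removed from the network.
w_uniform = weighted_lasso(X, y, lam=0.1, penalty_weights=[1.0, 1.0])

# Prior knowledge supports the second interaction, so its penalty is reduced.
w_prior = weighted_lasso(X, y, lam=0.1, penalty_weights=[1.0, 0.2])
```

With a uniform penalty the second coefficient is shrunk to zero, while lowering its penalty weight keeps a small nonzero coefficient. This mirrors the trade-off between sparseness and prior knowledge that the abstract tunes through global parameters.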
Downloads 1673
7096 Comparison of Imputation Techniques for Efficient Prediction of Software Fault Proneness in Classes
Authors: Geeta Sikka, Arvinder Kaur Takkar, Moin Uddin
Abstract:
Missing data is a persistent problem in almost all areas of empirical research. Missing data must be treated very carefully, as data plays a fundamental role in every analysis. Improper treatment can distort the analysis or generate biased results. In this paper, we compare and contrast various imputation techniques on missing data sets and make an empirical evaluation of these methods so as to construct quality software models. Our empirical study is based on NASA's two public datasets, KC4 and KC1. The actual data sets of 125 cases and 2107 cases respectively, without any missing values, were considered. The data sets were used to create Missing at Random (MAR) data. Listwise Deletion (LD), Mean Substitution (MS), Interpolation, Regression with an error term, and Expectation-Maximization (EM) approaches were used to compare the effects of the various techniques.
Keywords: Missing data, Imputation, Missing Data Techniques.
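Two of the techniques named in the abstract above, Listwise Deletion and Mean Substitution, are simple enough to show in a short sketch. The row layout (lists with `None` for a missing value), the function names, and the toy values are assumptions for illustration and do not come from the KC1 or KC4 data.

```python
def listwise_deletion(rows):
    """Drop every case (row) that has at least one missing value."""
    return [row for row in rows if None not in row]


def mean_substitution(rows):
    """Replace each missing value with the mean of the observed values
    in the same column. Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy cases with two software metrics each; None marks a gap.
cases = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
]
complete_cases = listwise_deletion(cases)
imputed_cases = mean_substitution(cases)
```

Listwise Deletion keeps only the first case here, which shows how quickly it shrinks a small data set, whereas Mean Substitution keeps all three cases at the cost of reducing the variance of each imputed column.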
Downloads 1668
7095 Cluster Analysis for the Statistical Modeling of Aesthetic Judgment Data Related to Comics Artists
Authors: George E. Tsekouras, Evi Sampanikou
Abstract:
We compare three categorical data clustering algorithms with respect to the problem of classifying cultural data related to the aesthetic judgment of comics artists. Such a classification is very important in Comics Art theory, since the determination of any classes of similarities in this kind of data will provide art historians with very fruitful information on Comics Art's evolution. To establish this, we use a categorical data set and study it by employing three categorical data clustering algorithms. The performances of these algorithms are compared with each other, and interpretations of the clustering results are also given.
Keywords: Aesthetic judgment, comics artists, cluster analysis, categorical data.
Downloads 1634
7094 IoT Device Cost Effective Storage Architecture and Real-Time Data Analysis/Data Privacy Framework
Authors: Femi Elegbeleye, Seani Rananga
Abstract:
This paper focuses on a cost-effective storage architecture using a fog and cloud data storage gateway, and presents the design of a framework for the data privacy model and a data analytics framework for real-time analysis using machine learning methods. The paper begins with the system analysis, the system architecture and its component design, as well as the overall system operations. Several results obtained from this study on data privacy models show that when two or more data privacy models are integrated via a fog storage gateway, the data are often more secure. Our main focus in the study is to design a framework for the data privacy model, data storage, and real-time analytics. This paper also shows the major system components and their framework specification. Lastly, the overall research system architecture is shown, including its structure and its interrelationships.
Keywords: IoT, fog storage, cloud storage, data analysis, data privacy.
Downloads 244