Search results for: Data storage
7523 On the Combination of Patient-Generated Data with Data from a Secure Clinical Network Environment – A Practical Example
Authors: Jeroen S. de Bruin, Karin Schindler, Christian Schuh
With increasingly more mobile health applications appearing due to the popularity of smartphones, the possibility arises that these data can be used to improve the medical diagnostic process, as well as the overall quality of healthcare, while at the same time lowering costs. However, as of yet there have been no reports of a successful combination of patient-generated data from smartphones with data from clinical routine. In this paper we describe how these two types of data can be combined in a secure way without modification to hospital information systems, and how they can together be used in a medical expert system for automatic nutritional classification and triage.
Keywords: Data integration, disease-related malnutrition, expert systems, mobile health.
Downloads: 2200
7522 Internal Surface Measurement of Nanoparticle with Polarization-interferometric Nonlinear Confocal Microscope
Authors: Chikara Egami, Kazuhiro Kuwahara
Polarization-interferometric nonlinear confocal microscopy is proposed for measuring a nano-sized particle with optical anisotropy. The anisotropy in the particle was spectroscopically imaged through a three-dimensional distribution of the photoinduced third-order nonlinear dielectric polarization.
Keywords: nanoparticle, optical storage, microscope
Downloads: 1364
7521 A Survey on Supply Chain Management and E Commerce Technology Adoption among Logistics Service Providers in Johor
Authors: Mohd Iskandar bin Illyas Tan, Iziati Saadah bt Ibrahim
Logistics is the part of the supply chain process that plans, implements, and controls the efficient and effective forward and reverse flow and storage of goods, services, and related information between the point of origin and the point of consumption in order to meet customer requirements. This research investigates the current status and future direction of the use of Information Technology (IT) for logistics, focusing on Supply Chain Management (SCM) and E-Commerce adoption in Johor. It examines the types of technology being adopted and the factors, benefits and barriers affecting innovation in SCM and E-Commerce technology adoption among Logistics Service Providers (LSPs). A mailed questionnaire survey was conducted to collect data from 265 logistics companies in Johor. The research revealed that SCM technology adoption among LSPs was higher, as they had adopted SCM technology in various business processes and perceived a high level of benefits from SCM adoption. By contrast, E-Commerce technology adoption among LSPs was relatively low.
Keywords: E-Commerce, Johor, Logistics Service Providers, Supply Chain Management.
Downloads: 3116
7520 Comparison of Imputation Techniques for Efficient Prediction of Software Fault Proneness in Classes
Authors: Geeta Sikka, Arvinder Kaur Takkar, Moin Uddin
Missing data is a persistent problem in almost all areas of empirical research. Missing data must be treated very carefully, as data plays a fundamental role in every analysis. Improper treatment can distort the analysis or generate biased results. In this paper, we compare and contrast various imputation techniques on data sets with missing values and make an empirical evaluation of these methods so as to construct quality software models. Our empirical study is based on two public NASA data sets, KC4 and KC1. The actual data sets, of 125 cases and 2107 cases respectively and without any missing values, were considered. The data sets were used to create Missing at Random (MAR) data. Listwise Deletion (LD), Mean Substitution (MS), Interpolation, Regression with an error term, and Expectation-Maximization (EM) approaches were used to compare the effects of the various techniques.
Keywords: Missing data, Imputation, Missing Data Techniques.
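Two of the techniques named in this abstract, Listwise Deletion and Mean Substitution, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of `None` as the missing-value marker, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def listwise_deletion(rows):
    """Drop every row that contains at least one missing value (None)."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_substitution(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(mean(observed))  # raises if a column has no observed values
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy data: three metrics per software class; None marks a missing value.
data = [
    [10.0, 2.0, 1.0],
    [12.0, None, 0.0],
    [None, 4.0, 1.0],
    [14.0, 6.0, 0.0],
]

ld = listwise_deletion(data)   # keeps only the two complete rows
ms = mean_substitution(data)   # keeps all four rows, with gaps filled by column means
```

Listwise Deletion shrinks the sample, while Mean Substitution keeps every case but reduces the variance of the imputed column. This is one reason comparisons such as the one described above are useful.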
Downloads: 1667
7519 Cluster Analysis for the Statistical Modeling of Aesthetic Judgment Data Related to Comics Artists
Authors: George E. Tsekouras, Evi Sampanikou
We compare three categorical data clustering algorithms with respect to the problem of classifying cultural data related to the aesthetic judgment of comics artists. Such a classification is very important in Comics Art theory, since determining classes of similarity in this kind of data will provide art historians with useful information on the evolution of Comics Art. To establish this, we use a categorical data set and study it by employing three categorical data clustering algorithms. The performances of these algorithms are compared with each other, and interpretations of the clustering results are also given.
Keywords: Aesthetic judgment, comics artists, cluster analysis, categorical data.
Downloads: 1634
7518 Electronic System Design for Respiratory Signal Processing
Authors: C. Matiz C., N. Olarte L., A. Rubiano F.
This paper presents the design of an electronic system for the respiratory signal, covering the processing phases, followed by the transmission and reception of the signal and, finally, its display. The processing of this signal is added to that of the ECG and temperature signals, which was put in place last year. Under this scheme, it is proposed that the blood pressure signal also be conditioned in the future on the same final printed circuit.
Keywords: Conditioning, Respiratory Signal, Storage, Teleconsultation.
Downloads: 2355
7517 Assets Integrity Management in Oil and Gas Production Facilities Through Corrosion Mitigation and Inspection Strategy: A Case Study of Sarir Oilfield
Authors: Iftikhar Ahmad, Youssef Elkezza
Sarir oilfield is in North Africa and has oil and gas production facilities. The assets of the Sarir oilfield can be divided into the following five categories: (i) well bores and wellheads; (ii) vessels such as separators, desalters, and gas processing facilities; (iii) pipelines, including all flow lines, trunk lines, and shipping lines; (iv) storage tanks; (v) other assets such as turbines and compressors. The petroleum industry recognizes the potential human, environmental and financial consequences that can result from failing to maintain the integrity of wellheads, vessels, tanks, pipelines, and other assets. The importance of effective asset integrity management increases as the industry infrastructure continues to age. The primary objective of assets integrity management (AIM) is to maintain assets in a fit-for-service condition while extending their remaining life in the most reliable, safe, and cost-effective manner. Corrosion management is one of the important aspects of successful asset integrity management. It covers corrosion mitigation, monitoring, inspection, and risk evaluation. External corrosion on pipelines, well bores, buried assets, and bottoms of tanks is controlled with a combination of coatings and cathodic protection, while external corrosion on surface equipment, wellheads, and storage tanks is controlled by coatings. Periodic cleaning of the pipelines by pigging helps prevent internal corrosion. Further, internal corrosion of pipelines is prevented by chemical treatment and controlled operations. This paper describes the integrity management system used in the Sarir oilfield for its oil and gas production facilities, based on standard practices of corrosion mitigation and inspection.
Keywords: Assets integrity management, corrosion prevention in oilfield assets, corrosion management in oilfield, corrosion prevention and inspection activities.
Downloads: 172
7516 Physical and Microbiological Evaluation of Chitosan Films: Effect of Essential Oils and Storage
Authors: N. Valderrama, W. Albarracín, N. Algecira
The effect of including thyme and rosemary essential oils in chitosan films was studied, together with the microbiological and physical properties of chitosan films stored with and without these oils. The film-forming solution was prepared by dissolving chitosan (2%, w/v), polysorbate 80 (4% w/w CH) and glycerol (16% w/w CH) in aqueous lactic acid solutions (control). The thyme (TEO) and rosemary (REO) essential oils (EOs) were included at 1:1 w/w (EOs:CH) and in their 50/50 combination (TEO:REO). The films were stored at temperatures of 5, 20, and 33°C and a relative humidity of 75% for four weeks. The films with essential oil inclusion did not show antimicrobial activity against the strains. This behavior could be explained by the fact that chitosan only inhibits the growth of microorganisms in direct contact with its active sites. However, the inhibition capacity of TEO was higher than that of REO, and a synergistic effect between TEO and REO was found for S. enteritidis strains in the chitosan solution. Some physical properties were modified by the inclusion of essential oils. The addition of essential oils did not affect the mechanical properties (tensile strength, elongation at break, puncture deformation), the water solubility, the swelling index, or the DSC behavior. However, essential oil inclusion significantly decreased the thickness, the moisture content, and the L* value of the films, whereas the b* value increased, due to molecular interactions with the polymeric matrix, loosening of the structure, and chemical modifications. In addition, the temperature and time of storage changed some physical properties of the chitosan films. This could have occurred because of chemical changes, such as swelling in the presence of high-humidity air and the reacetylation of amino groups. In the majority of cases, properties such as moisture content, tensile strength, elongation at break, puncture deformation, a*, b*, chroma, and ΔE increased, whereas water resistance, swelling index, L*, and hue angle decreased.
Keywords: Chitosan, food additives, modified films, polymers.
Downloads: 2992
7515 The Impact of System and Data Quality on Organizational Success in the Kingdom of Bahrain
Authors: Amal M. Alrayes
Data and system quality play a central role in organizational success, and the quality of any existing information system has a major influence on the effectiveness of overall system performance. Given the importance of system and data quality to an organization, it is relevant to highlight their effect on organizational performance in the Kingdom of Bahrain. This research aims to discover whether system quality and data quality are related, and to study the impact of system and data quality on organizational success. A theoretical model based on previous research is used to show the relationship between data quality, system quality, and organizational impact. We hypothesize, first, that system quality is positively associated with organizational impact; second, that system quality is positively associated with data quality; and finally, that data quality is positively associated with organizational impact. A questionnaire survey was conducted among public and private organizations in the Kingdom of Bahrain. The results show that there is a strong association between data and system quality, which affects organizational success.
Keywords: Data quality, performance, system quality.
Downloads: 2118
7514 The Most Secure Smartphone Operating System: A Survey
Authors: Sundus Ayyaz, Saad Rehman
In recent years, mobile phone technology has undergone a fundamental revolution, from merely providing voice and short message services to becoming an essential part of our lives, connecting to networks and to various app stores for downloading apps for almost every activity, from finding a location to banking and from getting news updates to downloading HD videos. This progress in the smart phone industry has modernized and simplified our way of living. The smart phone has become our personal computer with the addition of significant features such as multi-core processors, multi-tasking, large storage space, Bluetooth, and WiFi, as well as large screens and cameras. With this evolution, security threats have also increased. In the literature, different threats related to smart phones have been highlighted, and various precautions and solutions have been proposed to keep the smart phone, which carries all the private data of a user, safe. In this paper, a survey has been carried out to find the most secure and the least secure smart phone operating systems among the most popular smart phones in use today.
Keywords: Smart phone, operating system, security threats, Android, iOS, BlackBerry, Windows.
Downloads: 4180
7513 Gas-Liquid Flow on Smooth and Textured Inclined Planes
Authors: J.J. Cooke, S. Gu, L.M. Armstrong, K.H. Luo
Carbon Capture & Storage (CCS) is one of the various methods that can be used to reduce the carbon footprint of the energy sector. This paper focuses on the absorption of CO2 from flue gas using packed columns, whose efficiency is highly dependent on the structure of the liquid films within the column. To study the characteristics of liquid films, the CFD solver OpenFOAM is utilised to solve two-phase, isothermal film flow using the volume-of-fluid (VOF) method. The model was validated using existing experimental data and the Nusselt theory. It was found that smaller plate inclination angles, with respect to the horizontal plane, resulted in larger wetted areas on smooth plates. However, only a slight improvement in the wetted area was observed. Simulations were also performed using a ridged plate, and it was observed that these surface textures significantly increase the wetted area of the plate. This was mainly attributed to the channelling effect of the ridges, which helped to oppose the surface tension forces trying to minimise the surface area. Rivulet formations on the ridged plate were also flattened out and spread across a larger proportion of the plate width.
Keywords: CCS, liquid film flow, packed columns, wetted area
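The abstract does not say which Nusselt expression was used for validation. As an illustration only, the sketch below evaluates the classical Nusselt thickness of a fully developed laminar film on an inclined plate, δ = (3νq / (g sin θ))^(1/3), where q is the volumetric flow rate per unit plate width and θ is the inclination from the horizontal. The function name, the default water-like viscosity, and the sample flow rate are assumptions.

```python
import math


def nusselt_film_thickness(q, theta_deg, nu=1.0e-6, g=9.81):
    """Classical Nusselt film thickness in metres.

    q         -- volumetric flow rate per unit plate width, m^2/s (assumed input)
    theta_deg -- plate inclination from the horizontal, in degrees
    nu        -- kinematic viscosity, m^2/s (default is roughly that of water)
    g         -- gravitational acceleration, m/s^2
    """
    theta = math.radians(theta_deg)
    return (3.0 * nu * q / (g * math.sin(theta))) ** (1.0 / 3.0)


# Hypothetical flow rate; compare a steep plate with a shallow one.
steep = nusselt_film_thickness(q=1.0e-4, theta_deg=60.0)
shallow = nusselt_film_thickness(q=1.0e-4, theta_deg=30.0)
```

Under this formula, a shallower plate gives a thicker film at the same flow rate, because the gravity component driving the film along the plate is smaller.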
Downloads: 2102
7512 Integration of Multi-Source Data to Monitor Coral Biodiversity
Authors: K. Jitkue, W. Srisang, C. Yaiprasert, K. Jaroensutasinee, M. Jaroensutasinee
This study aims to use multi-source data to monitor coral biodiversity and coral bleaching. We used the coral reefs at the Racha Islands, Phuket, as the study area. There were three sources of data: coral diversity, sensor-based data and satellite data.
Keywords: Coral reefs, Remote sensing, Sea surface temperature, Satellite imagery.
Downloads: 1553
7511 Decision Support System Based on Data Warehouse
Authors: Yang Bao, LuJing Zhang
A typical Intelligent Decision Support System is built on four components: Data Warehouse, Online Analytical Processing, Data Mining, and model-based Decision Support. Such a system is called a Decision Support System Based on Data Warehouse (DSSBDW). This approach takes ETL, OLAP and DM as its means of implementation and integrates traditional model-driven DSS and data-driven DSS into a whole. This paper analyzes the DSSBDW architecture and the DW model and discusses the following key issues: ETL design and realization; metadata management using XML; and SQL implementation, performance optimization, and data mapping in OLAP. Lastly, it illustrates the design principles and methods of DW in DSSBDW.
Keywords: Decision Support System, Data Warehouse, Data Mining.
Downloads: 3862
7510 Incremental Learning of Independent Topic Analysis
Authors: Takahiro Nishigaki, Katsumi Nitta, Takashi Onoda
In this paper, we present a method of applying Independent Topic Analysis (ITA) to a growing collection of document data. The volume of document data has been increasing since the spread of the Internet. ITA was presented as one method of analyzing document data. It extracts independent topics from the document data by using Independent Component Analysis (ICA), a technique from signal processing. However, it is difficult to apply ITA to a growing collection of document data, because ITA must use all of the document data, so the temporal and spatial cost is very high. Therefore, we present Incremental ITA, which extracts independent topics from a growing collection of document data. Incremental ITA updates the independent topics when document data are added, starting from the independent topics extracted from the previous data. We show the results of applying Incremental ITA to benchmark datasets.
Keywords: Text mining, topic extraction, independent, incremental, independent component analysis.
Downloads: 1058
7509 A Framework for Data Mining Based Multi-Agent: An Application to Spatial Data
Authors: H. Baazaoui Zghal, S. Faiz, H. Ben Ghezala
Data mining is an extraordinarily demanding field concerned with the extraction of implicit knowledge and relationships that are not explicitly stored in databases. A wide variety of data mining methods have been introduced (classification, characterization, generalization...). Each of these methods includes more than one algorithm. A data mining system involves different user categories, which means that the user's behavior must be a component of the system. The problem at this level is to know which algorithm of which method to employ for an exploratory end, which one for a decisional end, and how they can collaborate and communicate. The agent paradigm offers a new way of designing and building data mining systems. The purpose is to combine different data mining algorithms to prepare elements for decision-makers, benefiting from the possibilities offered by multi-agent systems. In this paper, the agent framework for data mining is introduced, and its overall architecture and functionality are presented. The validation is carried out on spatial data, and the principal results are presented.
Keywords: Databases, data mining, multi-agent, spatial datamart.
Downloads: 2045
7508 Streamflow Modeling for a Small Watershed Using Limited Hydrological Data
Authors: S. Chuenchooklin
This research was conducted in the Pua Watershed, which is located in the Upper Nan River Basin in Nan province, Thailand. The Nan River Basin originates in Nan province and comprises many tributary streams that provide inflow to the Sirikit Dam, a large reservoir with a storage capacity of 9510 million cubic meters. Problems common to most watersheds were found, i.e., a shortage of water supply for consumption and agricultural use, deterioration of water quality, floods and landslides including debris flow, and riverbank instability. The Pua Watershed is one of several small river basins that flow through the Nan River Basin. The watershed covers 404 km2, representing 61.5% of the Pua District, 18.2% of the Upper Nan Basin, and 1.2% of the whole Nan River Basin. The Pua River is a main stream producing year-round streamflow that supplies the Pua District and provides inflow to the Upper Nan Basin. It is approximately 56.3 kilometers long, with a measured average channel slope of 1.9%. A diversion weir, the Pua weir, separates the plain from the mountainous areas, with a very steep riverbed slope of 2.9% and a drainage area of 149 km2 in the upstream watershed, and a mild riverbed slope of 0.2% in the 20.3 km river reach downstream of the weir, which is considered a gauged basin. However, the major branch streams of the Pua River, Nam Kwang and Nam Koon, are ungauged catchments with drainage areas of 86 and 35 km2, respectively. These upstream watersheds produce runoff through the three streams downstream of the Pua, Jao, and Kang weirs, with an average annual runoff of 578 million cubic meters. They were analyzed using both statistical data at the Pua weir and simulated data from the hydrologic modeling system (HEC-HMS), which was applied to the remaining ungauged basins. Because the Kwang and Koon catchments lacked hydrological data, including streamflow and rainfall, HEC-HMS with Snyder's synthetic hydrograph and transposition methods was applied to those areas, using hydrological parameters calibrated upstream of the Pua weir against continuous daily records of streamflow and rainfall for 2008-2011. The results showed that the simulated daily streamflow, summed as annual runoff for 2008, 2010, and 2011, fitted the observed annual runoff at the Pua weir under simple linear regression with satisfactory R2 values of 0.64, 0.62, and 0.59, respectively. The sensitivity of the simulation results stemmed from difficulties with the calibrated parameters, i.e., lag time, peak flow coefficient, initial losses, and uniform loss rates, and from some missing daily observations. These calibrated parameters were then applied to the simulation of the other two ungauged catchments and the downstream catchments.
Keywords: Streamflow, hydrological model, ungauged catchments.
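The abstract reports R2 values from a simple linear regression of simulated against observed runoff. The sketch below shows how such a coefficient of determination can be computed from paired values. The function name and the sample numbers are hypothetical, and the study's own data are not reproduced here.

```python
def linear_regression_r2(x, y):
    """Fit y = intercept + slope * x by least squares and return (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical paired flows (arbitrary units), not taken from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.5, 1.5, 3.5, 3.5]
slope, intercept, r2 = linear_regression_r2(observed, simulated)
```

For simple linear regression, this R2 equals the square of the Pearson correlation between the two series, so values around 0.6 indicate a moderate fit.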
Downloads: 1991
7507 The Challenges of Cloud Computing Adoption in Nigeria
Authors: Chapman Eze Nnadozie
Cloud computing, a technology made possible through virtualization within networks, represents a shift from the traditional ownership of infrastructure and other resources by distinct organizations to a more scalable pattern in which computer resources are rented online to organizations either on a pay-as-you-use basis or by subscription. In other words, cloud computing entails the renting of computing resources (such as storage space, memory, servers, applications, networks, etc.) by a third party to its clients on a pay-as-you-go basis. It is an innovative technology that has been embraced globally because of its well-known benefits, chief among which is its cost effectiveness for the organizations that use its services. In Nigeria, the services are provided either directly to companies, mostly by key IT players such as Microsoft, IBM, and Google, or in partnership with other players such as Infoware, Descasio, and Sunnet. This enables organizations to rent IT resources on a pay-as-you-go basis, sparing them the waste incurred in acquiring and maintaining IT resources such as a separate data centre. This paper appraises the challenges of cloud computing adoption in Nigeria, bearing in mind the country's peculiarities in terms of infrastructural development. The methodology includes research questionnaires, the formulation of hypotheses, and the testing of those hypotheses. The major findings include the fact that there are some addressable challenges to the adoption of cloud computing in Nigeria. Furthermore, the country will gain significantly if the challenges, especially in the area of infrastructural development, are well addressed, because the research established that there are significant gains to be derived from the adoption of cloud computing by organizations in Nigeria. These challenges can be overcome by concerted efforts on the part of government and other stakeholders.
Keywords: Cloud computing, data centre, infrastructure, IT resources, network, servers, virtualization.
Downloads: 1796
7506 Latent Topic Based Medical Data Classification
Authors: Jian-hua Yeh, Shi-yi Kuo
This paper discusses the classification process for medical data. We use the data from ACM KDDCup 2008 to demonstrate our classification process based on latent topic discovery. In this data set, the target set and the outliers are quite different in nature: the target set makes up only 0.6% of the data, while the outliers make up 99.4%. We use this data set as an example to show how we dealt with such an extremely biased data set using latent topic discovery and noise reduction techniques. Our experiment faces two major challenges: (1) extremely distributed outliers, and (2) positive samples that are far fewer than negative ones. We propose a suitable process flow to deal with these issues and obtain a best AUC result of 0.98.
Keywords: classification, latent topics, outlier adjustment, feature scaling
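With a class split as skewed as the one described (0.6% against 99.4%), plain accuracy says little, which is presumably why the result is reported as AUC. The sketch below computes ROC AUC as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. It illustrates the metric only, not the authors' pipeline, and the labels and scores are made up.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy set: 2 positives, 8 negatives.
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.4, 0.35, 0.05, 0.15, 0.25]
auc = roc_auc(labels, scores)  # 15 of 16 positive/negative pairs are ranked correctly
```

The pairwise loop is quadratic in the number of samples, so a rank-based formulation or a library routine would be the practical choice on a data set of realistic size.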
Downloads: 1642
7505 Concept, Design and Implementation of Power System Component Simulator Based on Thyristor Controlled Transformer and Power Converter
Authors: B. Kędra, R. Małkowski
This paper presents the Power System Component Simulator, a device designed for the LINTE^2 laboratory owned by Gdansk University of Technology in Poland. We first provide introductory information on the Power System Component Simulator and its capabilities. Then, the concept of the unit is presented. Requirements for the unit are described, and the proposed and implemented functions are listed. Implementation details are given, and the hardware structure is presented and described. Information is provided about the communication interface used, the data maintenance and storage solution, and the Simulink real-time features used. A list and description of all measurements is provided. The potential for laboratory setup modifications is evaluated. Lastly, the results of experiments performed using the Power System Component Simulator are presented. These include simulation of under-frequency load shedding, frequency- and voltage-dependent characteristics of groups of load units, and time characteristics of groups of different load units in a chosen area.
Keywords: Power converter, Simulink real-time, MATLAB, load, tap controller.
Downloads: 795
7504 Data Collection in Hospital Emergencies: A Questionnaire Survey
Authors: Nouha Mhimdi, Wahiba Ben Abdessalem Karaa, Henda Ben Ghezala
Many methods are used to collect data, such as questionnaires, surveys, and focus group interviews. However, the collection of poor-quality data, resulting, for example, from poorly designed questionnaires, the absence of good translators or interpreters, and the incorrect recording of data, allows conclusions to be drawn that are not supported by the data, or leads to a focus only on the average effect of the program or policy. There are several ways to avoid or minimize the most frequent errors, including obtaining expert advice on the design or adaptation of data collection instruments, or using technologies that allow better "anonymity" in the responses. In this context, and to overcome the aforementioned problems, we suggest in this paper an approach to collecting relevant data by carrying out a large-scale questionnaire-based survey. We have been able to collect good-quality, consistent and practical data on hospital emergencies to improve emergency services in hospitals, especially in the case of epidemics or pandemics.
Keywords: Data collection, survey, database, data analysis, hospital emergencies.
Downloads: 667
7503 Data Transformation Services (DTS): Creating Data Mart by Consolidating Multi-Source Enterprise Operational Data
Authors: J. D. D. Daniel, K. N. Goh, S. M. Yusop
Trends in business intelligence, e-commerce and remote access make it necessary and practical to store data in different ways on multiple systems with different operating systems. As businesses evolve and grow, they require efficient computerized solutions to update data and to access data from diverse enterprise business applications. The objective of this paper is to demonstrate the capability of DTS [1] as a database solution for automatic data transfer and update in solving business problems. The DTS package was developed for a business selling a variety of plants that eventually expanded into commercial supply and landscaping. Dimensional data modeling is used in the DTS package to extract, transform and load data from heterogeneous database systems such as MySQL, Microsoft Access and Oracle, consolidating them into a Data Mart residing in SQL Server. The data transfer from the various databases is scheduled to run automatically every quarter of the year to support efficient sales analysis. DTS is therefore an attractive solution for automatic data transfer and update that meets today's business needs.
Keywords: Data Transformation Services (DTS), Object Linking and Embedding Database (OLEDB), Data Mart, Online Analytical Processing (OLAP), Online Transactional Processing (OLTP).
Downloads: 2038
7502 Extraction of Data from Web Pages: A Vision Based Approach
Authors: P. S. Hiremath, Siddu P. Algur
With the explosive growth of information sources available on the World Wide Web, it has become increasingly difficult to identify the relevant pieces of information, since web pages are often cluttered with irrelevant content such as advertisements, navigation panels, and copyright notices surrounding the main content of the web page. Hence, tools for mining data regions, data records and data items need to be developed in order to provide value-added services. Currently available automatic techniques for mining data regions from web pages are still unsatisfactory because of their poor performance and tag dependence. In this paper, a novel method for automatically extracting data items from web pages is proposed. It comprises two steps: (1) identification and extraction of the data regions based on visual clues; (2) identification of data records and extraction of data items from a data region. For step 1, a novel and more effective method is proposed, which uses visual clues to find the data regions formed by all types of tags. For step 2, a more effective method, namely Extraction of Data Items from web Pages (EDIP), is adopted to mine data items. The EDIP technique is a list-based approach in which the list is a linear data structure. The proposed technique is able to mine non-contiguous data records and can correctly identify data regions, irrespective of the type of tag in which they are bound. Our experimental results show that the proposed technique performs better than the existing techniques.
Keywords: Web data records, web data regions, web mining.
Downloads: 1901
7501 Visual-Graphical Methods for Exploring Longitudinal Data
Authors: H. W. Ker
Longitudinal data are typically characterized by changes over time, nonlinear growth patterns, between-subjects variability, and within-subject errors exhibiting heteroscedasticity and dependence. Exploring such data is more complicated than exploring cross-sectional data. The purpose of this paper is to organize and integrate various visual-graphical techniques for exploring longitudinal data. By applying the proposed methods, investigators can answer research questions that include characterizing or describing the growth patterns at both group and individual level, identifying the time points where important changes occur and unusual subjects, selecting suitable statistical models, and suggesting possible within-error variance structures.
Keywords: Data exploration, exploratory analysis, HLMs/LMEs, longitudinal data, visual-graphical methods.
Downloads 2094
7500 A Materialized Approach to the Integration of XML Documents: the OSIX System
Authors: H. Ahmad, S. Kermanshahani, A. Simonet, M. Simonet
The data exchanged on the Web are of a different nature from those treated by classical database management systems; these data are called semi-structured data since they do not have a regular and static structure like data found in a relational database; their schema is dynamic and may contain missing data or types. Therefore, the need has arisen to develop further techniques and algorithms to exploit and integrate such data and to extract relevant information for the user. In this paper we present the system OSIX (Osiris based System for Integration of XML Sources). This system has a Data Warehouse model designed for the integration of semi-structured data and more precisely for the integration of XML documents. The architecture of OSIX relies on the Osiris system, a DL-based model designed for the representation and management of databases and knowledge bases. Osiris is a view-based data model whose indexing system supports semantic query optimization. We show that the problem of query processing on an XML source is optimized by the indexing approach proposed by Osiris.
Keywords: Data integration, semi-structured data, views, XML.
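To make the idea of materialized integration of semi-structured XML concrete, here is a minimal sketch that merges two XML sources with differing and partly missing fields into one uniform store. It does not reproduce OSIX or the Osiris indexing system: the sample documents, the target schema `FIELDS`, and the functions `extract` and `integrate` are all assumptions made for illustration, using only Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Two hypothetical XML sources describing the same kind of entity with
# different, partly missing fields (semi-structured data).
SOURCE_A = """<people>
  <person id="1"><name>Ada</name><email>ada@example.org</email></person>
  <person id="2"><name>Bob</name></person>
</people>"""

SOURCE_B = """<staff>
  <member id="2"><phone>555-0100</phone></member>
  <member id="3"><name>Cy</name><phone>555-0101</phone></member>
</staff>"""

# Assumed target schema of the materialized store.
FIELDS = ("name", "email", "phone")


def extract(xml_text, record_tag):
    """Yield (id, {field: text}) per record element, skipping absent fields."""
    root = ET.fromstring(xml_text)
    for elem in root.iter(record_tag):
        values = {f: elem.findtext(f) for f in FIELDS if elem.find(f) is not None}
        yield elem.get("id"), values


def integrate(sources):
    """Merge records by id into one dictionary; missing fields stay None."""
    merged = {}
    for xml_text, tag in sources:
        for rec_id, values in extract(xml_text, tag):
            row = merged.setdefault(rec_id, dict.fromkeys(FIELDS))
            for field, value in values.items():
                if row[field] is None:  # the first source to supply a value wins
                    row[field] = value
    return merged


warehouse = integrate([(SOURCE_A, "person"), (SOURCE_B, "member")])
for rec_id in sorted(warehouse):
    print(rec_id, warehouse[rec_id])
```

Because the merged rows are stored rather than recomputed per query, later lookups run against the integrated dictionary instead of re-parsing each source, which is the basic trade-off of a materialized approach.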
Downloads 1590
7499 Insights into Smoothies with High Levels of Fibre and Polyphenols: Factors Influencing Chemical, Rheological and Sensory Properties
Authors: Dongxiao Sun-Waterhouse, Shiji Nair, Reginald Wibisono, Sandhya S. Wadhwa, Carl Massarotto, Duncan I. Hedderley, Jing Zhou, Sara R. Jaeger, Virginia Corrigan
Attempts to add fibre and polyphenols (PPs) into popular beverages present challenges related to the properties of finished products such as smoothies. Consumer acceptability, viscosity and phenolic composition of smoothies containing high levels of fruit fibre (2.5-7.5 g per 300 mL serve) and PPs (250-750 mg per 300 mL serve) were examined. The changes in total extractable PP, vitamin C content, and colour of selected smoothies over a storage stability trial (4°C, 14 days) were compared. A set of acidic aqueous model beverages was prepared to further examine the effect of two different heat treatments on the stability and extractability of PPs. Results show that overall consumer acceptability of high fibre and PP smoothies was low, with average hedonic scores ranging from 3.9 to 6.4 (on a 1-9 scale). Flavour, texture and overall acceptability decreased as fibre and polyphenol contents increased, with fibre content exerting a stronger effect. Higher fibre content resulted in greater viscosity, with an elevated PP content increasing viscosity only slightly. The presence of fibre also aided the stability and extractability of PPs after heating. A reduction of extractable PPs, vitamin C content and colour intensity of smoothies was observed after a 14-day storage period at 4°C. Two heat treatments (75°C for 45 min or 85°C for 1 min) that are normally used for beverage production did not cause a significant reduction of total extracted PPs. It is clear that high levels of added fibre and PPs greatly influence the consumer appeal of smoothies, suggesting the need to develop novel formulation and processing methods if a satisfactory functional beverage is to be developed incorporating these ingredients.
Keywords: Apple fibre, apple and blackcurrant polyphenols, consumer acceptability, functional foods, stability.
Downloads 4336
7498 Data-Driven Decision-Making in Digital Entrepreneurship
Authors: Abeba Nigussie Turi, Xiangming Samuel Li
Data-driven business models are more typical for established businesses than for early-stage startups that strive to penetrate a market. This paper provides an extensive discussion of the principles of data analytics for early-stage digital entrepreneurial businesses. Here, we develop a data-driven decision-making (DDDM) framework that applies to startups prone to multifaceted barriers in the form of poor data access and technical and financial constraints, to name a few. The startup DDDM framework proposed in this paper is novel in its form, encompassing startup data analytics enablers and metrics aligned with startups' business models, ranging from customer-centric product development to servitization, which is the future of modern digital entrepreneurship.
Keywords: Startup data analytics, data-driven decision-making, data acquisition, data generation, digital entrepreneurship.
Downloads 827
7497 Classifying Bio-Chip Data using an Ant Colony System Algorithm
Authors: Minsoo Lee, Yearn Jeong Kim, Yun-mi Kim, Sujeung Cheong, Sookyung Song
Bio-chips are used for experiments on genes and contain various information such as genes, samples and so on. Two-dimensional bio-chips, in which one axis represents genes and the other represents samples, are widely used these days. Instead of experimenting with real genes, which costs a lot of money and takes much time to yield results, bio-chips are being used for biological experiments. Extracting data from the bio-chips with high accuracy and finding patterns or useful information in such data is very important. Bio-chip analysis systems extract data from various kinds of bio-chips and mine the data in order to get useful information. One of the commonly used methods to mine the data is classification. The algorithm used to classify the data can vary depending on the data types, number characteristics and so on. Considering that bio-chip data is extremely large, an algorithm that imitates the ecosystem, such as the ant algorithm, is suitable for classification. This paper focuses on finding the classification rules from the bio-chip data using the Ant Colony algorithm, which imitates the ecosystem. The developed system takes into consideration the accuracy of the discovered rules when it applies them to the bio-chip data in order to predict the classes.
Keywords: Ant Colony System, DNA chip data, Classification.
Downloads 1468
7496 Trust and Reliability for Public Sector Data
Authors: Klaus Stranacher, Vesna Krnjic, Thomas Zefferer
The public sector holds large amounts of data from various areas such as social affairs, economy, or tourism. Various initiatives such as Open Government Data or the EU Directive on public sector information aim to make these data available for public and private service providers. Requirements for the provision of public sector data are defined by legal and organizational frameworks. Surprisingly, the defined requirements hardly cover security aspects such as integrity or authenticity. In this paper we discuss the importance of these missing requirements and present a concept to assure the integrity and authenticity of provided data based on electronic signatures. We show that our concept is perfectly suitable for the provisioning of unaltered data. We also show that our concept can be extended to data that needs to be anonymized before provisioning by incorporating redactable signatures. Our proposed concept enhances trust and reliability of provided public sector data.
Keywords: Trusted Public Sector Data, Integrity, Authenticity, Reliability, Redactable Signatures.
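The integrity check described above can be illustrated with a small sketch in which a provider attaches a tag to a dataset and a consumer verifies that the data were not altered. This is not the authors' concept: Python's standard library has no public-key signature scheme, so the sketch substitutes HMAC-SHA-256 (a shared-key message authentication code) for an electronic signature. It shows tamper detection only, not the publicly verifiable authenticity a real signature provides, and it does not cover redactable signatures. The key, the dataset, and the function names are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical shared key. A real electronic signature would use a private
# signing key and a public verification key instead of one shared secret.
PROVIDER_KEY = b"demo-key-not-for-production"


def canonical_bytes(record):
    """Serialize a record deterministically so equal data gives equal bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_tag(record, key):
    """Compute an HMAC-SHA-256 tag over the canonical form of the record."""
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify(record, tag, key):
    """Return True if the tag matches the record (constant-time comparison)."""
    return hmac.compare_digest(make_tag(record, key), tag)


# Hypothetical public sector dataset published together with its tag.
dataset = {"municipality": "Exampletown", "year": 2020, "visitors": 1234}
tag = make_tag(dataset, PROVIDER_KEY)

# A consumer holding the key can confirm the data are unaltered...
unaltered_ok = verify(dataset, tag, PROVIDER_KEY)

# ...and any modification of the data invalidates the tag.
tampered = dict(dataset, visitors=9999)
tampered_ok = verify(tampered, tag, PROVIDER_KEY)

print("unaltered data verifies:", unaltered_ok)
print("tampered data verifies:", tampered_ok)
```

Canonical serialization matters here: without a fixed key order and separator style, two byte-wise different encodings of the same data would produce different tags and fail verification.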
Downloads 1758
7495 Analysis of Relation between Unlabeled and Labeled Data to Self-Taught Learning Performance
Authors: Ekachai Phaisangittisagul, Rapeepol Chongprachawat
Obtaining labeled data in supervised learning is often difficult and expensive, and thus the trained learning algorithm tends to overfit due to the small amount of training data. As a result, some researchers have focused on using unlabeled data, which do not necessarily follow the same generative distribution as the labeled data, to construct high-level features for improving performance on supervised learning tasks. In this paper, we investigate the impact of the relationship between unlabeled and labeled data on classification performance. Specifically, we apply different unlabeled data, which have different degrees of relation to the labeled data, to a handwritten digit classification task based on the MNIST dataset. Our experimental results show that the higher the degree of relation between unlabeled and labeled data, the better the classification performance. Although unlabeled data from a completely different generative distribution than the labeled data provide the lowest classification performance, we still achieve high classification performance. This expands the applicability of supervised learning algorithms using unsupervised learning.
Keywords: Autoencoder, high-level feature, MNIST dataset, self-taught learning, supervised learning.
Downloads 1832
7494 Towards Development of Solution for Business Process-Oriented Data Analysis
Authors: M. Klimavicius
This paper proposes a modeling methodology for the development of a data analysis solution. The author introduces an approach to address data warehousing issues at the enterprise level. The methodology covers the requirements elicitation and analysis stage as well as the initial design of the data warehouse. The paper reviews an extended business process model, which satisfies the needs of data warehouse development. The author considers the use of business process models necessary, as they reflect both enterprise information systems and business functions, which are important for data analysis. The described approach divides development into three steps, each with a different level of detail in the models. The described approach makes it possible to gather requirements and present them to business users in an accessible manner.
Keywords: Data warehouse, data analysis, business process management.
Downloads 1392