Search results for: analysis of scientific data
13289 Are XBRL-based Financial Reports Better than Non-XBRL Reports? A Quality Assessment
Authors: Zhenkun Wang, Simon S. Gao
Abstract:
Using a scoring system, this paper provides a comparative assessment of the quality of data between XBRL formatted financial reports and non-XBRL financial reports. It shows a major improvement in the quality of data of XBRL formatted financial reports. Although XBRL formatted financial reports do not show much advantage in the quality at the beginning, XBRL financial reports lately display a large improvement in the quality of data in almost all aspects. With the improved XBRL web data managing, presentation and analysis applications, XBRL formatted financial reports have a much better accessibility, are more accurate and better in timeliness.Keywords: Data Quality; Financial Report; Information; XBRL
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 250413288 Mining Association Rules from Unstructured Documents
Authors: Hany Mahgoub
Abstract:
This paper presents a system for discovering association rules from collections of unstructured documents called EART (Extract Association Rules from Text). The EART system treats texts only not images or figures. EART discovers association rules amongst keywords labeling the collection of textual documents. The main characteristic of EART is that the system integrates XML technology (to transform unstructured documents into structured documents) with Information Retrieval scheme (TF-IDF) and Data Mining technique for association rules extraction. EART depends on word feature to extract association rules. It consists of four phases: structure phase, index phase, text mining phase and visualization phase. Our work depends on the analysis of the keywords in the extracted association rules through the co-occurrence of the keywords in one sentence in the original text and the existing of the keywords in one sentence without co-occurrence. Experiments applied on a collection of scientific documents selected from MEDLINE that are related to the outbreak of H5N1 avian influenza virus.Keywords: Association rules, information retrieval, knowledgediscovery in text, text mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 240113287 Research of Data Cleaning Methods Based on Dependency Rules
Authors: Yang Bao, Shi Wei Deng, Wang Qun Lin
Abstract:
This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSql), and gives 6 data cleaning methods based on these algorithms.Keywords: Data cleaning, dependency rules, violation data discovery, data repair.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 256313286 An Automatic Tool for Checking Consistency between Data Flow Diagrams (DFDs)
Authors: Rosziati Ibrahim, Siow Yen Yen
Abstract:
System development life cycle (SDLC) is a process uses during the development of any system. SDLC consists of four main phases: analysis, design, implement and testing. During analysis phase, context diagram and data flow diagrams are used to produce the process model of a system. A consistency of the context diagram to lower-level data flow diagrams is very important in smoothing up developing process of a system. However, manual consistency check from context diagram to lower-level data flow diagrams by using a checklist is time-consuming process. At the same time, the limitation of human ability to validate the errors is one of the factors that influence the correctness and balancing of the diagrams. This paper presents a tool that automates the consistency check between Data Flow Diagrams (DFDs) based on the rules of DFDs. The tool serves two purposes: as an editor to draw the diagrams and as a checker to check the correctness of the diagrams drawn. The consistency check from context diagram to lower-level data flow diagrams is embedded inside the tool to overcome the manual checking problem.Keywords: Data Flow Diagram, Context Diagram, ConsistencyCheck, Syntax and Semantic Rules
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 340213285 Operational Risk – Scenario Analysis
Authors: Milan Rippel, Petr Teply
Abstract:
This paper focuses on operational risk measurement techniques and on economic capital estimation methods. A data sample of operational losses provided by an anonymous Central European bank is analyzed using several approaches. Loss Distribution Approach and scenario analysis method are considered. Custom plausible loss events defined in a particular scenario are merged with the original data sample and their impact on capital estimates and on the financial institution is evaluated. Two main questions are assessed – What is the most appropriate statistical method to measure and model operational loss data distribution? and What is the impact of hypothetical plausible events on the financial institution? The g&h distribution was evaluated to be the most suitable one for operational risk modeling. The method based on the combination of historical loss events modeling and scenario analysis provides reasonable capital estimates and allows for the measurement of the impact of extreme events on banking operations.Keywords: operational risk, scenario analysis, economic capital, loss distribution approach, extreme value theory, stress testing
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 237813284 Designing Transcutaneous Inductive Powering Links for Implanted Micro-System Device
Authors: Saad Mutashar Abbas, M. A. Hannan, S. A. Samad, A. Hussain
Abstract:
This paper presented a proposed design for transcutaneous inductive powering links. The design used to transfer power and data to the implanted devices such as implanted Microsystems to stimulate and monitoring the nerves and muscles. The system operated with low band frequency 13.56 MHZ according to industrial- scientific – medical (ISM) band to avoid the tissue heating. For external part, the modulation index is 13 % and the modulation rate 7.3% with data rate 1 Mbit/s assuming Tbit=1us. The system has been designed using 0.35-μm fabricated CMOS technology. The mathematical model is given and the design is simulated using OrCAD P Spice 16.2 software tool and for real-time simulation the electronic workbench MULISIM 11 has been used. The novel circular plane (pancake) coils was simulated using ANSOFT- HFss software.Keywords: Implanted devices, ASK techniques, Class-E power amplifier, Inductive powering and low-frequency ISM band.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 254313283 Fault Detection and Identification of COSMED K4b2 Based On PCA and Neural Network
Authors: Jing Zhou, Steven Su, Aihuang Guo
Abstract:
COSMED K4b2 is a portable electrical device designed to test pulmonary functions. It is ideal for many applications that need the measurement of the cardio-respiratory response either in the field or in the lab is capable with the capability to delivery real time data to a sink node or a PC base station with storing data in the memory at the same time. But the actual sensor outputs and data received may contain some errors, such as impulsive noise which can be related to sensors, low batteries, environment or disturbance in data acquisition process. These abnormal outputs might cause misinterpretations of exercise or living activities to persons being monitored. In our paper we propose an effective and feasible method to detect and identify errors in applications by principal component analysis (PCA) and a back propagation (BP) neural network.
Keywords: BP Neural Network, Exercising Testing, Fault Detection and Identification, Principal Component Analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 303313282 The Robust Clustering with Reduction Dimension
Authors: Dyah E. Herwindiati
Abstract:
A clustering is process to identify a homogeneous groups of object called as cluster. Clustering is one interesting topic on data mining. A group or class behaves similarly characteristics. This paper discusses a robust clustering process for data images with two reduction dimension approaches; i.e. the two dimensional principal component analysis (2DPCA) and principal component analysis (PCA). A standard approach to overcome this problem is dimension reduction, which transforms a high-dimensional data into a lower-dimensional space with limited loss of information. One of the most common forms of dimensionality reduction is the principal components analysis (PCA). The 2DPCA is often called a variant of principal component (PCA), the image matrices were directly treated as 2D matrices; they do not need to be transformed into a vector so that the covariance matrix of image can be constructed directly using the original image matrices. The decomposed classical covariance matrix is very sensitive to outlying observations. The objective of paper is to compare the performance of robust minimizing vector variance (MVV) in the two dimensional projection PCA (2DPCA) and the PCA for clustering on an arbitrary data image when outliers are hiden in the data set. The simulation aspects of robustness and the illustration of clustering images are discussed in the end of paperKeywords: Breakdown point, Consistency, 2DPCA, PCA, Outlier, Vector Variance
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 165713281 Benchmarking Cleaner Production Performance of Coal-fired Power Plants Using Two-stage Super-efficiency Data Envelopment Analysis
Authors: Shao-lun Zeng, Yu-long Ren
Abstract:
Benchmarking cleaner production performance is an effective way of pollution control and emission reduction in coal-fired power industry. A benchmarking method using two-stage super-efficiency data envelopment analysis for coal-fired power plants is proposed – firstly, to improve the cleaner production performance of DEA-inefficient or weakly DEA-efficient plants, then to select the benchmark from performance-improved power plants. An empirical study is carried out with the survey data of 24 coal-fired power plants. The result shows that in the first stage the performance of 16 plants is DEA-efficient and that of 8 plants is relatively inefficient. The target values for improving DEA-inefficient plants are acquired by projection analysis. The efficient performance of 24 power plants and the benchmarking plant is achieved in the second stage. The two-stage benchmarking method is practical to select the optimal benchmark in the cleaner production of coal-fired power industry and will continuously improve plants- cleaner production performance.Keywords: benchmarking, cleaner production performance, coal-fired power plant, super-efficiency data envelopment analysis
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 238313280 Energy Efficiency Analysis of Crossover Technologies in Industrial Applications
Authors: W. Schellong
Abstract:
Industry accounts for one-third of global final energy demand. Crossover technologies (e.g. motors, pumps, process heat, and air conditioning) play an important role in improving energy efficiency. These technologies are used in many applications independent of the production branch. Especially electrical power is used by drives, pumps, compressors, and lightning. The paper demonstrates the algorithm of the energy analysis by some selected case studies for typical industrial processes. The energy analysis represents an essential part of energy management systems (EMS). Generally, process control system (PCS) can support EMS. They provide information about the production process, and they organize the maintenance actions. Combining these tools into an integrated process allows the development of an energy critical equipment strategy. Thus, asset and energy management can use the same common data to improve the energy efficiency.
Keywords: Crossover technologies, data management, energy analysis, energy efficiency, process control.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 91113279 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning
Authors: Walid Cherif
Abstract:
Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.
Keywords: Data mining, knowledge discovery, machine learning, similarity measurement, supervised classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 147913278 Sustainable Perspectives and Local Development Potential through Tourism
Authors: Pedro H. S. Messetti, Mary L. G. S. Senna, Afonso R. Aquino
Abstract:
Sustainability is a very important and heavily discussed subject, expanding through tourism as well. The study proposition was to collect data and present it to the competent bodies so they can mold their public policies to improve the conditions of the site. It was hypothesized that the lack of data is currently affecting the quality of life and the sustainable development of the site and the tourism. The research was held in Mateiros, a city in the state of Tocantins (TO)/Brasil near Palmas, its capital city. Because of the concentration of tourists during the high season and several tourist attractions being around, the research took place in Mateiros. The methodological procedure had a script of theoretical construction and investigation of the deductive scientific method parameters through a case study in the Jalapão/TO/Brazil region, using it as a tool for a questionnaire given to the competent bodies in an interview system with the UN sustainability indexes as a base. In the three sustainable development scope: environmental, social and economic, the results indicated that the data presented by the interviewed were scarce or nonexistent. It shows that more research is necessary, providing the tools for the ones responsible to propose action plans to improve the site, strengthening the tourism and making it even more sustainable.Keywords: Jalapão/Brazil State Park, sustainable tourism, UN sustainability indexes.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 140713277 Methods for Distinction of Cattle Using Supervised Learning
Authors: Radoslav Židek, Veronika Šidlová, Radovan Kasarda, Birgit Fuerst-Waltl
Abstract:
Machine learning represents a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction, based on models derived from existing data. The data can present identification patterns which are used to classify into groups. The result of the analysis is the pattern which can be used for identification of data set without the need to obtain input data used for creation of this pattern. An important requirement in this process is careful data preparation validation of model used and its suitable interpretation. For breeders, it is important to know the origin of animals from the point of the genetic diversity. In case of missing pedigree information, other methods can be used for traceability of animal´s origin. Genetic diversity written in genetic data is holding relatively useful information to identify animals originated from individual countries. We can conclude that the application of data mining for molecular genetic data using supervised learning is an appropriate tool for hypothesis testing and identifying an individual.
Keywords: Genetic data, Pinzgau cattle, supervised learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 227713276 Acute Coronary Syndrome Prediction Using Data Mining Techniques- An Application
Authors: Tahseen A. Jilani, Huda Yasin, Madiha Yasin, C. Ardil
Abstract:
In this paper we use data mining techniques to investigate factors that contribute significantly to enhancing the risk of acute coronary syndrome. We assume that the dependent variable is diagnosis – with dichotomous values showing presence or absence of disease. We have applied binary regression to the factors affecting the dependent variable. The data set has been taken from two different cardiac hospitals of Karachi, Pakistan. We have total sixteen variables out of which one is assumed dependent and other 15 are independent variables. For better performance of the regression model in predicting acute coronary syndrome, data reduction techniques like principle component analysis is applied. Based on results of data reduction, we have considered only 14 out of sixteen factors.
Keywords: Acute coronary syndrome (ACS), binary logistic regression analyses, myocardial ischemia (MI), principle component analysis, unstable angina (U.A.).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 207413275 Spatial-Temporal Clustering Characteristics of Dengue in the Northern Region of Sri Lanka, 2010-2013
Authors: Sumiko Anno, Keiji Imaoka, Takeo Tadono, Tamotsu Igarashi, Subramaniam Sivaganesh, Selvam Kannathasan, Vaithehi Kumaran, Sinnathamby Noble Surendran
Abstract:
Dengue outbreaks are affected by biological, ecological, socio-economic and demographic factors that vary over time and space. These factors have been examined separately and still require systematic clarification. The present study aimed to investigate the spatial-temporal clustering relationships between these factors and dengue outbreaks in the northern region of Sri Lanka. Remote sensing (RS) data gathered from a plurality of satellites were used to develop an index comprising rainfall, humidity and temperature data. RS data gathered by ALOS/AVNIR-2 were used to detect urbanization, and a digital land cover map was used to extract land cover information. Other data on relevant factors and dengue outbreaks were collected through institutions and extant databases. The analyzed RS data and databases were integrated into geographic information systems, enabling temporal analysis, spatial statistical analysis and space-time clustering analysis. Our present results showed that increases in the number of the combination of ecological factor and socio-economic and demographic factors with above the average or the presence contribute to significantly high rates of space-time dengue clusters.
Keywords: ALOS/AVNIR-2, Dengue, Space-time clustering analysis, Sri Lanka.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 223713274 Estimating Bridge Deterioration for Small Data Sets Using Regression and Markov Models
Authors: Yina F. Muñoz, Alexander Paz, Hanns De La Fuente-Mella, Joaquin V. Fariña, Guilherme M. Sales
Abstract:
The primary approach for estimating bridge deterioration uses Markov-chain models and regression analysis. Traditional Markov models have problems in estimating the required transition probabilities when a small sample size is used. Often, reliable bridge data have not been taken over large periods, thus large data sets may not be available. This study presents an important change to the traditional approach by using the Small Data Method to estimate transition probabilities. The results illustrate that the Small Data Method and traditional approach both provide similar estimates; however, the former method provides results that are more conservative. That is, Small Data Method provided slightly lower than expected bridge condition ratings compared with the traditional approach. Considering that bridges are critical infrastructures, the Small Data Method, which uses more information and provides more conservative estimates, may be more appropriate when the available sample size is small. In addition, regression analysis was used to calculate bridge deterioration. Condition ratings were determined for bridge groups, and the best regression model was selected for each group. The results obtained were very similar to those obtained when using Markov chains; however, it is desirable to use more data for better results.
Keywords: Concrete bridges, deterioration, Markov chains, probability matrix.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 138913273 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity
Authors: Hoda A. Abdel Hafez
Abstract:
Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.Keywords: Mining Big Data, Big Data, Machine learning, Data Streams, Telecommunication.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 242213272 The Comparison of Anchor and Star Schema from a Query Performance Perspective
Authors: Radek Němec
Abstract:
Today's business environment requires that companies have access to highly relevant information in a matter of seconds. Modern Business Intelligence tools rely on data structured mostly in traditional dimensional database schemas, typically represented by star schemas. Dimensional modeling is already recognized as a leading industry standard in the field of data warehousing although several drawbacks and pitfalls were reported. This paper focuses on the analysis of another data warehouse modeling technique - the anchor modeling, and its characteristics in context with the standardized dimensional modeling technique from a query performance perspective. The results of the analysis show information about performance of queries executed on database schemas structured according to principles of each database modeling technique.Keywords: Data warehousing, anchor modeling, star schema, anchor schema, query performance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 327213271 Development of Energy Benchmarks Using Mandatory Energy and Emissions Reporting Data: Ontario Post-Secondary Residences
Authors: C. Xavier Mendieta, J. J McArthur
Abstract:
Governments are playing an increasingly active role in reducing carbon emissions, and a key strategy has been the introduction of mandatory energy disclosure policies. These policies have resulted in a significant amount of publicly available data, providing researchers with a unique opportunity to develop location-specific energy and carbon emission benchmarks from this data set, which can then be used to develop building archetypes and used to inform urban energy models. This study presents the development of such a benchmark using the public reporting data. The data from Ontario’s Ministry of Energy for Post-Secondary Educational Institutions are being used to develop a series of building archetype dynamic building loads and energy benchmarks to fill a gap in the currently available building database. This paper presents the development of a benchmark for college and university residences within ASHRAE climate zone 6 areas in Ontario using the mandatory disclosure energy and greenhouse gas emissions data. The methodology presented includes data cleaning, statistical analysis, and benchmark development, and lessons learned from this investigation are presented and discussed to inform the development of future energy benchmarks from this larger data set. The key findings from this initial benchmarking study are: (1) the importance of careful data screening and outlier identification to develop a valid dataset; (2) the key features used to develop a model of the data are building age, size, and occupancy schedules and these can be used to estimate energy consumption; and (3) policy changes affecting the primary energy generation significantly affected greenhouse gas emissions, and consideration of these factors was critical to evaluate the validity of the reported data.Keywords: Building archetypes, data analysis, energy benchmarks, GHG emissions.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 97413270 Twitter Sentiment Analysis during the Lockdown on New Zealand
Authors: Smah Doeban Almotiri
Abstract:
One of the most common fields of natural language processing (NLP) is sentimental analysis. The inferred feeling in the text can be successfully mined for various events using sentiment analysis. Twitter is viewed as a reliable data point for sentimental analytics studies since people are using social media to receive and exchange different types of data on a broad scale during the COVID-19 epidemic. The processing of such data may aid in making critical decisions on how to keep the situation under control. The aim of this research is to look at how sentimental states differed in a single geographic region during the lockdown at two different times.1162 tweets were analyzed related to the COVID-19 pandemic lockdown using keywords hashtags (lockdown, COVID-19) for the first sample tweets were from March 23, 2020, until April 23, 2020, and the second sample for the following year was from March 1, 2021, until April 4, 2021. Natural language processing (NLP), which is a form of Artificial intelligent was used for this research to calculate the sentiment value of all of the tweets by using AFINN Lexicon sentiment analysis method. The findings revealed that the sentimental condition in both different times during the region's lockdown was positive in the samples of this study, which are unique to the specific geographical area of New Zealand. This research suggests applied machine learning sentimental method such as Crystal Feel and extended the size of the sample tweet by using multiple tweets over a longer period of time.
Keywords: sentiment analysis, Twitter analysis, lockdown, Covid-19, AFINN, NodeJS
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 47113269 Generic Data Warehousing for Consumer Electronics Retail Industry
Authors: S. Habte, K. Ouazzane, P. Patel, S. Patel
Abstract:
The dynamic and highly competitive nature of the consumer electronics retail industry means that businesses in this industry are experiencing different decision making challenges in relation to pricing, inventory control, consumer satisfaction and product offerings. To overcome the challenges facing retailers and create opportunities, we propose a generic data warehousing solution which can be applied to a wide range of consumer electronics retailers with a minimum configuration. The solution includes a dimensional data model, a template SQL script, a high level architectural descriptions, ETL tool developed using C#, a set of APIs, and data access tools. It has been successfully applied by ASK Outlets Ltd UK resulting in improved productivity and enhanced sales growth.
Keywords: Consumer electronics retail, dimensional data model, data analysis, generic data warehousing, reporting.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 133313268 A Proposal for U-City (Smart City) Service Method Using Real-Time Digital Map
Authors: SangWon Han, MuWook Pyeon, Sujung Moon, DaeKyo Seo
Abstract:
Recently, technologies based on three-dimensional (3D) space information are being developed and quality of life is improving as a result. Research on real-time digital map (RDM) is being conducted now to provide 3D space information. RDM is a service that creates and supplies 3D space information in real time based on location/shape detection. Research subjects on RDM include the construction of 3D space information with matching image data, complementing the weaknesses of image acquisition using multi-source data, and data collection methods using big data. Using RDM will be effective for space analysis using 3D space information in a U-City and for other space information utilization technologies.
Keywords: RDM, multi-source data, big data, U-City.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 76313267 Interrelationships between Physicochemical Water Pollution Indicators: A Case Study of River Pandu
Authors: Sunita Verma , Divya Tiwari, Ajay Verma
Abstract:
Water samples were collected from river Pandu at six stations where human and animal activities were high. Composite samples were analyzed for dissolved oxygen (DO), biochemical oxygen demand (BOD), chemical oxygen demand (COD) , pH values during dry and wet seasons as well as the harmattan period. The total data points were used to establish relationships between the parameters and data were also subjected to statistical analysis and expressed as mean ± standard error of mean (SEM) at a level of significance of p<0.05. Regression analysis was carried out to establish relationships if any between studied parameters and relationships in form of scatter plots were obtained between DO/BOD, COD/DO, BOD/COD, COD/pH, BOD/pH and DO/pH. The high to moderate correlation coefficient observed, R2 ranged from 0.68 to 0.15 between these parameters.Keywords: BOD, DO, COD, pH, Regression analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 208413266 K-Means for Spherical Clusters with Large Variance in Sizes
Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan
Abstract:
Data clustering is an important data exploration technique with many applications in data mining. The k-means algorithm is well known for its efficiency in clustering large data sets. However, this algorithm is suitable for spherical shaped clusters of similar sizes and densities. The quality of the resulting clusters decreases when the data set contains spherical shaped with large variance in sizes. In this paper, we introduce a competent procedure to overcome this problem. The proposed method is based on shifting the center of the large cluster toward the small cluster, and recomputing the membership of small cluster points, the experimental results reveal that the proposed algorithm produces satisfactory results.Keywords: K-Means, Data Clustering, Cluster Analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 324113265 Analysis of Data Gathering Schemes for Layered Sensor Networks with Multihop Polling
Authors: Bhed Bahadur Bista, Danda B. Rawat
Abstract:
In this paper, we investigate multihop polling and data gathering schemes in layered sensor networks in order to extend the life time of the networks. A network consists of three layers. The lowest layer contains sensors. The middle layer contains so called super nodes with higher computational power, energy supply and longer transmission range than sensor nodes. The top layer contains a sink node. A node in each layer controls a number of nodes in lower layer by polling mechanism to gather data. We will present four types of data gathering schemes: intermediate nodes do not queue data packet, queue single packet, queue multiple packets and aggregate data, to see which data gathering scheme is more energy efficient for multihop polling in layered sensor networks.
Keywords: layered sensor network, polling, data gatheringschemes.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 152413264 Transcutaneous Inductive Powering Links Based on ASK Modulation Techniques
Authors: S. M. Abbas, M. A. Hannan, S. A. Samad, A. Hussain
Abstract:
This paper presented a modified efficient inductive powering link based on ASK modulator and proposed efficient class- E power amplifier. The design presents the external part which is located outside the body to transfer power and data to the implanted devices such as implanted Microsystems to stimulate and monitoring the nerves and muscles. The system operated with low band frequency 10MHZ according to industrial- scientific – medical (ISM) band to avoid the tissue heating. For external part, the modulation index is 11.1% and the modulation rate 7.2% with data rate 1 Mbit/s assuming Tbit = 1us. The system has been designed using 0.35-μm fabricated CMOS technology. The mathematical model is given and the design is simulated using OrCAD P Spice 16.2 software tool and for real-time simulation, the electronic workbench MULISIM 11 has been used.Keywords: Implanted devices, ASK techniques, Class-E power amplifier, Inductive powering and low-frequency ISM band.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 233213263 A Review on the Comparison of EU Countries Based on Research and Development Efficiencies
Authors: Yeliz Ekinci, Raife Merve Ön
Abstract:
Nowadays, technological progress is one of the most important components of economic growth and the efficiency of R&D activities is particularly essential for countries. This study is an attempt to analyze the R&D efficiencies of EU countries. The indicators related to R&D efficiencies should be determined in advance in order to use DEA. For this reason a list of input and output indicators are derived from the literature review. Considering the data availability, a final list is given for the numerical analysis for future research.Keywords: Data envelopment analysis, economic growth, EU Countries, R&D efficiency.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 201113262 Sentiment Analysis: Popularity of Candidates for the President of the United States
Authors: Radek Malinský, Ivan Jelínek
Abstract:
This article deals with the popularity of candidates for the president of the United States of America. The popularity is assessed according to public comments on the Web 2.0. Social networking, blogging and online forums (collectively Web 2.0) are for common Internet users the easiest way to share their personal opinions, thoughts, and ideas with the entire world. However, the web content diversity, variety of technologies and website structure differences, all of these make the Web 2.0 a network of heterogeneous data, where things are difficult to find for common users. The introductory part of the article describes methodology for gathering and processing data from Web 2.0. The next part of the article is focused on the evaluation and content analysis of obtained information, which write about presidential candidates.
Keywords: Sentiment Analysis, Web 2.0, Webometrics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 319313261 Data Projects for “Social Good”: Challenges and Opportunities
Authors: Mikel Niño, Roberto V. Zicari, Todor Ivanov, Kim Hee, Naveed Mushtaq, Marten Rosselli, Concha Sánchez-Ocaña, Karsten Tolle, José Miguel Blanco, Arantza Illarramendi, Jörg Besier, Harry Underwood
Abstract:
One of the application fields for data analysis techniques and technologies gaining momentum is the area of social good or “common good”, covering cases related to humanitarian crises, global health care, or ecology and environmental issues, among others. The promotion of data-driven projects in this field aims at increasing the efficacy and efficiency of social initiatives, improving the way these actions help humanity in general and people in need in particular. This application field, however, poses its own barriers and challenges when developing data-driven projects, lagging behind in comparison with other scenarios. These challenges derive from aspects such as the scope and scale of the social issue to solve, cultural and political barriers, the skills of main stakeholders and the technological resources available, the motivation to be engaged in such projects, or the ethical and legal issues related to sensitive data. This paper analyzes the application of data projects in the field of social good, reviewing its current state and noteworthy initiatives, and presenting a framework covering the key aspects to analyze in such projects. The goal is to provide guidelines to understand the main challenges and opportunities for this type of data project, as well as identifying the main differential issues compared to “classical” data projects in general. A case study is presented on the initial steps and stakeholder analysis of a data project for the inclusion of refugees in the city of Frankfurt, Germany, in order to empirically confront the framework with a real example.Keywords: Data-Driven projects, humanitarian operations, personal and sensitive data, social good, stakeholders analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 171413260 Using RASCAL and ALOHA Codes to Establish an Analysis Methodology for Hydrogen Fluoride Evaluation
Authors: J. R. Wang, Y. Chiang, W. S. Hsu, H. C. Chen, S. H. Chen, J. H. Yang, S. W. Chen, C. Shih
Abstract:
In this study, the RASCAL and ALOHA codes are used to establish an analysis methodology for hydrogen fluoride (HF) evaluation. There are three main steps in this study. First, the UF6 data were collected. Second, one postulated case was analyzed by using the RASCAL and UF6 data. This postulated case assumes that fire occurring and UF6 is releasing from a building. Third, the results of RASCAL for HF mass were as the input data of ALOHA. Two postulated cases of HF were analyzed by using ALOHA code and the results of RASCAL. These postulated cases assume fire occurring and HF is releasing with no raining (Case 1) or raining (Case 2) condition. According to the analysis results of ALOHA, the HF concentration of Case 2 is smaller than Case 1. The results can be a reference for the preparing of emergency plans for the release of HF.Keywords: RASCAL, ALOHA, UF6, hydrogen fluoride.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 697