Search results for: Data Structure

8161 Performance Analysis of the Subgroup Method for Collective I/O

Authors: Kwangho Cha, Hyeyoung Cho, Sungho Kim

Abstract:

As many scientific applications require large data processing, the importance of parallel I/O has been increasingly recognized. Collective I/O is one of the considerable features of parallel I/O and enables application programmers to easily handle their large data volume. In this paper we measured and analyzed the performance of original collective I/O and the subgroup method, the way of using collective I/O of MPI effectively. From the experimental results, we found that the subgroup method showed good performance with small data size.

Keywords: Collective I/O, MPI, parallel file system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1567

8160 Adsorption of Lead(II) and Cadmium(II) Ions from Aqueous Solutions by Adsorption on Activated Carbon Prepared from Cashew Nut Shells

Authors: S. Tangjuank, N. Insuk , J. Tontrakoon , V. Udeye

Abstract:

Cashew nut shells were converted into activated carbon powders using KOH activation plus CO2 gasification at 1027 K. The increase both of impregnation ratio and activation time, there was swiftly the development of mesoporous structure with increasing of mesopore volume ratio from 20-28% and 27-45% for activated carbon with ratio of KOH per char equal to 1 and 4, respectively. Activated carbon derived from KOH/char ratio equal to 1 and CO2 gasification time from 20 to 150 minutes were exhibited the BET surface area increasing from 222 to 627 m2.g-1. And those were derived from KOH/char ratio of 4 with activation time from 20 to 150 minutes exhibited high BET surface area from 682 to 1026 m2.g-1. The adsorption of Lead(II) and Cadmium(II) ion was investigated. This adsorbent exhibited excellent adsorption for Lead(II) and Cadmium(II) ion. Maximum adsorption presented at 99.61% at pH 6.5 and 98.87% at optimum conditions. The experimental data was calculated from Freundlich isotherm and Langmuir isotherm model. The maximum capacity of Pb2+ and Cd2+ ions was found to be 28.90 m2.g-1 and 14.29 m2.g-1, respectively.

Keywords: Activated carbon, cashew nut shell, heavy metals, adsorption.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3386

8159 Statistical Analysis for Overdispersed Medical Count Data

Authors: Y. N. Phang, E. F. Loh

Abstract:

Many researchers have suggested the use of zero inflated Poisson (ZIP) and zero inflated negative binomial (ZINB) models in modeling overdispersed medical count data with extra variations caused by extra zeros and unobserved heterogeneity. The studies indicate that ZIP and ZINB always provide better fit than using the normal Poisson and negative binomial models in modeling overdispersed medical count data. In this study, we proposed the use of Zero Inflated Inverse Trinomial (ZIIT), Zero Inflated Poisson Inverse Gaussian (ZIPIG) and zero inflated strict arcsine models in modeling overdispered medical count data. These proposed models are not widely used by many researchers especially in the medical field. The results show that these three suggested models can serve as alternative models in modeling overdispersed medical count data. This is supported by the application of these suggested models to a real life medical data set. Inverse trinomial, Poisson inverse Gaussian and strict arcsine are discrete distributions with cubic variance function of mean. Therefore, ZIIT, ZIPIG and ZISA are able to accommodate data with excess zeros and very heavy tailed. They are recommended to be used in modeling overdispersed medical count data when ZIP and ZINB are inadequate.

Keywords: Zero inflated, inverse trinomial distribution, Poisson inverse Gaussian distribution, strict arcsine distribution, Pearson’s goodness of fit.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3308

8158 Student Satisfaction Data for Work Based Learners

Authors: Rosie Borup, Hanifa Shah

Abstract:

This paper aims to describe how student satisfaction is measured for work-based learners as these are non-traditional learners, conducting academic learning in the workplace, typically their curricula have a high degree of negotiation, and whose motivations are directly related to their employers- needs, as well as their own career ambitions. We argue that while increasing WBL participation, and use of SSD are both accepted as being of strategic importance to the HE agenda, the use of WBL SSD is rarely examined, and lessons can be learned from the comparison of SSD from a range of WBL programmes, and increased visibility of this type of data will provide insight into ways to improve and develop this type of delivery. The key themes that emerged from the analysis of the interview data were: learners profiles and needs, employers drivers, academic staff drivers, organizational approach, tools for collecting data and visibility of findings. The paper concludes with observations on best practice in the collection, analysis and use of WBL SSD, thus offering recommendations for both academic managers and practitioners.

Keywords: Student satisfaction data, work based learning, employer engagement, NSS.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1488

8157 An Improved Adaptive Dot-Shape Beamforming Algorithm Research on Frequency Diverse Array

Authors: Yanping Liao, Zenan Wu, Ruigang Zhao

Abstract:

Frequency diverse array (FDA) beamforming is a technology developed in recent years, and its antenna pattern has a unique angle-distance-dependent characteristic. However, the beam is always required to have strong concentration, high resolution and low sidelobe level to form the point-to-point interference in the concentrated set. In order to eliminate the angle-distance coupling of the traditional FDA and to make the beam energy more concentrated, this paper adopts a multi-carrier FDA structure based on proposed power exponential frequency offset to improve the array structure and frequency offset of the traditional FDA. The simulation results show that the beam pattern of the array can form a dot-shape beam with more concentrated energy, and its resolution and sidelobe level performance are improved. However, the covariance matrix of the signal in the traditional adaptive beamforming algorithm is estimated by the finite-time snapshot data. When the number of snapshots is limited, the algorithm has an underestimation problem, which leads to the estimation error of the covariance matrix to cause beam distortion, so that the output pattern cannot form a dot-shape beam. And it also has main lobe deviation and high sidelobe level problems in the case of limited snapshot. Aiming at these problems, an adaptive beamforming technique based on exponential correction for multi-carrier FDA is proposed to improve beamforming robustness. The steps are as follows: first, the beamforming of the multi-carrier FDA is formed under linear constrained minimum variance (LCMV) criteria. Then the eigenvalue decomposition of the covariance matrix is performed to obtain the diagonal matrix composed of the interference subspace, the noise subspace and the corresponding eigenvalues. Finally, the correction index is introduced to exponentially correct the small eigenvalues of the noise subspace, improve the divergence of small eigenvalues in the noise subspace, and improve the performance of beamforming. The theoretical analysis and simulation results show that the proposed algorithm can make the multi-carrier FDA form a dot-shape beam at limited snapshots, reduce the sidelobe level, improve the robustness of beamforming, and have better performance.

Keywords: Multi-carrier frequency diverse array, adaptive beamforming, correction index, limited snapshot, robust.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 665

8156 Application of a Similarity Measure for Graphs to Web-based Document Structures

Authors: Matthias Dehmer, Frank Emmert Streib, Alexander Mehler, Jürgen Kilian, Max Mühlhauser

Abstract:

Due to the tremendous amount of information provided by the World Wide Web (WWW) developing methods for mining the structure of web-based documents is of considerable interest. In this paper we present a similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well known technique of sequence alignments for solving a novel and challenging problem: Measuring the structural similarity of generalized trees. In other words: We first transform our graphs considered as high dimensional objects in linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem for developing a efficient graph similarity measure. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based document structures.

Keywords: Graph similarity, hierarchical and directed graphs, hypertext, generalized trees, web structure mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1884

8155 A Consistency Protocol Multi-Layer for Replicas Management in Large Scale Systems

Authors: Ghalem Belalem, Yahya Slimani

Abstract:

Large scale systems such as computational Grid is a distributed computing infrastructure that can provide globally available network resources. The evolution of information processing systems in Data Grid is characterized by a strong decentralization of data in several fields whose objective is to ensure the availability and the reliability of the data in the reason to provide a fault tolerance and scalability, which cannot be possible only with the use of the techniques of replication. Unfortunately the use of these techniques has a height cost, because it is necessary to maintain consistency between the distributed data. Nevertheless, to agree to live with certain imperfections can improve the performance of the system by improving competition. In this paper, we propose a multi-layer protocol combining the pessimistic and optimistic approaches conceived for the data consistency maintenance in large scale systems. Our approach is based on a hierarchical representation model with tree layers, whose objective is with double vocation, because it initially makes it possible to reduce response times compared to completely pessimistic approach and it the second time to improve the quality of service compared to an optimistic approach.

Keywords: Data Grid, replication, consistency, optimistic approach, pessimistic approach.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1568

8154 Analysis of a Population of Diabetic Patients Databases with Classifiers

Authors: Murat Koklu, Yavuz Unal

Abstract:

Data mining can be called as a technique to extract information from data. It is the process of obtaining hidden information and then turning it into qualified knowledge by statistical and artificial intelligence technique. One of its application areas is medical area to form decision support systems for diagnosis just by inventing meaningful information from given medical data. In this study a decision support system for diagnosis of illness that make use of data mining and three different artificial intelligence classifier algorithms namely Multilayer Perceptron, Naive Bayes Classifier and J.48. Pima Indian dataset of UCI Machine Learning Repository was used. This dataset includes urinary and blood test results of 768 patients. These test results consist of 8 different feature vectors. Obtained classifying results were compared with the previous studies. The suggestions for future studies were presented.

Keywords: Artificial Intelligence, Classifiers, Data Mining, Diabetic Patients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5421

8153 The Extension of Monomeric Computational Results to Polymeric Measurable Properties: An Introductory Computational Chemistry Experiment

Authors: Zhao Jing, Bai Yongqing, Shi Qiaofang, Zang Yang, Zhang Huaihao

Abstract:

Advances in software technology enable the computational chemistry to be commonly applied in various research fields, especially in pedagogy. Thus, in order to expand and improve experimental instructions of computational chemistry for undergraduates, we designed an introductory experiment—research on acrylamide molecular structure and physicochemical properties. Initially, students construct molecular models of acrylamide and polyacrylamide in Gaussian and Materials Studio software respectively. Then, the infrared spectral data, atomic charge and molecular orbitals of acrylamide as well as solvation effect of polyacrylamide are calculated to predict their physicochemical performance. At last, rheological experiments are used to validate these predictions. Through the combination of molecular simulation (performed on Gaussian, Materials Studio) with experimental verification (rheology experiment), learners have deeply comprehended the chemical nature of acrylamide and polyacrylamide, achieving good learning outcomes.

Keywords: Upper-division undergraduate, computer-based learning, laboratory instruction, amides, molecular modeling, spectroscopy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 347

8152 Discovery of Time Series Event Patterns based on Time Constraints from Textual Data

Authors: Shigeaki Sakurai, Ken Ueno, Ryohei Orihara

Abstract:

This paper proposes a method that discovers time series event patterns from textual data with time information. The patterns are composed of sequences of events and each event is extracted from the textual data, where an event is characteristic content included in the textual data such as a company name, an action, and an impression of a customer. The method introduces 7 types of time constraints based on the analysis of the textual data. The method also evaluates these constraints when the frequency of a time series event pattern is calculated. We can flexibly define the time constraints for interesting combinations of events and can discover valid time series event patterns which satisfy these conditions. The paper applies the method to daily business reports collected by a sales force automation system and verifies its effectiveness through numerical experiments.

Keywords: Text mining, sequential mining, time constraints, daily business reports.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1480

8151 Seismic Base Shear Force Depending on Building Fundamental Period and Site Conditions: Deterministic Formulation and Probabilistic Analysis

Authors: S. Dorbani, M. Badaoui, D. Benouar

Abstract:

The aim of this paper is to investigate the effect of the building fundamental period of reinforced concrete buildings of (6, 9, and 12-storey), with different floor plans: Symmetric, mono-symmetric, and unsymmetric. These structures are erected at different epicentral distances. Using the Boumerdes, Algeria (2003) earthquake data, we focused primarily on the establishment of the deterministic formulation linking the base shear force to two parameters: The first one is the fundamental period that represents the numerical fingerprint of the structure, and the second one is the epicentral distance used to represent the impact of the earthquake on this force. In a second step, with a view to highlight the effect of uncertainty in these parameters on the analyzed response, these parameters are modeled as random variables with a log-normal distribution. The variability of the coefficients of variation of the chosen uncertain parameters, on the statistics on the seismic base shear force, showed that the effect of uncertainty on fundamental period on this force statistics is low compared to the epicentral distance uncertainty influence.

Keywords: Base shear force, fundamental period, epicentral distance, uncertainty, lognormal variable, statistics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1290

8150 A 3.125Gb/s Clock and Data Recovery Circuit Using 1/4-Rate Technique

Authors: Il-Do Jeong, Hang-Geun Jeong

Abstract:

This paper describes the design and fabrication of a clock and data recovery circuit (CDR). We propose a new clock and data recovery which is based on a 1/4-rate frequency detector (QRFD). The proposed frequency detector helps reduce the VCO frequency and is thus advantageous for high speed application. The proposed frequency detector can achieve low jitter operation and extend the pull-in range without using the reference clock. The proposed CDR was implemented using a 1/4-rate bang-bang type phase detector (PD) and a ring voltage controlled oscillator (VCO). The CDR circuit has been fabricated in a standard 0.18 CMOS technology. It occupies an active area of 1 x 1 and consumes 90 mW from a single 1.8V supply.

Keywords: Clock and data recovery, 1/4-rate frequency detector, 1/4-rate phase detector.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2912

8149 Malaysian Multi-Ethnic Discrimination Scale: Preliminary Factor and Psychometric Analysis

Authors: Chua Bee Seok, Shamsul Amri Baharuddin, Rosnah Ismail, Ferlis Bahari, Jasmine Adela Mutang, Lailawati Madlan, Asong Joseph

Abstract:

The aims of this study were to determine the factor structure and psychometric properties (i.e., reliability and convergent validity) of the Malaysian Multi-Ethnic Discrimination Scale (MMEDS). It consists of 71-items measure experience, strategies used and consequences of ethnic discrimination. A sample of 649 university students from one of the higher education institution in Malaysia was asked to complete MMEDS, as well as Perceived Ethnic and Racial Discrimination. The exploratory factor analysis on ethnic discrimination experience extracted two factors labeled ‘unfair treatment’ (15 items) and ‘Denial of the ethnic right’ (12 items) which accounted for 60.92% of the total variance. The two sub scales demonstrated clear reliability with internal consistency above .70. The convergent validity of the Scale was supported by an expected pattern of correlations (positive and significant correlation) between the score of unfair treatment and denial of the ethnic right and the score of Perceived Ethnic and Racial Discrimination by Peers Scale. The results suggest that the MMEDS is a reliable and valid measure. However, further studies need to be carried out in other groups of sample as to validate the Scale.

Keywords: Factor structure, psychometric properties, exploratory factor analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2483

8148 Very High Speed Data Driven Dynamic NAND Gate at 22nm High K Metal Gate Strained Silicon Technology Node

Authors: Shobha Sharma, Amita Dev

Abstract:

Data driven dynamic logic is the high speed dynamic circuit with low area. The clock of the dynamic circuit is removed and data drives the circuit instead of clock for precharging purpose. This data driven dynamic nand gate is given static forward substrate biasing of Vsupply/2 as well as the substrate bias is connected to the input data, resulting in dynamic substrate bias. The dynamic substrate bias gives the shortest propagation delay with a penalty on the power dissipation. Propagation delay is reduced by 77.8% compared to the normal reverse substrate bias Data driven dynamic nand. Also dynamic substrate biased D3nand’s propagation delay is reduced by 31.26% compared to data driven dynamic nand gate with static forward substrate biasing of Vdd/2. This data driven dynamic nand gate with dynamic body biasing gives us the highest speed with no area penalty and finds its applications where power penalty is acceptable. Also combination of Dynamic and static Forward body bias can be used with reduced propagation delay compared to static forward biased circuit and with comparable increase in an average power. The simulations were done on hspice simulator with 22nm High-k metal gate strained Si technology HP models of Arizona State University, USA.

Keywords: Data driven nand gate, dynamic substrate biasing, nand gate, static substrate biasing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1611

8147 A Comparative Study on the Performance of Viscous and Friction Dampers under Seismic Excitation

Authors: Apetsi K. Ampiah, Zhao Xin

Abstract:

Earthquakes over the years have been known to cause devastating damage on buildings and induced huge loss on human life and properties. It is for this reason that engineers have devised means of protecting buildings and thus protecting human life. Since the invention of devices such as the viscous and friction dampers, scientists/researchers have been able to incorporate these devices into buildings and other engineering structures. The viscous damper is a hydraulic device which dissipates the seismic forces by pushing fluid through an orifice, producing a damping pressure which creates a force. In the friction damper, the force is mainly resisted by converting the kinetic energy into heat by friction. Devices such as viscous and friction dampers are able to absorb almost all the earthquake energy, allowing the structure to remain undamaged (or with some amount of damage) and ready for immediate reuse (with some repair works). Comparing these two devices presents the engineer with adequate information on the merits and demerits of these devices and in which circumstances their use would be highly favorable. This paper examines the performance of both viscous and friction dampers under different ground motions. A two-storey frame installed with both devices under investigation are modeled in commercial computer software and analyzed under different ground motions. The results of the performance of the structure are then tabulated and compared. Also included in this study is the ease of installation and maintenance of these devices.

Keywords: Friction damper, seismic, slip load, viscous damper.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 703

8146 Soft Computing based Retrieval System for Medical Applications

Authors: Pardeep Singh, Sanjay Sharma

Abstract:

With increasing data in medical databases, medical data retrieval is growing in popularity. Some of this analysis including inducing propositional rules from databases using many soft techniques, and then using these rules in an expert system. Diagnostic rules and information on features are extracted from clinical databases on diseases of congenital anomaly. This paper explain the latest soft computing techniques and some of the adaptive techniques encompasses an extensive group of methods that have been applied in the medical domain and that are used for the discovery of data dependencies, importance of features, patterns in sample data, and feature space dimensionality reduction. These approaches pave the way for new and interesting avenues of research in medical imaging and represent an important challenge for researchers.

Keywords: CBIR, GA, Rough sets, CBMIR, SVM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1725

8145 Rigorous Electromagnetic Model of Fourier Transform Infrared (FT-IR) Spectroscopic Imaging Applied to Automated Histology of Prostate Tissue Specimens

Authors: Rohith K Reddy, David Mayerich, Michael Walsh, P Scott Carney, Rohit Bhargava

Abstract:

Fourier transform infrared (FT-IR) spectroscopic imaging is an emerging technique that provides both chemically and spatially resolved information. The rich chemical content of data may be utilized for computer-aided determinations of structure and pathologic state (cancer diagnosis) in histological tissue sections for prostate cancer. FT-IR spectroscopic imaging of prostate tissue has shown that tissue type (histological) classification can be performed to a high degree of accuracy [1] and cancer diagnosis can be performed with an accuracy of about 80% [2] on a microscopic (≈ 6μm) length scale. In performing these analyses, it has been observed that there is large variability (more than 60%) between spectra from different points on tissue that is expected to consist of the same essential chemical constituents. Spectra at the edges of tissues are characteristically and consistently different from chemically similar tissue in the middle of the same sample. Here, we explain these differences using a rigorous electromagnetic model for light-sample interaction. Spectra from FT-IR spectroscopic imaging of chemically heterogeneous samples are different from bulk spectra of individual chemical constituents of the sample. This is because spectra not only depend on chemistry, but also on the shape of the sample. Using coupled wave analysis, we characterize and quantify the nature of spectral distortions at the edges of tissues. Furthermore, we present a method of performing histological classification of tissue samples. Since the mid-infrared spectrum is typically assumed to be a quantitative measure of chemical composition, classification results can vary widely due to spectral distortions. However, we demonstrate that the selection of localized metrics based on chemical information can make our data robust to the spectral distortions caused by scattering at the tissue boundary.

Keywords: Infrared, Spectroscopy, Imaging, Tissue classification

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1627

8144 NSBS: Design of a Network Storage Backup System

Authors: Xinyan Zhang, Zhipeng Tan, Shan Fan

Abstract:

The first layer of defense against data loss is the backup data. This paper implements an agent-based network backup system used the backup, server-storage and server-backup agent these tripartite construction, and the snapshot and hierarchical index are used in the NSBS. It realizes the control command and data flow separation, balances the system load, thereby improving efficiency of the system backup and recovery. The test results show the agent-based network backup system can effectively improve the task-based concurrency, reasonably allocate network bandwidth, the system backup performance loss costs smaller and improves data recovery efficiency by 20%.

Keywords: Agent, network backup system, three architecture model, NSBS.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2219

8143 Estimating the Life-Distribution Parameters of Weibull-Life PV Systems Utilizing Non-Parametric Analysis

Authors: Saleem Z. Ramadan

Abstract:

In this paper, a model is proposed to determine the life distribution parameters of the useful life region for the PV system utilizing a combination of non-parametric and linear regression analysis for the failure data of these systems. Results showed that this method is dependable for analyzing failure time data for such reliable systems when the data is scarce.

Keywords: Masking, Bathtub model, reliability, non-parametric analysis, useful life.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1838

8142 Gas-Liquid Flow on Smooth and Textured Inclined Planes

Authors: J.J. Cooke, S. Gu, L.M. Armstrong, K.H. Luo

Abstract:

Carbon Capture & Storage (CCS) is one of the various methods that can be used to reduce the carbon footprint of the energy sector. This paper focuses on the absorption of CO2 from flue gas using packed columns, whose efficiency is highly dependent on the structure of the liquid films within the column. To study the characteristics of liquid films a CFD solver, OpenFOAM is utilised to solve two-phase, isothermal film flow using the volume-of-fluid (VOF) method. The model was validated using existing experimental data and the Nusselt theory. It was found that smaller plate inclination angles, with respect to the horizontal plane, resulted in larger wetted areas on smooth plates. However, only a slight improvement in the wetted area was observed. Simulations were also performed using a ridged plate and it was observed that these surface textures significantly increase the wetted area of the plate. This was mainly attributed to the channelling effect of the ridges, which helped to oppose the surface tension forces trying to minimise the surface area. Rivulet formations on the ridged plate were also flattened out and spread across a larger proportion of the plate width.

Keywords: CCS, liquid film flow, packed columns, wetted area

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2095

8141 Web Search Engine Based Naming Procedure for Independent Topic

Authors: Takahiro Nishigaki, Takashi Onoda

Abstract:

In recent years, the number of document data has been increasing since the spread of the Internet. Many methods have been studied for extracting topics from large document data. We proposed Independent Topic Analysis (ITA) to extract topics independent of each other from large document data such as newspaper data. ITA is a method for extracting the independent topics from the document data by using the Independent Component Analysis. The topic represented by ITA is represented by a set of words. However, the set of words is quite different from the topics the user imagines. For example, the top five words with high independence of a topic are as follows. Topic1 = {"scor", "game", "lead", "quarter", "rebound"}. This Topic 1 is considered to represent the topic of "SPORTS". This topic name "SPORTS" has to be attached by the user. ITA cannot name topics. Therefore, in this research, we propose a method to obtain topics easy for people to understand by using the web search engine, topics given by the set of words given by independent topic analysis. In particular, we search a set of topical words, and the title of the homepage of the search result is taken as the topic name. And we also use the proposed method for some data and verify its effectiveness.

Keywords: Independent topic analysis, topic extraction, topic naming, web search engine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 489

8140 An Anomaly Detection Approach to Detect Unexpected Faults in Recordings from Test Drives

Authors: Andreas Theissler, Ian Dear

Abstract:

In the automotive industry test drives are being conducted during the development of new vehicle models or as a part of quality assurance of series-production vehicles. The communication on the in-vehicle network, data from external sensors, or internal data from the electronic control units is recorded by automotive data loggers during the test drives. The recordings are used for fault analysis. Since the resulting data volume is tremendous, manually analysing each recording in great detail is not feasible. This paper proposes to use machine learning to support domainexperts by preventing them from contemplating irrelevant data and rather pointing them to the relevant parts in the recordings. The underlying idea is to learn the normal behaviour from available recordings, i.e. a training set, and then to autonomously detect unexpected deviations and report them as anomalies. The one-class support vector machine “support vector data description” is utilised to calculate distances of feature vectors. SVDDSUBSEQ is proposed as a novel approach, allowing to classify subsequences in multivariate time series data. The approach allows to detect unexpected faults without modelling effort as is shown with experimental results on recordings from test drives.

Keywords: Anomaly detection, fault detection, test drive analysis, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2468

8139 Decision Tree for Competing Risks Survival Probability in Breast Cancer Study

Authors: N. A. Ibrahim, A. Kudus, I. Daud, M. R. Abu Bakar

Abstract:

Competing risks survival data that comprises of more than one type of event has been used in many applications, and one of these is in clinical study (e.g. in breast cancer study). The decision tree method can be extended to competing risks survival data by modifying the split function so as to accommodate two or more risks which might be dependent on each other. Recently, researchers have constructed some decision trees for recurrent survival time data using frailty and marginal modelling. We further extended the method for the case of competing risks. In this paper, we developed the decision tree method for competing risks survival time data based on proportional hazards for subdistribution of competing risks. In particular, we grow a tree by using deviance statistic. The application of breast cancer data is presented. Finally, to investigate the performance of the proposed method, simulation studies on identification of true group of observations were executed.

Keywords: Competing risks, Decision tree, Simulation, Subdistribution Proportional Hazard.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2367

8138 Spatially Random Sampling for Retail Food Risk Factors Study

Authors: Guilan Huang

Abstract:

In 2013 and 2014, the U.S. Food and Drug Administration (FDA) collected data from selected fast food restaurants and full service restaurants for tracking changes in the occurrence of foodborne illness risk factors. This paper discussed how we customized spatial random sampling method by considering financial position and availability of FDA resources, and how we enriched restaurants data with location. Location information of restaurants provides opportunity for quantitatively determining random sampling within non-government units (e.g.: 240 kilometers around each data-collector). Spatial analysis also could optimize data-collectors’ work plans and resource allocation. Spatial analytic and processing platform helped us handling the spatial random sampling challenges. Our method fits in FDA’s ability to pinpoint features of foodservice establishments, and reduced both time and expense on data collection.

Keywords: Geospatial technology, restaurant, retail food risk factors study, spatial random sampling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1453

8137 Anisotropic Total Fractional Order Variation Model in Seismic Data Denoising

Authors: Jianwei Ma, Diriba Gemechu

Abstract:

In seismic data processing, attenuation of random noise is the basic step to improve quality of data for further application of seismic data in exploration and development in different gas and oil industries. The signal-to-noise ratio of the data also highly determines quality of seismic data. This factor affects the reliability as well as the accuracy of seismic signal during interpretation for different purposes in different companies. To use seismic data for further application and interpretation, we need to improve the signal-to-noise ration while attenuating random noise effectively. To improve the signal-to-noise ration and attenuating seismic random noise by preserving important features and information about seismic signals, we introduce the concept of anisotropic total fractional order denoising algorithm. The anisotropic total fractional order variation model defined in fractional order bounded variation is proposed as a regularization in seismic denoising. The split Bregman algorithm is employed to solve the minimization problem of the anisotropic total fractional order variation model and the corresponding denoising algorithm for the proposed method is derived. We test the effectiveness of theproposed method for synthetic and real seismic data sets and the denoised result is compared with F-X deconvolution and non-local means denoising algorithm.

Keywords: Anisotropic total fractional order variation, fractional order bounded variation, seismic random noise attenuation, Split Bregman Algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1005

8136 Improving Academic Performance Prediction using Voting Technique in Data Mining

Authors: Ikmal Hisyam Mohamad Paris, Lilly Suriani Affendey, Norwati Mustapha

Abstract:

In this paper we compare the accuracy of data mining methods to classifying students in order to predicting student-s class grade. These predictions are more useful for identifying weak students and assisting management to take remedial measures at early stages to produce excellent graduate that will graduate at least with second class upper. Firstly we examine single classifiers accuracy on our data set and choose the best one and then ensembles it with a weak classifier to produce simple voting method. We present results show that combining different classifiers outperformed other single classifiers for predicting student performance.

Keywords: Classification, Data Mining, Prediction, Combination of Multiple Classifiers.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2746

8135 Extracting Terrain Points from Airborne Laser Scanning Data in Densely Forested Areas

Authors: Ziad Abdeldayem, Jakub Markiewicz, Kunal Kansara, Laura Edwards

Abstract:

Airborne Laser Scanning (ALS) is one of the main technologies for generating high-resolution digital terrain models (DTMs). DTMs are crucial to several applications, such as topographic mapping, flood zone delineation, geographic information systems (GIS), hydrological modelling, spatial analysis, etc. Laser scanning system generates irregularly spaced three-dimensional cloud of points. Raw ALS data are mainly ground points (that represent the bare earth) and non-ground points (that represent buildings, trees, cars, etc.). Removing all the non-ground points from the raw data is referred to as filtering. Filtering heavily forested areas is considered a difficult and challenging task as the canopy stops laser pulses from reaching the terrain surface. This research presents an approach for removing non-ground points from raw ALS data in densely forested areas. Smoothing splines are exploited to interpolate and fit the noisy ALS data. The presented filter utilizes a weight function to allocate weights for each point of the data. Furthermore, unlike most of the methods, the presented filtering algorithm is designed to be automatic. Three different forested areas in the United Kingdom are used to assess the performance of the algorithm. The results show that the generated DTMs from the filtered data are accurate (when compared against reference terrain data) and the performance of the method is stable for all the heavily forested data samples. The average root mean square error (RMSE) value is 0.35 m.

Keywords: Airborne laser scanning, digital terrain models, filtering, forested areas.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 708

8134 Blockchain for IoT Security and Privacy in Healthcare Sector

Authors: Umair Shafique, Hafiz Usman Zia, Fiaz Majeed, Samina Naz, Javeria Ahmed, Maleeha Zainab

Abstract:

The Internet of Things (IoT) has become a hot topic for the last couple of years. This innovative technology has shown promising progress in various areas and the world has witnessed exponential growth in multiple application domains. Researchers are working to investigate its aptitudes to get the best from it by harnessing its true potential. But at the same time, IoT networks open up a new aspect of vulnerability and physical threats to data integrity, privacy, and confidentiality. It is due to centralized control, data silos approach for handling information, and a lack of standardization in the IoT networks. As we know, blockchain is a new technology that involves creating secure distributed ledgers to store and communicate data. Some of the benefits include resiliency, integrity, anonymity, decentralization, and autonomous control. The potential for blockchain technology to provide the key to managing and controlling IoT has created a new wave of excitement around the idea of putting that data back into the hands of the end-users. In this manuscript, we have proposed a model that combines blockchain and IoT networks to address potential security and privacy issues in the healthcare domain and how various stakeholders will interact with the system.

Keywords: Internet of Things, IoT, blockchain, data integrity, authentication, data privacy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 390

8133 Multidimensional Performance Management

Authors: David Wiese

Abstract:

In order to maximize efficiency of an information management platform and to assist in decision making, the collection, storage and analysis of performance-relevant data has become of fundamental importance. This paper addresses the merits and drawbacks provided by the OLAP paradigm for efficiently navigating large volumes of performance measurement data hierarchically. The system managers or database administrators navigate through adequately (re)structured measurement data aiming to detect performance bottlenecks, identify causes for performance problems or assessing the impact of configuration changes on the system and its representative metrics. Of particular importance is finding the root cause of an imminent problem, threatening availability and performance of an information system. Leveraging OLAP techniques, in contrast to traditional static reporting, this is supposed to be accomplished within moderate amount of time and little processing complexity. It is shown how OLAP techniques can help improve understandability and manageability of measurement data and, hence, improve the whole Performance Analysis process.

Keywords: Data Warehousing, OLAP, Multidimensional Navigation, Performance Diagnosis, Performance Management, Performance Tuning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2125

8132 A New Algorithm for Enhanced Robustness of Copyright Mark

Authors: Harsh Vikram Singh, S. P. Singh, Anand Mohan

Abstract:

This paper discusses a new heavy tailed distribution based data hiding into discrete cosine transform (DCT) coefficients of image, which provides statistical security as well as robustness against steganalysis attacks. Unlike other data hiding algorithms, the proposed technique does not introduce much effect in the stegoimage-s DCT coefficient probability plots, thus making the presence of hidden data statistically undetectable. In addition the proposed method does not compromise on hiding capacity. When compared to the generic block DCT based data-hiding scheme, our method found more robust against a variety of image manipulating attacks such as filtering, blurring, JPEG compression etc.

Keywords: Information Security, Robust Steganography, Steganalysis, Pareto Probability Distribution function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1790