Search results for: biological data

6907 A Weighted Sum Technique for the Joint Optimization of Performance and Power Consumption in Data Centers

Abstract:

With data centers, end-users can realize the pervasiveness of services that will one day be the cornerstone of our lives. However, data centers are often classified as the computing systems that consume the largest amounts of power. To address this problem, we propose a self-adaptive weighted sum methodology that jointly optimizes the performance and power consumption of any given data center. Compared to traditional methodologies for multi-objective optimization problems, the proposed self-adaptive weighted sum technique does not rely on a systematic change of weights during the optimization procedure. The proposed technique is compared with the greedy and LR heuristics for large-scale problems, and with the optimal solution for small-scale problems implemented in LINDO. The experimental results revealed that the proposed self-adaptive weighted sum technique outperforms both heuristics and delivers performance competitive with the optimal solution.

Keywords: Meta-heuristics, distributed systems, adaptive methods, resource allocation.
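
For orientation, the basic weighted-sum idea behind this entry can be sketched as follows. This is only the plain fixed-weight scalarization, not the self-adaptive variant the abstract proposes. The candidate allocations, the objective names `time` and `power`, and the assumption that both objectives are normalized to [0, 1] are hypothetical.

```python
def weighted_sum_best(candidates, weight):
    """Pick the candidate minimizing weight * time + (1 - weight) * power.

    Both objectives are assumed to be normalized to [0, 1] and minimized.
    """
    return min(
        candidates,
        key=lambda c: weight * c["time"] + (1.0 - weight) * c["power"],
    )


# Hypothetical allocations: A is fast but power-hungry, C is slow but frugal.
candidates = [
    {"name": "A", "time": 0.2, "power": 0.9},
    {"name": "B", "time": 0.5, "power": 0.4},
    {"name": "C", "time": 0.9, "power": 0.1},
]

balanced = weighted_sum_best(candidates, weight=0.5)
performance_first = weighted_sum_best(candidates, weight=0.9)
power_first = weighted_sum_best(candidates, weight=0.1)
```

Changing the weight shifts the selected trade-off between the two objectives. A traditional weighted-sum method sweeps this weight systematically, which is the step the abstract says the proposed technique avoids.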

6906 Image-Based (RBG) Technique for Estimating Phosphorus Levels of Crops

Authors: M. M. Ali, Ahmed Al-Ani, Derek Eamus, Daniel K. Y. Tan

Abstract:

In this glasshouse study, we developed a new image-based non-destructive technique for detecting leaf P status of different crops such as cotton, tomato and lettuce. The plants were grown on a nutrient solution containing different P concentrations, e.g. 0%, 50% and 100% of recommended P concentration (P0 = no P, L; P1 = 2.5 mL 10 L⁻¹ of P and P2 = 5 mL 10 L⁻¹ of P). After 7 weeks of treatment, the plants were harvested and data on leaf P contents were collected using the standard destructive laboratory method and at the same time leaf images were collected by a handheld crop image sensor. We calculated leaf area, leaf perimeter and RGB (red, green and blue) values of these images. These data were further used in linear discriminant analysis (LDA) to estimate leaf P contents, which successfully classified these plants on the basis of leaf P contents. The data indicated that P deficiency in crop plants can be predicted using leaf image and morphological data. Our proposed non-destructive imaging method is precise in estimating P requirements of different crop species.

Keywords: Image-based techniques, leaf area, leaf P contents, linear discriminant analysis.

6905 The Classification Performance in Parametric and Nonparametric Discriminant Analysis for a Class-Unbalanced Data of Diabetes Risk Groups

Authors: Lily Ingsrisawang, Tasanee Nacharoen

Abstract:

The problems arising from unbalanced data sets generally appear in real-world applications. Due to unequal class distribution, many researchers have found that the performance of existing classifiers tends to be biased towards the majority class. The k-nearest neighbors nonparametric discriminant analysis is a method that was proposed for classifying unbalanced classes with good performance. In this study, the methods of discriminant analysis are of interest in investigating misclassification error rates for class-imbalanced data of three diabetes risk groups. The purpose of this study was to compare the classification performance of parametric discriminant analysis and nonparametric discriminant analysis in a three-class classification of class-imbalanced data of diabetes risk groups. Data from a project maintaining healthy conditions for 599 employees of a government hospital in Bangkok were obtained for the classification problem. The employees were divided into three diabetes risk groups: non-risk (90%), risk (5%), and diabetic (5%). The original data, including the variables of diabetes risk group, age, gender, blood glucose, and BMI, were analyzed and bootstrapped for 50 and 100 samples, 599 observations per sample, for additional estimation of the misclassification error rate. Each data set was explored for departure from multivariate normality and for equality of the covariance matrices of the three risk groups. Both the original data and the bootstrap samples showed non-normality and unequal covariance matrices. The parametric linear discriminant function, quadratic discriminant function, and the nonparametric k-nearest neighbors discriminant function were performed over 50 and 100 bootstrap samples and applied to the original data. In searching for the optimal classification rule, the prior probabilities were set to both equal proportions (0.33:0.33:0.33) and unequal proportions of (0.90:0.05:0.05), (0.80:0.10:0.10) and (0.70:0.15:0.15).
The results from 50 and 100 bootstrap samples indicated that the k-nearest neighbors approach with k=3 or k=4 and prior probabilities of non-risk:risk:diabetic defined as 0.90:0.05:0.05 or 0.80:0.10:0.10 gave the smallest misclassification error rate. The k-nearest neighbors approach is therefore suggested for classifying three-class imbalanced data of diabetes risk groups.

Keywords: Bootstrap, diabetes risk groups, error rate, k-nearest neighbors.
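
To illustrate how prior probabilities can enter a k-nearest neighbors discriminant rule, the sketch below uses one common formulation, in which the posterior for a class is proportional to its prior times the share of that class's training points found among the k nearest neighbors. The abstract does not state the exact rule used in the study, so this formulation is an assumption, and the training points (blood glucose, BMI) are made up.

```python
from collections import Counter


def knn_posteriors(train, x, k, priors):
    """Posterior per class, proportional to prior * (neighbors in class / class size).

    train is a list of (feature_tuple, label) pairs; distances are Euclidean.
    """
    class_sizes = Counter(label for _, label in train)
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )[:k]
    in_neighborhood = Counter(label for _, label in nearest)
    scores = {
        c: priors[c] * in_neighborhood[c] / class_sizes[c] for c in class_sizes
    }
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Made-up training data: (blood glucose, BMI) with a risk-group label.
train = [
    ((90, 22), "non-risk"), ((95, 24), "non-risk"),
    ((100, 23), "non-risk"), ((92, 25), "non-risk"),
    ((115, 29), "risk"), ((120, 30), "risk"),
    ((150, 33), "diabetic"), ((160, 35), "diabetic"),
]
x = (118, 29)

equal = knn_posteriors(train, x, 3, {"non-risk": 1 / 3, "risk": 1 / 3, "diabetic": 1 / 3})
skewed = knn_posteriors(train, x, 3, {"non-risk": 0.90, "risk": 0.05, "diabetic": 0.05})
label_equal = max(equal, key=equal.get)
label_skewed = max(skewed, key=skewed.get)
```

On this toy data the same query point is assigned to different groups under equal and unequal priors, which is why the choice of priors is treated as part of the classification rule.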

6904 Localizing and Experiencing Electronic Questionnaires in an Educational Web Site

Authors: Theodore H. Kaskalis

Abstract:

One of the main research methods in humanistic studies is the collection and processing of data through questionnaires. This paper reports our experiences of localizing and adapting the phpESP package of electronic surveys, which led to a friendly on-line questionnaire environment offered through our department web site. After presenting the characteristics of this environment, we identify the expected benefits and present a questionnaire carried out in both the traditional and the electronic way. We present the respondents' feedback and then report the researchers' opinions. Finally, we propose ideas we intend to implement in order to further assist and enhance research based on this web-accessed, electronic questionnaire environment.

Keywords: Electronic questionnaires, Computer assisted web interviewing, Survey data collection, Survey data visualization.

6903 Design and Implementation of Security Middleware for Data Warehouse Signature Framework

Authors: Mayada AlMeghari

Abstract:

Recently, grid middleware has enabled large-scale integrated use of network resources, such as shared data and CPUs, to form a virtual supercomputer. In this work, we present the design and implementation of the middleware for the Data Warehouse Signature (DWS) framework. The aim of using the middleware in the proposed DWS framework is to achieve high performance through parallel computing. This middleware is developed on the Alchemi.Net framework to increase security among the network nodes through an authentication and group-key distribution model. This model achieves key security and prevents intermediate attacks in the middleware. This paper presents the flow process structures of the middleware design. In addition, the paper addresses the implementation of security for the DWS middleware, enhanced with the authentication and group-key distribution model. Finally, based on an analysis of other middleware approaches, the developed DWS middleware offers the most complete coverage of security issues.

Keywords: Middleware, parallel computing, data warehouse, security, group-key, high performance.

6902 A Review of the Characteristics and Optimization of Optical Properties of Zirconia Ceramics for Aesthetic Dental Restorations

Authors: R. A. Shahmiri, O. C. Standard, J. N. Hart, C. C. Sorrell

Abstract:

The ceramic yttria-stabilized tetragonal zirconia polycrystal (Y-TZP) has been used as a dental biomaterial for several decades. The strength and toughness of this material can be accounted for by its toughening mechanisms, which include transformation toughening, crack deflection, zone shielding, contact shielding, and crack bridging. Prevention of crack propagation is of critical importance in high-fatigue situations, such as those encountered in mastication and para-function. However, the poor translucence of Y-TZP in polycrystalline form is such that it may not meet the aesthetic requirements due to its white/grey appearance. To improve the optical properties of Y-TZP, more detailed study of the optical properties is required; in particular, precise evaluation of the refractive index, absorption coefficient, and scattering coefficient is necessary. The measurement of the optical parameters has been based on the assumption that light scattered from biological media is isotropically distributed over all angles. In fact, the optical behavior of real biological materials depends on the angular scattering of light due to the anisotropic nature of the materials. The purpose of the present work is to evaluate the optical properties (including color, opacity/translucence, scattering, and fluorescence) of zirconia dental ceramics and their control through modification of the chemical composition, phase composition, and surface microstructure.

Keywords: Optical properties, opacity/translucence, scattering, fluorescence, chemical composition, phase composition, surface microstructure.

6901 Re-Optimization MVPP Using Common Subexpression for Materialized View Selection

Authors: Boontita Suchyukorn, Raweewan Auepanwiriyakul

Abstract:

A data warehouse is a repository of information integrated from source data. Information in a data warehouse is stored in the form of materialized views in order to provide better performance when answering queries. Deciding which views are appropriate to materialize is an important problem. To meet this requirement, constructing a search space that is close to optimal is a necessary task, since it yields effective results for selecting the views to be materialized. In this paper, we propose an approach to re-optimize the Multiple View Processing Plan (MVPP) by using global common subexpressions. The merged queries whose query processing cost is not close to optimal are rewritten. The experiment shows that our approach helps to reduce the total query processing cost of the MVPP, and that the sum of the query processing cost and the materialized view maintenance cost is also reduced after views are selected for materialization.

Keywords: Data Warehouse, materialized views, query rewriting, common subexpressions.

6900 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

Clustering is a well-known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can contain either categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms have been proposed: k-means for clustering numeric datasets and k-modes for categorical datasets. A main problem encountered in data mining applications is clustering the categorical data that is so prevalent in datasets. One way to cluster categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead of k-modes. In this paper, we experiment with an approach based on this idea, transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previous method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are evaluated. The obtained results show that our proposed method outperforms the binary method in all cases.

Keywords: Clustering, k-means, categorical datasets, pattern recognition, unsupervised learning, knowledge discovery.
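
A minimal sketch of the encoding step described in this entry, assuming that "relative frequency" means the count of a modality in its attribute divided by the number of records (the abstract does not give the exact formula). The function name and the toy records are hypothetical; the encoded rows could then be passed to any k-means implementation.

```python
from collections import Counter


def relative_frequency_encode(rows):
    """Replace each categorical value by its relative frequency within its column."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    # Count how often each modality occurs in each attribute (column).
    counts = [Counter(row[j] for row in rows) for j in range(n_cols)]
    return [
        [counts[j][row[j]] / n_rows for j in range(n_cols)]
        for row in rows
    ]


# Toy categorical records: (color, size).
rows = [
    ["red", "small"],
    ["red", "large"],
    ["blue", "small"],
    ["red", "small"],
]
encoded = relative_frequency_encode(rows)
```

One consequence of this encoding is that two different modalities with the same frequency in an attribute map to the same number, which may or may not be acceptable for a given dataset.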

6899 A New Direct Updating Method for Undamped Structural Systems

Authors: Yongxin Yuan, Jiashang Jiang

Abstract:

A new numerical method for simultaneously updating mass and stiffness matrices based on incomplete measured modal data is presented. By using the Kronecker product, all the variables that are to be modified can be identified and then updated directly. The optimal approximations of the mass and stiffness matrices that satisfy the required eigenvalue equation and orthogonality condition are found in the Frobenius norm sense. The physical configuration of the analytical model is preserved, and the updated model exactly reproduces the measured modal data. The numerical example seems to indicate that the method is quite accurate and efficient.

Keywords: Finite element model, model updating, modal data, optimal approximation.
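
As background for this entry, the standard way the Kronecker product turns matrix constraints into linear equations in the unknown entries can be written out as below. The abstract gives no formulas, so the symbols are assumptions: $K$ and $M$ are the $n \times n$ stiffness and mass matrices, $\Phi$ is the $n \times p$ matrix of measured mode shapes, and $\Lambda$ is the $p \times p$ diagonal matrix of measured eigenvalues.

```latex
% Constraints for an undamped model (assumed notation):
K \Phi = M \Phi \Lambda, \qquad \Phi^{\mathsf T} M \Phi = I_p .

% Standard identity linking vec and the Kronecker product:
\operatorname{vec}(A X B) = \left( B^{\mathsf T} \otimes A \right) \operatorname{vec}(X) .

% Applied to the eigenvalue equation (\Lambda is diagonal, so \Lambda^{\mathsf T} = \Lambda):
\left( \Phi^{\mathsf T} \otimes I_n \right) \operatorname{vec}(K)
  - \left( \Lambda \Phi^{\mathsf T} \otimes I_n \right) \operatorname{vec}(M) = 0 .

% Applied to the orthogonality condition:
\left( \Phi^{\mathsf T} \otimes \Phi^{\mathsf T} \right) \operatorname{vec}(M)
  = \operatorname{vec}(I_p) .
```

Both constraints are linear in the entries of $K$ and $M$, which is what allows the entries to be modified to be identified and updated directly. How the method in the paper selects those entries and minimizes the Frobenius-norm change is not stated in the abstract.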

6898 Positioning a Southern Inclusive Framework Embedded in the Social Model of Disability Theory Contextualized for Guyana

Authors: Lidon Lashley

Abstract:

This paper presents how the social model of disability can be used to reshape inclusive education practices in Guyana. Inclusive education in Guyana is metamorphosizing but still firmly held in the tenets of the Medical Model of Disability which influences the experiences of children with Special Education Needs and/or Disabilities (SEN/D). An ethnographic approach to data gathering was employed in this study. Qualitative data were gathered from the voices of children with and without SEN/D as well as their mainstream teachers to present the interplay of discourses and subjectivities in the situation. The data were analyzed using Adele Clarke's situational analysis. The data suggest that it is possible but will be challenging to fully contextualize and adopt Loreman's synthesis and Booths and Ainscow's Index in the two mainstream schools studied. In addition, the data paved the way for the presentation of the 'Southern Inclusive Education Framework for Guyana' and its support tool 'The Inclusive Checker created for Southern mainstream primary classrooms'.

Keywords: Social Model of Disability, Medical Model of Disability, subjectivities, metamorphosis, special education needs, postcolonial Guyana, Quasi-inclusion practices, Guyanese cultural challenges, mainstream primary schools, Loreman's Synthesis, Booths and Ainscow's Index.

6897 Integration of Microarray Data into a Genome-Scale Metabolic Model to Study Flux Distribution after Gene Knockout

Authors: Mona Heydari, Ehsan Motamedian, Seyed Abbas Shojaosadati

Abstract:

Prediction of perturbations after genetic manipulation (especially gene knockout) is one of the important challenges in systems biology. In this paper, a new algorithm is introduced that integrates microarray data into the metabolic model. The algorithm was used to study the change in the cell phenotype after knockout of the Gss gene in Escherichia coli BW25113. Implementation of the algorithm indicated that the gene deletion resulted in greater activation of the metabolic network. The growth yield was higher, and fewer regulating genes were identified for the mutant than for the wild-type strain.

Keywords: Metabolic network, gene knockout, flux balance analysis, microarray data, integration.

6896 Using Genetic Programming to Evolve a Team of Data Classifiers

Authors: Gregor A. Morrison, Dominic P. Searson, Mark J. Willis

Abstract:

The purpose of this paper is to demonstrate the ability of a genetic programming (GP) algorithm to evolve a team of data classification models. The GP algorithm used in this work is "multigene" in nature, i.e. there are multiple tree structures (genes) that are used to represent team members. Each team member assigns a data sample to one of a fixed set of output classes. A majority vote, determined using the mode (highest occurrence) of classes predicted by the individual genes, is used to determine the final class prediction. The algorithm is tested on a binary classification problem. For the case study investigated, compact classification models are obtained with comparable accuracy to alternative approaches.

Keywords: classification, genetic programming.
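
The majority-vote step described in this entry can be sketched as follows. The three "genes" are hypothetical hand-written threshold rules standing in for evolved trees, and breaking ties in favor of the first class encountered is an assumption; the abstract does not say how ties are resolved.

```python
from statistics import multimode


def team_predict(genes, sample):
    """Return the mode of the class labels predicted by the individual genes."""
    votes = [gene(sample) for gene in genes]
    # multimode lists all most-common values in first-seen order;
    # taking the first one is an assumed tie-breaking rule.
    return multimode(votes)[0]


# Hypothetical team members for a binary problem with two input features.
genes = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]

prediction_a = team_predict(genes, (0.9, 0.2))  # votes: 1, 0, 1
prediction_b = team_predict(genes, (0.1, 0.3))  # votes: 0, 0, 0
```

Using an odd number of team members on a binary problem avoids ties altogether.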

6895 Parallel Computation of Data Summation for Multiple Problem Spaces on Partitioned Optical Passive Stars Network

Authors: Khin Thida Latt, Mineo Kaneko, Yoichi Shinoda

Abstract:

In a Partitioned Optical Passive Stars (POPS) network, nodes and couplers become free from slot to slot in some computations. It is necessary to utilize free couplers and nodes efficiently to be cost-effective. To improve parallelism, we present a fast data summation algorithm for multiple problem spaces on POPS(g, g) with a smaller number of nodes for the case of d = n = g. For the case of d > n > g, we simulate the calculation of a large number of data items, intended for a larger system with many nodes, on a smaller system with fewer nodes. The algorithm is faster than the best known algorithm, and using a smaller number of nodes and groups makes the system low-cost and practical.

Keywords: Partitioned optical passive stars network, parallel computing, optical computing, data sum

6894 Customer Satisfaction and Effective HRM Policies: Customer and Employee Satisfaction

Authors: S. Anastasiou, C. Nathanailides

Abstract:

The purpose of this study is to examine the possible link between employee and customer satisfaction. The service provided by employees helps to build a good relationship with customers and can help to increase their loyalty. Published data on job satisfaction and indicators of customer service of banks were gathered from relevant published works, which included data from five different countries. The customer and employee satisfaction scores of the different published works were transformed and normalized to a scale of 1 to 100. The data were analyzed, and a regression analysis of the two parameters was used to describe the link between employee satisfaction and customer satisfaction. Assuming that employee satisfaction has a significant influence on customer service and the resulting customer satisfaction, the reviewed data indicate that employee satisfaction contributes significantly to the level of customer satisfaction in the banking sector. There was a significant correlation between the two parameters (Pearson correlation, R² = 0.52, P < 0.05). The reviewed data thus support the hypothesis that these two parameters are linked. During the recent global economic crisis, the financial services sector was affected severely, and job security, remuneration and recruitment of bank personnel were significantly reduced in many countries, including Greece. Nevertheless, modern organizations should always consider their personnel as capital, which is the driving force for future success. Appropriate human resource management policies can increase the job satisfaction of personnel, with positive consequences for the level of customer satisfaction.

Keywords: Job satisfaction, job performance, customer service, banks, human resources management.
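
The normalization and correlation steps mentioned in this entry might look like the sketch below. The abstract does not say how scores were mapped to the 1-to-100 scale, so a linear mapping from each source scale's bounds is assumed, and the sample scores are made up for illustration only.

```python
import math


def rescale_to_1_100(score, low, high):
    """Linearly map a score from the scale [low, high] onto 1 to 100 (assumed mapping)."""
    return 1.0 + (score - low) * 99.0 / (high - low)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up scores: employee satisfaction on a 1-5 scale, customer satisfaction on a 0-10 scale.
employee = [rescale_to_1_100(s, 1, 5) for s in [2, 3, 4, 5]]
customer = [rescale_to_1_100(s, 0, 10) for s in [3, 6, 5, 9]]

r = pearson_r(employee, customer)
r_squared = r ** 2  # the quantity reported as R² in the abstract
```

Because the Pearson coefficient is unchanged by linear rescaling of either variable, the normalization matters for pooling scores from different studies rather than for the correlation within one study.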

6893 A New Approach of Fuzzy Methods for Evaluating of Hydrological Data

Authors: Nasser Shamskia, Seyyed Habib Rahmati, Hassan Haleh, Seyyedeh Hoda Rahmati

Abstract:

The main design criteria for most hydraulic structures are essentially based on the runoff or discharge of water. Two of those important criteria are runoff and return period. Mostly, these measures are calculated or estimated from stochastic data. Another feature of hydrological data is their impreciseness. Therefore, in order to deal with uncertainty and impreciseness, a new fuzzy method for evaluating hydrological measures is developed, based on Buckley's estimation method. The method introduces triangular fuzzy numbers for the different measures, in which both uncertainty and impreciseness are considered. Besides, since another important consideration in most hydrological studies is the comparison of a measure across different months or years, a new fuzzy method for this comparison, consistent with the special form of the proposed fuzzy numbers, is also developed. Finally, to illustrate the methods more explicitly, the two algorithms are tested on one simple example and a real case study.

Keywords: Fuzzy Discharge, Fuzzy estimation, Fuzzy ranking method, Hydrological data

6892 A Tree Based Association Rule Approach for XML Data with Semantic Integration

Authors: D. Sasikala, K. Premalatha

Abstract:

The use of eXtensible Markup Language (XML) in web, business and scientific databases has led to the development of methods, techniques and systems to manage and analyze XML data. Semi-structured documents suffer from heterogeneity and high dimensionality. XML structure and content mining represents a convergence of research in semi-structured data and text mining. As the information available on the internet grows drastically, extracting knowledge from XML documents becomes a harder task. Indeed, documents are often so large that the data set returned as the answer to a query may be too big to convey the required information. To improve query answering, a Semantic Tree Based Association Rule (STAR) mining method is proposed. This method provides intentional information by considering the structure, the content and the semantics of the content. The method is applied to the Reuters dataset, and the results show that the proposed method performs well.

Keywords: Semi-structured Document, Tree based Association Rule (TAR), Semantic Association Rule Mining.

6891 Principle Knowledge of Integrated Pest Management Adopting Cotton Cultivators in Irrigated and Rainfed Conditions: A Critical Analysis

Authors: B. Sudhakar, K. A. Ponnusamy

Abstract:

In India, cotton is a major commercial crop cultivated in all the states. In recent years, the area under cotton has declined due to pest and disease attack, drought, lower prices for the produce, etc. Pest and disease attack is the foremost challenge, and it is of utmost importance that in future insect problems be tackled through Integrated Pest Management (IPM). The present study deals with the principle knowledge of IPM-adopting cotton cultivators in irrigated and rainfed conditions. Under irrigated conditions, among cultural practices, all respondents had principle knowledge about growing high-yielding and pest-resistant hybrids, sowing quality and certified seeds, and avoiding cotton ratoon cropping. Regarding mechanical practices, all respondents had principle knowledge about collecting and destroying eggs, larvae and pupae of pests and removing and destroying pest- and disease-infected cotton squares, flowers and other shed materials. With regard to biological practices, 93% of them had principle knowledge about spraying neem oil, followed by 82% about tying Trichogramma egg cards. Among chemical practices, 90% or more of the respondents had principle knowledge about spraying herbicide (96%), identifying the ETL (Economic Threshold Level) for cotton pests (94%), and applying safe insecticides (90%). Under rainfed conditions, among cultural practices, all respondents had principle knowledge about sowing quality and certified seeds and growing high-yielding and pest-resistant hybrid seeds. Regarding mechanical practices, 100% of the respondents had principle knowledge about collecting and destroying eggs, larvae and pupae of pests and removing and destroying pest- and disease-infected cotton squares, flowers and other shed materials. With regard to biological practices, 96% of the respondents had principle knowledge about spraying neem oil, followed by 89% about tying Trichogramma egg cards.
With regard to chemical practices, 90% or more of the respondents had principle knowledge of applying safe insecticides (95%), avoiding repeated use of the same insecticides (95%), identifying the ETL for cotton pests (94%) and applying granular insecticides (90%).

Keywords: Biological practices, chemical practices, cultural practices, mechanical practices, integrated pest management.

6890 On-line Control of the Natural and Anthropogenic Safety in Krasnoyarsk Region

Authors: T. Penkova, A. Korobko, V. Nicheporchuk, L. Nozhenkova, A. Metus

Abstract:

This paper presents an approach to on-line control of the state of technosphere and environment objects based on the integration of Data Warehouse, OLAP and Expert systems technologies. It looks at the structure and content of the data warehouse that provides consolidation and storage of monitoring data. OLAP models that provide multidimensional analysis of monitoring data and dynamic analysis of the principal parameters of controlled objects are described. The authors suggest some criteria for emergency risk assessment using expert knowledge about danger levels. It is demonstrated how some of the proposed solutions could be adopted in territorial decision making support systems. Operational control allows authorities to detect threats, prevent natural and anthropogenic emergencies and ensure the comprehensive safety of the territory.

Keywords: Decision making support systems, Emergency risk assessment, Natural and anthropogenic safety, On-line control, Territory.

6889 A Virtual Grid Based Energy Efficient Data Gathering Scheme for Heterogeneous Sensor Networks

Authors: Siddhartha Chauhan, Nitin Kumar Kotania

Abstract:

Traditional Wireless Sensor Networks (WSNs) generally use static sinks to collect data from the sensor nodes via multi-hop forwarding. As a result, the network suffers from problems such as long message relay times and bottlenecks, which reduce network performance. Many approaches have been proposed to address these problems by using a mobile sink to collect the data from the sensor nodes, but these approaches still suffer from the buffer overflow problem due to the limited memory size of sensor nodes. This paper proposes an energy-efficient data gathering scheme that overcomes the buffer overflow problem. The proposed scheme creates a virtual grid structure of heterogeneous nodes and is designed for sensor nodes with variable sensing rates. Every node computes its buffer overflow time, and cluster heads are elected on this basis. The proposed scheme uses a controlled traversal approach to transmit data to the sink. The effectiveness of the proposed scheme is verified by simulation.

Keywords: Buffer overflow problem, Mobile sink, Virtual grid, Wireless sensor networks.

6888 Spatial Data Science for Data Driven Urban Planning: The Youth Economic Discomfort Index for Rome

Authors: Iacopo Testi, Diego Pajarito, Nicoletta Roberto, Carmen Greco

Abstract:

Today, a substantial segment of the world’s population lives in urban areas, and this proportion will increase greatly in the coming decades. Therefore, understanding the key trends in urbanization likely to unfold over the coming years is crucial to the implementation of sustainable urban strategies. In parallel, the daily amount of digital data produced will expand at an exponential rate over the following years. The analysis of various types of data sets and its derived applications have great potential across crucial sectors such as healthcare, housing, transportation, energy, and education. Nevertheless, in city development, architects and urban planners appear to rely mostly on traditional and analog techniques of data collection. This paper investigates the prospects of the data science field as a resource to assist city managers in identifying strategies to enhance the social, economic, and environmental sustainability of urban areas. Collecting new layers of information would enhance planners' capabilities to understand in more depth urban phenomena such as gentrification, land use definition, mobility, or critical infrastructural issues. Specifically, the research correlates economic, commercial, demographic, and housing data with the purpose of defining the youth economic discomfort index. The statistical composite index provides insights regarding the economic disadvantage of citizens aged between 18 and 29 years, and the results clearly show that central urban zones are more disadvantaged than peripheral ones. The city of Rome was selected as the testing ground for the whole investigation. The methodology applies statistical and spatial analysis to construct a composite index supporting informed, data-driven decisions for urban planning.

Keywords: Data science, spatial analysis, composite index, Rome, urban planning, youth economic discomfort index.

6887 Feature Selection and Predictive Modeling of Housing Data Using Random Forest

Authors: Bharatendra Rai

Abstract:

Predictive data analysis and modeling involving machine learning techniques become challenging in the presence of too many explanatory variables or features. Too many features are known not only to slow algorithms down but also to decrease model prediction accuracy. This study involves a housing dataset with 79 quantitative and qualitative features that describe various aspects people consider when buying a new house. The Boruta algorithm, which supports feature selection using a wrapper approach built around random forest, is used in this study. This feature selection process leads to 49 confirmed features, which are then used for developing predictive random forest models. The study also explores five different data partitioning ratios, and their impact on model accuracy is captured using the coefficient of determination (R-squared) and root mean square error (RMSE).

Keywords: Housing data, feature selection, random forest, Boruta algorithm, root mean square error.
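
The two accuracy measures named in this entry can be computed as in the sketch below. The coefficient of determination is taken here as 1 minus the ratio of residual to total sum of squares, which is one common definition; the study may instead have used the squared correlation between actual and predicted values. The prices are made up.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up house prices (in thousands) and model predictions.
actual = [200.0, 250.0, 300.0, 350.0]
predicted = [210.0, 240.0, 310.0, 340.0]

error = rmse(actual, predicted)
fit = r_squared(actual, predicted)
```

RMSE is expressed in the units of the target variable, while R-squared is unitless, so the two are usually reported together when comparing partitioning ratios.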

6886 Data Placement in Heterogeneous Storage of Short Videos

Authors: W. Jaipahkdee, C. Srinilta

Abstract:

The overall service performance of an I/O-intensive system depends mainly on the workload on its storage system. In a heterogeneous storage environment, where storage elements from different vendors with different capacity and performance are put together, workload should be distributed according to storage capability. This paper addresses the data placement issue in a short video sharing website. The workload contributed by a video is estimated from the number of views and the lifetime of existing videos in the same category. An experiment was conducted on 42,000 video titles over six weeks. The results showed that the proposed algorithm distributed workload and maintained balance better than round robin and random algorithms.

Keywords: data placement, heterogeneous storage system, YouTube, short videos

Downloads 1465

6885 CompPSA: A Component-Based Pairwise RNA Secondary Structure Alignment Algorithm

Authors: Ghada Badr, Arwa Alturki

Abstract:

The biological function of an RNA molecule depends on its structure. The objective of the alignment is to find the homology between two or more RNA secondary structures. Knowing the common functionalities between two RNA structures allows a better understanding and the discovery of other relationships between them. Besides, identifying non-coding RNAs (those that are not translated into a protein) is a popular application in which RNA structural alignment is the first step. A few methods for RNA structure-to-structure alignment have been developed. Most of these methods are partial structure-to-structure, sequence-to-structure, or structure-to-sequence alignment. Less attention is given in the literature to the use of efficient RNA structure representation, and structure-to-structure alignment methods are lacking. In this paper, we introduce an O(N^2) Component-based Pairwise RNA Structure Alignment (CompPSA) algorithm, where structures are given as a component-based representation and where N is the maximum number of components in the two structures. The proposed algorithm compares the two RNA secondary structures based on their weighted component features rather than on their base-pair details. Extensive experiments are conducted illustrating the efficiency of the CompPSA algorithm when compared to other approaches and on different real and simulated datasets. The CompPSA algorithm shows an accurate similarity measure between components. The algorithm gives the user the flexibility to align the two RNA structures based on their weighted features (position, full length, and/or stem length). Moreover, the algorithm demonstrates scalability and efficiency in time and memory performance.
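
The sketch below illustrates only the general idea of scoring components by weighted features; it is not the CompPSA algorithm, and the alignment step itself is not shown. The scoring formula, function names, and sample values are assumptions made for illustration; only the three feature names (position, full length, stem length) come from the abstract. Scoring every pair of components takes a number of comparisons quadratic in N.

```python
def component_similarity(a, b, weights):
    """Weighted similarity in [0, 1] between two components.

    Assumed scoring rule: each non-negative feature contributes
    1 - |x - y| / max(x, y), scaled by its user-chosen weight.
    """
    total_weight = sum(weights.values())
    score = 0.0
    for feature, weight in weights.items():
        x, y = a[feature], b[feature]
        largest = max(abs(x), abs(y))
        closeness = 1.0 if largest == 0 else 1.0 - abs(x - y) / largest
        score += weight * closeness
    return score / total_weight

def similarity_matrix(structure_1, structure_2, weights):
    """Score every component pair across the two structures."""
    return [[component_similarity(a, b, weights) for b in structure_2]
            for a in structure_1]

# Hypothetical components described by the three features.
weights = {"position": 1.0, "full_length": 1.0, "stem_length": 1.0}
s1 = [{"position": 10, "full_length": 20, "stem_length": 5}]
s2 = [{"position": 10, "full_length": 20, "stem_length": 5},
      {"position": 10, "full_length": 10, "stem_length": 5}]

matrix = similarity_matrix(s1, s2, weights)
```

Setting a weight to zero removes that feature from the comparison, which mirrors the "and/or" choice of features described above.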

Keywords: Alignment, RNA secondary structure, pairwise, component-based, data mining.

Downloads 953

6884 Malicious Route Defending Reliable-Data Transmission Scheme for Multi Path Routing in Wireless Network

Authors: S. Raja Ratna, R. Ravi

Abstract:

Securing confidential data transferred over a wireless network remains a challenging problem. It is paramount to ensure that data are accessible only to legitimate users and not to attackers. One of the most serious threats to an organization is jamming, which disrupts the communication between any two nodes. Therefore, designing an attack-defending scheme without any packet loss in data transmission is an important challenge. In this paper, a Dependence-based Malicious Route Defending (DMRD) scheme is proposed for a multipath routing environment to prevent jamming attacks. The key idea is to defend against the malicious route to ensure perspicuous transmission. The scheme uses a two-layered architecture and operates in two steps. In the first step, possible routes are captured and their agent dependence values are marked using triple agents. In the second step, the dependence values are compared by comparator filtering to detect the malicious route and to identify a reliable route for secured data transmission. Simulation studies show that the proposed scheme identifies malicious routes while attaining lower delay time and route discovery time; it also achieves higher throughput.

Keywords: Attacker, Dependence, Jamming, Malicious.

Downloads 1732

6883 EDULOGIC+ - Knowledge Management through Data Analysis in Education

Authors: Alok Sharma, Dr. Harvinder S. Saini, Raviteja Tiruvury

Abstract:

This paper outlines the application of Knowledge Management (KM) principles in the context of educational institutions. The paper caters to the needs of engineering institutions for imparting quality education by delineating the instruction delivery process in a highly structured, controlled and quantified manner. This is done using a software tool, EDULOGIC+. The central idea is based on the engineering education pattern in Indian universities and institutions. The data, contents and results produced over contiguous years build the necessary ground for managing the related accumulated knowledge. The application of KM is explained using examples of data analysis and knowledge extraction.

Keywords: Education software system, information system, knowledge management.

Downloads 1729

6882 Sparse Unmixing of Hyperspectral Data by Exploiting Joint-Sparsity and Rank-Deficiency

Authors: Fanqiang Kong, Chending Bian

Abstract:

In this work, we exploit two assumed properties of the abundances of the observed signatures (endmembers) in order to reconstruct the abundances from hyperspectral data. Joint-sparsity is the first property of the abundances, which assumes that adjacent pixels can be expressed as different linear combinations of the same materials. The second property is rank-deficiency, where the number of endmembers participating in the hyperspectral data is very small compared with the dimensionality of the spectral library, which means that the abundance matrix of the endmembers is a low-rank matrix. These assumptions lead to an optimization problem for the sparse unmixing model that requires minimizing a combined l_{2,p}-norm and nuclear norm. We propose a variable splitting and augmented Lagrangian algorithm to solve the optimization problem. Experimental evaluation carried out on synthetic and real hyperspectral data shows that the proposed method outperforms the state-of-the-art algorithms with a better spectral unmixing accuracy.
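
One plausible way to write such an objective is sketched below. The symbols are assumptions not given in the abstract: A is the spectral library, Y the observed pixels, X the abundance matrix with rows x^i, and lambda and tau are regularization weights. The least-squares data term and the non-negativity constraint are likewise assumed, so the authors' exact formulation may differ.

```latex
\min_{X \ge 0} \;\; \tfrac{1}{2}\,\lVert AX - Y \rVert_F^2
  \;+\; \lambda\,\lVert X \rVert_{2,p}
  \;+\; \tau\,\lVert X \rVert_{*},
\qquad
\lVert X \rVert_{2,p} = \Bigl(\sum_{i} \lVert x^{i} \rVert_2^{\,p}\Bigr)^{1/p},
\quad
\lVert X \rVert_{*} = \sum_{k} \sigma_k(X)
```

The l_{2,p} term encourages whole rows of X to vanish (joint sparsity across pixels), while the nuclear norm, the sum of singular values, encourages a low-rank abundance matrix.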

Keywords: Hyperspectral unmixing, joint-sparse, low-rank representation, abundance estimation.

Downloads 753

6881 Application of Post-Stack and Pre-Stack Seismic Inversion for Prediction of Hydrocarbon Reservoirs in a Persian Gulf Gas Field

Authors: Nastaran Moosavi, Mohammad Mokhtari

Abstract:

Seismic inversion is a technique that has been in use for years; its main goal is to estimate and model the physical characteristics of rocks and fluids. Generally, it combines seismic and well-log data. Seismic inversion can be carried out through different methods; we have conducted and compared post-stack and pre-stack seismic inversion methods on real data from one of the fields in the Persian Gulf. Pre-stack seismic inversion can transform seismic data into rock physics parameters such as P-impedance, S-impedance and density, whereas post-stack seismic inversion can estimate only P-impedance. These parameters can then be used in reservoir identification. Based on the results of inverting seismic data, a gas reservoir was detected in one of the hydrocarbon fields in the south of Iran (Persian Gulf). Comparing post-stack and pre-stack seismic inversion, it can be concluded that pre-stack seismic inversion provides more reliable and detailed information for the identification and prediction of hydrocarbon reservoirs.
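
For readers unfamiliar with the quantities being inverted, the sketch below shows the standard definitions: P-impedance is density times P-wave velocity, and S-impedance is density times S-wave velocity. The function name and the sample rock values are hypothetical and are not taken from the field data.

```python
def acoustic_impedances(density, vp, vs):
    """Return (P-impedance, S-impedance) from density and velocities.

    With density in kg/m^3 and velocities in m/s, the impedances
    come out in kg/(m^2 s).
    """
    return density * vp, density * vs

# Hypothetical rock sample (illustrative values only).
density = 2400.0   # kg/m^3
vp = 3000.0        # P-wave velocity, m/s
vs = 1500.0        # S-wave velocity, m/s

p_impedance, s_impedance = acoustic_impedances(density, vp, vs)
```

Because post-stack inversion yields only P-impedance, density and S-wave information cannot be separated from it, which is one reason the pre-stack result carries more detail.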

Keywords: Density, P-impedance, S-impedance, post-stack seismic inversion, pre-stack seismic inversion.

Downloads 2202

6880 A Decision Support System for Predicting Hospitalization of Hemodialysis Patients

Authors: Jinn-Yi Yeh, Tai-Hsi Wu

Abstract:

Hemodialysis patients might suffer from unhealthy care behaviors or long-term dialysis treatments. Ultimately, they need to be hospitalized. If the hospitalization rate of a hemodialysis center is high, its quality of service would be low. Therefore, how to decrease the hospitalization rate is a crucial problem for health care. In this study, we combined temporal abstraction with data mining techniques to analyze dialysis patients' biochemical data and develop a decision support system. The mined temporal patterns help clinicians predict the hospitalization of hemodialysis patients and promptly suggest treatments to avoid it.

Keywords: Hemodialysis, Temporal abstract, Data mining, Healthcare quality.

Downloads 1714

6879 A Human Activity Recognition System Based On Sensory Data Related to Object Usage

Authors: M. Abdullah-Al-Wadud

Abstract:

Sensor-based activity recognition systems usually account for which sensors have been activated to perform an activity. The system then combines the conditional probabilities of those sensors to represent different activities and makes its decision on that basis. However, information about the sensors that are not activated may also be of great help in deciding which activity has been performed. This paper proposes an approach in which the sensory data related to both usage and non-usage of objects are used to classify activities. Experimental results show the promising performance of the proposed method.
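
The effect of using non-activated sensors can be sketched with a small Bernoulli-style naive Bayes scorer. This is an illustrative assumption, not the paper's model: the activity names, sensor names, probabilities, and function names are all hypothetical.

```python
import math

def score(prior, sensor_probs, active, use_inactive=True):
    """Log-score of one activity given the set of activated sensors.

    Activated sensors contribute log P(sensor | activity). When
    use_inactive is True, silent sensors contribute
    log(1 - P(sensor | activity)) as well.
    """
    log_p = math.log(prior)
    for sensor, p in sensor_probs.items():
        if sensor in active:
            log_p += math.log(p)
        elif use_inactive:
            log_p += math.log(1.0 - p)
    return log_p

def classify(model, active, use_inactive=True):
    """Return the activity with the highest log-score."""
    return max(model, key=lambda act: score(model[act]["prior"],
                                            model[act]["sensors"],
                                            active, use_inactive))

# Hypothetical model: P(sensor fires | activity).
model = {
    "make_tea": {"prior": 0.5,
                 "sensors": {"kettle": 0.9, "cup": 0.8, "coffee_jar": 0.1}},
    "make_coffee": {"prior": 0.5,
                    "sensors": {"kettle": 0.95, "cup": 0.9, "coffee_jar": 0.9}},
}
observed = {"kettle", "cup"}

usage_only = classify(model, observed, use_inactive=False)
usage_and_non_usage = classify(model, observed, use_inactive=True)
```

In this toy setup the activated sensors alone slightly favor coffee, but the silent coffee jar sensor shifts the decision to tea once non-usage is counted.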

Keywords: Naïve Bayesian-based classification, Activity recognition, sensor data, object-usage model.

Downloads 1804

6878 Dominating Set Algorithm and Trust Evaluation Scheme for Secured Cluster Formation and Data Transferring

Authors: Y. Harold Robinson, M. Rajaram, E. Golden Julie, S. Balaji

Abstract:

This paper describes a proficient way of choosing the cluster head based on a dominating set algorithm in a wireless sensor network (WSN). The algorithm overcomes energy deterioration problems through this cluster head selection process. Clustering algorithms such as LEACH, EEHC and HEED enhance scalability in WSNs. The dominating set algorithm keeps the first node alive longer than the protocols previously used. As the dominating set of cluster heads is directly connected to each node, the energy of the network is saved by eliminating the intermediate nodes in the WSN. Security and trust are pivotal in network messaging. The cluster head is secured with a unique key. A member can connect with the cluster head if and only if it is secured too. The secured trust model provides security for data transmission in the dominating set network with the group key. The concept can be extended by adding a mobile sink for each cluster, or for a number of clusters, to transmit data or messages between cluster heads and the base station. Data security is preferably high and data loss can be prevented. The simulation demonstrates the concept of choosing cluster heads by the dominating set algorithm and trust evaluation using DSTE. The research done is rationalized.
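
As a minimal sketch of what choosing cluster heads from a dominating set means, the block below uses the generic greedy heuristic: repeatedly pick the node that covers the most still-uncovered nodes. This is a textbook heuristic swapped in for illustration, not the paper's algorithm, and it ignores energy and trust; the function name and the sample topology are hypothetical.

```python
def greedy_dominating_set(adjacency):
    """Pick cluster heads so every node is a head or adjacent to one.

    adjacency maps each node to the set of its neighbors.
    Ties are broken in favor of the smallest node id.
    """
    uncovered = set(adjacency)
    heads = []
    while uncovered:
        best = max(sorted(adjacency),
                   key=lambda n: len(({n} | adjacency[n]) & uncovered))
        heads.append(best)
        uncovered -= {best} | adjacency[best]
    return heads

# Hypothetical six-node sensor network.
network = {
    1: {2, 3, 4},
    2: {1},
    3: {1},
    4: {1, 5},
    5: {4, 6},
    6: {5},
}

heads = greedy_dominating_set(network)
```

Every non-head node then sits one hop from a cluster head, which is the property that removes the need for intermediate relay nodes.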

Keywords: Wireless Sensor Networks, LEACH, EEHC, HEED, DSTE.

Downloads 1382