Search results for: data sets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24739

24529 Effects of Handgrip Isometric Training in Blood Pressure of Patients with Peripheral Artery Disease

Authors: Raphael M. Ritti-Dias, Marilia A. Correia, Wagner J. R. Domingues, Aline C. Palmeira, Paulo Longano, Nelson Wolosker, Lauro C. Vianna, Gabriel G. Cucato

Abstract:

Patients with peripheral arterial disease (PAD) have a high prevalence of hypertension, which contributes to a high risk of acute cardiovascular events and cardiovascular mortality. Strategies to reduce the cardiovascular risk of these patients are needed. Meta-analyses have shown that isometric handgrip training reduces clinical blood pressure in normotensive, pre-hypertensive and hypertensive individuals. However, the effect of this training on cardiovascular function indicators in PAD patients remains unknown. Thus, the aim of this study was to analyze the effects of isometric handgrip training on blood pressure in patients with PAD. In this clinical trial, 28 patients were randomly allocated to two groups: isometric handgrip training (HG) and control (CG). The HG performed unilateral handgrip training three days per week (four sets of two minutes at 30% of maximum voluntary contraction, with a four-minute interval between sets). The CG was encouraged to increase their physical activity levels. Blood pressure and heart rate were measured at baseline and after eight weeks. A two-way repeated-measures ANOVA was performed with group (HG and CG) and time (pre- and post-intervention) as factors. After eight weeks of training, there were no significant changes in systolic blood pressure (HG pre 141 ± 24.0 mmHg vs. HG post 142 ± 22.0 mmHg; CG pre 140 ± 22.1 mmHg vs. CG post 146 ± 16.2 mmHg; P=0.18), diastolic blood pressure (HG pre 74 ± 10.4 mmHg vs. HG post 74 ± 11.9 mmHg; CG pre 72 ± 6.9 mmHg vs. CG post 74 ± 8.0 mmHg; P=0.22) or heart rate (HG pre 61 ± 10.5 bpm vs. HG post 62 ± 8.0 bpm; CG pre 64 ± 11.8 bpm vs. CG post 65 ± 13.6 bpm; P=0.81). In conclusion, our preliminary data indicate that isometric handgrip training did not modify blood pressure or heart rate in patients with PAD.
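In a two-group, two-time-point design like this one, the group × time interaction is the term that asks whether HG and CG changed differently. As a minimal sketch, assuming exactly two time points and using made-up values rather than trial data, the interaction F of the two-way repeated-measures ANOVA reduces to a one-way ANOVA on post-minus-pre change scores, which for two groups equals the squared pooled two-sample t statistic. The function names are illustrative, not taken from the study.

```python
from statistics import mean


def change_scores(pre, post):
    """Post-intervention minus pre-intervention value for each patient."""
    return [after - before for before, after in zip(pre, post)]


def pooled_t(a, b):
    """Pooled-variance two-sample t statistic comparing mean change in a and b."""
    ss = sum((x - mean(a)) ** 2 for x in a) + sum((x - mean(b)) ** 2 for x in b)
    sp2 = ss / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / len(a) + 1 / len(b))) ** 0.5


def oneway_f(*groups):
    """One-way ANOVA F; applied to change scores it gives the group x time F."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical systolic pressures in mmHg, NOT the trial's data.
hg = change_scores(pre=[140, 150, 130], post=[138, 145, 131])
cg = change_scores(pre=[140, 135, 145], post=[144, 140, 147])
t = pooled_t(hg, cg)
f = oneway_f(hg, cg)
print(round(t, 3), round(f, 3))
```

Working on change scores makes the hypothesis explicit: a non-significant interaction means the mean change in HG was not distinguishable from the mean change in CG.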

Keywords: blood pressure, exercise, isometric, peripheral artery disease

24528 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi

Abstract:

Big data is one of the emerging technologies: data are collected from various sensors and then used in many fields. Data retrieval is a major issue, since exactly the data that are needed must be extracted. In this paper, a large data set is processed using feature selection. Feature selection helps to choose the data that are actually needed to process and execute the task. The key value is what points to the exact data available in the storage space. Here, the available data are streamed, and R-Center is proposed to achieve this task.

Keywords: big data, key value, feature selection, retrieval, performance

24527 Recommendations Using Online Water Quality Sensors for Chlorinated Drinking Water Monitoring at Drinking Water Distribution Systems Exposed to Glyphosate

Authors: Angela Maria Fasnacht

Abstract:

Systems that detect anomalies caused by the presence of contaminants, known as early detection systems, have become a critical element of water treatment plants and deserve in-depth study for their improvement and adaptation to current requirements. The design of these systems requires detailed real-time analysis and processing of data, so it is necessary to apply statistical methods appropriate to the data generated, such as Spearman's correlation, factor analysis, cross-correlation, and k-fold cross-validation. Statistical analysis allows large data sets to be evaluated in order to model the behavior of variables; in this sense, statistical treatment can be considered a vital step toward developing advanced machine learning models that enable optimized real-time data management in early detection systems for water treatment processes. These techniques facilitate the development of new technologies used in advanced sensors. In this work, these methods were applied to identify possible correlations between the measured parameters and the presence of the contaminant glyphosate in a single-pass system. The effect of the initial glyphosate concentration and of sensor location on the readings of the reported parameters was studied.
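Of the methods listed, Spearman's correlation is the simplest to illustrate: it is Pearson's correlation computed on ranks, so it captures any monotonic relationship between, for example, a probe reading and the glyphosate concentration. A minimal sketch, assuming average ranks for ties; the function names and the values in the usage line are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical: a probe reading that falls as glyphosate dose rises.
print(spearman([0, 1, 2, 5, 10], [0.92, 0.90, 0.85, 0.70, 0.41]))
```

Because only ranks enter the calculation, a strongly nonlinear but monotonic sensor response still yields a coefficient of magnitude 1.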

Keywords: glyphosate, emergent contaminants, machine learning, probes, sensors, predictive

24526 Tracking Filtering Algorithm Based on ConvLSTM

Authors: Ailing Yang, Penghan Song, Aihua Cai

Abstract:

The nonlinear maneuvering target tracking problem is mainly a state estimation problem in which the target motion model is uncertain. Traditional solutions include Kalman filtering within the Bayesian filtering framework and extended Kalman filtering. However, these methods require prior knowledge such as the kinematic model and the distribution of the system state, and they perform poorly when estimating the state of complex dynamic systems for which no such prior is available. To address these problems, a convolutional LSTM target state estimation algorithm based on self-attention memory (SAM), SAConvLSTM-SE, is proposed to learn the historical motion state of the target and the error distribution of the measurement at the current time. The measured track point data of an airborne radar are processed into data sets. After supervised training, the data-driven deep neural network based on SAConvLSTM can directly obtain the target state at the next moment. Through experiments on two different maneuvering targets, we find that the network is more robust and achieves better tracking accuracy than existing tracking methods.
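The abstract contrasts the proposed network with Kalman filtering, which needs an explicit kinematic model. A minimal sketch of such a baseline, assuming a one-dimensional constant-velocity model and illustrative noise parameters `q` and `r` (none of these values come from the paper), shows exactly where that prior knowledge enters: the transition matrix F and the noise covariances.

```python
def kalman_cv(measurements, dt=1.0, q=0.01, r=1.0):
    """Track position and velocity from noisy 1-D position measurements.

    Constant-velocity model: state = [position, velocity],
    F = [[1, dt], [0, 1]], H = [1, 0], process noise Q from white
    acceleration noise of intensity q, measurement noise variance r.
    """
    x = [measurements[0], 0.0]        # start at first reading, unknown velocity
    P = [[r, 0.0], [0.0, 1.0]]        # initial state covariance (assumed)
    Q = [[q * dt**3 / 3, q * dt**2 / 2],
         [q * dt**2 / 2, q * dt]]
    estimates = []
    for z in measurements:
        # Predict: x <- F x, P <- F P F^T + Q
        x = [x[0] + dt * x[1], x[1]]
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + Q[0][0]
        p01 = P[0][1] + dt * P[1][1] + Q[0][1]
        p10 = P[1][0] + dt * P[1][1] + Q[1][0]
        p11 = P[1][1] + Q[1][1]
        # Update with the position measurement z
        s = p00 + r                    # innovation variance
        k0, k1 = p00 / s, p10 / s      # Kalman gain
        innovation = z - x[0]
        x = [x[0] + k0 * innovation, x[1] + k1 * innovation]
        # P <- (I - K H) P
        P = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
        estimates.append((x[0], x[1]))
    return estimates


# Noise-free target moving at 2 units per step (illustrative only).
track = kalman_cv([2.0 * t for t in range(100)])
print(track[-1])
```

If the target maneuvers, the constant-velocity F no longer matches the motion; this mismatch is the limitation that a learned, data-driven estimator is meant to avoid.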

Keywords: maneuvering target, state estimation, Kalman filter, LSTM, self-attention

24525 Spatial Data Science for Data Driven Urban Planning: The Youth Economic Discomfort Index for Rome

Authors: Iacopo Testi, Diego Pajarito, Nicoletta Roberto, Carmen Greco

Abstract:

Today, a considerable segment of the world's population lives in urban areas, and this proportion will increase substantially in the coming decades. Understanding the key urbanization trends likely to unfold over the coming years is therefore crucial to implementing sustainable urban strategies. In parallel, the daily amount of digital data produced will expand at an exponential rate over the coming years. The analysis of various types of data sets and their derived applications has great potential across crucial sectors such as healthcare, housing, transportation, energy, and education. Nevertheless, in city development, architects and urban planners appear to rely mostly on traditional, analogue data collection techniques. This paper investigates the prospects of data science, which appears to be a formidable resource for helping city managers identify strategies to enhance the social, economic, and environmental sustainability of urban areas. Collecting new layers of information would enhance planners' capability to understand urban phenomena such as gentrification, land use definition, mobility, or critical infrastructure issues in more depth. Specifically, the research correlates economic, commercial, demographic, and housing data to define the youth economic discomfort index. This statistical composite index provides insights into the economic disadvantage of citizens aged between 18 and 29 years, and the results clearly show that central urban zones are more disadvantaged than peripheral ones. The city of Rome was selected as the testing ground for the whole investigation. The methodology applies statistical and spatial analysis to construct a composite index that supports informed, data-driven decisions for urban planning.
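The abstract does not specify how its indicators are combined. As a minimal sketch of one common way to build a composite index, assuming min-max normalisation, a flag for indicators where higher values mean less disadvantage, and equal weights (all assumptions, with hypothetical indicator names and values):

```python
def min_max(values, higher_is_worse=True):
    """Rescale to [0, 1] so that 1 always means 'more disadvantaged'."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_worse else [1.0 - s for s in scaled]


def composite_index(indicators, weights=None):
    """indicators: name -> (one value per zone, higher_is_worse flag)."""
    names = list(indicators)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}  # equal weights
    scaled = {name: min_max(*indicators[name]) for name in names}
    n_zones = len(scaled[names[0]])
    return [sum(weights[name] * scaled[name][z] for name in names)
            for z in range(n_zones)]


# Hypothetical indicators for three zones (not the Rome data).
index = composite_index({
    "youth_unemployment_pct": ([10, 20, 30], True),
    "median_income_eur": ([30000, 20000, 10000], False),
})
print(index)  # [0.0, 0.5, 1.0]
```

Aligning every indicator's direction before averaging is the key design step; without it, high income and high unemployment would push the index in the same direction.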

Keywords: data science, spatial analysis, composite index, Rome, urban planning, youth economic discomfort index

24524 Phylogenetic Inferences based on Morphoanatomical Characters in Plectranthus esculentus N. E. Br. (Lamiaceae) from Nigeria

Authors: Otuwose E. Agyeno, Adeniyi A. Jayeola, Bashir A. Ajala

Abstract:

P. esculentus is indigenous to Nigeria, yet no wild relative has been encountered or reported. This has made it difficult to establish proper lineages between the varieties and landraces under cultivation. The present work is the first to determine the apomorphy of 135 morphoanatomical characters in organs of 46 accessions drawn from 23 populations of this species based on dicta. The character states were coded in accession × character-state matrices; only 83 characters were informative, and these were used for neighbour-joining clustering based on Euclidean distances and for heuristic search in parsimony analysis using PAST ver. 3.15 software. Compatibility and evolutionary trends between accessions were then explored from the values and diagrams produced. The low consistency indices (CI) recorded support monophyly and low homoplasy in this taxon. Agglomerative schedules based on character type and source data sets divided the accessions into three main clades, each made up of complexes of accessions. Solenostemon rotundifolius (Poir.) J.K. Morton was used as the outgroup (OG), and it fell within the largest clades except when the characters were combined into a single data set. The OG showed better compatibility with accessions of populations of landrace Isci and varieties Riyum and Long’at. Otherwise, its aerial parts are more consistent with those of accessions of variety Bebot. The highly polytomous clades produced by the anatomical data set may indicate how stable such characters are in this species. Strict consensus trees with more than 60 nodes showed that the basal nodes were strongly supported by 3 to 17 characters across the data sets, suggesting that populations of this species are more alike. The OG was clearly the first diverging lineage and was closely related to accessions of landrace Gwe and variety Bebot morphologically, but different from them anatomically.
It was also distantly related to landrace Fina and variety Long’at in terms of root, stem and leaf structural attributes. There were at least five other clades, each comprising complexes of accessions from different localities and terrains within the study area. A spherical stem in cross section, the size of the vascular bundles at the stem corners, and alternate and whorled phyllotaxy are attributes that may have facilitated each other's evolution in all accessions of landrace Gwe; they may be innovations, since such states are not characteristic of the larger Lamiaceae, and of Plectranthus L’Hér. in particular. In conclusion, this study has provided valuable information about infraspecific diversity in this taxon. It supports recognition of the varietal statuses accorded to populations of P. esculentus, as well as the hypothesis that the wild gene might have been distributed on the Jos Plateau. However, molecular characterisation of accessions of populations of this species would resolve this problem better.
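The analysis above keeps only informative characters and clusters accessions on Euclidean distances. A minimal sketch of those two preparatory steps, assuming integer-coded character states, no missing data, and the usual parsimony definition of an informative character (at least two states, each shared by at least two accessions); the matrix is a hypothetical toy example, not the study's data, and the neighbour-joining step itself is not shown.

```python
from collections import Counter
from math import sqrt


def informative_columns(matrix):
    """Indices of characters with >= 2 states that each occur in >= 2 accessions."""
    keep = []
    for j in range(len(matrix[0])):
        counts = Counter(row[j] for row in matrix)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            keep.append(j)
    return keep


def euclidean_matrix(matrix, columns):
    """Symmetric accession-by-accession Euclidean distance matrix."""
    n = len(matrix)
    d = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            dist = sqrt(sum((matrix[a][j] - matrix[b][j]) ** 2 for j in columns))
            d[a][b] = d[b][a] = dist
    return d


# Hypothetical accession x character-state matrix (rows = accessions).
m = [[0, 1, 0, 2],
     [0, 1, 1, 2],
     [1, 0, 1, 2],
     [1, 0, 0, 0]]
cols = informative_columns(m)      # the last character is uninformative
dist = euclidean_matrix(m, cols)
print(cols, dist[0])
```

The resulting distance matrix is the input that a neighbour-joining routine, such as the one in PAST, would then cluster.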

Keywords: clustering, lineage, morphoanatomical characters, Nigeria, phylogenetics, Plectranthus esculentus, population

24523 Feature Selection Approach for the Classification of Hydraulic Leakages in Hydraulic Final Inspection using Machine Learning

Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter

Abstract:

Manufacturing companies are facing global competition and enormous cost pressure. Machine learning applications can help reduce production costs and create added value. Predictive quality secures product quality through data-supported predictions, using machine learning models as a basis for decisions on test results. Furthermore, machine learning methods can process large amounts of data, deal with unfavourable row-column ratios, detect dependencies between the covariates and the given target, and assess the multidimensional influence of all input variables on the target. Real production data are often subject to highly fluctuating boundary conditions and unbalanced data sets. Changes in production data manifest themselves as trends, systematic shifts, and seasonal effects. Machine learning applications therefore require intensive preprocessing and feature selection. Data preprocessing includes rule-based data cleaning, the application of dimensionality reduction techniques, and the identification of comparable data subsets. Within the real data set of Bosch hydraulic valves used here, time periods with comparable production conditions can be identified by applying a concept drift method. Furthermore, a classification model is developed to evaluate feature importance in different subsets within the identified time periods. By selecting comparable and stable features, the number of features used can be significantly reduced without a strong decrease in predictive power. The use of cross-process production data along the value chain of hydraulic valves is a promising approach to predicting the quality characteristics of workpieces. In this research, an AdaBoost classifier is used to predict the leakage of hydraulic valves based on geometric gauge blocks from machining, mating data from assembly, and hydraulic measurement data from end-of-line testing.
In addition, the most suitable methods are selected and accurate quality predictions are achieved.
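To make the classifier concrete, here is a minimal sketch of discrete AdaBoost with one-level decision stumps. It assumes labels +1 (leak) and -1 (no leak), hypothetical features, and an alpha-weighted count of the features used by the stumps as a simple importance proxy; this proxy is an assumption, not the importance measure used in the paper or in any particular library.

```python
from math import exp, log


def fit_stump(X, y, w):
    """Best one-feature threshold rule under sample weights w.

    Returns (weighted_error, feature, threshold, polarity); the rule predicts
    `polarity` when x[feature] <= threshold and -polarity otherwise.
    """
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, label, wi in zip(X, y, w)
                          if (polarity if row[f] <= thr else -polarity) != label)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def stump_predict(stump, row):
    _, f, thr, polarity = stump
    return polarity if row[f] <= thr else -polarity


def adaboost(X, y, rounds=20):
    """Discrete AdaBoost; labels must be +1 / -1."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        if stump[0] >= 0.5:
            break                          # no better than chance
        err = max(stump[0], 1e-10)         # guard against log of infinity
        alpha = 0.5 * log((1 - err) / err)
        model.append((alpha, stump))
        if stump[0] == 0:
            break                          # perfect split; later rounds would repeat it
        # Up-weight misclassified samples, down-weight correct ones.
        w = [wi * exp(-alpha * label * stump_predict(stump, row))
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1


def feature_importance(model, n_features):
    """Share of total alpha per feature (a simple proxy measure)."""
    imp = [0.0] * n_features
    for alpha, stump in model:
        imp[stump[1]] += alpha
    total = sum(imp) or 1.0
    return [v / total for v in imp]


# Hypothetical features: [pressure_drop, bore_deviation]; +1 = leakage.
X = [[0.1, 5], [0.2, 3], [0.3, 8], [0.8, 4], [0.9, 7], [1.0, 2]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y)
print(feature_importance(model, 2))
```

In this toy data only the first feature separates the classes, so all of the importance goes to it; with real, drifting production data the ranking would be recomputed per comparable time period, as the abstract describes.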

Keywords: classification, machine learning, predictive quality, feature selection

24522 Remaining Useful Life Estimation of Bearings Based on Nonlinear Dimensional Reduction Combined with Timing Signals

Authors: Zhongmin Wang, Wudong Fan, Hengshan Zhang, Yimin Zhou

Abstract:

In data-driven prognostic methods, the accuracy of remaining useful life estimation for bearings depends mainly on the performance of health indicators, which are usually fused from statistical features extracted from vibration signals. However, existing health indicators have two drawbacks: (1) statistical features with different ranges contribute differently to the construction of health indicators, and expert knowledge is required to extract the features; (2) when convolutional neural networks are used to process time-frequency features of signals, the time-series nature of the signals is not considered. To overcome these drawbacks, this study proposes a method that combines a convolutional neural network with a gated recurrent unit to extract time-frequency image features. The extracted features are used to construct a health indicator and predict the remaining useful life of bearings. First, the original signals are converted into time-frequency images using the continuous wavelet transform to form the original feature sets. Second, the convolutional and pooling layers of the convolutional neural network select the most sensitive features of the time-frequency images from the original feature sets. Finally, these selected features are fed into the gated recurrent unit to construct the health indicator. The results show that the proposed method performs better than related studies that used the same bearing dataset provided by PRONOSTIA.
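For reference, the hand-crafted statistical features that conventional health indicators fuse, and that the proposed CNN-GRU pipeline aims to replace, can be sketched as follows. The specific choice of RMS, peak, crest factor and kurtosis is an assumption for illustration and is not taken from the paper.

```python
from math import sqrt


def vibration_features(signal):
    """Common time-domain condition-monitoring features of one signal window."""
    n = len(signal)
    mean = sum(signal) / n
    rms = sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    var = sum((x - mean) ** 2 for x in signal) / n
    # Kurtosis (non-excess): 3 for Gaussian noise, larger for impulsive faults.
    kurtosis = (sum((x - mean) ** 4 for x in signal) / n) / var**2 if var > 0 else 0.0
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms if rms > 0 else 0.0,
        "kurtosis": kurtosis,
    }


print(vibration_features([1, -1, 1, -1]))
print(vibration_features([0, 0, 0, 2]))  # one impulse raises crest factor and kurtosis
```

Because these features have very different ranges, combining them into one indicator requires scaling or weighting choices, which is the first drawback the abstract lists.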

Keywords: continuous wavelet transform, convolutional neural network, gated recurrent unit, health indicators, remaining useful life

24521 Formulating a Flexible-Spread Fuzzy Regression Model Based on Dissemblance Index

Authors: Shih-Pin Chen, Shih-Syuan You

Abstract:

This study proposes a regression model with flexible spreads for fuzzy input-output data to cope with situations in which existing measures cannot reflect the actual estimation error. The main idea is that a dissemblance index (DI) is carefully identified and defined to measure the actual estimation error precisely. Moreover, the graded mean integration (GMI) representation is adopted to determine more representative numeric regression coefficients. To compare the performance of the proposed model comprehensively with other models, three different criteria are adopted. The results from commonly used numerical test examples and an application to Taiwan's business monitoring indicator illustrate that the proposed dissemblance index method not only produces valid fuzzy regression models for fuzzy input-output data, but also performs satisfactorily and stably in terms of total estimation error under these three criteria.
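Both building blocks named above can be sketched for triangular fuzzy numbers (a, b, c). The GMI formula (a + 4b + c)/6 is the standard graded mean integration representation for triangular numbers. The distance shown is the classic interval dissemblance index of Kaufmann and Gupta, averaged over alpha-cuts with a midpoint rule; the paper defines its own DI variant, so this is an illustrative stand-in, and the reference interval `ref` is an assumed input that must contain both numbers.

```python
def gmi(tfn):
    """Graded mean integration representation of a triangular fuzzy number."""
    a, b, c = tfn
    return (a + 4 * b + c) / 6


def interval_di(x, y, ref):
    """Kaufmann-Gupta dissemblance index of intervals x and y,
    normalised by a reference interval that contains both."""
    (x1, x2), (y1, y2), (r1, r2) = x, y, ref
    return (abs(x1 - y1) + abs(x2 - y2)) / (2 * (r2 - r1))


def alpha_cut(tfn, alpha):
    """Interval of values with membership >= alpha."""
    a, b, c = tfn
    return (a + alpha * (b - a), c - alpha * (c - b))


def fuzzy_di(A, B, ref, steps=100):
    """Average interval DI over alpha-cuts in (0, 1] (midpoint rule)."""
    total = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps
        total += interval_di(alpha_cut(A, alpha), alpha_cut(B, alpha), ref)
    return total / steps


observed, estimated = (0, 1, 2), (1, 2, 3)   # hypothetical fuzzy values
print(gmi(observed), fuzzy_di(observed, estimated, ref=(0, 4)))
```

Unlike a comparison of centres alone, the DI accounts for differences along the whole support, so two estimates with the same centre but different spreads receive different errors.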

Keywords: dissemblance index, forecasting, fuzzy sets, linear regression

24520 Piql Preservation Services - A Holistic Approach to Digital Long-Term Preservation

Authors: Alexander Rych

Abstract:

Piql Preservation Services (“Piql”) is a turnkey solution designed for secure, migration-free long-term preservation of digital data. Piql sets an open standard for long-term preservation for the future. It consists of the equipment and processes needed for writing and retrieving digital data. Exponentially growing amounts of data demand logistically effective and cost-effective processes. Digital storage media (hard disks, magnetic tape) have a limited lifetime. Repeated data migration to overcome the rapid obsolescence of hardware and software carries an increased risk of data loss, data corruption or even manipulation, and adds significant recurring costs for hardware and software investments. Piql stores any kind of data, in digital as well as analogue form, securely for 500 years. The medium that provides this is a film reel. Using photosensitive film on a polyester base, a very stable material known for its immutability over hundreds of years, secure and cost-effective long-term preservation can be provided. The film reel itself is stored in packaging capable of protecting the optical storage medium. These components have undergone extensive testing to ensure longevity of up to 500 years. In addition to its durability, film is a true WORM (write once, read many) medium and is therefore resistant to editing or manipulation. Being able to store any form of data on film makes Piql a superior solution for long-term preservation. Paper documents, images, video or audio sequences: all of these file formats and documents can be preserved in their native file structure. To restore the encoded digital data in the future, only a film scanner, a digital camera or another appropriate optical reading device will be needed. Every film reel includes an index section describing the data saved on the film. It also contains a content section carrying metadata, enabling future users to rebuild software to read and decode the digital information.

Keywords: digital data, long-term preservation, migration-free, photosensitive film

24519 Using Classifiers to Predict Student Outcome at Higher Institute of Telecommunication

Authors: Fuad M. Alkoot

Abstract:

We aim to highlight the benefits of classifier systems, especially in supporting educational management decisions. The paper applies classifiers in an educational setting where an outcome is predicted from input parameters that represent various conditions at the institute. We present a classifier system designed using a limited training set containing data from only one semester. The resulting system reproduces previously known outcomes accurately. It is also tested on new input parameters representing variations of the input conditions, to examine its prediction of the possible outcome value. Given the supervised expectation of the outcome for the new input, we find that the system predicts the correct outcome. Experiments were conducted on one semester of data from only two departments, Switching and Mathematics. Future work on other departments, with larger training sets and wider input variations, is expected to show additional benefits of classifier systems in supporting management decisions at an educational institute.

Keywords: machine learning, pattern recognition, classifier design, educational management, outcome estimation

24518 High Resolution Sandstone Connectivity Modelling: Implications for Outcrop Geological and Its Analog Studies

Authors: Numair Ahmed Siddiqui, Abdul Hadi bin Abd Rahman, Chow Weng Sum, Wan Ismail Wan Yousif, Asif Zameer, Joel Ben-Awal

Abstract:

Advances in data capture from outcrop studies have made it possible to acquire high-resolution digital data, offering improved and economical reservoir modelling methods. Terrestrial laser scanning using LiDAR (light detection and ranging) provides a new method for building outcrop-based reservoir models, which supply crucial information for understanding heterogeneities in sandstone facies through high-resolution images and data sets. This study presents a detailed application of an outcrop-based sandstone facies connectivity model, combining information from traditional fieldwork with detailed digital point-cloud data from LiDAR to develop an intermediate, small-scale reservoir sandstone facies model of the Miocene Sandakan Formation, Sabah, East Malaysia. The software RiSCAN PRO (v1.8.0) was used for digital data collection and post-processing, with an accuracy of 0.01 m and a point acquisition rate of up to 10,000 points per second. We provide an accurate and descriptive workflow for triangulating point clouds of different sets of sandstone facies with well-marked top and bottom boundaries, in conjunction with field sedimentology. This yields a highly accurate qualitative sandstone facies connectivity model, which is difficult to obtain from subsurface datasets (i.e., seismic and well data). Finally, by applying this workflow, we can build an outcrop-based static connectivity model that can serve as an analogue for subsurface reservoir studies.

Keywords: LiDAR, outcrop, high resolution, sandstone facies, connectivity model

24517 The Language Use of Middle Eastern Freedom Activists' Speeches: A Gender Perspective

Authors: Sulistyaningtyas

Abstract:

Examining the speeches of Middle Eastern freedom activists from a gender perspective is considered noteworthy because society in the Middle East is patriarchal. This research examines the language use of Middle Eastern freedom activists' speeches from a gender perspective. The data sources are speech videos of male and female Middle Eastern freedom activists. The data are analyzed using theories of language style from a gender perspective and of the language of speeches. The results reveal sets of spoken language differences between male and female speakers. In the language of their speeches, both male and female speakers use metaphor, euphemism, the ‘rule of three’, parallelism, and pronouns at varying frequencies that do not differ systematically by gender. Moreover, it cannot be concluded that one gender has more potential than the other to influence the audience when delivering a speech. Other factors, particularly non-verbal ones, affect how a speech influences the audience.

Keywords: gender perspective, language use, Middle Eastern freedom activists, speech

24516 Analysis of Land Use, Land Cover Changes in Damaturu, Nigeria: Using Satellite Images

Authors: Isa Muhammad Zumo, Musa Lawan

Abstract:

This study analyzes land use/land cover changes in the Damaturu metropolis from 1986 to 2005. Landsat TM images from 1986, 1999, and 2005 were used. Built-up land, agricultural land, water bodies, and other land were created as themes within the ILWIS 3.4 software. The images were displayed as false colour composites (FCC) for better visualization and identification of the themes created. Training sample sets were collected based on ground truth data during the field checks. Statistical data were then extracted from the classified sample set. The area in hectares of each theme was calculated for each year, and the results for each land use/land cover type were compared across the study years. The results show that built-up areas increased considerably, from 37.71 hectares in 1986 to 1062.72 hectares in 2005, an annual increase rate of approximately 0.34%. The results also reveal a decrease of 5829.66 hectares in other land (vacant land) from 1986 to 2005.
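As a worked check of the built-up change, the sketch below computes the absolute change and two relative annual rates from the reported 1986 and 2005 areas. The relative rates here are expressed against the 1986 built-up area, which is an assumption about the base; the abstract does not state the base of its 0.34% figure, so these values are not expected to reproduce it.

```python
def change_summary(area_start, area_end, year_start, year_end):
    """Absolute and relative annual change of a land cover class (hectares)."""
    years = year_end - year_start
    change = area_end - area_start
    return {
        "change_ha": change,
        "mean_annual_change_ha": change / years,
        # Linear rate relative to the starting area.
        "simple_annual_rate_pct": change / area_start / years * 100,
        # Constant yearly growth rate that links the two areas.
        "compound_annual_rate_pct": ((area_end / area_start) ** (1 / years) - 1) * 100,
    }


built_up = change_summary(37.71, 1062.72, 1986, 2005)
for key, value in built_up.items():
    print(f"{key}: {value:.2f}")
```

Over the 19 years the built-up area grows by about 1025 ha, roughly 54 ha per year on average.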

Keywords: land use, changes, analysis, environmental pollution

24515 Optimizing Energy Efficiency: Leveraging Big Data Analytics and AWS Services for Buildings and Industries

Authors: Gaurav Kumar Sinha

Abstract:

In an era marked by increasing concerns about energy sustainability, this research endeavors to address the pressing challenge of energy consumption in buildings and industries. This study delves into the transformative potential of AWS services in optimizing energy efficiency. The research is founded on the recognition that effective management of energy consumption is imperative for both environmental conservation and economic viability. Buildings and industries account for a substantial portion of global energy use, making it crucial to develop advanced techniques for analysis and reduction. This study sets out to explore the integration of AWS services with big data analytics to provide innovative solutions for energy consumption analysis. Leveraging AWS's cloud computing capabilities, scalable infrastructure, and data analytics tools, the research aims to develop efficient methods for collecting, processing, and analyzing energy data from diverse sources. The core focus is on creating predictive models and real-time monitoring systems that enable proactive energy management. By harnessing AWS's machine learning and data analytics capabilities, the research seeks to identify patterns, anomalies, and optimization opportunities within energy consumption data. Furthermore, this study aims to propose actionable recommendations for reducing energy consumption in buildings and industries. By combining AWS services with metrics-driven insights, the research strives to facilitate the implementation of energy-efficient practices, ultimately leading to reduced carbon emissions and cost savings. The integration of AWS services not only enhances the analytical capabilities but also offers scalable solutions that can be customized for different building and industrial contexts. The research also recognizes the potential for AWS-powered solutions to promote sustainable practices and support environmental stewardship.

Keywords: energy consumption analysis, big data analytics, AWS services, energy efficiency

24514 Bioinformatics High Performance Computation and Big Data

Authors: Javed Mohammed

Abstract:

Right now, biomedical infrastructure lags well behind the curve. Our healthcare system is dispersed and disjointed; medical records are a bit of a mess; and we do not yet have the capacity to store and process the enormous amounts of data coming our way from widespread whole-genome sequencing. There are also privacy issues. Despite these infrastructure challenges, some researchers are already plunging into biomedical Big Data in the hope of extracting new and actionable knowledge. They are delving into molecular-level data to discover biomarkers that help classify patients based on their response to existing treatments, and pushing their results out to physicians in novel and creative ways. Computer scientists and biomedical researchers are able to transform data into models and simulations that will enable scientists, for the first time, to gain a profound understanding of the deepest biological functions. Solving biological problems may require high-performance computing (HPC), either because a particular problem requires massive parallel computation or because of algorithmic complexity that may range from difficult to intractable. Many problems involve seemingly well-behaved polynomial-time algorithms (such as all-to-all comparisons) but have massive computational requirements because of the large data sets that must be analyzed. High-throughput techniques for DNA sequencing and gene expression analysis have led to exponential growth in the amount of publicly available genomic data. With the increased availability of genomic data, traditional database approaches are no longer sufficient for rapidly performing life science queries that involve the fusion of data types. Computing systems are now powerful enough for researchers to consider modeling the folding of a protein or even simulating an entire human body. This research paper emphasizes computational biology's growing need for high-performance computing and Big Data.
It illustrates the indispensability of HPC in meeting the scientific and engineering challenges of the twenty-first century, and how protein folding (the structure and function of proteins) and phylogeny reconstruction (the evolutionary history of a group of genes) can use HPC that provides sufficient capability for evaluating or solving more limited but meaningful instances. The article also points to solutions to optimization problems and to the benefits of Big Data for computational biology. It describes the current state of the art and the future generation of HPC with Big Data in biology.
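The point about all-to-all comparisons can be made concrete. The number of unordered pairs grows as n(n-1)/2, so each step is cheap but the total is not; the work is also trivially divisible among workers. A minimal sketch, assuming equal-length sequences compared by Hamming distance and a hypothetical round-robin split of rows across workers (not a scheme described in the article):

```python
def pair_count(n):
    """Number of unordered all-to-all comparisons among n items."""
    return n * (n - 1) // 2


def hamming(a, b):
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def all_to_all(seqs, worker_id=0, n_workers=1):
    """Comparisons handled by one worker: rows worker_id, worker_id + n_workers, ..."""
    results = []
    for i in range(worker_id, len(seqs), n_workers):
        for j in range(i + 1, len(seqs)):
            results.append((i, j, hamming(seqs[i], seqs[j])))
    return results


for n in (10**3, 10**6):
    print(n, pair_count(n))
```

Multiplying the number of sequences by 1,000 multiplies the comparisons by about a million, which is why a polynomial-time algorithm can still demand HPC resources; interleaving rows across workers spreads the uneven row lengths roughly evenly.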

Keywords: high performance, big data, parallel computation, molecular data, computational biology

24513 Isotype and Logical Positivism: A Critical Understanding through Intersemiotic Translation

Authors: Satya Girish Goparaju, Sushmita Pareek

Abstract:

This paper examines two sets of pictograms published in Neurath's books Basic by Isotype and International Picture Language in order to investigate why pictorial language became an end in itself despite its potential relevance, especially in the 21st-century digital age of heightened interlingual engagement. ISOTYPE was developed by Otto Neurath in the late 1920s as an ‘international (pictorial) language’. It derived from the philosophy of logical positivism of the Vienna Circle, which held that language can be reduced to sets of direct experiences expressed as bare symbols, devoid of emotive and expressive functions. In his book International Picture Language, Neurath noted that every language is less clear-cut in one way or another, and this was his justification for a pictorial language. However, Isotype, as an ambitious practical version of logical positivism, distanced itself from the semiotic theories of language, and his pictograms were therefore defined as an independent set of signs rather than as signs forming part of a language. This paper investigates intersemiotic translation in the form of Isotypes and traces the effects of logical positivism on Neurath's concept of Isotype as the ‘international language’.

Keywords: intersemiotic translation, isotype, logical positivism, Otto Neurath, translation studies

24512 Molecular Characterization of Polyploid Bamboo (Dendrocalamus hamiltonii) Using Microsatellite Markers

Authors: Rajendra K. Meena, Maneesh S. Bhandari, Santan Barthwal, Harish S. Ginwal

Abstract:

Microsatellite markers are the most valuable tools for characterizing plant genetic resources and for population genetic analysis. Because they are codominant, allelic markers, their use in polyploid species has remained doubtful; in such cases, microsatellites are usually analyzed by treating them as dominant markers. The current study shows that, despite losing the advantage of codominance, microsatellite markers are still a powerful tool for genotyping polyploid species because they provide a large number of reproducible alleles per locus. This was studied by genotyping 19 subpopulations of Dendrocalamus hamiltonii (a hexaploid bamboo species) with 17 polymorphic simple sequence repeat (SSR) primer pairs. Of these, ten primers gave the typical microsatellite banding pattern expected in diploid species, but the remaining seven gave an unusual pattern, i.e., more than two bands per locus per genotype. Such genotyping data are generally analyzed by treating the markers as dominant. In the current study, the data were analyzed both ways, as dominant and as codominant markers. All 17 primers were first scored as nonallelic data and analyzed; then the ten primers giving standard banding patterns were analyzed as allelic data, and the results were compared. UPGMA clustering and genetic structure analysis showed that the results obtained with both data sets are very similar, with slight variation; therefore, SSR markers can be used to characterize polyploid species by treating them as dominant markers. The study widens the scope of SSR marker applications and should benefit researchers working with polyploid species.
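Scoring SSR bands as dominant markers turns each genotype into a presence/absence vector over all observed bands, and UPGMA then clusters genotypes on pairwise distances. The sketch below assumes Jaccard distance and invented band profiles; the abstract does not state which similarity coefficient or software was used, so treat this as an illustration of the workflow rather than the authors' analysis.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - Jaccard similarity between two 0/1 band-presence vectors."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - shared / either


def upgma(names, profiles):
    """Cluster band profiles with UPGMA (size-weighted average linkage).

    Returns the tree as nested tuples of sample names.
    """
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (subtree, size)
    dist = {frozenset((i, j)): jaccard_distance(profiles[i], profiles[j])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of current clusters
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del dist[pair]
        # Distance to the merged cluster = size-weighted mean of the old distances.
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree
```

For codominant (allelic) scoring, only the distance function would change; the UPGMA step stays the same, which is what makes the two analyses directly comparable.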

Keywords: microsatellite markers, Dendrocalamus hamiltonii, dominant and codominant, polyploids

24511 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained considerable momentum in recent years. Big Data feeds into the field of Learning Analytics (LA), which may allow academic institutions to better understand learners’ needs and address them proactively. Hence, it is important to understand Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of its applications in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitations before jumping on the Big Data bandwagon.

Keywords: big data, learning analytics, analytics, big data in education, Hadoop

24510 Lowering Error Floors by Concatenation of Low-Density Parity-Check and Array Code

Authors: Cinna Soltanpur, Mohammad Ghamari, Behzad Momahed Heravi, Fatemeh Zare

Abstract:

Low-density parity-check (LDPC) codes have been shown to deliver capacity-approaching performance; however, problematic graphical structures (e.g., trapping sets) in the Tanner graphs of some LDPC codes can cause high error floors in bit-error-ratio (BER) performance under the conventional sum-product algorithm (SPA). This paper presents a serial concatenation scheme that avoids the trapping sets and lowers the error floors of LDPC codes. The outer code in the proposed concatenation is the LDPC code, and the inner code is a high-rate array code. The approach applies an interactive hybrid process between BCJR decoding of the array code and SPA decoding of the LDPC code, together with bit-pinning and bit-flipping techniques. The (2640, 1320) Margulis code was used for simulation, and the results show that the proposed concatenation and decoding scheme can considerably improve error floor performance with minimal rate loss.
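The abstract mentions bit-flipping alongside SPA decoding. As a minimal illustration of the basic idea only (not the authors' hybrid BCJR/SPA scheme with bit-pinning), here is a Gallager-style hard-decision bit-flipping decoder. The example parity-check matrix is the small (7,4) Hamming code, chosen for readability; it is neither an LDPC code nor the Margulis code.

```python
def syndrome(H, word):
    """Parity checks of `word` over GF(2); all zeros means a valid codeword."""
    return [sum(h & b for h, b in zip(row, word)) % 2 for row in H]


def bit_flip_decode(H, received, max_iter=20):
    """Gallager-style hard-decision bit flipping.

    Each round flips every bit involved in the largest number of
    unsatisfied parity checks. Returns (word, converged).
    """
    word = list(received)
    for _ in range(max_iter):
        s = syndrome(H, word)
        if not any(s):
            return word, True
        # For each bit, count the unsatisfied checks it participates in.
        votes = [sum(s[r] for r in range(len(H)) if H[r][c]) for c in range(len(word))]
        worst = max(votes)
        word = [bit ^ 1 if v == worst else bit for bit, v in zip(word, votes)]
    return word, not any(syndrome(H, word))
```

Trapping sets are exactly the configurations where this kind of local voting (and SPA message passing) gets stuck, which is why the paper adds an outer structure rather than relying on the LDPC decoder alone.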

Keywords: concatenated coding, low-density parity-check codes, array code, error floors

24509 The Classification Performance in Parametric and Nonparametric Discriminant Analysis for a Class-Unbalanced Data of Diabetes Risk Groups

Authors: Lily Ingsrisawang, Tasanee Nacharoen

Abstract:

Introduction: Unbalanced data sets are common in real-world applications. Because of unequal class distributions, many studies have found that existing classifiers tend to be biased towards the majority class. The k-nearest-neighbors nonparametric discriminant analysis is one method that has been proposed for classifying unbalanced classes with good performance. Hence, we were interested in using discriminant analysis methods to investigate misclassification error rates for class-imbalanced data on three diabetes risk groups. Objective: The purpose of this study was to compare the classification performance of parametric and nonparametric discriminant analysis in a three-class classification application using class-imbalanced data on diabetes risk groups. Methods: Data from a health project covering 599 staff members of a government hospital in Bangkok were obtained for the classification problem. The staff members were diagnosed into one of three diabetes risk groups: non-risk (90%), risk (5%), and diabetic (5%). The original data, with the variables diabetes risk group, age, gender, cholesterol, and BMI, were analyzed and bootstrapped to produce 50 and 100 samples of 599 observations each for additional estimation of the misclassification error rate. Each data set was examined for departures from multivariate normality and for equality of the covariance matrices of the three risk groups. Both the original data and the bootstrap samples showed non-normality and unequal covariance matrices. The parametric linear discriminant function, the quadratic discriminant function, and the nonparametric k-nearest-neighbors discriminant function were fitted over the 50 and 100 bootstrap samples and applied to the original data.
In finding the optimal classification rule, the prior probabilities were set to equal proportions (0.33:0.33:0.33) and to three choices of unequal proportions: (0.90:0.05:0.05), (0.80:0.10:0.10), and (0.70:0.15:0.15). Results: The results from the 50 and 100 bootstrap samples indicated that the k-nearest-neighbors approach with k = 3 or k = 4 and prior probabilities {non-risk:risk:diabetic} of {0.90:0.05:0.05} or {0.80:0.10:0.10} gave the smallest misclassification error rate. Conclusion: The k-nearest-neighbors approach is suggested for classifying class-imbalanced three-class data on diabetes risk groups.
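One common way to combine k-nearest-neighbors discriminant analysis with prior probabilities is to estimate each class density as proportional to k_t / n_t (the number of the k nearest neighbors that belong to class t, divided by the size of class t) and to assign the class that maximizes prior times density. The sketch below assumes that rule, Euclidean distance on unstandardized features, and a simple bootstrap loop that builds the rule on each resample and scores the original data; the study's exact software and settings are not given in the abstract.

```python
import math
import random
from collections import Counter


def knn_classify(train, x, k, priors):
    """k-NN discriminant rule with priors.

    train: list of (features, label); priors: dict label -> prior probability.
    Class density at x is taken as proportional to k_t / n_t.
    """
    n_per_class = Counter(label for _, label in train)
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    k_per_class = Counter(label for _, label in nearest)
    scores = {t: priors[t] * k_per_class[t] / n_per_class[t] for t in n_per_class}
    return max(scores, key=scores.get)


def bootstrap_error(data, k, priors, n_boot=50, seed=0):
    """Mean error on the original data of rules built on bootstrap resamples."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        wrong = sum(knn_classify(sample, x, k, priors) != y for x, y in data)
        errors.append(wrong / len(data))
    return sum(errors) / len(errors)
```

Raising the prior of the majority class shifts borderline points towards it, which is the lever the study varies when comparing (0.33:0.33:0.33) with (0.90:0.05:0.05) and the other settings.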

Keywords: error rate, bootstrap, diabetes risk groups, k-nearest neighbors

24508 Fusion of MOLA-based DEMs and HiRISE Images for Large-Scale Mars Mapping

Authors: Ahmed F. Elaksher, Islam Omar

Abstract:

In this project, we used MOLA-based DEMs to orthorectify HiRISE optical images. The MOLA data were interpolated using kriging. Corresponding tie points were then digitized from both datasets and used to co-register them with GIS analysis tools. Different transformation models, including the affine and projective models, were tested with different sets and distributions of tie points. Additionally, we evaluated the use of MOLA elevations in co-registering the MOLA and HiRISE datasets. The planimetric RMSEs achieved for each model are reported. The results favor the use of 3D-2D transformation models.
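A least-squares 2D affine fit between tie points, followed by the planimetric RMSE used to compare models, can be sketched as below. The function names and toy coordinates are hypothetical; the study also tested projective and 3D-2D models and used GIS tools rather than custom code.

```python
import math


def solve(A, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares affine model: X = a0 + a1*x + a2*y, Y = b0 + b1*x + b2*y."""
    rows = [[1.0, x, y] for x, y in src]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    a = solve(AtA, [sum(r[i] * X for r, (X, _) in zip(rows, dst)) for i in range(3)])
    b = solve(AtA, [sum(r[i] * Y for r, (_, Y) in zip(rows, dst)) for i in range(3)])
    return a, b


def apply_affine(params, pt):
    a, b = params
    x, y = pt
    return (a[0] + a[1] * x + a[2] * y, b[0] + b[1] * x + b[2] * y)


def planimetric_rmse(params, src, dst):
    """Root-mean-square horizontal residual over the tie points."""
    sq = [math.dist(apply_affine(params, s), d) ** 2 for s, d in zip(src, dst)]
    return math.sqrt(sum(sq) / len(sq))
```

In practice the RMSE would be reported on independent check points rather than on the tie points used for the fit, since the fitting residual alone understates the error.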

Keywords: photogrammetry, Mars, MOLA, HiRISE

24507 Evolving Credit Scoring Models using Genetic Programming and Language Integrated Query Expression Trees

Authors: Alexandru-Ion Marinescu

Abstract:

The scientific literature contains a plethora of methods that tackle the well-established task of credit score evaluation. In its most abstract form, a credit scoring algorithm takes as input several credit applicant properties, such as age, marital status, employment status, and loan duration, and must output a binary response variable (i.e., “GOOD” or “BAD”) stating whether the client is likely to delay loan repayments. Data imbalance is common in financial institution databases: the majority of clients are classified as “GOOD” (clients who respect the loan repayment calendar), alongside a small percentage of “BAD” clients. But it is the “BAD” clients we are interested in, since accurately predicting their behavior is crucial to preventing unwanted losses for loan providers. To this context we add the constraint that the algorithm must yield an actual, tractable mathematical formula, which is friendlier to financial analysts. To this end, we turned to genetic algorithms and genetic programming, aiming to evolve actual mathematical expressions using specially tailored mutation and crossover operators. For data representation, we employ a very flexible mechanism, LINQ expression trees, readily available in the C# programming language, which enable us to construct executable pieces of code at runtime. As the name implies, they model trees, with intermediate nodes being operators (addition, subtraction, multiplication, division) or mathematical functions (sin, cos, abs, round, etc.) and leaf nodes storing either constants or variables. There is a one-to-one correspondence between the client properties and the formula variables. The mutation and crossover operators work on a flattened version of the tree, obtained via a pre-order traversal.
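To make the flattened representation concrete, here is a Python analogue (not C# LINQ expression trees) of a prefix-encoded formula: pre-order flattening, evaluation, subtree crossover on the flat token list, and extraction of the variables a formula actually uses. The operator set, the protected division, and the token format are assumptions for illustration.

```python
import math
import random

# Arity of each operator token; any other string is treated as a variable name.
ARITY = {"add": 2, "sub": 2, "mul": 2, "div": 2, "sin": 1, "cos": 1, "abs": 1}
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if b != 0 else 1.0,  # protected division (assumption)
    "sin": math.sin,
    "cos": math.cos,
    "abs": abs,
}


def flatten(tree):
    """Pre-order traversal of a nested-tuple tree into a prefix token list."""
    if isinstance(tree, tuple):
        op, *children = tree
        tokens = [op]
        for child in children:
            tokens.extend(flatten(child))
        return tokens
    return [tree]


def subtree_end(tokens, start):
    """Index just past the subtree that begins at tokens[start]."""
    pending, i = 1, start
    while pending:
        pending += ARITY.get(tokens[i], 0) - 1  # operators open slots, leaves fill one
        i += 1
    return i


def evaluate(tokens, env):
    """Evaluate a prefix token list; `env` maps variable names to values."""
    def rec(i):
        tok = tokens[i]
        if isinstance(tok, str) and tok in ARITY:
            args, j = [], i + 1
            for _ in range(ARITY[tok]):
                value, j = rec(j)
                args.append(value)
            return OPS[tok](*args), j
        if isinstance(tok, str):
            return env[tok], i + 1
        return tok, i + 1
    return rec(0)[0]


def crossover(parent_a, parent_b, rng=None):
    """Swap one random subtree between two prefix token lists."""
    rng = rng or random.Random()
    i, j = rng.randrange(len(parent_a)), rng.randrange(len(parent_b))
    end_i, end_j = subtree_end(parent_a, i), subtree_end(parent_b, j)
    child_a = parent_a[:i] + parent_b[j:end_j] + parent_a[end_i:]
    child_b = parent_b[:j] + parent_a[i:end_i] + parent_b[end_j:]
    return child_a, child_b


def used_variables(tokens):
    """Client properties that actually appear in the formula."""
    return {t for t in tokens if isinstance(t, str) and t not in ARITY}
```

Because a subtree in prefix form is always a contiguous slice, swapping slices located with `subtree_end` always yields syntactically valid offspring, which is one reason a flattened pre-order encoding is convenient for genetic operators.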
A consequence of our chosen technique is that we can identify and discard client properties that do not take part in the final score evaluation, effectively acting as a dimensionality reduction scheme. We compare our approach with state-of-the-art approaches such as support vector machines, Bayesian networks, and extreme learning machines, to name a few. We benchmark on a total of 8 data sets, including the well-known Australian credit and German credit data sets, and the performance indicators are percentage correctly classified, area under the curve, partial Gini index, H-measure, Brier score, and Kolmogorov-Smirnov statistic. Finally, we obtain encouraging results which, although placing us in the lower half of the ranking, motivate us to further refine the algorithm.
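Several of the listed performance indicators are simple to compute from predicted scores. The sketch below assumes that label 1 marks a “BAD” client and that scores are predicted probabilities of being BAD; it covers AUC (via the pairwise Mann-Whitney formulation), the Brier score, and the Kolmogorov-Smirnov statistic, and omits the partial Gini index and H-measure.

```python
def auc(scores, labels):
    """Probability that a random BAD (label 1) outscores a random GOOD (label 0); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def ks_statistic(scores, labels):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best
```

The pairwise AUC is quadratic in the number of clients; for large credit data sets a rank-based computation is preferable, but the result is the same.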

Keywords: expression trees, financial credit scoring, genetic algorithm, genetic programming, symbolic evolution

24506 The Data Quality Model for the IoT based Real-time Water Quality Monitoring Sensors

Authors: Rabbia Idrees, Ananda Maiti, Saurabh Garg, Muhammad Bilal Amin

Abstract:

IoT devices are the basic building blocks of an IoT network; they generate an enormous volume of real-time, high-speed data that helps organizations and companies make intelligent decisions. Integrating this data from multiple sources and transferring it to the appropriate client is fundamental to IoT development. Handling this huge number of devices along with the huge volume of data is very challenging. IoT devices are battery-powered and resource-constrained, and to provide energy-efficient communication they sleep and wake up periodically or aperiodically, depending on traffic load, to reduce energy consumption. Sometimes these devices are disconnected because of battery depletion. If a node is not available in the network, the IoT network provides incomplete, missing, and inaccurate data. Moreover, many IoT applications, such as vehicle tracking and patient tracking, require the IoT devices to be mobile. Because of this mobility, if the distance between a device and the sink node exceeds the required range, the connection is lost. Other devices then join the network to replace the broken-down and departed devices. This makes IoT devices dynamic in nature, which brings uncertainty and unreliability into the IoT network and hence produces poor-quality data. Because of this dynamic nature, the actual cause of abnormal data is often unknown. If data are of poor quality, decisions are likely to be unsound, so it is highly important to process data and estimate their quality before using them in IoT applications. In the past, many researchers have tried to estimate data quality and have proposed several Machine Learning (ML), stochastic, and statistical methods to analyze stored data in the data processing layer, without focusing on the challenges arising from the dynamic nature of IoT devices and how it affects data quality.
This research presents a comprehensive review of the impact of the dynamic nature of IoT devices on data quality and proposes a data quality model that can deal with this challenge and produce good-quality data. The model is designed for sensors monitoring water quality and is built using DBSCAN clustering and weather sensors. An extensive study was carried out to find the relationship between the data of weather sensors and that of sensors monitoring the water quality of lakes and beaches. A detailed theoretical analysis of the correlation between the independent data streams of the two sets of sensors is presented. With the help of this analysis and DBSCAN, a data quality model is prepared. This model encompasses five dimensions of data quality: outlier detection and removal, completeness, patterns of missing values, and accuracy, which is checked using the positions of the clusters. Finally, a statistical analysis is performed on the clusters formed by DBSCAN, and consistency is evaluated through the Coefficient of Variation (CoV).
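A minimal DBSCAN, together with completeness and coefficient-of-variation helpers, shows how the outlier, completeness, and consistency checks could be computed. This is a plain-Python sketch with hypothetical eps and min_pts values; the study's feature choice (water-quality readings paired with weather-sensor data) and its parameter settings are not reproduced here.

```python
import math
import statistics


def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) for each point, or -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Includes the point itself, as in the common DBSCAN convention.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(more)
    return labels


def completeness(readings):
    """Fraction of expected readings that actually arrived (None marks a gap)."""
    return sum(r is not None for r in readings) / len(readings)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (undefined when the mean is 0)."""
    return statistics.stdev(values) / statistics.mean(values)
```

Points labelled -1 are the outlier candidates; the CoV would then be computed per cluster on the remaining readings to judge consistency.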

Keywords: clustering, data quality, DBSCAN, Internet of Things (IoT)

24505 Dosimetric Comparison of Conventional Optimization Methods with Inverse Planning Simulated Annealing Technique

Authors: Shraddha Srivastava, N. K. Painuly, S. P. Mishra, Navin Singh, Muhsin Punchankandy, Kirti Srivastava, M. L. B. Bhatt

Abstract:

Various optimization methods used in interstitial brachytherapy alter dwell positions and dwell weights to produce a dose distribution based on the implant geometry. Since these optimization schemes are not anatomy-based, they could lead to deviations from the desired plan. This study was therefore carried out to compare the anatomy-based Inverse Planning Simulated Annealing (IPSA) optimization technique with graphical and geometrical optimization methods in interstitial high-dose-rate (HDR) brachytherapy planning for cervical carcinoma. Six patients with 12 CT data sets of MUPIT implants in HDR brachytherapy of cervical cancer were prospectively studied. The HR-CTV and organs at risk (OARs) were contoured in the Oncentra treatment planning system (TPS) using the GYN GEC-ESTRO guidelines for cervical carcinoma. Three sets of plans were generated for each fraction using IPSA, graphical optimization (GrOPT), and geometrical optimization (GOPT). All patients were treated to a dose of 20 Gy in 2 fractions. The main objective was to cover at least 95% of the HR-CTV with 100% of the prescribed dose (V100 ≥ 95% of HR-CTV). IPSA, GrOPT, and GOPT plans were compared in terms of target coverage, OAR doses, homogeneity index (HI), and conformity index (COIN) using dose-volume histograms (DVH). Target volume coverage (mean V100) was 93.98 ± 0.87%, 91.34 ± 1.02%, and 85.05 ± 2.84% for the IPSA, GrOPT, and GOPT plans, respectively. Mean D90 (the minimum dose received by 90% of the HR-CTV) values for the IPSA, GrOPT, and GOPT plans were 10.19 ± 1.07 Gy, 10.17 ± 0.12 Gy, and 7.99 ± 1.0 Gy, respectively, while D100 (the minimum dose received by 100% of the HR-CTV volume) was 6.55 ± 0.85 Gy, 6.55 ± 0.65 Gy, and 4.73 ± 0.14 Gy, respectively. IPSA plans resulted in lower doses to the bladder (D₂
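The DVH indices used above (V100, D90, D100) can be computed directly from per-voxel doses. The sketch assumes equal-volume voxels and a per-fraction prescription of 10 Gy (20 Gy in 2 fractions); the dose values in the test are made up, and a clinical TPS such as Oncentra uses its own DVH sampling and interpolation.

```python
import math


def v_percent(doses, prescription, level=1.0):
    """V_x: percentage of equal-volume voxels receiving >= level * prescription."""
    threshold = level * prescription
    return 100.0 * sum(d >= threshold for d in doses) / len(doses)


def d_percent(doses, volume_fraction):
    """D_v: minimum dose received by the hottest `volume_fraction` of the volume."""
    ranked = sorted(doses, reverse=True)
    # round() guards against float noise such as 0.9 * 10 -> 9.000000000000002
    n = max(1, math.ceil(round(volume_fraction * len(ranked), 9)))
    return ranked[n - 1]
```

With this convention D100 is simply the minimum voxel dose in the HR-CTV, which is why D100 values in the study sit well below D90.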

Keywords: cervical cancer, HDR brachytherapy, IPSA, MUPIT

24504 Large Neural Networks Learning From Scratch With Very Few Data and Without Explicit Regularization

Authors: Christoph Linse, Thomas Martinetz

Abstract:

Recent findings have shown that neural networks generalize even in over-parameterized regimes with zero training error. This is surprising, since it contradicts traditional machine learning wisdom. In our empirical study, we reinforce these findings in the domain of fine-grained image classification. We show that very large convolutional neural networks with millions of weights do learn with only a handful of training samples and without image augmentation, explicit regularization, or pretraining. We train the ResNet018, ResNet101, and VGG19 architectures on subsets of the difficult benchmark datasets Caltech101, CUB_200_2011, FGVCAircraft, Flowers102, and StanfordCars, each with 100 or more classes, perform a comprehensive comparative study, and draw implications for the practical application of CNNs. Finally, we show that VGG19, with 140 million weights, learns to distinguish airplanes and motorbikes with up to 95% accuracy using only 20 training samples per class.

Keywords: convolutional neural networks, fine-grained image classification, generalization, image recognition, over-parameterized, small data sets

24503 Margin-Based Feed-Forward Neural Network Classifiers

Authors: Xiaohan Bookman, Xiaoyan Zhu

Abstract:

The margin-based principle was proposed long ago, and it has been proved, both theoretically and practically, that it can reduce structural risk and improve performance. Meanwhile, the feed-forward neural network is a traditional classifier that is currently very popular in deeper architectures. However, the training algorithm of the feed-forward neural network is derived from the Widrow-Hoff principle, which minimizes the squared error. In this paper, we propose a new training algorithm for feed-forward neural networks based on the margin-based principle, which can effectively improve the accuracy and generalization ability of neural network classifiers with fewer labeled samples and a flexible network. We conducted experiments on four UCI open data sets and achieved good results, as expected. In conclusion, our model can handle more sparsely labeled and higher-dimensional data sets with high accuracy, and converting an existing ANN method to our method is easy and requires almost no extra work.
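The difference between the Widrow-Hoff (squared-error) update and a margin-based (hinge) update can be seen on a single linear output unit. This is a sketch under simplifying assumptions: labels in {-1, +1}, one linear unit, and plain SGD. The authors' algorithm applies the margin principle to full feed-forward networks, and its exact loss is not given in the abstract.

```python
def train_linear(data, loss, lr=0.01, epochs=200):
    """SGD on one linear output unit; data is a list of (x, y) with y in {-1, +1}."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            out = sum(wi * xi for wi, xi in zip(w, x)) + b
            if loss == "squared":   # Widrow-Hoff / LMS: always move towards the target
                g = y - out
            elif loss == "hinge":   # margin-based: update only while y * out < 1
                g = y if y * out < 1 else 0.0
            else:
                raise ValueError(f"unknown loss: {loss}")
            w = [wi + lr * g * xi for wi, xi in zip(w, x)]
            b += lr * g
    return w, b


def margin(w, b, x, y):
    """Functional margin y * (w . x + b); positive means correctly classified."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
```

The squared-error rule keeps pulling every output towards exactly ±1, even for points that are already classified with room to spare, whereas the hinge rule stops updating once a point clears the margin; this is the intuition behind replacing the Widrow-Hoff objective.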

Keywords: Max-Margin Principle, Feed-Forward Neural Network, classifier, structural risk

24502 A Comparative Study of Multi-SOM Algorithms for Determining the Optimal Number of Clusters

Authors: Imèn Khanchouch, Malika Charrad, Mohamed Limam

Abstract:

The interpretation of cluster quality and the determination of the optimal number of clusters remain crucial problems in clustering. In this paper, we focus on the multi-SOM clustering method, which overcomes the problem of extracting the number of clusters from the SOM map through the use of a clustering validity index. We then tested multi-SOM on real and artificial data sets with evaluation criteria not used previously, such as the Davies-Bouldin index, Dunn index, and silhouette index. The developed multi-SOM algorithm is compared with the k-means and BIRCH methods. The results show that it is more efficient than classical clustering methods.
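Multi-SOM selects the number of clusters with a validity index. As an illustration, the Davies-Bouldin index (lower is better) can be computed as follows, using the mean distance to the centroid as the within-cluster scatter; other scatter definitions exist, and the toy clusters in the test are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a partition given as a list of point lists; lower is better."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k
```

To choose the number of clusters, one would compute the index for each candidate partition (for example, each cut of the SOM prototypes) and keep the one with the smallest value.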

Keywords: clustering, SOM, multi-SOM, DB index, Dunn index, silhouette index

24501 Axial Load Capacity of Drilled Shafts from In-Situ Test Data at Semani Site, in Albania

Authors: Neritan Shkodrani, Klearta Rrushi, Anxhela Shaha

Abstract:

Generally, the design axial load capacity of deep foundations is based on data from field tests such as the SPT (Standard Penetration Test) and CPT (Cone Penetration Test). This paper reports the results of an axial load capacity analysis of drilled shafts at a construction site at Semani, in Fier County, Fier Prefecture, Albania. The analyses are based on the data of 416 SPT tests and 12 CPTU tests carried out at this construction site using 12 boreholes (10 borings to a depth of 30.0 m and 2 borings to a depth of 80.0 m). The considered shaft diameters range from 0.5 m to 2.5 m, and the embedment length is fixed at 25 m. SPT-based analytical methods from Japanese design practice (Building Standard Law of Japan) and the CPT-based analytical Eslami and Fellenius method are used to obtain the axial ultimate load capacity of the drilled shafts. The considered drilled shaft (25 m long and 0.5 m to 2.5 m in diameter) is analyzed for the soil conditions of each borehole. The values obtained from the sets of calculations are shown in different charts. The axial load capacity values obtained from the SPT and CPTU data are then compared, and conclusions are drawn about the calculation methods used.
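Both families of methods ultimately combine a base (toe) resistance with shaft friction summed over the soil layers along the embedded length. The sketch below shows only that final arithmetic; the unit resistances are hypothetical inputs, and the SPT- and CPT-based correlations that would produce them (Building Standard Law of Japan, Eslami and Fellenius) are not implemented here.

```python
import math


def drilled_shaft_capacity(diameter_m, base_resistance_kpa, layers):
    """Ultimate axial capacity in kN: q_b * A_base + sum(f_s * perimeter * thickness).

    layers: list of (thickness_m, unit_shaft_friction_kpa) from the top down to the toe.
    Returns (base, shaft, total).
    """
    area = math.pi * diameter_m ** 2 / 4
    perimeter = math.pi * diameter_m
    q_base = base_resistance_kpa * area
    q_shaft = sum(t * fs * perimeter for t, fs in layers)
    return q_base, q_shaft, q_base + q_shaft
```

Because base resistance grows with the square of the diameter and shaft resistance only linearly, the base share of the total rises across the 0.5 m to 2.5 m range considered in the study.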

Keywords: deep foundations, drilled shafts, axial load capacity, ultimate load capacity, allowable load capacity, SPT test, CPTU test

24500 A Generalisation of Pearson's Curve System and Explicit Representation of the Associated Density Function

Authors: S. B. Provost, Hossein Zareamoghaddam

Abstract:

A univariate density approximation technique is introduced in which the derivative of the logarithm of a density function is assumed to be expressible as a rational function. This approach, which extends Pearson’s curve system, is based solely on the moments of a distribution up to a determinable order. Upon solving a system of linear equations, the coefficients of the polynomial ratio can readily be identified. An explicit solution to the integral representation of the resulting density approximant is then obtained. It will be explained that, when utilised in conjunction with sample moments, this methodology lends itself to the modelling of ‘big data’. Applications to sets of univariate and bivariate observations will be presented.
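The moment-based linear system can be made explicit. Writing the log-density derivative as a ratio of polynomials (the notation p, q, a_i, b_j and \mu_r = E[X^r] is ours, not the authors'), multiplying q(x) f'(x) = p(x) f(x) by x^k, integrating over the support, and integrating the left-hand side by parts, assuming the boundary terms vanish, gives equations that are linear in the coefficients:

```latex
\frac{d}{dx}\ln f(x) = \frac{f'(x)}{f(x)} = \frac{p(x)}{q(x)}, \qquad
p(x) = \sum_{i=0}^{m} a_i x^i, \quad q(x) = \sum_{j=0}^{n} b_j x^j

\int x^k q(x) f'(x)\,dx = \int x^k p(x) f(x)\,dx
\;\Longrightarrow\;
-\sum_{j=0}^{n} b_j (k+j)\,\mu_{k+j-1} = \sum_{i=0}^{m} a_i\,\mu_{k+i},
\qquad k = 0, 1, 2, \ldots
```

The term with k + j = 0 has a zero coefficient, so no negative-order moment is needed; fixing one coefficient (for example b_0 = 1) removes the scale ambiguity. Pearson's system is the case m = 1, n = 2. As a check, for the standard normal with m = 1, n = 0 and b_0 = 1, the equations for k = 0 and k = 1 give a_0 = 0 and a_1 = -1, i.e. f'(x)/f(x) = -x.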

Keywords: density estimation, log-density, moments, Pearson's curve system
