Search results for: data sets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24782

Search results for: data sets

24632 A Secure System for Handling Information from Heterogeous Sources

Authors: Shoohira Aftab, Hammad Afzal

Abstract:

Information integration is a well known procedure to provide consolidated view on sets of heterogeneous information sources. It not only provides better statistical analysis of information but also facilitates users to query without any knowledge on the underlying heterogeneous information sources The problem of providing a consolidated view of information can be handled using Semantic data (information stored in such a way that is understandable by machines and integrate-able without manual human intervention). However, integrating information using semantic web technology without any access management enforced, will results in increase of privacy and confidentiality concerns. In this research we have designed and developed a framework that would allow information from heterogeneous formats to be consolidated, thus resolving the issue of interoperability. We have also devised an access control system for defining explicit privacy constraints. We designed and applied our framework on both semantic and non-semantic data from heterogeneous resources. Our approach is validated using scenario based testing.

Keywords: information integration, semantic data, interoperability, security, access control system

Procedia PDF Downloads 321
24631 An ANN Approach for Detection and Localization of Fatigue Damage in Aircraft Structures

Authors: Reza Rezaeipour Honarmandzad

Abstract:

In this paper we propose an ANN for detection and localization of fatigue damage in aircraft structures. We used network of piezoelectric transducers for Lamb-wave measurements in order to calculate damage indices. Data gathered by the sensors was given to neural network classifier. A set of neural network electors of different architecture cooperates to achieve consensus concerning the state of each monitored path. Sensed signal variations in the ROI, detected by the networks at each path, were used to assess the state of the structure as well as to localize detected damage and to filter out ambient changes. The classifier has been extensively tested on large data sets acquired in the tests of specimens with artificially introduced notches as well as the results of numerous fatigue experiments. Effect of the classifier structure and test data used for training on the results was evaluated.

Keywords: ANN, fatigue damage, aircraft structures, piezoelectric transducers, lamb-wave measurements

Procedia PDF Downloads 390
24630 Studies of Rule Induction by STRIM from the Decision Table with Contaminated Attribute Values from Missing Data and Noise — in the Case of Critical Dataset Size —

Authors: Tetsuro Saeki, Yuichi Kato, Shoutarou Mizuno

Abstract:

STRIM (Statistical Test Rule Induction Method) has been proposed as a method to effectively induct if-then rules from the decision table which is considered as a sample set obtained from the population of interest. Its usefulness has been confirmed by simulation experiments specifying rules in advance, and by comparison with conventional methods. However, scope for future development remains before STRIM can be applied to the analysis of real-world data sets. The first requirement is to determine the size of the dataset needed for inducting true rules, since finding statistically significant rules is the core of the method. The second is to examine the capacity of rule induction from datasets with contaminated attribute values created by missing data and noise, since real-world datasets usually contain such contaminated data. This paper examines the first problem theoretically, in connection with the rule length. The second problem is then examined in a simulation experiment, utilizing the critical size of dataset derived from the first step. The experimental results show that STRIM is highly robust in the analysis of datasets with contaminated attribute values, and hence is applicable to realworld data.

Keywords: rule induction, decision table, missing data, noise

Procedia PDF Downloads 368
24629 Global City Typologies: 300 Cities and Over 100 Datasets

Authors: M. Novak, E. Munoz, A. Jana, M. Nelemans

Abstract:

Cities and local governments the world over are interested to employ circular strategies as a means to bring about food security, create employment and increase resilience. The selection and implementation of circular strategies is facilitated by modeling the effects of strategies locally and understanding the impacts such strategies have had in other (comparable) cities and how that would translate locally. Urban areas are heterogeneous because of their geographic, economic, social characteristics, governance, and culture. In order to better understand the effect of circular strategies on urban systems, we create a dataset for over 300 cities around the world designed to facilitate circular strategy scenario modeling. This new dataset integrates data from over 20 prominent global national and urban data sources, such as the Global Human Settlements layer and International Labour Organisation, as well as incorporating employment data from over 150 cities collected bottom up from local departments and data providers. The dataset is made to be reproducible. Various clustering techniques are explored in the paper. The result is sets of clusters of cities, which can be used for further research, analysis, and support comparative, regional, and national policy making on circular cities.

Keywords: data integration, urban innovation, cluster analysis, circular economy, city profiles, scenario modelling

Procedia PDF Downloads 157
24628 Ensemble-Based SVM Classification Approach for miRNA Prediction

Authors: Sondos M. Hammad, Sherin M. ElGokhy, Mahmoud M. Fahmy, Elsayed A. Sallam

Abstract:

In this paper, an ensemble-based Support Vector Machine (SVM) classification approach is proposed. It is used for miRNA prediction. Three problems, commonly associated with previous approaches, are alleviated. These problems arise due to impose assumptions on the secondary structural of premiRNA, imbalance between the numbers of the laboratory checked miRNAs and the pseudo-hairpins, and finally using a training data set that does not consider all the varieties of samples in different species. We aggregate the predicted outputs of three well-known SVM classifiers; namely, Triplet-SVM, Virgo and Mirident, weighted by their variant features without any structural assumptions. An additional SVM layer is used in aggregating the final output. The proposed approach is trained and then tested with balanced data sets. The results of the proposed approach outperform the three base classifiers. Improved values for the metrics of 88.88% f-score, 92.73% accuracy, 90.64% precision, 96.64% specificity, 87.2% sensitivity, and the area under the ROC curve is 0.91 are achieved.

Keywords: MiRNAs, SVM classification, ensemble algorithm, assumption problem, imbalance data

Procedia PDF Downloads 312
24627 An Experimental Exploration of the Interaction between Consumer Ethics Perceptions, Legality Evaluations, and Mind-Sets

Authors: Daphne Sobolev, Niklas Voege

Abstract:

During the last three decades, consumer ethics perceptions have attracted the attention of a large number of researchers. Nevertheless, little is known about the effect of the cognitive and situational contexts of the decision on ethics judgments. In this paper, the interrelationship between consumers’ ethics perceptions, legality evaluations and mind-sets are explored. Legality evaluations represent the cognitive context of the ethical judgments, whereas mind-sets represent their situational context. Drawing on moral development theories and priming theories, it is hypothesized that both factors are significantly related to consumer ethics perceptions. To test this hypothesis, 289 participants were allocated to three mind-set experimental conditions and a control group. Participants in the mind-set conditions were primed for aggressiveness, politeness or awareness to the negative legal consequences of breaking the law. Mind-sets were induced using a sentence-unscrambling task, in which target words were included. Ethics and legality judgments were assessed using consumer ethics and internet ethics questionnaires. All participants were asked to rate the ethicality and legality of consumer actions described in the questionnaires. The results showed that consumer ethics and legality perceptions were significantly correlated. Moreover, including legality evaluations as a variable in ethics judgment models increased the predictive power of the models. In addition, inducing aggressiveness in participants reduced their sensitivity to ethical issues; priming awareness to negative legal consequences increased their sensitivity to ethics when uncertainty about the legality of the judged scenario was high. Furthermore, the correlation between ethics and legality judgments was significant overall mind-set conditions. However, the results revealed conflicts between ethics and legality perceptions: consumers considered 10%-14% of the presented behaviors unethical and legal, or ethical and illegal. In 10-23% of the questions, participants indicated that they did not know whether the described action was legal or not. In addition, an asymmetry between the effects of aggressiveness and politeness priming was found. The results show that the legality judgments and mind-sets interact with consumer ethics perceptions. Thus, they portray consumer ethical judgments as dynamical processes which are inseparable from other cognitive processes and situational variables. They highlight that legal and ethical education, as well as adequate situational cues at the service place, could have a positive effect on consumer ethics perceptions. Theoretical contribution is discussed.

Keywords: consumer ethics, legality judgments, mind-set, priming, aggressiveness

Procedia PDF Downloads 271
24626 Hardware Implementation on Field Programmable Gate Array of Two-Stage Algorithm for Rough Set Reduct Generation

Authors: Tomasz Grzes, Maciej Kopczynski, Jaroslaw Stepaniuk

Abstract:

The rough sets theory developed by Prof. Z. Pawlak is one of the tools that can be used in the intelligent systems for data analysis and processing. Banking, medicine, image recognition and security are among the possible fields of utilization. In all these fields, the amount of the collected data is increasing quickly, but with the increase of the data, the computation speed becomes the critical factor. Data reduction is one of the solutions to this problem. Removing the redundancy in the rough sets can be achieved with the reduct. A lot of algorithms of generating the reduct were developed, but most of them are only software implementations, therefore have many limitations. Microprocessor uses the fixed word length, consumes a lot of time for either fetching as well as processing of the instruction and data; consequently, the software based implementations are relatively slow. Hardware systems don’t have these limitations and can process the data faster than a software. Reduct is the subset of the decision attributes that provides the discernibility of the objects. For the given decision table there can be more than one reduct. Core is the set of all indispensable condition attributes. None of its elements can be removed without affecting the classification power of all condition attributes. Moreover, every reduct consists of all the attributes from the core. In this paper, the hardware implementation of the two-stage greedy algorithm to find the one reduct is presented. The decision table is used as an input. Output of the algorithm is the superreduct which is the reduct with some additional removable attributes. First stage of the algorithm is calculating the core using the discernibility matrix. Second stage is generating the superreduct by enriching the core with the most common attributes, i.e., attributes that are more frequent in the decision table. Described above algorithm has two disadvantages: i) generating the superreduct instead of reduct, ii) additional first stage may be unnecessary if the core is empty. But for the systems focused on the fast computation of the reduct the first disadvantage is not the key problem. The core calculation can be achieved with a combinational logic block, and thus add respectively little time to the whole process. Algorithm presented in this paper was implemented in Field Programmable Gate Array (FPGA) as a digital device consisting of blocks that process the data in a single step. Calculating the core is done by the comparators connected to the block called 'singleton detector', which detects if the input word contains only single 'one'. Calculating the number of occurrences of the attribute is performed in the combinational block made up of the cascade of the adders. The superreduct generation process is iterative and thus needs the sequential circuit for controlling the calculations. For the research purpose, the algorithm was also implemented in C language and run on a PC. The times of execution of the reduct calculation in a hardware and software were considered. Results show increase in the speed of data processing.

Keywords: data reduction, digital systems design, field programmable gate array (FPGA), reduct, rough set

Procedia PDF Downloads 191
24625 Applied Canonical Correlation Analysis to Explore the Relationship between Resourcefulness and Quality of Life in Cancer Population

Authors: Chiou-Fang Liou

Abstract:

Cancer has been one of the most life-threaten diseases worldwide for 30+ years. The influences of cancer illness include symptoms from cancer itself along with its treatments. The quality of life among patients diagnosed with cancer during cancer treatments has been conceptualized within four domains: Functional Well-Being, Social Well-Being, Physical Well-Being, and Emotional Well-Being. Patients with cancer often need to make adjustments to face all the challenges. The middle-range theory of Resourcefulness and Quality of life has been applied to explore factors contributing to cancer patients’ needs. Resourcefulness is defined as sets of skills that can be learned and consisted of Person and Social Resourcefulness. Empirical evidence also supported a possible relationship between Resourcefulness and Quality of Life. However, little is known about the extent to which the two concepts are related to each other. This study, therefore, applied a multivariate technique, Canonical Correlation Analysis, to identify the relationship between the two sets of variables with multi-dimensional measures, the Resourcefulness and Quality of Life in Cancer patients receiving treatments. After IRB approval, this multi-centered study took place at two medical centers in the Central Region of Taiwan. Sample A total of 186 patients with various cancer diagnoses and either receiving radiation therapy or chemotherapy consented to and answered questionnaires. The Import findings of the Generalized F test identified two typical sets with several linear relations and explained a total of 79.1% of the total variance. The first typical set found Personal Resourcefulness negatively related to Social Well-being, Functional being, Emotional Well-being, and Physical, in that order. The second typical set found Social Resourcefulness negatively related to Functional Well-being and Physical-being yet positively related to Social Well-being and Emotional Well-being. Discussion and Conclusion, The results of this presented study supported the statistically significant relationship between two sets of variables that are consistent with the theory. In addition, the results are considerably important in cancer patients receiving cancer treatments.

Keywords: cancer, canonical correlation analysis, quality of life, resourcefulness

Procedia PDF Downloads 50
24624 Scientific Linux Cluster for BIG-DATA Analysis (SLBD): A Case of Fayoum University

Authors: Hassan S. Hussein, Rania A. Abul Seoud, Amr M. Refaat

Abstract:

Scientific researchers face in the analysis of very large data sets that is increasing noticeable rate in today’s and tomorrow’s technologies. Hadoop and Spark are types of software that developed frameworks. Hadoop framework is suitable for many Different hardware platforms. In this research, a scientific Linux cluster for Big Data analysis (SLBD) is presented. SLBD runs open source software with large computational capacity and high performance cluster infrastructure. SLBD composed of one cluster contains identical, commodity-grade computers interconnected via a small LAN. SLBD consists of a fast switch and Gigabit-Ethernet card which connect four (nodes). Cloudera Manager is used to configure and manage an Apache Hadoop stack. Hadoop is a framework allows storing and processing big data across the cluster by using MapReduce algorithm. MapReduce algorithm divides the task into smaller tasks which to be assigned to the network nodes. Algorithm then collects the results and form the final result dataset. SLBD clustering system allows fast and efficient processing of large amount of data resulting from different applications. SLBD also provides high performance, high throughput, high availability, expandability and cluster scalability.

Keywords: big data platforms, cloudera manager, Hadoop, MapReduce

Procedia PDF Downloads 333
24623 A Minimum Spanning Tree-Based Method for Initializing the K-Means Clustering Algorithm

Authors: J. Yang, Y. Ma, X. Zhang, S. Li, Y. Zhang

Abstract:

The traditional k-means algorithm has been widely used as a simple and efficient clustering method. However, the algorithm often converges to local minima for the reason that it is sensitive to the initial cluster centers. In this paper, an algorithm for selecting initial cluster centers on the basis of minimum spanning tree (MST) is presented. The set of vertices in MST with same degree are regarded as a whole which is used to find the skeleton data points. Furthermore, a distance measure between the skeleton data points with consideration of degree and Euclidean distance is presented. Finally, MST-based initialization method for the k-means algorithm is presented, and the corresponding time complexity is analyzed as well. The presented algorithm is tested on five data sets from the UCI Machine Learning Repository. The experimental results illustrate the effectiveness of the presented algorithm compared to three existing initialization methods.

Keywords: degree, initial cluster center, k-means, minimum spanning tree

Procedia PDF Downloads 380
24622 Efficient Tuning Parameter Selection by Cross-Validated Score in High Dimensional Models

Authors: Yoonsuh Jung

Abstract:

As DNA microarray data contain relatively small sample size compared to the number of genes, high dimensional models are often employed. In high dimensional models, the selection of tuning parameter (or, penalty parameter) is often one of the crucial parts of the modeling. Cross-validation is one of the most common methods for the tuning parameter selection, which selects a parameter value with the smallest cross-validated score. However, selecting a single value as an "optimal" value for the parameter can be very unstable due to the sampling variation since the sample sizes of microarray data are often small. Our approach is to choose multiple candidates of tuning parameter first, then average the candidates with different weights depending on their performance. The additional step of estimating the weights and averaging the candidates rarely increase the computational cost, while it can considerably improve the traditional cross-validation. We show that the selected value from the suggested methods often lead to stable parameter selection as well as improved detection of significant genetic variables compared to the tradition cross-validation via real data and simulated data sets.

Keywords: cross validation, parameter averaging, parameter selection, regularization parameter search

Procedia PDF Downloads 390
24621 MapReduce Logistic Regression Algorithms with RHadoop

Authors: Byung Ho Jung, Dong Hoon Lim

Abstract:

Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. Logistic regression is used extensively in numerous disciplines, including the medical and social science fields. In this paper, we address the problem of estimating parameters in the logistic regression based on MapReduce framework with RHadoop that integrates R and Hadoop environment applicable to large scale data. There exist three learning algorithms for logistic regression, namely Gradient descent method, Cost minimization method and Newton-Rhapson's method. The Newton-Rhapson's method does not require a learning rate, while gradient descent and cost minimization methods need to manually pick a learning rate. The experimental results demonstrated that our learning algorithms using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also compared the performance of our Newton-Rhapson's method with gradient descent and cost minimization methods. The results showed that our newton's method appeared to be the most robust to all data tested.

Keywords: big data, logistic regression, MapReduce, RHadoop

Procedia PDF Downloads 250
24620 Impact Location From Instrumented Mouthguard Kinematic Data In Rugby

Authors: Jazim Sohail, Filipe Teixeira-Dias

Abstract:

Mild traumatic brain injury (mTBI) within non-helmeted contact sports is a growing concern due to the serious risk of potential injury. Extensive research is being conducted looking into head kinematics in non-helmeted contact sports utilizing instrumented mouthguards that allow researchers to record accelerations and velocities of the head during and after an impact. This does not, however, allow the location of the impact on the head, and its magnitude and orientation, to be determined. This research proposes and validates two methods to quantify impact locations from instrumented mouthguard kinematic data, one using rigid body dynamics, the other utilizing machine learning. The rigid body dynamics technique focuses on establishing and matching moments from Euler’s and torque equations in order to find the impact location on the head. The methodology is validated with impact data collected from a lab test with the dummy head fitted with an instrumented mouthguard. Additionally, a Hybrid III Dummy head finite element model was utilized to create synthetic kinematic data sets for impacts from varying locations to validate the impact location algorithm. The algorithm calculates accurate impact locations; however, it will require preprocessing of live data, which is currently being done by cross-referencing data timestamps to video footage. The machine learning technique focuses on eliminating the preprocessing aspect by establishing trends within time-series signals from instrumented mouthguards to determine the impact location on the head. An unsupervised learning technique is used to cluster together impacts within similar regions from an entire time-series signal. The kinematic signals established from mouthguards are converted to the frequency domain before using a clustering algorithm to cluster together similar signals within a time series that may span the length of a game. Impacts are clustered within predetermined location bins. The same Hybrid III Dummy finite element model is used to create impacts that closely replicate on-field impacts in order to create synthetic time-series datasets consisting of impacts in varying locations. These time-series data sets are used to validate the machine learning technique. The rigid body dynamics technique provides a good method to establish accurate impact location of impact signals that have already been labeled as true impacts and filtered out of the entire time series. However, the machine learning technique provides a method that can be implemented with long time series signal data but will provide impact location within predetermined regions on the head. Additionally, the machine learning technique can be used to eliminate false impacts captured by sensors saving additional time for data scientists using instrumented mouthguard kinematic data as validating true impacts with video footage would not be required.

Keywords: head impacts, impact location, instrumented mouthguard, machine learning, mTBI

Procedia PDF Downloads 181
24619 Investigation of the Effects of Simple Heating Processes on the Crystallization of Bi₂WO₆

Authors: Cisil Gulumser, Francesc Medina, Sevil Veli

Abstract:

In this study, the synthesis of photocatalytic Bi₂WO₆ was practiced with simple heating processes and the effects of these treatments on the production of the desired compound were investigated. For this purpose, experiments with Bi(NO₃)₃.5H₂O and H₂WO₄ precursors were carried out to synthesize Bi₂WO₆ by four different combinations. These four combinations were grouped in two main sets as ‘treated in microwave reactor’ and ‘directly filtrated’; additionally these main sets were grouped into two subsets as ‘calcined’ and ‘not calcined’. Calcination processes were conducted at temperatures of 400ᵒC, 600ᵒC, and 800ᵒC. X-ray diffraction (XRD) and environmental scanning electron microscopy (ESEM) analyses were performed in order to investigate the crystal structure of powdered product synthesized with each combination. The highest crystallization of produced compounds was observed for calcination at 600ᵒC from each main group.

Keywords: bismuth tungstate, crystallization, microwave, photocatalysts

Procedia PDF Downloads 151
24618 Integration of Big Data to Predict Transportation for Smart Cities

Authors: Sun-Young Jang, Sung-Ah Kim, Dongyoun Shin

Abstract:

The Intelligent transportation system is essential to build smarter cities. Machine learning based transportation prediction could be highly promising approach by delivering invisible aspect visible. In this context, this research aims to make a prototype model that predicts transportation network by using big data and machine learning technology. In detail, among urban transportation systems this research chooses bus system.  The research problem that existing headway model cannot response dynamic transportation conditions. Thus, bus delay problem is often occurred. To overcome this problem, a prediction model is presented to fine patterns of bus delay by using a machine learning implementing the following data sets; traffics, weathers, and bus statues. This research presents a flexible headway model to predict bus delay and analyze the result. The prototyping model is composed by real-time data of buses. The data are gathered through public data portals and real time Application Program Interface (API) by the government. These data are fundamental resources to organize interval pattern models of bus operations as traffic environment factors (road speeds, station conditions, weathers, and bus information of operating in real-time). The prototyping model is designed by the machine learning tool (RapidMiner Studio) and conducted tests for bus delays prediction. This research presents experiments to increase prediction accuracy for bus headway by analyzing the urban big data. The big data analysis is important to predict the future and to find correlations by processing huge amount of data. Therefore, based on the analysis method, this research represents an effective use of the machine learning and urban big data to understand urban dynamics.

Keywords: big data, machine learning, smart city, social cost, transportation network

Procedia PDF Downloads 231
24617 Economic Valuation of Forest Landscape Function Using a Conditional Logit Model

Authors: A. J. Julius, E. Imoagene, O. A. Ganiyu

Abstract:

The purpose of this study is to estimate the economic value of the services and functions rendered by the forest landscape using a conditional logit model. For this study, attributes and levels of forest landscape were chosen; specifically, attributes include topographical forest type, forest type, forest density, recreational factor (side trip, accessibility of valley), and willingness to participate (WTP). Based on these factors, 48 choices sets with balanced and orthogonal form using statistical analysis system (SAS) 9.1 was adopted. The efficiency of the questionnaire was 6.02 (D-Error. 0.1), and choice set and socio-economic variables were analyzed. To reduce the cognitive load of respondents, the 48 choice sets were divided into 4 types in the questionnaire, so that respondents could respond to 12 choice sets, respectively. The study populations were citizens from seven metropolitan cities including Ibadan, Ilorin, Osogbo, etc. and annual WTP per household was asked by using the interview questionnaire, a total of 267 copies were recovered. As a result, Oshogbo had 0.45, and the statistical similarities could not be found except for urban forests, forest density, recreational factor, and level of WTP. Average annual WTP per household for forest landscape was 104,758 Naira (Nigerian currency) based on the outcome from this model, total economic value of the services and functions enjoyed from Nigerian forest landscape has reached approximately 1.6 trillion Naira.

Keywords: economic valuation, urban cities, services, forest landscape, logit model, nigeria

Procedia PDF Downloads 99
24616 Dynamic Risk Identification Using Fuzzy Failure Mode Effect Analysis in Fabric Process Industries: A Research Article as Management Perspective

Authors: A. Sivakumar, S. S. Darun Prakash, P. Navaneethakrishnan

Abstract:

In and around Erode District, it is estimated that more than 1250 chemical and allied textile processing fabric industries are affected, partially closed and shut off for various reasons such as poor management, poor supplier performance, lack of planning for productivity, fluctuation of output, poor investment, waste analysis, labor problems, capital/labor ratio, accumulation of stocks, poor maintenance of resources, deficiencies in the quality of fabric, low capacity utilization, age of plant and equipment, high investment and input but low throughput, poor research and development, lack of energy, workers’ fear of loss of jobs, work force mix and work ethic. The main objective of this work is to analyze the existing conditions in textile fabric sector, validate the break even of Total Productivity (TP), analyze, design and implement fuzzy sets and mathematical programming for improvement of productivity and quality dimensions in the fabric processing industry. It needs to be compatible with the reality of textile and fabric processing industries. The highly risk events from productivity and quality dimension were found by fuzzy systems and results are wrapped up among the textile fabric processing industry.

Keywords: break even point, fuzzy crisp data, fuzzy sets, productivity, productivity cycle, total productive maintenance

Procedia PDF Downloads 310
24615 Seafloor and Sea Surface Modelling in the East Coast Region of North America

Authors: Magdalena Idzikowska, Katarzyna Pająk, Kamil Kowalczyk

Abstract:

Seafloor topography is a fundamental issue in geological, geophysical, and oceanographic studies. Single-beam or multibeam sonars attached to the hulls of ships are used to emit a hydroacoustic signal from transducers and reproduce the topography of the seabed. This solution provides relevant accuracy and spatial resolution. Bathymetric data from ships surveys provides National Centers for Environmental Information – National Oceanic and Atmospheric Administration. Unfortunately, most of the seabed is still unidentified, as there are still many gaps to be explored between ship survey tracks. Moreover, such measurements are very expensive and time-consuming. The solution is raster bathymetric models shared by The General Bathymetric Chart of the Oceans. The offered products are a compilation of different sets of data - raw or processed. Indirect data for the development of bathymetric models are also measurements of gravity anomalies. Some forms of seafloor relief (e.g. seamounts) increase the force of the Earth's pull, leading to changes in the sea surface. Based on satellite altimetry data, Sea Surface Height and marine gravity anomalies can be estimated, and based on the anomalies, it’s possible to infer the structure of the seabed. The main goal of the work is to create regional bathymetric models and models of the sea surface in the area of the east coast of North America – a region of seamounts and undulating seafloor. The research includes an analysis of the methods and techniques used, an evaluation of the interpolation algorithms used, model thickening, and the creation of grid models. Obtained data are raster bathymetric models in NetCDF format, survey data from multibeam soundings in MB-System format, and satellite altimetry data from Copernicus Marine Environment Monitoring Service. The methodology includes data extraction, processing, mapping, and spatial analysis. Visualization of the obtained results was carried out with Geographic Information System tools. The result is an extension of the state of the knowledge of the quality and usefulness of the data used for seabed and sea surface modeling and knowledge of the accuracy of the generated models. Sea level is averaged over time and space (excluding waves, tides, etc.). Its changes, along with knowledge of the topography of the ocean floor - inform us indirectly about the volume of the entire water ocean. The true shape of the ocean surface is further varied by such phenomena as tides, differences in atmospheric pressure, wind systems, thermal expansion of water, or phases of ocean circulation. Depending on the location of the point, the higher the depth, the lower the trend of sea level change. Studies show that combining data sets, from different sources, with different accuracies can affect the quality of sea surface and seafloor topography models.

Keywords: seafloor, sea surface height, bathymetry, satellite altimetry

Procedia PDF Downloads 56
24614 Evaluation of Diagnosis Performance Based on Pairwise Model Construction and Filtered Data

Authors: Hyun-Woo Cho

Abstract:

It is quite important to utilize right time and intelligent production monitoring and diagnosis of industrial processes in terms of quality and safety issues. When compared with monitoring task, fault diagnosis represents the task of finding process variables responsible causing a specific fault in the process. It can be helpful to process operators who should investigate and eliminate root causes more effectively and efficiently. This work focused on the active use of combining a nonlinear statistical technique with a preprocessing method in order to implement practical real-time fault identification schemes for data-rich cases. To compare its performance to existing identification schemes, a case study on a benchmark process was performed in several scenarios. The results showed that the proposed fault identification scheme produced more reliable diagnosis results than linear methods. In addition, the use of the filtering step improved the identification results for the complicated processes with massive data sets.

Keywords: diagnosis, filtering, nonlinear statistical techniques, process monitoring

Procedia PDF Downloads 214
24613 On Estimating the Low Income Proportion with Several Auxiliary Variables

Authors: Juan F. Muñoz-Rosas, Rosa M. García-Fernández, Encarnación Álvarez-Verdejo, Pablo J. Moya-Fernández

Abstract:

Poverty measurement is a very important topic in many studies in social sciences. One of the most important indicators when measuring poverty is the low income proportion. This indicator gives the proportion of people of a population classified as poor. This indicator is generally unknown, and for this reason, it is estimated by using survey data, which are obtained by official surveys carried out by many statistical agencies such as Eurostat. The main feature of the mentioned survey data is the fact that they contain several variables. The variable used to estimate the low income proportion is called as the variable of interest. The survey data may contain several additional variables, also named as the auxiliary variables, related to the variable of interest, and if this is the situation, they could be used to improve the estimation of the low income proportion. In this paper, we use Monte Carlo simulation studies to analyze numerically the performance of estimators based on several auxiliary variables. In this simulation study, we considered real data sets obtained from the 2011 European Union Survey on Income and Living Condition. Results derived from this study indicate that the estimators based on auxiliary variables are more accurate than the naive estimator.

Keywords: inclusion probability, poverty, poverty line, survey sampling

Procedia PDF Downloads 427
24612 Inverse Scattering for a Second-Order Discrete System via Transmission Eigenvalues

Authors: Abdon Choque-Rivero

Abstract:

The Jacobi system with the Dirichlet boundary condition is considered on a half-line lattice when the coefficients are real valued. The inverse problem of recovery of the coefficients from various data sets containing the so-called transmission eigenvalues is analyzed. The Marchenko method is utilized to solve the corresponding inverse problem.

Keywords: inverse scattering, discrete system, transmission eigenvalues, Marchenko method

Procedia PDF Downloads 120
24611 A General Framework for Knowledge Discovery from Echocardiographic and Natural Images

Authors: S. Nandagopalan, N. Pradeep

Abstract:

The aim of this paper is to propose a general framework for storing, analyzing, and extracting knowledge from two-dimensional echocardiographic images, color Doppler images, non-medical images, and general data sets. A number of high performance data mining algorithms have been used to carry out this task. Our framework encompasses four layers namely physical storage, object identification, knowledge discovery, user level. Techniques such as active contour model to identify the cardiac chambers, pixel classification to segment the color Doppler echo image, universal model for image retrieval, Bayesian method for classification, parallel algorithms for image segmentation, etc., were employed. Using the feature vector database that have been efficiently constructed, one can perform various data mining tasks like clustering, classification, etc. with efficient algorithms along with image mining given a query image. All these facilities are included in the framework that is supported by state-of-the-art user interface (UI). The algorithms were tested with actual patient data and Coral image database and the results show that their performance is better than the results reported already.

Keywords: active contour, Bayesian, echocardiographic image, feature vector

Procedia PDF Downloads 417
24610 Additive Weibull Model Using Warranty Claim and Finite Element Analysis Fatigue Analysis

Authors: Kanchan Mondal, Dasharath Koulage, Dattatray Manerikar, Asmita Ghate

Abstract:

This paper presents an additive reliability model using warranty data and Finite Element Analysis (FEA) data. Warranty data for any product gives insight to its underlying issues. This is often used by Reliability Engineers to build prediction model to forecast failure rate of parts. But there is one major limitation in using warranty data for prediction. Warranty periods constitute only a small fraction of total lifetime of a product, most of the time it covers only the infant mortality and useful life zone of a bathtub curve. Predicting with warranty data alone in these cases is not generally provide results with desired accuracy. Failure rate of a mechanical part is driven by random issues initially and wear-out or usage related issues at later stages of the lifetime. For better predictability of failure rate, one need to explore the failure rate behavior at wear out zone of a bathtub curve. Due to cost and time constraints, it is not always possible to test samples till failure, but FEA-Fatigue analysis can provide the failure rate behavior of a part much beyond warranty period in a quicker time and at lesser cost. In this work, the authors proposed an Additive Weibull Model, which make use of both warranty and FEA fatigue analysis data for predicting failure rates. It involves modeling of two data sets of a part, one with existing warranty claims and other with fatigue life data. Hazard rate base Weibull estimation has been used for the modeling the warranty data whereas S-N curved based Weibull parameter estimation is used for FEA data. Two separate Weibull models’ parameters are estimated and combined to form the proposed Additive Weibull Model for prediction.

Keywords: bathtub curve, fatigue, FEA, reliability, warranty, Weibull

Procedia PDF Downloads 46
24609 The Effect of Different Strength Training Methods on Muscle Strength, Body Composition and Factors Affecting Endurance Performance

Authors: Shaher A. I. Shalfawi, Fredrik Hviding, Bjornar Kjellstadli

Abstract:

The main purpose of this study was to measure the effect of two different strength training methods on muscle strength, muscle mass, fat mass and endurance factors. Fourteen physical education students accepted to participate in this study. The participants were then randomly divided into three groups, traditional training group (TTG), cluster training group (CTG) and control group (CG). TTG consisted of 4 participants aged ( ± SD) (22.3 ± 1.5 years), body mass (79.2 ± 15.4 kg) and height (178.3 ± 11.9 cm). CTG consisted of 5 participants aged (22.2 ± 3.5 years), body mass (81.0 ± 24.0 kg) and height (180.2 ± 12.3 cm). CG consisted of 5 participants aged (22 ± 2.8 years), body mass (77 ± 19 kg) and height (174 ± 6.7 cm). The participants underwent a hypertrophy strength training program twice a week consisting of 4 sets of 10 reps at 70% of one-repetition maximum (1RM), using barbell squat and barbell bench press for 8 weeks. The CTG performed 2 x 5 reps using 10 s recovery in between repetitions and 50 s recovery between sets, while TTG performed 4 sets of 10 reps with 90 s recovery in between sets. Pre- and post-tests were administrated to assess body composition (weight, muscle mass, and fat mass), 1RM (bench press and barbell squat) and a laboratory endurance test (Bruce Protocol). Instruments used to collect the data were Tanita BC-601 scale (Tanita, Illinois, USA), Woodway treadmill (Woodway, Wisconsin, USA) and Vyntus CPX breath-to-breath system (Jaeger, Hoechberg, Germany). Analysis was conducted at all measured variables including time to peak VO2, peak VO2, heart rate (HR) at peak VO2, respiratory exchange ratio (RER) at peak VO2, and number of breaths per minute. The results indicate an increase in 1RM performance after 8 weeks of training. The change in 1RM squat was for the TTG = 30 ± 3.8 kg, CTG = 28.6 ± 8.3 kg and CG = 10.3 ± 13.8 kg. Similarly, the change in 1RM bench press was for the TTG = 9.8 ± 2.8 kg, CTG = 7.4 ± 3.4 kg and CG = 4.4 ± 3.4 kg. The within-group analysis from the oxygen consumption measured during the incremental exercise indicated that the TTG had only a statistical significant increase in their RER from 1.16 ± 0.04 to 1.23 ± 0.05 (P < 0.05). The CTG had a statistical significant improvement in their HR at peak VO2 from 186 ± 24 to 191 ± 12 Beats Per Minute (P < 0.05) and their RER at peak VO2 from 1.11 ± 0.06 to 1.18 ±0.05 (P < 0.05). Finally, the CG had only a statistical significant increase in their RER at peak VO2 from 1.11 ± 0.07 to 1.21 ± 0.05 (P < 0.05). The between-group analysis showed no statistical differences between all groups in all the measured variables from the oxygen consumption test during the incremental exercise including changes in muscle mass, fat mass, and weight (kg). The results indicate a similar effect of hypertrophy strength training irrespective of the methods of the training used on untrained subjects. Because there were no notable changes in body-composition measures, the results suggest that the improvements in performance observed in all groups is most probably due to neuro-muscular adaptation to training.

Keywords: hypertrophy strength training, cluster set, Bruce protocol, peak VO2

Procedia PDF Downloads 225
24608 Liver Lesion Extraction with Fuzzy Thresholding in Contrast Enhanced Ultrasound Images

Authors: Abder-Rahman Ali, Adélaïde Albouy-Kissi, Manuel Grand-Brochier, Viviane Ladan-Marcus, Christine Hoeffl, Claude Marcus, Antoine Vacavant, Jean-Yves Boire

Abstract:

In this paper, we present a new segmentation approach for focal liver lesions in contrast enhanced ultrasound imaging. This approach, based on a two-cluster Fuzzy C-Means methodology, considers type-II fuzzy sets to handle uncertainty due to the image modality (presence of speckle noise, low contrast, etc.), and to calculate the optimum inter-cluster threshold. Fine boundaries are detected by a local recursive merging of ambiguous pixels. The method has been tested on a representative database. Compared to both Otsu and type-I Fuzzy C-Means techniques, the proposed method significantly reduces the segmentation errors.

Keywords: defuzzification, fuzzy clustering, image segmentation, type-II fuzzy sets

Procedia PDF Downloads 456
24607 Developing Pavement Structural Deterioration Curves

Authors: Gregory Kelly, Gary Chai, Sittampalam Manoharan, Deborah Delaney

Abstract:

A Structural Number (SN) can be calculated for a road pavement from the properties and thicknesses of the surface, base course, sub-base, and subgrade. Historically, the cost of collecting structural data has been very high. Data were initially collected using Benkelman Beams and now by Falling Weight Deflectometer (FWD). The structural strength of pavements weakens over time due to environmental and traffic loading factors, but due to a lack of data, no structural deterioration curve for pavements has been implemented in a Pavement Management System (PMS). International Roughness Index (IRI) is a measure of the road longitudinal profile and has been used as a proxy for a pavement’s structural integrity. This paper offers two conceptual methods to develop Pavement Structural Deterioration Curves (PSDC). Firstly, structural data are grouped in sets by design Equivalent Standard Axles (ESA). An ‘Initial’ SN (ISN), Intermediate SN’s (SNI) and a Terminal SN (TSN), are used to develop the curves. Using FWD data, the ISN is the SN after the pavement is rehabilitated (Financial Accounting ‘Modern Equivalent’). Intermediate SNIs, are SNs other than the ISN and TSN. The TSN was defined as the SN of the pavement when it was approved for pavement rehabilitation. The second method is to use Traffic Speed Deflectometer data (TSD). The road network already divided into road blocks, is grouped by traffic loading. For each traffic loading group, road blocks that have had a recent pavement rehabilitation, are used to calculate the ISN and those planned for pavement rehabilitation to calculate the TSN. The remaining SNs are used to complete the age-based or if available, historical traffic loading-based SNI’s.

Keywords: conceptual, pavement structural number, pavement structural deterioration curve, pavement management system

Procedia PDF Downloads 520
24606 Mutual Information Based Image Registration of Satellite Images Using PSO-GA Hybrid Algorithm

Authors: Dipti Patra, Guguloth Uma, Smita Pradhan

Abstract:

Registration is a fundamental task in image processing. It is used to transform different sets of data into one coordinate system, where data are acquired from different times, different viewing angles, and/or different sensors. The registration geometrically aligns two images (the reference and target images). Registration techniques are used in satellite images and it is important in order to be able to compare or integrate the data obtained from these different measurements. In this work, mutual information is considered as a similarity metric for registration of satellite images. The transformation is assumed to be a rigid transformation. An attempt has been made here to optimize the transformation function. The proposed image registration technique hybrid PSO-GA incorporates the notion of Particle Swarm Optimization and Genetic Algorithm and is used for finding the best optimum values of transformation parameters. The performance comparision obtained with the experiments on satellite images found that the proposed hybrid PSO-GA algorithm outperforms the other algorithms in terms of mutual information and registration accuracy.

Keywords: image registration, genetic algorithm, particle swarm optimization, hybrid PSO-GA algorithm and mutual information

Procedia PDF Downloads 379
24605 Generation of Knowlege with Self-Learning Methods for Ophthalmic Data

Authors: Klaus Peter Scherer, Daniel Knöll, Constantin Rieder

Abstract:

Problem and Purpose: Intelligent systems are available and helpful to support the human being decision process, especially when complex surgical eye interventions are necessary and must be performed. Normally, such a decision support system consists of a knowledge-based module, which is responsible for the real assistance power, given by an explanation and logical reasoning processes. The interview based acquisition and generation of the complex knowledge itself is very crucial, because there are different correlations between the complex parameters. So, in this project (semi)automated self-learning methods are researched and developed for an enhancement of the quality of such a decision support system. Methods: For ophthalmic data sets of real patients in a hospital, advanced data mining procedures seem to be very helpful. Especially subgroup analysis methods are developed, extended and used to analyze and find out the correlations and conditional dependencies between the structured patient data. After finding causal dependencies, a ranking must be performed for the generation of rule-based representations. For this, anonymous patient data are transformed into a special machine language format. The imported data are used as input for algorithms of conditioned probability methods to calculate the parameter distributions concerning a special given goal parameter. Results: In the field of knowledge discovery advanced methods and applications could be performed to produce operation and patient related correlations. So, new knowledge was generated by finding causal relations between the operational equipment, the medical instances and patient specific history by a dependency ranking process. After transformation in association rules logically based representations were available for the clinical experts to evaluate the new knowledge. The structured data sets take account of about 80 parameters as special characteristic features per patient. For different extended patient groups (100, 300, 500), as well one target value as well multi-target values were set for the subgroup analysis. So the newly generated hypotheses could be interpreted regarding the dependency or independency of patient number. Conclusions: The aim and the advantage of such a semi-automatically self-learning process are the extensions of the knowledge base by finding new parameter correlations. The discovered knowledge is transformed into association rules and serves as rule-based representation of the knowledge in the knowledge base. Even more, than one goal parameter of interest can be considered by the semi-automated learning process. With ranking procedures, the most strong premises and also conjunctive associated conditions can be found to conclude the interested goal parameter. So the knowledge, hidden in structured tables or lists can be extracted as rule-based representation. This is a real assistance power for the communication with the clinical experts.

Keywords: an expert system, knowledge-based support, ophthalmic decision support, self-learning methods

Procedia PDF Downloads 232
24604 A General Framework for Knowledge Discovery Using High Performance Machine Learning Algorithms

Authors: S. Nandagopalan, N. Pradeep

Abstract:

The aim of this paper is to propose a general framework for storing, analyzing, and extracting knowledge from two-dimensional echocardiographic images, color Doppler images, non-medical images, and general data sets. A number of high performance data mining algorithms have been used to carry out this task. Our framework encompasses four layers namely physical storage, object identification, knowledge discovery, user level. Techniques such as active contour model to identify the cardiac chambers, pixel classification to segment the color Doppler echo image, universal model for image retrieval, Bayesian method for classification, parallel algorithms for image segmentation, etc., were employed. Using the feature vector database that have been efficiently constructed, one can perform various data mining tasks like clustering, classification, etc. with efficient algorithms along with image mining given a query image. All these facilities are included in the framework that is supported by state-of-the-art user interface (UI). The algorithms were tested with actual patient data and Coral image database and the results show that their performance is better than the results reported already.

Keywords: active contour, bayesian, echocardiographic image, feature vector

Procedia PDF Downloads 391
24603 Optimized Preprocessing for Accurate and Efficient Bioassay Prediction with Machine Learning Algorithms

Authors: Jeff Clarine, Chang-Shyh Peng, Daisy Sang

Abstract:

Bioassay is the measurement of the potency of a chemical substance by its effect on a living animal or plant tissue. Bioassay data and chemical structures from pharmacokinetic and drug metabolism screening are mined from and housed in multiple databases. Bioassay prediction is calculated accordingly to determine further advancement. This paper proposes a four-step preprocessing of datasets for improving the bioassay predictions. The first step is instance selection in which dataset is categorized into training, testing, and validation sets. The second step is discretization that partitions the data in consideration of accuracy vs. precision. The third step is normalization where data are normalized between 0 and 1 for subsequent machine learning processing. The fourth step is feature selection where key chemical properties and attributes are generated. The streamlined results are then analyzed for the prediction of effectiveness by various machine learning algorithms including Pipeline Pilot, R, Weka, and Excel. Experiments and evaluations reveal the effectiveness of various combination of preprocessing steps and machine learning algorithms in more consistent and accurate prediction.

Keywords: bioassay, machine learning, preprocessing, virtual screen

Procedia PDF Downloads 253