Search results for: Data analysis
12992 Towards End-To-End Disease Prediction from Raw Metagenomic Data
Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker
Abstract:
Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as FASTQ files. Conventional processing pipelines consist of multiple steps including quality control, filtering, and alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often introduce variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps: (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come; and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of each genome on the prediction.
Using two public real-life datasets as well as a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data through mainstream bioinformatics workflows. These results are encouraging for this proof-of-concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.
Keywords: Metagenomics, phenotype prediction, deep learning, embeddings, multiple instance learning.
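Step (i) of the abstract above, generating a vocabulary of k-mers from raw reads, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the choice to skip k-mers containing ambiguous bases, and the two toy reads are all assumptions made for illustration, and the embedding-learning part of the step is not shown.

```python
from collections import Counter


def kmers(read: str, k: int) -> list[str]:
    """Return the overlapping k-mers of a read.

    Assumption: k-mers containing anything other than A, C, G, T
    (for example the ambiguous base N) are skipped.
    """
    read = read.upper()
    valid = set("ACGT")
    return [
        read[i:i + k]
        for i in range(len(read) - k + 1)
        if set(read[i:i + k]) <= valid
    ]


def build_vocabulary(reads: list[str], k: int) -> Counter:
    """Count how often each k-mer occurs across all reads."""
    vocabulary = Counter()
    for read in reads:
        vocabulary.update(kmers(read, k))
    return vocabulary


# Hypothetical toy reads; real input would come from FASTQ files.
reads = ["ACGTAC", "CGTNAC"]
vocabulary = build_vocabulary(reads, k=3)
print(vocabulary.most_common())
```

The resulting counts would then serve as the token vocabulary for an embedding model, in the same way that words serve as tokens in word-embedding methods.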
Downloads 918
12991 Two DEA Based Ant Algorithms for CMS Problems
Authors: Hossein Ali Akbarpour, Fatemeh Dadkhah
Abstract:
This paper considers a multi-criteria cell formation problem in a Cellular Manufacturing System (CMS). The two proposed objective functions simultaneously minimize the number of voids and the number of exceptional elements in cells. This problem is NP-hard according to the literature, and therefore we cannot find the optimal solution by an exact method. In this paper we developed two ant algorithms, Ant Colony Optimization (ACO) and Max-Min Ant System (MMAS), based on Data Envelopment Analysis (DEA). Both try to find efficient solutions based on the efficiency concept in DEA. Each artificial ant is considered as a Decision Making Unit (DMU). For each DMU we considered two inputs, the values of the objective functions, and one output, set to the value of one for all of them. In order to evaluate the performance of the proposed methods we provided an experimental design with empirical problems in three different sizes: small, medium and large. We defined three different criteria that show which algorithm has the best performance.
Keywords: Ant algorithm, Cellular manufacturing system, Data envelopment analysis, Efficiency
Downloads 1657
12990 An Advanced Time-Frequency Domain Method for PD Extraction with Non-Intrusive Measurement
Authors: Guomin Luo, Daming Zhang, Yong Kwee Koh, Kim Teck Ng, Helmi Kurniawan, Weng Hoe Leong
Abstract:
Partial discharge (PD) detection is an important method to evaluate the insulation condition of metal-clad apparatus. Non-intrusive sensors, which are easy to install and do not interrupt operation, are preferred in onsite PD detection. However, such detection often lacks accuracy due to interference in the PD signals. In this paper a novel PD extraction method that uses frequency analysis and entropy-based time-frequency (TF) analysis is introduced. The repetitive pulses from the converter are first removed via frequency analysis. Then, the relative entropy and relative peak frequency of each pulse (i.e. time-indexed vector TF spectrum) are calculated and all pulses with similar parameters are grouped. According to the characteristics of the non-intrusive sensor and the frequency distribution of PDs, the pulses of PD and interference are separated. Finally, the PD signal and the interference are recovered via the inverse TF transform. The de-noised result of noisy PD data demonstrates that the combination of frequency and time-frequency techniques can discriminate PDs from interference with various frequency distributions.
Keywords: Entropy, Fourier analysis, non-intrusive measurement, time-frequency analysis, partial discharge
Downloads 1594
12989 Delay Analysis of Sampled-Data Systems in Hard RTOS
Authors: A. M. Azad, M. Alam, C. M. Hussain
Abstract:
In this paper, we present the effect of varying time delays on performance and stability in a single-channel multirate sampled-data system in a hard real-time (RT-Linux) environment. The sampling task requires a response time that might exceed the capacity of RT-Linux. A straightforward implementation with RT-Linux is therefore not feasible because of the latency of the system, and hence the sampling period should be smaller to handle this task. The best sampling rate chosen for the sampled-data system is the slowest rate that meets all performance requirements. RT-Linux is consistent with its specifications, and the real-time resolution is taken to be 0.01 seconds to achieve an efficient result. The test results of our laboratory experiment show that the multi-rate control technique in a hard real-time operating system (RTOS) can improve the stability problem caused by random access delays and asynchronization.
Keywords: Multi-rate, PID, RT-Linux, Sampled-data, Servo.
Downloads 1447
12988 Bridge Health Monitoring: A Review
Authors: Mohammad Bakhshandeh
Abstract:
Structural Health Monitoring (SHM) is a crucial and necessary practice that plays a vital role in ensuring the safety and integrity of critical structures, and in particular, bridges. The continuous monitoring of bridges for signs of damage or degradation through Bridge Health Monitoring (BHM) enables early detection of potential problems, allowing for prompt corrective action to be taken before significant damage occurs. Although all monitoring techniques aim to provide accurate and decisive information regarding the remaining useful life, safety, integrity, and serviceability of bridges, understanding the development and propagation of damage is vital for maintaining uninterrupted bridge operation. Over the years, extensive research has been conducted on BHM methods, and experts in the field have increasingly adopted new methodologies. In this article, we provide a comprehensive exploration of the various BHM approaches, including sensor-based, non-destructive testing (NDT), model-based, and artificial intelligence (AI)-based methods. We also discuss the challenges associated with BHM, including sensor placement and data acquisition, data analysis and interpretation, cost and complexity, and environmental effects, through an extensive review of relevant literature and research studies. Additionally, we examine potential solutions to these challenges and propose future research ideas to address critical gaps in BHM.
Keywords: Structural health monitoring, bridge health monitoring, sensor-based methods, machine-learning algorithms, model-based techniques, sensor placement, data acquisition, data analysis.
Downloads 332
12987 Time-Domain Analysis of Pulse Parameters Effects on Crosstalk (In High Speed Circuits)
Authors: L. Tani, N. El Ouzzani
Abstract:
Crosstalk among interconnects and printed-circuit board (PCB) traces is a major limiting factor of signal quality in high-speed digital and communication equipment, especially when fast data buses are involved. Such a bus is considered as a planar multiconductor transmission line. This paper will demonstrate how the finite difference time domain (FDTD) method provides an exact solution of the transmission-line equations to analyze the near-end and far-end crosstalk. In addition, this study makes it possible to analyze the rise time effect on the near-end and far-end voltages of the victim conductor. The paper also discusses a statistical analysis based upon a set of several simulations. Such analysis leads to a better understanding of the phenomenon and yields useful information.
Keywords: Multiconductor transmission line, Crosstalk, Finite difference time domain (FDTD), printed-circuit board (PCB), Rise time, Statistical analysis.
Downloads 1776
12986 Elemental Graph Data Model: A Semantic and Topological Representation of Building Elements
Authors: Yasmeen A. S. Essawy, Khaled Nassar
Abstract:
With the rapid increase of complexity in the building industry, professionals in the A/E/C industry were forced to adopt Building Information Modeling (BIM) in order to enhance the communication between the different project stakeholders throughout the project life cycle and create a semantic object-oriented building model that can support geometric-topological analysis of building elements during design and construction. This paper presents a model that extracts topological relationships and geometrical properties of building elements from an existing fully designed BIM, and maps this information into a directed acyclic Elemental Graph Data Model (EGDM). The model incorporates BIM-based search algorithms for automatic deduction of geometrical data and topological relationships for each building element type. Using graph search algorithms, such as Depth First Search (DFS) and topological sorting, all possible construction sequences can be generated and compared against production and construction rules to generate an optimized construction sequence and its associated schedule. The model is implemented on a C# platform.
Keywords: Building information modeling, elemental graph data model, geometric and topological data models, graph theory.
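The idea of deriving a construction sequence from a directed acyclic graph of building elements can be sketched with a topological sort. The Python example below uses Kahn's algorithm rather than the DFS-based search the abstract mentions, and it returns a single valid sequence instead of enumerating all possible ones. The element names and the "must be built before" edges are hypothetical and are not taken from the paper.

```python
from collections import deque


def topological_order(precedes: dict[str, list[str]]) -> list[str]:
    """Return one valid ordering of a DAG using Kahn's algorithm.

    precedes[a] lists the elements that can only be built after a.
    """
    indegree = {node: 0 for node in precedes}
    for successors in precedes.values():
        for successor in successors:
            indegree[successor] = indegree.get(successor, 0) + 1

    # Start with the elements that have no prerequisites.
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for successor in precedes.get(node, []):
            indegree[successor] -= 1
            if indegree[successor] == 0:
                ready.append(successor)

    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle; no valid sequence exists")
    return order


# Hypothetical element graph: footing -> columns -> beam -> slab.
element_graph = {
    "footing": ["column_a", "column_b"],
    "column_a": ["beam"],
    "column_b": ["beam"],
    "beam": ["slab"],
    "slab": [],
}
sequence = topological_order(element_graph)
print(sequence)
```

Any ordering produced this way respects every precedence edge; comparing candidate orderings against production and construction rules, as the abstract describes, would be a separate step.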
Downloads 1207
12985 Constructing a Bayesian Network for Solar Energy in Egypt Using Life Cycle Analysis and Machine Learning Algorithms
Authors: Rawaa H. El-Bidweihy, Hisham M. Abdelsalam, Ihab A. El-Khodary
Abstract:
In an era where machines run and shape our world, the need for a stable, non-ending source of energy emerges. This study focuses on solar energy in Egypt as a renewable source. The most important factors that could affect solar energy’s market share throughout its life cycle production were analyzed and filtered, and the relationships between them were derived before structuring a Bayesian network. Forecasting models were also built for multiple factors to predict their states in Egypt by 2035, based on historical data and patterns, to be used as the nodes’ states in the network. Thirty-seven factors were found that might have an impact on the use of solar energy; these were then reduced to the 12 factors judged to be the most influential on solar energy’s life cycle in Egypt, based on surveying experts and data analysis. Some of the factors were found to recur in multiple stages. The presented Bayesian network could be used later for scenario and decision analysis of using solar energy in Egypt as a stable renewable source for generating any type of energy needed.
Keywords: ARIMA, auto correlation, Bayesian network, forecasting models, life cycle, partial correlation, renewable energy, SARIMA, solar energy.
Downloads 786
12984 An SVM based Classification Method for Cancer Data using Minimum Microarray Gene Expressions
Authors: R. Mallika, V. Saravanan
Abstract:
This paper gives a novel method for improving the performance of cancer classification with very few microarray gene expression data. The method employs classification with individual gene ranking and gene subset ranking. For selection and classification, the proposed method uses the same classifier. The method is applied to three publicly available cancer gene expression datasets: Lymphoma, Liver and Leukaemia. Three different classifiers, namely Support vector machines-one against all (SVM-OAA), K nearest neighbour (KNN) and Linear Discriminant analysis (LDA), were tested, and the results indicate improved performance of the SVM-OAA classifier, with satisfactory results on all three datasets when compared with the other two classifiers.
Keywords: Support vector machines-one against all, cancer classification, Linear Discriminant analysis, K nearest neighbour, microarray gene expression, gene pair ranking.
Downloads 2565
12983 Analytical Slope Stability Analysis Based on the Statistical Characterization of Soil Shear Strength
Authors: Bernardo C. P. Albuquerque, Darym J. F. Campos
Abstract:
Increasing our ability to solve complex engineering problems is directly related to the processing capacity of computers. By means of such equipment, one is able to run numerical algorithms quickly and accurately. Besides the increasing interest in numerical simulations, probabilistic approaches are also of great importance. In this way, statistical tools have shown their relevance to the modelling of practical engineering problems. In general, statistical approaches to such problems consider that the random variables involved follow a normal distribution. This assumption tends to provide incorrect results when skewed data are present, since normal distributions are symmetric about their means. Thus, in order to visualize and quantify this aspect, 9 statistical distributions (symmetric and skew) have been considered to model a hypothetical slope stability problem. The data modeled is the friction angle of a superficial soil in Brasilia, Brazil. Despite its apparent universality, the normal distribution did not qualify as the best fit. In the present effort, data obtained in consolidated-drained triaxial tests and saturated direct shear tests have been modeled and used to analytically derive the probability density function (PDF) of the safety factor of a hypothetical slope based on the Mohr-Coulomb rupture criterion. Therefore, based on this analysis, it is possible to explicitly derive the failure probability considering the friction angle as a random variable. Furthermore, it is possible to compare the stability analysis when the friction angle is modelled as a Dagum distribution (the distribution that presented the best fit to the histogram) and as a Normal distribution. This comparison reveals relevant differences when analyzed in light of risk management.
Keywords: Statistical slope stability analysis, Skew distributions, Probability of failure, Functions of random variables.
Downloads 1549
12982 Methodology Issues and Design Approach of VLE on Mathematical Concepts Acquisition within Secondary Education in England
Authors: Aaron A. R. Nwabude
Abstract:
This study used a positivist quantitative approach to examine the mathematical concepts acquisition of KS4 (14-16) Special Education Needs (SENs) students within the school sector education in England. The research is based on a pilot study and the design is completely holistic in its approach, mixing methodologies. The study combines qualitative and quantitative methods in gathering formative data for the design process. Although the approach could best be described as a mixed method, it is fundamentally grounded in a strong positivist paradigm, hence my earlier understanding of the differentiation of the students, the student-teacher body and the various indicators being measured, which will require an attenuated description of individual research subjects. The design process involves four phases with five key stages: literature review and document analysis; the survey; interview; observation; and finally the analysis of the data set. The research identified the need for triangulation, with Reid's phases of data management providing a scaffold for the study. The study clearly identified the ideological and philosophical aspects of educational research design for the study of mathematics by special education needs (SENs) students in England using the virtual learning environment (VLE) platform.
Keywords: VLE, Special Education Needs, Key Stage 4, School, Mathematics, Concepts Acquisition
Downloads 1983
12981 Social Network Analysis & Information Disclosure: A Case Study
Authors: Shilpi Sharma, J. S. Sodhi
Abstract:
The advent of social networking technologies has been met with mixed reactions in academic and corporate circles around the world. This study explored the influence of social networks in the current era and the relation maintained between a social networking site and its users in terms of extent of use, benefits and latest technologies. The study followed a descriptive research design wherein a questionnaire was used as the main research tool. The data collected was analyzed using SPSS 16. Data was gathered from 1205 users and analyzed in accordance with the objectives of the study. The analysis of the results seems to suggest that the majority of users were mainly using Facebook. Despite concerns raised about the disclosure of personal information on social network sites, users continue to disclose large quantities of personal information; they find that reading privacy policies is time consuming and that changes made can result in improper settings.
Keywords: Social Networking Sites, Privacy Policy, Disclosure of Personal Information.
Downloads 2064
12980 Scale Development for Measuring E-Service Quality in Banking
Authors: Vivek Agrawal, Vikas Tripathi, Nitin Seth
Abstract:
This study examines several critical dimensions of e-service quality overlooked in the existing literature and proposes a model and instrument framework for measuring customer perceived e-service quality in the banking sector. The initial design was derived by content analysis from a pool of instrument dimensions and their items in the existing literature. Based on focus group discussion, nine dimensions were extracted. An exploratory factor analysis approach was applied to data from a survey of 323 respondents. The instrument has been designed specifically for the banking sector. Research data was collected from bank customers who use electronic banking in a developing economy. A nine-factor instrument has been proposed to measure e-service quality. The instrument has been checked for reliability. The validity and sample place limited the applicability of the instrument across economies and service categories. Future research must be conducted to check the validity. This instrument can help bankers in developing economies like India to measure e-service quality and make improvements. The present study offers a systematic procedure that provides insights into the conceptual and empirical comprehension of customer perceived e-service quality and its constituents.
Keywords: Testing, instrument, e-service quality, factor analysis.
Downloads 3843
12979 Preliminary Overview of Data Mining Technology for Knowledge Management System in Institutions of Higher Learning
Authors: Muslihah Wook, Zawiyah M. Yusof, Mohd Zakree Ahmad Nazri
Abstract:
Data mining has been integrated into application systems to enhance the quality of the decision-making process. This study focuses on the integration of data mining technology and Knowledge Management System (KMS), due to the ability of data mining technology to create useful knowledge from large volumes of data. Meanwhile, KMS vitally supports the creation and use of knowledge. The integration of data mining technology and KMS is popularly used in business for enhancing and sustaining organizational performance. However, there is a lack of studies that apply data mining technology and KMS in the education sector, particularly to students' academic performance, which could reflect the performance of Institutions of Higher Learning (IHLs). Realizing its importance, this study seeks to integrate data mining technology and KMS to promote effective management of knowledge within IHLs. Several concepts from the literature are adapted to propose a new integrative data mining technology and KMS framework for an IHL.
Keywords: Data mining, Institutions of Higher Learning, Knowledge Management System, Students' academic performance.
Downloads 2146
12978 Towards a Secure Storage in Cloud Computing
Authors: Mohamed Elkholy, Ahmed Elfatatry
Abstract:
Cloud computing has emerged as a flexible computing paradigm that reshaped the Information Technology map. However, cloud computing brought about a number of security challenges as a result of the physical distribution of computational resources and the limited control that users have over the physical storage. This situation raises many security challenges for data integrity and confidentiality as well as authentication and access control. This work proposes a security mechanism for data integrity that allows a data owner to be aware of any modification that takes place to his data. The data integrity mechanism is integrated with an extended Kerberos authentication that ensures authorized access control. The proposed mechanism protects data confidentiality even if data are stored on an untrusted storage. The proposed mechanism has been evaluated against different types of attacks and proved its efficiency to protect cloud data storage from different malicious attacks.
Keywords: Access control, data integrity, data confidentiality, Kerberos authentication, cloud security.
Downloads 1774
12977 Application of Data Envelopment Analysis and Performance Indicators to Irrigation Systems in Thessaloniki Plain (Greece)
Authors: Ntantos P.N, Karpouzos D.K
Abstract:
In this paper, a benchmarking framework is presented for the performance assessment of irrigation systems. Firstly, a data envelopment analysis (DEA) is applied to measure the technical efficiency of irrigation systems. This method, based on linear programming, aims to determine a consistent efficiency ranking of irrigation systems in which known inputs, such as water volume supplied and total irrigated area, and a given output corresponding to the total value of irrigation production are taken into account simultaneously. Secondly, in order to examine the irrigation efficiency in more detail, a cross-system comparison is elaborated using a set of performance indicators selected by IWMI. The above methodologies were applied in the Thessaloniki plain, located in Northern Greece, and the results of the application are presented and discussed. The conjunctive use of DEA and performance indicators seems to be a very useful tool for efficiency assessment and identification of best practices in irrigation systems management.
Keywords: Benchmarking, D.E.A, Performance Indicators, Irrigation systems.
Downloads 2101
12976 Freighter Aircraft Selection Using Entropic Programming for Multiple Criteria Decision Making Analysis
Authors: C. Ardil
Abstract:
This paper proposes entropic programming for the freighter aircraft selection problem using the multiple criteria decision analysis method. The study aims to propose a systematic and comprehensive framework by focusing on the perspective of freighter aircraft selection. In order to achieve this goal, an integrated entropic programming approach was proposed to evaluate and rank alternatives. The decision criteria and aircraft alternatives were identified from the research data analysis. The objective criteria weights were determined by the mean weight method and the standard deviation method. The proposed entropic programming model was applied to a practical decision problem for evaluating and selecting freighter aircraft. The proposed entropic programming technique gives robust, reliable, and efficient results in modeling decision making analysis problems. As a result of entropic programming analysis, Boeing B747-8F, a freighter aircraft alternative (a3), was chosen as the most suitable freighter aircraft candidate.
Keywords: entropic programming, additive weighted model, multiple criteria decision making analysis, MCDMA, TOPSIS, aircraft selection, freighter aircraft, Boeing B747-8F, Boeing B777F, Airbus A350F
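The two objective weighting methods named in the abstract above, the mean weight method and the standard deviation method, can be sketched together with a simple additive weighted model, which appears in the keywords. The Python sketch below is an illustration under stated assumptions and is not the paper's entropic programming model: the decision matrix is hypothetical, min-max normalization is assumed, and every criterion is treated as a benefit criterion (larger is better).

```python
from statistics import pstdev


def min_max_normalize(matrix: list[list[float]]) -> list[list[float]]:
    """Rescale each criterion (column) to the range [0, 1]."""
    normalized_columns = []
    for column in zip(*matrix):
        low, high = min(column), max(column)
        normalized_columns.append(
            [(v - low) / (high - low) if high > low else 0.0 for v in column]
        )
    return [list(row) for row in zip(*normalized_columns)]


def mean_weights(n_criteria: int) -> list[float]:
    """Mean weight method: every criterion gets the same weight."""
    return [1.0 / n_criteria] * n_criteria


def std_dev_weights(normalized: list[list[float]]) -> list[float]:
    """Standard deviation method: weight proportional to each column's spread."""
    spreads = [pstdev(column) for column in zip(*normalized)]
    total = sum(spreads)
    return [s / total for s in spreads]


def weighted_scores(normalized: list[list[float]], weights: list[float]) -> list[float]:
    """Additive weighted model: weighted sum of normalized values per alternative."""
    return [sum(w * v for w, v in zip(weights, row)) for row in normalized]


# Hypothetical decision matrix: rows are alternatives a1..a3,
# columns are three benefit criteria (values are illustrative only).
decision_matrix = [
    [100.0, 8.0, 3.0],
    [140.0, 6.0, 5.0],
    [120.0, 9.0, 4.0],
]
normalized = min_max_normalize(decision_matrix)
weights = std_dev_weights(normalized)
scores = weighted_scores(normalized, weights)
best = max(range(len(scores)), key=scores.__getitem__)
print(weights, scores, f"best alternative: a{best + 1}")
```

Swapping `std_dev_weights(normalized)` for `mean_weights(3)` shows how sensitive the ranking is to the weighting method.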
Downloads 552
12975 Dynamic Metadata Schemes in the Neutron and Photon Science Communities: A Case Study of X-Ray Photon Correlation Spectroscopy
Authors: Amir Tosson, Mohammad Reza, Christian Gutt
Abstract:
Metadata is one of the most important aspects for advancing data management practices within all research communities. Definitions and schemes of metadata are inter alia of particular significance in the domain of neutron and photon scattering experiments covering a broad area of different scientific disciplines. The demand of describing continuously evolving highly non-standardized experiments, including the resulting processed and published data, constitutes a considerable challenge for a static definition of metadata. Here, we present the concept of dynamic metadata for the neutron and photon scientific community, which enriches a static set of defined basic metadata. We explore the idea of dynamic metadata with the help of the use case of X-ray Photon Correlation Spectroscopy (XPCS), which is a synchrotron-based scattering technique that allows the investigation of nanoscale dynamic processes. It serves here as a demonstrator of how dynamic metadata can improve data acquisition, sharing, and analysis workflows. Our approach enables researchers to tailor metadata definitions dynamically and adapt them to the evolving demands of describing data and results from a diverse set of experiments. We demonstrate that dynamic metadata standards yield advantages that enhance data reproducibility, interoperability, and the dissemination of knowledge.
Keywords: Big data, metadata, schemas, XPCS, X-ray Photon Correlation Spectroscopy.
Downloads 160
12974 A Software Tool Design for Cerebral Infarction of MR Images
Authors: Kyoung-Jong Park, Woong-Gi Jeon, Hee-Cheol Kim, Dong-Eog Kim, Heung-Kook Choi
Abstract:
The brain MR imaging-based clinical research and analysis system was specifically built, and development for large-scale data was targeted. We used the general clinical data available for building large-scale data. Registration period for the selection of the lesion ROI and the region growing algorithm was used and the Mesh-warp algorithm for matching was implemented. The accuracy of the matching errors was modified individually. Also, large ROI research data can be accumulated by our developed compression method. In this way, correct decision criteria for the research result were suggested. The experimental groups were age, sex, MR type, patient ID and smoking, which can easily be queried. The result data was visualized as overlapped images using a color table. Its data was calculated by the statistical package. The evaluation of the utilization of this system in the chronic ischemic damage in the area was done on patients with acute cerebral infarction. This is the cause of neurologic disability index location in the center portion of the lateral ventricle facing. The corona radiata was found in the position. Finally, the system reliability was measured by both inter-user and intra-user registering correlation.
Keywords: Software tool design, Cerebral infarction, Brain MR image, Registration
Downloads 1667
12973 Banks Profitability Indicators in CEE Countries
Abstract:
The aim of the present article is to determine the impact of the external and internal factors of bank performance on the profitability indicators of the CEE countries banks in the period from 2006 to 2012. On the basis of research conducted abroad on bank and macroeconomic profitability indicators, in order to obtain research results, the authors evaluated return on average assets (ROAA) and return on average equity (ROAE) indicators of the CEE countries banks. The authors analyzed profitability indicators of banks using descriptive methods, SPSS data analysis methods, as well as data correlation and linear regression analysis. The authors concluded that most internal and external indicators of bank performance have no direct influence on the profitability of the banks in the CEE countries. The only exceptions are credit risk and bank size, which affect one of the measures of bank profitability: return on average equity.
Keywords: Banks, CEE countries, Profitability ROAA, ROAE.
Downloads 2665
12972 Mining Genes Relations in Microarray Data Combined with Ontology in Colon Cancer Automated Diagnosis System
Authors: A. Gruzdz, A. Ihnatowicz, J. Siddiqi, B. Akhgar
Abstract:
The MATCH project [1] entails the development of an automatic diagnosis system that aims to support treatment of colon cancer diseases by discovering mutations that occur in tumour suppressor genes (TSGs) and contribute to the development of cancerous tumours. The constitution of the system is based on a) colon cancer clinical data and b) biological information that will be derived by data mining techniques from genomic and proteomic sources. The core mining module will consist of popular, well-tested hybrid feature extraction methods and new combined algorithms designed especially for the project. Elements of rough sets, evolutionary computing, cluster analysis, self-organizing maps and association rules will be used to discover the annotations between genes and their influence on tumours [2]-[11]. The methods used to process the data have to address their high complexity, potential inconsistency and the problem of missing values. They must integrate all the useful information necessary to solve the expert's question. For this purpose, the system has to learn from data, or allow a domain specialist to specify interactively, the part of the knowledge structure it needs to answer a given query. The program should also take into account the importance/rank of the particular parts of data it analyses and adjust the algorithms used accordingly.
Keywords: Bioinformatics, gene expression, ontology, self-organizing maps.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 197712971 Addressing Data Security in the Cloud
Authors: Marinela Mircea
Abstract:
The development of information and communication technology, the increased use of the internet, as well as the effects of the recession in recent years, have led to the increased use of cloud computing based solutions, also called on-demand solutions. These solutions offer a large number of benefits to organizations as well as challenges and risks, mainly determined by data visualization in different geographic locations on the internet. As far as the specific risks of the cloud environment are concerned, data security is still considered a peak barrier to adopting cloud computing. The present study offers an approach to ensuring the security of cloud data, oriented towards the whole data life cycle. The final part of the study focuses on the assessment of data security in the cloud, which is the basis for determining the potential losses and the premise for subsequent improvements and continuous learning.
Keywords: cloud computing, data life cycle, data security, security assessment.
Downloads 2164
12970 VaR Forecasting in Times of Increased Volatility
Authors: Ivo Jánský, Milan Rippel
Abstract:
The paper evaluates several hundred one-day-ahead VaR forecasting models in the period between 2004 and 2009 on data from six world stock indices: DJI, GSPC, IXIC, FTSE, GDAXI and N225. The models describe the mean using ARMA processes with up to two lags and the variance using one of the GARCH, EGARCH or TARCH processes with up to two lags. The models are estimated on data from the in-sample period and their forecasting accuracy is evaluated on the out-of-sample data, which are more volatile. The main aim of the paper is to test whether a model estimated on data with lower volatility can be used in periods with higher volatility. The evaluation is based on the conditional coverage test and is performed on each stock index separately. The primary result of the paper is that the volatility is best modelled using a GARCH process and that an ARMA process pattern cannot be found in the analyzed time series.
Keywords: VaR, risk analysis, conditional volatility, garch, egarch, tarch, moving average process, autoregressive process
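As an illustration of how a one-day-ahead VaR forecast follows from a conditional variance model, here is a minimal sketch of a single GARCH(1,1) step. It is not the authors' estimation procedure: the parameters are fixed by hand instead of being estimated, the innovations are assumed to be standard normal, and every number and function name is hypothetical.

```python
from statistics import NormalDist


def garch11_next_variance(omega: float, alpha: float, beta: float,
                          last_residual: float, last_variance: float) -> float:
    """GARCH(1,1) recursion: sigma^2_{t+1} = omega + alpha*eps_t^2 + beta*sigma^2_t."""
    return omega + alpha * last_residual ** 2 + beta * last_variance


def one_day_var(mean_forecast: float, variance_forecast: float,
                level: float = 0.01) -> float:
    """Return quantile at `level`, assuming normally distributed innovations."""
    z = NormalDist().inv_cdf(level)  # about -2.33 for the 1% level
    return mean_forecast + z * variance_forecast ** 0.5


# Hypothetical parameters and last observation (daily returns as fractions).
next_variance = garch11_next_variance(
    omega=1e-6, alpha=0.10, beta=0.85,
    last_residual=-0.02, last_variance=1e-4,
)
var_99 = one_day_var(mean_forecast=0.0, variance_forecast=next_variance)

# A "violation" is a day whose realized return falls below the VaR quantile;
# coverage tests are built on the sequence of such indicators.
realized_return = -0.03  # hypothetical
violation = realized_return < var_99

print(f"variance forecast = {next_variance:.6f}, 1% VaR = {var_99:.4f}, "
      f"violation = {violation}")
```

A large negative residual on the previous day raises the variance forecast and therefore widens the VaR quantile, which is the mechanism a model estimated in a calm period must rely on when volatility increases.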
Downloads 1432
12969 1/Sigma Term Weighting Scheme for Sentiment Analysis
Authors: Hanan Alshaher, Jinsheng Xu
Abstract:
Large amounts of data on the web can provide valuable information. For example, product reviews help business owners measure customer satisfaction. Sentiment analysis classifies texts into two polarities: positive and negative. This paper examines movie reviews and tweets using a new term weighting scheme, called one-over-sigma (1/sigma), on benchmark datasets for sentiment classification. The proposed method aims to improve the performance of sentiment classification. The results show that 1/sigma is more accurate than the popular term weighting schemes. In order to verify if the entropy reflects the discriminating power of terms, we report a comparison of entropy values for different term weighting schemes.
Keywords: Sentiment analysis, term weighting scheme, 1/sigma.
Downloads 538
12968 Analysis of Train Passenger Seat Using Ergonomic Function Deployment Method
Authors: Robertoes K. K. Wibowo, Siswoyo Soekarno, Irma Puspitasari
Abstract:
Indonesian people use trains for transportation, especially economy class trains, because they are cheaper and have a more precise schedule than other ground transportation. Nevertheless, the economy class passenger seat raises some inconvenience issues for passengers. This is due to the design of the seat in economy class trains, which was not adjusted to the anthropometry of Indonesian people. Thus, research needs to be conducted on the design of the seats in economy class trains. The purpose of this research is to make the design of economy class passenger seats ergonomic. The research method uses questionnaires and anthropometry measurements. The data obtained is processed using the House of Quality of Ergonomic Function Development. The analysis and data processing yielded important changes from the original design. The ergonomic chair design according to the analysis has a stainless steel frame and a seat height of 390 mm, with a seat width for each passenger of 400 mm and a depth of 400 mm. The backrest has a height of 840 mm, a width of 430 mm and a length of 300 mm, and can move at an angle of 105-115 degrees. The footrest is 42 mm wide and 400 mm long. The thickness of the seat cushion is 100 mm.
Keywords: Chair, ergonomics, function development, train passenger.
Downloads 1832
12967 Design and Implementation of Security Middleware for Data Warehouse Signature Framework
Authors: Mayada AlMeghari
Abstract:
Recently, grid middleware has enabled large-scale integrated use of network resources, such as shared data and CPUs, to form a virtual supercomputer. In this work, we present the design and implementation of the middleware for the Data Warehouse Signature (DWS) Framework. The aim of using the middleware in the proposed DWS framework is to achieve high performance through parallel computing. The middleware is developed on the Alchemi.Net framework to increase security among the network nodes through an authentication and group-key distribution model. This model achieves key security and prevents intermediate attacks in the middleware. This paper presents the flow process structures of the middleware design. In addition, the paper ensures the implementation of security for the DWS middleware enhancement with the authentication and group-key distribution model. Finally, based on the analysis of other middleware approaches, the developed middleware of the DWS framework is the optimal solution for complete coverage of security issues.
Keywords: Middleware, parallel computing, data warehouse, security, group-key, high performance.
Downloads 346
12966 Analysis of the Structural Fluctuation of the Permitted Building Areas and Housing Distribution Ratios - Focused on 5 Cities Including Bucheon
Authors: Cheon Sik Min, Hyeong Wook Song, Sook Yeon Shim, Hoon Chang
Abstract:
The purpose of this study was to analyze the correlation between permitted building areas and housing distribution ratios, together with their fluctuation, and to test a distribution model across 3 successive governments in 5 cities including Bucheon, using time series administrative data. The results of the analysis are interpreted in association with the policies pursued by the successive governments in order to examine the structural fluctuation of permitted building areas and housing distribution ratios. Spectral analysis was performed to analyze the fluctuation of permitted building areas and housing distribution ratios during the 3 successive governments and to examine the cycles of the time series data. Tabulation was performed to describe the correlation between permitted building areas and housing distribution ratios statistically. A goodness-of-fit test was conducted to explain the differences in the fluctuation distribution of permitted building areas and housing distribution ratios among the 3 governments.
Keywords: The Permitted Building Areas, Housing Distribution Ratios, the Structural Fluctuation.
Downloads 1199
12965 A Network Traffic Prediction Algorithm Based On Data Mining Technique
Authors: D. Prangchumpol
Abstract:
This paper describes an approach to predicting the incoming and outgoing data rates in a network system using association rule discovery, which is one of the data mining techniques. Information on the incoming and outgoing data at each time and on the network bandwidth are network performance parameters that are needed to solve the traffic problem, since congestion and data loss are important network problems. The results of this technique can predict future network traffic. In addition, this research is useful for network routing selection and network performance improvement.
Keywords: Traffic prediction, association rule, data mining.
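The abstract does not say how the traffic records are encoded or which rule-mining algorithm is used, so the following is only a sketch of the two basic quantities behind any association rule, support and confidence, computed on hypothetical discretised traffic observations. The item labels and function names are assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    hits = sum(1 for transaction in transactions if wanted <= transaction)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str], consequent: Iterable[str]) -> float:
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    antecedent = frozenset(antecedent)
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, antecedent | frozenset(consequent)) / base


# Hypothetical observations: one transaction per time slot, with the data
# rates already discretised into "high" and "low".
observations = [
    frozenset({"period=morning", "incoming=high", "outgoing=high"}),
    frozenset({"period=morning", "incoming=high", "outgoing=low"}),
    frozenset({"period=night", "incoming=low", "outgoing=low"}),
    frozenset({"period=morning", "incoming=high", "outgoing=high"}),
    frozenset({"period=night", "incoming=low", "outgoing=low"}),
]

rule_support = support(observations, {"period=morning", "incoming=high"})
rule_confidence = confidence(observations, {"period=morning"}, {"incoming=high"})

print(f"period=morning -> incoming=high: "
      f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule with high support and confidence, such as the one printed here, can then be read as a prediction: when the antecedent holds for an upcoming time slot, the consequent traffic level is expected.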
Downloads 3674
12964 Fuzzy Processing of Uncertain Data
Authors: Petr Morávek, Miloš Šeda
Abstract:
In practice, we often come across situations where it is necessary to make decisions based on incomplete or uncertain data. In control systems, this may be due to an unknown exact mathematical model, or to its excessive complexity (e.g. nonlinearity), in which case it is necessary to simplify it or to solve it using a rule base. In the case of databases, when searching data we use a similarity measure to compare the requirements of the selection with the stored data, where both the select query and the data itself may contain vague terms, for example in the form of linguistic qualifiers. In this paper, we focus on the processing of uncertain data in databases and demonstrate it on an example of multi-criteria decision making in the selection of variants specified by a higher number of technical parameters.
Keywords: fuzzy logic, linguistic variable, multicriteria decision
Downloads 1421
12963 Analysis of Mathematical Models and Their Application to Extreme Events
Authors: Avellino I. Mondlane, Karin Hansson, Oliver Popov
Abstract:
This paper discusses the application of extreme event distributions, taking the Limpopo River Basin at Xai-Xai station, in Mozambique, as a case analysis. We analyze the extreme value concepts, namely the Gumbel, Fréchet, Weibull and Generalized Extreme Value distributions, then extrapolate the original data to 1000, 5000 and 10000 figures for further simulations, and compare the outcomes based on these three main distributions.
Keywords: Catastrophes, extreme event, disasters, mathematical models, simulation.
Downloads 2524
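For the Gumbel case mentioned in the abstract above, fitting an extreme value distribution and extrapolating from it can be sketched as follows. This is a method-of-moments fit with inverse-transform simulation, which is not necessarily the procedure used in the paper, and the annual maxima below are invented for illustration, not Limpopo River data.

```python
import math
import random
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(sample):
    """Method-of-moments estimates of the Gumbel location and scale."""
    scale = stdev(sample) * math.sqrt(6) / math.pi
    location = mean(sample) - EULER_GAMMA * scale
    return location, scale


def gumbel_cdf(x, location, scale):
    """F(x) = exp(-exp(-(x - location) / scale))."""
    return math.exp(-math.exp(-(x - location) / scale))


def gumbel_quantile(p, location, scale):
    """Inverse of the CDF for 0 < p < 1."""
    return location - scale * math.log(-math.log(p))


def return_level(period_years, location, scale):
    """Level exceeded on average once every `period_years` years."""
    return gumbel_quantile(1.0 - 1.0 / period_years, location, scale)


def simulate(n, location, scale, seed=0):
    """Draw n Gumbel values by inverse-transform sampling."""
    rng = random.Random(seed)
    values = []
    while len(values) < n:
        u = rng.random()
        if u > 0.0:  # avoid log(0); random() never returns 1.0
            values.append(gumbel_quantile(u, location, scale))
    return values


# Hypothetical annual maximum water levels in metres.
annual_maxima = [4.1, 5.3, 3.8, 6.9, 4.7, 5.0, 7.4, 4.4, 5.8, 6.1]
location, scale = fit_gumbel_moments(annual_maxima)

for period in (10, 100, 1000):
    print(f"{period}-year return level: {return_level(period, location, scale):.2f} m")

# Extrapolated samples of the sizes mentioned in the abstract.
samples = {n: simulate(n, location, scale) for n in (1000, 5000, 10000)}
```

The same structure applies to the Fréchet, Weibull and Generalized Extreme Value distributions by swapping in their quantile functions, which makes it straightforward to compare the simulated outcomes across distributions.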