Search results for: genetic breeding models
7651 Leveraging Unannotated Data to Improve Question Answering for French Contract Analysis
Authors: Touila Ahmed, Elie Louis, Hamza Gharbi
Abstract:
State-of-the-art question answering models have recently shown impressive performance, especially in a zero-shot setting. This approach is particularly useful when confronted with a highly diverse domain such as the legal field, in which it is increasingly difficult to build a dataset covering every notion and concept. In this work, we propose a flexible generative question answering approach to contract analysis, as well as a weakly supervised procedure that leverages unannotated data to boost our models’ performance in general, and their zero-shot performance in particular.
Keywords: question answering, contract analysis, zero-shot, natural language processing, generative models, self-supervision
Procedia PDF Downloads 194
7650 Dow Polyols near Infrared Chemometric Model Reduction Based on Clustering: Reducing Thirty Global Hydroxyl Number (OH) Models to Less Than Five
Authors: Wendy Flory, Kazi Czarnecki, Matthijs Mercy, Mark Joswiak, Mary Beth Seasholtz
Abstract:
Polyurethane materials are present in a wide range of industrial segments such as furniture, building and construction, composites, automotive, electronics, and more. Dow is one of the leading manufacturers of the two main raw materials used to produce polyurethane products, isocyanates and polyols. Dow is also a key player in the manufacture of polyurethane systems/formulations designed for targeted applications. In 1990, the first analytical chemometric models were developed and deployed in the Dow QC labs of the polyols business for the quantification of OH, water, cloud point, and viscosity. Over the years many models have been added; there are now over 140 models for quantification and hundreds for product identification, too many to support reasonably. There are 29 global models alone for the quantification of OH across more than 70 products at many sites. An attempt was made to consolidate these into a single model. While the consolidated model showed good statistics across the entire OH range, several products showed a bias when validated individually per ASTM E1655. This project summary presents the strategy for global OH model updates, aiming to reduce the number of models for quantification from over 140 to 5 or fewer using chemometric methods. To understand the best product groupings, we identify clusters by reducing the spectra to a few dimensions via Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). Results from these cluster analyses and a separate validation set allowed Dow to reduce the number of models for predicting OH from 29 to 3 without loss of accuracy.
Keywords: hydroxyl, global model, model maintenance, near infrared, polyol
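A minimal sketch of the grouping step, assuming each product's spectra have already been reduced to two PCA scores and using plain k-means as a stand-in clustering rule. The abstract does not say how clusters were drawn from the PCA/UMAP projections; the product names and score values below are invented for illustration.

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's k-means; the first k points seed the centroids (k-means++ is a common alternative)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids

# Hypothetical (PC1, PC2) scores per product; the first three lie in different groups.
scores = {
    "product_A": (-2.1, 0.3), "product_B": (3.9, -0.2), "product_C": (0.2, 2.9),
    "product_D": (-1.8, 0.1), "product_E": (4.2, 0.4), "product_F": (0.1, 3.2),
}
names = list(scores)
labels, _ = kmeans([scores[n] for n in names], k=3)
groups = {}
for name, lab in zip(names, labels):
    groups.setdefault(lab, []).append(name)
print(groups)
```

Each resulting group would then be a candidate for a single shared OH calibration model, to be confirmed on a separate validation set.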
Procedia PDF Downloads 135
7649 Benchmarking Machine Learning Approaches for Forecasting Hotel Revenue
Authors: Rachel Y. Zhang, Christopher K. Anderson
Abstract:
A critical aspect of revenue management is a firm’s ability to predict demand as a function of price. Historically, hotels have used simple time series models (regression and/or pick-up based models) owing to the complexity of building causal models of demand. Machine learning approaches are slowly attracting attention owing to their flexibility in modeling relationships. This study provides an overview of approaches to forecasting hospitality demand, focusing on the opportunities created by machine learning approaches, including K-Nearest-Neighbors, Support Vector Machine, Regression Tree, and Artificial Neural Network algorithms. The out-of-sample performance of these approaches in forecasting hotel demand is illustrated using a proprietary sample of market-level (24 properties) transactional data for Las Vegas, NV. Causal predictive models can be built and evaluated owing to the availability of market-level (versus firm-level) data. This research also compares and contrasts the accuracy of firm-level models (i.e., predictive models for hotel A using only hotel A’s data) with models using market-level data (prices, review scores, location, chain scale, etc. for all hotels within the market). The prospective models will be valuable for predicting hotel revenue given the basic characteristics of a hotel property, or can be applied in performance evaluation for an existing hotel. The findings will reveal the features that play key roles in a hotel’s revenue performance, which would be useful for both revenue prediction and evaluation.
Keywords: hotel revenue, k-nearest-neighbors, machine learning, neural network, prediction model, regression tree, support vector machine
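To make the K-Nearest-Neighbors approach concrete, here is a minimal pure-Python k-NN regression sketch. The feature set (price and review score), the data values, and k=3 are hypothetical; in practice the features should be standardized first, because unscaled price dominates the Euclidean distance in this toy example.

```python
import math

def knn_predict(train, query, k=3):
    """Average the target of the k training rows closest to `query`.

    `train` is a list of (features, target) pairs; features are numeric tuples.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical market-level rows: (price in USD, review score) -> rooms sold.
train = [
    ((120.0, 4.1), 310), ((135.0, 4.3), 295), ((150.0, 4.0), 260),
    ((180.0, 4.5), 240), ((210.0, 4.6), 205), ((250.0, 4.8), 170),
]
print(knn_predict(train, (140.0, 4.2), k=3))
```

With market-level data the training rows would come from all hotels in the market, whereas a firm-level model would use only one hotel's own history.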
Procedia PDF Downloads 133
7648 Optimization of Solar Rankine Cycle by Exergy Analysis and Genetic Algorithm
Authors: R. Akbari, M. A. Ehyaei, R. Shahi Shavvon
Abstract:
Solar energy is currently used for purposes such as thermal energy for domestic, industrial, and power applications, as well as the conversion of sunlight into electricity by photovoltaic cells. In this study, the solar Rankine cycle with a phase change material (paraffin) was first simulated thermodynamically. Energy and exergy analyses were then performed. For optimization, single- and multi-objective genetic algorithms were used to maximize thermal and exergy efficiency. The parameters discussed in this paper include the effects of turbine inlet pressure, turbine inlet mass flow, the surface area of the converters, and the collector angle on thermal and exergy efficiency. In an organic Rankine cycle that uses solar energy as input, fluid selection is a necessary factor for reliable and efficient operation. Therefore, silicone oil is selected as the working fluid for the high-temperature cycle and water for the low-temperature cycle. The results showed that increasing the mass flow to turbines 1 and 2 increases thermal efficiency, while it decreases the exergy efficiency for turbine 1 and increases it for turbine 2. Increasing the inlet pressure of turbine 1 decreases both thermal and exergy efficiency, whereas increasing the inlet pressure of turbine 2 increases both. Increasing the collector angle also increased both thermal and exergy efficiency. The thermal efficiency of the system was 22.3%, which improved to 33.2% and 27.2% in single-objective and multi-objective optimization, respectively. The exergy efficiency of the system was 1.33%, which improved to 1.719% and 1.529% in single-objective and multi-objective optimization, respectively.
These results show that the thermal and exergy efficiencies obtained by single-objective optimization are greater than those obtained by multi-objective optimization.
Keywords: exergy analysis, genetic algorithm, rankine cycle, single and multi-objective function
Procedia PDF Downloads 147
7647 Text Similarity in Vector Space Models: A Comparative Study
Authors: Omid Shahmirzadi, Adam Lugowski, Kenneth Younge
Abstract:
Automatic measurement of semantic text similarity is an important task in natural language processing. In this paper, we evaluate the performance of different vector space models on this task. We address the real-world problem of modeling patent-to-patent similarity and compare TFIDF (and related extensions), topic models (e.g., latent semantic indexing), and neural models (e.g., paragraph vectors). Contrary to expectations, the added computational cost of text embedding methods is justified only when: 1) the target text is condensed; and 2) the similarity comparison is trivial. In other cases, TFIDF performs surprisingly well, in particular for longer and more technical texts or for making finer-grained distinctions between nearest neighbors. Unexpectedly, extensions to the TFIDF method, such as adding noun phrases or calculating term weights incrementally, were not helpful in our context.
Keywords: big data, patent, text embedding, text similarity, vector space model
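For readers unfamiliar with the TFIDF baseline, the sketch below computes TFIDF vectors and cosine similarity in pure Python. It uses the plain tf × log(N / df) weighting, which is one common variant; the paper's exact TFIDF configuration is not given, and the three "patent abstracts" are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical one-line "patent abstracts".
docs = [
    "battery electrode with lithium coating",
    "lithium battery electrode manufacturing method",
    "wind turbine blade pitch control",
]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3), round(cosine(vecs[0], vecs[2]), 3))
```

Documents that share no terms score exactly zero, which is one reason the weighting works best on longer texts with rich, technical vocabulary.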
Procedia PDF Downloads 175
7646 Machine Vision System for Measuring the Quality of Bulk Sun-dried Organic Raisins
Authors: Navab Karimi, Tohid Alizadeh
Abstract:
An intelligent vision-based system was designed to measure the quality and purity of raisins. A machine vision setup was used to capture images of bulk raisins at pure-impure mixing levels of 5-50%. The textural features of the bulk raisins were extracted using grey-level histograms, the grey-level co-occurrence matrix (GLCM), and local binary patterns (a total of 108 features). A genetic algorithm and neural network regression were used to select and rank the best features (21 features). The GLCM feature set was found to have the highest accuracy (92.4%) among the sets. Subsequently, multiple feature combinations from the previous stage were fed into a second regression (linear regression) to increase accuracy, and a combination of 16 features was found to be optimal. Finally, a Support Vector Machine (SVM) classifier was used to differentiate the mixtures, yielding the best efficiency and accuracy of 96.2% and 97.35%, respectively.
Keywords: sun-dried organic raisin, genetic algorithm, feature extraction, ann regression, linear regression, support vector machine, south azerbaijan.
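A minimal sketch of GLCM texture features, the best-performing feature family above. The one-pixel horizontal offset, the 4 grey levels, and the three descriptors (contrast, energy, homogeneity) are illustrative assumptions; the abstract does not list which of its 108 features were used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Grey-level co-occurrence matrix for the pixel offset (dx, dy), normalized to sum to 1."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """Three classic texture descriptors computed from a normalized GLCM."""
    n = len(p)
    contrast = sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))
    energy = sum(p[i][j] ** 2 for i in range(n) for j in range(n))
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}

# Toy 4-level images: a flat patch and a striped patch.
flat = [[1, 1, 1, 1]] * 4
striped = [[0, 3, 0, 3]] * 4
print(glcm_features(glcm(flat, 4)))
print(glcm_features(glcm(striped, 4)))
```

The flat patch gives zero contrast and maximal energy and homogeneity, while the striped patch gives high contrast; features like these would then feed the selection and regression stages.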
Procedia PDF Downloads 73
7645 Geographic Information System for District Level Energy Performance Simulations
Authors: Avichal Malhotra, Jerome Frisch, Christoph van Treeck
Abstract:
The use of semantic, cadastral, and topological data from geographic information systems (GIS) for building- and urban-scale energy performance simulations has increased exponentially. Urban planners, simulation scientists, and researchers use virtual 3D city models for energy analysis, algorithms, and simulation tools. For dynamic energy simulations at city and district level, this paper provides an overview of the available GIS data models and their levels of detail. Adhering to different norms and standards, these models also aim to describe building and construction industry data. For further investigation, CityGML data models are considered for simulations. Although geographic information modelling has many different implementations, virtual city data can also be extended for domain-specific applications. Highlighting the use of extended CityGML models for energy research, the paper briefly introduces the Energy Application Domain Extension (ADE) and its significance. Finally, addressing specific simulation input data, the paper presents a workflow using Modelica that underlines the use of GIS information and quantifies its significance for annual heating energy demand.
Keywords: CityGML, EnergyADE, energy performance simulation, GIS
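As a rough illustration of pulling simulation inputs out of CityGML, the sketch below reads a few standard CityGML 2.0 building attributes with Python's standard-library XML parser. The fragment is hand-written, omits geometry and any Energy ADE elements, and the choice of attributes is an assumption, not the paper's Modelica workflow.

```python
import xml.etree.ElementTree as ET

# Namespaces of the CityGML 2.0 core and building modules and of GML 3.1.1.
NS = {
    "core": "http://www.opengis.net/citygml/2.0",
    "bldg": "http://www.opengis.net/citygml/building/2.0",
    "gml": "http://www.opengis.net/gml",
}

# A tiny hand-written CityGML fragment (geometry omitted) for illustration.
CITYGML = """<core:CityModel xmlns:core="http://www.opengis.net/citygml/2.0"
    xmlns:bldg="http://www.opengis.net/citygml/building/2.0"
    xmlns:gml="http://www.opengis.net/gml">
  <core:cityObjectMember>
    <bldg:Building gml:id="B1">
      <bldg:yearOfConstruction>1975</bldg:yearOfConstruction>
      <bldg:measuredHeight uom="m">12.5</bldg:measuredHeight>
      <bldg:storeysAboveGround>4</bldg:storeysAboveGround>
    </bldg:Building>
  </core:cityObjectMember>
</core:CityModel>"""

def building_inputs(xml_text):
    """Collect per-building attributes that a simulation pre-processor might need."""
    root = ET.fromstring(xml_text)
    result = {}
    for b in root.iterfind(".//bldg:Building", NS):
        bid = b.get("{http://www.opengis.net/gml}id")
        result[bid] = {
            "year": int(b.findtext("bldg:yearOfConstruction", namespaces=NS)),
            "height_m": float(b.findtext("bldg:measuredHeight", namespaces=NS)),
            "storeys": int(b.findtext("bldg:storeysAboveGround", namespaces=NS)),
        }
    return result

print(building_inputs(CITYGML))
```

A real district model would also need geometry (wall and roof surfaces) and, where available, Energy ADE properties such as construction layers.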
Procedia PDF Downloads 168
7644 Genetics of Birth and Weaning Weight of Holstein-Friesians in Sudan
Authors: Safa A. Mohammed Ali, Ammar S. Ahamed, Mohammed Khair Abdalla
Abstract:
The objectives of this study were to estimate the means and genetic parameters of birth and weaning weight of calves of pure Holstein-Friesian cows raised in Sudan. The traits studied were weight at birth and weight at weaning. The study also examined some of the important factors affecting these traits. The data were analyzed using Harvey’s Least Squares and Maximum Likelihood programme. The overall mean weight at birth of the calves under study was 34.36±0.94 kg. Male calves were heavier than females, and the difference between the sexes was highly significant (P<0.001): the mean weight at birth of male calves was 34.27±1.17 kg, while that of females was 32.51±1.14 kg. The effects of sex of calf, sire, and parity of dam were highly significant (P<0.001). The overall mean weight at weaning was 67.10±5.05 kg. Weight at weaning was significantly (P<0.001) affected by sex of calf and sire, and year and season of birth had highly significant (P<0.001) effects on both traits. The heritability estimate for birth weight (0.033±0.015) was lower than that for weaning weight (0.224±0.039). The genetic correlation between the two traits was 0.563, the phenotypic correlation 0.281, and the environmental correlation 0.268.
Keywords: birth, weaning, weight, friesian
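The reported parameters can be cross-checked with the standard decomposition of a phenotypic correlation into genetic and environmental parts, r_P = h1·h2·r_G + e1·e2·r_E, where h_i is the square root of trait i's heritability and e_i = sqrt(1 - h_i²). The sketch plugs in the heritabilities and correlations quoted above; it is an illustrative consistency check, not part of the study's analysis.

```python
import math

def phenotypic_correlation(h2_1, h2_2, r_g, r_e):
    """r_P = h1*h2*r_G + e1*e2*r_E, with h = sqrt(heritability) and e = sqrt(1 - heritability)."""
    h1, h2 = math.sqrt(h2_1), math.sqrt(h2_2)
    e1, e2 = math.sqrt(1 - h2_1), math.sqrt(1 - h2_2)
    return h1 * h2 * r_g + e1 * e2 * r_e

# Values reported above: birth-weight and weaning-weight heritabilities, r_G and r_E.
r_p = phenotypic_correlation(0.033, 0.224, r_g=0.563, r_e=0.268)
print(round(r_p, 3))  # 0.281, matching the reported phenotypic correlation
```

Because birth-weight heritability is so low, almost all of the phenotypic correlation here comes from the environmental term (about 0.232 of the 0.281).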
Procedia PDF Downloads 665
7643 Non-Destructive Static Damage Detection of Structures Using Genetic Algorithm
Authors: Amir Abbas Fatemi, Zahra Tabrizian, Kabir Sadeghi
Abstract:
To find the location and severity of damage in a structure, changes in its dynamic and static characteristics can be used. Non-destructive techniques are the more common, economical, and reliable way to detect global or local damage in structures. This paper presents a non-destructive method for structural damage detection and assessment using a genetic algorithm (GA) and static data. A set of static forces is applied at some degrees of freedom (DOFs), and the static responses (displacements) are measured at another set of DOFs. An analytical model of the truss structure is developed based on the available specifications and the properties derived from static data. Damage changes the stiffness of the structure, so the method determines damage from changes in the structural stiffness parameters. The changes in static response caused by structural damage are used to form a set of simultaneous equations. Genetic algorithms are powerful tools for solving large optimization problems; here, the optimization minimizes an objective function involving the difference between the static load vectors of the damaged and healthy structure. Several damage scenarios (single and multiple damage) were defined. Static damage identification methods have many advantages, but some difficulties remain, so it is important to achieve the best possible identification; a method that achieves it can be considered reliable. The strategy is applied to a plane truss. Numerical results demonstrate the ability of the method to detect damage in the given structures, and the figures show that damage detection is also efficient in multiple-damage scenarios. Even noise in the measurements does not reduce the accuracy of the damage detection method for these structures.
Keywords: damage detection, finite element method, static data, non-destructive, genetic algorithm
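A toy version of the idea, assuming a fixed-free chain of three springs in series instead of a plane truss. Damage is modelled as a fractional stiffness loss d_j, the "measured" displacements come from a hypothetical 30% loss in spring 2, and a small real-coded genetic algorithm (elitism, blend crossover, Gaussian mutation) searches for the damage vector whose stiffness times measured displacement best reproduces the applied load. All numbers and GA settings are illustrative, not taken from the paper.

```python
import random

K0 = 1000.0                     # hypothetical healthy spring stiffness (N/mm)
P = 10.0                        # hypothetical tip load (N)
TRUE_DAMAGE = [0.0, 0.3, 0.0]   # hypothetical: spring 2 has lost 30% of its stiffness

def displacements(damage):
    """Node displacements of a fixed-free chain of springs in series under tip load P."""
    u, total = [], 0.0
    for d in damage:
        total += P / (K0 * (1.0 - d))   # every spring carries the full load P
        u.append(total)
    return u

MEASURED = displacements(TRUE_DAMAGE)   # stands in for measured static displacements

def objective(damage):
    """Squared mismatch between the spring forces implied by `damage` and the applied load."""
    prev, err = 0.0, 0.0
    for d, u in zip(damage, MEASURED):
        force = K0 * (1.0 - d) * (u - prev)
        err += (force - P) ** 2
        prev = u
    return err

def clip(x, lo=0.0, hi=0.9):
    return min(hi, max(lo, x))

def genetic_search(n_vars, pop_size=40, generations=150, seed=1):
    """Real-coded GA: keep the best quarter, refill with blended, mutated children."""
    rng = random.Random(seed)
    pop = [[rng.uniform(0.0, 0.9) for _ in range(n_vars)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=objective)
        elite = pop[: pop_size // 4]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            w = rng.random()   # blend (arithmetic) crossover weight
            # Gaussian mutation is added to every gene, then clipped to the valid range.
            children.append([clip(w * x + (1 - w) * y + rng.gauss(0.0, 0.02))
                             for x, y in zip(a, b)])
        pop = elite + children
    return min(pop, key=objective)

best = genetic_search(len(TRUE_DAMAGE))
print([round(d, 2) for d in best])
```

The objective is minimized only at the true damage vector, so the GA should locate the damaged spring and approximate its severity; a truss version would replace the spring chain with a finite-element stiffness matrix.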
Procedia PDF Downloads 237
7642 Talent-to-Vec: Using Network Graphs to Validate Models with Data Sparsity
Authors: Shaan Khosla, Jon Krohn
Abstract:
In a recruiting context, machine learning models are valuable for recommendations: to predict the best candidates for a vacancy, to match the best vacancies for a candidate, and to compile a set of similar candidates for any given candidate. While these models are useful to create, validating their accuracy in a recommendation context is difficult due to data sparsity. In this report, we use network graph data to generate useful representations of candidates and vacancies. We treat candidates and vacancies as network nodes and create a bidirectional link between a candidate and a vacancy when the candidate interviewed for that vacancy. After applying node2vec, the resulting embeddings are used to construct a ranked validation dataset, which will help validate new recommender systems.
Keywords: AI, machine learning, NLP, recruiting
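node2vec learns embeddings by generating (biased) random walks over the graph and then training a skip-gram model on those walks. The sketch below shows only the walk-generation step on a hypothetical candidate-vacancy interview graph, using the special case p = q = 1, where node2vec's walks reduce to uniform random walks. The interview records, walk length, and walks per node are assumptions; the skip-gram training is not shown.

```python
import random

# Hypothetical interview records: (candidate, vacancy).
interviews = [("c1", "v1"), ("c1", "v2"), ("c2", "v1"), ("c3", "v2"), ("c3", "v3"), ("c4", "v3")]

# Bidirectional candidate-vacancy graph as an adjacency list.
graph = {}
for cand, vac in interviews:
    graph.setdefault(cand, []).append(vac)
    graph.setdefault(vac, []).append(cand)

def random_walks(graph, walk_length=6, walks_per_node=3, seed=0):
    """Uniform random walks, i.e. node2vec with return/in-out parameters p = q = 1."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(graph):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks

walks = random_walks(graph)
print(walks[0])
```

Because the graph is bipartite, every walk alternates between candidates and vacancies, so candidates that interviewed for the same vacancies co-occur in walks and end up with similar embeddings.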
Procedia PDF Downloads 84
7641 Bridging the Gap between Different Interfaces for Business Process Modeling
Authors: Katalina Grigorova, Kaloyan Mironov
Abstract:
The paper focuses on the benefits of business process modeling. Although this discipline has been developing for many years, there is still a need to create new opportunities to meet ever-increasing user needs. Because one of these needs is converting business process models from one standard to another, the authors have developed a converter between the BPMN and EPC standards that uses workflow patterns as an intermediate representation. Nowadays there are very many business process modeling systems, and the variety of output formats is almost as large as the number of systems themselves. This diversity further hampers model conversion. The presented study discusses problems arising from differences in the output formats of various modeling environments.
Keywords: business process modeling, business process modeling standards, workflow patterns, converting models
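The pattern-based conversion can be sketched as a two-step lookup: a BPMN gateway type plus its gatewayDirection is mapped to a workflow control-flow pattern, and the pattern is mapped to an EPC connector. The tables cover only a handful of basic patterns and ignore EPC's requirement that events alternate with functions; they are an assumption about how such a converter might be organized, not the authors' implementation.

```python
# Step 1: BPMN gateway type and gatewayDirection -> workflow control-flow pattern.
GATEWAY_TO_PATTERN = {
    ("parallelGateway", "Diverging"): "Parallel Split",
    ("parallelGateway", "Converging"): "Synchronization",
    ("exclusiveGateway", "Diverging"): "Exclusive Choice",
    ("exclusiveGateway", "Converging"): "Simple Merge",
    ("inclusiveGateway", "Diverging"): "Multi-Choice",
}

# Step 2: workflow pattern -> EPC connector.
PATTERN_TO_EPC = {
    "Parallel Split": "AND split",
    "Synchronization": "AND join",
    "Exclusive Choice": "XOR split",
    "Simple Merge": "XOR join",
    "Multi-Choice": "OR split",
}

def to_epc(element_type, direction=None):
    """Map one BPMN node to an EPC node; unsupported combinations raise KeyError."""
    if element_type == "task":
        return "function"   # activities map directly; no pattern lookup needed
    pattern = GATEWAY_TO_PATTERN[(element_type, direction)]
    return PATTERN_TO_EPC[pattern]

process = [("task", None), ("exclusiveGateway", "Diverging"), ("task", None),
           ("task", None), ("exclusiveGateway", "Converging")]
print([to_epc(kind, direction) for kind, direction in process])
# ['function', 'XOR split', 'function', 'function', 'XOR join']
```

Routing through patterns means a converter for a new notation only needs its own pattern table, rather than a direct mapping to every other standard.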
Procedia PDF Downloads 587
7640 Mutual Information Based Image Registration of Satellite Images Using PSO-GA Hybrid Algorithm
Authors: Dipti Patra, Guguloth Uma, Smita Pradhan
Abstract:
Registration is a fundamental task in image processing. It transforms different sets of data, acquired at different times, from different viewing angles, and/or by different sensors, into one coordinate system. Registration geometrically aligns two images (the reference and target images). Registration techniques are applied to satellite images, where they are needed to compare or integrate the data obtained from these different measurements. In this work, mutual information is used as the similarity metric for registering satellite images. The transformation is assumed to be rigid. An attempt has been made here to optimize the transformation function. The proposed hybrid PSO-GA registration technique combines Particle Swarm Optimization and a Genetic Algorithm to find the optimal values of the transformation parameters. Experiments on satellite images showed that the proposed hybrid PSO-GA algorithm outperforms the other algorithms in terms of mutual information and registration accuracy.
Keywords: image registration, genetic algorithm, particle swarm optimization, hybrid PSO-GA algorithm and mutual information
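Mutual information is I(A;B) = H(A) + H(B) - H(A,B), computed from the marginal and joint grey-level histograms of the overlapping pixels. The pure-Python sketch below shows why it suits multi-sensor registration: it is unchanged when one image's intensities are remapped one-to-one, and it drops to zero against an uninformative image. The optimizer that searches over the rigid-transform parameters (PSO-GA in the paper) is not shown, and the toy images are invented.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(img_a, img_b):
    """MI (in bits) between two equally sized grey-level images, from their joint histogram."""
    a = [p for row in img_a for p in row]
    b = [p for row in img_b for p in row]
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

reference = [[0, 0, 1, 1],
             [0, 2, 2, 1],
             [3, 3, 2, 1],
             [3, 3, 0, 0]]
# Same scene seen by a "different sensor": intensities remapped one-to-one.
remap = {0: 7, 1: 5, 2: 9, 3: 1}
sensor_b = [[remap[p] for p in row] for row in reference]
flat = [[4] * 4 for _ in range(4)]

print(mutual_information(reference, sensor_b), entropy([p for row in reference for p in row]))
print(mutual_information(reference, flat))
```

During registration, the target image would be resampled under each candidate transform and the optimizer would keep the parameters that maximize this score over the overlap region.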
Procedia PDF Downloads 408
7639 Disentangling Biological Noise in Cellular Images with a Focus on Explainability
Authors: Manik Sharma, Ganapathy Krishnamurthi
Abstract:
The cost of some drugs and medical treatments has risen so much in recent years that many patients are having to go without. A classification project could make researchers more efficient. One of the more surprising reasons behind the cost is how long it takes to bring new treatments to market. Despite improvements in technology and science, research and development continues to lag: finding a new treatment takes, on average, more than 10 years and costs hundreds of millions of dollars. If successful, this work could dramatically improve the industry’s ability to model cellular images according to their relevant biology, in turn greatly decreasing the cost of treatments and ensuring that they reach patients faster. This work aims to solve part of this problem by creating a cellular image classification model that can decipher genetic perturbations in cells (occurring naturally or artificially). Another question addressed is what makes the deep-learning model decide in a particular way, which can help demystify the mechanism of action of certain perturbations and paves the way toward explainability of the deep-learning model.
Keywords: cellular images, genetic perturbations, deep-learning, explainability
Procedia PDF Downloads 112
7638 Place and Importance of Goats in the Milk Sector in Algeria
Authors: Tennah Safia, Azzag Naouelle, Derdour Salima, Hafsi Fella, Laouadi Mourad, Laamari Abdalouahab, Ghalmi Farida, Kafidi Nacerredine
Abstract:
Goat farming is currently widely practiced among the rural population of Algeria. Although the milk yield of goats is low (110 liters per goat per year on average), this milk partly feeds small children and provides raw milk, curd, and fermented milk to the whole family. In addition, since the investment cost of a goat is ten times lower than that of a cow, this level of production is still of interest. This interest is reinforced by the qualities of goat’s milk, which is highly sought after for its nutritional value, superior to that of cow’s milk. Likewise, its suitability for processing, in particular into quality cheeses, is much sought after. The objective of this study is to describe the situation of goat milk production in rural areas of Algeria and to classify goat breeds according to their production potential. For this, a survey was carried out among goat farmers in the Algerian steppe. Three indigenous breeds were encountered in this study: the Arabia, Mozabite, and Mekatia breeds, Arabia being the most dominant. The Mekatia and Mozabite breeds appear to have higher production and milking abilities than the other local breeds, and are therefore suited to play the role of local dairy breeds par excellence. The other breed whose milk performance could be improved is the Arabia breed; however, the milk performance of this breed is currently low. In order to increase milk production, uncontrolled crosses with imported breeds (mainly Saanen and Alpine) were carried out. The third population that can be included in the dairy production category is the group of dairy breeds of imported origin: there are farms in Algeria composed of locally born Alpine and Saanen goats. The milk performance of local goats, the crossbred population, and dairy breeds of imported origin could be improved by selection. For this, it is necessary to set up milk recording to detect the best animals.
This recording could be carried out among interested farmers in each large goat breeding area. In conclusion, sustained efforts must be made to enable the sustainable development of the goat sector in Algeria. It will therefore be necessary to deepen reflection on a national strategy to add value to goat’s milk, taking into account the specificities of the environment, genetic biodiversity, and the eating habits of Algerian consumers.
Keywords: goat, milk, Algeria, biodiversity
Procedia PDF Downloads 185
7637 Hybrid Project Management Model Based on Lean and Agile Approach
Authors: Fatima-Zahra Eddoug, Jamal Benhra, Rajaa Benabbou
Abstract:
Several project management models exist in the literature, and hybrid models are the most used for their multiple advantages. Our objective in this paper is to analyze existing models based on the Lean and Agile approaches and to propose a novel framework, with suitable tools, that allows efficient management of a general project. To create this framework, we relied essentially on 7 existing models. Among agile tools, only Scrum was identified by several authors as appropriate for project management; in contrast, multiple lean tools were proposed for different phases of the project.
Keywords: agility, hybrid project management, lean, scrum
Procedia PDF Downloads 138
7636 Photosynthesis Metabolism Affects Yield Potentials in Jatropha curcas L.: A Transcriptomic and Physiological Data Analysis
Authors: Nisha Govender, Siju Senan, Zeti-Azura Hussein, Wickneswari Ratnam
Abstract:
Jatropha curcas, a well-described bioenergy crop, has been widely accepted as a future fuel source, especially in tropical regions. Ideal planting material for large-scale plantations is still lacking. Breeding programmes for improved J. curcas varieties are hampered by limited genetic diversity. Using combined transcriptome and physiological data, we investigated the molecular and physiological differences between high- and low-yielding Jatropha curcas to address plausible heritable variations underpinning these differences with regard to photosynthesis, a key metabolism affecting yield potential. A total of 6 individual Jatropha plants from 4 accessions described as high- and low-yielding planting materials were selected from Experimental Plot A, Universiti Kebangsaan Malaysia (UKM), Bangi. Inflorescences and shoots were collected for the transcriptome study. For the physiological study, individual plants (n=10) from the high- and low-yielding populations were screened for agronomic traits, chlorophyll content, and stomatal patterning. The J. curcas transcriptomes are available under BioProject PRJNA338924 and BioSample SAMN05827448-65, respectively. Each transcriptome was subjected to functional annotation of the sequence datasets using the Blast2GO suite (BLASTing, mapping, annotation, statistical analysis, and visualization). Large-scale phenotyping of the number of fruits per plant (NFPP) and fruits per inflorescence (FPI) classified the high-yielding Jatropha accessions as having an average NFPP = 60 and FPI > 10, whereas the low-yielding accessions had an average NFPP = 10 and FPI < 5. Next-generation sequencing revealed genes differentially expressed in the high-yielding Jatropha relative to the low-yielding plants. Distinct differences were observed in transcript levels associated with photosynthesis metabolism.
The DEGs in the low-yielding population indicated CAM photosynthetic metabolism and photorespiration, evident in the following: a chloroplastic phosphoenolpyruvate/phosphate translocator-like isoform with a 2.5 fold change (FC) and malate dehydrogenase (2.03 FC). Green leaves have the most pronounced photosynthetic activity in a plant body owing to their significant accumulation of chloroplasts; in most plants, the leaf is the dominant photosynthesizing organ. A large number of the DEGs in the high-yielding population were attributable to chloroplasts and chloroplast-associated events: STAY-GREEN chloroplastic, chlorophyllase-1-like (5.08 FC), beta-amylase (3.66 FC), chlorophyllase chloroplastic-like (3.1 FC), thiamine thiazole chloroplastic-like (2.8 FC), 1,4-alpha-glucan branching enzyme chloroplastic/amyloplastic (2.6 FC), photosynthetic NDH subunit (2.1 FC), and protochlorophyllide chloroplastic (2 FC). These results parallel a significant increase in chlorophyll a content in the high-yielding population. In addition to the chloroplast-associated transcript abundance, TOO MANY MOUTHS (TMM) at 2.9 FC, which codes for spaced stomatal distribution and patterning in the high-yielding population, may explain a high CO2 concentration. The results agree with the known role of TMM: clustered stomata cause back diffusion when gaps lie close to one another. We conclude that the high-yielding Jatropha population corresponds to a collective function of C3 metabolism with a low degree of CAM photosynthetic fixation. From the physiological descriptions, high chlorophyll a content and an even distribution of stomata in the leaf contribute to better photosynthetic efficiency in the high-yielding Jatropha compared with the low-yielding population.
Keywords: chlorophyll, gene expression, genetic variation, stomata
Procedia PDF Downloads 239
7635 Status of Reintroduced Houbara Bustard Chlamydotis macqueeni in Saudi Arabia
Authors: Mohammad Zafar-ul Islam
Abstract:
The Houbara bustard breeding programme was started in Saudi Arabia in 1986 to restore native species such as the Houbara through re-introduction, involving the release of captive-bred birds into the wild. Two sites were selected for Houbara re-introduction, Mahazat as-Sayd and Saja Umm Ar-Rimth protected areas, in 1988 and 1998, respectively. Both areas are fenced, fairly level, sandy plains with a few rock outcrops. Captive-bred Houbara have been released in Mahazat by NWRC since 1992, and these birds have bred successfully since then. The nesting season of the Houbara at Mahazat was recorded from February to May, and on average 20-25 nests are located each year, but no nesting has been recorded in Saja. Houbara are monitored with radio transmitters, using aerial tracking as well as a vehicle for terrestrial tracking. The total population of Houbara in Mahazat is roughly estimated at around 300-400 birds, using
N = n1 + n2 + n3 + n4 + n5,
where n1 = released or wild-born birds with radios, regularly monitored/checked; n2 = radio-tagged birds missing; n3 = wild-born chicks not recorded; n4 = wild-born chicks recorded but not tagged; n5 = immigrants. However, in Saja only 4-7 Houbara have survived since 2001, because most birds are predated immediately after release. The mean annual home range was also calculated using the Kernel and Convex polygon methods with Range VII software, and the minimum density of Houbara was calculated. To study Houbara movement or migration to other regions, two captive-reared males released into the wild and one wild-born female were fitted with Platform Transmitter Terminals (PTT). The home range analysis shows that the wild-born female moves over a larger area than the two males. More areas need to be selected for the reintroduction programme to establish a network of sites that allows these birds to move easily and mingle with wild Houbara.
Some potential sites have been proposed, which require further surveys to check habitat suitability.
Keywords: re-introduction, survival rate, home range, Saudi Arabia
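The convex polygon (minimum convex polygon) home-range estimate is the area of the convex hull enclosing all location fixes. A minimal sketch, assuming the fixes are already projected to planar coordinates in km; the coordinates are invented, and the kernel method and the software named above are not reproduced.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # > 0 for a counter-clockwise turn o -> a -> b.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    n = len(vertices)
    s = sum(vertices[i][0] * vertices[(i + 1) % n][1] - vertices[(i + 1) % n][0] * vertices[i][1]
            for i in range(n))
    return abs(s) / 2.0

# Hypothetical radio-tracking fixes in projected coordinates (km).
fixes = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2), (3, 2)]
hull = convex_hull(fixes)
print(hull, polygon_area(hull))  # [(0, 0), (4, 0), (4, 3), (0, 3)] 12.0
```

Interior fixes do not affect the result, which is why convex polygons tend to overstate the area actually used compared with kernel estimates.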
Procedia PDF Downloads 414
7634 Investigation of Soil Slopes Stability
Authors: Nima Farshidfar, Navid Daryasafar
Abstract:
In this paper, the seismic stability of reinforced soil slopes is studied using pseudo-dynamic analysis. Equilibrium equations applicable to any kind of failure surface are written using the Horizontal Slices Method. In these equations, the balance of vertical and horizontal forces and moment equilibrium are fully satisfied. The failure surface is assumed to be a log-spiral, and the non-linear equilibrium equations obtained for the system are solved using the Newton-Raphson method. Earthquake effects are applied to the problem as horizontal and vertical pseudo-static coefficients. To solve the problem, a code was developed in MATLAB, and the critical failure surface is found using a genetic algorithm. Finally, by comparing the results obtained in this paper, the effects of various parameters and of using pseudo-dynamic analysis for modeling seismic forces are presented.
Keywords: soil slopes, pseudo-dynamic, genetic algorithm, optimization, limit equilibrium method, log-spiral failure surface
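The Newton-Raphson step for the non-linear equilibrium equations can be sketched generically. The solver below uses a forward-difference Jacobian and Gaussian elimination with partial pivoting; the two-equation system it is demonstrated on is a stand-in, since the horizontal-slice equilibrium equations themselves are not given in the abstract.

```python
def newton_raphson(f, x0, tol=1e-10, max_iter=50, h=1e-7):
    """Solve f(x) = 0 for a small nonlinear system using a forward-difference Jacobian."""
    x = list(x0)
    n = len(x)
    for _ in range(max_iter):
        fx = f(x)
        if max(abs(v) for v in fx) < tol:
            return x
        # Forward-difference Jacobian J[i][j] = d f_i / d x_j.
        jac = [[0.0] * n for _ in range(n)]
        for j in range(n):
            xp = list(x)
            xp[j] += h
            fp = f(xp)
            for i in range(n):
                jac[i][j] = (fp[i] - fx[i]) / h
        # Solve J * dx = -f(x) by Gaussian elimination with partial pivoting.
        a = [row[:] + [-fx[i]] for i, row in enumerate(jac)]
        for col in range(n):
            piv = max(range(col, n), key=lambda r: abs(a[r][col]))
            a[col], a[piv] = a[piv], a[col]
            for r in range(col + 1, n):
                m = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= m * a[col][c]
        dx = [0.0] * n
        for i in reversed(range(n)):
            dx[i] = (a[i][n] - sum(a[i][c] * dx[c] for c in range(i + 1, n))) / a[i][i]
        x = [xi + di for xi, di in zip(x, dx)]
    raise RuntimeError("Newton-Raphson did not converge")

# Stand-in system (not the slope equilibrium equations): x^2 + y^2 = 4 and x = y.
root = newton_raphson(lambda v: [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]], [1.0, 0.5])
print(root)  # approximately [1.41421356, 1.41421356]
```

In the slope problem, the unknowns would be quantities such as the required reinforcement force and the slice interaction terms, with the genetic algorithm varying the log-spiral geometry around this inner solve.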
Procedia PDF Downloads 339
7633 Microarray Gene Expression Data Dimensionality Reduction Using PCA
Authors: Fuad M. Alkoot
Abstract:
Different experimental technologies, such as microarrays and sequencing, have been proposed to generate high-resolution genetic data in order to understand the complex dynamic interactions between complex diseases and the biological system components of genes and gene products. However, the generated samples have a very large dimension, reaching thousands, which hinders attempts to design a classifier system that can identify diseases from such data. Additionally, the high overlap between the class distributions makes the task more difficult. The data we experiment with were generated for the identification of autism and include 142 samples, which is small compared to the large dimension of the data. Classifier systems trained on these data yield very low classification rates, almost equivalent to guessing. We aim to reduce the data dimension and make the data better suited for classification. Here, we experiment with applying a multistage PCA to the genetic data to reduce its dimensionality. Results show a significant improvement in classification rates, which increases the possibility of building an automated system for autism detection.
Keywords: PCA, gene expression, dimensionality reduction, classification, autism
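A minimal pure-Python sketch of one PCA stage: the covariance matrix of mean-centred data, top components by power iteration with deflation, and sample scores. The toy matrix of 6 samples by 4 "genes" is invented; with thousands of genes and only 142 samples, a practical implementation would use an SVD routine from a numerical library or work with the n×n Gram matrix instead. How the paper chains several PCA stages is not specified, so only a single stage is shown.

```python
import random

def pca(data, n_components, iters=200, seed=0):
    """Top principal components via power iteration with deflation on the covariance matrix."""
    n, p = len(data), len(data[0])
    means = [sum(col) / n for col in zip(*data)]
    x = [[v - m for v, m in zip(row, means)] for row in data]
    cov = [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sum(t * t for t in w) ** 0.5
            v = [t / norm for t in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
        components.append(v)
        # Deflate: remove the found direction so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(r[i] * c[i] for i in range(p)) for c in components] for r in x]
    return components, scores

# Hypothetical 6 samples x 4 "genes": most variance lies along genes 1 and 2 together.
data = [[1.0, 1.1, 0.0, 0.2], [2.0, 1.9, 0.1, 0.0], [3.0, 3.1, 0.0, 0.1],
        [4.0, 3.9, 0.2, 0.0], [5.0, 5.1, 0.1, 0.1], [6.0, 5.9, 0.0, 0.2]]
components, scores = pca(data, n_components=1)
print([round(c, 2) for c in components[0]])
```

The scores, one short vector per sample, are what a downstream classifier would be trained on in place of the original high-dimensional profiles.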
Procedia PDF Downloads 560
7632 Integrative Omics-Portrayal Disentangles Molecular Heterogeneity and Progression Mechanisms of Cancer
Authors: Binder Hans
Abstract:
Cancer is no longer seen solely as a genetic disease in which genetic defects such as mutations and copy number variations affect gene regulation and eventually lead to aberrant cell functioning, which can be monitored by transcriptome analysis. It has become clear that epigenetic alterations represent a further important layer of (de-)regulation of gene activity. For example, aberrant DNA methylation is a hallmark of many cancer types, and methylation patterns have been used successfully to subtype cancer heterogeneity. Hence, unraveling the interplay between different omics levels such as the genome, transcriptome, and epigenome is essential for a mechanistic understanding of the molecular deregulation causing complex diseases such as cancer. This objective requires powerful downstream integrative bioinformatics methods as an essential prerequisite for uncovering the genome-wide mutational, transcriptomic, and epigenomic landscapes of cancer specimens and for understanding cancer genesis, progression, and heterogeneity. Basic challenges and tasks arise 'beyond sequencing' because of the size and complexity of the data, the need to search for hidden structures in the data, knowledge mining to discover biological function, and systems biology conceptual models to deduce developmental interrelations between different cancer states. These tasks are tightly related to cancer biology as an (epi-)genetic disease giving rise to aberrant genomic regulation under micro-environmental control and clonal evolution, which leads to heterogeneous cellular states. Machine learning algorithms such as self-organizing maps (SOMs) represent one interesting option for tackling these bioinformatics tasks. The SOM method enables the recognition of complex patterns in large-scale data generated by high-throughput omics technologies. It portrays molecular phenotypes by generating individualized, easy-to-interpret images of the data landscape in combination with comprehensive analysis options.
Our image-based, reductionist machine learning methods provide one interesting perspective on how to deal with massive data in studying complex diseases such as gliomas, melanomas, and colon cancer at the molecular level. As an important new challenge, we address the combined portrayal of different omics data, such as genome-wide genomic, transcriptomic, and methylomic data. The integrative-omics portrayal approach is based on joint training on the data and provides separate personalized data portraits for each patient and data type, which can be analyzed by visual inspection, among other options. The new method enables an integrative genome-wide view of the omics data types and the underlying regulatory modes. It is applied to high- and low-grade gliomas and to melanomas, where it disentangles transversal and longitudinal molecular heterogeneity in terms of distinct molecular subtypes and progression paths with prognostic impact.
Keywords: integrative bioinformatics, machine learning, molecular mechanisms of cancer, gliomas and melanomas
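To make the SOM idea concrete, here is a minimal online SOM in Python with NumPy. It assumes a gene-centred setup that the abstract does not spell out: each mapped item is a gene described by its values across samples, so each map node acts as a "metagene" and a sample's portrait is the grid of its metagene values. The decay schedules, grid size, and function names are illustrative choices, not the authors' implementation.

```python
import numpy as np


def train_som(data, grid_shape=(10, 10), n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Online training of a rectangular self-organizing map.

    data: (n_items, n_dims) array; here each item is a gene and each
    dimension a sample. Returns node weights of shape (rows, cols, n_dims).
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise every node with a randomly chosen data item
    picks = rng.integers(0, len(data), rows * cols)
    weights = data[picks].astype(float).reshape(rows, cols, -1)
    gy, gx = np.mgrid[0:rows, 0:cols]  # grid coordinates of the nodes
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood radius
        x = data[rng.integers(0, len(data))]
        # Best-matching unit: the node whose weight vector is closest to x
        dist = np.linalg.norm(weights - x, axis=2)
        by, bx = np.unravel_index(np.argmin(dist), dist.shape)
        # Gaussian neighbourhood around the BMU, measured on the grid
        h = np.exp(-((gy - by) ** 2 + (gx - bx) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights


def sample_portrait(weights, sample_index):
    """Portrait of one sample: its metagene values arranged on the map grid."""
    return weights[:, :, sample_index]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    expr = rng.normal(size=(1000, 6))  # synthetic: 1000 genes x 6 samples
    w = train_som(expr, grid_shape=(8, 8), n_iter=3000)
    print(sample_portrait(w, 0).shape)  # (8, 8)
```

Because every sample's portrait is read from the same trained map, portraits of different patients are directly comparable pixel by pixel, which is what makes visual inspection meaningful.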
Procedia PDF Downloads 148
7631 Multiple Linear Regression for Rapid Estimation of Subsurface Resistivity from Apparent Resistivity Measurements
Authors: Sabiu Bala Muhammad, Rosli Saad
Abstract:
Multiple linear regression (MLR) models for fast estimation of true subsurface resistivity from apparent resistivity field measurements are developed and assessed in this study. The parameters investigated were apparent resistivity (ρₐ), horizontal location (X), and depth (Z) of measurement as the independent variables, and true resistivity (ρₜ) as the dependent variable. To achieve linearity in both resistivity variables, the datasets were first transformed into the logarithmic domain, following diagnostic checks of the normality of the dependent variable and of heteroscedasticity, to ensure accurate models. Four MLR models were developed based on hierarchical combinations of the independent variables. The generated MLR coefficients were applied to another dataset to estimate ρₜ values for validation. Contours of the estimated ρₜ values were plotted and compared with plots of the observed data at the same colour scale and blanking for visual assessment. The accuracy of the models was assessed using the coefficient of determination (R²), standard error (SE), and weighted mean absolute percentage error (wMAPE). It is concluded that the MLR models can estimate ρₜ with a high level of accuracy.
Keywords: apparent resistivity, depth, horizontal location, multiple linear regression, true resistivity
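The log-domain model described above can be sketched in Python with NumPy as follows. It assumes that only the two resistivities are log-transformed (base 10) while X and Z enter linearly, that SE means the standard error of the estimate, and that wMAPE is Σ|observed - predicted| / Σ|observed| × 100; the function names are hypothetical.

```python
import numpy as np


def design(rho_a, x, z):
    """Columns: intercept, log10(apparent resistivity), horizontal location, depth."""
    rho_a, x, z = (np.asarray(v, dtype=float) for v in (rho_a, x, z))
    return np.column_stack([np.ones_like(x), np.log10(rho_a), x, z])


def fit_log_mlr(rho_a, x, z, rho_t):
    """Least-squares coefficients of log10(rho_t) = b0 + b1*log10(rho_a) + b2*X + b3*Z."""
    coef, *_ = np.linalg.lstsq(design(rho_a, x, z), np.log10(rho_t), rcond=None)
    return coef


def predict_rho_t(coef, rho_a, x, z):
    """Apply fitted coefficients to new measurements and back-transform to ohm-m."""
    return 10.0 ** (design(rho_a, x, z) @ coef)


def assess(observed, predicted, n_params=4):
    """R^2, standard error of the estimate, and wMAPE (%)."""
    observed = np.asarray(observed, dtype=float)
    resid = observed - np.asarray(predicted, dtype=float)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    se = np.sqrt(ss_res / (observed.size - n_params))
    wmape = 100.0 * np.sum(np.abs(resid)) / np.sum(np.abs(observed))
    return r2, se, wmape
```

Validation in the sense of the abstract would fit on one dataset, then call predict_rho_t and assess on another. A model with fewer predictors is obtained by dropping columns from design (and lowering n_params accordingly).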
Procedia PDF Downloads 276
7630 Evaluation of Newly Synthesized Steroid Derivatives Using In silico Molecular Descriptors and Chemometric Techniques
Authors: Milica Ž. Karadžić, Lidija R. Jevrić, Sanja Podunavac-Kuzmanović, Strahinja Z. Kovačević, Anamarija I. Mandić, Katarina Penov-Gaši, Andrea R. Nikolić, Aleksandar M. Oklješa
Abstract:
This study considered the selection of in silico molecular descriptors and models for describing newly synthesized steroid derivatives, and the characterization of these derivatives using chemometric techniques. Multiple linear regression (MLR) models were established and gave the best molecular descriptors for quantitative structure-retention relationship (QSRR) modeling of the retention of the investigated molecules. According to the variance inflation factor (VIF) values, the MLR models were free of multicollinearity among the selected molecular descriptors. The molecular descriptors used were ranked with the generalized pair correlation method (GPCM). With this method, a significant difference between independent variables can be detected even when they have almost equal correlations with the dependent variable. The generated MLR models were statistically validated and cross-validated, and the best models were kept. The models were ranked using the sum of ranking differences (SRD) method. With this method, the most consistent QSRR model can be identified, and similarity or dissimilarity between the models can be detected. In this study, SRD was performed using the average values of the experimentally observed data as the gold standard. Chemometric analysis was conducted in order to characterize the newly synthesized steroid derivatives for further investigation of their potential biological activity and for further synthesis. This article is based upon work from COST Action (CM1105), supported by COST (European Cooperation in Science and Technology).
Keywords: generalized pair correlation method, molecular descriptors, regression analysis, steroids, sum of ranking differences
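The core of SRD is simple to compute: rank the molecules by the reference values, rank them by each model's values, and sum the absolute rank differences per model. The Python sketch below assumes, following the abstract, that the reference is the averaged experimental data; ties are broken by position, which is a simplification, and the numbers in the example are invented.

```python
def ranks(values):
    """Ordinal ranks, 1 = smallest value. Ties are broken by position,
    a simplification; tied data would need an explicit tie-handling rule."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def srd(models, reference):
    """Sum of ranking differences for each model.

    models: dict mapping model name -> predicted values, one per molecule.
    reference: gold-standard values for the same molecules (assumed here to be
    averaged experimental retention data). Smaller SRD = closer ranking.
    """
    ref_rank = ranks(reference)
    return {
        name: sum(abs(a - b) for a, b in zip(ranks(values), ref_rank))
        for name, values in models.items()
    }


if __name__ == "__main__":
    # Hypothetical retention values for five molecules
    reference = [1.2, 2.5, 3.1, 4.0, 5.6]
    models = {"MLR-1": [1.1, 2.7, 3.0, 4.2, 5.5], "MLR-2": [2.6, 1.0, 3.3, 5.9, 4.1]}
    print(srd(models, reference))  # {'MLR-1': 0, 'MLR-2': 4}
```

An SRD of 0 means a model orders the molecules exactly as the reference does; larger values indicate increasingly different orderings.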
Procedia PDF Downloads 347
7629 Profile of Programmed Death Ligand-1 (PD-L1) Expression and PD-L1 Gene Amplification in Indonesian Colorectal Cancer Patients
Authors: Akterono Budiyati, Gita Kusumo, Teguh Putra, Fritzie Rexana, Antonius Kurniawan, Aru Sudoyo, Ahmad Utomo, Andi Utama
Abstract:
The presence of programmed death ligand-1 (PD-L1) has been used in multiple clinical trials and approved as a biomarker for selecting patients more likely to respond to immune checkpoint inhibitors. However, PD-L1 expression is regulated in different ways, so its presence can have different significance. Positive PD-L1 within tumors may result from two mechanisms: PD-L1 expression induced by the presence of T cells, or genetic mechanisms that lead to constitutive PD-L1 expression. Amplification of the PD-L1 gene has been found to be one genetic mechanism that increases PD-L1 expression. In colorectal cancer (CRC), immune checkpoint inhibitor therapy has been recommended for patients with microsatellite-instable (MSI) tumors. Although the correlation between PD-L1 expression and MSI status has been widely studied, the precise mechanism of PD-L1 gene activation in CRC patients, particularly in the MSI population, has yet to be clarified. In the present study, we profiled 61 archived formalin-fixed, paraffin-embedded CRC specimens from patients admitted to Medistra Hospital, Jakarta, in 2010-2016. Immunohistochemistry with antibodies against PD-L1 and against the MMR proteins (MLH1, MSH2, PMS2 and MSH6) was performed to measure PD-L1 expression in tumor cells and to determine MSI status, respectively. PD-L1 expression was measured on tumor cells with a cut-off of 1%, whereas loss of nuclear MMR protein expression in tumor cells, but not in normal or stromal cells, indicated the presence of MSI. A subset of PD-L1-positive patients was then assessed for copy number variations (CNVs) using single-tube TaqMan Copy Number Assays for the PD-L1 gene (CD274). We also examined KRAS mutations to profile possible genetic mechanisms leading to the presence or absence of PD-L1 expression. Analysis of the 61 CRC patients revealed that 15 patients (24%) expressed PD-L1 on their tumor cell membranes.
The prevalence of surface membrane PD-L1 was significantly higher in patients with MSI (87%; 7/8) than in patients with microsatellite-stable (MSS) tumors (15%; 8/53) (P=0.001). Although amplification of the PD-L1 gene was not found among PD-L1-positive patients, low-level amplification of the PD-L1 gene was more commonly observed in MSS patients (75%; 6/8) than in MSI patients (43%; 3/7). Additionally, 26% of CRC patients (16/61) harbored KRAS mutations; however, KRAS status did not correlate with PD-L1 expression. Our data suggest that PD-L1 gene amplification does not appear to be the mechanism underlying the upregulation of PD-L1 expression in CRC patients. However, further studies are warranted to confirm these results.
Keywords: colorectal cancer, gene amplification, microsatellite instable, programmed death ligand-1
Procedia PDF Downloads 222
7628 Performance and Physiological Responses of Broiler Chickens to Diets Supplemented with Propolis in Breeding, to in Ovo Propolis Feeding or to Propolis Supplementation of Diets for Their Chicks
Authors: Kalbiye Konanc, Ergin Ozturk
Abstract:
To examine the effects of an ethanol liquid extract of raw bee propolis (PE) on the fattening performance and physiology (vaccine-antibody relationship, microbial profile, immune status, and some blood parameters) of broiler chickens, a total of 600 broiler (Ross 308) chicks, hatched from eggs of 288 38-week-old broiler breeders, were used. There were 6 groups: CC (Parent-Control and Offspring-Control), CP (Parent-Control and Offspring-propolis extract), Cip (Parent-Control and Offspring-in-ovo propolis extract), Cis (Parent-Control and Chickens-in-ovo saline), PeC (Parent-propolis extract and Offspring-Control), and PeP (Parent-Propolis extract and Offspring-Propolis extract). Each group consisted of 10 replicates of 10 broiler offspring, and the experiment lasted 6 weeks, with ethanol-extracted propolis at a concentration of 400 ppm/kg of diet. The highest feed consumption at 0-21 days and 0-42 days was found in PeC, while the best feed conversion ratio at 0-42 days was found in the CP group. Live weight gains did not differ among the groups. The highest alanine aminotransferase activities were found in CC and CP, and the highest aspartate aminotransferase activities in PeP and PeC. The highest triglyceride and total antioxidant levels were found in CC, and the highest total oxidant level in the Cip group. The IgA level in hatched eggs and the IgM value after slaughter were highest in the Cip group. The best immune response was obtained for the day-21 Newcastle Disease vaccine in the CC and Cis groups, and for the day-28 Infectious Bursal Disease vaccine in the CP group. The highest total aerobic microorganism count and the lowest total fungi count were found in the PeP group. In conclusion, in-ovo feeding of propolis ethanol extract (Cip) increased maternal antibody levels, had no consistent effects on blood biochemical parameters except triglycerides, decreased E. coli counts, and can provide a strong immune response against Infectious Bursal Disease.
Keywords: bee propolis, in-ovo feeding, immune parameters, poultry, maternal antibody, microorganisms
Procedia PDF Downloads 289
7627 Estimating Lost Digital Video Frames Using Unidirectional and Bidirectional Estimation Based on Autoregressive Time Model
Authors: Navid Daryasafar, Nima Farshidfar
Abstract:
In this article, we attempt to conceal errors in video, with an emphasis on the temporal use of autoregressive (AR) models. To address this problem, we assume that all information in one or more video frames is lost. The lost frames are then estimated using the temporal information of the corresponding pixels in successive frames. Accordingly, after presenting autoregressive models and how they are applied to estimate lost frames, two general methods for using these models are presented. The first method, the standard autoregressive approach, estimates the lost frame unidirectionally; usually, information from the previous frames is used to estimate the lost frame. In the second method, information from both the previous and the next frames is used to estimate the lost frame; hence, this method is known as bidirectional estimation. Finally, the performance of each method is assessed in different modes through a series of tests, and the results are compared.
Keywords: error steganography, unidirectional estimation, bidirectional estimation, AR linear estimation
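Both estimation modes can be sketched compactly in Python with NumPy. The sketch below assumes one set of AR coefficients shared by all pixels and fitted by least squares over the received frames, and it forms the bidirectional estimate as the plain average of a forward and a backward prediction. The article may fit per-pixel coefficients or weight the two directions differently, and the function names are hypothetical.

```python
import numpy as np


def fit_ar(frames, order):
    """Fit shared AR coefficients a with frames[t] ~ sum_k a[k] * frames[t-1-k].

    frames: array (T, H, W) of correctly received, consecutive frames.
    Every pixel contributes one least-squares equation per usable time step.
    """
    rows, targets = [], []
    for t in range(order, frames.shape[0]):
        # Regressors: the `order` preceding frames, most recent first
        rows.append(np.stack([frames[t - 1 - k].ravel() for k in range(order)], axis=1))
        targets.append(frames[t].ravel())
    coef, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(targets), rcond=None)
    return coef


def predict_next(frames, coef):
    """Unidirectional AR prediction of the frame that follows `frames`."""
    return sum(coef[k] * frames[-1 - k] for k in range(len(coef)))


def estimate_lost_frame(prev_frames, next_frames, order=2):
    """Bidirectional estimate: average of a forward and a backward AR prediction.

    prev_frames and next_frames are (T, H, W) arrays in chronological order.
    """
    forward = predict_next(prev_frames, fit_ar(prev_frames, order))
    # Reverse time so that the frames after the gap act as the "past"
    rev = next_frames[::-1]
    backward = predict_next(rev, fit_ar(rev, order))
    return 0.5 * (forward + backward)
```

Calling predict_next(prev_frames, fit_ar(prev_frames, order)) on its own gives the unidirectional estimate, so the two methods can be compared on the same data.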
Procedia PDF Downloads 540
7626 Validating Condition-Based Maintenance Algorithms through Simulation
Authors: Marcel Chevalier, Léo Dupont, Sylvain Marié, Frédérique Roffet, Elena Stolyarova, William Templier, Costin Vasile
Abstract:
Industrial end-users currently face an increasing need to reduce the risk of unexpected failures and to optimize their maintenance. This calls for both short-term analysis and long-term ageing anticipation. At Schneider Electric, we tackle these two issues using both machine learning and first-principles models. Machine learning models are incrementally trained on normal data to predict expected values and detect statistically significant short-term deviations. Ageing models are constructed by breaking down physical systems into sub-assemblies, then determining the relevant degradation modes and associating each one with the right kinetic law. Validating such anomaly detection and maintenance models is challenging, both because actual incident and ageing data are rare and distorted by human interventions, and because incremental learning depends on human feedback. To overcome these difficulties, we propose to simulate physics, systems, and humans (including asset maintenance operations) in order to validate the overall approaches in accelerated time and possibly choose between algorithmic alternatives.
Keywords: degradation models, ageing, anomaly detection, soft sensor, incremental learning
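As an illustration of detecting statistically significant short-term deviations from incrementally learned expectations, the Python sketch below tracks the residual between an observed value and a model's expected value using Welford's online mean/variance update, and flags large z-scores. It is not Schneider Electric's method: the expected value could come from any model, and the threshold, warm-up length, and learn-only-from-normal rule are assumptions of this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class ResidualMonitor:
    """Flags readings whose residual (observed - expected) is unusually large.

    The residual mean and variance are updated incrementally with Welford's
    algorithm, so no history has to be stored.
    """

    threshold: float = 3.0  # |z-score| above which a reading is flagged
    warmup: int = 30        # residuals to learn from before flagging anything
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0         # running sum of squared deviations from the mean

    def update(self, observed: float, expected: float) -> bool:
        r = observed - expected
        flagged = False
        if self.n >= max(self.warmup, 2):
            std = math.sqrt(self.m2 / (self.n - 1))
            flagged = std > 0 and abs(r - self.mean) / std > self.threshold
        if not flagged:
            # Learn only from readings judged normal, so anomalies do not
            # inflate the baseline (a design choice of this sketch)
            self.n += 1
            delta = r - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (r - self.mean)
        return flagged
```

In the simulation-based validation the abstract proposes, a monitor like this would be fed simulated readings with injected faults, and its flags compared against the known injection times.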
Procedia PDF Downloads 126
7625 Dual Duality for Unifying Spacetime and Internal Symmetry
Authors: David C. Ni
Abstract:
Current efforts toward a Grand Unified Theory (GUT) can be classified into General Relativity, Quantum Mechanics, String Theory, and related formalisms. In the geometric approaches that extend General Relativity, the efforts aim to establish global and local invariance embedded in metric formalisms, whereby additional dimensions are constructed to unify canonical formulations such as the Hamiltonian and Lagrangian formulations. The approaches that extend Quantum Mechanics adopt the symmetry principle to formulate algebra-group theories, which evolved from the Maxwell formulation to the Yang-Mills non-abelian gauge formulation and thereafter gave rise to the Standard Model. This line of work has been constructing supersymmetry to relate fermions and bosons, as well as gluons and gravitons. String theory efforts have been evolving toward the so-called gauge/gravity correspondence, particularly the equivalence between type IIB string theory compactified on AdS₅ × S⁵ and N = 4 supersymmetric Yang-Mills theory. Other efforts adopt cross-breeding approaches that combine the three formalisms above, as well as competing formalisms; nevertheless, the related symmetries, dualities, and correspondences are outlined as principles and techniques, even though these terms are defined diversely and are often generically called duality. In this paper, we first classify these dualities from the perspective of physics. We then examine the hierarchical structure of the classes from a mathematical perspective, referring to the Coleman-Mandula theorem, Hidden Local Symmetry, Groupoid-Categorization, and others.
Based on the Fundamental Theorems of Algebra, we argue against imposing effective constraints on different algebras and their extensions, which are mainly constructed by self-breeding or self-mapping methodologies to sustain invariance. Instead, we propose a new addition, a momentum-angular momentum duality at the level of electromagnetic duality, to rationalize the duality algebras, and we then characterize this duality numerically in an attempt to address some unsolved problems in physics and astrophysics.
Keywords: general relativity, quantum mechanics, string theory, duality, symmetry, correspondence, algebra, momentum-angular-momentum
Procedia PDF Downloads 397
7624 Learning Predictive Models for Efficient Energy Management of Exhibition Hall
Authors: Jeongmin Kim, Eunju Lee, Kwang Ryel Ryu
Abstract:
This paper addresses the problem of predictive control for the energy management of large-scale exhibition halls, where a great deal of energy is consumed to keep the internal atmosphere within required conditions. Predictive control achieves better energy efficiency by optimizing the operation of air-conditioning facilities, taking into account not only the current status but also the future status. In this paper, we propose using predictive models learned from past sensor data of the hall environment to optimize the operating plan for the air-conditioning facilities by simulating future environmental changes. We implemented an emulator of an exhibition hall using EnergyPlus, a widely used building energy simulation tool, to collect data for learning environment-change models. Experimental results show that the learned models predict future changes highly accurately in the short term.
Keywords: predictive control, energy management, machine learning, optimization
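A toy version of "learn a model from past sensor data, then simulate candidate operating plans" might look like the Python sketch below. Everything specific is an assumption: a linear one-step model of indoor temperature driven by outdoor temperature and an HVAC setting, a comfort band, the sum of settings as a stand-in for energy, and an exhaustive search that is only practical for short horizons. The abstract does not describe the paper's learned models or optimizer.

```python
import itertools

import numpy as np


def fit_one_step(temp, outdoor, hvac):
    """Least squares for temp[t+1] ~ a*temp[t] + b*outdoor[t] + c*hvac[t] + d."""
    X = np.column_stack([temp[:-1], outdoor[:-1], hvac[:-1], np.ones(len(temp) - 1)])
    coef, *_ = np.linalg.lstsq(X, temp[1:], rcond=None)
    return coef


def simulate(coef, temp0, outdoor_forecast, plan):
    """Roll the learned one-step model forward under a plan of HVAC settings."""
    temps, t = [], temp0
    for out, u in zip(outdoor_forecast, plan):
        t = float(coef @ np.array([t, out, u, 1.0]))
        temps.append(t)
    return np.array(temps)


def best_plan(coef, temp0, outdoor_forecast, levels, band):
    """Exhaustive search for the cheapest plan keeping every predicted
    temperature inside `band`; cost is a hypothetical proxy, the sum of settings."""
    lo, hi = band
    best = None
    for plan in itertools.product(levels, repeat=len(outdoor_forecast)):
        temps = simulate(coef, temp0, outdoor_forecast, plan)
        if np.all((temps >= lo) & (temps <= hi)):
            cost = sum(plan)
            if best is None or cost < best[0]:
                best = (cost, plan)
    return best  # None if no plan satisfies the band
```

The model is learned once from logged data and then queried many times during planning; this separation is what lets an optimizer evaluate candidate plans without running the physical hall (or EnergyPlus) for each one.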
Procedia PDF Downloads 274
7623 Bioinformatics Approach to Support Genetic Research in Autism in Mali
Authors: M. Kouyate, M. Sangare, S. Samake, S. Keita, H. G. Kim, D. H. Geschwind
Abstract:
Background & Objectives: Human genetic studies can be expensive, even unaffordable, in developing countries, partly due to sequencing costs. Our aim is to pilot the use of bioinformatics tools to guide scientifically valid, locally relevant, and economically sound autism genetic research in Mali. Methods: The NCBI, HGMD, and LSDB databases were used to identify hotspot point mutations. Phenotype, transmission pattern, theoretical protein expression in the brain, and the impact of the mutation on the 3D structure of the protein were used to prioritize selected autism genes. We used the protein database, Modeller, and Clustal W. Results: We found Mef2c (Gly27Ala/Leu38Gln), Pten (Thr131Ile), Prodh (Leu289Met), Nme1 (Ser120Gly), and Dhcr7 (Pro227Thr/Glu224Lys). These mutations were associated with the endonucleases BseRI, NspI, PfrJS2IV, BspGI, BsaBI, and SpoDI, respectively. The Gly27Ala/Leu38Gln mutations impacted the 3D structure of the Mef2c protein. Mef2c protein sequences across species showed a high percentage of similarity, with a highly conserved MADS domain. Discussion: Mef2c, Pten, Prodh, Nme1, and Dhcr7 gene mutation frequencies in the Malian population will be very informative. PCR coupled with restriction enzyme digestion can be used to screen for the targeted gene mutations, and Sanger sequencing will be used for confirmation only. This will considerably cut the sequencing cost of gene-by-gene mutation screening. Knowledge of the 3D structure of Mef2c and of the potential impact of the mutations on it informed the protein family and altered function (e.g., Leu38Gln). Conclusion & Future Work: Bioinformatics will positively impact autism research in Mali. Our approach can be applied to other neuropsychiatric disorders.
Keywords: bioinformatics, endonucleases, autism, Sanger sequencing, point mutations
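The screening idea above (PCR followed by restriction digestion) relies on a point mutation creating or destroying an enzyme's recognition site. A minimal in-silico check in Python is sketched below. The sequences in the example are made up, the recognition site RCATGY is assumed to be that of NspI (worth confirming against REBASE before use), and the function names are hypothetical; real primer design would also consider fragment sizes.

```python
import re

# IUPAC nucleotide codes as regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def site_windows(seq, site):
    """Forward-strand start positions of recognition-site matches on either strand."""
    seq = seq.upper()
    k = len(site)
    # Lookahead so overlapping matches are all reported
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in site.upper()) + ")")
    forward = {m.start() for m in pattern.finditer(seq)}
    revcomp = seq.translate(COMPLEMENT)[::-1]
    # Map reverse-strand hits back to forward coordinates; palindromic sites dedupe
    reverse = {len(seq) - m.start() - k for m in pattern.finditer(revcomp)}
    return sorted(forward | reverse)


def rflp_informative(wild_type, mutant, site):
    """True if a point substitution changes the set of sites, i.e. the two
    alleles would give different digestion patterns."""
    return site_windows(wild_type, site) != site_windows(mutant, site)


if __name__ == "__main__":
    # Made-up sequences; RCATGY is assumed to be the NspI recognition site
    wt, mut = "TTACATGTTT", "TTACAGGTTT"
    print(site_windows(wt, "RCATGY"), site_windows(mut, "RCATGY"))  # [2] []
    print(rflp_informative(wt, mut, "RCATGY"))  # True
```

Comparing site positions rather than counts avoids missing the case where a mutation destroys one site and creates another nearby.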
Procedia PDF Downloads 83
7622 Empirical Roughness Progression Models of Heavy Duty Rural Pavements
Authors: Nahla H. Alaswadko, Rayya A. Hassan, Bayar N. Mohammed
Abstract:
Empirical deterministic models have been developed to predict the roughness progression of heavy-duty spray-sealed pavements using a dataset representing rural arterial roads. The dataset provides a good representation of the relevant network and covers a wide range of operating and environmental conditions. A large sample of historical time-series data for many pavement sections was collected and prepared for multilevel regression analysis. The modelling parameters include road roughness as the performance parameter, and traffic loading, time, initial pavement strength, reactivity level of the subgrade soil, climate condition, and condition of the drainage system as predictor parameters. The purpose of this paper is to report the approaches adopted for model development and validation. The study presents multilevel models that account for the correlation among time-series data of the same section and capture the effect of unobserved variables. The results show that the models fit the data very well. The contribution and significance of the relevant influencing factors in predicting roughness progression are presented and explained. The paper concludes that the models' good fit to the validation data confirms their accuracy and reliability.
Keywords: roughness progression, empirical model, pavement performance, heavy duty pavement
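The abstract does not give the model equations. As a hedged illustration of why a multilevel model suits repeated surveys of the same section, a generic random-intercept form is shown below. The symbols are assumptions: R_ij is the roughness of section j at survey i, t_ij the pavement age, and x_k,ij the other predictors (traffic loading, initial strength, subgrade reactivity, climate, drainage). The paper's actual transformations, interactions, or random slopes may differ.

```latex
R_{ij} = \beta_0 + \beta_1 t_{ij} + \sum_{k=2}^{K} \beta_k x_{k,ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)

\operatorname{corr}\left(R_{ij}, R_{i'j} \mid x\right) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}, \qquad i \neq i'
```

The shared section effect u_j both induces the within-section correlation on the second line and absorbs section characteristics that were not observed, which ordinary least squares on pooled data would ignore.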
Procedia PDF Downloads 168