Search results for: whole exome sequencing data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25184

Search results for: whole exome sequencing data

24374 A DEA Model in a Multi-Objective Optimization with Fuzzy Environment

Authors: Michael Gidey Gebru

Abstract:

Most DEA models operate in a static environment with input and output parameters that are chosen by deterministic data. However, due to ambiguity brought on shifting market conditions, input and output data are not always precisely gathered in real-world scenarios. Fuzzy numbers can be used to address this kind of ambiguity in input and output data. Therefore, this work aims to expand crisp DEA into DEA with fuzzy environment. In this study, the input and output data are regarded as fuzzy triangular numbers. Then, the DEA model with fuzzy environment is solved using a multi-objective method to gauge the Decision Making Units’ efficiency. Finally, the developed DEA model is illustrated with an application on real data 50 educational institutions.

Keywords: efficiency, DEA, fuzzy, decision making units, higher education institutions

Procedia PDF Downloads 47
24373 Data-Driven Decision Making: Justification of Not Leaving Class without It

Authors: Denise Hexom, Judith Menoher

Abstract:

Teachers and administrators across America are being asked to use data and hard evidence to inform practice as they begin the task of implementing Common Core State Standards. Yet, the courses they are taking in schools of education are not preparing teachers or principals to understand the data-driven decision making (DDDM) process nor to utilize data in a much more sophisticated fashion. DDDM has been around for quite some time, however, it has only recently become systematically and consistently applied in the field of education. This paper discusses the theoretical framework of DDDM; empirical evidence supporting the effectiveness of DDDM; a process a department in a school of education has utilized to implement DDDM; and recommendations to other schools of education who attempt to implement DDDM in their decision-making processes and in their students’ coursework.

Keywords: data-driven decision making, institute of higher education, special education, continuous improvement

Procedia PDF Downloads 380
24372 Quantile Coherence Analysis: Application to Precipitation Data

Authors: Yaeji Lim, Hee-Seok Oh

Abstract:

The coherence analysis measures the linear time-invariant relationship between two data sets and has been studied various fields such as signal processing, engineering, and medical science. However classical coherence analysis tends to be sensitive to outliers and focuses only on mean relationship. In this paper, we generalized cross periodogram to quantile cross periodogram and provide richer inter-relationship between two data sets. This is a general version of Laplace cross periodogram. We prove its asymptotic distribution under the long range process and compare them with ordinary coherence through numerical examples. We also present real data example to confirm the usefulness of quantile coherence analysis.

Keywords: coherence, cross periodogram, spectrum, quantile

Procedia PDF Downloads 385
24371 Conception of a Predictive Maintenance System for Forest Harvesters from Multiple Data Sources

Authors: Lazlo Fauth, Andreas Ligocki

Abstract:

For cost-effective use of harvesters, expensive repairs and unplanned downtimes must be reduced as far as possible. The predictive detection of failing systems and the calculation of intelligent service intervals, necessary to avoid these factors, require in-depth knowledge of the machines' behavior. Such know-how needs permanent monitoring of the machine state from different technical perspectives. In this paper, three approaches will be presented as they are currently pursued in the publicly funded project PreForst at Ostfalia University of Applied Sciences. These include the intelligent linking of workshop and service data, sensors on the harvester, and a special online hydraulic oil condition monitoring system. Furthermore the paper shows potentials as well as challenges for the use of these data in the conception of a predictive maintenance system.

Keywords: predictive maintenance, condition monitoring, forest harvesting, forest engineering, oil data, hydraulic data

Procedia PDF Downloads 134
24370 Sampled-Data Control for Fuel Cell Systems

Authors: H. Y. Jung, Ju H. Park, S. M. Lee

Abstract:

A sampled-data controller is presented for solid oxide fuel cell systems which is expressed by a sector bounded nonlinear model. The sector bounded nonlinear systems, which have a feedback connection with a linear dynamical system and nonlinearity satisfying certain sector type constraints. Also, the sampled-data control scheme is very useful since it is possible to handle digital controller and increasing research efforts have been devoted to sampled-data control systems with the development of modern high-speed computers. The proposed control law is obtained by solving a convex problem satisfying several linear matrix inequalities. Simulation results are given to show the effectiveness of the proposed design method.

Keywords: sampled-data control, fuel cell, linear matrix inequalities, nonlinear control

Procedia PDF Downloads 562
24369 How Western Donors Allocate Official Development Assistance: New Evidence From a Natural Language Processing Approach

Authors: Daniel Benson, Yundan Gong, Hannah Kirk

Abstract:

Advancement in national language processing techniques has led to increased data processing speeds, and reduced the need for cumbersome, manual data processing that is often required when processing data from multilateral organizations for specific purposes. As such, using named entity recognition (NER) modeling and the Organisation of Economically Developed Countries (OECD) Creditor Reporting System database, we present the first geotagged dataset of OECD donor Official Development Assistance (ODA) projects on a global, subnational basis. Our resulting data contains 52,086 ODA projects geocoded to subnational locations across 115 countries, worth a combined $87.9bn. This represents the first global, OECD donor ODA project database with geocoded projects. We use this new data to revisit old questions of how ‘well’ donors allocate ODA to the developing world. This understanding is imperative for policymakers seeking to improve ODA effectiveness.

Keywords: international aid, geocoding, subnational data, natural language processing, machine learning

Procedia PDF Downloads 70
24368 A Novel Chicken W Chromosome Specific Tandem Repeat

Authors: Alsu F. Saifitdinova, Alexey S. Komissarov, Svetlana A. Galkina, Elena I. Koshel, Maria M. Kulak, Stephen J. O'Brien, Elena R. Gaginskaya

Abstract:

The mystery of sex determination is one of the most ancient and still not solved until the end so far. In many species, sex determination is genetic and often accompanied by the presence of dimorphic sex chromosomes in the karyotype. Genomic sequencing gave the information about the gene content of sex chromosomes which allowed to reveal their origin from ordinary autosomes and to trace their evolutionary history. Female-specific W chromosome in birds as well as mammalian male-specific Y chromosome is characterized by the degeneration of gene content and the accumulation of repetitive DNA. Tandem repeats complicate the analysis of genomic data. Despite the best efforts chicken W chromosome assembly includes only 1.2 Mb from expected 55 Mb. Supplementing the information on the sex chromosome composition not only helps to complete the assembly of genomes but also moves us in the direction of understanding of the sex-determination systems evolution. A whole-genome survey to the assembly Gallus_gallus WASHUC 2.60 was applied for repeats search in assembled genome and performed search and assembly of high copy number repeats in unassembled reads of SRR867748 short reads datasets. For cytogenetic analysis conventional methods of fluorescent in situ hybridization was used for previously cloned W specific satellites and specifically designed directly labeled synthetic oligonucleotide DNA probe was used for bioinformatically identified repetitive sequence. Hybridization was performed with mitotic chicken chromosomes and manually isolated giant meiotic lampbrush chromosomes from growing oocytes. A novel chicken W specific satellite (GGAAA)n which is not co-localizes with any previously described classes of W specific repeats was identified and mapped with high resolution. In the composition of autosomes this repeat units was found as a part of upstream regions of gonad specific protein coding sequences. These findings may contribute to the understanding of the role of tandem repeats in sex specific differentiation regulation in birds and sex chromosome evolution. This work was supported by the postdoctoral fellowships from St. Petersburg State University (#1.50.1623.2013 and #1.50.1043.2014), the grant for Leading Scientific Schools (#3553.2014.4) and the grant from Russian foundation for basic researches (#15-04-05684). The equipment and software of Research Resource Center “Chromas” and Theodosius Dobzhansky Center for Genome Bioinformatics of Saint Petersburg State University were used.

Keywords: birds, lampbrush chromosomes, sex chromosomes, tandem repeats

Procedia PDF Downloads 384
24367 Morphological and Molecular Studies (ITS1) of Hydatid Cysts in Slaughtered Sheep in Mashhad Area

Authors: G. R. Hashemi Tabar, G. R. Razmi, F. Mirshekar

Abstract:

Echinococcus granulosus have ten strains from G1 to G9. Each strain is related to special intermediated host. The morphology, epidemiology, treatment and control in these strains are different. There are many morphological and molecular methods to differentiate of Echinococcus strains. However, using both methods were provided better information about identification of each strain. The aim of study was to identify Echinococcus granulosus strain of hydrated cysts in slaughtered sheep using morphological and molecular methods in Mashhad area. In the present study, the infected liver and lung with hydatid cysts were collected and transferred to laboratory. The hydatid cyst liquid was extracted and morphological characters of rostellar hook protosclocies were measured using micrometer ocular. The total length of large blade length of large hooks, total length of small and blade length of small hooks, and number of hooks per protoscolex were 23± 0.3μm, 11.7±0.5 μm, 19.3±1.1 μm,8±1.1 and 33.7±0.7 μm, respectively. In molecular section of the study, DNA each samples was extracted with MBST Kit and development of PCR using special primers (EgF, EgR) which amplify fragment of ITS1 gen. The PCR product was digested with Bsh1236I enzyme. Based on pattern of PCR-RLFP results (four band forming), G1, G2 and G3 strain of Echinococcus granulosus were obtained. Differentiation of three strains was done using sequencing analysis and G1 strain was diagnosed. The agreement between the molecular results with morphometric characters of rosetellar hook was confirmed the presence of G1 strain of Echinococcus in the slaughtered sheep of Mashhad area.

Keywords: Echinococcus granulosus, Hydatid cyst, PCR, sheep

Procedia PDF Downloads 513
24366 Compressed Suffix Arrays to Self-Indexes Based on Partitioned Elias-Fano

Authors: Guo Wenyu, Qu Youli

Abstract:

A practical and simple self-indexing data structure, Partitioned Elias-Fano (PEF) - Compressed Suffix Arrays (CSA), is built in linear time for the CSA based on PEF indexes. Moreover, the PEF-CSA is compared with two classical compressed indexing methods, Ferragina and Manzini implementation (FMI) and Sad-CSA on different type and size files in Pizza & Chili. The PEF-CSA performs better on the existing data in terms of the compression ratio, count, and locates time except for the evenly distributed data such as proteins data. The observations of the experiments are that the distribution of the φ is more important than the alphabet size on the compression ratio. Unevenly distributed data φ makes better compression effect, and the larger the size of the hit counts, the longer the count and locate time.

Keywords: compressed suffix array, self-indexing, partitioned Elias-Fano, PEF-CSA

Procedia PDF Downloads 248
24365 Data, Digital Identity and Antitrust Law: An Exploratory Study of Facebook’s Novi Digital Wallet

Authors: Wanjiku Karanja

Abstract:

Facebook has monopoly power in the social networking market. It has grown and entrenched its monopoly power through the capture of its users’ data value chains. However, antitrust law’s consumer welfare roots have prevented it from effectively addressing the role of data capture in Facebook’s market dominance. These regulatory blind spots are augmented in Facebook’s proposed Diem cryptocurrency project and its Novi Digital wallet. Novi, which is Diem’s digital identity component, shall enable Facebook to collect an unprecedented volume of consumer data. Consequently, Novi has seismic implications on internet identity as the network effects of Facebook’s large user base could establish it as the de facto internet identity layer. Moreover, the large tracts of data Facebook shall collect through Novi shall further entrench Facebook's market power. As such, the attendant lock-in effects of this project shall be very difficult to reverse. Urgent regulatory action is therefore required to prevent this expansion of Facebook’s data resources and monopoly power. This research thus highlights the importance of data capture to competition and market health in the social networking industry. It utilizes interviews with key experts to empirically interrogate the impact of Facebook’s data capture and control of its users’ data value chains on its market power. This inquiry is contextualized against Novi’s expansive effect on Facebook’s data value chains. It thus addresses the novel antitrust issues arising at the nexus of Facebook’s monopoly power and the privacy of its users’ data. It also explores the impact of platform design principles, specifically data portability and data portability, in mitigating Facebook’s anti-competitive practices. As such, this study finds that Facebook is a powerful monopoly that dominates the social media industry to the detriment of potential competitors. Facebook derives its power from its size, annexure of the consumer data value chain, and control of its users’ social graphs. Additionally, the platform design principles of data interoperability and data portability are not a panacea to restoring competition in the social networking market. Their success depends on the establishment of robust technical standards and regulatory frameworks.

Keywords: antitrust law, data protection law, data portability, data interoperability, digital identity, Facebook

Procedia PDF Downloads 120
24364 Study on Developmental and Pathogenesis Related Genes Expression Deregulation in Brassica compestris Infected with 16Sr-IX Associated Phytoplasma

Authors: Samina Jam Nazeer Ahmad, Samia Yasin, Ijaz Ahmad, Muhammad Tahir, Jam Nazeer Ahmad

Abstract:

Phytoplasmas are phloem-inhibited plant pathogenic bacteria that are transferred by insect vectors. Among biotic factors, Phytoplasma infection induces abnormality influencing the physiology as well as morphology of plants. In 16Sr-IX group phytoplasma-infected brassica compestris, flower abnormalities have been associated with changes in the expression of floral development genes. To determine whether methylation was involved in down-regulation of flower development, the process of DNA methylation and Demethylation was investigated as a possible mechanism for regulation of floral gene expression in phytoplasma infected Brassica transmitted by Orosious orientalis vector by using RT-PCR, MSRE-PCR, Southern blotting, Bisulfite Sequencing, etc. Transcriptional expression of methylated genes was found to be globally down-regulated in plants infected with phytoplasma, but not severely in those infested by insect vectors and variation in expression was found in genes involved in methylation. These results also showed that genes particularly orthologous to Arabidopsis APETALA3 involved in petal formation and flower development was down-regulated severely in phytoplasma-infected brassica and with the fact that phytoplasma and insect induce variation in developmental gene expression. The DNA methylation status of flower developmental gene in phytoplasma infected plants with 5-azacytidine restored gene expression strongly suggesting that DNA methylation was involved in down-regulation of floral development genes in phytoplasma infected brassica.

Keywords: genes expression, phytoplasma, DNA methylation, flower development

Procedia PDF Downloads 367
24363 Recommendations for Data Quality Filtering of Opportunistic Species Occurrence Data

Authors: Camille Van Eupen, Dirk Maes, Marc Herremans, Kristijn R. R. Swinnen, Ben Somers, Stijn Luca

Abstract:

In ecology, species distribution models are commonly implemented to study species-environment relationships. These models increasingly rely on opportunistic citizen science data when high-quality species records collected through standardized recording protocols are unavailable. While these opportunistic data are abundant, uncertainty is usually high, e.g., due to observer effects or a lack of metadata. Data quality filtering is often used to reduce these types of uncertainty in an attempt to increase the value of studies relying on opportunistic data. However, filtering should not be performed blindly. In this study, recommendations are built for data quality filtering of opportunistic species occurrence data that are used as input for species distribution models. Using an extensive database of 5.7 million citizen science records from 255 species in Flanders, the impact on model performance was quantified by applying three data quality filters, and these results were linked to species traits. More specifically, presence records were filtered based on record attributes that provide information on the observation process or post-entry data validation, and changes in the area under the receiver operating characteristic (AUC), sensitivity, and specificity were analyzed using the Maxent algorithm with and without filtering. Controlling for sample size enabled us to study the combined impact of data quality filtering, i.e., the simultaneous impact of an increase in data quality and a decrease in sample size. Further, the variation among species in their response to data quality filtering was explored by clustering species based on four traits often related to data quality: commonness, popularity, difficulty, and body size. Findings show that model performance is affected by i) the quality of the filtered data, ii) the proportional reduction in sample size caused by filtering and the remaining absolute sample size, and iii) a species ‘quality profile’, resulting from a species classification based on the four traits related to data quality. The findings resulted in recommendations on when and how to filter volunteer generated and opportunistically collected data. This study confirms that correctly processed citizen science data can make a valuable contribution to ecological research and species conservation.

Keywords: citizen science, data quality filtering, species distribution models, trait profiles

Procedia PDF Downloads 194
24362 Data Quality Enhancement with String Length Distribution

Authors: Qi Xiu, Hiromu Hota, Yohsuke Ishii, Takuya Oda

Abstract:

Recently, collectable manufacturing data are rapidly increasing. On the other hand, mega recall is getting serious as a social problem. Under such circumstances, there are increasing needs for preventing mega recalls by defect analysis such as root cause analysis and abnormal detection utilizing manufacturing data. However, the time to classify strings in manufacturing data by traditional method is too long to meet requirement of quick defect analysis. Therefore, we present String Length Distribution Classification method (SLDC) to correctly classify strings in a short time. This method learns character features, especially string length distribution from Product ID, Machine ID in BOM and asset list. By applying the proposal to strings in actual manufacturing data, we verified that the classification time of strings can be reduced by 80%. As a result, it can be estimated that the requirement of quick defect analysis can be fulfilled.

Keywords: string classification, data quality, feature selection, probability distribution, string length

Procedia PDF Downloads 314
24361 Temporally Coherent 3D Animation Reconstruction from RGB-D Video Data

Authors: Salam Khalifa, Naveed Ahmed

Abstract:

We present a new method to reconstruct a temporally coherent 3D animation from single or multi-view RGB-D video data using unbiased feature point sampling. Given RGB-D video data, in form of a 3D point cloud sequence, our method first extracts feature points using both color and depth information. In the subsequent steps, these feature points are used to match two 3D point clouds in consecutive frames independent of their resolution. Our new motion vectors based dynamic alignment method then fully reconstruct a spatio-temporally coherent 3D animation. We perform extensive quantitative validation using novel error functions to analyze the results. We show that despite the limiting factors of temporal and spatial noise associated to RGB-D data, it is possible to extract temporal coherence to faithfully reconstruct a temporally coherent 3D animation from RGB-D video data.

Keywords: 3D video, 3D animation, RGB-D video, temporally coherent 3D animation

Procedia PDF Downloads 368
24360 Determining Abnomal Behaviors in UAV Robots for Trajectory Control in Teleoperation

Authors: Kiwon Yeom

Abstract:

Change points are abrupt variations in a data sequence. Detection of change points is useful in modeling, analyzing, and predicting time series in application areas such as robotics and teleoperation. In this paper, a change point is defined to be a discontinuity in one of its derivatives. This paper presents a reliable method for detecting discontinuities within a three-dimensional trajectory data. The problem of determining one or more discontinuities is considered in regular and irregular trajectory data from teleoperation. We examine the geometric detection algorithm and illustrate the use of the method on real data examples.

Keywords: change point, discontinuity, teleoperation, abrupt variation

Procedia PDF Downloads 162
24359 Multidimensional Item Response Theory Models for Practical Application in Large Tests Designed to Measure Multiple Constructs

Authors: Maria Fernanda Ordoñez Martinez, Alvaro Mauricio Montenegro

Abstract:

This work presents a statistical methodology for measuring and founding constructs in Latent Semantic Analysis. This approach uses the qualities of Factor Analysis in binary data with interpretations present on Item Response Theory. More precisely, we propose initially reducing dimensionality with specific use of Principal Component Analysis for the linguistic data and then, producing axes of groups made from a clustering analysis of the semantic data. This approach allows the user to give meaning to previous clusters and found the real latent structure presented by data. The methodology is applied in a set of real semantic data presenting impressive results for the coherence, speed and precision.

Keywords: semantic analysis, factorial analysis, dimension reduction, penalized logistic regression

Procedia PDF Downloads 436
24358 Analysis of Production Forecasting in Unconventional Gas Resources Development Using Machine Learning and Data-Driven Approach

Authors: Dongkwon Han, Sangho Kim, Sunil Kwon

Abstract:

Unconventional gas resources have dramatically changed the future energy landscape. Unlike conventional gas resources, the key challenges in unconventional gas have been the requirement that applies to advanced approaches for production forecasting due to uncertainty and complexity of fluid flow. In this study, artificial neural network (ANN) model which integrates machine learning and data-driven approach was developed to predict productivity in shale gas. The database of 129 wells of Eagle Ford shale basin used for testing and training of the ANN model. The Input data related to hydraulic fracturing, well completion and productivity of shale gas were selected and the output data is a cumulative production. The performance of the ANN using all data sets, clustering and variables importance (VI) models were compared in the mean absolute percentage error (MAPE). ANN model using all data sets, clustering, and VI were obtained as 44.22%, 10.08% (cluster 1), 5.26% (cluster 2), 6.35%(cluster 3), and 32.23% (ANN VI), 23.19% (SVM VI), respectively. The results showed that the pre-trained ANN model provides more accurate results than the ANN model using all data sets.

Keywords: unconventional gas, artificial neural network, machine learning, clustering, variables importance

Procedia PDF Downloads 190
24357 An A-Star Approach for the Quickest Path Problem with Time Windows

Authors: Christofas Stergianos, Jason Atkin, Herve Morvan

Abstract:

As air traffic increases, more airports are interested in utilizing optimization methods. Many processes happen in parallel at an airport, and complex models are needed in order to have a reliable solution that can be implemented for ground movement operations. The ground movement for aircraft in an airport, allocating a path to each aircraft to follow in order to reach their destination (e.g. runway or gate), is one process that could be optimized. The Quickest Path Problem with Time Windows (QPPTW) algorithm has been developed to provide a conflict-free routing of vehicles and has been applied to routing aircraft around an airport. It was subsequently modified to increase the accuracy for airport applications. These modifications take into consideration specific characteristics of the problem, such as: the pushback process, which considers the extra time that is needed for pushing back an aircraft and turning its engines on; stand holding where any waiting should be allocated to the stand; and runway sequencing, where the sequence of the aircraft that take off is optimized and has to be respected. QPPTW involves searching for the quickest path by expanding the search in all directions, similarly to Dijkstra’s algorithm. Finding a way to direct the expansion can potentially assist the search and achieve a better performance. We have further modified the QPPTW algorithm to use a heuristic approach in order to guide the search. This new algorithm is based on the A-star search method but estimates the remaining time (instead of distance) in order to assess how far the target is. It is important to consider the remaining time that it is needed to reach the target, so that delays that are caused by other aircraft can be part of the optimization method. All of the other characteristics are still considered and time windows are still used in order to route multiple aircraft rather than a single aircraft. In this way the quickest path is found for each aircraft while taking into account the movements of the previously routed aircraft. After running experiments using a week of real aircraft data from Zurich Airport, the new algorithm (A-star QPPTW) was found to route aircraft much more quickly, being especially fast in routing the departing aircraft where pushback delays are significant. On average A-star QPPTW could route a full day (755 to 837 aircraft movements) 56% faster than the original algorithm. In total the routing of a full week of aircraft took only 12 seconds with the new algorithm, 15 seconds faster than the original algorithm. For real time application, the algorithm needs to be very fast, and this speed increase will allow us to add additional features and complexity, allowing further integration with other processes in airports and leading to more optimized and environmentally friendly airports.

Keywords: a-star search, airport operations, ground movement optimization, routing and scheduling

Procedia PDF Downloads 225
24356 Procedure Model for Data-Driven Decision Support Regarding the Integration of Renewable Energies into Industrial Energy Management

Authors: M. Graus, K. Westhoff, X. Xu

Abstract:

The climate change causes a change in all aspects of society. While the expansion of renewable energies proceeds, industry could not be convinced based on general studies about the potential of demand side management to reinforce smart grid considerations in their operational business. In this article, a procedure model for a case-specific data-driven decision support for industrial energy management based on a holistic data analytics approach is presented. The model is executed on the example of the strategic decision problem, to integrate the aspect of renewable energies into industrial energy management. This question is induced due to considerations of changing the electricity contract model from a standard rate to volatile energy prices corresponding to the energy spot market which is increasingly more affected by renewable energies. The procedure model corresponds to a data analytics process consisting on a data model, analysis, simulation and optimization step. This procedure will help to quantify the potentials of sustainable production concepts based on the data from a factory. The model is validated with data from a printer in analogy to a simple production machine. The overall goal is to establish smart grid principles for industry via the transformation from knowledge-driven to data-driven decisions within manufacturing companies.

Keywords: data analytics, green production, industrial energy management, optimization, renewable energies, simulation

Procedia PDF Downloads 431
24355 Dissimilarity-Based Coloring for Symbolic and Multivariate Data Visualization

Authors: K. Umbleja, M. Ichino, H. Yaguchi

Abstract:

In this paper, we propose a coloring method for multivariate data visualization by using parallel coordinates based on dissimilarity and tree structure information gathered during hierarchical clustering. The proposed method is an extension for proximity-based coloring that suffers from a few undesired side effects if hierarchical tree structure is not balanced tree. We describe the algorithm by assigning colors based on dissimilarity information, show the application of proposed method on three commonly used datasets, and compare the results with proximity-based coloring. We found our proposed method to be especially beneficial for symbolic data visualization where many individual objects have already been aggregated into a single symbolic object.

Keywords: data visualization, dissimilarity-based coloring, proximity-based coloring, symbolic data

Procedia PDF Downloads 166
24354 The Impact of Data Science on Geography: A Review

Authors: Roberto Machado

Abstract:

We conducted a systematic review using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses methodology, analyzing 2,996 studies and synthesizing 41 of them to explore the evolution of data science and its integration into geography. By employing optimization algorithms, we accelerated the review process, significantly enhancing the efficiency and precision of literature selection. Our findings indicate that data science has developed over five decades, facing challenges such as the diversified integration of data and the need for advanced statistical and computational skills. In geography, the integration of data science underscores the importance of interdisciplinary collaboration and methodological innovation. Techniques like large-scale spatial data analysis and predictive algorithms show promise in natural disaster management and transportation route optimization, enabling faster and more effective responses. These advancements highlight the transformative potential of data science in geography, providing tools and methodologies to address complex spatial problems. The relevance of this study lies in the use of optimization algorithms in systematic reviews and the demonstrated need for deeper integration of data science into geography. Key contributions include identifying specific challenges in combining diverse spatial data and the necessity for advanced computational skills. Examples of connections between these two fields encompass significant improvements in natural disaster management and transportation efficiency, promoting more effective and sustainable environmental solutions with a positive societal impact.

Keywords: data science, geography, systematic review, optimization algorithms, supervised learning

Procedia PDF Downloads 17
24353 Evaluation of Brca1/2 Mutational Status among Algerian Familial Breast Cancer

Authors: Arab M., Ait Abdallah M., Zeraoulia N., Boumaza H., Aoutia M., Griene L., Ait Abdelkader B.,

Abstract:

breast and ovarian cancer are respectively the first and fourth leading causes of cancer among women in Algeria. A family story of cancer in the most important risk factor, and in most cases of families with breast and /or ovarian cancer, the pattern of cancer family can be attributed to mutation in BRCA1/2genes. objectibes: the aim of our study in to investigate the spectrum of BRCA1/2 germiline mutation in familial breast and /or ovarian cancer and to determine the prevalence and the nature of BRCA1/2mutation in Algeria methods: we deremined the prevalence of BRCA1/2 mutation within a cohort of 161 probands selected according the eisinger score double stranded sanger sequencing of all coding exons of BRCA1/2including flanking intronic region were performed results: we identified a total of 23 distinct deleterious mutations (class5) 12 differents mutations in BRCA1(52%) and 11 in BRCA2(48%). 78% (18/23) were protein truncating and 22%(5/23) missens mutations.3 novel deleterious mutations have been identified, which have not been described in public mutation database. one new mutation were found in two unrelated patients. the overall mutation detection rate in our study is 28,5%(46/161).more over, an UVS c7783 located in BRCA2 is found in two unrelated probands and segregate in the 02 families/ conclusion: our results sugget of large spectrum of BRCA1/2 mutation in Algerian breast/ovarian cancer family. The nature and prevalence of BRCA1/2mutation in algerian families are ongoing in a larger study, 80 probands are to day under investigation. This study which may therefore identify the genetic particularity of Algerian breast /ovarian cancer.

Keywords: BRCA1/2 mutations, hereditary breast cancer, algerian women, prvalence

Procedia PDF Downloads 172
24352 Developing Structured Sizing Systems for Manufacturing Ready-Made Garments of Indian Females Using Decision Tree-Based Data Mining

Authors: Hina Kausher, Sangita Srivastava

Abstract:

In India, there is a lack of standard, systematic sizing approach for producing readymade garments. Garments manufacturing companies use their own created size tables by modifying international sizing charts of ready-made garments. The purpose of this study is to tabulate the anthropometric data which covers the variety of figure proportions in both height and girth. 3,000 data has been collected by an anthropometric survey undertaken over females between the ages of 16 to 80 years from some states of India to produce the sizing system suitable for clothing manufacture and retailing. This data is used for the statistical analysis of body measurements, the formulation of sizing systems and body measurements tables. Factor analysis technique is used to filter the control body dimensions from a large number of variables. Decision tree-based data mining is used to cluster the data. The standard and structured sizing system can facilitate pattern grading and garment production. Moreover, it can exceed buying ratios and upgrade size allocations to retail segments.

Keywords: anthropometric data, data mining, decision tree, garments manufacturing, sizing systems, ready-made garments

Procedia PDF Downloads 131
24351 A Framework on Data and Remote Sensing for Humanitarian Logistics

Authors: Vishnu Nagendra, Marten Van Der Veen, Stefania Giodini

Abstract:

Effective humanitarian logistics operations are a cornerstone in the success of disaster relief operations. However, for effectiveness, they need to be demand driven and supported by adequate data for prioritization. Without this data operations are carried out in an ad hoc manner and eventually become chaotic. The current availability of geospatial data helps in creating models for predictive damage and vulnerability assessment, which can be of great advantage to logisticians to gain an understanding on the nature and extent of the disaster damage. This translates into actionable information on the demand for relief goods, the state of the transport infrastructure and subsequently the priority areas for relief delivery. However, due to the unpredictable nature of disasters, the accuracy in the models need improvement which can be done using remote sensing data from UAVs (Unmanned Aerial Vehicles) or satellite imagery, which again come with certain limitations. This research addresses the need for a framework to combine data from different sources to support humanitarian logistic operations and prediction models. The focus is on developing a workflow to combine data from satellites and UAVs post a disaster strike. A three-step approach is followed: first, the data requirements for logistics activities are made explicit, which is done by carrying out semi-structured interviews with on field logistics workers. Second, the limitations in current data collection tools are analyzed to develop workaround solutions by following a systems design approach. Third, the data requirements and the developed workaround solutions are fit together towards a coherent workflow. The outcome of this research will provide a new method for logisticians to have immediately accurate and reliable data to support data-driven decision making.

Keywords: unmanned aerial vehicles, damage prediction models, remote sensing, data driven decision making

Procedia PDF Downloads 373
24350 Facility Data Model as Integration and Interoperability Platform

Authors: Nikola Tomasevic, Marko Batic, Sanja Vranes

Abstract:

Emerging Semantic Web technologies can be seen as the next step in evolution of the intelligent facility management systems. Particularly, this considers increased usage of open source and/or standardized concepts for data classification and semantic interpretation. To deliver such facility management systems, providing the comprehensive integration and interoperability platform in from of the facility data model is a prerequisite. In this paper, one of the possible modelling approaches to provide such integrative facility data model which was based on the ontology modelling concept was presented. Complete ontology development process, starting from the input data acquisition, ontology concepts definition and finally ontology concepts population, was described. At the beginning, the core facility ontology was developed representing the generic facility infrastructure comprised of the common facility concepts relevant from the facility management perspective. To develop the data model of a specific facility infrastructure, first extension and then population of the core facility ontology was performed. For the development of the full-blown facility data models, Malpensa and Fiumicino airports in Italy, two major European air-traffic hubs, were chosen as a test-bed platform. Furthermore, the way how these ontology models supported the integration and interoperability of the overall airport energy management system was analyzed as well.

Keywords: airport ontology, energy management, facility data model, ontology modeling

Procedia PDF Downloads 444
24349 A Machine Learning Model for Dynamic Prediction of Chronic Kidney Disease Risk Using Laboratory Data, Non-Laboratory Data, and Metabolic Indices

Authors: Amadou Wurry Jallow, Adama N. S. Bah, Karamo Bah, Shih-Ye Wang, Kuo-Chung Chu, Chien-Yeh Hsu

Abstract:

Chronic kidney disease (CKD) is a major public health challenge with high prevalence, rising incidence, and serious adverse consequences. Developing effective risk prediction models is a cost-effective approach to predicting and preventing complications of chronic kidney disease (CKD). This study aimed to develop an accurate machine learning model that can dynamically identify individuals at risk of CKD using various kinds of diagnostic data, with or without laboratory data, at different follow-up points. Creatinine is a key component used to predict CKD. These models will enable affordable and effective screening for CKD even with incomplete patient data, such as the absence of creatinine testing. This retrospective cohort study included data on 19,429 adults provided by a private research institute and screening laboratory in Taiwan, gathered between 2001 and 2015. Univariate Cox proportional hazard regression analyses were performed to determine the variables with high prognostic values for predicting CKD. We then identified interacting variables and grouped them according to diagnostic data categories. Our models used three types of data gathered at three points in time: non-laboratory, laboratory, and metabolic indices data. Next, we used subgroups of variables within each category to train two machine learning models (Random Forest and XGBoost). Our machine learning models can dynamically discriminate individuals at risk for developing CKD. All the models performed well using all three kinds of data, with or without laboratory data. Using only non-laboratory-based data (such as age, sex, body mass index (BMI), and waist circumference), both models predict chronic kidney disease as accurately as models using laboratory and metabolic indices data. Our machine learning models have demonstrated the use of different categories of diagnostic data for CKD prediction, with or without laboratory data. The machine learning models are simple to use and flexible because they work even with incomplete data and can be applied in any clinical setting, including settings where laboratory data is difficult to obtain.

Keywords: chronic kidney disease, glomerular filtration rate, creatinine, novel metabolic indices, machine learning, risk prediction

Procedia PDF Downloads 100
24348 Road Accidents Bigdata Mining and Visualization Using Support Vector Machines

Authors: Usha Lokala, Srinivas Nowduri, Prabhakar K. Sharma

Abstract:

Useful information has been extracted from the road accident data in United Kingdom (UK), using data analytics method, for avoiding possible accidents in rural and urban areas. This analysis make use of several methodologies such as data integration, support vector machines (SVM), correlation machines and multinomial goodness. The entire datasets have been imported from the traffic department of UK with due permission. The information extracted from these huge datasets forms a basis for several predictions, which in turn avoid unnecessary memory lapses. Since data is expected to grow continuously over a period of time, this work primarily proposes a new framework model which can be trained and adapt itself to new data and make accurate predictions. This work also throws some light on use of SVM’s methodology for text classifiers from the obtained traffic data. Finally, it emphasizes the uniqueness and adaptability of SVMs methodology appropriate for this kind of research work.

Keywords: support vector mechanism (SVM), machine learning (ML), support vector machines (SVM), department of transportation (DFT)

Procedia PDF Downloads 267
24347 A Relational Data Base for Radiation Therapy

Authors: Raffaele Danilo Esposito, Domingo Planes Meseguer, Maria Del Pilar Dorado Rodriguez

Abstract:

As far as we know, it is still unavailable a commercial solution which would allow to manage, openly and configurable up to user needs, the huge amount of data generated in a modern Radiation Oncology Department. Currently, available information management systems are mainly focused on Record & Verify and clinical data, and only to a small extent on physical data. Thus, results in a partial and limited use of the actually available information. In the present work we describe the implementation at our department of a centralized information management system based on a web server. Our system manages both information generated during patient planning and treatment, and information of general interest for the whole department (i.e. treatment protocols, quality assurance protocols etc.). Our objective it to be able to analyze in a simple and efficient way all the available data and thus to obtain quantitative evaluations of our treatments. This would allow us to improve our work flow and protocols. To this end we have implemented a relational data base which would allow us to use in a practical and efficient way all the available information. As always we only use license free software.

Keywords: information management system, radiation oncology, medical physics, free software

Procedia PDF Downloads 232
24346 A Study of Safety of Data Storage Devices of Graduate Students at Suan Sunandha Rajabhat University

Authors: Komol Phaisarn, Natcha Wattanaprapa

Abstract:

This research is a survey research with an objective to study the safety of data storage devices of graduate students of academic year 2013, Suan Sunandha Rajabhat University. Data were collected by questionnaire on the safety of data storage devices according to CIA principle. A sample size of 81 was drawn from population by purposive sampling method. The results show that most of the graduate students of academic year 2013 at Suan Sunandha Rajabhat University use handy drive to store their data and the safety level of the devices is at good level.

Keywords: security, safety, storage devices, graduate students

Procedia PDF Downloads 349
24345 Simulation of a Cost Model Response Requests for Replication in Data Grid Environment

Authors: Kaddi Mohammed, A. Benatiallah, D. Benatiallah

Abstract:

Data grid is a technology that has full emergence of new challenges, such as the heterogeneity and availability of various resources and geographically distributed, fast data access, minimizing latency and fault tolerance. Researchers interested in this technology address the problems of the various systems related to the industry such as task scheduling, load balancing and replication. The latter is an effective solution to achieve good performance in terms of data access and grid resources and better availability of data cost. In a system with duplication, a coherence protocol is used to impose some degree of synchronization between the various copies and impose some order on updates. In this project, we present an approach for placing replicas to minimize the cost of response of requests to read or write, and we implement our model in a simulation environment. The placement techniques are based on a cost model which depends on several factors, such as bandwidth, data size and storage nodes.

Keywords: response time, query, consistency, bandwidth, storage capacity, CERN

Procedia PDF Downloads 266