Search results for: data logging
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25309

24349 Efficient Pre-Processing of Single-Cell Assay for Transposase Accessible Chromatin with High-Throughput Sequencing Data

Authors: Fan Gao, Lior Pachter

Abstract:

The primary tool currently used to pre-process 10X Chromium single-cell ATAC-seq data is Cell Ranger, which can take a very long time to run on standard datasets. To facilitate rapid pre-processing that enables reproducible workflows, we present a suite of tools called scATAK for pre-processing single-cell ATAC-seq data that is 15 to 18 times faster than Cell Ranger on mouse and human samples. Our tool can also calculate chromatin interaction potential matrices and generate open chromatin signal and interaction traces for cell groups. We use the scATAK tool to explore the chromatin regulatory landscape of a healthy adult human brain and unveil cell-type specific features, and show that it provides a convenient and computationally efficient approach for pre-processing single-cell ATAC-seq data.

Keywords: single-cell, ATAC-seq, bioinformatics, open chromatin landscape, chromatin interactome

Procedia PDF Downloads 156
24348 Meta Mask Correction for Nuclei Segmentation in Histopathological Image

Authors: Jiangbo Shi, Zeyu Gao, Chen Li

Abstract:

Nuclei segmentation is a fundamental task in digital pathology analysis and can be automated by deep learning-based methods. However, the development of such an automated method requires a large amount of data with precisely annotated masks which is hard to obtain. Training with weakly labeled data is a popular solution for reducing the workload of annotation. In this paper, we propose a novel meta-learning-based nuclei segmentation method which follows the label correction paradigm to leverage data with noisy masks. Specifically, we design a fully conventional meta-model that can correct noisy masks by using a small amount of clean meta-data. Then the corrected masks are used to supervise the training of the segmentation model. Meanwhile, a bi-level optimization method is adopted to alternately update the parameters of the main segmentation model and the meta-model. Extensive experimental results on two nuclear segmentation datasets show that our method achieves the state-of-the-art result. In particular, in some noise scenarios, it even exceeds the performance of training on supervised data.

Keywords: deep learning, histopathological image, meta-learning, nuclei segmentation, weak annotations

Procedia PDF Downloads 141
24347 Feature Selection Approach for the Classification of Hydraulic Leakages in Hydraulic Final Inspection using Machine Learning

Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter

Abstract:

Manufacturing companies are facing global competition and enormous cost pressure. The use of machine learning applications can help reduce production costs and create added value. Predictive quality enables the securing of product quality through data-supported predictions using machine learning models as a basis for decisions on test results. Furthermore, machine learning methods are able to process large amounts of data, deal with unfavourable row-column ratios and detect dependencies between the covariates and the given target as well as assess the multidimensional influence of all input variables on the target. Real production data are often subject to highly fluctuating boundary conditions and unbalanced data sets. Changes in production data manifest themselves in trends, systematic shifts, and seasonal effects. Thus, Machine learning applications require intensive pre-processing and feature selection. Data preprocessing includes rule-based data cleaning, the application of dimensionality reduction techniques, and the identification of comparable data subsets. Within the used real data set of Bosch hydraulic valves, the comparability of the same production conditions in the production of hydraulic valves within certain time periods can be identified by applying the concept drift method. Furthermore, a classification model is developed to evaluate the feature importance in different subsets within the identified time periods. By selecting comparable and stable features, the number of features used can be significantly reduced without a strong decrease in predictive power. The use of cross-process production data along the value chain of hydraulic valves is a promising approach to predict the quality characteristics of workpieces. In this research, the ada boosting classifier is used to predict the leakage of hydraulic valves based on geometric gauge blocks from machining, mating data from the assembly, and hydraulic measurement data from end-of-line testing. 
In addition, the most suitable methods are selected and accurate quality predictions are achieved.

Keywords: classification, machine learning, predictive quality, feature selection

Procedia PDF Downloads 162
24346 Secure Data Sharing of Electronic Health Records With Blockchain

Authors: Kenneth Harper

Abstract:

The secure sharing of Electronic Health Records (EHRs) is a critical challenge in modern healthcare, demanding solutions to enhance interoperability, privacy, and data integrity. Traditional standards like Health Information Exchange (HIE) and HL7 have made significant strides in facilitating data exchange between healthcare entities. However, these approaches rely on centralized architectures that are often vulnerable to data breaches, lack sufficient privacy measures, and have scalability issues. This paper proposes a framework for secure, decentralized sharing of EHRs using blockchain technology, cryptographic tokens, and Non-Fungible Tokens (NFTs). The blockchain's immutable ledger, decentralized control, and inherent security mechanisms are leveraged to improve transparency, accountability, and auditability in healthcare data exchanges. Furthermore, we introduce the concept of tokenizing patient data through NFTs, creating unique digital identifiers for each record, which allows for granular data access controls and proof of data ownership. These NFTs can also be employed to grant access to authorized parties, establishing a secure and transparent data sharing model that empowers both healthcare providers and patients. The proposed approach addresses common privacy concerns by employing privacy-preserving techniques such as zero-knowledge proofs (ZKPs) and homomorphic encryption to ensure that sensitive patient information can be shared without exposing the actual content of the data. This ensures compliance with regulations like HIPAA and GDPR. Additionally, the integration of Fast Healthcare Interoperability Resources (FHIR) with blockchain technology allows for enhanced interoperability, enabling healthcare organizations to exchange data seamlessly and securely across various systems while maintaining data governance and regulatory compliance. 
Through real-world case studies and simulations, this paper demonstrates how blockchain-based EHR sharing can reduce operational costs, improve patient outcomes, and enhance the security and privacy of healthcare data. This decentralized framework holds great potential for revolutionizing healthcare information exchange, providing a transparent, scalable, and secure method for managing patient data in a highly regulated environment.

Keywords: blockchain, electronic health records (ehrs), fast healthcare interoperability resources (fhir), health information exchange (hie), hl7, interoperability, non-fungible tokens (nfts), privacy-preserving techniques, tokens, secure data sharing

Procedia PDF Downloads 23
24345 The Twin Terminal of Pedestrian Trajectory Based on City Intelligent Model (CIM) 4.0

Authors: Chen Xi, Liu Xuebing, Lao Xueru, Kuan Sinman, Jiang Yike, Wang Hanwei, Yang Xiaolang, Zhou Junjie, Xie Jinpeng

Abstract:

To further promote the development of smart cities, the microscopic "nerve endings" of the City Intelligent Model (CIM) are extended to be more sensitive. In this paper, we develop a pedestrian trajectory twin terminal based on the CIM and CNN technology. It also uses 5G networks, architectural and geoinformatics technologies, and convolutional neural networks, combined with deep learning networks for human behavior recognition models, to provide empirical data such as pedestrian flow data and human behavioral characteristics data, and ultimately to form spatial performance evaluation criteria and spatial performance warning systems that make the empirical data accurate and intelligent for prediction and decision making.

Keywords: urban planning, urban governance, CIM, artificial intelligence, sustainable development

Procedia PDF Downloads 422
24344 An Extended Inverse Pareto Distribution, with Applications

Authors: Abdel Hadi Ebraheim

Abstract:

This paper introduces a new extension of the Inverse Pareto distribution in the framework of the Marshall-Olkin (1997) family of distributions. This model is capable of modeling various shapes of aging and failure data. The statistical properties of the new model are discussed. Several methods are used to estimate the parameters involved. Explicit expressions are derived for different types of moments of value in reliability analysis. In addition, the order statistics of samples from the newly proposed model are studied. Finally, the usefulness of the new model for modeling reliability data is illustrated using two real data sets and a simulation study.
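For readers unfamiliar with the family named in this abstract, the standard Marshall-Olkin construction can be written as below. This is a sketch of the general transform only: the symbols $\bar{F}$ (baseline survival function), $\bar{G}$ (extended survival function) and $\alpha$ (tilt parameter) are notation assumed here, and the exact parameterization of the Inverse Pareto baseline used by the author is not stated in the abstract.

```latex
% Marshall-Olkin extension of a baseline survival function \bar{F}(x) = 1 - F(x)
\bar{G}(x;\alpha) \;=\; \frac{\alpha\,\bar{F}(x)}{1 - (1-\alpha)\,\bar{F}(x)},
\qquad \alpha > 0 .
% For \alpha = 1 the baseline is recovered: \bar{G}(x;1) = \bar{F}(x).
```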

Keywords: Pareto distribution, Marshall-Olkin, reliability, hazard functions, moments, estimation

Procedia PDF Downloads 83
24343 Potential Determinants of Research Output: Comparing Economics and Business

Authors: Osiris Jorge Parcero, Néstor Gandelman, Flavia Roldán, Josef Montag

Abstract:

This paper uses cross-country unbalanced panel data of up to 146 countries over the period 1996 to 2015 to be the first study to identify potential determinants of a country’s relative research output in Economics versus Business. More generally, it is also one of the first studies comparing Economics and Business. The results show that better policy-related data availability, higher income inequality, and lower ethnic fractionalization relatively favor economics. The findings are robust to two alternative fixed effects specifications, three alternative definitions of economics and business, two alternative measures of research output (publications and citations), and the inclusion of meaningful control variables. To the best of our knowledge, our paper is also the first to demonstrate the importance of policy-related data as drivers of economic research. Our regressions show that the availability of this type of data is the single most important factor associated with the prevalence of economics over business as a research domain. Thus, our work has policy implications, as the availability of policy-related data is partially under policy control. Moreover, it has implications for students, professionals, universities, university departments, and research-funding agencies that face choices between profiles oriented toward economics and those oriented toward business. Finally, the conclusions show potential lines for further research.

Keywords: research output, publication performance, bibliometrics, economics, business, policy-related data

Procedia PDF Downloads 134
24342 Assessment of Routine Health Information System (RHIS) Quality Assurance Practices in Tarkwa Sub-Municipal Health Directorate, Ghana

Authors: Richard Okyere Boadu, Judith Obiri-Yeboah, Kwame Adu Okyere Boadu, Nathan Kumasenu Mensah, Grace Amoh-Agyei

Abstract:

Routine health information system (RHIS) quality assurance has become an important issue, not only because of its significance in promoting a high standard of patient care but also because of its impact on government budgets for the maintenance of health services. A routine health information system comprises healthcare data collection, compilation, storage, analysis, report generation, and dissemination on a routine basis in various healthcare settings. The data from RHIS give a representation of health status, health services, and health resources. The sources of RHIS data are normally individual health records, records of services delivered, and records of health resources. Using reliable information from routine health information systems is fundamental in the healthcare delivery system. Quality assurance practices are measures that are put in place to ensure the health data that are collected meet required quality standards. Routine health information system quality assurance practices ensure that data that are generated from the system are fit for use. This study considered quality assurance practices in the RHIS processes. Methods: A cross-sectional study was conducted in eight health facilities in Tarkwa Sub-Municipal Health Service in the western region of Ghana. The study involved routine quality assurance practices among the 90 health staff and management selected from facilities in Tarkwa Sub-Municipal who collected or used data routinely from 24th December 2019 to 20th January 2020. Results: Generally, Tarkwa Sub-Municipal health service appears to practice quality assurance during data collection, compilation, storage, analysis and dissemination. The results show some achievement in quality control performance in report dissemination (77.6%), data analysis (68.0%), data compilation (67.4%), report compilation (66.3%), data storage (66.3%) and collection (61.1%). 
Conclusions: Even though the Tarkwa Sub-Municipal Health Directorate engages in some control measures to ensure data quality, there is a need to strengthen the process to achieve the targeted percentage of performance (90.0%). There was a significant shortfall in quality assurance practices performance, especially during data collection, with respect to the expected performance.

Keywords: quality assurance practices, assessment of routine health information system quality, routine health information system, data quality

Procedia PDF Downloads 81
24341 Heart Failure Identification and Progression by Classifying Cardiac Patients

Authors: Muhammad Saqlain, Nazar Abbas Saqib, Muazzam A. Khan

Abstract:

Heart Failure (HF) has become a major health problem in our society. The prevalence of HF increases as patients age, and it is a major cause of the high mortality rate in adults. Successful identification of HF and its progression can help reduce the individual and social burden of this syndrome. In this study, we use a real data set of cardiac patients to propose a classification model for the identification and progression of HF. The data set is divided into three age groups, namely young, adult, and old, and each age group is further classified into four classes according to the patient's current physical condition. Contemporary data mining classification algorithms are applied to each individual class of every age group to identify HF. Decision Tree (DT) gives the highest accuracy, 90%, and outperforms all other algorithms. Our model accurately diagnoses different stages of HF for each age group and can be very useful for the early prediction of HF.

Keywords: decision tree, heart failure, data mining, classification model

Procedia PDF Downloads 402
24340 Critically Analyzing the Application of Big Data for Smart Transportation: A Case Study of Mumbai

Authors: Tanuj Joshi

Abstract:

Smart transportation is fast emerging as a solution to modern cities' mobility issues, delayed emergency response and high congestion on streets. The present-day scenario, with Google Maps, Waze, Yelp, etc., demonstrates how information and communications technologies control the intelligent transportation system. This intangible and invisible infrastructure is largely guided by big data analytics. On the other hand, the exponential increase in the Indian urban population has intensified the demand for better services and infrastructure to satisfy the transportation needs of its citizens. India's huge internet usage is regarded as an important resource for achieving this. However, with a projected number of over 40 billion objects connected to the Internet by 2025, the need for systems that handle a massive volume of data (big data) also arises. This research paper attempts to identify ways of exploiting the big data variables that will aid commuters on Indian tracks. The study explores real-life inputs by conducting a survey and interviews to identify which gaps need to be targeted to better satisfy customers. Several experts at the Mumbai Metropolitan Region Development Authority (MMRDA), Mumbai Metro and Brihanmumbai Electric Supply and Transport (BEST) were interviewed regarding the Information Technology (IT) systems currently in use. The interviews give relevant insights into the workings and requirements of public transportation systems, whereas the survey investigates the macro situation.

Keywords: smart transportation, mobility issue, Mumbai transportation, big data, data analysis

Procedia PDF Downloads 179
24339 Scientific Linux Cluster for BIG-DATA Analysis (SLBD): A Case of Fayoum University

Authors: Hassan S. Hussein, Rania A. Abul Seoud, Amr M. Refaat

Abstract:

Scientific researchers face the analysis of very large data sets, which are growing at a noticeable rate with today's and tomorrow's technologies. Hadoop and Spark are software frameworks developed for this purpose. The Hadoop framework is suitable for many different hardware platforms. In this research, a scientific Linux cluster for Big Data analysis (SLBD) is presented. SLBD runs open source software with large computational capacity and a high-performance cluster infrastructure. SLBD is composed of one cluster containing identical, commodity-grade computers interconnected via a small LAN. It consists of a fast switch and Gigabit Ethernet cards that connect four nodes. Cloudera Manager is used to configure and manage an Apache Hadoop stack. Hadoop is a framework that allows storing and processing big data across the cluster by using the MapReduce algorithm. The MapReduce algorithm divides a task into smaller tasks that are assigned to the network nodes. The algorithm then collects the results and forms the final result dataset. The SLBD clustering system allows fast and efficient processing of large amounts of data resulting from different applications. SLBD also provides high performance, high throughput, high availability, expandability and cluster scalability.
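The divide, assign and collect steps described for MapReduce in this abstract can be illustrated with a minimal single-process sketch. It is not the SLBD or Hadoop implementation: the word-count task, the function names and the sample chunks are assumptions chosen for illustration, and a real cluster would run the map and reduce phases on separate nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run map on every chunk, then shuffle and reduce the combined output."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


# Hypothetical input: each string stands for the share of data given to one node.
counts = word_count(["big data cluster", "data cluster node", "data"])
```

Here each string plays the role of the smaller task handed to one node, and `reduce_phase` plays the role of collecting the partial results into the final dataset.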

Keywords: big data platforms, cloudera manager, Hadoop, MapReduce

Procedia PDF Downloads 361
24338 Investigating the Effects of Data Transformations on a Bi-Dimensional Chi-Square Test

Authors: Alexandru George Vaduva, Adriana Vlad, Bogdan Badea

Abstract:

In this research, we conduct a Monte Carlo analysis on a two-dimensional χ2 test, which is used to determine the minimum distance required for independent sampling in the context of chaotic signals. We investigate the impact of transforming initial data sets from any probability distribution to new signals with a uniform distribution using the Spearman rank correlation on the χ2 test. This transformation removes the randomness of the data pairs, and as a result, the observed distribution of χ2 test values differs from the expected distribution. We propose a solution to this problem and evaluate it using another chaotic signal.

Keywords: chaotic signals, logistic map, Pearson’s test, Chi Square test, bivariate distribution, statistical independence

Procedia PDF Downloads 99
24337 Real Time Data Communication with FlightGear Using Simulink Over a UDP Protocol

Authors: Adil Loya, Ali Haider, Arslan A. Ghaffor, Abubaker Siddique

Abstract:

Simulation and modelling of Unmanned Aerial Vehicles (UAVs) have gained wide popularity in the aerospace community. The demand for designing and modelling optimized control systems for UAVs has increased tenfold over the last decade, because next-generation warfare depends on unmanned technologies. Therefore, this research focuses on the simulation of nonlinear UAV dynamics in Simulink and its integration with FlightGear. There has been a lot of research on implementing optimized control using Simulink; however, there are fewer known techniques for simulating these dynamics in FlightGear, and the tedious task of acquiring data is tackled in this research. Sending data to FlightGear is easy, but receiving it in Simulink is not straightforward, i.e., only control data can be received on the output. In this research, however, we have managed to get the data out of FlightGear by implementing a level 2 S-function block within Simulink. Moreover, the results captured from FlightGear over a User Datagram Protocol (UDP) connection are compared with the attitude signals that were sent previously. This provides useful information regarding the difference between the outputs of Simulink and FlightGear. It was found that the values received in Simulink were in high agreement with the FlightGear output. The complete study was conducted in a discrete way.
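The send-and-compare idea in this abstract can be sketched with plain UDP sockets. This is a loopback illustration only, not the authors' Simulink S-function and not FlightGear's actual protocol: the payload layout (three big-endian doubles for roll, pitch and yaw), the function names and the sample values are all assumptions.

```python
import socket
import struct

# Hypothetical payload layout: roll, pitch, yaw as three big-endian doubles.
ATTITUDE_FORMAT = "!3d"


def send_attitude(sock, address, roll, pitch, yaw):
    """Pack one attitude sample and send it as a single UDP datagram."""
    sock.sendto(struct.pack(ATTITUDE_FORMAT, roll, pitch, yaw), address)


def receive_attitude(sock):
    """Receive one datagram and unpack it into a (roll, pitch, yaw) tuple."""
    payload, _sender = sock.recvfrom(struct.calcsize(ATTITUDE_FORMAT))
    return struct.unpack(ATTITUDE_FORMAT, payload)


def loopback_round_trip(roll, pitch, yaw):
    """Send one sample to a receiver on localhost and return what arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # UDP gives no delivery guarantee
        send_attitude(sender, receiver.getsockname(), roll, pitch, yaw)
        return receive_attitude(receiver)
    finally:
        sender.close()
        receiver.close()


sent = (0.10, -0.05, 1.57)
received = loopback_round_trip(*sent)
# Largest per-channel difference between the sent and received attitude.
max_difference = max(abs(s - r) for s, r in zip(sent, received))
```

Comparing `sent` with `received` mirrors the comparison the abstract describes between the attitude signals sent and the values read back.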

Keywords: aerospace, flight control, flightgear, communication, Simulink

Procedia PDF Downloads 288
24336 Open Source, Open Hardware Ground Truth for Visual Odometry and Simultaneous Localization and Mapping Applications

Authors: Janusz Bedkowski, Grzegorz Kisala, Michal Wlasiuk, Piotr Pokorski

Abstract:

Ground-truth data is essential for the quantitative evaluation of VO (Visual Odometry) and SLAM (Simultaneous Localization and Mapping) using, e.g., ATE (Absolute Trajectory Error) and RPE (Relative Pose Error). Many open-access data sets provide raw and ground-truth data for benchmark purposes. An issue appears when one would like to validate Visual Odometry and/or SLAM approaches on data captured with the device the algorithm is targeted at, for example a mobile phone, and to disseminate the data to other researchers. For this reason, we propose an open source, open hardware ground-truth system that provides an accurate and precise trajectory with a 3D point cloud. It is based on a Livox Mid-360 LiDAR with a non-repetitive scanning pattern, an on-board Raspberry Pi 4B computer, a battery and software for off-line calculations (camera to LiDAR calibration, LiDAR odometry, SLAM, georeferencing). We show how this system can be used to evaluate various state-of-the-art algorithms (Stella SLAM, ORB SLAM3, DSO) in typical indoor monocular VO/SLAM.
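As an illustration of the ATE metric mentioned in this abstract, the sketch below computes the root-mean-square translational error between an estimated trajectory and its ground truth. It assumes the two trajectories are already time-associated and expressed in the same frame, so the rigid alignment step that usually precedes ATE is omitted. The function name and the sample positions are hypothetical.

```python
import math


def absolute_trajectory_error(estimated, ground_truth):
    """RMSE of the position error between two pre-aligned trajectories.

    Both arguments are equal-length sequences of (x, y, z) positions, where
    the i-th entries refer to the same timestamp.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est_pose, gt_pose))
        for est_pose, gt_pose in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up positions in meters, for illustration only.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
estimated = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0), (2.0, 0.0, 0.4)]
ate = absolute_trajectory_error(estimated, ground_truth)
```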

Keywords: SLAM, ground truth, navigation, LiDAR, visual odometry, mapping

Procedia PDF Downloads 76
24335 Prediction of Gully Erosion with Stochastic Modeling by using Geographic Information System and Remote Sensing Data in North of Iran

Authors: Reza Zakerinejad

Abstract:

Gully erosion is a serious problem that threatens the sustainability of agricultural areas, rangeland and water in a large part of Iran. This type of water erosion is the main source of sedimentation in many catchment areas in the north of Iran. Since many national assessment approaches have applied only qualitative models, the aim of this study is to predict the spatial distribution of gully erosion processes by means of detailed terrain analysis and GIS-based logistic regression in the loess deposits of a case study area in Golestan Province. In this study, a DEM with 25-meter resolution derived from ASTER data was used. Landsat ETM data were used to map land use. The TreeNet model, a stochastic modeling approach, was applied to predict the areas susceptible to gully erosion. In this model ROC we have set 20 % of data as learning and 20 % as learning data. GIS and satellite image analysis techniques were used to derive the input information for these stochastic models. The result of this study is a highly accurate map of gully erosion potential.

Keywords: TreeNet model, terrain analysis, Golestan Province, Iran

Procedia PDF Downloads 537
24334 Data Science/Artificial Intelligence: A Possible Panacea for Refugee Crisis

Authors: Avi Shrivastava

Abstract:

In 2021, two heart-wrenching scenes, shown live on television screens across countries, painted a grim picture of refugees. One of them was of people clinging onto an airplane's wings in their desperate attempt to flee war-torn Afghanistan. They ultimately fell to their death. The other scene was the U.S. government authorities separating children from their parents or guardians to deter migrants/refugees from coming to the U.S. These events show the desperation refugees feel when they are trying to leave their homes in disaster zones. However, data paints a grave picture of the current refugee situation. It also indicates that a bleak future lies ahead for the refugees across the globe. Data and information are the two threads that intertwine to weave the shimmery fabric of modern society. Data and information are often used interchangeably, but they differ considerably. For example, information analysis reveals rationale, and logic, while data analysis, on the other hand, reveals a pattern. Moreover, patterns revealed by data can enable us to create the necessary tools to combat huge problems on our hands. Data analysis paints a clear picture so that the decision-making process becomes simple. Geopolitical and economic data can be used to predict future refugee hotspots. Accurately predicting the next refugee hotspots will allow governments and relief agencies to prepare better for future refugee crises. The refugee crisis does not have binary answers. Given the emotionally wrenching nature of the ground realities, experts often shy away from realistically stating things as they are. This hesitancy can cost lives. When decisions are based solely on data, emotions can be removed from the decision-making process. Data also presents irrefutable evidence and tells whether there is a solution or not. Moreover, it also responds to a nonbinary crisis with a binary answer. Because of all that, it becomes easier to tackle a problem. Data science and A.I. can predict future refugee crises. With the recent explosion of data due to the rise of social media platforms, data and insight into data has solved many social and political problems. Data science can also help solve many issues refugees face while staying in refugee camps or adopted countries. This paper looks into various ways data science can help solve refugee problems. A.I.-based chatbots can help refugees seek legal help to find asylum in the country they want to settle in. These chatbots can help them find a marketplace where they can find help from the people willing to help. Data science and technology can also help solve refugees' many problems, including food, shelter, employment, security, and assimilation. The refugee problem seems to be one of the most challenging for social and political reasons. Data science and machine learning can help prevent the refugee crisis and solve or alleviate some of the problems that refugees face in their journey to a better life. With the explosion of data in the last decade, data science has made it possible to solve many geopolitical and social issues.

Keywords: refugee crisis, artificial intelligence, data science, refugee camps, Afghanistan, Ukraine

Procedia PDF Downloads 73
24333 A Spatial Point Pattern Analysis to Recognize Fail Bit Patterns in Semiconductor Manufacturing

Authors: Youngji Yoo, Seung Hwan Park, Daewoong An, Sung-Shick Kim, Jun-Geol Baek

Abstract:

The yield management system is very important for producing high-quality semiconductor chips in the semiconductor manufacturing process. In order to improve the quality of semiconductors, various tests are conducted in the post-fabrication (FAB) process. During the test process, a large amount of data is collected, and the data include a lot of information about defects. In general, defects on the wafer are the main cause of yield loss. Therefore, analyzing the defect data is necessary to improve the performance of yield prediction. The wafer bin map (WBM) is one of the data sets collected in the test process and includes defect information such as the fail bit patterns. The fail bits have the characteristics of spatial point patterns. Therefore, this paper proposes a feature extraction method using spatial point pattern analysis. Actual data obtained from the semiconductor process are used for the experiments, and the experimental results show that the proposed method recognizes the fail bit patterns more accurately.

Keywords: semiconductor, wafer bin map, feature extraction, spatial point patterns, contour map

Procedia PDF Downloads 385
24332 The Measurement of the Multi-Period Efficiency of the Turkish Health Care Sector

Authors: Erhan Berk

Abstract:

The purpose of this study is to examine the efficiency and productivity of the health care sector in Turkey based on four years of health care cross-sectional data. Efficiency measures are calculated by a nonparametric approach known as Data Envelopment Analysis (DEA). Productivity is measured by the Malmquist index. The research shows how the DEA-based Malmquist productivity index can be used to appraise the technology and productivity changes in Turkish hospitals located all across the country.

Keywords: data envelopment analysis, efficiency, health care, Malmquist Index

Procedia PDF Downloads 336
24331 Comparison Of Data Mining Models To Predict Future Bridge Conditions

Authors: Pablo Martinez, Emad Mohamed, Osama Mohsen, Yasser Mohamed

Abstract:

Highway and bridge agencies, such as the Ministry of Transportation in Ontario, use the Bridge Condition Index (BCI), which is defined as the weighted condition of all bridge elements, to determine the rehabilitation priorities for their bridges. Therefore, accurate forecasting of the BCI is essential for bridge rehabilitation budget planning. The large amount of bridge condition data available over several years makes traditional mathematical models infeasible as analysis methods. This research study investigates different classification models developed to predict the bridge condition index in the province of Ontario, Canada, based on the publicly available data for 2800 bridges over a period of more than 10 years. Data preparation is a key factor in developing acceptable classification models, even with the simplest one, the k-NN model. All the models were tested, compared and statistically validated via cross validation and a t-test. A simple k-NN model showed reasonable results (within 0.5% relative error) when predicting the bridge condition in an incoming year.
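To make the k-NN classifier mentioned in this abstract concrete, here is a minimal pure-Python sketch of majority-vote k-nearest-neighbor classification. It is not the authors' model: the two features (bridge age in years and previous condition index), the class labels and every data value are made-up assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair each Euclidean distance with its label and sort nearest first.
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _distance, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (age in years, previous condition index) -> class.
train_features = [(5, 90), (8, 85), (12, 80), (40, 55), (45, 50)]
train_labels = ["good", "good", "good", "poor", "poor"]

old_bridge = knn_predict(train_features, train_labels, (42, 52))
new_bridge = knn_predict(train_features, train_labels, (6, 88))
```

In practice the features would be scaled before computing distances, which is one reason the abstract stresses data preparation.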

Keywords: asset management, bridge condition index, data mining, forecasting, infrastructure, knowledge discovery in databases, maintenance, predictive models

Procedia PDF Downloads 191
24330 Piql Preservation Services - A Holistic Approach to Digital Long-Term Preservation

Authors: Alexander Rych

Abstract:

Piql Preservation Services (“Piql”) is a turnkey solution designed for secure, migration-free long-term preservation of digital data. Piql sets an open standard for long-term preservation for the future. It consists of the equipment and processes needed for writing and retrieving digital data. Exponentially growing amounts of data demand logistically effective and cost-effective processes. Digital storage media (hard disks, magnetic tape) have a limited lifetime. Repetitive data migration to overcome rapid obsolescence of hardware and software carries an accelerated risk of data loss, data corruption or even manipulation, and adds significant recurring costs for hardware and software investments. Piql stores any kind of data in its digital as well as analog form securely for 500 years. The medium that provides this is a film reel. Using photosensitive film with a polyester base, a very stable material known for its immutability over hundreds of years, secure and cost-effective long-term preservation can be provided. The film reel itself is stored in packaging capable of protecting the optical storage medium. These components have undergone extensive testing to ensure longevity of up to 500 years. In addition to its durability, film is a true WORM (write once, read many) medium. It is therefore resistant to editing or manipulation. Being able to store any form of data on the film makes Piql a superior solution for long-term preservation. Paper documents, images, video or audio sequences: all of these file formats and documents can be preserved in their native file structure. In order to restore the encoded digital data, only a film scanner, a digital camera or any appropriate optical reading device will be needed in the future. Every film reel includes an index section describing the data saved on the film. It also contains a content section carrying metadata, enabling users in the future to rebuild software in order to read and decode the digital information.

Keywords: digital data, long-term preservation, migration-free, photosensitive film

Procedia PDF Downloads 392
24329 Inversion of Gravity Data for Density Reconstruction

Authors: Arka Roy, Chandra Prakash Dubey

Abstract:

Inverse problems are generally used for recovering hidden information from externally available data. We use the vertical component of the gravity field to calculate the underlying density structure. Ill-posedness is the main obstacle for any inverse problem. Linear regularization using the Tikhonov formulation is used for an appropriate choice of SVD and GSVD components. For handling real-time data, signal-to-noise ratios have to be less for a reliable solution. In our study, 2D and 3D synthetic models with rectangular grids are used for gravity field calculation and the corresponding inversion for density reconstruction. We also consider a fine grid to capture any irregular structure. Keeping the algebraic ambiguity factor in mind, the number of observation points should be greater than the number of data points. A Picard plot is presented for choosing the appropriate, or main controlling, eigenvalues for a regularized solution. Another important tool is the depth resolution plot (DRP), which is generally used for studying how the inversion is influenced by regularization or discretization. Our further study involves real-time gravity data inversion for the Vredefort Dome, South Africa. We apply our method to this data. The resulting density structure is in good agreement with the known formations in that region, which provides additional support for our method.
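To illustrate the regularization idea, here is a minimal sketch of Tikhonov-regularized least squares. It is not the authors' implementation: it solves the normal equations (AᵀA + λ²I)x = Aᵀb directly rather than through the SVD or GSVD expansion the abstract mentions, and the function name `tikhonov_solve`, the matrix `A`, and the parameter `lam` are assumptions chosen for illustration.

```python
def tikhonov_solve(A, b, lam):
    """Minimize ||A x - b||^2 + lam^2 ||x||^2 via the normal equations.

    A is a list of rows (m x n), b has length m, lam is the (hypothetical)
    regularization parameter. Returns the regularized solution as a list.
    """
    m, n = len(A), len(A[0])
    # Build M = A^T A + lam^2 I and rhs = A^T b.
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) + (lam ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rhs[i] - tail) / M[i][i]
    return x
```

With `lam = 0` and a well-conditioned `A` this reduces to ordinary least squares; increasing `lam` shrinks the solution norm, which is the trade-off that a Picard plot or similar diagnostic helps to tune.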

Keywords: depth resolution plot, gravity inversion, Picard plot, SVD, Tikhonov formulation

Procedia PDF Downloads 214
24328 DeepOmics: Deep Learning for Understanding Genome Functioning and the Underlying Genetic Causes of Disease

Authors: Vishnu Pratap Singh Kirar, Madhuri Saxena

Abstract:

Advancement in sequence data generation technologies is churning out voluminous omics data and posing a massive challenge in annotating biological functional features. With so much data available, the use of machine learning methods and tools to make novel inferences has become an obvious choice. Machine learning methods have been successfully applied in many disciplines, including computational biology and bioinformatics. Researchers in computational biology are interested in developing novel machine learning frameworks to classify the huge amounts of biological data. In this proposal, we plan to employ novel machine learning approaches to aid the understanding of how apparently innocuous mutations (in intergenic DNA and at synonymous sites) cause diseases. We are also interested in discovering novel functional sites in the genome in which mutations can affect a phenotype of interest.

Keywords: genome wide association studies (GWAS), next generation sequencing (NGS), deep learning, omics

Procedia PDF Downloads 98
24327 An Efficient Data Mining Technique for Online Stores

Authors: Mohammed Al-Shalabi, Alaa Obeidat

Abstract:

In any food store, some items expire or are destroyed because demand for them is infrequent. We therefore need a system that helps the decision maker make an offer on such items, improving demand by bundling them with other frequently bought items and lowering the price to avoid losses. The system generates hundreds or thousands of patterns (offers) for each low-demand item, then uses association rule measures (support, confidence) to find the interesting patterns (the best offer to achieve the lowest losses). In this paper, we propose a data mining method for determining the best offer by merging data mining techniques with e-commerce strategy. The task is to build a model to predict the best offer. The goal is to maximize the profits of a store and avoid the loss of products. The idea in this paper is to use association rules in marketing in combination with e-commerce.
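As a rough illustration of the two measures named above, the following sketch computes support and confidence for a candidate bundle. The transaction data and item names are made up for the example, and the functions are not taken from the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    hits = sum(1 for basket in transactions if items <= set(basket))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical baskets: "bread" is frequent, "yogurt" is the slow-moving item.
baskets = [
    {"bread", "milk"},
    {"bread", "yogurt"},
    {"bread", "milk", "yogurt"},
    {"milk"},
]
bundle_support = support(baskets, {"bread", "yogurt"})
bundle_confidence = confidence(baskets, {"bread"}, {"yogurt"})
```

A rule such as bread → yogurt with reasonable support and confidence would be one candidate for pairing the low-demand item with a frequent one.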

Keywords: data mining, association rules, confidence, online stores

Procedia PDF Downloads 411
24326 Elemental Graph Data Model: A Semantic and Topological Representation of Building Elements

Authors: Yasmeen A. S. Essawy, Khaled Nassar

Abstract:

With the rapid increase of complexity in the building industry, professionals in the A/E/C industry were forced to adopt Building Information Modeling (BIM) in order to enhance communication between the different project stakeholders throughout the project life cycle and to create a semantic object-oriented building model that can support geometric-topological analysis of building elements during design and construction. This paper presents a model that extracts topological relationships and geometrical properties of building elements from an existing fully designed BIM, and maps this information into a directed acyclic Elemental Graph Data Model (EGDM). The model incorporates BIM-based search algorithms for automatic deduction of geometrical data and topological relationships for each building element type. Using graph search algorithms, such as Depth First Search (DFS) and topological sorting, all possible construction sequences can be generated and compared against production and construction rules to generate an optimized construction sequence and its associated schedule. The model is implemented on a C# platform.
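The idea of generating all feasible construction sequences from a directed acyclic graph can be sketched as an enumeration of every topological ordering. The sketch below uses simple backtracking over in-degrees rather than the DFS-based procedure the paper may use, and the element names are hypothetical.

```python
def all_topological_orders(successors):
    """Return every ordering in which each element follows all its predecessors.

    successors maps an element to the elements that depend on it.
    """
    nodes = set(successors)
    for targets in successors.values():
        nodes.update(targets)
    indegree = {node: 0 for node in nodes}
    for targets in successors.values():
        for target in targets:
            indegree[target] += 1

    order, placed, results = [], set(), []

    def backtrack():
        if len(order) == len(nodes):
            results.append(list(order))
            return
        for node in sorted(nodes):
            if node in placed or indegree[node] != 0:
                continue
            # Place the node and release the elements that depend on it.
            placed.add(node)
            order.append(node)
            for target in successors.get(node, ()):
                indegree[target] -= 1
            backtrack()
            # Undo the placement before trying the next candidate.
            for target in successors.get(node, ()):
                indegree[target] += 1
            order.pop()
            placed.remove(node)

    backtrack()
    return results


# Hypothetical element graph: a foundation supports a column,
# which supports a beam and a wall.
element_graph = {
    "foundation": ["column"],
    "column": ["beam", "wall"],
    "beam": [],
    "wall": [],
}
sequences = all_topological_orders(element_graph)
```

The number of orderings grows quickly with graph size, so in practice the candidate sequences would need to be pruned by rules such as the production and construction constraints mentioned in the abstract.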

Keywords: building information modeling (BIM), elemental graph data model (EGDM), geometric and topological data models, graph theory

Procedia PDF Downloads 384
24325 Wireless Sensor Network for Forest Fire Detection and Localization

Authors: Tarek Dandashi

Abstract:

WSNs may provide a fast and reliable solution for the early detection of environmental events like forest fires. This is crucial for alerting and calling for fire brigade intervention. Sensor nodes communicate sensor data to a host station, which enables a global analysis and the generation of a reliable decision on a potential fire and its location. We present a WSN built with TinyOS and nesC for capturing and transmitting a variety of sensor information with controlled source, data rates, and duration, and for recording and displaying activity traces. We propose a similarity distance (SD) between the distribution of currently sensed data and that of a reference. At any given time, a fire causes diverging opinions in the reported data, which alters the usual data distribution. Basically, SD consists of a metric on the Cumulative Distribution Function (CDF). SD is designed to be invariant to day-to-day changes of temperature, changes due to the surrounding environment, and normal changes in weather, which preserve the data locality. Evaluation shows that SD sensitivity is quadratic with respect to an increase in sensor node temperature for groups of sensors of different sizes and neighborhoods. Simulation of fire spreading when ignition is placed at random locations with some wind speed shows that SD takes a few minutes to reliably detect fires and locate them. We also discuss false negatives and false positives and their impact on decision reliability.
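The abstract does not give the exact definition of SD, so the following is only a sketch of one common CDF-based metric: the largest gap between two empirical CDFs (the Kolmogorov-Smirnov statistic). The function names and the temperature readings are assumptions for illustration.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of sample as a function of x."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


def cdf_distance(current, reference):
    """Largest absolute gap between the two empirical CDFs."""
    f, g = ecdf(current), ecdf(reference)
    # Both CDFs are step functions, so the maximum gap occurs at a data point.
    points = set(current) | set(reference)
    return max(abs(f(x) - g(x)) for x in points)


# Hypothetical temperature readings in degrees Celsius.
reference_readings = [20, 21, 22, 23]
one_hot_node = [20, 21, 22, 60]
all_hot_nodes = [50, 55, 60, 65]
```

Here one outlying node moves the distance to 0.25, while a group of hot nodes moves it to 1.0, which is the kind of distribution shift a threshold on the distance could flag.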

Keywords: forest fire, WSN, wireless sensor network, algorithm

Procedia PDF Downloads 263
24324 A Feasibility Study of Crowdsourcing Data Collection for Facility Maintenance Management

Authors: Mohamed Bin Alhaj, Hexu Liu, Mohammed Sulaiman, Osama Abudayyeh

Abstract:

An effective facility maintenance management (FMM) system plays a crucial role in improving the quality of services and maintaining the facility in good condition. Current FMM heavily relies on the quality of the data collection function of the FMM systems, at times resulting in inefficient FMM decision-making. The new technology-based crowdsourcing provides great potential to improve the current FMM practices, especially in terms of timeliness and quality of data. This research aims to investigate the feasibility of using new technology-driven crowdsourcing for FMM and highlight its opportunities and challenges. A survey was carried out to understand the human, data, system, geospatial, and automation characteristics of crowdsourcing for an educational campus FMM via social networks. The survey results were analyzed to reveal the challenges and recommendations for the implementation of crowdsourcing for FMM. This research contributes to the body of knowledge by synthesizing the challenges and opportunities of using crowdsourcing for facility maintenance and providing a road map for applying crowdsourcing technology in FMM. In future work, a conceptual framework will be proposed to support data-driven FMM using social networks.

Keywords: crowdsourcing, facility maintenance management, social networks

Procedia PDF Downloads 176
24323 Challenges and Opportunities: One Stop Processing for the Automation of Indonesian Large-Scale Topographic Base Map Using Airborne LiDAR Data

Authors: Elyta Widyaningrum

Abstract:

LiDAR data acquisition has been recognized as one of the fastest solutions for providing the base data for topographic base mapping in Indonesia. The challenge of accelerating the provision of large-scale topographic base maps as a basis for development planning gives the opportunity to implement an automated scheme in the map production process. One stop processing will also help accelerate map provision, especially in conforming with the Indonesian fundamental spatial data catalog derived from ISO 19110 and with geospatial database integration. Thus, automated LiDAR classification, DTM generation, and feature extraction will be conducted in one GIS software environment to form all layers of the topographic base maps. The quality of the automated topographic base map will be assessed and analyzed based on its completeness, correctness, contiguity, consistency, and possible customization.

Keywords: automation, GIS environment, LiDAR processing, map quality

Procedia PDF Downloads 369
24322 Mixtures of Length-Biased Weibull Distributions for Loss Severity Modelling

Authors: Taehan Bae

Abstract:

In this paper, a class of length-biased Weibull mixtures is presented to model loss severity data. The proposed model generalizes the Erlang mixtures with the common scale parameter, and it shares many important modelling features, such as flexibility to fit various data distribution shapes and weak-denseness in the class of positive continuous distributions, with the Erlang mixtures. We show that the asymptotic tail estimate of the length-biased Weibull mixture is Weibull-type, which makes the model effective to fit loss severity data with heavy-tailed observations. A method of statistical estimation is discussed with applications on real catastrophic loss data sets.
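For readers unfamiliar with length-biasing, the sketch below builds a single length-biased Weibull density from the standard definition f_LB(x) = x f(x) / E[X], where E[X] = scale · Γ(1 + 1/shape) for a Weibull distribution. The shape and scale parametrization is an assumption and may differ from the one used in the paper, and mixture weights are not modelled here.

```python
import math


def weibull_pdf(x, shape, scale):
    """Standard two-parameter Weibull density, defined as 0 for x <= 0."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def length_biased_weibull_pdf(x, shape, scale):
    """Length-biased density x * f(x) / E[X] for a Weibull f."""
    mean = scale * math.gamma(1 + 1 / shape)
    return x * weibull_pdf(x, shape, scale) / mean
```

With `shape = 1` the Weibull is exponential and the length-biased density reduces to a gamma density with shape 2, consistent with the Erlang mixtures being a special case.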

Keywords: Erlang mixture, length-biased distribution, transformed gamma distribution, asymptotic tail estimate, EM algorithm, expectation-maximization algorithm

Procedia PDF Downloads 224
24321 Robust Data Image Watermarking for Data Security

Authors: Harsh Vikram Singh, Ankur Rai, Anand Mohan

Abstract:

In this paper, we propose a secure and robust data hiding algorithm based on DCT, using the Arnold transform and a chaotic sequence. The watermark image is scrambled by the Arnold cat map to increase its security, and then the chaotic map is used to spread the watermark signal in the middle band of the DCT coefficients of the cover image. The chaotic map can be used as a pseudo-random generator for digital data hiding to increase security and robustness. Performance evaluation of the robustness and imperceptibility of the proposed algorithm has been made using bit error rate (BER), normalized correlation (NC), and peak signal-to-noise ratio (PSNR) values for different watermark and cover images, such as the Lena, Girl, and Tank images, and for different gain factors. We use a binary logo image and a text image as watermarks. The experimental results demonstrate that the proposed algorithm achieves higher security and robustness against JPEG compression, as well as other attacks such as noise addition, low-pass filtering, and cropping, compared to other existing algorithms using DCT coefficients. Moreover, recovering watermarks with the proposed algorithm does not require the original cover image.
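The scrambling step can be illustrated with the classical Arnold cat map on a square image, which moves the pixel at (x, y) to ((x + y) mod N, (x + 2y) mod N). This is a minimal sketch of that standard map only; the paper's exact variant, iteration count, and the DCT embedding are not reproduced, and the function names are assumptions.

```python
def arnold_cat_map(image, iterations=1):
    """Scramble a square image (list of equal-length rows) with the cat map."""
    n = len(image)
    if any(len(row) != n for row in image):
        raise ValueError("the Arnold cat map needs a square image")
    for _ in range(iterations):
        scrambled = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                # The map matrix [[1, 1], [1, 2]] has determinant 1,
                # so every target position is hit exactly once.
                scrambled[(x + y) % n][(x + 2 * y) % n] = image[x][y]
        image = scrambled
    return image


def inverse_arnold_cat_map(image, iterations=1):
    """Undo arnold_cat_map by reading each pixel back from its mapped position."""
    n = len(image)
    for _ in range(iterations):
        restored = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                restored[x][y] = image[(x + y) % n][(x + 2 * y) % n]
        image = restored
    return image


# Hypothetical 4 x 4 "watermark" with distinct pixel values 0..15.
watermark = [[4 * r + c for c in range(4)] for r in range(4)]
scrambled = arnold_cat_map(watermark, iterations=3)
recovered = inverse_arnold_cat_map(scrambled, iterations=3)
```

In a scheme like this, the iteration count can act as part of the key, since the receiver needs it to apply the inverse map the same number of times.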

Keywords: data hiding, watermarking, DCT, chaotic sequence, Arnold transform

Procedia PDF Downloads 515
24320 An Empirical Investigation of Big Data Analytics: The Financial Performance of Users versus Vendors

Authors: Evisa Mitrou, Nicholas Tsitsianis, Supriya Shinde

Abstract:

In the age of digitisation and globalisation, businesses have shifted online and are investing in big data analytics (BDA) to respond to changing market conditions and sustain their performance. Our study shifts the focus from the adoption of BDA to the impact of BDA on financial performance. We explore the financial performance of both BDA-vendors (business-to-business) and BDA-clients (business-to-customer). We distinguish between the five BDA-technologies (big-data-as-a-service (BDaaS), descriptive, diagnostic, predictive, and prescriptive analytics) and discuss them individually. Further, we use four perspectives (internal business process, learning and growth, customer, and finance) and discuss the significance of how each of the five BDA-technologies affects the performance measures of these four perspectives. We also present the analysis of employee engagement, average turnover, average net income, and average net assets for BDA-clients and BDA-vendors. Our study also explores the effect of the COVID-19 pandemic on business continuity for both BDA-vendors and BDA-clients.

Keywords: BDA-clients, BDA-vendors, big data analytics, financial performance

Procedia PDF Downloads 125