Search results for: data pipeline
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24319

24169 Investigation of Nucleation and Thermal Conductivity of Waxy Crude Oil on Pipe Wall via Particle Dynamics

Authors: Jinchen Cao, Tiantian Du

Abstract:

Waxy crude oil readily crystallizes and deposits on the pipeline wall, clogging the pipeline and reducing oil and gas gathering and transmission efficiency. In this paper, a mesoscopic-scale dissipative particle dynamics (DPD) method is employed, and four pipe wall models are constructed: a smooth wall (SW), a hydroxylated wall (HW), a rough wall (RW), and a single-layer graphene wall (GW). Snapshots of the simulated trajectories show that paraffin molecules interact with each other to form a network structure that constrains water molecules as their nucleation sites. Meanwhile, the paraffin molecules on the near-wall side are observed to adsorb horizontally into the inter-lattice gaps of the solid wall. In the pressure range of 0–50 MPa, pressure changes have little effect on the affinity properties of the SW, HW, and GW walls; for the RW wall, however, the contact angle of paraffin wax was found to decrease with increasing pressure, while that of the water molecules showed the opposite trend. This phenomenon arises because the change in pressure drives the transition of paraffin molecules from an amorphous to a crystalline state. The minimum crystalline phase pressure (MCPP) is proposed to describe the lowest pressure at which crystallization of paraffin molecules occurs. The maximum number of crystalline clusters formed by paraffin molecules at the MCPP followed N_SW (0.52 MPa) > N_HW (0.55 MPa) > N_RW (0.62 MPa) > N_GW (0.75 MPa). The MCPP on the graphene surface, with the fewest clusters formed, indicates that the addition of graphene inhibits the crystallization of paraffin deposits on the wall surface. Finally, the thermal conductivity was calculated: on the near-wall side it changes drastically due to the adsorption crystallization of paraffin waxes, while on the fluid side it gradually stabilizes; the average thermal conductivities of the four wall models were 0.254, 0.249, 0.218, and 0.188 W/(m·K). This study provides a theoretical basis for improving the transport efficiency and heat transfer characteristics of waxy crude oil in terms of wall type, wall roughness, and MCPP.
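For readers unfamiliar with the method, the sketch below illustrates the standard DPD pairwise force (conservative, dissipative, and random terms) that underlies simulations like this one; the parameter values are illustrative placeholders, not those used in the paper.

```python
import numpy as np

def dpd_pair_force(r_ij, v_ij, a=25.0, gamma=4.5, kBT=1.0, r_c=1.0, dt=0.01):
    """Standard DPD force on particle i from particle j.

    r_ij : displacement vector r_i - r_j
    v_ij : velocity difference v_i - v_j
    a    : conservative repulsion amplitude (illustrative value)
    gamma: dissipative friction coefficient
    """
    r = np.linalg.norm(r_ij)
    if r >= r_c:
        return np.zeros(3)               # all terms vanish beyond the cutoff
    e = r_ij / r                          # unit vector along the pair axis
    w = 1.0 - r / r_c                     # weight function w(r)
    sigma = np.sqrt(2.0 * gamma * kBT)    # fluctuation-dissipation relation
    f_c = a * w * e                                   # soft conservative repulsion
    f_d = -gamma * w**2 * np.dot(e, v_ij) * e         # dissipative (drag) term
    f_r = sigma * w * np.random.randn() / np.sqrt(dt) * e  # random (thermal) term
    return f_c + f_d + f_r
```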

Keywords: waxy crude oil, thermal conductivity, crystallization, dissipative particle dynamics, MCPP

Procedia PDF Downloads 40
24168 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, WangQun Lin

Abstract:

This paper introduces the concept and principles of data cleaning, analyzes the types and causes of dirty data, and proposes the key steps of a typical cleaning process. It puts forward a scalable and versatile data cleaning framework and, for data with attribute dependency relations, formally designs several violation-data discovery algorithms. These algorithms can find inconsistent data across all target columns governed by a conditional attribute dependency, whether the data is structured (SQL) or unstructured (NoSQL). Six data cleaning methods based on these algorithms are then given.
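As a minimal sketch of the kind of violation discovery the paper describes, the following pandas-based check (a hypothetical illustration, not the authors' algorithms) flags rows that violate a functional dependency X -> Y: groups that share the same X values but disagree on Y.

```python
import pandas as pd

def fd_violations(df: pd.DataFrame, lhs: list, rhs: str) -> pd.DataFrame:
    """Return rows violating the functional dependency lhs -> rhs."""
    # A group satisfies the dependency iff it has exactly one distinct rhs value.
    counts = df.groupby(lhs)[rhs].nunique()
    bad_keys = counts[counts > 1].index
    mask = df.set_index(lhs).index.isin(bad_keys)
    return df[mask]

# Example: zip_code should determine city.
data = pd.DataFrame({
    "zip_code": ["10001", "10001", "94105"],
    "city": ["New York", "Newark", "San Francisco"],  # "Newark" violates the rule
})
print(fd_violations(data, ["zip_code"], "city"))
```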

Keywords: data cleaning, dependency rules, violation data discovery, data repair

Procedia PDF Downloads 533
24167 An Efficient Machine Learning Model to Detect Metastatic Cancer in Pathology Scans Using Principal Component Analysis Algorithm, Genetic Algorithm, and Classification Algorithms

Authors: Bliss Singhal

Abstract:

Machine learning (ML) is a branch of artificial intelligence (AI) in which computers analyze data and find patterns in it. This study focuses on the detection of metastatic cancer using ML. Metastatic cancer is the stage at which cancer has spread to other parts of the body and is the cause of approximately 90% of cancer-related deaths. Normally, pathologists spend hours each day manually classifying whether tumors are benign or malignant. This tedious task contributes to metastases being mislabeled over 60% of the time and underscores the cost of human error and other inefficiencies. ML is a good candidate for improving the correct identification of metastatic cancer, potentially saving thousands of lives, and can also improve the speed and efficiency of the process, requiring fewer resources and less time. So far, research on cancer detection has mostly used the deep learning methodology of AI. This study is a novel attempt to determine the potential of combining preprocessing algorithms with classification algorithms to detect metastatic cancer. The study used two preprocessing algorithms, principal component analysis (PCA) and a genetic algorithm, to reduce the dimensionality of the dataset, and then three classification algorithms, logistic regression, decision tree classifier, and k-nearest neighbors, to detect metastatic cancer in the pathology scans. The highest accuracy, 71.14%, was produced by the ML pipeline comprising PCA, the genetic algorithm, and the k-nearest neighbors algorithm, suggesting that preprocessing and classification algorithms have great potential for detecting metastatic cancer.
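A minimal sketch of such a pipeline, assuming scikit-learn, placeholder data, and a toy mutate-and-keep-best loop as a crude stand-in for the paper's genetic algorithm (whose actual design is not described here):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))     # placeholder features (e.g., scan patches)
y = rng.integers(0, 2, size=500)    # placeholder labels: 0 = benign, 1 = metastatic

# Step 1: PCA reduces the raw feature dimensionality.
X_pca = PCA(n_components=20).fit_transform(X)

# Step 2: genetic-algorithm-style feature selection -- mutate a binary
# mask over components and keep the best-scoring one.
def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, X_pca[:, mask], y, cv=3).mean()

best_mask = rng.random(20) < 0.5
for _ in range(30):                               # "generations"
    child = best_mask ^ (rng.random(20) < 0.1)    # flip a few genes
    if fitness(child) > fitness(best_mask):
        best_mask = child

# Step 3: final k-NN classifier on the selected components.
knn = KNeighborsClassifier(n_neighbors=5)
print("CV accuracy:", cross_val_score(knn, X_pca[:, best_mask], y, cv=3).mean())
```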

Keywords: breast cancer, principal component analysis, genetic algorithm, k-nearest neighbors, decision tree classifier, logistic regression

Procedia PDF Downloads 50
24166 Target and Biomarker Identification Platform to Design New Drugs against Aging and Age-Related Diseases

Authors: Peter Fedichev

Abstract:

We studied fundamental aspects of aging to develop a mathematical model of the gene regulatory network. We show that aging manifests itself as an inherent instability of the gene network, leading to an exponential accumulation of regulatory errors with age. To validate our approach, we studied age-dependent omics data, such as transcriptomes and metabolomes, of different model organisms and humans. We built a computational platform based on our model to identify targets and biomarkers of aging and to design new drugs against aging and age-related diseases. As biomarkers of aging, we chose the rate of aging and the biological age, since together they determine the state of the organism. Because the rate of aging changes rapidly in response to external stress, this kind of biomarker can be useful as a tool for the quantitative efficacy assessment of drugs and their combinations, dose optimization, chronic toxicity estimation, selection of personalized therapies, achievement of clinical endpoints (within clinical research), and death risk assessment. Based on our model, we propose a method of target identification for further interventions against aging and age-related diseases. Being a biotech company, we offer a complete pipeline to develop an anti-aging drug candidate.

Keywords: aging, longevity, biomarkers, senescence

Procedia PDF Downloads 248
24165 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Much research is concerned with mining massive amounts of data and big data streams. Mining big data faces many challenges, including scalability, speed, heterogeneity, accuracy, provenance, and privacy. In the telecommunication industry, mining big data is like mining for gold: it represents a big opportunity for maximizing revenue streams. This paper discusses the characteristics of big data (volume, variety, velocity, and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication, and the benefits and opportunities gained from them.

Keywords: mining big data, big data, machine learning, telecommunication

Procedia PDF Downloads 369
24164 Domain Adaptation Save Lives - Drowning Detection in Swimming Pool Scene Based on YOLOV8 Improved by Gaussian Poisson Generative Adversarial Network Augmentation

Authors: Simiao Ren, En Wei

Abstract:

Drowning is a significant safety issue worldwide, and a robust computer vision-based alert system can easily prevent such tragedies in swimming pools. However, due to the domain shift caused by the visual gap between the training swimming pool and the test swimming pool (potentially due to lighting, indoor scene changes, pool floor color, etc.), the robustness of such algorithms has been questionable. The annotation cost of labeling each new swimming pool is too expensive for mass adoption of such a technique. To address this issue, we propose a domain-aware data augmentation pipeline based on the Gaussian Poisson Generative Adversarial Network (GP-GAN). Combined with YOLOv8, we demonstrate that such a domain adaptation technique can significantly improve model performance (from 0.24 mAP to 0.82 mAP) on new test scenes. As the augmentation method only requires background imagery from the new domain (no annotation needed), we believe this is a promising, practical route for preventing swimming pool drowning.
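A minimal sketch of the detector side of such a pipeline, assuming the ultralytics package and a hypothetical pool.yaml dataset config pointing at GP-GAN-augmented images (the augmentation step itself is not shown):

```python
from ultralytics import YOLO

# Start from a pretrained YOLOv8 checkpoint and fine-tune on the augmented
# dataset (annotated swimmers composited onto new-domain pool backgrounds
# by the GP-GAN blending step).
model = YOLO("yolov8n.pt")
model.train(data="pool.yaml", epochs=50, imgsz=640)

# Evaluate on the held-out target-domain pool scenes.
metrics = model.val()
print("mAP50-95:", metrics.box.map)
```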

Keywords: computer vision, deep learning, YOLOv8, detection, swimming pool, drowning, domain adaptation, generative adversarial network, GAN, GP-GAN

Procedia PDF Downloads 58
24163 An Accurate Brain Tumor Segmentation for High Graded Glioma Using Deep Learning

Authors: Sajeeha Ansar, Asad Ali Safi, Sheikh Ziauddin, Ahmad R. Shahid, Faraz Ahsan

Abstract:

Gliomas are among the most challenging and aggressive types of tumors, appearing in different sizes and locations and with scattered boundaries. The CNN is an efficient deep learning approach with an outstanding capability for solving image analysis problems. A fully automatic deep-learning-based 2D-CNN model for brain tumor segmentation is presented in this paper. We used small convolution filters (3 x 3) to make the architecture deeper and increased the number of convolutional layers for efficient learning of complex features from a large dataset. We achieved better results by pushing the convolutional layers up to 16 for the HGG model, and we obtained reliable and accurate results through fine-tuning of the dataset and hyper-parameters. Pre-processing for this model includes generation of the brain pipeline, intensity normalization, bias correction, and data augmentation. We used the BRATS-2015 dataset, and the Dice similarity coefficient (DSC) is used as the performance measure for evaluating the proposed method. Our method achieved DSC scores of 0.81 for the complete, 0.79 for the core, and 0.80 for the enhanced tumor regions. These results are comparable with those of previously implemented 2D CNN architectures.
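For reference, the Dice similarity coefficient used as the evaluation metric here reduces to a few lines of NumPy (a generic implementation, not the authors' code):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for binary segmentation masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection) / (pred.sum() + truth.sum() + eps)

# Two toy 4x4 masks that overlap on four voxels
a = np.array([[1, 1, 0, 0]] * 4)
b = np.array([[1, 0, 0, 0]] * 4)
print(round(dice_coefficient(a, b), 3))  # 2*4 / (8 + 4) ≈ 0.667
```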

Keywords: brain tumor segmentation, convolutional neural networks, deep learning, HGG

Procedia PDF Downloads 221
24162 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications: A Software Testing Approach

Authors: Theertha Chandroth

Abstract:

This paper presents a comparative study of how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its own syntax and structure. The objective is to explore techniques and methodologies for validating, comparing, and integrating JSON data with XML data and vice versa. By understanding the process of checking JSON data against XML data, testers, developers, and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.
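A minimal sketch of one such check, assuming Python's standard library: parse both documents into plain dictionaries and compare them key by key (real XML with attributes or repeated elements would need a richer mapping):

```python
import json
import xml.etree.ElementTree as ET

def xml_to_dict(element: ET.Element):
    """Flatten a simple, attribute-free XML element into nested dicts of leaf texts."""
    if len(element) == 0:
        return element.text
    return {child.tag: xml_to_dict(child) for child in element}

json_doc = '{"user": {"id": "42", "name": "Ada"}}'
xml_doc = "<user><id>42</id><name>Ada</name></user>"

parsed_json = json.loads(json_doc)["user"]
parsed_xml = xml_to_dict(ET.fromstring(xml_doc))

assert parsed_json == parsed_xml, f"Mismatch: {parsed_json} != {parsed_xml}"
print("JSON and XML payloads agree.")
```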

Keywords: XML, JSON, data comparison, integration testing, Python, SQL

Procedia PDF Downloads 85
24161 Shark Detection and Classification with Deep Learning

Authors: Jeremy Jenrette, Z. Y. C. Liu, Pranav Chimote, Edward Fox, Trevor Hastie, Francesco Ferretti

Abstract:

Suitable shark conservation depends on well-informed population assessments. Direct methods such as scientific surveys and fisheries monitoring are adequate for defining population statuses, but species-specific indices of abundance and distribution from these sources are rare for most shark species. We can rapidly fill these information gaps by boosting media-based remote monitoring efforts with machine learning and automation. We created a database of shark images by sourcing 24,546 images covering 219 species of sharks from the web application sharkPulse and the social network Instagram. We used object detection to extract shark features and inflate this database to 53,345 images. We packaged object-detection and image classification models into a Shark Detector bundle. We developed the Shark Detector to recognize and classify sharks from videos and images using transfer learning and convolutional neural networks (CNNs). We applied these models to common data-generation approaches for sharks: boosting training datasets, processing baited remote camera footage and online videos, and data-mining Instagram. We examined the accuracy of each model and tested genus and species prediction correctness as a function of training data quantity. The Shark Detector located sharks in baited remote footage and YouTube videos with an average accuracy of 89%, and classified located subjects to the species level with 69% accuracy (n = 8 species). The Shark Detector sorted heterogeneous datasets of images sourced from Instagram with 91% accuracy and classified species with 70% accuracy (n = 17 species). Data-mining Instagram can inflate training datasets and increase the Shark Detector's accuracy, as well as facilitate the archiving of historical and novel shark observations. Base accuracy of genus prediction was 68% across 25 genera. The average base accuracy of species prediction within each genus class was 85%. The Shark Detector can classify 45 species. All data-generation methods were processed without manual interaction. As media-based remote monitoring strives to dominate methods for observing sharks in nature, we developed an open-source Shark Detector to facilitate common identification applications. Prediction accuracy of the software pipeline increases as more images are added to the training dataset. We provide public access to the software on our GitHub page.
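A minimal sketch of the transfer-learning pattern described here, assuming TensorFlow/Keras and a hypothetical directory of labeled shark images (this is not the authors' released code):

```python
import tensorflow as tf

# Frozen ImageNet backbone + a small trainable head: the standard
# transfer-learning recipe for species classification.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

num_species = 45  # size of the label set reported in the abstract
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_species, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical dataset layout: shark_images/<species_name>/<image>.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "shark_images", image_size=(224, 224), batch_size=32)
model.fit(train_ds, epochs=5)
```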

Keywords: classification, data mining, Instagram, remote monitoring, sharks

Procedia PDF Downloads 89
24160 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data, and all kinds of business data. These data reflect different aspects of people, events, and activities. Data generated by various systems differ in form and in source, because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need to be fused together. Data from different sources, collected in different ways, raise several issues that need to be resolved. Problems in data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, and resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, and scenario prediction in order to achieve pattern discovery, law verification, root cause analysis, and public opinion monitoring. The result of multi-source data fusion is a uniform central database, which includes people data, location data, object data, institution data, business data, and space data. Metadata must be available to be referenced and read whenever applications need to access, manipulate, and display the data. Uniform metadata management ensures the effectiveness and consistency of data in the processes of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search, and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 349
24159 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad

Abstract:

Nowadays, given the ever-growing amount of data that human activity produces, methods such as data mining for extracting knowledge are unavoidable. One issue in data mining is the inherent distribution of the data: the bodies creating or receiving such data usually belong to corporate or non-corporate persons who do not give their information freely to others. Yet there is no guarantee that someone can mine such data without intruding on the owner's privacy. Sending data and then gathering them with vertically or horizontally partitioned software depends on the type of privacy preservation used and is carried out to improve data privacy. In this study, we attempt a comprehensive comparison of privacy-preserving data methods; general methods such as data randomization and encoding are also examined, along with the strong and weak points of each.

Keywords: data mining, distributed data mining, privacy protection, privacy preserving

Procedia PDF Downloads 489
24158 Scalable and Accurate Detection of Pathogens from Whole-Genome Shotgun Sequencing

Authors: Janos Juhasz, Sandor Pongor, Balazs Ligeti

Abstract:

Next-generation sequencing, especially whole genome shotgun sequencing, is becoming a common approach for gaining insight into microbiomes in a culture-independent way, even in clinical practice. It not only gives us information about the species composition of an environmental sample but also opens the possibility of detecting antimicrobial resistance and novel, or currently unknown, pathogens. Accurately and reliably detecting microbial strains is a challenging task. Here we present a sensitive approach for detecting pathogens in metagenomics samples, with special regard to detecting novel variants of known pathogens. We have developed a pipeline that uses fast, short-read aligner programs (i.e., Bowtie2/BWA) and comprehensive nucleotide databases. Taxonomic binning is based on the lowest common ancestor (LCA) principle: each read is assigned to a taxon covering the most significantly hit taxa. This approach helps balance sensitivity and running time. The program was tested on both experimental and synthetic data. The results indicate that our method performs as well as the state-of-the-art BLAST-based ones; furthermore, in some cases it even proves to be better, while running two orders of magnitude faster. It is sensitive and capable of identifying taxa present only in small abundance. Moreover, it needs two orders of magnitude fewer reads to complete the identification than MetaPhlAn2 does. We analyzed an experimental anthrax dataset (B. anthracis strain BA104). The majority of the reads (96.50%) were classified as Bacillus anthracis; a small portion, 1.2%, was classified as other species from the Bacillus genus. We demonstrate that the evaluation of high-throughput sequencing data is feasible in a reasonable time with good classification accuracy.
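A minimal sketch of the LCA idea, assuming each read's alignment hits have already been mapped to taxonomy lineage paths (the real pipeline works on NCBI taxonomy IDs):

```python
def lowest_common_ancestor(lineages):
    """Assign a read to the deepest taxon shared by all of its hits.

    Each lineage is a root-to-leaf path, e.g.
    ["Bacteria", "Firmicutes", "Bacillales", "Bacillus", "B. anthracis"].
    """
    lca = "root"
    for ranks in zip(*lineages):            # walk the paths level by level
        if len(set(ranks)) == 1:            # all hits still agree at this rank
            lca = ranks[0]
        else:
            break                           # paths diverge: stop one level up
    return lca

hits = [
    ["Bacteria", "Firmicutes", "Bacillales", "Bacillus", "B. anthracis"],
    ["Bacteria", "Firmicutes", "Bacillales", "Bacillus", "B. cereus"],
]
print(lowest_common_ancestor(hits))  # -> "Bacillus"
```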

Keywords: metagenomics, taxonomy binning, pathogens, microbiome, B. anthracis

Procedia PDF Downloads 107
24157 The Right to Data Portability and Its Influence on the Development of Digital Services

Authors: Roman Bieda

Abstract:

The General Data Protection Regulation (GDPR) will come into force on 25 May 2018 and will create a new legal framework for the protection of personal data in the European Union. Article 20 of the GDPR introduces a right to data portability. This right allows data subjects to receive the personal data which they have provided to a data controller in a structured, commonly used, and machine-readable format, and to transmit these data to another data controller. The right to data portability, by facilitating the transfer of personal data between IT environments (e.g., applications), will also facilitate changing the provider of services (e.g., changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.

Keywords: data portability, digital market, GDPR, personal data

Procedia PDF Downloads 442
24156 High Viscous Oil–Water Flow: Experiments and CFD Simulations

Authors: A. Archibong-Eso, J. Shi, Y. Baba, S. Alagbe, W. Yan, H. Yeung

Abstract:

This study presents over 100 experiments conducted in a 25.4 mm internal diameter (ID) horizontal pipeline. Oil viscosities ranging from 3.5 Pa·s to 5.0 Pa·s are used, with superficial velocities of oil and water ranging from 0.06 to 0.55 m/s and from 0.01 to 1.0 m/s, respectively. Pressure gradient measurements and flow pattern observations are discussed. Numerical simulation of some flow conditions is performed using the commercial CFD code ANSYS Fluent®, and the simulation results are compared with experimental results. Results indicate that the CFD numerical simulation performed moderately well in predicting the flow configurations observed in this study, while discrepancies were observed in the pressure gradient predictions.

Keywords: flow patterns, plug, pressure gradient, rivulet

Procedia PDF Downloads 394
24155 Non-Centrifugal Cane Sugar Production: Heat Transfer Study to Optimize the Use of Energy

Authors: Fabian Velasquez, John Espitia, Henry Hernadez, Sebastian Escobar, Jader Rodriguez

Abstract:

Non-centrifugal cane sugar (NCS) is a concentrated product obtained by evaporating the water content of sugarcane juice in open heat exchangers (OE). The heat supplied to the evaporation stages is obtained from cane bagasse through the thermochemical process of combustion, in which the thermal energy released is transferred to the OE by the flue gas. The optimization of energy usage therefore becomes essential for the proper design of the production process. To optimize energy use, it is necessary to model and simulate the heat transfer between the combustion gases and the juice and to understand the major mechanisms involved. The main objective of this work was to simulate the heat transfer phenomena between the flue gas and the open heat exchangers using a computational fluid dynamics (CFD) model. The simulation results were compared with field-measured data. Numerical results for the temperature profile along the flue gas pipeline are in good agreement with field measurements at the measurement points. Thus, this study could be of special interest for the design of the NCS production process and the optimization of energy use.

Keywords: mathematical modeling, design variables, computational fluid dynamics, overall thermal efficiency

Procedia PDF Downloads 98
24154 Environment Management Practices at Oil and Natural Gas Corporation Hazira Gas Processing Complex

Authors: Ashish Agarwal, Vaibhav Singh

Abstract:

Harmful emissions from oil and gas processing facilities have long remained a matter of concern for governments and environmentalists throughout the world. This paper analyses the Oil and Natural Gas Corporation (ONGC) gas processing plant in Hazira, Gujarat, India. It is the largest gas-processing complex in the country, designed to process 41 MMSCMD of sour natural gas and associated sour condensate. The complex, sprawling over an area of approximately 705 hectares, is the mother plant for almost all industries at Hazira and along the Hazira–Bijapur–Jagdishpur pipeline. Various sources of pollution from each unit along the processing chain, from the Gas Terminal to the Dew Point Depression unit and the Caustic Wash unit, were examined with the help of emission data obtained from ONGC. Pollution discharged to the environment was classified into water, air, hazardous waste, and solid (non-hazardous) waste so as to analyze each of them efficiently. To protect the air environment, a sulphur recovery unit, automatic ambient air quality monitoring stations, and automatic stack monitoring stations were adopted, among numerous other practices. To protect the water environment, different effluent treatment plants were used, with due emphasis on aquaculture in the nearby area. The Hazira plant has obtained authorization for handling and disposing of five types of hazardous waste. Most of the hazardous waste was sold to authorized recyclers, and the rest was given to Gujarat Pollution Control Board authorized vendors. Non-hazardous waste was also handled with the overall objective of zero negative impact on the environment. The effect of the methods adopted is evident from the plant's emission data, which was found to be well within Gujarat Pollution Control Board limits.

Keywords: sulphur recovery unit, effluent treatment plant, hazardous waste, sour gas

Procedia PDF Downloads 198
24153 Recent Advances in Data Warehouse

Authors: Fahad Hanash Alzahrani

Abstract:

This paper describes some recent advances in the quickly developing area of data storage and processing based on data warehouse and data mining techniques. These advances are associated with software, hardware, data mining algorithms, and visualisation techniques that share common features across the specific problems and tasks of their implementation.

Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing

Procedia PDF Downloads 367
24152 Results of the Field-and-Scientific Study in the Water Area of the Estuaries of the Major Rivers of the Black Sea and Sea Ports on the Territory of Georgia

Authors: Ana Gavardashvili

Abstract:

Field-and-scientific studies to evaluate the modern ecological state of the water area of the estuaries of the major water-abundant rivers along the Black Sea coastal line (Chorokhi, Kintrishi, Natanebi, Supsa, Khobistskali, Rioni, and Enguri), the sea ports (Batumi, Poti), and the sea terminals of the oil pipelines (Baku-Tbilisi-Supsa, Kulevi) were carried out in June and July 2015. GPS coordinates and GIS programs were used to fix the areas of the estuaries of the above-listed rivers on a digital map; their values vary between 0.861 and 20.390 km². Water samples from the Black Sea were taken at the river estuaries and sea ports during the field work, giving a statistical series of 125 points. The temperatures of the air (t2) and of the Black Sea water (t1) were measured locally, and their ratio is (t1/t2) = 0.69-0.92. The 125 water samples taken along the studied coastal line were subject to laboratory analysis, and it was established that the acidity (pH) of the Black Sea varies between 7.71 and 8.22 in the river estuaries and between 8.42 and 8.65 in the port water areas and at the oil terminals. As for the sea water salinity index (TDS), it varies between 6.15 and 12.67 in the river estuaries and between 11.80 and 13.67 in the port water areas and at the oil terminals. At the next stage, by taking the gathered data and climatic changes into account and using the theories of reliability and risk, the nature of the changes in the Black Sea's ecological parameters will be established.

Keywords: acidity, estuary, salinity, sea

Procedia PDF Downloads 263
24151 How to Use Big Data in Logistics Issues

Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy

Abstract:

Big Data stands for today's cutting-edge technology. As the technology becomes widespread, so does data. Utilizing massive data sets enables companies to gain competitive advantages over their adversaries. Out of the many areas of Big Data usage, logistics plays a significant role in both the commercial sector and the military. This paper lays out what big data is and how it is used in both military and commercial logistics.

Keywords: big data, logistics, operational efficiency, risk management

Procedia PDF Downloads 609
24150 Fueling Efficient Reporting And Decision-Making In Public Health With Large Data Automation In Remote Areas, Neno Malawi

Authors: Wiseman Emmanuel Nkhomah, Chiyembekezo Kachimanga, Julia Huggins, Fabien Munyaneza

Abstract:

Background: Partners In Health – Malawi introduced an operational research initiative called the Primary Health Care (PHC) Surveys in 2020, which seeks to assess the progress of care delivery in the district. The study consists of five long surveys, namely facility assessment, general patient, provider, sick child, and antenatal care (ANC), primarily conducted in four health facilities in Neno district: Neno district hospital, Dambe health centre, Chifunga, and Matope. Usually, these annual surveys are conducted from January, with the target of presenting the final report by June. Once data are collected and analyzed, a series of reviews takes place before the final report is reached. Initially, the manual process took over nine months to produce the final report, and initial findings showed that only about 76.9% of the data added up when cross-checked with paper-based sources. Purpose: The aim of this approach is to move away from manually pulling the data, redoing the analysis, and reporting, a process associated not only with delays and inconsistencies in reporting but also with poor data quality if not done carefully. This automation approach was meant to utilize the features of new technologies to create visualizations, reports, and dashboards in Power BI that are fed directly from the data source, CommCare, so that a single click of a 'refresh' button populates the visualizations, reports, and dashboards with updated information at once. Methodology: We transformed the paper-based questionnaires into electronic ones using the CommCare mobile application. We then connected the CommCare mobile app directly to Power BI, using an Application Programming Interface (API) connection as the data pipeline. This provided the chance to create visualizations, reports, and dashboards in Power BI. In contrast to manually collecting data on paper questionnaires, entering them into ordinary spreadsheets, and re-running the analysis every time a report is prepared, the team utilized CommCare and Microsoft Power BI technologies. We utilized validations and logic in CommCare to capture data with fewer errors. We utilized Power BI features to host the reports online, publishing them as a cloud-computing process. We switched from sharing ordinary report files to sharing a link with potential recipients, giving them the freedom to dig deeper into additional findings within the Power BI dashboards and to export to any format of their choice. Results: This data automation approach reduced research timelines from the initial nine months to five. It also improved the quality of the data findings from the original 76.9% to 98.9%. This brought confidence in drawing conclusions from the findings that help in decision-making and opened opportunities for further research. Conclusion: These results suggest that automating the research data process has the potential to reduce the overall amount of time spent and to improve data quality. On this basis, the concept of data automation should be seriously considered when conducting operational research, for both efficiency and decision-making.
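A minimal sketch of the kind of API pull such a pipeline relies on, assuming Python, the requests library, and a hypothetical CommCare HQ OData feed URL and credentials (Power BI consumes this type of feed through its built-in OData connector):

```python
import requests
import pandas as pd

# Hypothetical project space, feed id, and API credentials.
DOMAIN = "phc-neno"
FEED_URL = f"https://www.commcarehq.org/a/{DOMAIN}/api/odata/cases/v1/example-feed-id/feed"
AUTH = ("reporting-user@example.org", "api-key-goes-here")

def fetch_survey_records(url: str) -> pd.DataFrame:
    """Pull all pages of an OData feed into a single DataFrame."""
    rows = []
    while url:
        resp = requests.get(url, auth=AUTH, timeout=60)
        resp.raise_for_status()
        payload = resp.json()
        rows.extend(payload.get("value", []))       # standard OData page body
        url = payload.get("@odata.nextLink")        # follow pagination, if any
    return pd.DataFrame(rows)

df = fetch_survey_records(FEED_URL)
print(df.shape)  # e.g. (n_records, n_questions)
```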

Keywords: reporting, decision-making, Power BI, CommCare, data automation, visualizations, dashboards

Procedia PDF Downloads 82
24149 Cardiokey: A Binary and Multi-Class Machine Learning Approach to Identify Individuals Using Electrocardiographic Signals on Wearable Devices

Authors: S. Chami, J. Chauvin, T. Demarest, Stan Ng, M. Straus, W. Jahner

Abstract:

Biometric tools such as fingerprints and irises are widely used in industry to protect critical assets. However, their vulnerability and lack of robustness raise several worries about the protection of highly critical assets. Biometrics based on electrocardiographic (ECG) signals is a robust identification tool. However, most state-of-the-art techniques have worked on clinical signals, which are of higher quality and less noisy than signals extracted from wearable devices such as a smartwatch. In this paper, we present a complete machine learning pipeline that identifies people using ECG signals extracted from an off-person device, i.e., a wearable device that is not used in a medical context, such as a smartwatch. In addition, one of the main challenges of ECG biometrics is the variability of the ECG across different persons and situations. To address this issue, we propose two different approaches: a per-person classifier and a one-for-all classifier. The first approach builds a binary classifier to distinguish one person from all others. The second approach builds a multi-class classifier that distinguishes a selected set of individuals from non-selected individuals (others). In preliminary results, the binary classifier achieved 90% accuracy on balanced data, and the second approach reported a log loss of 0.05 as its multi-class score.
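A minimal sketch of the per-person (binary) approach, assuming scikit-learn and placeholder feature vectors standing in for real ECG heartbeat features (the paper's feature extraction and model are not specified here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

# Placeholder data: 200 heartbeats x 32 features per enrolled subject.
n_subjects, beats, dim = 5, 200, 32
X = rng.normal(size=(n_subjects * beats, dim))
subject_id = np.repeat(np.arange(n_subjects), beats)

target = 0  # train an "is this person 0?" verifier
y = (subject_id == target).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("verification accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```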

Keywords: biometrics, electrocardiographic, machine learning, signals processing

Procedia PDF Downloads 116
24148 Screening for Hit Identification against Mycobacterium abscessus

Authors: Jichan Jang

Abstract:

Mycobacterium abscessus is a rapidly growing, life-threatening mycobacterium with multiple drug-resistance mechanisms. In this study, we screened a compound library to identify molecules active against Mycobacterium abscessus using resazurin live/dead assays. In this screening assay, the Z-factor was 0.7, an indication of the statistical confidence of the assay. A cut-off of 80% growth inhibition in the screen resulted in the identification of four different compounds at a single concentration (20 μM). Dose-response curves identified three different hit candidates that generated good inhibitory curves. All hit candidates are expected to have different molecular targets. Thus, the identified compound X may be a promising candidate for the M. abscessus drug discovery pipeline.
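For context, the Z-factor quoted here is the standard screening-quality statistic Z' = 1 - 3(σ_p + σ_n) / |μ_p - μ_n|; below is a minimal computation over hypothetical positive- and negative-control fluorescence readings:

```python
import numpy as np

def z_factor(pos: np.ndarray, neg: np.ndarray) -> float:
    """Z' = 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|.

    Values above ~0.5 are conventionally taken to indicate an excellent assay.
    """
    return 1.0 - 3.0 * (pos.std(ddof=1) + neg.std(ddof=1)) / abs(pos.mean() - neg.mean())

# Hypothetical resazurin readings: growth controls vs. kill controls.
growth = np.array([980, 1010, 995, 1005, 990], dtype=float)
killed = np.array([102, 98, 95, 105, 100], dtype=float)
print(round(z_factor(growth, killed), 2))
```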

Keywords: Mycobacterium abscessus, antibiotics, drug discovery, emerging pathogen

Procedia PDF Downloads 174
24147 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in many fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, much more information can be extracted. In addition, the development and dissemination of boards such as the Arduino and Raspberry Pi have made it possible to easily test various sensors, and sensor data can be collected directly by using database application tools such as MySQL. These directly collected data can be used for various research purposes and are useful as data for data mining. However, there are many difficulties in using such a board to collect data, especially when the user is not a computer programmer or is using it for the first time. Even when data are collected, a lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.
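A minimal sketch of the analysis side such a library might wrap, assuming scikit-learn and a toy table of sensor readings (the keywords name k-means and DBSCAN; k-medoids would need an extra package such as scikit-learn-extra):

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.preprocessing import StandardScaler

# Toy readings: (temperature °C, humidity %) rows as a sensor might log them.
readings = np.array([
    [21.0, 40.2], [21.3, 41.0], [20.8, 39.5],   # normal indoor cluster
    [35.1, 20.3], [34.8, 21.1],                 # hot/dry cluster
    [22.0, 95.0],                               # outlier (e.g., sensor fault)
])
X = StandardScaler().fit_transform(readings)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.9, min_samples=2).fit_predict(X)

print("k-means:", kmeans_labels)   # every point forced into a cluster
print("DBSCAN: ", dbscan_labels)   # -1 marks noise points like the outlier
```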

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 344
24146 Ribotaxa: Combined Approaches for Taxonomic Resolution Down to the Species Level from Metagenomics Data Revealing Novelties

Authors: Oshma Chakoory, Sophie Comtet-Marre, Pierre Peyret

Abstract:

Metagenomic classifiers are widely used for the taxonomic profiling of metagenomic data and the estimation of taxa relative abundance. Small subunit rRNA genes are nowadays a gold standard for the phylogenetic resolution of complex microbial communities, although the power of this marker comes from its use at full length. We benchmarked the performance and accuracy of rRNA-specialized versus general-purpose read mappers, reference-targeted assemblers, and taxonomic classifiers. We then built a pipeline called RiboTaxa to provide a highly sensitive and specific metataxonomic approach. On metagenomics data, RiboTaxa gave the best results compared to other tools (Kraken2, Centrifuge (1), METAXA2 (2), PhyloFlash (3)), with precise taxonomic identification and relative abundance description and no false positive detections. Using real datasets from various environments (ocean, soil, human gut) and from different approaches (metagenomics and gene capture by hybridization), RiboTaxa revealed microbial novelties not seen by current bioinformatics analyses, opening new biological perspectives in human and environmental health. In a study focused on coral health involving 20 metagenomic samples (4), the affiliation of prokaryotes was limited to the family level, with Endozoicomonadaceae characterising healthy octocoral tissue. RiboTaxa highlighted two species of uncultured Endozoicomonas that were dominant in the healthy tissue. Both species belonged to a genus not yet described, opening new research perspectives on coral health. Applied to metagenomics data from a study on the human gut and extreme longevity (5), RiboTaxa detected the presence of an uncultured archaeon in semi-supercentenarians (aged 105 to 109 years), highlighting an archaeal genus not yet described, and three uncultured species belonging to the Enorma genus that could be species of interest participating in the longevity process. RiboTaxa is user-friendly and rapid, allowing microbiota structure description from any environment, and its results can be easily interpreted. The software is freely available at https://github.com/oschakoory/RiboTaxa under the GNU Affero General Public License 3.0.

Keywords: metagenomics profiling, microbial diversity, SSU rRNA genes, full-length phylogenetic marker

Procedia PDF Downloads 88
24145 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, and veracity and come from a variety of sources. Public administrations are using (big) data, implementing base registries, and enforcing data sharing across the entire government to deliver integrated (big) data-related services, provide insights to users, and support good governance. Government (big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services, such as data storage and hosting services, to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of the government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of the actors, their roles, and their relationships in the government (big) data ecosystem, and we discuss our research findings. We did not find many published research articles about the government (big) data ecosystem, including its definition and the classification of actors and their roles. Therefore, we borrowed ideas for the government (big) data ecosystem from numerous areas in the literature, including scientific research data, humanitarian data, open government data, and industry data.

Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review

Procedia PDF Downloads 128
24144 Incorporating Spatial Transcriptome Data into Ligand-Receptor Analyses to Discover Regional Activation in Cells

Authors: Eric Bang

Abstract:

Interactions between receptors and ligands are crucial for many essential biological processes, including neurotransmission and metabolism. Ligand-receptor analyses that examine cell behavior and interactions often utilize cell-type-specific RNA expression from single-cell RNA sequencing (scRNA-seq) data. Using CellPhoneDB, a public repository of ligands, receptors, and ligand-receptor interactions, cell-cell interactions were explored in a specific scRNA-seq dataset from kidney tissue, and the results were portrayed with dot plots and heat maps. Depending on the type of cell, each ligand-receptor pair was aligned with the interacting cell type, and the posterior probabilities of these associations were calculated, with corresponding P values reflecting the average expression values between the triads and their significance. Using single-cell data (sample kidney cell references), genes in the dataset were cross-referenced with those in the existing CellPhoneDB dataset. For example, a gene such as Pleiotrophin (PTN) present in the single-cell data also needed to be present in the CellPhoneDB dataset. Using the single-cell transcriptomics data via Slide-seq and the reference data, the CellPhoneDB program defines cell types and plots them in different formats, the two main ones being dot plots and heat map plots. The dot plot displays derived measures of the cell-to-cell interaction scores and p values: each row shows a ligand-receptor pair, and each column shows the two interacting cell types. CellPhoneDB derives interactions and interaction levels from gene expression levels, and since the p-value is on a -log10 scale, larger dots represent more significant interactions. By performing an interaction analysis, a significant interaction was discovered for myeloid and T-cell ligand-receptor pairs, including those between Secreted Phosphoprotein 1 (SPP1) and Fibronectin 1 (FN1), which is consistent with previous findings. It was proposed that an effective protocol would involve a filtration step in which cell types are filtered out depending on which ligand-receptor pair is activated in that part of the tissue, as well as the incorporation of the CellPhoneDB data into a streamlined workflow pipeline. The filtration step takes the form of a Python script that expedites the manual process otherwise necessary for dataset filtration; being in Python allows it to be integrated with the CellPhoneDB dataset for future workflow analysis. The manual process involves filtering cell types based on which ligand/receptor pair is activated in kidney cells. One limitation is that some pairings are activated in multiple cell types at a time, so the manual manipulation of the data is reflected prior to analysis. With the filtration script, accurate sorting is incorporated into the CellPhoneDB workflow rather than waiting until the output is produced and then applying the spatial data. It is envisioned that this will reveal where in the tissue the various ligands and receptors are interacting with different cell types, allowing easier identification of which cells are being impacted and why, for the purpose of disease treatment. The hope is that this new computational method utilizing spatially explicit ligand-receptor association data can be used to uncover previously unknown specific interactions within kidney tissue.
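A minimal sketch of the kind of filtration step described, assuming pandas and the tab-separated significant-means table that CellPhoneDB v2 writes (the cell-type names here are hypothetical):

```python
import pandas as pd

# CellPhoneDB v2 writes one column per cell-type pair, e.g. "Myeloid|Tcell",
# holding the mean expression of significant ligand-receptor pairs (NaN if
# the pair is not significant for that cell-type combination).
sig = pd.read_csv("out/significant_means.txt", sep="\t")

def pairs_for(celltype_a: str, celltype_b: str, table: pd.DataFrame) -> pd.DataFrame:
    """Keep only the ligand-receptor pairs significant for one cell-type pair."""
    col = f"{celltype_a}|{celltype_b}"
    keep = table[table[col].notna()]
    return keep[["interacting_pair", col]].sort_values(col, ascending=False)

# Filter to interactions active between myeloid cells and T cells,
# e.g. to confirm the SPP1-FN1 signal before overlaying spatial data.
print(pairs_for("Myeloid", "Tcell", sig).head())
```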

Keywords: bioinformatics, ligands, kidney tissue, receptors, spatial transcriptome

Procedia PDF Downloads 117
24143 A Lifeline Vulnerability Study of Constantine, Algeria

Authors: Mounir Ait Belkacem, Mehdi Boukri, Omar Amellal, Nacim Yousfi, Abderrahmane Kibboua, Med Naboussi Farsi, Mounir Naili

Abstract:

The north of Algeria is located in a seismic zone, so earthquakes are probably the most likely natural disaster that would lead to major lifeline disruption. The adequate operation of lifelines is vital for the economic development of regions under moderate to high seismic activity. After an earthquake, the proper operation of all vital systems is necessary, for instance, hospitals for medical attention to the wounded and highways for communication and assistance to victims. In this work, we apply knowledge of pipeline vulnerability to the water supply system, sanitary sewer (wastewater) pipelines, and telephone networks in Constantine, Algeria.

Keywords: lifeline, earthquake, vulnerability, pipelines

Procedia PDF Downloads 534
24142 Government Big Data Ecosystem: A Systematic Literature Review

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Data that are high in volume, velocity, and veracity and come from a variety of sources are generated in all sectors, including the government sector. Globally, public administrations are pursuing (big) data as a new technology and trying to adopt a data-centric architecture for hosting and sharing data. Properly executed, big data and data analytics in the government (big) data ecosystem can lead to data-driven government and have a direct impact on the way policymakers work and citizens interact with governments. In this research paper, we conduct a systematic literature review. The main aims of this paper are to highlight the essential aspects of the government (big) data ecosystem and to explore the most critical socio-technical factors that contribute to its successful implementation. The essential aspects of the government (big) data ecosystem include its definition, data types, data lifecycle models, and actors and their roles. We also discuss the potential impact of (big) data in public administration and the gaps in the government data ecosystem literature. As this is a new topic, we did not find specific articles on the government (big) data ecosystem and therefore focused our research on various relevant areas such as humanitarian data, open government data, scientific research data, and industry data.

Keywords: applications of big data, big data, big data types, big data ecosystem, critical success factors, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, literature review, public administration, systematic review

Procedia PDF Downloads 180
24141 A Machine Learning Decision Support Framework for Industrial Engineering Purposes

Authors: Anli Du Preez, James Bekker

Abstract:

Data is currently one of the most critical and influential emerging technologies. However, the true potential of data is yet to be exploited, since only about 1% of generated data is ever actually analyzed for value creation. There is a data gap in which data goes unexplored due to the lack of data analytics infrastructure and the required data analytics skills. This study developed a decision support framework for data analytics by following Jabareen's framework development methodology. The study focused on machine learning algorithms, a subset of data analytics. The developed framework is designed to assist data analysts with little experience in choosing the appropriate machine learning algorithm given the purpose of their application.

Keywords: data analytics, industrial engineering, machine learning, value creation

Procedia PDF Downloads 137
24140 Providing Security to Private Cloud Using Advanced Encryption Standard Algorithm

Authors: Annapureddy Srikant Reddy, Atthanti Mahendra, Samala Chinni Krishna, N. Neelima

Abstract:

In our present world, we generate a lot of data, and we need a specific device to store it all. Generally, we store data on pen drives, hard drives, etc. Sometimes we may lose the data due to device corruption. To overcome these issues, we implemented a cloud space for storing the data, which provides more security. The data can be accessed from anywhere in the world using just the internet. We implemented all of this in Java using the NetBeans IDE. Once a user uploads data, they have no rights to change it. Uploaded files are stored in the cloud with the file name set to the system time, and the directory is created with random words. The cloud accepts the data only if the file size is less than 2 MB.
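A minimal sketch of AES encryption for an uploaded file, assuming Python's cryptography package rather than the authors' Java/NetBeans implementation:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_file(path: str, key: bytes) -> bytes:
    """Encrypt a file's bytes with AES-256-GCM; prepend the random nonce."""
    with open(path, "rb") as f:
        plaintext = f.read()
    nonce = os.urandom(12)                     # unique per encryption
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return nonce + ciphertext

def decrypt_blob(blob: bytes, key: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

# Toy upload to make the example self-contained.
open("upload.bin", "wb").write(b"example payload")

key = AESGCM.generate_key(bit_length=256)      # store this key server-side
blob = encrypt_file("upload.bin", key)         # what the cloud actually stores
assert decrypt_blob(blob, key) == open("upload.bin", "rb").read()
```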

Keywords: cloud space, AES, FTP, NetBeans IDE

Procedia PDF Downloads 174