Search results for: data databases
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24902

24662 Private Coded Computation of Matrix Multiplication

Authors: Malihe Aliasgari, Yousef Nejatbakhsh

Abstract:

The era of Big Data and the immensity of real-life datasets compel computation tasks to be performed in a distributed fashion, where the data is dispersed among many servers that operate in parallel. However, massive parallelization leads to computational bottlenecks due to faulty servers and stragglers. Stragglers refer to a few slow or delay-prone processors that can bottleneck the entire computation because one has to wait for all the parallel nodes to finish. The problem of straggling processors has been well studied in the context of distributed computing. Recently, it has been pointed out that, for the important case of linear functions, it is possible to improve over repetition strategies in terms of the tradeoff between performance and latency by carrying out linear precoding of the data prior to processing. The key idea is that, by employing suitable linear codes operating over fractions of the original data, a function may be completed as soon as a sufficient number of processors, determined by the minimum distance of the code, have completed their operations. Matrix-matrix multiplication over practically large datasets faces computational and memory-related difficulties, which is why such operations are carried out on distributed computing platforms. In this work, we study the problem of distributed matrix-matrix multiplication W = XY under storage constraints, i.e., when each server is allowed to store a fixed fraction of each of the matrices X and Y; this operation is a fundamental building block of many science and engineering fields such as machine learning, image and signal processing, wireless communication, and optimization. Both non-secure and secure matrix multiplication are studied. We consider the setup in which the identity of the matrix of interest should be kept private from the workers, and we obtain the recovery threshold of the colluding model, that is, the number of workers that need to complete their task before the master server can recover the product W. We also study the problem of secure and private distributed matrix multiplication W = XY in which the matrix X is confidential, while the matrix Y is selected privately from a library of public matrices. We present the best currently known trade-off between communication load and recovery threshold. In other words, we design an achievable PSGPD scheme for any arbitrary privacy level by trivially concatenating a robust PIR scheme for arbitrary colluding workers and private databases with the proposed SGPD code, which provides a smaller computational complexity at the workers.
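A minimal numerical sketch of the straggler-tolerance idea is given below, assuming an MDS-style (Vandermonde) encoding over row blocks of X only; the block counts, the encoder, and the set of finished workers are illustrative assumptions, not the PSGPD construction described above.

```python
import numpy as np

# Straggler-resilient coded matrix multiplication sketch:
# X is split into k row blocks, encoded into n > k coded blocks with a
# Vandermonde generator; any k finished workers suffice to recover W = XY,
# so the recovery threshold here is k.

rng = np.random.default_rng(0)
k, n = 4, 6                      # k data blocks, n workers (tolerates n - k stragglers)
X = rng.standard_normal((8, 5))  # 8 rows -> 4 blocks of 2 rows each
Y = rng.standard_normal((5, 3))

blocks = np.split(X, k, axis=0)                                        # k row blocks of X
G = np.vander(np.arange(1, n + 1), k, increasing=True).astype(float)   # n x k encoder

# Each worker i multiplies its coded block of X by Y.
coded_results = [sum(G[i, j] * blocks[j] for j in range(k)) @ Y for i in range(n)]

# Suppose only workers {0, 2, 3, 5} finish (exactly k = 4 of them).
done = [0, 2, 3, 5]
A = G[done, :]                                          # k x k Vandermonde submatrix, invertible
stacked = np.stack([coded_results[i] for i in done])    # k coded partial products
decoded = np.einsum('ij,jkl->ikl', np.linalg.inv(A), stacked)
W_hat = np.vstack(list(decoded))

assert np.allclose(W_hat, X @ Y)
```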

Keywords: coded distributed computation, private information retrieval, secret sharing, stragglers

Procedia PDF Downloads 107
24661 Prevalence of Depression among Post Stroke Survivors in South Asian Region: A Systematic Review and Meta-Analysis

Authors: Roseminu Varghese, Laveena Anitha Barboza, Jyothi Chakrabarty, Ravishankar

Abstract:

Depression among post-stroke survivors is prevalent but often goes unidentified. The purpose of this review was to determine the pooled prevalence of depression among post-stroke survivors in the South Asian region from all published health sciences research articles. The review also aimed to analyze the disparities in the prevalence of depression among post-stroke survivors from different study locations. A search to identify relevant research articles published from 2005 to 2016 was performed using MeSH terms and keywords in the Web of Science, PubMed/Medline, CINAHL, Scopus, J-Gate, and IndMED databases. The final analysis comprised 9 studies, including a population of 1,520 men and women. Meta-analysis was performed in STATA version 13.0. The overall pooled post-stroke depression prevalence was 0.46 (95% CI: 0.30-0.62). The pooled rate in this systematic review is evidence of depression among post-stroke survivors in the South Asian region. Identifying post-stroke depression at an early stage is important for early intervention and to improve the outcomes of the rehabilitative process for stroke survivors.
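As an illustration of the pooling step, the sketch below applies a DerSimonian-Laird random-effects model to raw proportions; the per-study counts are hypothetical placeholders, not the nine studies analyzed in the review.

```python
import numpy as np

# Hypothetical per-study (events, sample size) pairs, used only to
# illustrate DerSimonian-Laird random-effects pooling of raw proportions.
events = np.array([40, 55, 30, 70, 25])
n      = np.array([100, 120, 90, 150, 60])

p = events / n
v = p * (1 - p) / n                # within-study variance of a raw proportion
w = 1 / v
p_fixed = np.sum(w * p) / np.sum(w)
Q = np.sum(w * (p - p_fixed) ** 2)
k = len(p)
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))

w_star = 1 / (v + tau2)            # random-effects weights
p_pooled = np.sum(w_star * p) / np.sum(w_star)
se = np.sqrt(1 / np.sum(w_star))
print(f"pooled prevalence = {p_pooled:.2f} "
      f"(95% CI {p_pooled - 1.96 * se:.2f} to {p_pooled + 1.96 * se:.2f})")
```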

Keywords: depression, post stroke survivors, prevalence, systematic review

Procedia PDF Downloads 145
24660 A Landscape of Research Data Repositories in Re3data.org Registry: A Case Study of Indian Repositories

Authors: Prashant Shrivastava

Abstract:

The purpose of this study is to explore the re3data.org registry to identify the research data repository registration workflow. A further objective is to depict the present development of research data repositories in India. The study first takes an approach to understanding the re3data.org registry framework and schema design and then proceeds to explore the status of Indian research data repositories in the re3data.org registry. Research data repositories are gaining wider relevance due to e-research concepts. The re3data.org registry is now a good tool for users and researchers to identify appropriate research data repositories as per their research requirements. In the Indian environment, a compatible National Research Data Policy is needed to boost the management of research data. A registry for research data repositories is a crucial tool to discover specific information in a specific domain. Moreover, research data repositories in India have not been studied before. Both the re3data.org registry and the status of Indian research data repositories are discussed in this study.

Keywords: research data, research data repositories, research data registry, re3data.org

Procedia PDF Downloads 310
24659 A Study of Cloud Computing Solution for Transportation Big Data Processing

Authors: Ilgin Gökaşar, Saman Ghaffarian

Abstract:

The need for fast processing of big data from transportation ridership (e.g., smart card data) and traffic operations (e.g., traffic detector data), which requires a great deal of computational power, is incontrovertible in Intelligent Transportation Systems. Cloud computing is nowadays an important and popular information technology solution for data processing. It enables users to process enormous amounts of data without having their own computing infrastructure. Thus, it can be a good choice for transportation big data processing as well. This paper examines how cloud computing can enhance transportation big data processing by contrasting its advantages and disadvantages and discussing its features.

Keywords: big data, cloud computing, Intelligent Transportation Systems, ITS, traffic data processing

Procedia PDF Downloads 448
24658 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data have an increasingly important role in economic growth, innovation, technical advantage and business strategy, and even in competition between countries. Analyzing patent data is crucial since patents cover a large part of the world's technological information. In this paper, we have used the linguistic summarization technique to verify the validity of hypotheses related to patent data stated in the literature.
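For intuition, the sketch below computes the degree of truth of a single Yager-style linguistic summary over fuzzy sets; the membership functions and the sample attribute values are illustrative assumptions, not the summaries evaluated in the paper.

```python
import numpy as np

# Degree of truth of "Most patents have a large family size"
# using the classic quantifier-based fuzzy summarization calculus.

def mu_large(x):                          # fuzzy set "large family size"
    return np.clip((x - 2) / 8, 0, 1)     # 0 below 2, 1 at 10 or more

def mu_most(r):                           # fuzzy quantifier "most"
    return np.clip((r - 0.3) / 0.5, 0, 1) # 0 below 30%, 1 above 80%

family_sizes = np.array([1, 3, 4, 7, 9, 12, 2, 6])   # hypothetical patent attribute
r = mu_large(family_sizes).mean()         # proportion of objects satisfying S
truth = mu_most(r)
print(f"T('Most patents have a large family size') = {truth:.2f}")
```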

Keywords: data mining, fuzzy sets, linguistic summarization, patent data

Procedia PDF Downloads 256
24657 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In our paper, we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Considering the numerous obstacles, e.g., forests, hills, and urban areas, data collection is realized in several ways. Data are collected via wireless communication, the LAN network, and the GSM network, and in certain areas data are collected using vehicles. In order to ensure the connection to the server, most of the probes have the ability to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that allow the probes to select a suitable communication channel.
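A minimal sketch of such a selection rule, assuming a fixed preference order and a placeholder link check (not the authors' algorithm), is given below.

```python
# Each probe tries its supported channels in order of preference and falls
# back when a link check fails; if nothing is reachable, data wait for
# vehicle pickup. Channel names and the health-check stub are assumptions.

PREFERRED_ORDER = ["wireless", "lan", "gsm", "vehicle_pickup"]

def channel_available(probe_id: str, channel: str) -> bool:
    """Placeholder link check; a real probe would ping the server here."""
    return channel == "gsm"          # pretend only GSM is reachable right now

def select_channel(probe_id: str, supported: list[str]) -> str:
    for channel in PREFERRED_ORDER:
        if channel in supported and channel_available(probe_id, channel):
            return channel
    return "vehicle_pickup"          # data are stored until a vehicle collects them

print(select_channel("probe-17", ["wireless", "gsm", "vehicle_pickup"]))  # -> gsm
```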

Keywords: communication, computer network, data collection, probe

Procedia PDF Downloads 349
24656 New Approach for Constructing a Secure Biometric Database

Authors: A. Kebbeb, M. Mostefai, F. Benmerzoug, Y. Chahir

Abstract:

Multimodal biometric identification is the combination of several biometric systems. The challenge of this combination is to reduce some limitations of systems based on a single modality while significantly improving performance. In this paper, we propose a new approach to the construction and protection of a multimodal biometric database dedicated to an identification system. We use topological watermarking to hide the relation between the face image and the registered descriptors extracted from the other modalities of the same person, for more secure user identification.

Keywords: biometric databases, multimodal biometrics, security authentication, digital watermarking

Procedia PDF Downloads 375
24655 A Review on Big Data Movement with Different Approaches

Authors: Nay Myo Sandar

Abstract:

With the growth of technologies and applications, a large amount of data is being produced at an increasing rate from various sources such as social media networks, sensor devices, and other information-serving devices. This massive, complex, and exponentially growing collection of datasets is called big data. Traditional database systems cannot store and process such data because of its volume and complexity. Consequently, cloud computing is a potential solution for data storage and processing since it can provide a pool of resources for servers and storage. However, moving large amounts of data to and from the cloud is a challenging issue, since it can encounter high latency due to the data size. With respect to the big data movement problem, this paper reviews previous work, discusses research issues, and identifies approaches for dealing with the problem.

Keywords: big data, cloud computing, big data movement, network techniques

Procedia PDF Downloads 69
24654 The Impact of the New Head Injury Pathway on the Number of CTs Performed in a Paediatric Population

Authors: Amel M. A. Osman, Roy Mahony, Lisa Dann, McKenna S.

Abstract:

Background: Computed Tomography (CT) is a significant source of radiation in the pediatric population. A new head injury (HI) pathway was introduced in 2021, which changed the previous process, under which HI patients were jointly admitted under general pediatrics and surgery, so that these patients are now admitted under the Emergency Medicine team. Admitted patients included those with positive CT findings not requiring immediate neurosurgical intervention and those who did not meet the current criteria for an urgent CT brain as per NICE guidelines but were still symptomatic and required prolonged observation. This approach aims to decrease the number of CT scans performed. The main aim is to assess the variation in CT scanning rates since the change in the admitting process. A retrospective review was conducted of patients presenting to CHI PECU with HI over a 6-month period (01/01/19-31/05/19) compared to a 6-month period after the introduction of the new pathway (01/06/2022-31/12/2022). Data were collected from the electronic record databases, Symphony and PACS. Results: In 2019, there were 869 presentations of HI, among which 32 (3.68%) had CT scans performed. Two (6.25%) of those scanned had positive findings. In 2022, there were 1122 HI presentations, with 47 (4.19%) CT scans performed and positive findings in 5 (10.6%) cases. Fifty-seven patients were admitted under the new pathway for observation, with 1 having a CT scan following admission. Conclusion: Quantitative lifetime radiation risks for children are not negligible. While there was no statistically significant reduction in CTs performed among HIs presenting to our department, a significant group met the criteria for admission under the PECU consultant for prolonged monitoring. There was also a greater proportion of abnormalities on the CT scans performed in 2022, demonstrating improved patient selection for imaging. Further data analysis is ongoing to determine whether those who were admitted would previously have been scanned under the old pathway.
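As a quick illustration of the comparison (not an analysis taken from the paper), a two-proportion z-test on the reported scan rates can be run as follows; the choice of test is an assumption.

```python
from statsmodels.stats.proportion import proportions_ztest

# Were CT-scan rates different? 32/869 scans in 2019 vs. 47/1122 scans in 2022.
count = [32, 47]
nobs = [869, 1122]
stat, pvalue = proportions_ztest(count, nobs)
print(f"z = {stat:.2f}, p = {pvalue:.3f}")   # p > 0.05, consistent with "no significant change"
```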

Keywords: head injury, CT, admission, guideline

Procedia PDF Downloads 38
24652 DeepNIC: A Method to Transform Each Tabular Variable into an Independent Image Analyzable by Basic CNNs

Authors: Nguyen J. M., Lucas G., Ruan S., Digonnet H., Antonioli D.

Abstract:

Introduction: Deep Learning (DL) is a very powerful tool for analyzing image data. But for tabular data, it cannot compete with machine learning methods like XGBoost. The research question becomes: can tabular data be transformed into images that can be analyzed by simple CNNs (Convolutional Neural Networks)? Will DL be the absolute tool for data classification? All current solutions consist in repositioning the variables in a 2D matrix using their correlation proximity. In doing so, one obtains an image whose pixels are the variables. We implement a technology, DeepNIC, that offers the possibility of obtaining an image for each variable, which can be analyzed by simple CNNs. Material and method: The 'ROP' (Regression OPtimized) model is a binary and atypical decision tree whose nodes are managed by a new artificial neuron, the Neurop. By positioning an artificial neuron in each node of the decision trees, it is possible to make an adjustment on a theoretically infinite number of variables at each node. From this new decision tree whose nodes are artificial neurons, we created the concept of a 'Random Forest of Perfect Trees' (RFPT), which departs from Breiman's concepts by assembling very large numbers of small trees with no classification errors. From the results of the RFPT, we developed a family of 10 statistical information criteria, the Nguyen Information Criteria (NICs), which evaluate the predictive quality of a variable in 3 dimensions: performance, complexity, and multiplicity of solutions. A NIC is a probability that can be transformed into a grey level. The value of a NIC depends essentially on 2 super parameters used in the Neurops. By varying these 2 super parameters, we obtain a 2D matrix of probabilities for each NIC. We can combine these 10 NICs with the functions AND, OR, and XOR. The total number of combinations is greater than 100,000. In total, we obtain for each variable an image of at least 1166x1167 pixels. The intensity of the pixels is proportional to the probability of the associated NIC, and the color depends on the associated NIC. This image actually contains considerable information about the ability of the variable to predict Y, depending on the presence or absence of other variables. A basic CNN model was trained for supervised classification. Results: The first results are impressive. Using the public GSE22513 data (an omic dataset of markers of taxane sensitivity in breast cancer), DeepNIC outperformed other statistical methods, including XGBoost. We still need to generalize the comparison over several databases. Conclusion: The ability to transform any tabular variable into an image offers the possibility of merging image and tabular information in the same format. This opens up great perspectives in the analysis of metadata.
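The sketch below illustrates only the final mapping step described above, turning a per-variable grid of probabilities into a grey-level image; the grid is random placeholder data and the dimensions are assumptions, not the authors' NIC computation.

```python
import numpy as np

# A per-variable matrix of probabilities, obtained by sweeping two
# "super parameters", is mapped to a grey-level image that a basic CNN
# can consume. The probability grid here is random placeholder data.

rng = np.random.default_rng(42)
n_settings = 64                                           # grid of the two super parameters
prob_grid = rng.uniform(size=(n_settings, n_settings))    # one NIC value per setting pair

# Several NICs could be stacked as channels or tiled side by side; here we tile two.
image = np.hstack([prob_grid, 1.0 - prob_grid])

# Grey level: probability 0 -> black (0), probability 1 -> white (255).
grey = (image * 255).astype(np.uint8)
print(grey.shape, grey.dtype)             # (64, 128) uint8, ready for a small CNN
```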

Keywords: tabular data, CNNs, NICs, DeepNICs, random forest of perfect trees, classification

Procedia PDF Downloads 101
24652 Prevalence of Visual Impairment among School Children in Ethiopia: A Systematic Review and Meta-Analysis

Authors: Merkineh Markos Lorato, Gedefaw Diress Alene

Abstract:

Introduction: Visual impairment is any condition of the eye or visual system that results in loss or reduction of visual functioning. It significantly influences the academic routine and social activities of children, and the effect is severe for low-income countries like Ethiopia. So, this study aimed to determine the pooled prevalence of visual impairment among school children in Ethiopia. Methods: Databases such as Medical Literature Analysis and Retrieval System Online, Excerpta Medica dataBASE, Web of Science, and the Cochrane Library were searched to retrieve eligible articles. In addition, Google Scholar and the reference lists of the retrieved eligible articles were searched. Studies that reported the prevalence of visual impairment were included to estimate the pooled prevalence. Data were extracted using a standardized data extraction format prepared in Microsoft Excel, and analysis was performed using STATA 11 statistical software. I² was used to assess heterogeneity. Because of considerable heterogeneity, a random-effects meta-analysis model was used to estimate the pooled prevalence of visual impairment among school children in Ethiopia. Results: The result of 9 eligible studies showed that the pooled prevalence of visual impairment among school children in Ethiopia was 7.01% (95% CI: 5.46, 8.56%). In the subgroup analysis, the highest prevalence was reported in the South Nations Nationalities and Tigray regions together (7.99%; 3.63, 12.35), while the lowest prevalence was reported in Addis Ababa (5.73%; 3.93, 7.53). Conclusion: The prevalence of visual impairment among school children is significantly high in Ethiopia. If it is not detected and addressed early, it will pose a lifetime threat to visually impaired school children, so planning and implementing a school vision screening program may improve the quality of life of future generations in Ethiopia.

Keywords: visual impairment, school children, Ethiopia, prevalence

Procedia PDF Downloads 23
24651 Decision-Tree-Based Foot Disorders Classification Using Demographic Variables

Authors: Adel Khorramrouz, Monireh Ahmadi Bani, Ehsan Norouzi

Abstract:

Background: Due to the essential role of the foot in movement, foot disorders (FDs) have significant impacts on activity and quality of life. Many studies have confirmed the association between FDs and demographic characteristics. On the other hand, recent advances in data collection and statistical analysis have led to an increase in the volume of databases. Analysis of patient data through a decision tree can be used to explore the relationship between demographic characteristics and FDs. Significance of the study: This study aimed to investigate the relationship between demographic characteristics and common FDs. The second purpose was to better inform foot interventions by classifying FDs based on demographic variables. Methodologies: We analyzed 2323 subjects with pes planus (PP), pes cavus (PC), hallux valgus (HV) and plantar fasciitis (PF) who were referred to a foot therapy clinic between 2015 and 2021. Subjects had to fulfill the following inclusion criteria: (1) weight between 14 and 150 kilograms, (2) height between 30 and 220 centimeters, (3) age between 3 and 100 years old, and (4) BMI between 12 and 35. Medical archives of the 2323 subjects were recorded retrospectively, and all subjects were examined by an experienced physician. Age and BMI were classified into five and four groups, respectively. 80% of the data were randomly selected as training data and 20% were used for testing. We built a decision tree model to classify FDs using demographic characteristics. Findings: Results demonstrated that 981 of the 2323 subjects (41.9%) referred to the clinic with FDs were diagnosed with PP, 657 (28.2%) with PC, 628 (27%) with HV, and 213 (9%) with PF. The results revealed that the prevalence of PP decreased in people over 18 years of age and in children over 7 years. In adults, the prevalence depends first on BMI and then on gender. About 10% of adults and 81% of children with low BMI have PP. There is no relationship between gender and PP. PC is more dependent on age and gender. In children under 7 years, the prevalence was twice as high in girls (10%) as in boys (5%), and in adults over 18 years it was slightly higher in men (62% vs. 57%). HV increased with age in women and decreased in men. Aging and obesity increased the prevalence of PF. We conclude that the accuracy of our approach is sufficient for most research applications in FDs. Conclusion: The increased prevalence of PP in children is probably due to the formation of the arch of the foot at this age. Increased BMI, by applying high pressure on the foot, can increase the prevalence of this disorder. For PC, the shift in prevalence from women to men with age may be due to genetics and the innate susceptibility of men to this disorder. HV is more common in adult women, which may be due to environmental reasons such as shoes, and the prevalence of PF in obese adult women may also be due to higher foot pressure and housekeeping activities.
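A minimal sketch of the modelling step (an 80/20 split and a decision tree over demographic features) is given below; the synthetic rows and the feature encoding are placeholders, not the clinic records used in the study.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic demographic features and foot-disorder labels for illustration.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([
    rng.integers(3, 100, n),        # age (years)
    rng.uniform(12, 35, n),         # BMI
    rng.integers(0, 2, n),          # gender (assumed encoding: 0 = female, 1 = male)
])
y = rng.choice(["PP", "PC", "HV", "PF"], size=n)   # foot disorder label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
tree = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X_train, y_train)
print(f"test accuracy: {tree.score(X_test, y_test):.2f}")
```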

Keywords: decision tree, demographic characteristics, foot disorders, machine learning

Procedia PDF Downloads 249
24650 A Systematic Review of Prevalence, Gender and Age Differences in Cyberbullying Studies in Croatia

Authors: Stjepka Popović, Lucija Vejmelka

Abstract:

Background: Cyberbullying has become a prevalent issue worldwide, including in Croatia. However, a comprehensive understanding of the extent and nature of cyberbullying in the Croatian context is lacking. Objective: The objective of this systematic review is to evaluate the quality of current research conducted in Croatia on the subject of cyberbullying, identify any gaps in the research, and provide suggestions for future investigations. It examines the prevalence of cyberbullying in Croatia as well as gender and age differences. Participants and Setting: The research was done on secondary data sources (published studies) of cyberbullying in Croatia. The participants in the studies included in the systematic review were children and youth of all ages residing in Croatia who have been involved in cyberbullying incidents. The setting includes various environments where cyberbullying may occur, such as social media platforms and educational institutions. Methods: To identify pertinent studies on cyberbullying in Croatia, a comprehensive exploration of both international and domestic electronic databases was systematically undertaken. Relevant studies were chosen according to predefined inclusion and exclusion criteria. Key findings from the selected studies were extracted and synthesized, enabling the identification of patterns in the data. Results: A total of 43 studies that fulfilled the inclusion criteria were identified in the review. The prevalence of cyberbullying victimization in Croatia ranged from 7% to 55.3%, with adolescents being the most affected group. The prevalence of cyberbullying perpetration ranged from 3.2% to 30.3%. The most prevalent forms of cyberbullying included gossiping about and mocking others. Gender and age differences are highlighted. Conclusions: The outcomes of this systematic review highlight the pressing need for targeted interventions and preventative measures to address cyberbullying in Croatia. Additionally, it is crucial to conduct further research to investigate the long-term impacts and potential factors that can help mitigate cyberbullying in the context of Croatia.

Keywords: cyberbullying, online risky behavior, Croatia, systematic review

Procedia PDF Downloads 69
24649 Scalable and Accurate Detection of Pathogens from Whole-Genome Shotgun Sequencing

Authors: Janos Juhasz, Sandor Pongor, Balazs Ligeti

Abstract:

Next-generation sequencing, especially whole genome shotgun sequencing, is becoming a common approach to gain insight into microbiomes in a culture-independent way, even in clinical practice. It does not only give us information about the species composition of an environmental sample but also opens the possibility of detecting antimicrobial resistance and novel, or currently unknown, pathogens. Accurately and reliably detecting the microbial strains is a challenging task. Here we present a sensitive approach for detecting pathogens in metagenomic samples, with special regard to detecting novel variants of known pathogens. We have developed a pipeline that uses fast, short-read aligner programs (i.e., Bowtie2/BWA) and comprehensive nucleotide databases. Taxonomic binning is based on the lowest common ancestor (LCA) principle; each read is assigned to a taxon covering the most significantly hit taxa. This approach helps in balancing between sensitivity and running time. The program was tested both on experimental and synthetic data. The results indicate that our method performs as well as the state-of-the-art BLAST-based ones; furthermore, in some cases, it even proves to be better, while running two orders of magnitude faster. It is sensitive and capable of identifying taxa present only in small abundance. Moreover, it needs two orders of magnitude fewer reads to complete the identification than MetaPhlAn2 does. We analyzed an experimental anthrax dataset (B. anthracis strain BA104). The majority of the reads (96.50%) were classified as Bacillus anthracis; a small portion, 1.2%, was classified as other species from the Bacillus genus. We demonstrate that the evaluation of high-throughput sequencing data is feasible in a reasonable time with good classification accuracy.
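A toy illustration of LCA binning, with an assumed child-to-parent taxonomy rather than the pipeline's reference database, looks like this:

```python
# A read that hits several taxa is assigned to the deepest taxon covering
# all of its significant hits (the lowest common ancestor).

PARENT = {
    "B. anthracis": "Bacillus",
    "B. cereus": "Bacillus",
    "Bacillus": "Bacillaceae",
    "Bacillaceae": "Bacteria",
    "Bacteria": "root",
}

def lineage(taxon):
    path = [taxon]
    while taxon in PARENT:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path                      # taxon ... root

def lca(hits):
    paths = [lineage(h) for h in hits]
    common = set(paths[0]).intersection(*paths[1:])
    # the LCA is the first common node when walking up from any hit
    return next(node for node in paths[0] if node in common)

print(lca(["B. anthracis"]))                  # -> B. anthracis
print(lca(["B. anthracis", "B. cereus"]))     # -> Bacillus
```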

Keywords: metagenomics, taxonomy binning, pathogens, microbiome, B. anthracis

Procedia PDF Downloads 119
24648 Furniture Embodied Carbon Calculator for Interior Design Projects

Authors: Javkhlan Nyamjav, Simona Fischer, Lauren Garner, Veronica McCracken

Abstract:

Current whole building life cycle assessments (LCAs) primarily focus on structural and major architectural elements to measure building embodied carbon. Most interior finishes and fixtures are available in digital tools (such as Tally); however, furniture is still left unaccounted for. Due to repeated refreshes and its complexity, furniture embodied carbon can accumulate over time, becoming comparable to structure and envelope numbers. This paper presents a method to calculate the Global Warming Potential (GWP) of furniture elements in commercial buildings. The calculator uses the quantity takeoff method with GWP averages gathered from environmental product declarations (EPDs). The data were collected from EPD databases and furniture manufacturers from North America to Europe. A total of 48 GWP numbers were collected, with 16 coming from alternative EPDs. The finalized calculator shows the average GWP of typical commercial furniture and supports decision-making to reduce embodied carbon. The calculator was tested on MSR Design projects and showed that furniture can account for more than half of the interior embodied carbon. The calculator highlights the importance of adding furniture to the overall conversation. However, the data collection process showed that a) acquiring furniture EPDs is not as straightforward as for other building materials; b) there are very few furniture EPDs, which can be explained from many perspectives, including the EPD price; and c) the EPDs themselves vary in terms of units, LCA scopes, and timeframes, which makes it hard to compare the products. Even though there are current limitations, the emerging focus on interior embodied carbon will create more demand for furniture EPDs. It will allow manufacturers to demonstrate their efforts to reduce embodied carbon. In addition, the study concludes with recommendations on how designers can reduce furniture embodied carbon through reuse and closed-loop systems.
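The core of the quantity-takeoff approach can be sketched as below; the per-unit GWP averages and quantities are placeholder numbers for illustration, not the values gathered from the 48 EPDs.

```python
# Multiply the count of each furniture type by an assumed average GWP per unit.

AVG_GWP_KGCO2E = {          # per unit, hypothetical values
    "task chair": 70.0,
    "workstation desk": 120.0,
    "filing cabinet": 90.0,
}

takeoff = {"task chair": 150, "workstation desk": 120, "filing cabinet": 40}

total = sum(qty * AVG_GWP_KGCO2E[item] for item, qty in takeoff.items())
print(f"furniture embodied carbon ~ {total:,.0f} kgCO2e")
```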

Keywords: furniture, embodied carbon, calculator, tenant improvement, interior design

Procedia PDF Downloads 191
24647 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands

Authors: Julio Albuja, David Zaldumbide

Abstract:

Data analysis is an important step before making decisions about money. The aim of this work is to analyze, through data mining algorithms, the factors that influence the final price of houses. To the best of our knowledge, previous work was examined only to compare results. Furthermore, before using the dataset, the Z-transformation was applied to standardize the data to the same range. The data were then classified into two groups to visualize them in a readable format. A decision tree was built, and the data are displayed graphically, where the results and the factors' influence are easy to see. The definitions of these methods are described, as well as descriptions of the results. Finally, conclusions and recommendations related to the results of our research are presented, making it easier to apply these algorithms using a customized dataset.
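A minimal sketch of the Z-transformation (standardization) step is shown below; the sample attribute matrix is placeholder data, not the authors' housing dataset.

```python
import numpy as np

# Each attribute is standardized to zero mean and unit variance so that all
# factors share the same range before the decision tree is built.
X = np.array([[120.0, 3, 1995],
              [ 85.0, 2, 2005],
              [200.0, 4, 1980]])     # e.g. area (m^2), rooms, year built

X_z = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_z.round(2))                  # every column now has mean 0 and std 1
```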

Keywords: algorithms, data, decision tree, transformation

Procedia PDF Downloads 361
24646 Creative Mapping Landuse and Human Activities: From the Inventories of Factories to the History of the City and Citizens

Authors: R. Tamborrino, F. Rinaudo

Abstract:

Digital technologies offer possibilities to effectively convert historical archives into instruments of knowledge able to provide a guide for the interpretation of historical phenomena. Digital conversion and management of those documents make it possible to add other sources in a unique and coherent model that permits the intersection of different data, opening new interpretations and understandings. Urban history uses, among other sources, the inventories that register human activities in a specific space (e.g. cadastres, censuses, etc.). The geographic localisation of that information inside cartographic supports allows for the comprehension and visualisation of specific relationships between different historical realities, registering both the urban space and the people living there. These links, which merge data and documentation of different natures through a new organisation of the information, can suggest new interpretations of other related events. In all these kinds of analysis, the use of GIS platforms today represents the most appropriate answer. The design of the related databases is the key to realising an ad-hoc instrument that facilitates the analysis and intersection of data of different origins. Moreover, GIS has become the digital platform where it is possible to add other kinds of data visualisation. This research deals with the industrial development of Turin at the beginning of the 20th century. A census of factories realized just prior to WWI provides the opportunity to test the potentialities of GIS platforms for the analysis of urban landscape modifications during the first industrial development of the town. The inventory includes data about location, activities, and people. The GIS is shaped in a creative way, linking different sources and digital systems, aiming to create a new type of platform conceived as an interface integrating different kinds of data visualisation. The data processing allows linking this information to an urban space, and also visualising the growth of the city at that time. The sources related to the urban landscape development in that period are of different natures. The emerging necessity to build, enlarge, modify and join different buildings to boost the industrial activities, according to their fast development, is recorded by different official permissions delivered by the municipality and now stored in the Historical Archive of the Municipality of Turin. Those documents, which are reports and drawings, contain numerous data on the buildings themselves, including the block where the plot is located, the district, and the people involved, such as the owner, the investor, and the engineer or architect designing the industrial building. All these collected data first offer the possibility to rebuild the process of change of the urban landscape by using GIS and 3D modelling technologies, thanks to access to the drawings (2D plans, sections and elevations) that show the previous and planned situations. Furthermore, they give access to information for different queries of the linked dataset that could be useful for different research targets, such as economic, biographical, architectural, or demographic studies. By superimposing a layer of the present city, the past meets the present-day industrial heritage, and people meet urban history.

Keywords: digital urban history, census, digitalisation, GIS, modelling, digital humanities

Procedia PDF Downloads 179
24645 Application of Blockchain Technology in Geological Field

Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu

Abstract:

Management and application of geological big data is an important part of China's national big data strategy. With the implementation of the national big data strategy, geological big data management becomes more and more critical. At present, there are still many technological barriers as well as conceptual confusion in many aspects of geological big data management and application, such as data sharing, intellectual property protection, and application technology. Therefore, it is a key task to make better use of new technologies for deeper exploration and wider application of geological big data. In this paper, we briefly introduce the basic principle of blockchain technology and then analyze the application dilemmas of geological data. Based on this analysis, we bring forward some feasible patterns and scenarios for blockchain application in geological big data and put forward several suggestions for future work in geological big data management.

Keywords: blockchain, intellectual property protection, geological data, big data management

Procedia PDF Downloads 70
24644 Clinical Validation of an Automated Natural Language Processing Algorithm for Finding COVID-19 Symptoms and Complications in Patient Notes

Authors: Karolina Wieczorek, Sophie Wiliams

Abstract:

Introduction: Patient data is often collected in Electronic Health Record (EHR) systems for purposes such as providing care as well as reporting data. This information can be re-used to validate data models in clinical trials or in epidemiological studies. Manual validation of automated tools is vital to pick up errors in processing and to provide confidence in the output. Mentioning a disease in a discharge letter does not necessarily mean that a patient suffers from this disease; many letters discuss a diagnostic process, different tests, or whether a patient has a certain disease. The COVID-19 dataset in this study used natural language processing (NLP), an automated algorithm which extracts information related to COVID-19 symptoms, complications, and medications prescribed within the hospital. Free-text clinical patient notes are rich sources of information which contain patient data not captured in a structured form, hence the use of named entity recognition (NER) to capture additional information. Methods: Patient data (discharge summary letters) were exported and screened by an algorithm to pick up relevant terms related to COVID-19. A list of 124 Systematized Nomenclature of Medicine (SNOMED) Clinical Terms was provided in Excel with corresponding IDs. Two independent medical student researchers were provided with a dictionary of the SNOMED terms to refer to when screening the notes. They worked on two separate datasets called "A" and "B", respectively. Notes were screened to check whether the correct term had been picked up by the algorithm and to ensure that negated terms were not picked up. Results: Implementation in the hospital began on March 31, 2020, and the first EHR-derived extract was generated for use in an audit study on June 04, 2020. The dataset has contributed to large, priority clinical trials (including the International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC), by bulk upload to REDCap research databases) and to local research and audit studies. Successful sharing of EHR-extracted datasets requires communicating the provenance and quality, including completeness and accuracy, of the data. The results of the validation of the algorithm were the following: precision (0.907), recall (0.416), and F-score (0.570). Percentage enhancement with NLP-extracted terms compared to regular data extraction alone was low (0.3%) for relatively well-documented data such as previous medical history, but higher (16.6%, 29.53%, 30.3%, 45.1%) for complications, presenting illness, chronic procedures, and acute procedures, respectively. Conclusions: This automated NLP algorithm is shown to be useful in facilitating patient data analysis and has the potential to be used in more large-scale clinical trials to assess potential study exclusion criteria for participants in the development of vaccines.
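The reported F-score follows directly from the reported precision and recall via the standard harmonic-mean formula:

```python
# Reproducing the reported F-score from the reported precision and recall.
precision, recall = 0.907, 0.416
f_score = 2 * precision * recall / (precision + recall)
print(f"F-score = {f_score:.3f}")    # ~0.570, matching the abstract
```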

Keywords: automated, algorithm, NLP, COVID-19

Procedia PDF Downloads 86
24643 Frequent Item Set Mining for Big Data Using MapReduce Framework

Authors: Tamanna Jethava, Rahul Joshi

Abstract:

Frequent itemsets play an essential role in many data mining tasks that try to find interesting patterns in databases. Typically, a frequent itemset refers to a set of items that frequently appear together in a transaction dataset. There are several mining algorithms used for frequent itemset mining, yet most do not scale to the type of data we are presented with today, so-called big data. Big data is a collection of large datasets. Our approach is to perform frequent itemset mining over large datasets in a scalable and speedy way. MapReduce, along with HDFS, is used to find frequent itemsets from big data on a large cluster. This paper focuses on using a pre-processing and mining algorithm as a hybrid approach for big data on the Hadoop platform.
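A minimal local simulation of the MapReduce counting pattern (placeholder transactions, not the paper's hybrid algorithm) looks like this; on Hadoop the same map and reduce functions would run over HDFS splits.

```python
from collections import Counter
from itertools import combinations

# Mappers emit (itemset, 1) pairs per transaction, the reducer sums counts,
# and a support threshold filters the result.

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def mapper(transaction, max_size=2):
    for size in range(1, max_size + 1):
        for itemset in combinations(sorted(transaction), size):
            yield itemset, 1

def reducer(pairs):
    counts = Counter()
    for itemset, one in pairs:
        counts[itemset] += one
    return counts

counts = reducer(pair for t in transactions for pair in mapper(t))
min_support = 2
frequent = {k: v for k, v in counts.items() if v >= min_support}
print(frequent)   # e.g. ('bread', 'milk'): 2, ('bread', 'butter'): 2, ...
```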

Keywords: frequent item set mining, big data, Hadoop, MapReduce

Procedia PDF Downloads 409
24642 Hsa-miR-192-5p, and Hsa-miR-129-5p Prominent Biomarkers in Regulation Glioblastoma Cancer Stem Cells Genes Microenvironment

Authors: Rasha Ahmadi

Abstract:

Glioblastoma is one of the most frequent brain malignancies, with a high mortality rate and limited survival in individuals with this malignancy. Despite different treatments and surgery, recurrence of glioblastoma cancer stem cells may arise as a subsequent tumor. For this reason, it is crucial to research the markers associated with glioblastoma stem cells and specifically their microenvironment. In this study, using bioinformatics analysis, we analyzed and nominated genes in the microenvironment pathways of glioblastoma stem cells. An appropriate dataset was selected for analysis from the GEO database. This dataset comprised gene expression patterns in stem cells derived from glioblastoma patients. Gene clusters were divided into high- and low-expression groups. Enrichment databases such as Enrichr, STRING, and GEPIA were utilized to analyze the data. Finally, we found that 2700 high-expression and 1100 low-expression genes are implicated in the metabolic pathways of glioblastoma progression. Cellular senescence, MAPK, TNF, hypoxia, zymosterol biosynthesis, and phosphatidylinositol metabolism pathways were substantially expressed, and the metabolic pathways were downregulated. After assessing the associations in the protein networks, the MSMP, SOX2, FGD4, and CNTNAP3 genes with high expression and the DMKN and SBSN genes with low expression were selected. All of these genes were observed in the survival curve, with survival of fewer than 10 percent over around 15 months. hsa-mir-192-5p, hsa-mir-129-5p, hsa-mir-215-5p, hsa-mir-335-5p, and hsa-mir-340-5p played key roles in the glioblastoma cancer stem cell microenvironment. Through integrated and systematic bioinformatics analysis of gene expression profile data, we introduced critical genes that can play an important role in targeting genes involved in the energy and microenvironment of glioblastoma cancer stem cells. This study indicated that hsa-mir-192-5p and hsa-mir-129-5p are appropriate candidates for this purpose.

Keywords: glioblastoma, cancer stem cells, biomarker discovery, gene expression profiles, bioinformatics analysis, tumor microenvironment

Procedia PDF Downloads 123
24641 The Role of Data Gathering in NGOs

Authors: Hussaini Garba Mohammed

Abstract:

Background/Significance: The lack of data gathering affects NGOs worldwide in obtaining good information about educational and health-related issues among communities in any country and around the world. For example, HIV/AIDS, smoking-related disease (tuberculosis), and COVID-19 virus carriers are becoming serious public health problems, especially among older men and women. But there is no detailed survey data assessment from communities, villages, and rural areas in some countries to show the percentage of victims and patients, especially for the worldwide COVID-19 virus among the people. These data are essential to inform programming targets, strategies, and priorities and to obtain good information through data gathering in any society.

Keywords: reliable information, data assessment, data mining, data communication

Procedia PDF Downloads 164
24640 The Application of Data Mining Technology in Building Energy Consumption Data Analysis

Authors: Liang Zhao, Jili Zhang, Chongquan Zhong

Abstract:

Energy consumption data, in particular those involving public buildings, are impacted by many factors: the building structure, climate and environmental parameters, construction, system operating conditions, and user behavior patterns. Traditional methods for data analysis are insufficient. This paper delves into data mining technology to determine its application in the analysis of building energy consumption data, including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature is reviewed and summarized, the problems faced by data mining technology in the area of energy consumption data analysis are enumerated, and research directions for future studies are given.

Keywords: data mining, data analysis, prediction, optimization, building operational performance

Procedia PDF Downloads 835
24639 To Handle Data-Driven Software Development Projects Effectively

Authors: Shahnewaz Khan

Abstract:

Machine learning (ML) techniques are often used in projects for creating data-driven applications. These tasks typically demand additional research and analysis. The proper technique and strategy must be chosen to ensure the success of data-driven projects; otherwise, even after exerting a lot of effort, the necessary development might not always be possible. This paper examines the workflow of data-driven software development projects and their implementation process in order to describe how to manage such a project successfully, which will assist in minimizing the added workload.

Keywords: data, data-driven projects, data science, NLP, software project

Procedia PDF Downloads 66
24638 Practice Patterns of Physiotherapists for Learners with Disabilities at Special Schools: A Scoping Review

Authors: Lubisi L. V., Madumo M. B., Mudau N. P., Makhuvele L., Sibuyi M. M.

Abstract:

Background and Aims: Learners with disabilities can be integrated into mainstream schools, whereas other learners are accommodated in special schools based on the support needs they require. These needs, among others, pertain to access to high-intensity therapeutic support by physiotherapists, occupational therapists, and speech therapists. However, access to physiotherapists in low- and middle-income countries is limited, and this creates a knowledge gap in identifying, to the best of our knowledge, best practice patterns aligned with physiotherapy at special schools. This gap compromises the quality of support to be rendered towards strengthening rehabilitation and optimising the participation of learners with disabilities in special schools. The aim of the scoping review was to map the evidence on practice patterns employed by physiotherapists at special schools for learners with disabilities. Methods: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed. Key terms regarding physiotherapy practice patterns for learners with disabilities at special schools were used to search the literature in the databases. Literature was sourced from Google Scholar, EBSCO, PEDro, PubMed, and ResearchGate from 2013 to 2023. A total of 28 articles were initially retrieved, and after a process of screening and exclusion, nine articles were included. All the researchers reviewed the articles for eligibility. Articles were initially screened based on the titles, followed by the full text. Articles written in English or translated into English that mentioned physical/physiotherapy interventions in special schools, both published and unpublished, were included. A qualitative data extraction template was developed, and an inductive approach to thematic data analysis was used for the included articles to see which themes emerged. Results: Three themes emerged after inductive thematic data analysis: (1) collaboration with educators, parents, and therapists; (2) a family-centred approach; and (3) telehealth. Conclusion: Collaboration is key in delivering therapeutic support to learners with disabilities at special schools. Physiotherapists need to be collaborators at the interprofessional and transprofessional levels. In addition, they need to explore technology to work remotely, especially when learners are physically absent from school.

Keywords: learners with disabilities, special school, physiotherapists, therapeutic support

Procedia PDF Downloads 53
24637 A Bibliometric Assessment of the Nexus Between Corporate Social Responsibility and Sustainable Development

Authors: Trilochana Dash, Chandan Kumar Sahoo

Abstract:

In today's environment of intensive industrialization, the role of business in societal modernization is critical. The concept of corporate social responsibility (CSR) arose due to rising societal awareness of corporate conduct. Corporations that practice CSR devote a portion of their profits to society's sustainable development (SD). The concepts of CSR and SD have increased the impact of industries on society. In this study, a bibliometric analysis was conducted using the "R" programming language to determine the comprehensiveness of CSR and SD research. From 2003 to 2022, bibliometric data were collected from two databases: Scopus and Web of Science (WOS). According to the findings, CSR and SD research has risen exponentially in the past two decades, and "Corporate Social Responsibility and Environmental Management" emerged as the most influential journal in this field. The findings also show that relatively few researchers have collaborated in CSR and SD research over the last twenty years. It is widely acknowledged that most CSR and SD research is conducted in developed countries and in developing countries undergoing fast industrialization. Thematic evolution and cluster analysis clearly show that the notion of CSR and SD has been quite popular among scholars over the last two decades. Finally, limitations and future directions are discussed.

Keywords: corporate social responsibility, sustainable development, bibliometric analysis, “R” programming language, visualization, holistic picture

Procedia PDF Downloads 73
24636 The Accuracy of Measures for Screening Adults for Spiritual Suffering in Health Care Settings: A Systematic Review

Authors: Sayna Bahraini, Wendy Gifford, Ian Graham, Liquaa Wazni, Suzettee Bremault-Phillips, Rebekah Hackbusch, Catrine Demers, Mary Egan

Abstract:

Objective: Guidelines for palliative and spiritual care emphasize the importance of screening patients for spiritual suffering. The aim of this review was to synthesize the research evidence on the accuracy of measures used to screen adults for spiritual suffering. Methods: A systematic review was conducted. We searched five scientific databases to identify relevant articles. Two independent reviewers screened the articles, extracted data, and assessed study methodological quality. Results: We identified five articles that yielded information on 24 spiritual screening measures. Among all identified measures, the 2-item Meaning/Joy & Self-Described Struggle has the highest sensitivity (82-87%), and the revised Rush protocol has the highest specificity (81-90%). The methodological quality of all included studies was low. Significance of Results: While most of the identified spiritual screening measures are brief (comprising 1 to 12 items), few have sufficient accuracy to effectively screen patients for spiritual suffering. We advise clinicians to use their critical appraisal skills and clinical judgment when selecting and using any of the identified measures to screen for spiritual suffering.

Keywords: screening, suffering, spirituality, diagnostic test accuracy, systematic review

Procedia PDF Downloads 127
24635 The Relationship Between Artificial Intelligence, Data Science, and Privacy

Authors: M. Naidoo

Abstract:

Artificial intelligence often requires large amounts of good-quality data. Within important fields, such as healthcare, the training of AI systems predominantly relies on health and personal data; however, the usage of this data is complicated by various layers of law and ethics that seek to protect individuals' privacy rights. This research seeks to establish the challenges AI and data science pose to (i) informational rights, (ii) privacy rights, and (iii) data protection. To solve some of the issues presented, various methods are suggested, such as embedding values in technological development, proper balancing of rights and interests, and others.

Keywords: artificial intelligence, data science, law, policy

Procedia PDF Downloads 93
24634 Simulation Data Summarization Based on Spatial Histograms

Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura

Abstract:

In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.
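The paper builds spatial V-Optimal histograms; as a one-dimensional illustration of the same objective, the sketch below uses dynamic programming to split a value sequence into B buckets minimizing the total within-bucket sum of squared errors. The data values are placeholders, and the 1D setting is an assumption made for brevity.

```python
import numpy as np

def sse(prefix, prefix2, i, j):          # SSE of values[i:j] via prefix sums
    n = j - i
    s = prefix[j] - prefix[i]
    s2 = prefix2[j] - prefix2[i]
    return s2 - s * s / n

def v_optimal(values, B):
    n = len(values)
    prefix = np.concatenate([[0.0], np.cumsum(values)])
    prefix2 = np.concatenate([[0.0], np.cumsum(np.square(values))])
    cost = np.full((B + 1, n + 1), np.inf)   # cost[b, j]: best SSE for first j values in b buckets
    cut = np.zeros((B + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for b in range(1, B + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                c = cost[b - 1, i] + sse(prefix, prefix2, i, j)
                if c < cost[b, j]:
                    cost[b, j], cut[b, j] = c, i
    # recover bucket boundaries by walking the cut table backwards
    bounds, j = [], n
    for b in range(B, 0, -1):
        bounds.append((cut[b, j], j))
        j = cut[b, j]
    return cost[B, n], bounds[::-1]

values = np.array([1, 1, 2, 9, 10, 11, 30, 31], dtype=float)
err, buckets = v_optimal(values, B=3)
print(err, buckets)    # expect buckets roughly [(0, 3), (3, 6), (6, 8)]
```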

Keywords: simulation data, data summarization, spatial histograms, exploration, visualization

Procedia PDF Downloads 165
24633 Data Stream Association Rule Mining with Cloud Computing

Authors: B. Suraj Aravind, M. H. M. Krishna Prasad

Abstract:

There exist emerging applications of data streams that require association rule mining, such as network traffic monitoring, web click stream analysis, sensor data, data from satellites, etc. Data streams typically arrive continuously, at high speed, in huge volumes, and with changing data distributions. This raises new issues that need to be considered when developing association rule mining techniques for stream data. This paper proposes an improved data stream association rule mining algorithm that eliminates resource limitations. For this, the concept of cloud computing is used. This inclusion may lead to additional, as yet unknown problems, which need further research.

Keywords: data stream, association rule mining, cloud computing, frequent itemsets

Procedia PDF Downloads 489