Search results for: data personalization
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25162

Search results for: data personalization

23032 Measures of Phylogenetic Support for Phylogenomic and the Whole Genomes of Two Lungfish Restate Lungfish and Origin of Land Vertebrates

Authors: Yunfeng Shan, Xiaoliang Wang, Youjun Zhou

Abstract:

Whole-genome data from two lungfish species, along with other species, present a valuable opportunity to reassess the longstanding debate regarding the evolutionary relationships among tetrapods, lungfishes, and coelacanths. However, the use of bootstrap support has become outdated for large-scale phylogenomic data. Without robust phylogenetic support, the phylogenetic trees become meaningless. Therefore, it is necessary to re-evaluate the phylogenies of tetrapods, lungfishes, and coelacanths using novel measures of phylogenetic support specifically designed for phylogenomic data, as the previous phylogenies were based on 100% bootstrap support. Our findings consistently provide strong evidence favoring lungfish as the closest living relative of tetrapods. This conclusion is based on high gene support confidence with confidence intervals exceeding 95%, high internode certainty, and high gene concordance factor. The evidence stems from two datasets containing recently deciphered whole genomes of two lungfish species, as well as five previous datasets derived from lungfish transcriptomes. These results yield fresh insights into the three hypotheses regarding the phylogenies of tetrapods, lungfishes, and coelacanths. Importantly, these hypotheses are not mere conjectures but are substantiated by a significant number of genes. Analyzing real biological data further demonstrates that the inclusion of additional taxa diminishes the number of orthologues and leads to more diverse tree topologies. Consequently, gene trees and species trees may not be identical even when whole-genome sequencing data is utilized. However, it is worth noting that many gene trees can accurately reflect the species tree if an appropriate number of taxa, typically ranging from six to ten, are sampled. Therefore, it is crucial to carefully select the number of taxa and an appropriate outgroup while excluding fast-evolving taxa as outgroups to mitigate the adverse effects of long-branch attraction (LBA) and achieve an accurate reconstruction of the species tree. This is particularly important as more whole-genome sequencing data becomes available.

Keywords: gene support confidence (GSC), origin of land vertebrates, coelacanth, two whole genomes of lungfishes, confidence intervals

Procedia PDF Downloads 87
23031 Big Data for Local Decision-Making: Indicators Identified at International Conference on Urban Health 2017

Authors: Dana R. Thomson, Catherine Linard, Sabine Vanhuysse, Jessica E. Steele, Michal Shimoni, Jose Siri, Waleska Caiaffa, Megumi Rosenberg, Eleonore Wolff, Tais Grippa, Stefanos Georganos, Helen Elsey

Abstract:

The Sustainable Development Goals (SDGs) and Urban Health Equity Assessment and Response Tool (Urban HEART) identify dozens of key indicators to help local decision-makers prioritize and track inequalities in health outcomes. However, presentations and discussions at the International Conference on Urban Health (ICUH) 2017 suggested that additional indicators are needed to make decisions and policies. A local decision-maker may realize that malaria or road accidents are a top priority. However, s/he needs additional health determinant indicators, for example about standing water or traffic, to address the priority and reduce inequalities. Health determinants reflect the physical and social environments that influence health outcomes often at community- and societal-levels and include such indicators as access to quality health facilities, access to safe parks, traffic density, location of slum areas, air pollution, social exclusion, and social networks. Indicator identification and disaggregation are necessarily constrained by available datasets – typically collected about households and individuals in surveys, censuses, and administrative records. Continued advancements in earth observation, data storage, computing and mobile technologies mean that new sources of health determinants indicators derived from 'big data' are becoming available at fine geographic scale. Big data includes high-resolution satellite imagery and aggregated, anonymized mobile phone data. While big data are themselves not representative of the population (e.g., satellite images depict the physical environment), they can provide information about population density, wealth, mobility, and social environments with tremendous detail and accuracy when combined with population-representative survey, census, administrative and health system data. The aim of this paper is to (1) flag to data scientists important indicators needed by health decision-makers at the city and sub-city scale - ideally free and publicly available, and (2) summarize for local decision-makers new datasets that can be generated from big data, with layperson descriptions of difficulties in generating them. We include SDGs and Urban HEART indicators, as well as indicators mentioned by decision-makers attending ICUH 2017.

Keywords: health determinant, health outcome, mobile phone, remote sensing, satellite imagery, SDG, urban HEART

Procedia PDF Downloads 209
23030 Web Data Scraping Technology Using Term Frequency Inverse Document Frequency to Enhance the Big Data Quality on Sentiment Analysis

Authors: Sangita Pokhrel, Nalinda Somasiri, Rebecca Jeyavadhanam, Swathi Ganesan

Abstract:

Tourism is a booming industry with huge future potential for global wealth and employment. There are countless data generated over social media sites every day, creating numerous opportunities to bring more insights to decision-makers. The integration of Big Data Technology into the tourism industry will allow companies to conclude where their customers have been and what they like. This information can then be used by businesses, such as those in charge of managing visitor centers or hotels, etc., and the tourist can get a clear idea of places before visiting. The technical perspective of natural language is processed by analysing the sentiment features of online reviews from tourists, and we then supply an enhanced long short-term memory (LSTM) framework for sentiment feature extraction of travel reviews. We have constructed a web review database using a crawler and web scraping technique for experimental validation to evaluate the effectiveness of our methodology. The text form of sentences was first classified through Vader and Roberta model to get the polarity of the reviews. In this paper, we have conducted study methods for feature extraction, such as Count Vectorization and TFIDF Vectorization, and implemented Convolutional Neural Network (CNN) classifier algorithm for the sentiment analysis to decide the tourist’s attitude towards the destinations is positive, negative, or simply neutral based on the review text that they posted online. The results demonstrated that from the CNN algorithm, after pre-processing and cleaning the dataset, we received an accuracy of 96.12% for the positive and negative sentiment analysis.

Keywords: counter vectorization, convolutional neural network, crawler, data technology, long short-term memory, web scraping, sentiment analysis

Procedia PDF Downloads 88
23029 Working in Multidisciplinary Care Teams: Perspectives from Health Care and Social Service Providers

Authors: Lindy Van Vliet, Saloni Phadke, Anthea Nelson, Ann Gallant

Abstract:

Holistic and patient-centred palliative care and support require an integrated system of care that includes health and social service providers working together to ensure that patients and families have access to the care they need. The objective of this study is to further explore and understand the benefits and challenges of mobilizing multidisciplinary care teams for health care professionals and social service providers. Drawing on an interpretivist, exploratory, qualitative design, our multidisciplinary research team (medicine, nursing and social work) conducted interviews with 15 health care and social service providers in the Ottawa region. Interview data was audio-recorded, transcribed, and analyzed using a reflexive thematic analysis approach. The data deepens our understandings of the facilitators and barriers posed by multidisciplinary care teams. Three main findings emerged: First, the data highlighted the benefits of multidisciplinary care teams for both patient outcomes and quality of life and provider mental health; second, the data showed that the lack of a system-wide integrated communication system reduces the quality of patient care and increases provider stress while working in multidisciplinary care teams; finally, the data demonstrated the existence of implicit hierarchies between disciplines, this coupled with different disciplinary perspectives of palliative care provision can lead to friction and challenges within care teams. These findings will have important implications for the future of palliative care as they will help to facilitate and build stronger person-centred/relationship-centred palliative care practices by naming the challenges faced by multidisciplinary palliative care teams and providing examples of best practices.

Keywords: public health palliative care, palliative care nursing, care networks, integrated health care, palliative care approach, public health, multidisciplinary work, care teams

Procedia PDF Downloads 82
23028 Evaluation of Vehicle Classification Categories: Florida Case Study

Authors: Ren Moses, Jaqueline Masaki

Abstract:

This paper addresses the need for accurate and updated vehicle classification system through a thorough evaluation of vehicle class categories to identify errors arising from the existing system and proposing modifications. The data collected from two permanent traffic monitoring sites in Florida were used to evaluate the performance of the existing vehicle classification table. The vehicle data were collected and classified by the automatic vehicle classifier (AVC), and a video camera was used to obtain ground truth data. The Federal Highway Administration (FHWA) vehicle classification definitions were used to define vehicle classes from the video and compare them to the data generated by AVC in order to identify the sources of misclassification. Six types of errors were identified. Modifications were made in the classification table to improve the classification accuracy. The results of this study include the development of updated vehicle classification table with a reduction in total error by 5.1%, a step by step procedure to use for evaluation of vehicle classification studies and recommendations to improve FHWA 13-category rule set. The recommendations for the FHWA 13-category rule set indicate the need for the vehicle classification definitions in this scheme to be updated to reflect the distribution of current traffic. The presented results will be of interest to States’ transportation departments and consultants, researchers, engineers, designers, and planners who require accurate vehicle classification information for planning, designing and maintenance of transportation infrastructures.

Keywords: vehicle classification, traffic monitoring, pavement design, highway traffic

Procedia PDF Downloads 180
23027 The Significant of Effective Leadership on Management Growth and Survival: A Case Study of Bunato Limited Company, Ring Road Ibadan

Authors: A. S. Adegoke, O. N. Popoola

Abstract:

The central purpose of management in any organization is that of coordinating the efforts of people towards the achievement of its goal. Effective and productive management is the function of leadership. Leadership plays a critical role in helping groups, organizations and societies to achieve their goals. Factors considered to make leadership to be effective are intelligence, social maturity, inner motivation and achievement drives and lastly, human relations attitudes. The factors affecting leadership style and effectiveness were examined. Also, the study examined which of the various leadership style best befits an organization and discussed the ways in which the style was determined. In order to meet the objectives of this study, different types of methods of data gathering were carried out. The methods include data from primary and secondary sources. The primary sources include personal interview, personal observation, and questionnaire while data from secondary sources were derived from various books, journal write up and other documentary records. Data were collected from respondents through questionnaire, and the field research carried out through oral interview to test each of the related hypotheses. From the data analysed it was determined that 45% strongly agreed that leadership traits are inborn not acquired and 28.3% agreed that leadership traits are inborn, while 11.7% and 10% strongly disagreed and disagreed respectively and 5% were undecided. 48.4% strongly agreed, and 43.3% agreed that environmental factors determined the appropriate style of leadership to be employed while 3.3% strongly disagreed, 1.7% disagreed and 3.3% were undecided. From the study, no single style of leadership is appropriate in any situation instead of concentrating on single leadership style; leader can vary approaches depending on forces in the leaders, characteristic of the subordinates, situational forces of the organization, lastly the expectations and behaviour of superior.

Keywords: hypothesis, leadership, management, organization

Procedia PDF Downloads 144
23026 Validation of Mapping Historical Linked Data to International Committee for Documentation (CIDOC) Conceptual Reference Model Using Shapes Constraint Language

Authors: Ghazal Faraj, András Micsik

Abstract:

Shapes Constraint Language (SHACL), a World Wide Web Consortium (W3C) language, provides well-defined shapes and RDF graphs, named "shape graphs". These shape graphs validate other resource description framework (RDF) graphs which are called "data graphs". The structural features of SHACL permit generating a variety of conditions to evaluate string matching patterns, value type, and other constraints. Moreover, the framework of SHACL supports high-level validation by expressing more complex conditions in languages such as SPARQL protocol and RDF Query Language (SPARQL). SHACL includes two parts: SHACL Core and SHACL-SPARQL. SHACL Core includes all shapes that cover the most frequent constraint components. While SHACL-SPARQL is an extension that allows SHACL to express more complex customized constraints. Validating the efficacy of dataset mapping is an essential component of reconciled data mechanisms, as the enhancement of different datasets linking is a sustainable process. The conventional validation methods are the semantic reasoner and SPARQL queries. The former checks formalization errors and data type inconsistency, while the latter validates the data contradiction. After executing SPARQL queries, the retrieved information needs to be checked manually by an expert. However, this methodology is time-consuming and inaccurate as it does not test the mapping model comprehensively. Therefore, there is a serious need to expose a new methodology that covers the entire validation aspects for linking and mapping diverse datasets. Our goal is to conduct a new approach to achieve optimal validation outcomes. The first step towards this goal is implementing SHACL to validate the mapping between the International Committee for Documentation (CIDOC) conceptual reference model (CRM) and one of its ontologies. To initiate this project successfully, a thorough understanding of both source and target ontologies was required. Subsequently, the proper environment to run SHACL and its shape graphs were determined. As a case study, we performed SHACL over a CIDOC-CRM dataset after running a Pellet reasoner via the Protégé program. The applied validation falls under multiple categories: a) data type validation which constrains whether the source data is mapped to the correct data type. For instance, checking whether a birthdate is assigned to xsd:datetime and linked to Person entity via crm:P82a_begin_of_the_begin property. b) Data integrity validation which detects inconsistent data. For instance, inspecting whether a person's birthdate occurred before any of the linked event creation dates. The expected results of our work are: 1) highlighting validation techniques and categories, 2) selecting the most suitable techniques for those various categories of validation tasks. The next plan is to establish a comprehensive validation model and generate SHACL shapes automatically.

Keywords: SHACL, CIDOC-CRM, SPARQL, validation of ontology mapping

Procedia PDF Downloads 253
23025 Impacts of Urbanization on Forest and Agriculture Areas in Savannakhet Province, Lao People's Democratic Republic

Authors: Chittana Phompila

Abstract:

The current increased population pushes increasing demands for natural resources and living space. In Laos, urban areas have been expanding rapidly in recent years. The rapid urbanization can have negative impacts on landscapes, including forest and agriculture lands. The primary objective of this research were to map current urban areas in a large city in Savannakhet province, in Laos, 2) to compare changes in urbanization between 1990 and 2018, and 3) to estimate forest and agriculture areas lost due to expansions of urban areas during the last over twenty years within study area. Landsat 8 data was used and existing GIS data was collected including spatial data on rivers, lakes, roads, vegetated areas and other land use/land covers). GIS data was obtained from the government sectors. Object based classification (OBC) approach was applied in ECognition for image processing and analysis of urban area using. Historical data from other Landsat instruments (Landsat 5 and 7) were used to allow us comparing changes in urbanization in 1990, 2000, 2010 and 2018 in this study area. Only three main land cover classes were focused and classified, namely forest, agriculture and urban areas. Change detection approach was applied to illustrate changes in built-up areas in these periods. Our study shows that the overall accuracy of map was 95% assessed, kappa~ 0.8. It is found that that there is an ineffective control over forest and land-use conversions from forests and agriculture to urban areas in many main cities across the province. A large area of agriculture and forest has been decreased due to this conversion. Uncontrolled urban expansion and inappropriate land use planning can lead to creating a pressure in our resource utilisation. As consequence, it can lead to food insecurity and national economic downturn in a long term.

Keywords: urbanisation, forest cover, agriculture areas, Landsat 8 imagery

Procedia PDF Downloads 158
23024 Data-Driven Surrogate Models for Damage Prediction of Steel Liquid Storage Tanks under Seismic Hazard

Authors: Laura Micheli, Majd Hijazi, Mahmoud Faytarouni

Abstract:

The damage reported by oil and gas industrial facilities revealed the utmost vulnerability of steel liquid storage tanks to seismic events. The failure of steel storage tanks may yield devastating and long-lasting consequences on built and natural environments, including the release of hazardous substances, uncontrolled fires, and soil contamination with hazardous materials. It is, therefore, fundamental to reliably predict the damage that steel liquid storage tanks will likely experience under future seismic hazard events. The seismic performance of steel liquid storage tanks is usually assessed using vulnerability curves obtained from the numerical simulation of a tank under different hazard scenarios. However, the computational demand of high-fidelity numerical simulation models, such as finite element models, makes the vulnerability assessment of liquid storage tanks time-consuming and often impractical. As a solution, this paper presents a surrogate model-based strategy for predicting seismic-induced damage in steel liquid storage tanks. In the proposed strategy, the surrogate model is leveraged to reduce the computational demand of time-consuming numerical simulations. To create the data set for training the surrogate model, field damage data from past earthquakes reconnaissance surveys and reports are collected. Features representative of steel liquid storage tank characteristics (e.g., diameter, height, liquid level, yielding stress) and seismic excitation parameters (e.g., peak ground acceleration, magnitude) are extracted from the field damage data. The collected data are then utilized to train a surrogate model that maps the relationship between tank characteristics, seismic hazard parameters, and seismic-induced damage via a data-driven surrogate model. Different types of surrogate algorithms, including naïve Bayes, k-nearest neighbors, decision tree, and random forest, are investigated, and results in terms of accuracy are reported. The model that yields the most accurate predictions is employed to predict future damage as a function of tank characteristics and seismic hazard intensity level. Results show that the proposed approach can be used to estimate the extent of damage in steel liquid storage tanks, where the use of data-driven surrogates represents a viable alternative to computationally expensive numerical simulation models.

Keywords: damage prediction , data-driven model, seismic performance, steel liquid storage tanks, surrogate model

Procedia PDF Downloads 143
23023 Using Risk Management Indicators in Decision Tree Analysis

Authors: Adel Ali Elshaibani

Abstract:

Risk management indicators augment the reporting infrastructure, particularly for the board and senior management, to identify, monitor, and manage risks. This enhancement facilitates improved decision-making throughout the banking organization. Decision tree analysis is a tool that visually outlines potential outcomes, costs, and consequences of complex decisions. It is particularly beneficial for analyzing quantitative data and making decisions based on numerical values. By calculating the expected value of each outcome, decision tree analysis can help assess the best course of action. In the context of banking, decision tree analysis can assist lenders in evaluating a customer’s creditworthiness, thereby preventing losses. However, applying these tools in developing countries may face several limitations, such as data availability, lack of technological infrastructure and resources, lack of skilled professionals, cultural factors, and cost. Moreover, decision trees can create overly complex models that do not generalize well to new data, known as overfitting. They can also be sensitive to small changes in the data, which can result in different tree structures and can become computationally expensive when dealing with large datasets. In conclusion, while risk management indicators and decision tree analysis are beneficial for decision-making in banks, their effectiveness is contingent upon how they are implemented and utilized by the board of directors, especially in the context of developing countries. It’s important to consider these limitations when planning to implement these tools in developing countries.

Keywords: risk management indicators, decision tree analysis, developing countries, board of directors, bank performance, risk management strategy, banking institutions

Procedia PDF Downloads 60
23022 Observatory of Sustainability of the Algarve Region for Tourism: Proposal for Environmental and Sociocultural Indicators

Authors: Miguel José Oliveira, Fátima Farinha, Elisa M. J. da Silva, Rui Lança, Manuel Duarte Pinheiro, Cátia Miguel

Abstract:

The Observatory of Sustainability of the Algarve Region for Tourism (OBSERVE) will be a valuable tool to assess the sustainability of this region. The OBSERVE tool is designed to provide data and maintain an up-to-date, consistent set of indicators defined to describe the region on the environmental, sociocultural, economic and institutional domains. This ongoing two-year project has the active participation of the Algarve’s stakeholders, since they were consulted and asked to participate in the discussion for the indicators proposal. The environmental and sociocultural indicators chosen must indicate the characteristics of the region and should be in alignment with other global systems used to monitor the sustainability. This paper presents a review of sustainability indicators systems that support the first proposal for the environmental and sociocultural indicators. Others constraints are discussed, namely the existing data and the data available in digital platforms in a format suitable for automatic importation to the platform of OBSERVE. It is intended that OBSERVE will be a valuable tool to assess the sustainability of the region of Algarve.

Keywords: Algarve, development, environmental indicators, observatory, sociocultural indicators, sustainability, tourism

Procedia PDF Downloads 176
23021 Emerging Cyber Threats and Cognitive Vulnerabilities: Cyberterrorism

Authors: Oludare Isaac Abiodun, Esther Omolara Abiodun

Abstract:

The purpose of this paper is to demonstrate that cyberterrorism is existing and poses a threat to computer security and national security. Nowadays, people have become excitedly dependent upon computers, phones, the Internet, and the Internet of things systems to share information, communicate, conduct a search, etc. However, these network systems are at risk from a different source that is known and unknown. These network systems risk being caused by some malicious individuals, groups, organizations, or governments, they take advantage of vulnerabilities in the computer system to hawk sensitive information from people, organizations, or governments. In doing so, they are engaging themselves in computer threats, crime, and terrorism, thereby making the use of computers insecure for others. The threat of cyberterrorism is of various forms and ranges from one country to another country. These threats include disrupting communications and information, stealing data, destroying data, leaking, and breaching data, interfering with messages and networks, and in some cases, demanding financial rewards for stolen data. Hence, this study identifies many ways that cyberterrorists utilize the Internet as a tool to advance their malicious mission, which negatively affects computer security and safety. One could identify causes for disparate anomaly behaviors and the theoretical, ideological, and current forms of the likelihood of cyberterrorism. Therefore, for a countermeasure, this paper proposes the use of previous and current computer security models as found in the literature to help in countering cyberterrorism

Keywords: cyberterrorism, computer security, information, internet, terrorism, threat, digital forensic solution

Procedia PDF Downloads 96
23020 Reliability Prediction of Tires Using Linear Mixed-Effects Model

Authors: Myung Hwan Na, Ho- Chun Song, EunHee Hong

Abstract:

We widely use normal linear mixed-effects model to analysis data in repeated measurement. In case of detecting heteroscedasticity and the non-normality of the population distribution at the same time, normal linear mixed-effects model can give improper result of analysis. To achieve more robust estimation, we use heavy tailed linear mixed-effects model which gives more exact and reliable analysis conclusion than standard normal linear mixed-effects model.

Keywords: reliability, tires, field data, linear mixed-effects model

Procedia PDF Downloads 564
23019 Data Quality and Associated Factors on Regular Immunization Programme at Ararso District: Somali Region- Ethiopia

Authors: Eyob Seife, Molla Alemayaehu, Tesfalem Teshome, Bereket Seyoum, Behailu Getachew

Abstract:

Globally, immunization averts between 2 and 3 million deaths yearly, but Vaccine-Preventable Diseases still account for more in Sub-Saharan African countries and takes the majority of under-five deaths yearly, which indicates the need for consistent and on-time information to have evidence-based decision so as to save lives of these vulnerable groups. However, ensuring data of sufficient quality and promoting an information-use culture at the point of collection remains critical and challenging, especially in remote areas where the Ararso district is selected based on a hypothesis of there is a difference in reported and recounted immunization data consistency. Data quality is dependent on different factors where organizational, behavioral, technical and contextual factors are the mentioned ones. A cross-sectional quantitative study was conducted on September 2022 in the Ararso district. The study used the world health organization (WHO) recommended data quality self-assessment (DQS) tools. Immunization tally sheets, registers and reporting documents were reviewed at 4 health facilities (1 health center and 3 health posts) of primary health care units for one fiscal year (12 months) to determine the accuracy ratio, availability and timeliness of reports. The data was collected by trained DQS assessors to explore the quality of monitoring systems at health posts, health centers, and at the district health office. A quality index (QI), availability and timeliness of reports were assessed. Accuracy ratios formulated were: the first and third doses of pentavalent vaccines, fully immunized (FI), TT2+ and the first dose of measles-containing vaccines (MCV). In this study, facility-level results showed poor timeliness at all levels and both over-reporting and under-reporting were observed at all levels when computing the accuracy ratio of registration to health post reports found at health centers for almost all antigens verified. A quality index (QI) of all facilities also showed poor results. Most of the verified immunization data accuracy ratios were found to be relatively better than that of quality index and timeliness of reports. So attention should be given to improving the capacity of staff, timeliness of reports and quality of monitoring system components, namely recording, reporting, archiving, data analysis and using information for decisions at all levels, especially in remote and areas.

Keywords: accuracy ratio, ararso district, quality of monitoring system, regular immunization program, timeliness of reports, Somali region-Ethiopia

Procedia PDF Downloads 72
23018 Study on the Demolition Waste Management in Malaysia Construction Industry

Authors: Gunalan Vasudevan

Abstract:

The Malaysia construction industry generates a large quantity of construction and demolition waste nowadays. In the handbook for demolition work only comprised small portion of demolition waste management. It is important to study and determine the ways to provide a practical guide for the professional in the building industry about handling the demolition waste. In general, demolition defined as tearing down or wrecking of structural work or architectural work of the building and other infrastructures work such as road, bridge and etc. It’s a common misconception that demolition is nothing more than taking down a structure and carrying the debris to a landfill. On many projects, 80-90% of the structure is kept for reuse or recycling which help the owner to save cost. Demolition contractors required a lot of knowledge and experience to minimize the impact of demolition work to the existing surrounding area. For data collecting method, postal questionnaires and interviews have been selected to collect data. Questionnaires have distributed to 80 respondents from the construction industry in Klang Valley. 67 of 80 respondents have replied the questionnaire while 4 people have interviewed. Microsoft Excel and Statistical Package for Social Science version 17.0 were used to analyze the data collected.

Keywords: demolition, waste management, construction material, Malaysia

Procedia PDF Downloads 443
23017 MLOps Scaling Machine Learning Lifecycle in an Industrial Setting

Authors: Yizhen Zhao, Adam S. Z. Belloum, Goncalo Maia Da Costa, Zhiming Zhao

Abstract:

Machine learning has evolved from an area of academic research to a real-word applied field. This change comes with challenges, gaps and differences exist between common practices in academic environments and the ones in production environments. Following continuous integration, development and delivery practices in software engineering, similar trends have happened in machine learning (ML) systems, called MLOps. In this paper we propose a framework that helps to streamline and introduce best practices that facilitate the ML lifecycle in an industrial setting. This framework can be used as a template that can be customized to implement various machine learning experiment. The proposed framework is modular and can be recomposed to be adapted to various use cases (e.g. data versioning, remote training on cloud). The framework inherits practices from DevOps and introduces other practices that are unique to the machine learning system (e.g.data versioning). Our MLOps practices automate the entire machine learning lifecycle, bridge the gap between development and operation.

Keywords: cloud computing, continuous development, data versioning, DevOps, industrial setting, MLOps

Procedia PDF Downloads 265
23016 LTE Performance Analysis in the City of Bogota Northern Zone for Two Different Mobile Broadband Operators over Qualipoc

Authors: Víctor D. Rodríguez, Edith P. Estupiñán, Juan C. Martínez

Abstract:

The evolution in mobile broadband technologies has allowed to increase the download rates in users considering the current services. The evaluation of technical parameters at the link level is of vital importance to validate the quality and veracity of the connection, thus avoiding large losses of data, time and productivity. Some of these failures may occur between the eNodeB (Evolved Node B) and the user equipment (UE), so the link between the end device and the base station can be observed. LTE (Long Term Evolution) is considered one of the IP-oriented mobile broadband technologies that work stably for data and VoIP (Voice Over IP) for those devices that have that feature. This research presents a technical analysis of the connection and channeling processes between UE and eNodeB with the TAC (Tracking Area Code) variables, and analysis of performance variables (Throughput, Signal to Interference and Noise Ratio (SINR)). Three measurement scenarios were proposed in the city of Bogotá using QualiPoc, where two operators were evaluated (Operator 1 and Operator 2). Once the data were obtained, an analysis of the variables was performed determining that the data obtained in transmission modes vary depending on the parameters BLER (Block Error Rate), performance and SNR (Signal-to-Noise Ratio). In the case of both operators, differences in transmission modes are detected and this is reflected in the quality of the signal. In addition, due to the fact that both operators work in different frequencies, it can be seen that Operator 1, despite having spectrum in Band 7 (2600 MHz), together with Operator 2, is reassigning to another frequency, a lower band, which is AWS (1700 MHz), but the difference in signal quality with respect to the establishment with data by the provider Operator 2 and the difference found in the transmission modes determined by the eNodeB in Operator 1 is remarkable.

Keywords: BLER, LTE, network, qualipoc, SNR.

Procedia PDF Downloads 114
23015 Management and Marketing Implications of Tourism Gravity Models

Authors: Clive L. Morley

Abstract:

Gravity models and panel data modelling of tourism flows are receiving renewed attention, after decades of general neglect. Such models have quite different underpinnings from conventional demand models derived from micro-economic theory. They operate at a different level of data and with different theoretical bases. These differences have important consequences for the interpretation of the results and their policy and managerial implications. This review compares and contrasts the two model forms, clarifying the distinguishing features and the estimation requirements of each. In general, gravity models are not recommended for use to address specific management and marketing purposes.

Keywords: gravity models, micro-economics, demand models, marketing

Procedia PDF Downloads 438
23014 The Forensic Swing of Things: The Current Legal and Technical Challenges of IoT Forensics

Authors: Pantaleon Lutta, Mohamed Sedky, Mohamed Hassan

Abstract:

The inability of organizations to put in place management control measures for Internet of Things (IoT) complexities persists to be a risk concern. Policy makers have been left to scamper in finding measures to combat these security and privacy concerns. IoT forensics is a cumbersome process as there is no standardization of the IoT products, no or limited historical data are stored on the devices. This paper highlights why IoT forensics is a unique adventure and brought out the legal challenges encountered in the investigation process. A quadrant model is presented to study the conflicting aspects in IoT forensics. The model analyses the effectiveness of forensic investigation process versus the admissibility of the evidence integrity; taking into account the user privacy and the providers’ compliance with the laws and regulations. Our analysis concludes that a semi-automated forensic process using machine learning, could eliminate the human factor from the profiling and surveillance processes, and hence resolves the issues of data protection (privacy and confidentiality).

Keywords: cloud forensics, data protection Laws, GDPR, IoT forensics, machine Learning

Procedia PDF Downloads 150
23013 Internal and External Overpressure Calculation for Vented Gas Explosion by Using a Combined Computational Fluid Dynamics Approach

Authors: Jingde Li, Hong Hao

Abstract:

Recent oil and gas accidents have reminded us the severe consequences of gas explosion on structure damage and financial loss. In order to protect the structures and personnel, engineers and researchers have been working on numerous different explosion mitigation methods. Amongst, venting is the most economical approach to mitigate gas explosion overpressure. In this paper, venting is used as the overpressure alleviation method. A theoretical method and a numerical technique are presented to predict the internal and external pressure from vented gas explosion in a large enclosure. Under idealized conditions, a number of experiments are used to calibrate the accuracy of the theoretically calculated data. A good agreement between the theoretical results and experimental data is seen. However, for realistic scenarios, the theoretical method over-estimates internal pressures and is incapable of predicting external pressures. Therefore, a CFD simulation procedure is proposed in this study to estimate both the internal and external overpressure from a large-scale vented explosion. Satisfactory agreement between CFD simulation results and experimental data is achieved.

Keywords: vented gas explosion, internal pressure, external pressure, CFD simulation, FLACS, ANSYS Fluent

Procedia PDF Downloads 160
23012 A Deep Learning Approach for the Predictive Quality of Directional Valves in the Hydraulic Final Test

Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter

Abstract:

The increasing use of deep learning applications in production is becoming a competitive advantage. Predictive quality enables the assurance of product quality by using data-driven forecasts via machine learning models as a basis for decisions on test results. The use of real Bosch production data along the value chain of hydraulic valves is a promising approach to classifying the leakage of directional valves.

Keywords: artificial neural networks, classification, hydraulics, predictive quality, deep learning

Procedia PDF Downloads 243
23011 A Study of Various Ontology Learning Systems from Text and a Look into Future

Authors: Fatima Al-Aswadi, Chan Yong

Abstract:

With the large volume of unstructured data that increases day by day on the web, the motivation of representing the knowledge in this data in the machine processable form is increased. Ontology is one of the major cornerstones of representing the information in a more meaningful way on the semantic Web. The goal of Ontology learning from text is to elicit and represent domain knowledge in the machine readable form. This paper aims to give a follow-up review on the ontology learning systems from text and some of their defects. Furthermore, it discusses how far the ontology learning process will enhance in the future.

Keywords: concept discovery, deep learning, ontology learning, semantic relation, semantic web

Procedia PDF Downloads 521
23010 Stature Prediction from Anthropometry of Extremities among Jordanians

Authors: Amal A. Mashali, Omar Eltaweel, Elerian Ekladious

Abstract:

Stature of an individual has an important role in identification, which is often required in medico-legal practice. The estimation of stature is an important step in the identification of dismembered remains or when only a part of a skeleton is only available as in major disasters or with mutilation. There is no published data on anthropological data among Jordanian population. The present study was designed in order to find out relationship of stature to some anthropometric measures among a sample of Jordanian population and to determine the most accurate and reliable one in predicting the stature of an individual. A cross sectional study was conducted on 336 adult healthy volunteers , free of bone diseases, nutritional diseases and abnormalities in the extremities after taking their consent. Students of Faculty of Medicine, Mutah University helped in collecting the data. The anthropometric measurements (anatomically defined) were stature, humerus length, hand length and breadth, foot length and breadth, foot index and knee height on both right and left sides of the body. The measurements were typical on both sides of the bodies of the studied samples. All the anthropologic data showed significant relation with age except the knee height. There was a significant difference between male and female measurements except for the foot index where F= 0.269. There was a significant positive correlation between the different measures and the stature of the individuals. Three equations were developed for estimation of stature. The most sensitive measure for prediction of a stature was found to be the humerus length.

Keywords: foot index, foot length, hand length, humerus length, stature

Procedia PDF Downloads 306
23009 Internalizing and Externalizing Problems as Predictors of Student Wellbeing

Authors: Nai-Jiin Yang, Tyler Renshaw

Abstract:

Prior research has suggested that youth internalizing and externalizing problems significantly correlate with student subjective wellbeing (SSW) and achievement problems (SAP). Yet, only a few studies have used data from mental health screener based on the dual-factor model to explore the empirical relationships among internalizing problems, externalizing problems, academic problems, and student wellbeing. This study was conducted through a secondary analysis of previously collected data in school-wide mental health screening activities across secondary schools within a suburban school district in the western United States. The data set included 1880 student responses from a total of two schools. Findings suggest that both internalizing and externalizing problems are substantial predictors of both student wellbeing and academic problems. However, compared to internalizing problems, externalizing problems were a much stronger predictor of academic problems. Moreover, this study did not support academic problems that moderate the relationship between SSW and youth internalizing problems (YIP) and between youth externalizing problems (YEP) and SSW. Lastly, SAP is the strongest predictor of SSW than YIP and YEP.

Keywords: academic problems, externalizing problems, internalizing problems, school mental health, student wellbeing, universal mental health screening

Procedia PDF Downloads 84
23008 A Generative Adversarial Framework for Bounding Confounded Causal Effects

Authors: Yaowei Hu, Yongkai Wu, Lu Zhang, Xintao Wu

Abstract:

Causal inference from observational data is receiving wide applications in many fields. However, unidentifiable situations, where causal effects cannot be uniquely computed from observational data, pose critical barriers to applying causal inference to complicated real applications. In this paper, we develop a bounding method for estimating the average causal effect (ACE) under unidentifiable situations due to hidden confounders. We propose to parameterize the unknown exogenous random variables and structural equations of a causal model using neural networks and implicit generative models. Then, with an adversarial learning framework, we search the parameter space to explicitly traverse causal models that agree with the given observational distribution and find those that minimize or maximize the ACE to obtain its lower and upper bounds. The proposed method does not make any assumption about the data generating process and the type of the variables. Experiments using both synthetic and real-world datasets show the effectiveness of the method.

Keywords: average causal effect, hidden confounding, bound estimation, generative adversarial learning

Procedia PDF Downloads 191
23007 Measurement of Operational and Environmental Performance of the Coal-Fired Power Plants in India by Using Data Envelopment Analysis

Authors: Vijay Kumar Bajpai, Sudhir Kumar Singh

Abstract:

In this study, the performance analyses of the twenty five coal-fired power plants (CFPPs) used for electricity generation are carried out through various data envelopment analysis (DEA) models. Three efficiency indices are defined and pursued. During the calculation of the operational performance, energy and non-energy variables are used as input, and net electricity produced is used as desired output. CO2 emitted to the environment is used as the undesired output in the computation of the pure environmental performance while in Model-3 CO2 emissions is considered as detrimental input in the calculation of operational and environmental performance. Empirical results show that most of the plants are operating in increasing returns to scale region and Mettur plant is efficient one with regards to energy use and environment. The result also indicates that the undesirable output effect is insignificant in the research sample. The present study will provide clues to plant operators towards raising the operational and environmental performance of CFPPs.

Keywords: coal fired power plants, environmental performance, data envelopment analysis, operational performance

Procedia PDF Downloads 455
23006 Estimation of Maize Yield by Using a Process-Based Model and Remote Sensing Data in the Northeast China Plain

Authors: Jia Zhang, Fengmei Yao, Yanjing Tan

Abstract:

The accurate estimation of crop yield is of great importance for the food security. In this study, a process-based mechanism model was modified to estimate yield of C4 crop by modifying the carbon metabolic pathway in the photosynthesis sub-module of the RS-P-YEC (Remote-Sensing-Photosynthesis-Yield estimation for Crops) model. The yield was calculated by multiplying net primary productivity (NPP) and the harvest index (HI) derived from the ratio of grain to stalk yield. The modified RS-P-YEC model was used to simulate maize yield in the Northeast China Plain during the period 2002-2011. The statistical data of maize yield from study area was used to validate the simulated results at county-level. The results showed that the Pearson correlation coefficient (R) was 0.827 (P < 0.01) between the simulated yield and the statistical data, and the root mean square error (RMSE) was 712 kg/ha with a relative error (RE) of 9.3%. From 2002-2011, the yield of maize planting zone in the Northeast China Plain was increasing with smaller coefficient of variation (CV). The spatial pattern of simulated maize yield was consistent with the actual distribution in the Northeast China Plain, with an increasing trend from the northeast to the southwest. Hence the results demonstrated that the modified process-based model coupled with remote sensing data was suitable for yield prediction of maize in the Northeast China Plain at the spatial scale.

Keywords: process-based model, C4 crop, maize yield, remote sensing, Northeast China Plain

Procedia PDF Downloads 375
23005 Application of Artificial Intelligence to Schedule Operability of Waterfront Facilities in Macro Tide Dominated Wide Estuarine Harbour

Authors: A. Basu, A. A. Purohit, M. M. Vaidya, M. D. Kudale

Abstract:

Mumbai, being traditionally the epicenter of India's trade and commerce, the existing major ports such as Mumbai and Jawaharlal Nehru Ports (JN) situated in Thane estuary are also developing its waterfront facilities. Various developments over the passage of decades in this region have changed the tidal flux entering/leaving the estuary. The intake at Pir-Pau is facing the problem of shortage of water in view of advancement of shoreline, while jetty near Ulwe faces the problem of ship scheduling due to existence of shallower depths between JN Port and Ulwe Bunder. In order to solve these problems, it is inevitable to have information about tide levels over a long duration by field measurements. However, field measurement is a tedious and costly affair; application of artificial intelligence was used to predict water levels by training the network for the measured tide data for one lunar tidal cycle. The application of two layered feed forward Artificial Neural Network (ANN) with back-propagation training algorithms such as Gradient Descent (GD) and Levenberg-Marquardt (LM) was used to predict the yearly tide levels at waterfront structures namely at Ulwe Bunder and Pir-Pau. The tide data collected at Apollo Bunder, Ulwe, and Vashi for a period of lunar tidal cycle (2013) was used to train, validate and test the neural networks. These trained networks having high co-relation coefficients (R= 0.998) were used to predict the tide at Ulwe, and Vashi for its verification with the measured tide for the year 2000 & 2013. The results indicate that the predicted tide levels by ANN give reasonably accurate estimation of tide. Hence, the trained network is used to predict the yearly tide data (2015) for Ulwe. Subsequently, the yearly tide data (2015) at Pir-Pau was predicted by using the neural network which was trained with the help of measured tide data (2000) of Apollo and Pir-Pau. The analysis of measured data and study reveals that: The measured tidal data at Pir-Pau, Vashi and Ulwe indicate that there is maximum amplification of tide by about 10-20 cm with a phase lag of 10-20 minutes with reference to the tide at Apollo Bunder (Mumbai). LM training algorithm is faster than GD and with increase in number of neurons in hidden layer and the performance of the network increases. The predicted tide levels by ANN at Pir-Pau and Ulwe provides valuable information about the occurrence of high and low water levels to plan the operation of pumping at Pir-Pau and improve ship schedule at Ulwe.

Keywords: artificial neural network, back-propagation, tide data, training algorithm

Procedia PDF Downloads 483
23004 Algorithm Development of Individual Lumped Parameter Modelling for Blood Circulatory System: An Optimization Study

Authors: Bao Li, Aike Qiao, Gaoyang Li, Youjun Liu

Abstract:

Background: Lumped parameter model (LPM) is a common numerical model for hemodynamic calculation. LPM uses circuit elements to simulate the human blood circulatory system. Physiological indicators and characteristics can be acquired through the model. However, due to the different physiological indicators of each individual, parameters in LPM should be personalized in order for convincing calculated results, which can reflect the individual physiological information. This study aimed to develop an automatic and effective optimization method to personalize the parameters in LPM of the blood circulatory system, which is of great significance to the numerical simulation of individual hemodynamics. Methods: A closed-loop LPM of the human blood circulatory system that is applicable for most persons were established based on the anatomical structures and physiological parameters. The patient-specific physiological data of 5 volunteers were non-invasively collected as personalized objectives of individual LPM. In this study, the blood pressure and flow rate of heart, brain, and limbs were the main concerns. The collected systolic blood pressure, diastolic blood pressure, cardiac output, and heart rate were set as objective data, and the waveforms of carotid artery flow and ankle pressure were set as objective waveforms. Aiming at the collected data and waveforms, sensitivity analysis of each parameter in LPM was conducted to determine the sensitive parameters that have an obvious influence on the objectives. Simulated annealing was adopted to iteratively optimize the sensitive parameters, and the objective function during optimization was the root mean square error between the collected waveforms and data and simulated waveforms and data. Each parameter in LPM was optimized 500 times. Results: In this study, the sensitive parameters in LPM were optimized according to the collected data of 5 individuals. Results show a slight error between collected and simulated data. The average relative root mean square error of all optimization objectives of 5 samples were 2.21%, 3.59%, 4.75%, 4.24%, and 3.56%, respectively. Conclusions: Slight error demonstrated good effects of optimization. The individual modeling algorithm developed in this study can effectively achieve the individualization of LPM for the blood circulatory system. LPM with individual parameters can output the individual physiological indicators after optimization, which are applicable for the numerical simulation of patient-specific hemodynamics.

Keywords: blood circulatory system, individual physiological indicators, lumped parameter model, optimization algorithm

Procedia PDF Downloads 137
23003 Estimating Water Balance at Beterou Watershed, Benin Using Soil and Water Assessment Tool (SWAT) Model

Authors: Ella Sèdé Maforikan

Abstract:

Sustained water management requires quantitative information and the knowledge of spatiotemporal dynamics of hydrological system within the basin. This can be achieved through the research. Several studies have investigated both surface water and groundwater in Beterou catchment. However, there are few published papers on the application of the SWAT modeling in Beterou catchment. The objective of this study was to evaluate the performance of SWAT to simulate the water balance within the watershed. The inputs data consist of digital elevation model, land use maps, soil map, climatic data and discharge records. The model was calibrated and validated using the Sequential Uncertainty Fitting (SUFI2) approach. The calibrated started from 1989 to 2006 with four years warming up period (1985-1988); and validation was from 2007 to 2020. The goodness of the model was assessed using five indices, i.e., Nash–Sutcliffe efficiency (NSE), the ratio of the root means square error to the standard deviation of measured data (RSR), percent bias (PBIAS), the coefficient of determination (R²), and Kling Gupta efficiency (KGE). Results showed that SWAT model successfully simulated river flow in Beterou catchment with NSE = 0.79, R2 = 0.80 and KGE= 0.83 for the calibration process against validation process that provides NSE = 0.78, R2 = 0.78 and KGE= 0.85 using site-based streamflow data. The relative error (PBIAS) ranges from -12.2% to 3.1%. The parameters runoff curve number (CN2), Moist Bulk Density (SOL_BD), Base Flow Alpha Factor (ALPHA_BF), and the available water capacity of the soil layer (SOL_AWC) were the most sensitive parameter. The study provides further research with uncertainty analysis and recommendations for model improvement and provision of an efficient means to improve rainfall and discharges measurement data.

Keywords: watershed, water balance, SWAT modeling, Beterou

Procedia PDF Downloads 55