Search results for: high-dimensional data analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 42069

Search results for: high-dimensional data analysis

41049 Constructing a Bayesian Network for Solar Energy in Egypt Using Life Cycle Analysis and Machine Learning Algorithms

Authors: Rawaa H. El-Bidweihy, Hisham M. Abdelsalam, Ihab A. El-Khodary

Abstract:

In an era where machines run and shape our world, the need for a stable, non-ending source of energy emerges. In this study, the focus was on the solar energy in Egypt as a renewable source, the most important factors that could affect the solar energy’s market share throughout its life cycle production were analyzed and filtered, the relationships between them were derived before structuring a Bayesian network. Also, forecasted models were built for multiple factors to predict the states in Egypt by 2035, based on historical data and patterns, to be used as the nodes’ states in the network. 37 factors were found to might have an impact on the use of solar energy and then were deducted to 12 factors that were chosen to be the most effective to the solar energy’s life cycle in Egypt, based on surveying experts and data analysis, some of the factors were found to be recurring in multiple stages. The presented Bayesian network could be used later for scenario and decision analysis of using solar energy in Egypt, as a stable renewable source for generating any type of energy needed.

Keywords: ARIMA, auto correlation, Bayesian network, forecasting models, life cycle, partial correlation, renewable energy, SARIMA, solar energy

Procedia PDF Downloads 155
41048 Privacy Preservation Concerns and Information Disclosure on Social Networks: An Ongoing Research

Authors: Aria Teimourzadeh, Marc Favier, Samaneh Kakavand

Abstract:

The emergence of social networks has revolutionized the exchange of information. Every behavior on these platforms contributes to the generation of data known as social network data that are processed, stored and published by the social network service providers. Hence, it is vital to investigate the role of these platforms in user data by considering the privacy measures, especially when we observe the increased number of individuals and organizations engaging with the current virtual platforms without being aware that the data related to their positioning, connections and behavior is uncovered and used by third parties. Performing analytics on social network datasets may result in the disclosure of confidential information about the individuals or organizations which are the members of these virtual environments. Analyzing separate datasets can reveal private information about relationships, interests and more, especially when the datasets are analyzed jointly. Intentional breaches of privacy is the result of such analysis. Addressing these privacy concerns requires an understanding of the nature of data being accumulated and relevant data privacy regulations, as well as motivations for disclosure of personal information on social network platforms. Some significant points about how user's online information is controlled by the influence of social factors and to what extent the users are concerned about future use of their personal information by the organizations, are highlighted in this paper. Firstly, this research presents a short literature review about the structure of a network and concept of privacy in Online Social Networks. Secondly, the factors of user behavior related to privacy protection and self-disclosure on these virtual communities are presented. In other words, we seek to demonstrates the impact of identified variables on user information disclosure that could be taken into account to explain the privacy preservation of individuals on social networking platforms. Thirdly, a few research directions are discussed to address this topic for new researchers.

Keywords: information disclosure, privacy measures, privacy preservation, social network analysis, user experience

Procedia PDF Downloads 281
41047 Analytical Slope Stability Analysis Based on the Statistical Characterization of Soil Shear Strength

Authors: Bernardo C. P. Albuquerque, Darym J. F. Campos

Abstract:

Increasing our ability to solve complex engineering problems is directly related to the processing capacity of computers. By means of such equipments, one is able to fast and accurately run numerical algorithms. Besides the increasing interest in numerical simulations, probabilistic approaches are also of great importance. This way, statistical tools have shown their relevance to the modelling of practical engineering problems. In general, statistical approaches to such problems consider that the random variables involved follow a normal distribution. This assumption tends to provide incorrect results when skew data is present since normal distributions are symmetric about their means. Thus, in order to visualize and quantify this aspect, 9 statistical distributions (symmetric and skew) have been considered to model a hypothetical slope stability problem. The data modeled is the friction angle of a superficial soil in Brasilia, Brazil. Despite the apparent universality, the normal distribution did not qualify as the best fit. In the present effort, data obtained in consolidated-drained triaxial tests and saturated direct shear tests have been modeled and used to analytically derive the probability density function (PDF) of the safety factor of a hypothetical slope based on Mohr-Coulomb rupture criterion. Therefore, based on this analysis, it is possible to explicitly derive the failure probability considering the friction angle as a random variable. Furthermore, it is possible to compare the stability analysis when the friction angle is modelled as a Dagum distribution (distribution that presented the best fit to the histogram) and as a Normal distribution. This comparison leads to relevant differences when analyzed in light of the risk management.

Keywords: statistical slope stability analysis, skew distributions, probability of failure, functions of random variables

Procedia PDF Downloads 338
41046 Application of Single Subject Experimental Designs in Adapted Physical Activity Research: A Descriptive Analysis

Authors: Jiabei Zhang, Ying Qi

Abstract:

The purpose of this study was to develop a descriptive profile of the adapted physical activity research using single subject experimental designs. All research articles using single subject experimental designs published in the journal of Adapted Physical Activity Quarterly from 1984 to 2013 were employed as the data source. Each of the articles was coded in a subcategory of seven categories: (a) the size of sample; (b) the age of participants; (c) the type of disabilities; (d) the type of data analysis; (e) the type of designs, (f) the independent variable, and (g) the dependent variable. Frequencies, percentages, and trend inspection were used to analyze the data and develop a profile. The profile developed characterizes a small portion of research articles used single subject designs, in which most researchers used a small sample size, recruited children as subjects, emphasized learning and behavior impairments, selected visual inspection with descriptive statistics, preferred a multiple baseline design, focused on effects of therapy, inclusion, and strategy, and measured desired behaviors more often, with a decreasing trend over years.

Keywords: adapted physical activity research, single subject experimental designs, physical education, sport science

Procedia PDF Downloads 466
41045 Quantitative Analysis of Contract Variations Impact on Infrastructure Project Performance

Authors: Soheila Sadeghi

Abstract:

Infrastructure projects often encounter contract variations that can significantly deviate from the original tender estimates, leading to cost overruns, schedule delays, and financial implications. This research aims to quantitatively assess the impact of changes in contract variations on project performance by conducting an in-depth analysis of a comprehensive dataset from the Regional Airport Car Park project. The dataset includes tender budget, contract quantities, rates, claims, and revenue data, providing a unique opportunity to investigate the effects of variations on project outcomes. The study focuses on 21 specific variations identified in the dataset, which represent changes or additions to the project scope. The research methodology involves establishing a baseline for the project's planned cost and scope by examining the tender budget and contract quantities. Each variation is then analyzed in detail, comparing the actual quantities and rates against the tender estimates to determine their impact on project cost and schedule. The claims data is utilized to track the progress of work and identify deviations from the planned schedule. The study employs statistical analysis using R to examine the dataset, including tender budget, contract quantities, rates, claims, and revenue data. Time series analysis is applied to the claims data to track progress and detect variations from the planned schedule. Regression analysis is utilized to investigate the relationship between variations and project performance indicators, such as cost overruns and schedule delays. The research findings highlight the significance of effective variation management in construction projects. The analysis reveals that variations can have a substantial impact on project cost, schedule, and financial outcomes. The study identifies specific variations that had the most significant influence on the Regional Airport Car Park project's performance, such as PV03 (additional fill, road base gravel, spray seal, and asphalt), PV06 (extension to the commercial car park), and PV07 (additional box out and general fill). These variations contributed to increased costs, schedule delays, and changes in the project's revenue profile. The study also examines the effectiveness of project management practices in managing variations and mitigating their impact. The research suggests that proactive risk management, thorough scope definition, and effective communication among project stakeholders can help minimize the negative consequences of variations. The findings emphasize the importance of establishing clear procedures for identifying, assessing, and managing variations throughout the project lifecycle. The outcomes of this research contribute to the body of knowledge in construction project management by demonstrating the value of analyzing tender, contract, claims, and revenue data in variation impact assessment. However, the research acknowledges the limitations imposed by the dataset, particularly the absence of detailed contract and tender documents. This constraint restricts the depth of analysis possible in investigating the root causes and full extent of variations' impact on the project. Future research could build upon this study by incorporating more comprehensive data sources to further explore the dynamics of variations in construction projects.

Keywords: contract variation impact, quantitative analysis, project performance, claims analysis

Procedia PDF Downloads 40
41044 Supply Chains Resilience within Machine-Made Rug Producers in Iran

Authors: Malihe Shahidan, Azin Madhi, Meisam Shahbaz

Abstract:

In recent decades, the role of supply chains in sustaining businesses and establishing their superiority in the market has been under focus. The realization of the goals and strategies of a business enterprise is largely dependent on the cooperation of the chain, including suppliers, distributors, retailers, etc. Supply chains can potentially be disrupted by both internal and external factors. In this paper, resilience strategies have been identified and analyzed in three levels: sourcing, producing, and distributing by considering economic depression as a current risk factor for the machine-made rugs industry. In this study, semi-structured interviews for data gathering and thematic analysis for data analysis are applied. Supply chain data has been gathered from seven rug factories before and after the economic depression through semi-structured interviews. The identified strategies were derived from literature review and validated by collecting data from a group of eighteen industry and university experts, and the results were analyzed using statistical tests. Finally, the outsourcing of new products and products in the new market, the development and completion of the product portfolio, the flexibility in the composition and volume of products, the expansion of the market to price-sensitive, direct sales, and disintermediation have been determined as strategies affecting supply chain resilience of machine-made rugs' industry during an economic depression.

Keywords: distribution, economic depression, machine-made rug, outsourcing, production, sourcing, supply chain, supply chain resilience

Procedia PDF Downloads 162
41043 Meta Mask Correction for Nuclei Segmentation in Histopathological Image

Authors: Jiangbo Shi, Zeyu Gao, Chen Li

Abstract:

Nuclei segmentation is a fundamental task in digital pathology analysis and can be automated by deep learning-based methods. However, the development of such an automated method requires a large amount of data with precisely annotated masks which is hard to obtain. Training with weakly labeled data is a popular solution for reducing the workload of annotation. In this paper, we propose a novel meta-learning-based nuclei segmentation method which follows the label correction paradigm to leverage data with noisy masks. Specifically, we design a fully conventional meta-model that can correct noisy masks by using a small amount of clean meta-data. Then the corrected masks are used to supervise the training of the segmentation model. Meanwhile, a bi-level optimization method is adopted to alternately update the parameters of the main segmentation model and the meta-model. Extensive experimental results on two nuclear segmentation datasets show that our method achieves the state-of-the-art result. In particular, in some noise scenarios, it even exceeds the performance of training on supervised data.

Keywords: deep learning, histopathological image, meta-learning, nuclei segmentation, weak annotations

Procedia PDF Downloads 140
41042 Frequency Recognition Models for Steady State Visual Evoked Potential Based Brain Computer Interfaces (BCIs)

Authors: Zeki Oralhan, Mahmut Tokmakçı

Abstract:

SSVEP based brain computer interface (BCI) systems have been preferred, because of high information transfer rate (ITR) and practical use. ITR is the parameter of BCI overall performance. For high ITR value, one of specification BCI system is that has high accuracy. In this study, we investigated to recognize SSVEP with shorter time and lower error rate. In the experiment, there were 8 flickers on light crystal display (LCD). Participants gazed to flicker which had 12 Hz frequency and 50% duty cycle ratio on the LCD during 10 seconds. During the experiment, EEG signals were acquired via EEG device. The EEG data was filtered in preprocessing session. After that Canonical Correlation Analysis (CCA), Multiset CCA (MsetCCA), phase constrained CCA (PCCA), and Multiway CCA (MwayCCA) methods were applied on data. The highest average accuracy value was reached when MsetCCA was applied.

Keywords: brain computer interface, canonical correlation analysis, human computer interaction, SSVEP

Procedia PDF Downloads 266
41041 Migration, Violent Extremism and Gang Violence in Trinidad and Tobago

Authors: Raghunath Mahabir

Abstract:

This paper provides an analysis of the existing evidence on the relationships between the migration of Venezuelans into Trinidad and Tobago, violent extremism and gang violence. Arguing that there is a dearth of reliable data on the subject matter, the paper fills the gap by providing relevant definitions of terms used, discusses the sources of data and identifies the causes for this migration and the subsequent ramifications for Trinidad and Tobago and for the migrants themselves. A simple but clear classification pointing to the nexus between migration gang violence and violent extremism is developed, following the logic of migration of criminals(gang members), the need to link with local gangs and the view that certain elements within the TnT society has become radicalized to the point where violent extremism is being displayed in different ways. The paper highlights implications for further policy debate:the imperatives for more effective communication between government officials responsible for migration and those personnel who are tasked with countering violent extremism and gang violence: promoting and executing better integration and social inclusion policies which are necessary to minimize social exclusion, and the threat of violent extremist agendas emanating from both Venezuelans and Trinidadians and generally to establish strong analytical framework grounded in stronger definitions, more reliable data and other evidence which can guide further research and analysis and contribute to policy formation.

Keywords: migration, violent extremism, gangs, Venezuela

Procedia PDF Downloads 54
41040 Regional Flood-Duration-Frequency Models for Norway

Authors: Danielle M. Barna, Kolbjørn Engeland, Thordis Thorarinsdottir, Chong-Yu Xu

Abstract:

Design flood values give estimates of flood magnitude within a given return period and are essential to making adaptive decisions around land use planning, infrastructure design, and disaster mitigation. Often design flood values are needed at locations with insufficient data. Additionally, in hydrologic applications where flood retention is important (e.g., floodplain management and reservoir design), design flood values are required at different flood durations. A statistical approach to this problem is a development of a regression model for extremes where some of the parameters are dependent on flood duration in addition to being covariate-dependent. In hydrology, this is called a regional flood-duration-frequency (regional-QDF) model. Typically, the underlying statistical distribution is chosen to be the Generalized Extreme Value (GEV) distribution. However, as the support of the GEV distribution depends on both its parameters and the range of the data, special care must be taken with the development of the regional model. In particular, we find that the GEV is problematic when developing a GAMLSS-type analysis due to the difficulty of proposing a link function that is independent of the unknown parameters and the observed data. We discuss these challenges in the context of developing a regional QDF model for Norway.

Keywords: design flood values, bayesian statistics, regression modeling of extremes, extreme value analysis, GEV

Procedia PDF Downloads 72
41039 Parametric Study for Optimal Design of Hybrid Bridge Joint

Authors: Bongsik Park, Jae Hyun Park, Jae-Yeol Cho

Abstract:

Mixed structure, which is a kind of hybrid system, is incorporating steel beam and prestressed concrete beam. Hybrid bridge adopting mixed structure have some merits. Main span length can be made longer by using steel as main span material. In case of cable-stayed bridge having asymmetric span length, negative reaction at side span can be restrained without extra restraining devices by using weight difference between main span material and side span material. However angle of refraction might happen because of rigidity difference between materials and stress concentration also might happen because of abnormal loading transmission at joint in the hybrid bridge. Therefore the joint might be a weak point of the structural system and it needs to pay attention to design of the joint. However, design codes and standards about the joint in the hybrid-bridge have not been established so the joint designs in most of construction cases have been very conservative or followed previous design without extra verification. In this study parametric study using finite element analysis for optimal design of hybrid bridge joint is conducted. Before parametric study, finite element analysis was conducted based on previous experimental data and it is verified that analysis result approximated experimental data. Based on the finite element analysis results, parametric study was conducted. The parameters were selected as those have influences on joint behavior. Based on the parametric study results, optimal design of hybrid bridge joint has been determined.

Keywords: parametric study, optimal design, hybrid bridge, finite element analysis

Procedia PDF Downloads 425
41038 Identifying Strategies for Improving Railway Services in Bangladesh

Authors: Armana Sabiha Huq, Tahmina Rahman Chowdhury

Abstract:

In this paper, based on the stated preference experiment, the service quality of Bangladesh Railway has been assessed, and particular importance has been given to investigate if there exists a relationship between service quality and safety. For investigation purposes, environmental and organizational factors were assumed to determine the safety performance of the railway. Data collected from the survey has been analyzed by importance-performance analysis (IPA). In this paper, a modification of the well-known importance-performance analysis (IPA) has been done by adopting the importance of the weights determined through a structural equation modeling (SEM) approach and by plotting the gap between importance and performance on a visual graph. It has been found that there exists a relationship between safety and serviceability to some extent. Limited resources are an important factor to improve the safety and serviceability condition of the BD railway. Moreover, it is observed that the limited resources available to monitor and improve the safety performance of railway.

Keywords: importance-performance analysis, GAP-IPA, SEM, serviceability, safety, factor analysis

Procedia PDF Downloads 140
41037 Data Mining Approach for Commercial Data Classification and Migration in Hybrid Storage Systems

Authors: Mais Haj Qasem, Maen M. Al Assaf, Ali Rodan

Abstract:

Parallel hybrid storage systems consist of a hierarchy of different storage devices that vary in terms of data reading speed performance. As we ascend in the hierarchy, data reading speed becomes faster. Thus, migrating the application’ important data that will be accessed in the near future to the uppermost level will reduce the application I/O waiting time; hence, reducing its execution elapsed time. In this research, we implement trace-driven two-levels parallel hybrid storage system prototype that consists of HDDs and SSDs. The prototype uses data mining techniques to classify application’ data in order to determine its near future data accesses in parallel with the its on-demand request. The important data (i.e. the data that the application will access in the near future) are continuously migrated to the uppermost level of the hierarchy. Our simulation results show that our data migration approach integrated with data mining techniques reduces the application execution elapsed time when using variety of traces in at least to 22%.

Keywords: hybrid storage system, data mining, recurrent neural network, support vector machine

Procedia PDF Downloads 308
41036 Improvement of Analysis Vertical Oil Exploration Wells (Case Study)

Authors: Azza Hashim Abbas, Wan Rosli Wan Suliman

Abstract:

The old school of study, well testing reservoir engineers used the transient pressure analyses to get certain parameters and variable factors on the reservoir's physical properties, such as, (permeability-thickness). Recently, the difficulties facing the newly discovered areas are the convincing fact that the exploration and production (E&p) team should have sufficiently accurate and appropriate data to work with due to different sources of errors. The well-test analyst does the work without going through well-informed and reliable data from colleagues which may consequently cause immense environmental damage and unnecessary financial losses as well as opportunity losses to the project. In 2003, new potential oil field (Moga) face circulation problem well-22 was safely completed. However the high mud density had caused an extensive damage to the nearer well area which also distracted the hypothetical oil rate of flow that was not representive of the real reservoir characteristics This paper presents methods to analyze and interpret the production rate and pressure data of an oil field. Specifically for Well- 22 using the Deconvolution technique to enhance the transient pressure .Applying deconvolution to get the best range of certainty of results needed for the next subsequent operation. The range determined and analysis of skin factor range was reasonable.

Keywords: well testing, exploration, deconvolution, skin factor, un certainity

Procedia PDF Downloads 445
41035 Discussion on Big Data and One of Its Early Training Application

Authors: Fulya Gokalp Yavuz, Mark Daniel Ward

Abstract:

This study focuses on a contemporary and inevitable topic of Data Science and its exemplary application for early career building: Big Data and Leaving Learning Community (LLC). ‘Academia’ and ‘Industry’ have a common sense on the importance of Big Data. However, both of them are in a threat of missing the training on this interdisciplinary area. Some traditional teaching doctrines are far away being effective on Data Science. Practitioners needs some intuition and real-life examples how to apply new methods to data in size of terabytes. We simply explain the scope of Data Science training and exemplified its early stage application with LLC, which is a National Science Foundation (NSF) founded project under the supervision of Prof. Ward since 2014. Essentially, we aim to give some intuition for professors, researchers and practitioners to combine data science tools for comprehensive real-life examples with the guides of mentees’ feedback. As a result of discussing mentoring methods and computational challenges of Big Data, we intend to underline its potential with some more realization.

Keywords: Big Data, computation, mentoring, training

Procedia PDF Downloads 362
41034 An Analysis of Present Supplier Selection Criteria of State Pharmaceutical Corporation (SPC) Sri Lanka: A Case Study

Authors: Gamalath M. B. P. Abeysekara

Abstract:

Primary objective of any organization is to enhance the bottom line profit. Strategic procurement is one of the prominent aspects in view of receiving this ultimate objective. Strategic procurement is an activity used in each and every organization in their operations. Pharmaceutical procurement is an especially significant task for any organizations, particularly state sector concerned. The whole pharmaceutical procurement requirement of the country is procured through the State Pharmaceutical Corporation (SPC) of Sri Lanka. They follow Pharmaceutical Procurement Guideline of 2006 as the procurement principle. The main objective of this project is to identify the importance of State Pharmaceutical Corporation supplier selection criteria and critical analysis of pharmaceutical procurement procedure. State Pharmaceutical Corporations applied net price, product quality, past performance, and delivery of suppliers’ as main criteria for the selection suppliers. Data collection for this study was taken place through a questionnaire, given to fifty doctors within the Colombo district attached to five main state hospitals. Data analysis is carried out with mean and standard deviation functions. The ultimate outcomes indicated product quality, net price, and delivery of suppliers’ are the most important criteria behind the selection of suppliers. Critical analysis proved State Pharmaceutical Corporation should focus on net price reduction, improving laboratory testing facilities and effective communication between up and down stream of supply chain.

Keywords: government procurement procedure, pharmaceutical procurement supplier selection criteria, importance of SPC supplier selection criteria

Procedia PDF Downloads 451
41033 Linkage between a Plant-based Diet and Visual Impairment: A Systematic Review and Meta-Analysis

Authors: Cristina Cirone, Katrina Cirone, Monali S. Malvankar-Mehta

Abstract:

Purpose: An increased risk of visual impairment has been observed in individuals lacking a balanced diet. The purpose of this paper is to characterize the relationship between plant-based diets and specific ocular outcomes among adults. Design: Systematic review and meta-analysis. Methods: This systematic review and meta-analysis were conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement guidelines. The databases MEDLINE, EMBASE, Cochrane, and PubMed, were systematically searched up until May 27, 2021. Of the 503 articles independently screened by two reviewers, 21 were included in this review. Quality assessment and data extraction were performed by both reviewers. Meta-analysis was conducted using STATA 15.0. Fixed-effect and random-effect models were computed based on heterogeneity. Results: A total of 503 studies were identified which then underwent duplicate removal and a title and abstract screen. The remaining 61 studies underwent a full-text screen, 21 progressed to data extraction and fifteen were included in the quantitative analysis. Meta-analysis indicated that regular consumption of fish (OR = 0.70; CI: [0.62-0.79]) and skim milk, poultry, and non-meat animal products (OR = 0.70; CI: [0.61-0.79]) is positively correlated with a reduced risk of visual impairment (age-related macular degeneration, age-related maculopathy, cataract development, and central geographic atrophy) among adults. Consumption of red meat [OR = 1.41; CI: [1.07-1.86]) is associated with an increased risk of visual impairment. Conclusion: Overall, a pescatarian diet is associated with the most favorable visual outcomes among adults, while the consumption of red meat appears to negatively impact vision. Results suggest a need for more local and government-led interventions promoting a healthy and balanced diet.

Keywords: plant-based diet, pescatarian diet, visual impairment, systematic review, meta-analysis

Procedia PDF Downloads 185
41032 Banks Profitability Indicators in CEE Countries

Authors: I. Erins, J. Erina

Abstract:

The aim of the present article is to determine the impact of the external and internal factors of bank performance on the profitability indicators of the CEE countries banks in the period from 2006 to 2012. On the basis of research conducted abroad on bank and macroeconomic profitability indicators, in order to obtain research results, the authors evaluated return on average assets (ROAA) and return on average equity (ROAE) indicators of the CEE countries banks. The authors analyzed profitability indicators of banks using descriptive methods, SPSS data analysis methods as well as data correlation and linear regression analysis. The authors concluded that most internal and external indicators of bank performance have no direct effect on the profitability of the banks in the CEE countries. The only exceptions are credit risk and bank size which affect one of the measures of bank profitability–return on average equity.

Keywords: banks, CEE countries, profitability ROAA, ROAE

Procedia PDF Downloads 367
41031 Towards End-To-End Disease Prediction from Raw Metagenomic Data

Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker

Abstract:

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist in multiple steps including quality control, filtering, alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often provide variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome. Using two public real-life data-sets as well a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data though mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.

Keywords: deep learning, disease prediction, end-to-end machine learning, metagenomics, multiple instance learning, precision medicine

Procedia PDF Downloads 125
41030 A Two-Stage Bayesian Variable Selection Method with the Extension of Lasso for Geo-Referenced Data

Authors: Georgiana Onicescu, Yuqian Shen

Abstract:

Due to the complex nature of geo-referenced data, multicollinearity of the risk factors in public health spatial studies is a commonly encountered issue, which leads to low parameter estimation accuracy because it inflates the variance in the regression analysis. To address this issue, we proposed a two-stage variable selection method by extending the least absolute shrinkage and selection operator (Lasso) to the Bayesian spatial setting, investigating the impact of risk factors to health outcomes. Specifically, in stage I, we performed the variable selection using Bayesian Lasso and several other variable selection approaches. Then, in stage II, we performed the model selection with only the selected variables from stage I and compared again the methods. To evaluate the performance of the two-stage variable selection methods, we conducted a simulation study with different distributions for the risk factors, using geo-referenced count data as the outcome and Michigan as the research region. We considered the cases when all candidate risk factors are independently normally distributed, or follow a multivariate normal distribution with different correlation levels. Two other Bayesian variable selection methods, Binary indicator, and the combination of Binary indicator and Lasso were considered and compared as alternative methods. The simulation results indicated that the proposed two-stage Bayesian Lasso variable selection method has the best performance for both independent and dependent cases considered. When compared with the one-stage approach, and the other two alternative methods, the two-stage Bayesian Lasso approach provides the highest estimation accuracy in all scenarios considered.

Keywords: Lasso, Bayesian analysis, spatial analysis, variable selection

Procedia PDF Downloads 143
41029 Applications of Greenhouse Data in Guatemala in the Analysis of Sustainability Indicators

Authors: Maria A. Castillo H., Andres R. Leandro, Jose F. Bienvenido B.

Abstract:

In 2015, Guatemala officially adopted the Sustainable Development Goals (SDG) according to the 2030 Agenda agreed by the United Nations Organization. In 2016, these objectives and goals were reviewed, and the National Priorities were established within the K'atún 2032 National Development Plan. In 2019 and 2021, progress was evaluated with 120 defined indicators, and the need to improve quality and availability of statistical data necessary for the analysis of sustainability indicators was detected, so the values to be reached in 2024 and 2032 were adjusted. The need for greater agricultural technology is one of the priorities established within SDG 2 "Zero Hunger". Within this area, protected agricultural production provides greater productivity throughout the year, reduces the use of chemical products to control pests and diseases, reduces the negative impact of climate and improves product quality. During the crisis caused by Covid-19, there was an increase in exports of fruits and vegetables produced in greenhouses from Guatemala. However, this information has not been considered in the 2021 revision of the Plan. The objective of this study is to evaluate the information available on Greenhouse Agricultural Production and its integration into the Sustainability Indicators for Guatemala. This study was carried out in four phases: 1. Analysis of the Goals established for SDG 2 and the indicators included in the K'atún Plan. 2. Analysis of Environmental, Social and Economic Indicator Models. 3. Definition of territorial levels in 2 geographic scales: Departments and Municipalities. 4. Diagnosis of the available data on technological agricultural production with emphasis on Greenhouses at the 2 geographical scales. A summary of the results is presented for each phase and finally some recommendations for future research are added. The main contribution of this work is to improve the available data that allow the incorporation of some agricultural technology indicators in the established goals, to evaluate their impact on Food Security and Nutrition, Employment and Investment, Poverty, the use of Water and Natural Resources, and to provide a methodology applicable to other production models and other geographical areas.

Keywords: greenhouses, protected agriculture, sustainable indicators, Guatemala, sustainability, SDG

Procedia PDF Downloads 85
41028 Analysis of Factors Affecting the Number of Infant and Maternal Mortality in East Java with Geographically Weighted Bivariate Generalized Poisson Regression Method

Authors: Luh Eka Suryani, Purhadi

Abstract:

Poisson regression is a non-linear regression model with response variable in the form of count data that follows Poisson distribution. Modeling for a pair of count data that show high correlation can be analyzed by Poisson Bivariate Regression. Data, the number of infant mortality and maternal mortality, are count data that can be analyzed by Poisson Bivariate Regression. The Poisson regression assumption is an equidispersion where the mean and variance values are equal. However, the actual count data has a variance value which can be greater or less than the mean value (overdispersion and underdispersion). Violations of this assumption can be overcome by applying Generalized Poisson Regression. Characteristics of each regency can affect the number of cases occurred. This issue can be overcome by spatial analysis called geographically weighted regression. This study analyzes the number of infant mortality and maternal mortality based on conditions in East Java in 2016 using Geographically Weighted Bivariate Generalized Poisson Regression (GWBGPR) method. Modeling is done with adaptive bisquare Kernel weighting which produces 3 regency groups based on infant mortality rate and 5 regency groups based on maternal mortality rate. Variables that significantly influence the number of infant and maternal mortality are the percentages of pregnant women visit health workers at least 4 times during pregnancy, pregnant women get Fe3 tablets, obstetric complication handled, clean household and healthy behavior, and married women with the first marriage age under 18 years.

Keywords: adaptive bisquare kernel, GWBGPR, infant mortality, maternal mortality, overdispersion

Procedia PDF Downloads 159
41027 Towards a Secure Storage in Cloud Computing

Authors: Mohamed Elkholy, Ahmed Elfatatry

Abstract:

Cloud computing has emerged as a flexible computing paradigm that reshaped the Information Technology map. However, cloud computing brought about a number of security challenges as a result of the physical distribution of computational resources and the limited control that users have over the physical storage. This situation raises many security challenges for data integrity and confidentiality as well as authentication and access control. This work proposes a security mechanism for data integrity that allows a data owner to be aware of any modification that takes place to his data. The data integrity mechanism is integrated with an extended Kerberos authentication that ensures authorized access control. The proposed mechanism protects data confidentiality even if data are stored on an untrusted storage. The proposed mechanism has been evaluated against different types of attacks and proved its efficiency to protect cloud data storage from different malicious attacks.

Keywords: access control, data integrity, data confidentiality, Kerberos authentication, cloud security

Procedia PDF Downloads 335
41026 Effect of Aryl Imidazolium Ionic Liquids as Asphaltene Dispersants

Authors: Raghda Ahmed El-Nagar

Abstract:

Oil spills are one of the most serious environmental issues that have occurred during the production and transportation of petroleum crude oil. Chemical asphaltene dispersants are hazardous to the marine environment, so Ionic liquids (ILs) as asphaltene dispersants are a critical area of study. In this work, different aryl imidazolium ionic liquids were synthesized with high yield and elucidated via tools of analysis (Elemental analysis, FT-IR, and 1H-NMR). Thermogravimetric analysis confirmed that the prepared ILs posses high thermal stability. The critical micelle concentration (CMC), surface tension, and emulsification index were investigated. Evaluation of synthesized ILs as asphaltene dispersants were assessed at various concentrations, and data reveals high dispersion efficiency.

Keywords: ionic liquids, oil spill, asphaltene dispersants, CMC, efficiency

Procedia PDF Downloads 194
41025 Consumer Values in the Perspective of Javanese Mataraman Society: Identification, Meaning, and Application

Authors: Anna Triwijayati, Etsa Astridya Setiyati, Titik Desi Harsoyo

Abstract:

Culture is the important determinant of human behavior and desire. Culture influences the consumer through the norms and values established by the society in which they live and reflect it. The cultural values of Javanese society certainly have united in the Javanese society behavior in consumption. This research is expected to give big enough theoretical benefits in the findings of cultural value in consumption in Javanese society. These can be an incentive in finding the local cultural value in many tribes in Indonesia, so one time, the local cultural value in Indonesia about consumption can be fundamental part in education and consumption practice in Indonesia. The approach used in this research is non positivist research or is known as qualitative approach. The method or type of research used in this research is ethnomethodology. The collection data is done in Central Java region. The research subject or informant is determined by the purposive technique by certain criteria determined by the researcher. The data is collected by deep interview and observation. Before the data analysis, the researcher does the storing method data stage and implements the data validity procedures. Then, the data is analyzed by the theme and interactive analysis technique. The Javanese Mataraman society has such consumption values such as has to be sufficient, be careful, economical, submit to the one who creates the life, the way life flow, and the present problem is thought in the present also. In the financial management for consumption, the consumer should have the simple life principles, has to be sufficient, has to be able to eat, has to be able to self-press, well-managed/diligent/accurate/careful, the open or transparent management, has the struggle effort, like to self-sacrifice and think about the future. The meaning of consumption value in family is centered to the submission and full-trust to God. These consumption values are applied in consumer behavior in self, family, investment and credit need in short term and long term perspective.

Keywords: values, consumer, consumption, Javanese Mataraman, ethnomethodology

Procedia PDF Downloads 392
41024 Use of the Gas Chromatography Method for Hydrocarbons' Quality Evaluation in the Offshore Fields of the Baltic Sea

Authors: Pavel Shcherban, Vlad Golovanov

Abstract:

Currently, there is an active geological exploration and development of the subsoil shelf of the Kaliningrad region. To carry out a comprehensive and accurate assessment of the volumes and degree of extraction of hydrocarbons from open deposits, it is necessary to establish not only a number of geological and lithological characteristics of the structures under study, but also to determine the oil quality, its viscosity, density, fractional composition as accurately as possible. In terms of considered works, gas chromatography is one of the most capacious methods that allow the rapid formation of a significant amount of initial data. The aspects of the application of the gas chromatography method for determining the chemical characteristics of the hydrocarbons of the Kaliningrad shelf fields are observed in the article, as well as the correlation-regression analysis of these parameters in comparison with the previously obtained chemical characteristics of hydrocarbon deposits located on the land of the region. In the process of research, a number of methods of mathematical statistics and computer processing of large data sets have been applied, which makes it possible to evaluate the identity of the deposits, to specify the amount of reserves and to make a number of assumptions about the genesis of the hydrocarbons under analysis.

Keywords: computer processing of large databases, correlation-regression analysis, hydrocarbon deposits, method of gas chromatography

Procedia PDF Downloads 157
41023 Multivariate Control Chart to Determine Efficiency Measurements in Industrial Processes

Authors: J. J. Vargas, N. Prieto, L. A. Toro

Abstract:

Control charts are commonly used to monitor processes involving either variable or attribute of quality characteristics and determining the control limits as a critical task for quality engineers to improve the processes. Nonetheless, in some applications it is necessary to include an estimation of efficiency. In this paper, the ability to define the efficiency of an industrial process was added to a control chart by means of incorporating a data envelopment analysis (DEA) approach. In depth, a Bayesian estimation was performed to calculate the posterior probability distribution of parameters as means and variance and covariance matrix. This technique allows to analyse the data set without the need of using the hypothetical large sample implied in the problem and to be treated as an approximation to the finite sample distribution. A rejection simulation method was carried out to generate random variables from the parameter functions. Each resulting vector was used by stochastic DEA model during several cycles for establishing the distribution of each efficiency measures for each DMU (decision making units). A control limit was calculated with model obtained and if a condition of a low level efficiency of DMU is presented, system efficiency is out of control. In the efficiency calculated a global optimum was reached, which ensures model reliability.

Keywords: data envelopment analysis, DEA, Multivariate control chart, rejection simulation method

Procedia PDF Downloads 373
41022 Urbanization and Income Inequality in Thailand

Authors: Acumsiri Tantikarnpanit

Abstract:

This paper aims to examine the relationship between urbanization and income inequality in Thailand during the period 2002–2020. Using a panel of data for 76 provinces collected from Thailand’s National Statistical Office (Labor Force Survey: LFS), as well as geospatial data from the U.S. Air Force Defense Meteorological Satellite Program (DMSP) and the Visible Infrared Imaging Radiometer Suite Day/Night band (VIIRS-DNB) satellite for nineteen selected years. This paper employs two different definitions to identify urban areas: 1) Urban areas defined by Thailand's National Statistical Office (Labor Force Survey: LFS), and 2) Urban areas estimated using nighttime light data from the DMSP and VIIRS-DNB satellite. The second method includes two sub-categories: 2.1) Determining urban areas by calculating nighttime light density with a population density of 300 people per square kilometer, and 2.2) Calculating urban areas based on nighttime light density corresponding to a population density of 1,500 people per square kilometer. The empirical analysis based on Ordinary Least Squares (OLS), fixed effects, and random effects models reveals a consistent U-shaped relationship between income inequality and urbanization. The findings from the econometric analysis demonstrate that urbanization or population density has a significant and negative impact on income inequality. Moreover, the square of urbanization shows a statistically significant positive impact on income inequality. Additionally, there is a negative association between logarithmically transformed income and income inequality. This paper also proposes the inclusion of satellite imagery, geospatial data, and spatial econometric techniques in future studies to conduct quantitative analysis of spatial relationships.

Keywords: income inequality, nighttime light, population density, Thailand, urbanization

Procedia PDF Downloads 76
41021 Mediation of the Middle Eastern Crises and Economic Growth: An Application of Times Series Analysis

Authors: Gokhan Erkal, Gulsen Aydin, Muge Yuce, Lokman Sahin

Abstract:

This study aims to analyze the impacts of involving in mediation of conflicts in the Middle East from the perspective of the economic growth of the mediators. The Middle East is a highly volatile region of the world with rampant crises whose affects spill beyond its borders. Therefore, management and resolution of the conflicts in the region are of great significance. Mediation is an instrument used for abating violence and settling dispute. The recourse to mediation has grown to an important degree in recent years. However, for mediators, it is a daunting task to involve in the mediation of the deadlocks in the Middle East. This study tries to shed light on the positive correlation between economic growth of the mediator and the successful outcome of the mediation process to provide motivation for mediators. To this end, first, it briefly introduces the conflicts ongoing in the region and their negative impacts. Second, the methodology, time series analysis, and the data to be used, International Crisis Behavior Project Data, are presented. Third, the empirical test is carried out and the findings are evaluated. The conclusion highlights the benefits of successful mediation for the economic growth of the mediators of Middle Eastern crises.

Keywords: international crises, mediation, Middle East, times series analysis

Procedia PDF Downloads 175
41020 Design and Implementation of Security Middleware for Data Warehouse Signature, Framework

Authors: Mayada Al Meghari

Abstract:

Recently, grid middlewares have provided large integrated use of network resources as the shared data and the CPU to become a virtual supercomputer. In this work, we present the design and implementation of the middleware for Data Warehouse Signature, DWS Framework. The aim of using the middleware in our DWS framework is to achieve the high performance by the parallel computing. This middleware is developed on Alchemi.Net framework to increase the security among the network nodes through the authentication and group-key distribution model. This model achieves the key security and prevents any intermediate attacks in the middleware. This paper presents the flow process structures of the middleware design. In addition, the paper ensures the implementation of security for DWS middleware enhancement with the authentication and group-key distribution model. Finally, from the analysis of other middleware approaches, the developed middleware of DWS framework is the optimal solution of a complete covering of security issues.

Keywords: middleware, parallel computing, data warehouse, security, group-key, high performance

Procedia PDF Downloads 119