Search results for: data combining
23616 Thick Data Analytics for Learning Cataract Severity: A Triplet Loss Siamese Neural Network Model
Authors: Jinan Fiaidhi, Sabah Mohammed
Abstract:
Diagnosing cataract severity is an important factor in deciding to undertake surgery. It is usually conducted by an ophthalmologist or through taking a variety of fundus photography that needs to be examined by the ophthalmologist. This paper carries out an investigation using a Siamese neural net that can be trained with small anchor samples to score cataract severity. The model used in this paper is based on a triplet loss function that takes the ophthalmologist best experience in rating positive and negative anchors to a specific cataract scaling system. This approach that takes the heuristics of the ophthalmologist is generally called the thick data approach, which is a kind of machine learning approach that learn from a few shots. Clinical Relevance: The lens of the eye is mostly made up of water and proteins. A cataract occurs when these proteins at the eye lens start to clump together and block lights causing impair vision. This research aims at employing thick data machine learning techniques to rate the severity of the cataract using Siamese neural network.Keywords: thick data analytics, siamese neural network, triplet-loss model, few shot learning
Procedia PDF Downloads 10923615 Case Study Analysis for Driver's Company in the Transport Sector with the Help of Data Mining
Authors: Diana Katherine Gonzalez Galindo, David Rolando Suarez Mora
Abstract:
With this study, we used data mining as a new alternative of the solution to evaluate the comments of the customers in order to find a pattern that helps us to determine some behaviors to reduce the deactivation of the partners of the LEVEL app. In one of the greatest business created in the last times, the partners are being affected due to an internal process that compensates the customer for a bad experience, but these comments could be false towards the driver, that’s why we made an investigation to collect information to restructure this process, many partners have been disassociated due to this internal process and many of them refuse the comments given by the customer. The main methodology used in this case study is the observation, we recollect information in real time what gave us the opportunity to see the most common issues to get the most accurate solution. With this new process helped by data mining, we could get a prediction based on the behaviors of the customer and some basic data recollected such as the age, the gender, and others; this could help us in future to improve another process. This investigation gives more opportunities to the partner to keep his account active even if the customer writes a message through the app. The term is trying to avoid a recession of drivers in the future offering improving in the processes, at the same time we are in search of stablishing a strategy which benefits both the app’s managers and the associated driver.Keywords: agent, driver, deactivation, rider
Procedia PDF Downloads 27823614 Image Compression Using Block Power Method for SVD Decomposition
Authors: El Asnaoui Khalid, Chawki Youness, Aksasse Brahim, Ouanan Mohammed
Abstract:
In these recent decades, the important and fast growth in the development and demand of multimedia products is contributing to an insufficient in the bandwidth of device and network storage memory. Consequently, the theory of data compression becomes more significant for reducing the data redundancy in order to save more transfer and storage of data. In this context, this paper addresses the problem of the lossless and the near-lossless compression of images. This proposed method is based on Block SVD Power Method that overcomes the disadvantages of Matlab's SVD function. The experimental results show that the proposed algorithm has a better compression performance compared with the existing compression algorithms that use the Matlab's SVD function. In addition, the proposed approach is simple and can provide different degrees of error resilience, which gives, in a short execution time, a better image compression.Keywords: image compression, SVD, block SVD power method, lossless compression, near lossless
Procedia PDF Downloads 38423613 Real-Time Pedestrian Detection Method Based on Improved YOLOv3
Authors: Jingting Luo, Yong Wang, Ying Wang
Abstract:
Pedestrian detection in image or video data is a very important and challenging task in security surveillance. The difficulty of this task is to locate and detect pedestrians of different scales in complex scenes accurately. To solve these problems, a deep neural network (RT-YOLOv3) is proposed to realize real-time pedestrian detection at different scales in security monitoring. RT-YOLOv3 improves the traditional YOLOv3 algorithm. Firstly, the deep residual network is added to extract vehicle features. Then six convolutional neural networks with different scales are designed and fused with the corresponding scale feature maps in the residual network to form the final feature pyramid to perform pedestrian detection tasks. This method can better characterize pedestrians. In order to further improve the accuracy and generalization ability of the model, a hybrid pedestrian data set training method is used to extract pedestrian data from the VOC data set and train with the INRIA pedestrian data set. Experiments show that the proposed RT-YOLOv3 method achieves 93.57% accuracy of mAP (mean average precision) and 46.52f/s (number of frames per second). In terms of accuracy, RT-YOLOv3 performs better than Fast R-CNN, Faster R-CNN, YOLO, SSD, YOLOv2, and YOLOv3. This method reduces the missed detection rate and false detection rate, improves the positioning accuracy, and meets the requirements of real-time detection of pedestrian objects.Keywords: pedestrian detection, feature detection, convolutional neural network, real-time detection, YOLOv3
Procedia PDF Downloads 14123612 Multichannel Analysis of the Surface Waves of Earth Materials in Some Parts of Lagos State, Nigeria
Authors: R. B. Adegbola, K. F. Oyedele, L. Adeoti
Abstract:
We present a method that utilizes Multi-channel Analysis of Surface Waves, which was used to measure shear wave velocities with a view to establishing the probable causes of road failure, subsidence and weakening of structures in some Local Government Area, Lagos, Nigeria. Multi channel Analysis of Surface waves (MASW) data were acquired using 24-channel seismograph. The acquired data were processed and transformed into two-dimensional (2-D) structure reflective of depth and surface wave velocity distribution within a depth of 0–15m beneath the surface using SURFSEIS software. The shear wave velocity data were compared with other geophysical/borehole data that were acquired along the same profile. The comparison and correlation illustrates the accuracy and consistency of MASW derived-shear wave velocity profiles. Rigidity modulus and N-value were also generated. The study showed that the low velocity/very low velocity are reflective of organic clay/peat materials and thus likely responsible for the failed, subsidence/weakening of structures within the study areas.Keywords: seismograph, road failure, rigidity modulus, N-value, subsidence
Procedia PDF Downloads 36023611 Eco-Products in Day-to-Day Life: A Catalyst for Achieving Sustainability
Authors: Rani Fernandez
Abstract:
As global concerns regarding environmental degradation and climate change intensify, the imperative for sustainable living has never been more critical. This research delves into the role of eco-products in everyday life as a pivotal strategy for achieving sustainability. The study investigates the awareness, adoption, and impact of eco-friendly products on individual and community levels. The research employs a mixed-methods approach, combining surveys, interviews, and case studies to explore consumer perceptions, behaviours, and motivations surrounding the use of eco-products. Additionally, life cycle assessments are conducted to evaluate the environmental footprint of selected eco-products, shedding light on their tangible contributions to sustainability. The findings reveal the diverse range of eco-products available in the market, from biodegradable packaging to energy-efficient appliances, and the extent to which consumers integrate these products into their daily routines. Moreover, the research examines the challenges and opportunities associated with widespread adoption, considering factors such as cost, accessibility, and efficacy. In addition to individual consumption patterns, the study investigates the broader societal impact of eco-product integration. It explores the potential for eco-products to drive systemic change by influencing supply chains, corporate practices, and government policies. The research highlights successful case studies of communities or businesses that have effectively incorporated eco-products, providing valuable insights into scalable models for sustainability. Ultimately, this research contributes to the discourse on sustainable living by elucidating the pivotal role of eco-products in shaping environmentally conscious behaviours. By understanding the dynamics of eco-product adoption, policymakers, businesses, and individuals can collaboratively work towards a more sustainable future. The implications of this study extend beyond academia, informing practical strategies for fostering a global shift towards sustainable consumption and production.Keywords: eco-friendly, sustainablity, environment, climate change
Procedia PDF Downloads 4023610 Use of Statistical Correlations for the Estimation of Shear Wave Velocity from Standard Penetration Test-N-Values: Case Study of Algiers Area
Authors: Soumia Merat, Lynda Djerbal, Ramdane Bahar, Mohammed Amin Benbouras
Abstract:
Along with shear wave, many soil parameters are associated with the standard penetration test (SPT) as a dynamic in situ experiment. Both SPT-N data and geophysical data do not often exist in the same area. Statistical analysis of correlation between these parameters is an alternate method to estimate Vₛ conveniently and without additional investigations or data acquisition. Shear wave velocity is a basic engineering tool required to define dynamic properties of soils. In many instances, engineers opt for empirical correlations between shear wave velocity (Vₛ) and reliable static field test data like standard penetration test (SPT) N value, CPT (Cone Penetration Test) values, etc., to estimate shear wave velocity or dynamic soil parameters. The relation between Vs and SPT- N values of Algiers area is predicted using the collected data, and it is also compared with the previously suggested formulas of Vₛ determination by measuring Root Mean Square Error (RMSE) of each model. Algiers area is situated in high seismic zone (Zone III [RPA 2003: réglement parasismique algerien]), therefore the study is important for this region. The principal aim of this paper is to compare the field measurements of Down-hole test and the empirical models to show which one of these proposed formulas are applicable to predict and deduce shear wave velocity values.Keywords: empirical models, RMSE, shear wave velocity, standard penetration test
Procedia PDF Downloads 33723609 A New Authenticable Steganographic Method via the Use of Numeric Data on Public Websites
Authors: Che-Wei Lee, Bay-Erl Lai
Abstract:
A new steganographic method via the use of numeric data on public websites with self-authentication capability is proposed. The proposed technique transforms a secret message into partial shares by Shamir’s (k, n)-threshold secret sharing scheme with n = k + 1. The generated k+1 partial shares then are embedded into the selected numeric items in a website as if they are part of the website’s numeric content. Afterward, a receiver links to the website and extracts every k shares among the k+1 ones from the stego-numeric-content to compute k+1 copies of the secret, and the phenomenon of value consistency of the computed k+1 copies is taken as an evidence to determine whether the extracted message is authentic or not, attaining the goal of self-authentication of the extracted secret message. Experimental results and discussions are provided to show the feasibility and effectiveness of the proposed method.Keywords: steganography, data hiding, secret authentication, secret sharing
Procedia PDF Downloads 24323608 A Novel Approach to Design of EDDR Architecture for High Speed Motion Estimation Testing Applications
Authors: T. Gangadhararao, K. Krishna Kishore
Abstract:
Motion Estimation (ME) plays a critical role in a video coder, testing such a module is of priority concern. While focusing on the testing of ME in a video coding system, this work presents an error detection and data recovery (EDDR) design, based on the residue-and-quotient (RQ) code, to embed into ME for video coding testing applications. An error in processing Elements (PEs), i.e. key components of a ME, can be detected and recovered effectively by using the proposed EDDR design. The proposed EDDR design for ME testing can detect errors and recover data with an acceptable area overhead and timing penalty.Keywords: area overhead, data recovery, error detection, motion estimation, reliability, residue-and-quotient (RQ) code
Procedia PDF Downloads 43023607 An Effective Route to Control of the Safety of Accessing and Storing Data in the Cloud-Based Data Base
Authors: Omid Khodabakhshi, Amir Rozdel
Abstract:
The subject of cloud computing security research has allocated a number of challenges and competitions because the data center is comprised of complex private information and are always faced various risks of information disclosure by hacker attacks or internal enemies. Accordingly, the security of virtual machines in the cloud computing infrastructure layer is very important. So far, there are many software solutions to develop security in virtual machines. But using software alone is not enough to solve security problems. The purpose of this article is to examine the challenges and security requirements for accessing and storing data in an insecure cloud environment. In other words, in this article, a structure is proposed for the implementation of highly isolated security-sensitive codes using secure computing hardware in virtual environments. It also allows remote code validation with inputs and outputs. We provide these security features even in situations where the BIOS, the operating system, and even the super-supervisor are infected. To achieve these goals, we will use the hardware support provided by the new Intel and AMD processors, as well as the TPM security chip. In conclusion, the use of these technologies ultimately creates a root of dynamic trust and reduces TCB to security-sensitive codes.Keywords: code, cloud computing, security, virtual machines
Procedia PDF Downloads 18923606 Ta(l)king Pictures: Development of an Educational Program (SELVEs) for Adolescents Combining Social-Emotional Learning and Photography Taking
Authors: Adi Gielgun-Katz, Alina S. Rusu
Abstract:
In the last two decades, education systems worldwide have integrated new pedagogical methods and strategies in lesson plans, such as innovative technologies, social-emotional learning (SEL), gamification, mixed learning, multiple literacies, and many others. Visual language, such as photographs, is known to transcend cultures and languages, and it is commonly used by youth to express positions and affective states in social networks. Therefore, visual language needs more educational attention as a linguistic and communicative component that can create connectedness among the students and their teachers. Nowadays, when SEL is gaining more and more space and meaning in the area of academic improvement in relation to social well-being, and taking and sharing pictures is part of the everyday life of the majority of people, it becomes natural to add the visual language to SEL approach as a reinforcement strategy for connecting education to the contemporary culture and language of the youth. This article presents a program conducted in a high school class in Israel, which combines the five SEL with photography techniques, i.e., Social-Emotional Learning Visual Empowerments (SELVEs) program (experimental group). Another class of students from the same institution represents the control group, which is participating in the SEL program without the photography component. The SEL component of the programs addresses skills such as: troubleshooting, uncertainty, personal strengths and collaboration, accepting others, control of impulses, communication, self-perception, and conflict resolution. The aim of the study is to examine the effects of programs on the level of the five SEL aspects in the two groups of high school students: Self-Awareness, Social Awareness, Self-Management, Responsible Decision Making, and Relationship Skills. The study presents a quantitative assessment of the SEL programs’ impact on the students. The main hypothesis is that the students’ questionnaires' analysis will reveal a better understanding and improvement of the five aspects of the SEL in the group of students involved in the photography-enhanced SEL program.Keywords: social-emotional learning, photography, education program, adolescents
Procedia PDF Downloads 8423605 Identifying the Factors affecting on the Success of Energy Usage Saving in Municipality of Tehran
Authors: Rojin Bana Derakhshan, Abbas Toloie
Abstract:
For the purpose of optimizing and developing energy efficiency in building, it is required to recognize key elements of success in optimization of energy consumption before performing any actions. Surveying Principal Components is one of the most valuable result of Linear Algebra because the simple and non-parametric methods are become confusing. So that energy management system implemented according to energy management system international standard ISO50001:2011 and all energy parameters in building to be measured through performing energy auditing. In this essay by simulating used of data mining, the key impressive elements on energy saving in buildings to be determined. This approach is based on data mining statistical techniques using feature selection method and fuzzy logic and convert data from massive to compressed type and used to increase the selected feature. On the other side, influence portion and amount of each energy consumption elements in energy dissipation in percent are recognized as separated norm while using obtained results from energy auditing and after measurement of all energy consuming parameters and identified variables. Accordingly, energy saving solution divided into 3 categories, low, medium and high expense solutions.Keywords: energy saving, key elements of success, optimization of energy consumption, data mining
Procedia PDF Downloads 46723604 Analyzing the Evolution of Adverse Events in Pharmacovigilance: A Data-Driven Approach
Authors: Kwaku Damoah
Abstract:
This study presents a comprehensive data-driven analysis to understand the evolution of adverse events (AEs) in pharmacovigilance. Utilizing data from the FDA Adverse Event Reporting System (FAERS), we employed three analytical methods: rank-based, frequency-based, and percentage change analyses. These methods assessed temporal trends and patterns in AE reporting, focusing on various drug-active ingredients and patient demographics. Our findings reveal significant trends in AE occurrences, with both increasing and decreasing patterns from 2000 to 2023. This research highlights the importance of continuous monitoring and advanced analysis in pharmacovigilance, offering valuable insights for healthcare professionals and policymakers to enhance drug safety.Keywords: event analysis, FDA adverse event reporting system, pharmacovigilance, temporal trend analysis
Procedia PDF Downloads 4823603 Agglomerative Hierarchical Clustering Using the Tθ Family of Similarity Measures
Authors: Salima Kouici, Abdelkader Khelladi
Abstract:
In this work, we begin with the presentation of the Tθ family of usual similarity measures concerning multidimensional binary data. Subsequently, some properties of these measures are proposed. Finally, the impact of the use of different inter-elements measures on the results of the Agglomerative Hierarchical Clustering Methods is studied.Keywords: binary data, similarity measure, Tθ measures, agglomerative hierarchical clustering
Procedia PDF Downloads 47923602 High Resolution Sandstone Connectivity Modelling: Implications for Outcrop Geological and Its Analog Studies
Authors: Numair Ahmed Siddiqui, Abdul Hadi bin Abd Rahman, Chow Weng Sum, Wan Ismail Wan Yousif, Asif Zameer, Joel Ben-Awal
Abstract:
Advances in data capturing from outcrop studies have made possible the acquisition of high-resolution digital data, offering improved and economical reservoir modelling methods. Terrestrial laser scanning utilizing LiDAR (Light detection and ranging) provides a new method to build outcrop based reservoir models, which provide a crucial piece of information to understand heterogeneities in sandstone facies with high-resolution images and data set. This study presents the detailed application of outcrop based sandstone facies connectivity model by acquiring information gathered from traditional fieldwork and processing detailed digital point-cloud data from LiDAR to develop an intermediate small-scale reservoir sandstone facies model of the Miocene Sandakan Formation, Sabah, East Malaysia. The software RiScan pro (v1.8.0) was used in digital data collection and post-processing with an accuracy of 0.01 m and point acquisition rate of up to 10,000 points per second. We provide an accurate and descriptive workflow to triangulate point-clouds of different sets of sandstone facies with well-marked top and bottom boundaries in conjunction with field sedimentology. This will provide highly accurate qualitative sandstone facies connectivity model which is a challenge to obtain from subsurface datasets (i.e., seismic and well data). Finally, by applying this workflow, we can build an outcrop based static connectivity model, which can be an analogue to subsurface reservoir studies.Keywords: LiDAR, outcrop, high resolution, sandstone faceis, connectivity model
Procedia PDF Downloads 22323601 Spatial-Temporal Clustering Characteristics of Dengue in the Northern Region of Sri Lanka, 2010-2013
Authors: Sumiko Anno, Keiji Imaoka, Takeo Tadono, Tamotsu Igarashi, Subramaniam Sivaganesh, Selvam Kannathasan, Vaithehi Kumaran, Sinnathamby Noble Surendran
Abstract:
Dengue outbreaks are affected by biological, ecological, socio-economic and demographic factors that vary over time and space. These factors have been examined separately and still require systematic clarification. The present study aimed to investigate the spatial-temporal clustering relationships between these factors and dengue outbreaks in the northern region of Sri Lanka. Remote sensing (RS) data gathered from a plurality of satellites were used to develop an index comprising rainfall, humidity and temperature data. RS data gathered by ALOS/AVNIR-2 were used to detect urbanization, and a digital land cover map was used to extract land cover information. Other data on relevant factors and dengue outbreaks were collected through institutions and extant databases. The analyzed RS data and databases were integrated into geographic information systems, enabling temporal analysis, spatial statistical analysis and space-time clustering analysis. Our present results showed that increases in the number of the combination of ecological factor and socio-economic and demographic factors with above the average or the presence contribute to significantly high rates of space-time dengue clusters.Keywords: ALOS/AVNIR-2, dengue, space-time clustering analysis, Sri Lanka
Procedia PDF Downloads 47623600 Statistical Inferences for GQARCH-It\^{o} - Jumps Model Based on The Realized Range Volatility
Authors: Fu Jinyu, Lin Jinguan
Abstract:
This paper introduces a novel approach that unifies two types of models: one is the continuous-time jump-diffusion used to model high-frequency data, and the other is discrete-time GQARCH employed to model low-frequency financial data by embedding the discrete GQARCH structure with jumps in the instantaneous volatility process. This model is named “GQARCH-It\^{o} -Jumps mode.” We adopt the realized range-based threshold estimation for high-frequency financial data rather than the realized return-based volatility estimators, which entail the loss of intra-day information of the price movement. Meanwhile, a quasi-likelihood function for the low-frequency GQARCH structure with jumps is developed for the parametric estimate. The asymptotic theories are mainly established for the proposed estimators in the case of finite activity jumps. Moreover, simulation studies are implemented to check the finite sample performance of the proposed methodology. Specifically, it is demonstrated that how our proposed approaches can be practically used on some financial data.Keywords: It\^{o} process, GQARCH, leverage effects, threshold, realized range-based volatility estimator, quasi-maximum likelihood estimate
Procedia PDF Downloads 15423599 Exchanging Radiology Reporting System with Electronic Health Record: Designing a Conceptual Model
Authors: Azadeh Bashiri
Abstract:
Introduction: In order to better designing of electronic health record system in Iran, integration of health information systems based on a common language must be done to interpret and exchange this information with this system is required. Background: This study, provides a conceptual model of radiology reporting system using unified modeling language. The proposed model can solve the problem of integration this information system with electronic health record system. By using this model and design its service based, easily connect to electronic health record in Iran and facilitate transfer radiology report data. Methods: This is a cross-sectional study that was conducted in 2013. The student community was 22 experts that working at the Imaging Center in Imam Khomeini Hospital in Tehran and the sample was accorded with the community. Research tool was a questionnaire that prepared by the researcher to determine the information requirements. Content validity and test-retest method was used to measure validity and reliability of questioner respectively. Data analyzed with average index, using SPSS. Also, Visual Paradigm software was used to design a conceptual model. Result: Based on the requirements assessment of experts and related texts, administrative, demographic and clinical data and radiological examination results and if the anesthesia procedure performed, anesthesia data suggested as minimum data set for radiology report and based it class diagram designed. Also by identifying radiology reporting system process, use case was drawn. Conclusion: According to the application of radiology reports in electronic health record system for diagnosing and managing of clinical problem of the patient, provide the conceptual Model for radiology reporting system; in order to systematically design it, the problem of data sharing between these systems and electronic health records system would eliminate.Keywords: structured radiology report, information needs, minimum data set, electronic health record system in Iran
Procedia PDF Downloads 25223598 Exploring Smartphone Applications for Enhancing Second Language Vocabulary Learning
Authors: Abdulmajeed Almansour
Abstract:
Learning a foreign language with the assistant of technological tools has become an interest of learners and educators. Increased use of smartphones among undergraduate students has made them popular for not only social communication but also for entertainment and educational purposes. Smartphones have provided remarkable advantages in language learning process. Learning vocabulary is an important part of learning a language. The use of smartphone applications for English vocabulary learning provides an opportunity for learners to improve vocabulary knowledge beyond the classroom wall anytime anywhere. Recently, various smartphone applications were created specifically for vocabulary learning. This paper aims to explore the use of smartphone application Memrise designed for vocabulary learning to enhance academic vocabulary among undergraduate students. It examines whether the use of a Memrise smartphone application designed course enhances the academic vocabulary learning among ESL learners. The research paradigm used in this paper followed a mixed research model combining quantitative and qualitative research. The study included two hundred undergraduate students randomly assigned to the experimental and controlled group during the first academic year at the Faculty of English Language, Imam University. The research instruments included an attitudinal questionnaire and an English vocabulary pre-test administered to students at the beginning of the semester whereas post-test and semi-structured interviews administered at the end of the semester. The findings of the attitudinal questionnaire revealed a positive attitude towards using smartphones in learning vocabulary. The post-test scores showed a significant difference in the experimental group performance. The results from the semi-structure interviews showed that there were positive attitudes towards Memrise smartphone application. The students found the application enjoyable, convenient and efficient learning tool. From the study, the use of the Memrise application is seen to have long-term and motivational benefits to students. For this reason, there is a need for further research to identify the long-term optimal effects of learning a language using smartphone applications.Keywords: second language vocabulary learning, academic vocabulary, mobile learning technologies, smartphone applications
Procedia PDF Downloads 16023597 An Application of Remote Sensing for Modeling Local Warming Trend
Authors: Khan R. Rahaman, Quazi K. Hassan
Abstract:
Global changes in climate, environment, economies, populations, governments, institutions, and cultures converge in localities. Changes at a local scale, in turn, contribute to global changes as well as being affected by them. Our hypothesis is built on a consideration that temperature does vary at local level (i.e., termed as local warming) in comparison to the predicted models at the regional and/or global scale. To date, the bulk of the research relating local places to global climate change has been top-down, from the global toward the local, concentrating on methods of impact analysis that use as a starting point climate change scenarios derived from global models, even though these have little regional or local specificity. Thus, our focus is to understand such trends over the southern Alberta, which will enable decision makers, scientists, researcher community, and local people to adapt their policies based on local level temperature variations and to act accordingly. Specific objectives in this study are: (i) to understand the local warming (temperature in particular) trend in context of temperature normal during the period 1961-2010 at point locations using meteorological data; (ii) to validate the data by using specific yearly data, and (iii) to delineate the spatial extent of the local warming trends and understanding influential factors to adopt situation by local governments. Existing data has brought the evidence of such changes and future research emphasis will be given to validate this hypothesis based on remotely sensed data (i.e. MODIS product by NASA).Keywords: local warming, climate change, urban area, Alberta, Canada
Procedia PDF Downloads 33823596 The Systems Biology Verification Endeavor: Harness the Power of the Crowd to Address Computational and Biological Challenges
Authors: Stephanie Boue, Nicolas Sierro, Julia Hoeng, Manuel C. Peitsch
Abstract:
Systems biology relies on large numbers of data points and sophisticated methods to extract biologically meaningful signal and mechanistic understanding. For example, analyses of transcriptomics and proteomics data enable to gain insights into the molecular differences in tissues exposed to diverse stimuli or test items. Whereas the interpretation of endpoints specifically measuring a mechanism is relatively straightforward, the interpretation of big data is more complex and would benefit from comparing results obtained with diverse analysis methods. The sbv IMPROVER project was created to implement solutions to verify systems biology data, methods, and conclusions. Computational challenges leveraging the wisdom of the crowd allow benchmarking methods for specific tasks, such as signature extraction and/or samples classification. Four challenges have already been successfully conducted and confirmed that the aggregation of predictions often leads to better results than individual predictions and that methods perform best in specific contexts. Whenever the scientific question of interest does not have a gold standard, but may greatly benefit from the scientific community to come together and discuss their approaches and results, datathons are set up. The inaugural sbv IMPROVER datathon was held in Singapore on 23-24 September 2016. It allowed bioinformaticians and data scientists to consolidate their ideas and work on the most promising methods as teams, after having initially reflected on the problem on their own. The outcome is a set of visualization and analysis methods that will be shared with the scientific community via the Garuda platform, an open connectivity platform that provides a framework to navigate through different applications, databases and services in biology and medicine. We will present the results we obtained when analyzing data with our network-based method, and introduce a datathon that will take place in Japan to encourage the analysis of the same datasets with other methods to allow for the consolidation of conclusions.Keywords: big data interpretation, datathon, systems toxicology, verification
Procedia PDF Downloads 27723595 Scalable Learning of Tree-Based Models on Sparsely Representable Data
Authors: Fares Hedayatit, Arnauld Joly, Panagiotis Papadimitriou
Abstract:
Many machine learning tasks such as text annotation usually require training over very big datasets, e.g., millions of web documents, that can be represented in a sparse input space. State-of the-art tree-based ensemble algorithms cannot scale to such datasets, since they include operations whose running time is a function of the input space size rather than a function of the non-zero input elements. In this paper, we propose an efficient splitting algorithm to leverage input sparsity within decision tree methods. Our algorithm improves training time over sparse datasets by more than two orders of magnitude and it has been incorporated in the current version of scikit-learn.org, the most popular open source Python machine learning library.Keywords: big data, sparsely representable data, tree-based models, scalable learning
Procedia PDF Downloads 26123594 Performance and Nutritional Evaluation of Moringa Leaves Dried in a Solar-Assisted Heat Pump Dryer Integrated with Thermal Energy Storage
Authors: Aldé Belgard Tchicaya Loemba, Baraka Kichonge, Thomas Kivevele, Juma Rajabu Selemani
Abstract:
Plants used for medicinal purposes are extremely perishable, owing to moisture-enhanced enzymatic and microorganism activity, climate change, and improper handling and storage. Experiments have shown that drying the medicinal plant without affecting the active nutrients and controlling the moisture content as much as possible can extend its shelf life. Different traditional and modern drying techniques for preserving medicinal plants have been developed, with some still being improved in Sub-Saharan Africa. However, many of these methods fail to address the most common issues encountered when drying medicinal plants, such as nutrient loss, long drying times, and a limited capacity to dry during the evening or cloudy hours. Heat pump drying is an alternate drying method that results in no nutritional loss. Furthermore, combining a heat pump dryer with a solar energy storage system appears to be a viable option for all-weather drying without affecting the nutritional values of dried products. In this study, a solar-assisted heat pump dryer integrated with thermal energy storage is developed for drying moringa leaves. The study also discusses the performance analysis of the developed dryer as well as the proximate analysis of the dried moringa leaves. All experiments were conducted from 11 a.m. to 4 p.m. to assess the dryer's performance in “daytime mode”. Experiment results show that the drying time was significantly reduced, and the dryer demonstrated high performance in preserving all of the nutrients. In 5 hours of the drying process, the moisture content was reduced from 75.7 to 3.3%. The average COP value was 3.36, confirming the dryer's low energy consumption. The findings also revealed that after drying, the content of protein, carbohydrates, fats, fiber, and ash greatly increased.Keywords: heat pump dryer, efficiency, moringa leaves, proximate analysis
Procedia PDF Downloads 8123593 On Estimating the Low Income Proportion with Several Auxiliary Variables
Authors: Juan F. Muñoz-Rosas, Rosa M. García-Fernández, Encarnación Álvarez-Verdejo, Pablo J. Moya-Fernández
Abstract:
Poverty measurement is a very important topic in many studies in social sciences. One of the most important indicators when measuring poverty is the low income proportion. This indicator gives the proportion of people of a population classified as poor. This indicator is generally unknown, and for this reason, it is estimated by using survey data, which are obtained by official surveys carried out by many statistical agencies such as Eurostat. The main feature of the mentioned survey data is the fact that they contain several variables. The variable used to estimate the low income proportion is called as the variable of interest. The survey data may contain several additional variables, also named as the auxiliary variables, related to the variable of interest, and if this is the situation, they could be used to improve the estimation of the low income proportion. In this paper, we use Monte Carlo simulation studies to analyze numerically the performance of estimators based on several auxiliary variables. In this simulation study, we considered real data sets obtained from the 2011 European Union Survey on Income and Living Condition. Results derived from this study indicate that the estimators based on auxiliary variables are more accurate than the naive estimator.Keywords: inclusion probability, poverty, poverty line, survey sampling
Procedia PDF Downloads 45623592 TessPy – Spatial Tessellation Made Easy
Authors: Jonas Hamann, Siavash Saki, Tobias Hagen
Abstract:
Discretization of urban areas is a crucial aspect in many spatial analyses. The process of discretization of space into subspaces without overlaps and gaps is called tessellation. It helps understanding spatial space and provides a framework for analyzing geospatial data. Tessellation methods can be divided into two groups: regular tessellations and irregular tessellations. While regular tessellation methods, like squares-grids or hexagons-grids, are suitable for addressing pure geometry problems, they cannot take the unique characteristics of different subareas into account. However, irregular tessellation methods allow the border between the subareas to be defined more realistically based on urban features like a road network or Points of Interest (POI). Even though Python is one of the most used programming languages when it comes to spatial analysis, there is currently no library that combines different tessellation methods to enable users and researchers to compare different techniques. To close this gap, we are proposing TessPy, an open-source Python package, which combines all above-mentioned tessellation methods and makes them easily accessible to everyone. The core functions of TessPy represent the five different tessellation methods: squares, hexagons, adaptive squares, Voronoi polygons, and city blocks. By using regular methods, users can set the resolution of the tessellation which defines the finesse of the discretization and the desired number of tiles. Irregular tessellation methods allow users to define which spatial data to consider (e.g., amenity, building, office) and how fine the tessellation should be. The spatial data used is open-source and provided by OpenStreetMap. This data can be easily extracted and used for further analyses. Besides the methodology of the different techniques, the state-of-the-art, including examples and future work, will be discussed. All dependencies can be installed using conda or pip; however, the former is more recommended.Keywords: geospatial data science, geospatial data analysis, tessellations, urban studies
Procedia PDF Downloads 12623591 A CFD Analysis of Hydraulic Characteristics of the Rod Bundles in the BREST-OD-300 Wire-Spaced Fuel Assemblies
Authors: Dmitry V. Fomichev, Vladimir V. Solonin
Abstract:
This paper presents the findings from a numerical simulation of the flow in 37-rod fuel assembly models spaced by a double-wire trapezoidal wrapping as applied to the BREST-OD-300 experimental nuclear reactor. Data on a high static pressure distribution within the models, and equations for determining the fuel bundle flow friction factors have been obtained. Recommendations are provided on using the closing turbulence models available in the ANSYS Fluent. A comparative analysis has been performed against the existing empirical equations for determining the flow friction factors. The calculated and experimental data fit has been shown. An analysis into the experimental data and results of the numerical simulation of the BREST-OD-300 fuel rod assembly hydrodynamic performance are presented.Keywords: BREST-OD-300, ware-spaces, fuel assembly, computation fluid dynamics
Procedia PDF Downloads 38023590 Approaches to Reduce the Complexity of Mathematical Models for the Operational Optimization of Large-Scale Virtual Power Plants in Public Energy Supply
Authors: Thomas Weber, Nina Strobel, Thomas Kohne, Eberhard Abele
Abstract:
In context of the energy transition in Germany, the importance of so-called virtual power plants in the energy supply continues to increase. The progressive dismantling of the large power plants and the ongoing construction of many new decentralized plants result in great potential for optimization through synergies between the individual plants. These potentials can be exploited by mathematical optimization algorithms to calculate the optimal application planning of decentralized power and heat generators and storage systems. This also includes linear or linear mixed integer optimization. In this paper, procedures for reducing the number of decision variables to be calculated are explained and validated. On the one hand, this includes combining n similar installation types into one aggregated unit. This aggregated unit is described by the same constraints and target function terms as a single plant. This reduces the number of decision variables per time step and the complexity of the problem to be solved by a factor of n. The exact operating mode of the individual plants can then be calculated in a second optimization in such a way that the output of the individual plants corresponds to the calculated output of the aggregated unit. Another way to reduce the number of decision variables in an optimization problem is to reduce the number of time steps to be calculated. This is useful if a high temporal resolution is not necessary for all time steps. For example, the volatility or the forecast quality of environmental parameters may justify a high or low temporal resolution of the optimization. Both approaches are examined for the resulting calculation time as well as for optimality. Several optimization models for virtual power plants (combined heat and power plants, heat storage, power storage, gas turbine) with different numbers of plants are used as a reference for the investigation of both processes with regard to calculation duration and optimality.Keywords: CHP, Energy 4.0, energy storage, MILP, optimization, virtual power plant
Procedia PDF Downloads 17523589 Analysis of Lead Time Delays in Supply Chain: A Case Study
Authors: Abdel-Aziz M. Mohamed, Nermeen Coutry
Abstract:
Lead time is an important measure of supply chain performance. It impacts both customer satisfactions as well as the total cost of inventory. This paper presents the result of a study on the analysis of the customer order lead-time for a multinational company. In the study, the lead time was divided into three stages: order entry, order fulfillment, and order delivery. A sample of size 2,425 order lines from the company records were considered for this study. The sample data includes information regarding customer orders from the time of order entry until order delivery. Data regarding the lead time of each sage for different orders were also provided. Summary statistics on lead time data reveals that about 30% of the orders were delivered after the scheduled due date. The result of the multiple linear regression analysis technique revealed that component type, logistics parameter, order size and the customer type have significant impact on lead time. Data analysis on the stages of lead time indicates that stage 2 consumes over 50% of the lead time. Pareto analysis was made to study the reasons for the customer order delay in each of the 3 stages. Recommendation was given to resolve the problem.Keywords: lead time reduction, customer satisfaction, service quality, statistical analysis
Procedia PDF Downloads 72923588 A Unified Approach for Digital Forensics Analysis
Authors: Ali Alshumrani, Nathan Clarke, Bogdan Ghite, Stavros Shiaeles
Abstract:
Digital forensics has become an essential tool in the investigation of cyber and computer-assisted crime. Arguably, given the prevalence of technology and the subsequent digital footprints that exist, it could have a significant role across almost all crimes. However, the variety of technology platforms (such as computers, mobiles, Closed-Circuit Television (CCTV), Internet of Things (IoT), databases, drones, cloud computing services), heterogeneity and volume of data, forensic tool capability, and the investigative cost make investigations both technically challenging and prohibitively expensive. Forensic tools also tend to be siloed into specific technologies, e.g., File System Forensic Analysis Tools (FS-FAT) and Network Forensic Analysis Tools (N-FAT), and a good deal of data sources has little to no specialist forensic tools. Increasingly it also becomes essential to compare and correlate evidence across data sources and to do so in an efficient and effective manner enabling an investigator to answer high-level questions of the data in a timely manner without having to trawl through data and perform the correlation manually. This paper proposes a Unified Forensic Analysis Tool (U-FAT), which aims to establish a common language for electronic information and permit multi-source forensic analysis. Core to this approach is the identification and development of forensic analyses that automate complex data correlations, enabling investigators to investigate cases more efficiently. The paper presents a systematic analysis of major crime categories and identifies what forensic analyses could be used. For example, in a child abduction, an investigation team might have evidence from a range of sources including computing devices (mobile phone, PC), CCTV (potentially a large number), ISP records, and mobile network cell tower data, in addition to third party databases such as the National Sex Offender registry and tax records, with the desire to auto-correlate and across sources and visualize in a cognitively effective manner. U-FAT provides a holistic, flexible, and extensible approach to providing digital forensics in technology, application, and data-agnostic manner, providing powerful and automated forensic analysis.Keywords: digital forensics, evidence correlation, heterogeneous data, forensics tool
Procedia PDF Downloads 19423587 Analyzing Medical Workflows Using Market Basket Analysis
Authors: Mohit Kumar, Mayur Betharia
Abstract:
Healthcare domain, with the emergence of Electronic Medical Record (EMR), collects a lot of data which have been attracting Data Mining expert’s interest. In the past, doctors have relied on their intuition while making critical clinical decisions. This paper presents the means to analyze the Medical workflows to get business insights out of huge dumped medical databases. Market Basket Analysis (MBA) which is a special data mining technique, has been widely used in marketing and e-commerce field to discover the association between products bought together by customers. It helps businesses in increasing their sales by analyzing the purchasing behavior of customers and pitching the right customer with the right product. This paper is an attempt to demonstrate Market Basket Analysis applications in healthcare. In particular, it discusses the Market Basket Analysis Algorithm ‘Apriori’ applications within healthcare in major areas such as analyzing the workflow of diagnostic procedures, Up-selling and Cross-selling of Healthcare Systems, designing healthcare systems more user-friendly. In the paper, we have demonstrated the MBA applications using Angiography Systems, but can be extrapolated to other modalities as well.Keywords: data mining, market basket analysis, healthcare applications, knowledge discovery in healthcare databases, customer relationship management, healthcare systems
Procedia PDF Downloads 171