Search results for: location based data
44001 Total Organic Carbon, Porosity and Permeability Correlation: A Tool for Carbon Dioxide Storage Potential Evaluation in Irati Formation of the Parana Basin, Brazil
Authors: Richardson M. Abraham-A., Colombo Celso Gaeta Tassinari
Abstract:
The correlation between Total Organic Carbon (TOC) and flow units has been carried out to predict and compare the carbon dioxide (CO2) storage potential of the shale and carbonate rocks in the Irati Formation of the Parana Basin. The equations for permeability (K), reservoir quality index (RQI) and flow zone indicator (FZI) are redefined and used to evaluate the flow units in both potential reservoir rocks. Shales show higher values of TOC than carbonates; as such, porosity (Ф) is most likely to be higher in shales than in carbonates. An increase in Ф corresponds to an increase in K in both rocks. Nonetheless, at lower values of Ф, K is higher in carbonates than in shales. This shows that at lower values of TOC in carbonates, Ф is low, yet K is likely to be high compared to shale. In the same vein, at higher values of TOC in shales, Ф is high, yet K is expected to be low compared to carbonates. Overall, the flow unit factors (RQI and FZI) are better in the carbonates than in the shales. Moreover, within the study location, there are some portions where the carbonate units are thicker than the shale units. Most parts of the carbonate strata in the study location are fractured in situ; hence, this could provide easy access for the storage of CO2. Therefore, based on these points and the disparities between the flow units in the evaluated rock types, the carbonate units are expected to show better potential for the storage of CO2. The shale units may be considered as potential cap rocks or seals.
Keywords: total organic content, flow units, carbon dioxide storage, geologic structures
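The abstract states that the RQI and FZI equations are redefined but does not give the redefined forms. As a point of reference only, the sketch below uses the conventional textbook definitions (RQI = 0.0314·sqrt(K/Ф) with K in millidarcies and Ф as a fraction, and FZI = RQI divided by the normalized porosity Ф/(1 − Ф)). The function name and the sample values are hypothetical and are not taken from the study.

```python
import math

def flow_unit_indices(k_md, phi):
    """Conventional flow-unit indices (not the redefined forms used in the study).

    k_md : permeability in millidarcies
    phi  : porosity as a fraction between 0 and 1
    Returns (RQI, FZI), both in micrometres.
    """
    if not 0.0 < phi < 1.0:
        raise ValueError("porosity must be a fraction strictly between 0 and 1")
    rqi = 0.0314 * math.sqrt(k_md / phi)   # reservoir quality index
    phi_z = phi / (1.0 - phi)              # normalized porosity (pore-to-grain volume ratio)
    fzi = rqi / phi_z                      # flow zone indicator
    return rqi, fzi

# Hypothetical sample: 100 mD permeability at 20 % porosity.
rqi, fzi = flow_unit_indices(100.0, 0.2)
```

Comparing FZI values computed this way for shale and carbonate samples is one simple way to rank flow units, assuming the conventional definitions apply.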
Procedia PDF Downloads 164
44000 Association of Social Data as a Tool to Support Government Decision Making
Authors: Diego Rodrigues, Marcelo Lisboa, Elismar Batista, Marcos Dias
Abstract:
Based on data on child labor, this work raises questions about how to understand and locate the factors that make up child labor rates, and which properties are important for analyzing these cases. Data mining techniques were used to discover valid patterns in Brazilian social databases, evaluating child labor data for the State of Tocantins (located in the north of Brazil, with a territory of 277,000 km² and comprising 139 counties). This work aims to detect factors that are determining for the practice of child labor and their relationships with financial, educational, regional, and social indicators, generating information that is not explicit in the government database and thus enabling better monitoring and updating of policies for this purpose.
Keywords: social data, government decision making, association of social data, data mining
Procedia PDF Downloads 369
43999 Microgravity, Hydrological and Metrological Monitoring of Shallow Ground Water Aquifer in Al-Ain, UAE
Authors: Serin Darwish, Hakim Saibi, Amir Gabr
Abstract:
The United Arab Emirates (UAE) is situated within an arid zone where the recharge of groundwater is very low. Groundwater is the primary source of water in the UAE. However, rapid expansion, population growth, agriculture, and industrial activities have negatively affected these limited water resources. The shortage of water resources has become a serious concern due to the over-pumping of groundwater to meet demand. In addition to the groundwater deficit, the UAE has one of the highest per capita water consumption rates in the world. In this study, a combination of time-lapse measurements of microgravity and depth to groundwater level in selected wells in Al-Ain city was used to estimate the variations in groundwater storage. Al-Ain is the second largest city in the Emirate of Abu Dhabi and the third largest city in the UAE. The groundwater in this region has been overexploited. Relative gravity measurements were acquired using the Scintrex CG-6 Autograv. This latest-generation gravimeter from Scintrex Ltd provides fast, precise gravity measurements and automated corrections for temperature, tide, and instrument tilt, as well as rejection of data noise. The CG-6 gravimeter has a resolution of 0.1 μGal. The purpose of this study is to measure the groundwater storage changes in the shallow aquifers using the microgravity method. The gravity method is a nondestructive technique that allows collection of data at almost any location over the aquifer. Preliminary results indicate a possible relationship between microgravity and water levels, but more work needs to be done to confirm this. The results will help to develop the relationship between monthly microgravity changes and the hydrological and hydrogeological changes of the shallow phreatic aquifer. The study will be useful in water management considerations and additional future investigations.
Keywords: Al-Ain, arid region, groundwater, microgravity
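One common first-order way to relate a time-lapse gravity change to a change in stored groundwater is the infinite Bouguer slab approximation, Δg = 2πGρΔh. The sketch below illustrates that approximation only; it is not taken from the study, and the function names and the specific yield in the usage line are hypothetical.

```python
import math

G = 6.674e-11         # gravitational constant, m^3 kg^-1 s^-2
RHO_WATER = 1000.0    # density of fresh water, kg/m^3
UGAL_PER_MS2 = 1e8    # 1 m/s^2 expressed in microGal

def slab_gravity_ugal(water_thickness_m):
    """Gravity effect (microGal) of an infinite horizontal slab of water of the given thickness."""
    return 2.0 * math.pi * G * RHO_WATER * water_thickness_m * UGAL_PER_MS2

def water_table_change_m(delta_g_ugal, specific_yield):
    """Water-table change implied by a gravity change, for an assumed specific yield."""
    water_equivalent_m = delta_g_ugal / slab_gravity_ugal(1.0)
    return water_equivalent_m / specific_yield

# Hypothetical reading: a 10 microGal drop with an assumed specific yield of 0.2.
decline_m = water_table_change_m(-10.0, 0.2)
```

Under this approximation, one metre of free water corresponds to roughly 42 μGal, which is why a gravimeter with sub-μGal resolution can in principle resolve centimetre-scale storage changes.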
Procedia PDF Downloads 153
43998 Sparsity-Based Unsupervised Unmixing of Hyperspectral Imaging Data Using Basis Pursuit
Authors: Ahmed Elrewainy
Abstract:
Mixing in hyperspectral imaging occurs due to the low spatial resolution of the cameras used. The pure materials (“endmembers”) present in the scene share the spectra of pixels in different amounts called “abundances”. Unmixing the data cube is an important task for identifying the endmembers present in the cube when analyzing these images. Unsupervised unmixing is done with no prior information about the given data cube. Sparsity is one of the recent approaches used in source recovery or unmixing techniques. The l1-norm optimization problem known as “basis pursuit” can be used as a sparsity-based approach to solve this unmixing problem, where the endmembers are assumed to be sparse in an appropriate domain known as a dictionary. This optimization problem is solved using the proximal method known as “iterative thresholding”. The l1-norm basis pursuit optimization problem, as a sparsity-based unmixing technique, was used to unmix real and synthetic hyperspectral data cubes.
Keywords: basis pursuit, blind source separation, hyperspectral imaging, spectral unmixing, wavelets
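To make the iterative thresholding idea concrete, here is a minimal pure-Python sketch of the iterative soft-thresholding algorithm (ISTA). It solves the l1-regularised least-squares relaxation of basis pursuit, which is an assumption on our part since the abstract does not state the exact formulation. The toy dictionary `A`, the observation `y`, and the values of `lam` and `step` are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def residual(A, y, x):
    """Return A x - y for a matrix given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) - yi for row, yi in zip(A, y)]

def objective(A, y, x, lam):
    """0.5 * ||A x - y||^2 + lam * ||x||_1."""
    r = residual(A, y, x)
    return 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(xj) for xj in x)

def ista(A, y, lam, step, n_iter=2000):
    """Gradient step on the data term followed by soft thresholding, repeated n_iter times.

    step should not exceed 1 / (largest eigenvalue of A^T A).
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        r = residual(A, y, x)
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(xj - step * g, step * lam) for xj, g in zip(x, grad)]
    return x

# Hypothetical toy problem: 2 measurements, 3 dictionary atoms.
# y equals the third atom, so the sparsest explanation uses that atom alone.
A = [[1.0, 0.0, 0.6],
     [0.0, 1.0, 0.8]]
y = [0.6, 0.8]
x_hat = ista(A, y, lam=0.05, step=0.4)
```

In this toy case the estimate concentrates on the third atom, with its coefficient shrunk slightly below 1 by the l1 penalty; a real unmixing problem would use a spectral dictionary and many more pixels.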
Procedia PDF Downloads 195
43997 Autism Spectrum Disorder Classification Algorithm Using Multimodal Data Based on Graph Convolutional Network
Authors: Yuntao Liu, Lei Wang, Haoran Xia
Abstract:
Machine learning has been applied extensively to the development of classification models for autism spectrum disorder (ASD) using neuroimaging data. This paper proposes a fusion multi-modal classification network based on a graph neural network. First, the brain is segmented into 116 regions of interest (ROIs) using a medical segmentation template (AAL, Anatomical Automatic Labeling). The image features of sMRI and the signal features of fMRI are extracted to build the node and edge embedding representations of the brain map. Then, we construct a dynamically updated brain map neural network and propose a method based on a dynamic brain map adjacency matrix update mechanism and a learnable graph to further improve the accuracy of autism diagnosis and recognition results. On the Autism Brain Imaging Data Exchange I dataset (ABIDE I), we reached a prediction accuracy of 74% between ASD and typically developing (TD) subjects. In addition, to study biomarkers that can help doctors analyze the disease and to improve interpretability, we extracted the features with the five largest and five smallest ROI weights. This work provides a meaningful approach to brain disorder identification.
Keywords: autism spectrum disorder, brain map, supervised machine learning, graph network, multimodal data, model interpretability
Procedia PDF Downloads 67
43996 Modeling Activity Pattern Using XGBoost for Mining Smart Card Data
Authors: Eui-Jin Kim, Hasik Lee, Su-Jin Park, Dong-Kyu Kim
Abstract:
Smart-card data are expected to provide information on activity patterns as an alternative to conventional person trip surveys. The focus of this study is to propose a method for training on person trip survey data to supplement smart-card data, which do not contain the purpose of each trip. To train on the survey data, we selected only features that are also available from smart-card data, such as spatiotemporal information on the trip and geographic information system (GIS) data near the stations. XGBoost, a state-of-the-art tree-based ensemble classifier, was used to train on data from multiple sources. This classifier uses a more regularized model formalization to control over-fitting and shows very fast execution time with good performance. The validation results showed that the proposed method efficiently estimated the trip purpose. GIS data of the station and duration of stay at the destination were significant features in modeling trip purpose.
Keywords: activity pattern, data fusion, smart-card, XGBoost
Procedia PDF Downloads 246
43995 Predictors of School Safety Awareness among Malaysian Primary School Teachers
Authors: Ssekamanya, Mastura Badzis, Khamsiah Ismail, Dayang Shuzaidah Bt Abduludin
Abstract:
With rising incidents of school violence worldwide, educators and researchers are trying to understand and find ways to enhance the safety of children at school. The purpose of this study was to investigate the extent to which the demographic variables of gender, age, length of service, position, academic qualification, and school location predicted teachers’ awareness about school safety practices in Malaysian primary schools. A stratified random sample of 380 teachers was selected in the central Malaysian states of Kuala Lumpur and Selangor. Multiple regression analysis revealed that none of the factors was a good predictor of awareness about school safety training, delivery methods of school safety information, and available school safety programs. Awareness about school safety activities was significantly predicted by school location (whether the school was located in a rural or urban area). While these results may reflect a general lack of awareness about school safety among primary school teachers in the selected locations, a national study needs to be conducted for the whole country.
Keywords: school safety awareness, predictors of school safety, multiple regression analysis, Malaysian primary schools
Procedia PDF Downloads 468
43994 Hysteretic Behavior of the Precast Concrete Column with Head Splice Sleeve Connection
Authors: Seo Soo-Yeon, Kim Sang-Ku, Noh Sang-Hyun, Lee Ji-Eun, Kim Seol-Ki, Lim Jong-Wook
Abstract:
This paper presents test results on the structural capacity of a Hollow-Precast Concrete (HPC) column with Head-Splice Sleeves (HSS) for the connection of bars under horizontal cyclic load. Two half-scale HPC column specimens were made with consideration of the on-site construction process. The difference between the HPC specimens is the location of the HSS for bar connection. In the first specimen, it is located on the bottom slab or foundation, while in the other it is above the bottom slab or foundation. A reinforced concrete (RC) column was also made for comparison. In order to evaluate the hysteretic behavior of the specimens, horizontal cyclic load was applied to the top of each specimen under constant axial load. The tests confirm that the HPC columns with HSS have enough structural capacity to emulate an RC column. This means that the HPC column with HSS can be used in a moment-resisting frame system.
Keywords: structural capacity, hollow-precast concrete column, head-splice sleeve, horizontal cyclic load
Procedia PDF Downloads 373
43993 Gene Prediction in DNA Sequences Using an Ensemble Algorithm Based on Goertzel Algorithm and Anti-Notch Filter
Authors: Hamidreza Saberkari, Mousa Shamsi, Hossein Ahmadi, Saeed Vaali, MohammadHossein Sedaaghi
Abstract:
In recent years, using signal processing tools for accurate identification of protein coding regions has become a challenge in bioinformatics. Most genomic signal processing methods are based on the period-3 characteristic of the nucleotides in DNA strands; consequently, spectral analysis is applied to numerical sequences of DNA to find the location of periodic components. In this paper, a novel ensemble algorithm for gene selection in DNA sequences is presented, based on the combination of the Goertzel algorithm and an anti-notch filter (ANF). The proposed algorithm has several advantages over conventional methods. Firstly, it identifies protein coding regions more accurately because the Goertzel algorithm is tuned to the desired frequency. Secondly, faster detection time is achieved. The proposed algorithm is applied to several genes, including genes available in the databases BG570 and HMR195, and the results are compared to other methods based on nucleotide-level evaluation criteria. Implementation results show the excellent performance of the proposed algorithm in identifying protein coding regions, specifically in the identification of small-scale gene areas.
Keywords: protein coding regions, period-3, anti-notch filter, Goertzel algorithm
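The Goertzel step can be sketched as follows: each base is mapped to a binary indicator sequence, and the Goertzel recurrence evaluates the single DFT bin k = N/3 that corresponds to period 3. This is a minimal illustration of that step only. The anti-notch filter and the ensemble combination are not shown, and the indicator mapping and function names are assumptions rather than the authors' implementation.

```python
import math

def goertzel_power(samples, k):
    """Squared magnitude of DFT bin k, computed with the Goertzel recurrence."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def period3_power(dna):
    """Sum of the period-3 spectral power over the four base indicator sequences."""
    n = len(dna) - len(dna) % 3          # trim so that N/3 is an integer bin
    total = 0.0
    for base in "ACGT":
        indicator = [1.0 if c == base else 0.0 for c in dna[:n]]
        total += goertzel_power(indicator, n // 3)
    return total

# Hypothetical windows: a strongly period-3 repeat versus a homopolymer.
coding_like = period3_power("ATG" * 10)
flat = period3_power("A" * 30)
```

In practice this measure would be evaluated over a sliding window along the sequence, with peaks in period-3 power flagged as candidate coding regions.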
Procedia PDF Downloads 387
43992 Imaging of Underground Targets with an Improved Back-Projection Algorithm
Authors: Alireza Akbari, Gelareh Babaee Khou
Abstract:
Ground Penetrating Radar (GPR) is an important nondestructive remote sensing tool that has been used in both military and civilian fields. Recently, GPR imaging has attracted much attention for the detection of small, shallow subsurface targets such as landmines and unexploded ordnance, and for through-wall imaging in security applications. For the monostatic arrangement, a single point target appears in the space-time GPR image as a hyperbolic curve because of the different trip times of the EM wave as the radar moves along a synthetic aperture and collects the reflectivity of the subsurface targets. Because of the tails of this hyperbola, the resolution along the synthetic aperture direction is undesirably low. However, highly accurate information about the size, electromagnetic (EM) reflectivity, and depth of the buried objects is essential in most GPR applications. Therefore, the hyperbolic curve in the space-time GPR image often needs to be transformed into a focused pattern showing the object's true location and size together with its EM scattering. The common goal of a typical GPR image is to display the spatial location and the reflectivity of an underground object. The main challenge of GPR imaging is therefore to devise an image reconstruction algorithm that provides high resolution and good suppression of strong artifacts and noise. In this paper, the standard back-projection (BP) algorithm, adapted to GPR imaging applications, was first used for image reconstruction. The standard BP algorithm has limited robustness against strong noise and numerous artifacts, which adversely affect subsequent tasks such as target detection. Thus, an improved BP algorithm based on cross-correlation between the received signals is proposed to reduce noise and suppress artifacts.
To improve the quality of the results of the proposed BP imaging algorithm, a weight factor was designed for each point in the imaging region. Compared to the standard BP scheme, the improved algorithm produces images of higher quality and resolution. The proposed algorithm was applied to simulated and real GPR data, and the results showed that it suppresses artifacts better and produces images of high quality and resolution. To describe the effect of artifact suppression on the imaging results quantitatively, a focusing parameter was evaluated.
Keywords: algorithm, back-projection, GPR, remote sensing
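For orientation, the standard back-projection baseline can be sketched as a delay-and-sum over antenna positions: each pixel accumulates the samples found at its own round-trip delay, so the hyperbola produced by a point target collapses onto the target position. The sketch below shows this baseline only, not the cross-correlation weighting proposed in the paper. The geometry, the unit wave speed, and all names are hypothetical.

```python
import math

def two_way_sample(xa, x, z, v, dt):
    """Time-sample index of the round trip antenna (xa, 0) -> point (x, z) -> antenna."""
    return round(2.0 * math.hypot(x - xa, z) / (v * dt))

def simulate_bscan(antennas, target, v, dt, n_samples):
    """Ideal impulse response of one point target: a hyperbola in the space-time image."""
    data = [[0.0] * n_samples for _ in antennas]
    for i, xa in enumerate(antennas):
        data[i][two_way_sample(xa, target[0], target[1], v, dt)] = 1.0
    return data

def back_project(data, antennas, xs, zs, v, dt):
    """Standard delay-and-sum back-projection onto the pixel grid xs x zs."""
    image = {}
    for x in xs:
        for z in zs:
            total = 0.0
            for i, xa in enumerate(antennas):
                k = two_way_sample(xa, x, z, v, dt)
                if k < len(data[i]):
                    total += data[i][k]
            image[(x, z)] = total
    return image

# Hypothetical monostatic survey: 11 antenna positions, one target at x = 5, depth 3.
antennas = [float(i) for i in range(11)]
xs = [float(i) for i in range(11)]
zs = [float(i) for i in range(1, 7)]
data = simulate_bscan(antennas, (5.0, 3.0), v=1.0, dt=0.1, n_samples=400)
image = back_project(data, antennas, xs, zs, v=1.0, dt=0.1)
peak = max(image, key=image.get)
```

Pixels away from the target still collect partial sums where their delay curves cross the hyperbola; these are the artifacts that a weighting scheme such as the one described in the abstract aims to suppress.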
Procedia PDF Downloads 452
43991 Generation of Quasi-Measurement Data for On-Line Process Data Analysis
Authors: Hyun-Woo Cho
Abstract:
To ensure the safety of a manufacturing process, one should quickly identify an assignable cause of a fault on an on-line basis. To this end, many statistical techniques, including linear and nonlinear methods, have been frequently utilized. However, such methods suffer from the major problem of small sample size, which is mostly attributed to the characteristics of the empirical models used as reference models. This work presents a new method to overcome the insufficiency of measurement data in monitoring and diagnosis tasks. Quasi-measurement data are generated from existing data based on two indices, similarity and importance. The performance of the method is demonstrated using a real data set. The results show that the presented method is able to handle the insufficiency problem successfully. In addition, it is shown to be quite efficient in terms of computational speed and memory usage, and thus on-line implementation of the method is straightforward for monitoring and diagnosis purposes.
Keywords: data analysis, diagnosis, monitoring, process data, quality control
Procedia PDF Downloads 482
43990 Quick Sequential Search Algorithm Used to Decode High-Frequency Matrices
Authors: Mohammed M. Siddeq, Mohammed H. Rasheed, Omar M. Salih, Marcos A. Rodrigues
Abstract:
This research proposes a data encoding and decoding method based on the Matrix Minimization algorithm. This algorithm is applied to high-frequency coefficients for compression/encoding. The algorithm starts by converting every three coefficients to a single value, which is accomplished using three different keys. The decoding/decompression uses a search method presented in this research, the QSS (Quick Sequential Search) Decoding Algorithm, which is based on sequential search and recovers the exact coefficients. In the next step, the decoded data are saved in an auxiliary array. The basic idea behind the auxiliary array is to save all possible decoded coefficients, because another algorithm, such as conventional sequential search, could retrieve the encoded/compressed data independently of the proposed algorithm. The experimental results showed that the proposed decoding algorithm retrieves the original data faster than conventional sequential search algorithms.
Keywords: matrix minimization algorithm, decoding sequential search algorithm, image compression, DCT, DWT
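The encode/decode idea can be illustrated with a small sketch: three coefficients are collapsed into one value as a key-weighted sum, and decoding searches the candidate triples until the sum matches. The abstract does not give the key construction, so the mixed-radix keys below are an assumption chosen only to guarantee a unique decode. The search shown is the plain exhaustive baseline, not the QSS algorithm itself.

```python
from itertools import product

def make_keys(max_abs):
    """Hypothetical keys: mixed-radix weights that make the weighted sum uniquely decodable
    for integer coefficients in the range [-max_abs, max_abs]."""
    base = 2 * max_abs + 1
    return 1, base, base * base

def encode(triple, keys):
    """Collapse three coefficients into a single value as a key-weighted sum."""
    return sum(k * c for k, c in zip(keys, triple))

def sequential_search_decode(value, keys, candidates):
    """Conventional sequential search: try every candidate triple until the sum matches."""
    for triple in product(candidates, repeat=3):
        if encode(triple, keys) == value:
            return triple
    raise ValueError("no candidate triple reproduces the encoded value")

keys = make_keys(5)
code = encode((3, -2, 5), keys)
decoded = sequential_search_decode(code, keys, range(-5, 6))
```

The cost of this baseline grows with the cube of the number of candidate values, which is the cost a faster search scheme is meant to reduce.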
Procedia PDF Downloads 150
43989 Probabilistic Study of Impact Threat to Civil Aircraft and Realistic Impact Energy
Authors: Ye Zhang, Chuanjun Liu
Abstract:
In-service aircraft are exposed to different types of threats, e.g. bird strike, ground vehicle impact, runway debris, or even lightning strike. To satisfy the aircraft damage tolerance design requirements, the designer has to understand the threat level for different types of aircraft structures, either metallic or composite. In composite structures, exposure to low-velocity impacts may produce very serious internal damage, such as delaminations and matrix cracks, without leaving a visible mark on the impacted surfaces. This internal damage can cause a significant reduction in the load-carrying capacity of structures. The semi-probabilistic method provides a practical and proper approximation to establish the impact-threat-based energy cut-off level for the damage tolerance evaluation of aircraft components. Thus, the probabilistic distribution of impact threat and the realistic impact energy level cut-offs are essential for the certification of aircraft composite structures. A new survey of impact threat to in-service civil aircraft has recently been carried out based on field records concerning around 500 civil aircraft (mainly single-aisle) and more than 4.8 million flight hours. In total, 1,006 instances of damage caused by low-velocity impact events were screened out from more than 8,000 records, including impact dents, scratches, corrosion, delaminations, cracks, etc. The dependency of impact threat on the location of the aircraft structures and on structural configuration was analyzed. Although the survey focused mainly on metallic structures, the resulting low-energy impact data are believed to be representative of civil aircraft in general, since the service environments and the maintenance operations are independent of the materials of the structures.
The probability of impact damage occurrence (Po) and the probability of impact energy exceedance (Pe) are the two key parameters for describing the statistical distribution of impact threat. With the impact damage events from the survey, Po can be estimated as 2.1 × 10⁻⁴ per flight hour. For the calculation of Pe, a numerical model was developed using the commercial FEA software ABAQUS to back-estimate the impact energy from the visible damage characteristics. The relationship between visible dent depth and impact energy was established and validated by drop-weight impact experiments. Based on the survey results, Pe was calculated and assumed to have a log-linear relationship with impact energy. For the product of the two aforementioned probabilities, it is reasonable and conservative to assume Pa = Po × Pe = 10⁻⁵, which indicates that low-velocity impact events are about as likely as Limit Load events. Combining Pa with the two probabilities Po and Pe obtained from the field survey, the cutoff level of realistic impact energy was estimated as 34 J. In summary, a new survey was recently carried out on field records of civil aircraft to investigate the probabilistic distribution of impact threat. Based on the data, two probabilities, Po and Pe, were obtained. Under a conservative assumption for Pa, the cutoff level for the realistic impact energy has been determined, which is potentially applicable to the damage tolerance certification of future civil aircraft.
Keywords: composite structure, damage tolerance, impact threat, probabilistic
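As a worked step using only the figures quoted in the abstract, the exceedance probability implied at the cutoff follows from the product rule:

```latex
P_a = P_o \times P_e
\quad\Longrightarrow\quad
P_e = \frac{P_a}{P_o} = \frac{10^{-5}}{2.1 \times 10^{-4}} \approx 4.8 \times 10^{-2}
```

Under the assumed log-linear relationship between Pe and impact energy, this value of roughly 4.8% is presumably the exceedance probability at the cutoff energy that the abstract reports as 34 J. The fitted log-linear coefficients are not given, so the energy itself cannot be reproduced from the abstract alone.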
Procedia PDF Downloads 308
43988 Optimal Pricing Based on Real Estate Demand Data
Authors: Vanessa Kummer, Maik Meusel
Abstract:
Real estate demand estimates are typically derived from transaction data. However, in regions with excess demand, transactions are driven by supply and therefore do not indicate what people are actually looking for. To estimate the demand for housing in Switzerland, search subscriptions from all important Swiss real estate platforms are used. These data do, however, suffer from missing information: for example, many users do not specify how many rooms they would like or what price they would be willing to pay. In economic analyses, it is often the case that only complete data are used. Usually, however, the proportion of complete data is rather small, which leads to most information being neglected. Also, the complete data alone might be strongly distorted. In addition, the reason that data are missing might itself contain information, which is ignored with that approach. An interesting question is, therefore, whether for economic analyses such as the one at hand, there is added value in using the whole data set with imputed missing values compared to using the usually small percentage of complete data (baseline). It is also interesting to see how different algorithms affect that result. The imputation of the missing data is done using unsupervised learning. Out of the numerous unsupervised learning approaches, the most common ones, such as clustering, principal component analysis, or neural network techniques, are applied. By training the model iteratively on the imputed data and thereby including the information of all data in the model, the distortion of the first training set (the complete data) vanishes. In the next step, the performance of the algorithms is measured. This is done by randomly creating missing values in subsets of the data, estimating those values with the relevant algorithms and several parameter combinations, and comparing the estimates to the actual data.
After the optimal parameter set has been found for each algorithm, the missing values are imputed. Using the resulting data sets, the next step is to estimate the willingness to pay for real estate. This is done by fitting price distributions for real estate properties with certain characteristics, such as the region or the number of rooms. Based on these distributions, survival functions are computed to obtain the functional relationship between characteristics and selling probabilities. Comparing the survival functions shows that estimates based on imputed data sets do not differ significantly from each other; however, the demand estimate derived from the baseline data does. This indicates that the baseline data set does not include all available information and is therefore not representative of the entire sample. Also, demand estimates derived from the whole data set are much more accurate than the baseline estimation. Thus, in order to obtain optimal results, it is important to make use of all available data, even though this involves additional procedures such as data imputation.
Keywords: demand estimate, missing-data imputation, real estate, unsupervised learning
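The evaluation step described above (hide known values at random, impute them, compare with the truth) can be sketched as follows. Mean imputation is used here purely as a simple stand-in for the clustering, PCA, and neural network imputers named in the abstract, and all function names and sample values are hypothetical.

```python
import math
import random

def mean_impute(column):
    """Replace None entries with the mean of the observed entries (a deliberately simple imputer)."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def masked_rmse(column, impute, mask_fraction, seed=0):
    """Hide a random fraction of known values, impute them, and score against the hidden truth."""
    rng = random.Random(seed)
    n_mask = max(1, int(mask_fraction * len(column)))
    hidden = rng.sample(range(len(column)), n_mask)
    masked = list(column)
    for i in hidden:
        masked[i] = None
    filled = impute(masked)
    return math.sqrt(sum((filled[i] - column[i]) ** 2 for i in hidden) / n_mask)

# Hypothetical fully observed column, e.g. requested number of rooms.
rooms = [2.5, 3.5, 3.5, 4.5, 2.5, 5.5, 3.5, 4.5]
score = masked_rmse(rooms, mean_impute, mask_fraction=0.25)
```

Running the same scoring function over several imputers and parameter settings gives the comparison from which the best configuration per algorithm would be chosen.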
Procedia PDF Downloads 285
43987 Two-Phase Sampling for Estimating a Finite Population Total in Presence of Missing Values
Authors: Daniel Fundi Murithi
Abstract:
Missing data is a real bane in many surveys. To overcome the problems caused by missing data, partial deletion and single imputation methods, among others, have been proposed. However, these methods are associated with problems such as discarding usable data and inaccuracy in reproducing known population parameters and standard errors. For regression and stochastic imputation, it is assumed that there is a variable with complete cases that can be used as a predictor in estimating missing values in the other variable, and that the relationship between the two variables is linear, which might not be realistic in practice. In this project, we estimate the population total in the presence of missing values in two-phase sampling. Instead of regression or stochastic models, a nonparametric model-based regression is used to impute missing values. An empirical study showed that nonparametric model-based regression imputation is better than mean, median, regression, and stochastic imputation methods at reproducing the variance of the population total estimate obtained when there were no missing values. Although regression and stochastic imputation were better than nonparametric model-based imputation at reproducing the population total estimates obtained when there were no missing values in one of the sample sizes considered, nonparametric model-based imputation may be used when the relationship between outcome and predictor variables is not linear.
Keywords: finite population total, missing data, model-based imputation, two-phase sampling
Procedia PDF Downloads 131
43986 Investigation of Delivery of Triple Play Data in GE-PON Fiber to the Home Network
Authors: Ashima Anurag Sharma
Abstract:
Optical-fiber-based networks can deliver performance that supports the increasing demand for high-speed connections. One of the new technologies that has emerged in recent years is the Passive Optical Network. This research paper aims to show the simultaneous delivery of triple play services (data, voice, and video). A comparison between various data rates is presented. It is demonstrated that as the data rate increases, the number of users that can be supported decreases due to the increase in bit error rate.
Keywords: BER, PON, TDMPON, GPON, CWDM, OLT, ONT
Procedia PDF Downloads 527
43985 Attribution Theory and Perceived Reliability of Cellphones for Teaching and Learning
Authors: Mayowa A. Sofowora, Seraphin D. Eyono Obono
Abstract:
The use of information and communication technologies (ICTs) such as computers, mobile phones and the internet is becoming prevalent in today’s world, and it is facilitating access to a vast amount of data, services, and applications for the improvement of people’s lives. However, this prevalence of ICTs is hampered by the problem of low income levels in developing countries, to the point where people cannot timeously replace or repair their ICT devices when damaged or lost. This problem serves as the motivation for this study, whose aim is to examine the perceptions of teachers on the reliability of cellphones when used for teaching and learning purposes. The research objectives arising from this aim are of two types: objectives on the selection and design of theories and models, and objectives on the empirical testing of these theories and models. The first type of objective is achieved using content analysis in an extensive literature survey, and the second type is achieved through a survey of high school teachers from the ILembe and Umgungudlovu districts in the KwaZulu-Natal province of South Africa. Data collected from this questionnaire-based survey are analysed in SPSS using descriptive statistics and Pearson correlations after checking the reliability and validity of the questionnaire. The main hypothesis driving this study is that there is a relationship between the demographics and the attribution identity of teachers on the one hand, and their perceptions of the reliability of cellphones on the other hand, as suggested by existing literature; except that attribution identities are considered in this study from three angles: intention, knowledge and ability, and action.
The results of this study confirm that the perceptions of teachers on the reliability of cellphones for teaching and learning are affected by the school location of these teachers, and by their perceptions of learners’ cellphone usage intentions and actual use.
Keywords: attribution, cellphones, e-learning, reliability
Procedia PDF Downloads 402
43984 Analyzing Large Scale Recurrent Event Data with a Divide-And-Conquer Approach
Authors: Jerry Q. Cheng
Abstract:
Analyzing large-scale recurrent event data currently poses many challenges, such as memory limitations and unscalable computing time. In this research, a divide-and-conquer method is proposed using parametric frailty models. Specifically, the data are randomly divided into many subsets, and the maximum likelihood estimator is obtained from each individual subset. A weighted method is then proposed to combine these individual estimators into the final estimator. It is shown that this divide-and-conquer estimator is asymptotically equivalent to the estimator based on the full data. Simulation studies are conducted to demonstrate the performance of the proposed method. The approach is applied to a large real dataset of repeated heart failure hospitalizations.
Keywords: big data analytics, divide-and-conquer, recurrent event data, statistical computing
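The split, fit, and combine pattern can be sketched with a much simpler model than the parametric frailty models used in the paper. Below, an exponential rate stands in for the model, and the subset estimates are combined with inverse-variance weights. Both the stand-in model and the weighting scheme are assumptions, since the abstract does not specify the weights.

```python
import random

def exponential_mle(sample):
    """MLE of the exponential rate and its approximate variance (rate^2 / n)."""
    n = len(sample)
    rate = n / sum(sample)
    return rate, rate * rate / n

def divide_and_conquer_rate(data, n_subsets, seed=0):
    """Randomly split the data, fit each subset, and combine with inverse-variance weights."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    fits = [exponential_mle(s) for s in subsets]
    weights = [1.0 / var for _, var in fits]
    return sum(w * rate for w, (rate, _) in zip(weights, fits)) / sum(weights)

# Hypothetical simulated gap times with true rate 2.0.
rng = random.Random(42)
data = [rng.expovariate(2.0) for _ in range(10000)]
full_rate, _ = exponential_mle(data)
dc_rate = divide_and_conquer_rate(data, n_subsets=10)
```

Each subset fit needs only its own share of the data in memory, which is the practical benefit; the combined estimate is expected to approach the full-data estimate as the subsets grow.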
Procedia PDF Downloads 166
43983 Enhanced Disk-Based Databases towards Improved Hybrid in-Memory Systems
Authors: Samuel Kaspi, Sitalakshmi Venkatraman
Abstract:
In-memory database systems are becoming popular due to the availability and affordability of sufficiently large RAM and processors in modern high-end servers with the capacity to manage large in-memory database transactions. While fast and reliable in-memory systems are still being developed to overcome cache misses, CPU/IO bottlenecks and distributed transaction costs, disk-based data stores still serve as the primary persistence layer. In addition, with the recent growth in multi-tenancy cloud applications and associated security concerns, many organisations consider the trade-offs and continue to require the fast and reliable transaction processing of disk-based database systems as an available choice. For these organisations, the only way of increasing throughput is by improving the performance of disk-based concurrency control. This warrants a hybrid database system with the ability to selectively apply enhanced disk-based data management within the context of in-memory systems, which would help improve overall throughput. The general view is that in-memory systems substantially outperform disk-based systems. We question this assumption and examine how a modified variation of access invariance, which we call enhanced memory access (EMA), can be used to allow very high levels of concurrency in the pre-fetching of data in disk-based systems. We demonstrate how this pre-fetching in disk-based systems can yield close to in-memory performance, which paves the way for improved hybrid database systems. This paper proposes a novel EMA technique and presents a comparative study between disk-based EMA systems and in-memory systems running on hardware configurations of equivalent power in terms of the number of processors and their speeds. The results of the experiments conducted clearly substantiate that, when used in conjunction with all concurrency control mechanisms, EMA can increase the throughput of disk-based systems to levels quite close to those achieved by in-memory systems.
The promising results of this work show that enhanced disk-based systems help improve hybrid data management within the broader context of in-memory systems.
Keywords: in-memory database, disk-based system, hybrid database, concurrency control
Procedia PDF Downloads 417
43982 Ontology-Based Backpropagation Neural Network Classification and Reasoning Strategy for NoSQL and SQL Databases
Authors: Hao-Hsiang Ku, Ching-Ho Chi
Abstract:
Big data applications have become imperative in many fields. Many researchers have devoted themselves to increasing accuracy rates and reducing time complexity. Hence, this study designs and proposes an ontology-based backpropagation neural network classification and reasoning strategy for NoSQL big data applications, called ON4NoSQL. ON4NoSQL is responsible for enhancing classification performance in NoSQL and SQL databases in order to build mass behavior models. The mass behavior models are built with MapReduce techniques and the Hadoop distributed file system on the Hadoop service platform. The reference engine of ON4NoSQL is the ontology-based backpropagation neural network classification and reasoning strategy. Simulation results indicate that ON4NoSQL can efficiently construct a high-performance environment for storing, searching, and retrieving data.
Keywords: Hadoop, NoSQL, ontology, back propagation neural network, high distributed file system
Procedia PDF Downloads 262
43981 A NoSQL Based Approach for Real-Time Managing of Robotics's Data
Authors: Gueidi Afef, Gharsellaoui Hamza, Ben Ahmed Samir
Abstract:
This paper deals with the continual growth of data, in response to which new data management solutions have emerged: the NoSQL databases. They span several areas, such as personalization, profile management, big data in real time, content management, catalogs, customer views, mobile applications, the internet of things, digital communication and fraud detection. Nowadays, the use of these database management systems is increasing. These systems store data very well, and with the trend of big data, new storage challenges demand new structures and methods for managing enterprise data. New intelligent machines, as in the e-learning sector, thrive on more data, so smart machines can learn more and faster. Robotics is the use case on which our tests focus. Implementing NoSQL for robotics wrangles all the data that robots acquire into a usable form, because with ordinary robotics we face severe limits in managing data and finding the exact information in real time. Our proposed approach was demonstrated by experimental studies and a running example used as a use case.
Keywords: NoSQL databases, database management systems, robotics, big data
Procedia PDF Downloads 355
43980 SPBAC: A Semantic Policy-Based Access Control for Database Query
Authors: Aaron Zhang, Alimire Kahaer, Gerald Weber, Nalin Arachchilage
Abstract:
Access control is an essential safeguard for the security of enterprise data, which controls users’ access to information resources and ensures the confidentiality and integrity of information resources [1]. Research shows that the more common types of access control now have shortcomings [2]. To improve on existing access control, we have studied the current technologies in the field of data security, investigated previous data access control policies and their problems in depth, identified the existing deficiencies, and proposed a new extension structure for SPBAC. The SPBAC extension proposed in this paper aims to combine Policy-Based Access Control (PBAC) with semantics to provide logically connected, real-time data access functionality by establishing associations between enterprise data through semantics. Our design combines policies with linked data through semantics to create a "Semantic link", so that access control is no longer per-database and users in each role are granted access based on the instance policy. The design also improves the SPBAC implementation by constructing policies and defined attributes through the XACML specification, extending the original XACML model. While providing relevant design solutions, we intend to study the feasibility and subsequent implementation of related work at a later stage.
Keywords: access control, semantic policy-based access control, semantic link, access control model, instance policy, XACML
Procedia PDF Downloads 92
43979 Nonlinear Estimation Model for Rail Track Deterioration
Authors: M. Karimpour, L. Hitihamillage, N. Elkhoury, S. Moridpour, R. Hesami
Abstract:
Rail transport authorities around the world have been facing a significant challenge when predicting rail infrastructure maintenance work over a long period of time. Generally, maintenance monitoring and prediction are conducted manually. Given economic constraints, rail transport authorities are in pursuit of improved modern methods that can provide precise prediction of rail maintenance time and location. Such a method is expected to yield models that minimize the human error strongly associated with manual prediction. Such models will help authorities understand how track degradation occurs over time under changing conditions (e.g. rail load, rail type, rail profile). They need a well-structured technique to identify the precise time at which rail tracks fail, in order to minimize maintenance cost and time and keep vehicles safe. The rail track characteristics that have been collected over the years will be used in developing rail track degradation prediction models. Since these data have been collected in large volumes, and the data collection is done both electronically and manually, the data may contain some errors. Sometimes these errors make it impossible to use the data in prediction model development. This is one of the major drawbacks in rail track degradation prediction. An accurate model can play a key role in estimating the long-term behavior of rail tracks. Accurate models increase track safety and decrease the cost of maintenance in the long term. In this research, a short review of rail track degradation prediction models is presented, followed by an estimation of rail track degradation for the curve sections of the Melbourne tram track system using an Adaptive Network-based Fuzzy Inference System (ANFIS) model.
Keywords: ANFIS, MGT, prediction modeling, rail track degradation
Procedia PDF Downloads 336
43978 The Effect of Institutions on Economic Growth: An Analysis Based on Bayesian Panel Data Estimation
Authors: Mohammad Anwar, Shah Waliullah
Abstract:
This study investigated panel data regression models. Bayesian and classical methods were used to study the impact of institutions on economic growth, especially in developing countries, using data from 1990 to 2014. Under both the classical and the Bayesian methodology, two panel data models were estimated: common effects and fixed effects. For the Bayesian approach, prior information was used, with a normal-gamma prior for the panel data models. The analysis was done with WinBUGS14 software. The estimated results showed that panel data models are valid models in Bayesian methodology. In the Bayesian approach, all independent variables had a positive and significant effect on the dependent variable. Based on the standard errors of all models, the fixed effect model is the best model in the Bayesian estimation of panel data models, as it has the lowest standard error compared with the other models.
Keywords: Bayesian approach, common effect, fixed effect, random effect, Dynamic Random Effect Model
Procedia PDF Downloads 68
43977 Quantification of Soft Tissue Artefacts Using Motion Capture Data and Ultrasound Depth Measurements
Authors: Azadeh Rouhandeh, Chris Joslin, Zhen Qu, Yuu Ono
Abstract:
The centre of rotation of the hip joint is needed for an accurate simulation of the joint performance in many applications such as pre-operative planning simulation, human gait analysis, and hip joint disorders. In human movement analysis, the hip joint centre can be estimated using a functional method based on the motion of the femur relative to the pelvis, measured using reflective markers attached to the skin surface. The principal source of error in estimating the hip joint centre location using functional methods is soft tissue artefacts due to the relative motion between the markers and the bone. One of the main objectives in human movement analysis is the assessment of soft tissue artefacts, as the accuracy of functional methods depends upon it. Various studies have described the movement of soft tissue artefacts using invasive techniques, such as intra-cortical pins, external fixators, percutaneous skeletal trackers, and Roentgen photogrammetry. The goal of this study is to present a non-invasive method to assess the displacements of the markers relative to the underlying bone using optical motion capture data and tissue thickness from ultrasound measurements during flexion, extension, and abduction (all with the knee extended) of the hip joint. Results show that the skin marker displacements caused by the artefact are non-linear and larger in areas closer to the hip joint. Marker displacements also depend on the movement type and are relatively larger in abduction. The quantification of soft tissue artefacts can be used as a basis for a correction procedure for hip joint kinematics.
Keywords: hip joint center, motion capture, soft tissue artefact, ultrasound depth measurement
Procedia PDF Downloads 281
43976 Clustering Performance Analysis using New Correlation-Based Cluster Validity Indices
Authors: Nathakhun Wiroonsri
Abstract:
There are various cluster validity measures used for evaluating clustering results. One of the main objectives of using these measures is to seek the optimal unknown number of clusters. Some measures work well for clusters with different densities, sizes and shapes. Yet, one of the weaknesses that those validity measures share is that they sometimes provide only one clear optimal number of clusters. That number is actually unknown, and there might be more than one potential sub-optimal option that a user may wish to choose based on different applications. We develop two new cluster validity indices based on a correlation between the actual distance between a pair of data points and the distance between the centroids of the clusters in which the two points are located. Our proposed indices consistently yield several peaks at different numbers of clusters, which overcomes the weakness previously stated. Furthermore, the introduced correlation can also be used for evaluating the quality of a selected clustering result. Several experiments in different scenarios, including the well-known iris data set and a real-world marketing application, have been conducted to compare the proposed validity indices with several well-known ones.
Keywords: clustering algorithm, cluster validity measure, correlation, data partitions, iris data set, marketing, pattern recognition
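The correlation this abstract describes can be illustrated with a minimal sketch. It computes only the raw Pearson correlation between pairwise point distances and the distances between the corresponding cluster centroids; it is not the authors' two indices, whose exact definitions are not given here. The function names (`centroid`, `pearson`, `distance_centroid_correlation`) and the choice of Euclidean distance are assumptions made for illustration.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def distance_centroid_correlation(points, labels):
    """Correlate each pair's actual distance with its centroid distance.

    For every pair of points, the first quantity is the Euclidean distance
    between the points and the second is the Euclidean distance between the
    centroids of the clusters the two points are assigned to (zero when both
    points share a cluster).
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    centroids = {label: centroid(members) for label, members in clusters.items()}

    actual, between_centroids = [], []
    for i, j in combinations(range(len(points)), 2):
        actual.append(math.dist(points[i], points[j]))
        between_centroids.append(
            math.dist(centroids[labels[i]], centroids[labels[j]])
        )
    return pearson(actual, between_centroids)
```

Under these assumptions, a partition that matches the geometry of the data should score higher than one that mixes distant points into the same cluster, which is what makes such a correlation usable for comparing candidate numbers of clusters.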
Procedia PDF Downloads 103
43975 Recommendations for Data Quality Filtering of Opportunistic Species Occurrence Data
Authors: Camille Van Eupen, Dirk Maes, Marc Herremans, Kristijn R. R. Swinnen, Ben Somers, Stijn Luca
Abstract:
In ecology, species distribution models are commonly implemented to study species-environment relationships. These models increasingly rely on opportunistic citizen science data when high-quality species records collected through standardized recording protocols are unavailable. While these opportunistic data are abundant, uncertainty is usually high, e.g., due to observer effects or a lack of metadata. Data quality filtering is often used to reduce these types of uncertainty in an attempt to increase the value of studies relying on opportunistic data. However, filtering should not be performed blindly. In this study, recommendations are built for data quality filtering of opportunistic species occurrence data that are used as input for species distribution models. Using an extensive database of 5.7 million citizen science records from 255 species in Flanders, the impact on model performance was quantified by applying three data quality filters, and these results were linked to species traits. More specifically, presence records were filtered based on record attributes that provide information on the observation process or post-entry data validation, and changes in the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were analyzed using the Maxent algorithm with and without filtering. Controlling for sample size enabled us to study the combined impact of data quality filtering, i.e., the simultaneous impact of an increase in data quality and a decrease in sample size. Further, the variation among species in their response to data quality filtering was explored by clustering species based on four traits often related to data quality: commonness, popularity, difficulty, and body size.
Findings show that model performance is affected by i) the quality of the filtered data, ii) the proportional reduction in sample size caused by filtering and the remaining absolute sample size, and iii) a species' ‘quality profile’, resulting from a species classification based on the four traits related to data quality. The findings resulted in recommendations on when and how to filter volunteer-generated and opportunistically collected data. This study confirms that correctly processed citizen science data can make a valuable contribution to ecological research and species conservation.
Keywords: citizen science, data quality filtering, species distribution models, trait profiles
Procedia PDF Downloads 203
43974 Imaging Based On Bi-Static SAR Using GPS L5 Signal
Authors: Tahir Saleem, Mohammad Usman, Nadeem Khan
Abstract:
GPS signals are used for navigation and positioning purposes by a diverse set of users. This project, however, intends to use reflected GPS L5 signals to locate targets in a region of interest by generating an image that highlights their positions. The principle of bi-static radar is used to detect the targets or any movement or changes. The idea is confirmed by the results obtained during MATLAB simulations. A matched-filter-based technique is employed in the signal processing to improve the system resolution. The simulation is carried out under different conditions with a moving receiver and moving targets. Noise and attenuation are also introduced, and atmospheric conditions that affect the direct and reflected GPS signals have been simulated to generate a more practical scenario. A realistic GPS L5 signal has been simulated. The simulation results verify that the detection and imaging of targets is possible, with acceptable spatial resolution, by employing reflected GPS L5 signals and a matched filter processing technique.
Keywords: GPS, L5 Signal, SAR, spatial resolution
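The matched filtering mentioned in this abstract can be sketched in a few lines: the received samples are correlated against a known reference code, and the lag with the strongest response estimates the echo delay. This is an illustration only, not the authors' MATLAB simulation. The random ±1 code stands in for a ranging code and is not the real GPS L5 code; `matched_filter`, `estimate_delay`, `simulate_echo` and all parameter values are hypothetical.

```python
import random


def matched_filter(received, reference):
    """Slide the reference over the received samples and correlate.

    Returns one correlation value per candidate lag at which the reference
    fits entirely inside the received sequence.
    """
    m = len(reference)
    return [
        sum(received[lag + i] * reference[i] for i in range(m))
        for lag in range(len(received) - m + 1)
    ]


def estimate_delay(received, reference):
    """Return the lag whose matched-filter response has the largest magnitude."""
    response = matched_filter(received, reference)
    return max(range(len(response)), key=lambda lag: abs(response[lag]))


def simulate_echo(reference, delay, length, amplitude=0.5, noise_std=0.2, seed=0):
    """Build a toy received signal: Gaussian noise plus one attenuated, delayed copy."""
    rng = random.Random(seed)
    received = [rng.gauss(0.0, noise_std) for _ in range(length)]
    for i, chip in enumerate(reference):
        received[delay + i] += amplitude * chip
    return received


if __name__ == "__main__":
    code_rng = random.Random(1)
    # Hypothetical 127-chip pseudo-random code, NOT the actual L5 ranging code.
    code = [code_rng.choice((-1.0, 1.0)) for _ in range(127)]
    echo = simulate_echo(code, delay=40, length=300)
    print("estimated delay (samples):", estimate_delay(echo, code))
```

In a bi-static setup, a delay estimated in this way for each reflected path would be one input to forming the image; the sketch stops at the delay estimate.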
Procedia PDF Downloads 534
43973 Optimizing Communications Overhead in Heterogeneous Distributed Data Streams
Authors: Rashi Bhalla, Russel Pears, M. Asif Naeem
Abstract:
In this 'Information Explosion Era', analyzing data, 'a critical commodity', and mining knowledge from vertically distributed data streams incur a huge communication cost. However, an effort to decrease communication in the distributed environment has an adverse influence on classification accuracy; therefore, a research challenge lies in maintaining a balance between transmission cost and accuracy. This paper proposes a method based on Bayesian inference to reduce the communication volume in a heterogeneous distributed environment while retaining prediction accuracy. Our experimental evaluation reveals that a significant reduction in communication can be achieved across a diverse range of dataset types.
Keywords: big data, bayesian inference, distributed data stream mining, heterogeneous-distributed data
Procedia PDF Downloads 161
43972 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning
Authors: Walid Cherif
Abstract:
Data mining has seen big advances in recent years because of the spread of the internet, which generates a tremendous volume of data every day, and because of the immense advances in technologies that facilitate the analysis of these data. In particular, classification techniques are a subdomain of data mining that determines which group each data instance belongs to within a given dataset. They are used to classify data into different classes according to desired criteria. Generally, a classification technique is based on either statistics or machine learning. Each of these types of technique has its own limits. Nowadays, data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach that differs from existing algorithms in its reliability computations. The proposed approach outperformed the most common classification techniques, with an f-measure exceeding 97% on the IRIS dataset.
Keywords: data mining, knowledge discovery, machine learning, similarity measurement, supervised classification
Procedia PDF Downloads 465