Search results for: data warehouses
24288 Open Source, Open Hardware Ground Truth for Visual Odometry and Simultaneous Localization and Mapping Applications
Authors: Janusz Bedkowski, Grzegorz Kisala, Michal Wlasiuk, Piotr Pokorski
Abstract:
Ground-truth data is essential for the quantitative evaluation of VO (Visual Odometry) and SLAM (Simultaneous Localization and Mapping) using, e.g., ATE (Absolute Trajectory Error) and RPE (Relative Pose Error). Many open-access data sets provide raw and ground-truth data for benchmarking purposes. A problem arises when one would like to validate VO and/or SLAM approaches on data captured with the device the algorithm targets, for example a mobile phone, and to share that data with other researchers. For this reason, we propose an open-source, open-hardware ground-truth system that provides an accurate and precise trajectory with a 3D point cloud. It is based on a Livox Mid-360 LiDAR with a non-repetitive scanning pattern, an on-board Raspberry Pi 4B computer, a battery, and software for off-line calculations (camera-to-LiDAR calibration, LiDAR odometry, SLAM, georeferencing). We show how this system can be used to evaluate various state-of-the-art algorithms (Stella SLAM, ORB SLAM3, DSO) in typical indoor monocular VO/SLAM.
Keywords: SLAM, ground truth, navigation, LiDAR, visual odometry, mapping
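As a rough illustration of the ATE metric named in this abstract, the sketch below computes the root-mean-square position error between an estimated and a ground-truth trajectory. It assumes the two trajectories are already time-associated and expressed in the same frame; evaluation tools normally perform a rigid alignment first, which is omitted here. The function name and the sample points are hypothetical and are not taken from the paper.

```python
import math


def absolute_trajectory_error(estimated, ground_truth):
    """RMSE of position differences between two pre-aligned trajectories.

    Both arguments are equal-length sequences of (x, y, z) positions.
    Alignment and timestamp association are assumed to be done already.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up sample poses, for illustration only.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.0, 0.2)]
print(absolute_trajectory_error(est, gt))
```

RPE would instead compare relative motions between pose pairs, so it needs orientations as well as positions and is not covered by this sketch.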
Procedia PDF Downloads 70
24287 Prediction of Gully Erosion with Stochastic Modeling by Using Geographic Information System and Remote Sensing Data in North of Iran
Authors: Reza Zakerinejad
Abstract:
Gully erosion is a serious problem that threatens the sustainability of agricultural areas, rangeland, and water resources in a large part of Iran. This type of water erosion is the main source of sedimentation in many catchment areas in the north of Iran. Since many national assessment approaches have applied only qualitative models, the aim of this study is to predict the spatial distribution of gully erosion processes by means of detailed terrain analysis and GIS-based logistic regression in the loess deposits of a case-study area in Golestan Province. In this study, a DEM with 25-meter resolution derived from ASTER data was used. Landsat ETM data were used to map land use. The TreeNet model, a stochastic modeling approach, was applied to predict the areas susceptible to gully erosion. In this model's ROC analysis, we set 20% of the data as learning and 20% as learning data. GIS and satellite image analysis techniques were therefore used to derive the input information for these stochastic models. The results of this study showed a highly accurate map of gully erosion potential.
Keywords: TreeNet model, terrain analysis, Golestan Province, Iran
Procedia PDF Downloads 536
24286 Data Science/Artificial Intelligence: A Possible Panacea for Refugee Crisis
Authors: Avi Shrivastava
Abstract:
In 2021, two heart-wrenching scenes, shown live on television screens across countries, painted a grim picture of refugees. One of them was of people clinging onto an airplane's wings in their desperate attempt to flee war-torn Afghanistan. They ultimately fell to their death. The other scene was the U.S. government authorities separating children from their parents or guardians to deter migrants/refugees from coming to the U.S. These events show the desperation refugees feel when they are trying to leave their homes in disaster zones. However, data paints a grave picture of the current refugee situation. It also indicates that a bleak future lies ahead for refugees across the globe. Data and information are the two threads that intertwine to weave the shimmery fabric of modern society. Data and information are often used interchangeably, but they differ considerably. For example, information analysis reveals rationale and logic, while data analysis, on the other hand, reveals a pattern. Moreover, patterns revealed by data can enable us to create the necessary tools to combat the huge problems on our hands. Data analysis paints a clear picture so that the decision-making process becomes simple. Geopolitical and economic data can be used to predict future refugee hotspots. Accurately predicting the next refugee hotspots will allow governments and relief agencies to prepare better for future refugee crises. The refugee crisis does not have binary answers. Given the emotionally wrenching nature of the ground realities, experts often shy away from realistically stating things as they are. This hesitancy can cost lives. When decisions are based solely on data, emotions can be removed from the decision-making process. Data also presents irrefutable evidence and tells whether there is a solution or not. Moreover, it also responds to a nonbinary crisis with a binary answer. Because of all that, it becomes easier to tackle a problem. Data science and A.I. can predict future refugee crises. With the recent explosion of data due to the rise of social media platforms, data and insight into data have solved many social and political problems. Data science can also help solve many issues refugees face while staying in refugee camps or adopted countries. This paper looks into various ways data science can help solve refugee problems. A.I.-based chatbots can help refugees seek legal help to find asylum in the country they want to settle in. These chatbots can help them find a marketplace where they can find help from people willing to help. Data science and technology can also help solve refugees' many problems, including food, shelter, employment, security, and assimilation. The refugee problem seems to be one of the most challenging for social and political reasons. Data science and machine learning can help prevent the refugee crisis and solve or alleviate some of the problems that refugees face in their journey to a better life. With the explosion of data in the last decade, data science has made it possible to solve many geopolitical and social issues.
Keywords: refugee crisis, artificial intelligence, data science, refugee camps, Afghanistan, Ukraine
Procedia PDF Downloads 73
24285 A Spatial Point Pattern Analysis to Recognize Fail Bit Patterns in Semiconductor Manufacturing
Authors: Youngji Yoo, Seung Hwan Park, Daewoong An, Sung-Shick Kim, Jun-Geol Baek
Abstract:
The yield management system is very important for producing high-quality chips in the semiconductor manufacturing process. In order to improve the quality of semiconductors, various tests are conducted in the post-fabrication (FAB) process. During the test process, a large amount of data is collected, and the data includes a lot of information about defects. In general, defects on the wafer are the main cause of yield loss. Therefore, analyzing the defect data is necessary to improve the performance of yield prediction. The wafer bin map (WBM) is one of the data sets collected in the test process and includes defect information such as the fail bit patterns. The fail bits have the characteristics of spatial point patterns. Therefore, this paper proposes a feature extraction method using spatial point pattern analysis. Actual data obtained from the semiconductor process is used for the experiments, and the experimental results show that the proposed method recognizes the fail bit patterns more accurately.
Keywords: semiconductor, wafer bin map, feature extraction, spatial point patterns, contour map
Procedia PDF Downloads 384
24284 The Measurement of the Multi-Period Efficiency of the Turkish Health Care Sector
Authors: Erhan Berk
Abstract:
The purpose of this study is to examine the efficiency and productivity of the health care sector in Turkey based on four years of cross-sectional health care data. Efficiency measures are calculated by a nonparametric approach known as Data Envelopment Analysis (DEA). Productivity is measured by the Malmquist index. The research shows how the DEA-based Malmquist productivity index can be used to appraise the technology and productivity changes in Turkish hospitals located across the country.
Keywords: data envelopment analysis, efficiency, health care, Malmquist Index
Procedia PDF Downloads 335
24283 Comparison of Data Mining Models to Predict Future Bridge Conditions
Authors: Pablo Martinez, Emad Mohamed, Osama Mohsen, Yasser Mohamed
Abstract:
Highway and bridge agencies, such as the Ministry of Transportation in Ontario, use the Bridge Condition Index (BCI), which is defined as the weighted condition of all bridge elements, to determine the rehabilitation priorities for their bridges. Therefore, accurate forecasting of the BCI is essential for bridge rehabilitation budget planning. The large amount of bridge condition data available over several years makes traditional mathematical models infeasible as analysis methods. This research study focuses on investigating different classification models developed to predict the bridge condition index in the province of Ontario, Canada, based on the publicly available data for 2800 bridges over a period of more than 10 years. Data preparation is a key factor in developing acceptable classification models, even with the simplest one, the k-NN model. All the models were tested, compared, and statistically validated via cross-validation and t-test. A simple k-NN model showed reasonable results (within 0.5% relative error) when predicting the bridge condition for an upcoming year.
Keywords: asset management, bridge condition index, data mining, forecasting, infrastructure, knowledge discovery in databases, maintenance, predictive models
Procedia PDF Downloads 191
24282 Piql Preservation Services - A Holistic Approach to Digital Long-Term Preservation
Authors: Alexander Rych
Abstract:
Piql Preservation Services (“Piql”) is a turnkey solution designed for secure, migration-free long-term preservation of digital data. Piql sets an open standard for long-term preservation for the future. It consists of the equipment and processes needed for writing and retrieving digital data. Exponentially growing amounts of data demand logistically effective and cost-effective processes. Digital storage media (hard disks, magnetic tape) have a limited lifetime. Repetitive data migration to overcome the rapid obsolescence of hardware and software carries an increased risk of data loss, data corruption, or even manipulation, and adds significant recurring costs for hardware and software investments. Piql stores any kind of data in its digital as well as analog form securely for 500 years. The medium that provides this is a film reel. By using photosensitive film with a polyester base, a very stable material known for its immutability over hundreds of years, secure and cost-effective long-term preservation can be provided. The film reel itself is stored in packaging capable of protecting the optical storage medium. These components have undergone extensive testing to ensure longevity of up to 500 years. In addition to its durability, film is a true WORM (write once, read many) medium. It is therefore resistant to editing or manipulation. Being able to store any form of data on the film makes Piql a superior solution for long-term preservation. Paper documents, images, video, or audio sequences: all of these file formats and documents can be preserved in their native file structure. In order to restore the encoded digital data, only a film scanner, a digital camera, or any appropriate optical reading device will be needed in the future. Every film reel includes an index section describing the data saved on the film. It also contains a content section carrying metadata, enabling users in the future to rebuild software in order to read and decode the digital information.
Keywords: digital data, long-term preservation, migration-free, photosensitive film
Procedia PDF Downloads 392
24281 Statistical Correlation between Logging-While-Drilling Measurements and Wireline Caliper Logs
Authors: Rima T. Alfaraj, Murtadha J. Al Tammar, Khaqan Khan, Khalid M. Alruwaili
Abstract:
OBJECTIVE/SCOPE: Caliper logging data provides critical information about wellbore shape and deformations, such as stress-induced borehole breakouts or washouts. Multiarm mechanical caliper logs are often run using wireline, which can be time-consuming, costly, and/or challenging to run in certain formations. To minimize rig time and improve operational safety, it is valuable to develop analytical solutions that can estimate caliper logs using available Logging-While-Drilling (LWD) data without the need to run wireline caliper logs. As a first step, the objective of this paper is to perform statistical analysis using an extensive dataset to identify important physical parameters that should be considered in developing such analytical solutions. METHODS, PROCEDURES, PROCESS: Caliper logs and LWD data of eleven wells, with a total of more than 80,000 data points, were obtained and imported into data analytics software for analysis. Several parameters were selected to test their relationship with the measured maximum and minimum caliper logs. These parameters include gamma ray, porosity, shear and compressional sonic velocities, bulk densities, and azimuthal density. The data of the eleven wells were first visualized and cleaned. Using the analytics software, several analyses were then performed, including the computation of Pearson’s correlation coefficients to show the statistical relationship between the selected parameters and the caliper logs. RESULTS, OBSERVATIONS, CONCLUSIONS: The results of this statistical analysis showed that some parameters correlate well with the caliper log data. For instance, the bulk density and azimuthal directional densities showed Pearson’s correlation coefficients in the range of 0.39 to 0.57, which were relatively high when compared to the correlation coefficients of caliper data with other parameters. Other parameters, such as porosity, exhibited extremely low correlation coefficients with the caliper data. Various crossplots and visualizations of the data were also presented to gain further insights from the field data. NOVEL/ADDITIVE INFORMATION: This study offers a unique and novel look into the relative importance of, and correlation between, different LWD measurements and wireline caliper logs via an extensive dataset. The results pave the way for a more informed development of new analytical solutions for estimating the size and shape of the wellbore in real time while drilling using LWD data.
Keywords: LWD measurements, caliper log, correlations, analysis
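For readers unfamiliar with the statistic used in this abstract, the following is a minimal sketch of Pearson's correlation coefficient between two equally long series. The variable names and the numbers are invented for illustration; they are not the study's well data, and the study's own analytics software is not named in the abstract.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical readings: an LWD parameter versus a caliper measurement.
lwd_parameter = [2.31, 2.40, 2.45, 2.52, 2.60]
caliper_inches = [8.9, 8.7, 8.8, 8.6, 8.5]
print(pearson_r(lwd_parameter, caliper_inches))
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship, which is how the 0.39 to 0.57 range reported above should be read.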
Procedia PDF Downloads 121
24280 Inversion of Gravity Data for Density Reconstruction
Authors: Arka Roy, Chandra Prakash Dubey
Abstract:
Inverse problems are generally used to recover hidden information from externally available data. We use the vertical component of the gravity field to calculate the underlying density structure. Ill-posedness is the main obstacle for any inverse problem. Linear regularization using the Tikhonov formulation is used for an appropriate choice of SVD and GSVD components. To handle real-time data, the signal-to-noise ratio should be lower for a reliable solution. In our study, 2D and 3D synthetic models with a rectangular grid are used for gravity field calculation and the corresponding inversion for density reconstruction. We have also considered a fine grid to capture any irregular structure. Keeping the algebraic ambiguity in mind, the number of observation points should exceed the number of data points. A Picard plot is presented for choosing the appropriate, or main controlling, eigenvalues for a regularized solution. Another important tool is the depth resolution plot (DRP). DRPs are generally used for studying how the inversion is influenced by regularization or discretization. Our further study involves real-time gravity data inversion for the Vredefort Dome, South Africa. We apply our method to this data. The resulting density structure is in good agreement with the known formations in that region, which lends additional support to our method.
Keywords: depth resolution plot, gravity inversion, Picard plot, SVD, Tikhonov formulation
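For orientation, the standard textbook form of Tikhonov regularization expressed through the SVD is sketched below. The symbols (forward operator A, data b, model x, regularization parameter λ) are notation assumed here, since the abstract does not give its own.

```latex
% Assumed notation: A = U \Sigma V^{T} with singular values \sigma_i,
% left/right singular vectors u_i, v_i, and regularization parameter \lambda.
\begin{aligned}
x_{\lambda} &= \arg\min_{x}\; \lVert A x - b \rVert_2^{2} + \lambda^{2} \lVert x \rVert_2^{2} \\
            &= \sum_{i} f_i \,\frac{u_i^{T} b}{\sigma_i}\, v_i ,
\qquad f_i = \frac{\sigma_i^{2}}{\sigma_i^{2} + \lambda^{2}} .
\end{aligned}
```

The filter factors f_i stay close to 1 for singular values well above λ and fall toward 0 for those well below it, which damps the noise-dominated components. A Picard plot shows σ_i, |u_i^T b|, and their ratio against i, and the index at which |u_i^T b| stops decaying faster than σ_i suggests how many components can be trusted.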
Procedia PDF Downloads 212
24279 DeepOmics: Deep Learning for Understanding Genome Functioning and the Underlying Genetic Causes of Disease
Authors: Vishnu Pratap Singh Kirar, Madhuri Saxena
Abstract:
Advances in sequence data generation technologies are churning out voluminous omics data and posing a massive challenge for annotating biological functional features. With so much data available, the use of machine learning methods and tools to make novel inferences has become an obvious choice. Machine learning methods have been successfully applied in many disciplines, including computational biology and bioinformatics. Researchers in computational biology are interested in developing novel machine learning frameworks to classify the huge amounts of biological data. In this proposal, we plan to employ novel machine learning approaches to aid the understanding of how apparently innocuous mutations (in intergenic DNA and at synonymous sites) cause diseases. We are also interested in discovering novel functional sites in the genome and mutations in them that can affect a phenotype of interest.
Keywords: genome wide association studies (GWAS), next generation sequencing (NGS), deep learning, omics
Procedia PDF Downloads 98
24278 An Efficient Data Mining Technique for Online Stores
Authors: Mohammed Al-Shalabi, Alaa Obeidat
Abstract:
In any food store, some items expire or are destroyed because demand for them is infrequent. We therefore need a system that helps the decision maker make an offer on such items, bundling them with a frequently purchased item and lowering the price, in order to improve demand and avoid losses. The system generates hundreds or thousands of patterns (offers) for each low-demand item, then uses the association rule measures (support, confidence) to find the interesting patterns (the best offer to achieve the lowest losses). In this paper, we propose a data mining method for determining the best offer by merging data mining techniques with e-commerce strategy. The task is to build a model to predict the best offer. The goal is to maximize the profits of a store and avoid the loss of products. The idea in this paper is to use association rules in marketing in combination with e-commerce.
Keywords: data mining, association rules, confidence, online stores
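The two association-rule measures mentioned in this abstract can be sketched as follows. This is only an illustration of the standard definitions of support and confidence; the transactions and item names are invented, and the paper's own offer-generation procedure is not reproduced.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0  # the rule never fires, so report zero confidence
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / antecedent_support


# Hypothetical baskets: "yogurt" plays the role of a low-demand item.
baskets = [
    {"bread", "milk"},
    {"bread", "yogurt"},
    {"bread", "milk", "yogurt"},
    {"milk"},
]
print(support(baskets, {"bread", "yogurt"}))       # 0.5
print(confidence(baskets, {"bread"}, {"yogurt"}))  # 2 of the 3 bread baskets
```

Under these definitions, a candidate offer pairing a slow item with a frequent one could be ranked by the support and confidence of the corresponding rule.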
Procedia PDF Downloads 410
24277 Elemental Graph Data Model: A Semantic and Topological Representation of Building Elements
Authors: Yasmeen A. S. Essawy, Khaled Nassar
Abstract:
With the rapid increase of complexity in the building industry, professionals in the A/E/C industry were forced to adopt Building Information Modeling (BIM) in order to enhance communication between the different project stakeholders throughout the project life cycle and to create a semantic object-oriented building model that can support geometric-topological analysis of building elements during design and construction. This paper presents a model that extracts topological relationships and geometrical properties of building elements from an existing, fully designed BIM and maps this information into a directed acyclic Elemental Graph Data Model (EGDM). The model incorporates BIM-based search algorithms for automatic deduction of geometrical data and topological relationships for each building element type. Using graph search algorithms, such as Depth First Search (DFS) and topological sorting, all possible construction sequences can be generated and compared against production and construction rules to generate an optimized construction sequence and its associated schedule. The model is implemented on a C# platform.
Keywords: building information modeling (BIM), elemental graph data model (EGDM), geometric and topological data models, graph theory
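To make the idea of generating all possible construction sequences from a directed acyclic graph concrete, here is a small sketch that enumerates every topological ordering by backtracking. It is written in Python for brevity, whereas the paper reports a C# implementation, and the graph, element names, and function name are hypothetical.

```python
def all_topological_orders(graph):
    """Yield every ordering in which each node appears after all its predecessors.

    `graph` maps each node to the list of nodes that depend on it.
    Every node must appear as a key. A cyclic graph yields nothing.
    """
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for succ in successors:
            indegree[succ] += 1
    order, placed = [], set()

    def backtrack():
        if len(order) == len(graph):
            yield list(order)
            return
        for node in sorted(graph):
            if node not in placed and indegree[node] == 0:
                # Place the node, release its successors, and recurse.
                placed.add(node)
                order.append(node)
                for succ in graph[node]:
                    indegree[succ] -= 1
                yield from backtrack()
                # Undo the placement before trying the next candidate.
                for succ in graph[node]:
                    indegree[succ] += 1
                order.pop()
                placed.remove(node)

    yield from backtrack()


# Hypothetical elements: a footing supports a column and a wall,
# and both must exist before the beam is installed.
elements = {
    "footing": ["column", "wall"],
    "column": ["beam"],
    "wall": ["beam"],
    "beam": [],
}
for sequence in all_topological_orders(elements):
    print(sequence)
```

The number of valid orderings can grow factorially with the number of independent elements, so a practical tool would prune candidates against construction rules during the search rather than enumerate everything first.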
Procedia PDF Downloads 382
24276 Wireless Sensor Network for Forest Fire Detection and Localization
Authors: Tarek Dandashi
Abstract:
WSNs may provide a fast and reliable solution for the early detection of environmental events like forest fires. This is crucial for alerting and calling for fire brigade intervention. Sensor nodes communicate sensor data to a host station, which enables a global analysis and the generation of a reliable decision on a potential fire and its location. We present a WSN built with TinyOS and nesC for capturing and transmitting a variety of sensor information with controlled source, data rate, and duration, and for recording and displaying activity traces. We propose a similarity distance (SD) between the distribution of currently sensed data and that of a reference. At any given time, a fire causes diverging opinions in the reported data, which alters the usual data distribution. Basically, SD consists of a metric on the Cumulative Distribution Function (CDF). SD is designed to be invariant to day-to-day changes of temperature, changes due to the surrounding environment, and normal changes in weather, which preserve the data locality. Evaluation shows that SD sensitivity is quadratic with respect to an increase in sensor node temperature for groups of sensors of different sizes and neighborhoods. Simulation of fire spreading, with ignition placed at random locations and some wind speed, shows that SD takes a few minutes to reliably detect fires and locate them. We also discuss the cases of false negatives and false positives and their impact on decision reliability.
Keywords: forest fire, WSN, wireless sensor network, algorithm
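The abstract does not define the similarity distance itself, only that it is a metric on the CDF. As a stand-in, the sketch below uses the Kolmogorov-Smirnov statistic, i.e. the largest gap between two empirical CDFs. This is an assumption chosen for illustration, it is not the authors' SD, and it does not reproduce the invariance properties they describe. The temperature readings are invented.

```python
def empirical_cdf(samples, x):
    """Fraction of samples that are less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)


def cdf_distance(current, reference):
    """Largest absolute gap between the empirical CDFs of two samples.

    This is the Kolmogorov-Smirnov statistic, used here only as a
    placeholder for the paper's similarity distance.
    """
    points = sorted(set(current) | set(reference))
    return max(
        abs(empirical_cdf(current, x) - empirical_cdf(reference, x))
        for x in points
    )


# Hypothetical temperature readings (degrees Celsius) from five nodes.
reference_readings = [20, 21, 22, 21, 20]
current_readings = [20, 21, 45, 60, 22]  # two nodes report abnormal heat
print(cdf_distance(current_readings, reference_readings))
```

A host station could compare such a distance against a threshold to flag a possible fire, with the threshold trading off the false positives and false negatives discussed in the abstract.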
Procedia PDF Downloads 262
24275 A Feasibility Study of Crowdsourcing Data Collection for Facility Maintenance Management
Authors: Mohamed Bin Alhaj, Hexu Liu, Mohammed Sulaiman, Osama Abudayyeh
Abstract:
An effective facility maintenance management (FMM) system plays a crucial role in improving the quality of services and maintaining the facility in good condition. Current FMM heavily relies on the quality of the data collection function of the FMM systems, at times resulting in inefficient FMM decision-making. The new technology-based crowdsourcing provides great potential to improve the current FMM practices, especially in terms of timeliness and quality of data. This research aims to investigate the feasibility of using new technology-driven crowdsourcing for FMM and highlight its opportunities and challenges. A survey was carried out to understand the human, data, system, geospatial, and automation characteristics of crowdsourcing for an educational campus FMM via social networks. The survey results were analyzed to reveal the challenges and recommendations for the implementation of crowdsourcing for FMM. This research contributes to the body of knowledge by synthesizing the challenges and opportunities of using crowdsourcing for facility maintenance and providing a road map for applying crowdsourcing technology in FMM. In future work, a conceptual framework will be proposed to support data-driven FMM using social networks.
Keywords: crowdsourcing, facility maintenance management, social networks
Procedia PDF Downloads 174
24274 Challenges and Opportunities: One Stop Processing for the Automation of Indonesian Large-Scale Topographic Base Map Using Airborne LiDAR Data
Authors: Elyta Widyaningrum
Abstract:
LiDAR data acquisition has been recognized as one of the fastest solutions for providing the base data for topographic base mapping in Indonesia. The challenge of accelerating the provision of large-scale topographic base maps as a basis for development planning gives the opportunity to implement an automated scheme in the map production process. One-stop processing will also help accelerate map provision, especially in conforming with the Indonesian fundamental spatial data catalog derived from ISO 19110 and with geospatial database integration. Thus, automated LiDAR classification, DTM generation, and feature extraction will be conducted in one GIS software environment to form all layers of the topographic base maps. The quality of the automated topographic base map will be assessed and analyzed based on its completeness, correctness, contiguity, consistency, and possible customization.
Keywords: automation, GIS environment, LiDAR processing, map quality
Procedia PDF Downloads 368
24273 Mixtures of Length-Biased Weibull Distributions for Loss Severity Modelling
Authors: Taehan Bae
Abstract:
In this paper, a class of length-biased Weibull mixtures is presented to model loss severity data. The proposed model generalizes the Erlang mixtures with the common scale parameter, and it shares many important modelling features, such as flexibility to fit various data distribution shapes and weak-denseness in the class of positive continuous distributions, with the Erlang mixtures. We show that the asymptotic tail estimate of the length-biased Weibull mixture is Weibull-type, which makes the model effective to fit loss severity data with heavy-tailed observations. A method of statistical estimation is discussed with applications on real catastrophic loss data sets.
Keywords: Erlang mixture, length-biased distribution, transformed gamma distribution, asymptotic tail estimate, EM algorithm, expectation-maximization algorithm
Procedia PDF Downloads 224
24272 Robust Data Image Watermarking for Data Security
Authors: Harsh Vikram Singh, Ankur Rai, Anand Mohan
Abstract:
In this paper, we propose a secure and robust data hiding algorithm based on the DCT, using the Arnold transform and a chaotic sequence. The watermark image is scrambled by the Arnold cat map to increase its security, and a chaotic map is then used to spread the watermark signal in the middle band of the DCT coefficients of the cover image. The chaotic map can be used as a pseudo-random generator for digital data hiding to increase security and robustness. The robustness and imperceptibility of the proposed algorithm were evaluated using the bit error rate (BER), normalized correlation (NC), and peak signal-to-noise ratio (PSNR) for different watermarks, cover images (such as the Lena, Girl, and Tank images), and gain factors. We use a binary logo image and a text image as watermarks. The experimental results demonstrate that the proposed algorithm achieves higher security and robustness against JPEG compression, as well as other attacks such as noise addition, low-pass filtering, and cropping, compared to other existing algorithms using DCT coefficients. Moreover, the proposed algorithm does not need the original cover image to recover the watermarks.
Keywords: data hiding, watermarking, DCT, chaotic sequence, Arnold transforms
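The scrambling step named in this abstract can be illustrated with the common form of the Arnold cat map, which sends pixel (x, y) of an N x N image to ((x + y) mod N, (x + 2y) mod N). The sketch below assumes that form and a square image stored as a list of rows; the paper may use a different parameterization or iteration count, and the DCT embedding and chaotic spreading steps are not shown.

```python
def arnold_scramble(image):
    """Apply one iteration of the Arnold cat map to a square image."""
    n = len(image)
    if any(len(row) != n for row in image):
        raise ValueError("the Arnold cat map needs a square image")
    out = [[None] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            new_x, new_y = (x + y) % n, (x + 2 * y) % n
            out[new_y][new_x] = image[y][x]
    return out


def arnold_unscramble(image):
    """Invert one iteration of arnold_scramble."""
    n = len(image)
    if any(len(row) != n for row in image):
        raise ValueError("the Arnold cat map needs a square image")
    out = [[None] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # Inverse of the matrix [[1, 1], [1, 2]], which has determinant 1.
            old_x, old_y = (2 * x - y) % n, (y - x) % n
            out[old_y][old_x] = image[y][x]
    return out


# A tiny 4 x 4 stand-in for a binary watermark, numbered for visibility.
watermark = [[4 * row + col for col in range(4)] for row in range(4)]
scrambled = arnold_scramble(watermark)
print(scrambled)
print(arnold_unscramble(scrambled) == watermark)
```

Because the map is periodic, repeated scrambling eventually restores the image, so schemes of this kind typically treat the iteration count as part of the secret key.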
Procedia PDF Downloads 515
24271 An Empirical Investigation of Big Data Analytics: The Financial Performance of Users versus Vendors
Authors: Evisa Mitrou, Nicholas Tsitsianis, Supriya Shinde
Abstract:
In the age of digitisation and globalisation, businesses have shifted online and are investing in big data analytics (BDA) to respond to changing market conditions and sustain their performance. Our study shifts the focus from the adoption of BDA to the impact of BDA on financial performance. We explore the financial performance of both BDA-vendors (business-to-business) and BDA-clients (business-to-customer). We distinguish between the five BDA-technologies (big-data-as-a-service (BDaaS), descriptive, diagnostic, predictive, and prescriptive analytics) and discuss them individually. Further, we use four perspectives (internal business process, learning and growth, customer, and finance) and discuss the significance of how each of the five BDA-technologies affects the performance measures of these four perspectives. We also present the analysis of employee engagement, average turnover, average net income, and average net assets for BDA-clients and BDA-vendors. Our study also explores the effect of the COVID-19 pandemic on business continuity for both BDA-vendors and BDA-clients.
Keywords: BDA-clients, BDA-vendors, big data analytics, financial performance
Procedia PDF Downloads 124
24270 Rapid Monitoring of Earthquake Damages Using Optical and SAR Data
Authors: Saeid Gharechelou, Ryutaro Tateishi
Abstract:
An earthquake is an inevitable, catastrophic natural disaster. Damage to buildings and man-made structures, where most human activities occur, is the major cause of casualties from earthquakes. A comparison of optical and SAR data is presented for the case of the Kathmandu valley, which was strongly shaken by the 2015 Nepal earthquake. Although many researchers have performed optical-data-based estimation or suggested the combined use of optical and SAR data for improved accuracy, cloud-free optical images are not guaranteed to be available when urgently needed. Therefore, this research specializes in developing a SAR-based technique aimed at rapid and accurate geospatial reporting. Considering the limited time available in a post-disaster situation, it offers quick computation based exclusively on two pairs of pre-seismic and co-seismic single look complex (SLC) images. Pre-seismic, co-seismic, and post-seismic InSAR coherence was used to detect changes in the damaged area. In addition, ground-truth data from the field were applied to the optical data using random forest classification to detect the damaged area. The ground-truth data collected in the field were used to assess the accuracy of the supervised classification approach. Although a higher accuracy was obtained from the optical data than from the integrated optical-SAR data, the availability of cloud-free images is not assured when urgently needed after an earthquake event; thus, further research on improving SAR-based damage detection is suggested. It is expected that quick reporting of the post-disaster damage situation, quantified by the rapid earthquake assessment, will assist in channeling the rescue and emergency operations and in informing the public about the scale of the damage.
Keywords: Sentinel-1A data, Landsat-8, earthquake damage, InSAR, rapid damage monitoring, 2015-Nepal earthquake
Procedia PDF Downloads 172
24269 Scheduling Nodes Activity and Data Communication for Target Tracking in Wireless Sensor Networks
Authors: AmirHossein Mohajerzadeh, Mohammad Alishahi, Saeed Aslishahi, Mohsen Zabihi
Abstract:
In this paper, we consider sensor nodes capable of measuring bearings (the relative angle to the target). We use geometric methods to select a set of observer nodes that are responsible for collecting data from the target. Given the characteristics of target tracking applications, a significant number of sensor nodes are usually inactive. Therefore, in order to minimize the total network energy consumption, a set of sensor nodes, called sentinels, is periodically selected for monitoring, controlling the environment, and transmitting data through the network. The other nodes are inactive. Furthermore, the proposed algorithm provides joint scheduling and routing to transmit data between network nodes and the fusion center (FC), which provides not only an efficient way to estimate the target position but also efficient target tracking. Performance evaluation confirms the superiority of the proposed algorithm.
Keywords: coverage, routing, scheduling, target tracking, wireless sensor networks
Procedia PDF Downloads 378
24268 Urban Big Data: An Experimental Approach to Building-Value Estimation Using Web-Based Data
Authors: Sun-Young Jang, Sung-Ah Kim, Dongyoun Shin
Abstract:
Current real-estate value estimation, which is difficult for laymen, is usually performed by specialists. This paper presents an automated estimation process based on big data and machine-learning technology that calculates the influence of building conditions on real-estate price measurement. The present study analyzed actual building sales sample data for Nonhyeon-dong, Gangnam-gu, Seoul, Korea, measuring the major influencing factors among the various building conditions. Further to that analysis, a prediction model was established and applied using RapidMiner Studio, a graphical user interface (GUI)-based tool for deriving machine-learning prototypes. The prediction model is formulated by reference to previous examples. When new examples are applied, it analyzes and predicts accordingly. The analysis process discerns the crucial factors affecting price increases by calculating weighted values. The model was verified, and its accuracy determined, by comparing its predicted values with actual price increases.
Keywords: apartment complex, big data, life-cycle building value analysis, machine learning
Procedia PDF Downloads 374
24267 Blockchain Technology Security Evaluation: Voting System Based on Blockchain
Authors: Omid Amini
Abstract:
Nowadays, technology plays the most important role in human life because people use it to share data and to communicate with each other, but the challenge is the security of this data. For instance, as more people around the world turn to technology, more data is generated, and more hackers try to steal or infiltrate it. In addition, the data is under the control of a central authority, which raises the risk of information being lost or altered; this can create widespread anxiety in different communities. In this paper, we investigate blockchain technology, which can guarantee information security and eliminate the challenge of central authority access to information. Nowadays, people are suffering from the current voting system: its lack of transparency is a big problem for society and the government in most countries. Blockchain technology can be the best alternative to previous voting methods because it removes this most important challenge for voting. According to the results, this research can be a good starting point for getting acquainted with this new technology, especially its security aspects, and with how to use a blockchain-based voting system. The research concludes that the use of blockchain technology can solve the major security problem and lead to a secure and transparent election.
Keywords: blockchain, technology, security, information, voting system, transparency
Procedia PDF Downloads 132
24266 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach
Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini
Abstract:
Great advances in high-throughput sequencing technologies have resulted in the availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isolates. Cluster analysis showed that k-mers may be used to discriminate phenotypes and that the discrimination becomes more concise as the size of the k-mers increases. The best performing classification model had a k-mer size of 10 (the longest k-mer) and an accuracy, recall, precision, specificity, and Matthews correlation coefficient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4, respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction.
The analysis also highlights the importance of increasing the k-mer size to produce more biologically explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data and to identify phenotype relationships, which are especially important in explaining complex biological mechanisms.
Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing
Procedia PDF Downloads 167
24265 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach
Authors: Darlington Mapiye, Mpho Mokoatle, James Mashiyane, Stephanie Muller, Gciniwe Dlamini
Abstract:
Great advances in high-throughput sequencing technologies have resulted in the availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isolates. Cluster analysis showed that k-mers may be used to discriminate phenotypes and that the discrimination becomes more concise as the size of the k-mers increases. The best performing classification model had a k-mer size of 10 (the longest k-mer) and an accuracy, recall, precision, specificity, and Matthews correlation coefficient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4, respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction.
The analysis also highlights the importance of increasing the k-mer size to produce more biologically explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data and to identify phenotype relationships, which are especially important in explaining complex biological mechanisms.
Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing
Procedia PDF Downloads 159
24264 Design and Implementation of Flexible Metadata Editing System for Digital Contents
Authors: K. W. Nam, B. J. Kim, S. J. Lee
Abstract:
Along with the development of network infrastructures, such as high-speed Internet and mobile environment, the explosion of multimedia data is expanding the range of multimedia services beyond voice and data services. Amid this flow, research is actively being done on the creation, management, and transmission of metadata on digital content to provide different services to users. This paper proposes a system for the insertion, storage, and retrieval of metadata about digital content. The metadata server with Binary XML was implemented for efficient storage space and retrieval speeds, and the transport data size required for metadata retrieval was simplified. With the proposed system, the metadata could be inserted into the moving objects in the video, and the unnecessary overlap could be minimized by improving the storage structure of the metadata. The proposed system can assemble metadata into one relevant topic, even if it is expressed in different media or in different forms. It is expected that the proposed system will handle complex network types of data.
Keywords: video, multimedia, metadata, editing tool, XML
Procedia PDF Downloads 171
24263 System for Monitoring Marine Turtles Using Unstructured Supplementary Service Data
Authors: Luís Pina
Abstract:
The conservation of marine biodiversity keeps ecosystems in balance and ensures the sustainable use of resources. In this context, technological resources have been used for monitoring marine species to allow biologists to obtain data in real time. Various mobile applications have been developed for data collection for monitoring purposes, but these systems are designed to be used only on third-generation (3G) phones or smartphones with Internet access, whereas in rural parts of developing countries Internet services and smartphones are scarce. Thus, the objective of this work is to develop a system to monitor marine turtles using Unstructured Supplementary Service Data (USSD), which users can access through basic mobile phones. The system aims to improve the data collection mechanism and enhance the effectiveness of current systems in monitoring sea turtles using any type of mobile device without Internet access. The system will be able to report information related to the biological activities of marine turtles. It will also serve as a platform to help marine conservation entities receive reports of illegal sales of sea turtles. The system can also be used as an educational tool for communities, providing knowledge and allowing the inclusion of communities in the process of monitoring marine turtles. Therefore, this work may contribute information to decision-making and to the implementation of contingency plans for marine conservation programs.
Keywords: GSM, marine biology, marine turtles, unstructured supplementary service data (USSD)
Procedia PDF Downloads 206
24262 “Octopub”: Geographical Sentiment Analysis Using Named Entity Recognition from Social Networks for Geo-Targeted Billboard Advertising
Authors: Oussama Hafferssas, Hiba Benyahia, Amina Madani, Nassima Zeriri
Abstract:
Although data nowadays takes multiple forms, from text to images and from audio to video, text is still the most widely used one at a public level. At an academic and research level, and unlike other forms, text can be considered the easiest form to process. Therefore, a branch of data mining research, called "Text Mining", has long been devoted to it. Its concept is just like data mining's: finding valuable patterns in large collections and tremendous volumes of data, in this case text. Named entity recognition (NER) is one of Text Mining's disciplines; it aims to extract and classify references such as proper names, locations, expressions of time and dates, organizations and more in a given text. Our approach, "Octopub", does not aim to find new ways to improve the named entity recognition process; rather, it is about finding a new and smart way to use NER so that we can extract the sentiments of millions of people, using social networks as a limitless information source and marketing for product promotion as the main domain of application.
Keywords: text mining, named entity recognition (NER), sentiment analysis, social media networks (SN, SMN), business intelligence (BI), marketing
Procedia PDF Downloads 589
24261 The Trend of Injuries in Building Fire in Tehran from 2002 to 2012
Authors: Mohammadreza Ashouri, Majid Bayatian
Abstract:
Analysis of fire data is a basis for implementing any plan to improve the level of safety in cities. Such an analysis is able to reveal signs of changes in a given period and can be used as a measure of safety. Information on about 66,341 fires (from 2002 to 2012) released by Tehran Safety Services and Fire-Fighting Organization, together with data on the population and the number of households provided by Tehran Municipality and the Statistical Yearbook of Iran, was extracted. Using the data, the changes in fires, the rate of injuries, and the mortality rate were determined and analyzed. The rate of injuries and mortality rate of fires per one million population of Tehran were 59.58% and 86.12%, respectively. During the study period, the number of fires and fire stations increased by 104.38% and 102.63%, respectively. Most fires (9.21%) happened in the 4th District of Tehran. The results showed that the recorded fire data have not been systematically planned for fire prevention. One of the ways to reduce injuries caused by fires is to develop a systematic plan for necessary actions in emergency situations. To determine a reliable source for fire prevention, the stages, definitions of working processes and the cause and effect chains should be considered. Therefore, a comprehensive statistical system should be developed for reported and recorded fire data.
Keywords: fire statistics, fire analysis, accident prevention, Tehran
Procedia PDF Downloads 184
24260 Design and Implementation a Virtualization Platform for Providing Smart Tourism Services
Authors: Nam Don Kim, Jungho Moon, Tae Yun Chung
Abstract:
This paper proposes an Internet of Things (IoT) based virtualization platform for providing smart tourism services. The virtualization platform provides a consistent access interface to various types of data by naming IoT devices and legacy information systems as pathnames in a virtual file system. In other words, the IoT virtualization platform functions as middleware which uses the metadata for the underlying collected data. The proposed platform makes it easy to provide customized tourism information by using tourist locations collected by IoT devices and additionally enables the creation of new interactive smart tourism services focused on the tourist locations. The proposed platform is efficient in that the provided tourism services are isolated from changes in raw data and the services can be modified or expanded without changing the underlying data structure.
Keywords: internet of things (IoT), IoT platform, service platform, virtual file system (VFS)
Procedia PDF Downloads 503
24259 A Review on 3D Smart City Platforms Using Remotely Sensed Data to Aid Simulation and Urban Analysis
Authors: Slim Namouchi, Bruno Vallet, Imed Riadh Farah
Abstract:
3D urban models provide powerful tools for decision making, urban planning, and smart city services. The accuracy of these 3D-based systems is directly related to the quality of these models. Since manual large-scale modeling, such as of cities or countries, is a highly time-intensive and very expensive process, fully automatic 3D building generation is needed. However, the result of the 3D modeling process depends on the input data, the properties of the captured objects, and the required characteristics of the reconstructed 3D model. Nowadays, producing a 3D real-world model is no longer a problem. Remotely sensed data have experienced a remarkable increase in recent years, especially data acquired using unmanned aerial vehicles (UAV). As scanning techniques develop, the amount of captured data is growing and its resolution is becoming more precise. This paper presents a literature review which aims to identify different methods of automatic 3D building extraction, either from LiDAR or from the combination of LiDAR and satellite or aerial images. Then, we present open source technologies and data models (e.g., CityGML, PostGIS, Cesiumjs) used to integrate these models in geospatial base layers for smart city services.
Keywords: CityGML, LiDAR, remote sensing, SIG, Smart City, 3D urban modeling
Procedia PDF Downloads 135