Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 788

Search results for: LiDAR datasets

758 Analysis of Citation Rate and Data Reuse for Openly Accessible Biodiversity Datasets on Global Biodiversity Information Facility

Authors: Nushrat Khan, Mike Thelwall, Kayvan Kousha

Abstract:

Making research data openly accessible has been mandated by most funders over the last 5 years as it promotes reproducibility in science and reduces duplication of effort to collect the same data. There are evidence that articles that publicly share research data have higher citation rates in biological and social sciences. However, how and whether shared data is being reused is not always intuitive as such information is not easily accessible from the majority of research data repositories. This study aims to understand the practice of data citation and how data is being reused over the years focusing on biodiversity since research data is frequently reused in this field. Metadata of 38,878 datasets including citation counts were collected through the Global Biodiversity Information Facility (GBIF) API for this purpose. GBIF was used as a data source since it provides citation count for datasets, not a commonly available feature for most repositories. Analysis of dataset types, citation counts, creation and update time of datasets suggests that citation rate varies for different types of datasets, where occurrence datasets that have more granular information have higher citation rates than checklist and metadata-only datasets. Another finding is that biodiversity datasets on GBIF are frequently updated, which is unique to this field. Majority of the datasets from the earliest year of 2007 were updated after 11 years, with no dataset that was not updated since creation. For each year between 2007 and 2017, we compared the correlations between update time and citation rate of four different types of datasets. While recent datasets do not show any correlations, 3 to 4 years old datasets show weak correlation where datasets that were updated more recently received high citations. The results are suggestive that it takes several years to cumulate citations for research datasets. However, this investigation found that when searched on Google Scholar or Scopus databases for the same datasets, the number of citations is often not the same as GBIF. Hence future aim is to further explore the citation count system adopted by GBIF to evaluate its reliability and whether it can be applicable to other fields of studies as well.

Keywords: data citation, data reuse, research data sharing, webometrics

Procedia PDF Downloads 146

757 Using 3D Satellite Imagery to Generate a High Precision Canopy Height Model

Authors: M. Varin, A. M. Dubois, R. Gadbois-Langevin, B. Chalghaf

Abstract:

Good knowledge of the physical environment is essential for an integrated forest planning. This information enables better forecasting of operating costs, determination of cutting volumes, and preservation of ecologically sensitive areas. The use of satellite images in stereoscopic pairs gives the capacity to generate high precision 3D models, which are scale-adapted for harvesting operations. These models could represent an alternative to 3D LiDAR data, thanks to their advantageous cost of acquisition. The objective of the study was to assess the quality of stereo-derived canopy height models (CHM) in comparison to a traditional LiDAR CHM and ground tree-height samples. Two study sites harboring two different forest stand types (broadleaf and conifer) were analyzed using stereo pairs and tri-stereo images from the WorldView-3 satellite to calculate CHM. Acquisition of multispectral images from an Unmanned Aerial Vehicle (UAV) was also realized on a smaller part of the broadleaf study site. Different algorithms using two softwares (PCI Geomatica and Correlator3D) with various spatial resolutions and band selections were tested to select the 3D modeling technique, which offered the best performance when compared with LiDAR. In the conifer study site, the CHM produced with Corelator3D using only the 50-cm resolution panchromatic band was the one with the smallest Root-mean-square deviation (RMSE: 1.31 m). In the broadleaf study site, the tri-stereo model provided slightly better performance, with an RMSE of 1.2 m. The tri-stereo model was also compared to the UAV, which resulted in an RMSE of 1.3 m. At individual tree level, when ground samples were compared to satellite, lidar, and UAV CHM, RMSE were 2.8, 2.0, and 2.0 m, respectively. Advanced analysis was done for all of these cases, and it has been noted that RMSE is reduced when the canopy cover is higher when shadow and slopes are lower and when clouds are distant from the analyzed site.

Keywords: very high spatial resolution, satellite imagery, WorlView-3, canopy height models, CHM, LiDAR, unmanned aerial vehicle, UAV

Procedia PDF Downloads 92

756 Geographical Data Visualization Using Video Games Technologies

Authors: Nizar Karim Uribe-Orihuela, Fernando Brambila-Paz, Ivette Caldelas, Rodrigo Montufar-Chaveznava

Abstract:

In this paper, we present the advances corresponding to the implementation of a strategy to visualize geographical data using a Software Development Kit (SDK) for video games. We use multispectral images from Landsat 7 platform and Laser Imaging Detection and Ranging (LIDAR) data from The National Institute of Geography and Statistics of Mexican (INEGI). We select a place of interest to visualize from Landsat platform and make some processing to the image (rotations, atmospheric correction and enhancement). The resulting image will be our gray scale color-map to fusion with the LIDAR data, which was selected using the same coordinates than in Landsat. The LIDAR data is translated to 8-bit raw data. Both images are fused in a software developed using Unity (an SDK employed for video games). The resulting image is then displayed and can be explored moving around. The idea is the software could be used for students of geology and geophysics at the Engineering School of the National University of Mexico. They will download the software and images corresponding to a geological place of interest to a smartphone and could virtually visit and explore the site with a virtual reality visor such as Google cardboard.

Keywords: virtual reality, interactive technologies, geographical data visualization, video games technologies, educational material

Procedia PDF Downloads 217

755 A Survey in Techniques for Imbalanced Intrusion Detection System Datasets

Authors: Najmeh Abedzadeh, Matthew Jacobs

Abstract:

An intrusion detection system (IDS) is a software application that monitors malicious activities and generates alerts if any are detected. However, most network activities in IDS datasets are normal, and the relatively few numbers of attacks make the available data imbalanced. Consequently, cyber-attacks can hide inside a large number of normal activities, and machine learning algorithms have difficulty learning and classifying the data correctly. In this paper, a comprehensive literature review is conducted on different types of algorithms for both implementing the IDS and methods in correcting the imbalanced IDS dataset. The most famous algorithms are machine learning (ML), deep learning (DL), synthetic minority over-sampling technique (SMOTE), and reinforcement learning (RL). Most of the research use the CSE-CIC-IDS2017, CSE-CIC-IDS2018, and NSL-KDD datasets for evaluating their algorithms.

Keywords: IDS, imbalanced datasets, sampling algorithms, big data

Procedia PDF Downloads 277

754 High Resolution Satellite Imagery and Lidar Data for Object-Based Tree Species Classification in Quebec, Canada

Authors: Bilel Chalghaf, Mathieu Varin

Abstract:

Forest characterization in Quebec, Canada, is usually assessed based on photo-interpretation at the stand level. For species identification, this often results in a lack of precision. Very high spatial resolution imagery, such as DigitalGlobe, and Light Detection and Ranging (LiDAR), have the potential to overcome the limitations of aerial imagery. To date, few studies have used that data to map a large number of species at the tree level using machine learning techniques. The main objective of this study is to map 11 individual high tree species ( > 17m) at the tree level using an object-based approach in the broadleaf forest of Kenauk Nature, Quebec. For the individual tree crown segmentation, three canopy-height models (CHMs) from LiDAR data were assessed: 1) the original, 2) a filtered, and 3) a corrected model. The corrected CHM gave the best accuracy and was then coupled with imagery to refine tree species crown identification. When compared with photo-interpretation, 90% of the objects represented a single species. For modeling, 313 variables were derived from 16-band WorldView-3 imagery and LiDAR data, using radiance, reflectance, pixel, and object-based calculation techniques. Variable selection procedures were employed to reduce their number from 313 to 16, using only 11 bands to aid reproducibility. For classification, a global approach using all 11 species was compared to a semi-hierarchical hybrid classification approach at two levels: (1) tree type (broadleaf/conifer) and (2) individual broadleaf (five) and conifer (six) species. Five different model techniques were used: (1) support vector machine (SVM), (2) classification and regression tree (CART), (3) random forest (RF), (4) k-nearest neighbors (k-NN), and (5) linear discriminant analysis (LDA). Each model was tuned separately for all approaches and levels. For the global approach, the best model was the SVM using eight variables (overall accuracy (OA): 80%, Kappa: 0.77). With the semi-hierarchical hybrid approach, at the tree type level, the best model was the k-NN using six variables (OA: 100% and Kappa: 1.00). At the level of identifying broadleaf and conifer species, the best model was the SVM, with OA of 80% and 97% and Kappa values of 0.74 and 0.97, respectively, using seven variables for both models. This paper demonstrates that a hybrid classification approach gives better results and that using 16-band WorldView-3 with LiDAR data leads to more precise predictions for tree segmentation and classification, especially when the number of tree species is large.

Keywords: tree species, object-based, classification, multispectral, machine learning, WorldView-3, LiDAR

Procedia PDF Downloads 106

753 Multi Object Tracking for Predictive Collision Avoidance

Authors: Bruk Gebregziabher

Abstract:

The safe and efficient operation of Autonomous Mobile Robots (AMRs) in complex environments, such as manufacturing, logistics, and agriculture, necessitates accurate multiobject tracking and predictive collision avoidance. This paper presents algorithms and techniques for addressing these challenges using Lidar sensor data, emphasizing ensemble Kalman filter. The developed predictive collision avoidance algorithm employs the data provided by lidar sensors to track multiple objects and predict their velocities and future positions, enabling the AMR to navigate safely and effectively. A modification to the dynamic windowing approach is introduced to enhance the performance of the collision avoidance system. The overall system architecture encompasses object detection, multi-object tracking, and predictive collision avoidance control. The experimental results, obtained from both simulation and real-world data, demonstrate the effectiveness of the proposed methods in various scenarios, which lays the foundation for future research on global planners, other controllers, and the integration of additional sensors. This thesis contributes to the ongoing development of safe and efficient autonomous systems in complex and dynamic environments.

Keywords: autonomous mobile robots, multi-object tracking, predictive collision avoidance, ensemble Kalman filter, lidar sensors

Procedia PDF Downloads 55

752 Heterogeneous-Resolution and Multi-Source Terrain Builder for CesiumJS WebGL Virtual Globe

Authors: Umberto Di Staso, Marco Soave, Alessio Giori, Federico Prandi, Raffaele De Amicis

Abstract:

The increasing availability of information about earth surface elevation (Digital Elevation Models DEM) generated from different sources (remote sensing, Aerial Images, Lidar) poses the question about how to integrate and make available to the most than possible audience this huge amount of data. In order to exploit the potential of 3D elevation representation the quality of data management plays a fundamental role. Due to the high acquisition costs and the huge amount of generated data, highresolution terrain surveys tend to be small or medium sized and available on limited portion of earth. Here comes the need to merge large-scale height maps that typically are made available for free at worldwide level, with very specific high resolute datasets. One the other hand, the third dimension increases the user experience and the data representation quality, unlocking new possibilities in data analysis for civil protection, real estate, urban planning, environment monitoring, etc. The open-source 3D virtual globes, which are trending topics in Geovisual Analytics, aim at improving the visualization of geographical data provided by standard web services or with proprietary formats. Typically, 3D Virtual globes like do not offer an open-source tool that allows the generation of a terrain elevation data structure starting from heterogeneous-resolution terrain datasets. This paper describes a technological solution aimed to set up a so-called “Terrain Builder”. This tool is able to merge heterogeneous-resolution datasets, and to provide a multi-resolution worldwide terrain services fully compatible with CesiumJS and therefore accessible via web using traditional browser without any additional plug-in.

Keywords: Terrain Builder, WebGL, Virtual Globe, CesiumJS, Tiled Map Service, TMS, Height-Map, Regular Grid, Geovisual Analytics, DTM

Procedia PDF Downloads 397

751 A Review on 3D Smart City Platforms Using Remotely Sensed Data to Aid Simulation and Urban Analysis

Authors: Slim Namouchi, Bruno Vallet, Imed Riadh Farah

Abstract:

3D urban models provide powerful tools for decision making, urban planning, and smart city services. The accuracy of this 3D based systems is directly related to the quality of these models. Since manual large-scale modeling, such as cities or countries is highly time intensive and very expensive process, a fully automatic 3D building generation is needed. However, 3D modeling process result depends on the input data, the proprieties of the captured objects, and the required characteristics of the reconstructed 3D model. Nowadays, producing 3D real-world model is no longer a problem. Remotely sensed data had experienced a remarkable increase in the recent years, especially data acquired using unmanned aerial vehicles (UAV). While the scanning techniques are developing, the captured data amount and the resolution are getting bigger and more precise. This paper presents a literature review, which aims to identify different methods of automatic 3D buildings extractions either from LiDAR or the combination of LiDAR and satellite or aerial images. Then, we present open source technologies, and data models (e.g., CityGML, PostGIS, Cesiumjs) used to integrate these models in geospatial base layers for smart city services.

Keywords: CityGML, LiDAR, remote sensing, SIG, Smart City, 3D urban modeling

Procedia PDF Downloads 100

750 Passively Q-Switched 914 nm Microchip Laser for LIDAR Systems

Authors: Marco Naegele, Klaus Stoppel, Thomas Dekorsy

Abstract:

Passively Q-switched microchip lasers enable the great potential for sophisticated LiDAR systems due to their compact overall system design, excellent beam quality, and scalable pulse energies. However, many near-infrared solid-state lasers show emitting wavelengths > 1000 nm, which are not compatible with state-of-the-art silicon detectors. Here we demonstrate a passively Q-switched microchip laser operating at 914 nm. The microchip laser consists of a 3 mm long Nd:YVO₄ crystal as a gain medium, while Cr⁴⁺:YAG with an initial transmission of 98% is used as a saturable absorber. Quasi-continuous pumping enables single pulse operation, and low duty cycles ensure low overall heat generation and power consumption. Thus, thermally induced instabilities are minimized, and operation without active cooling is possible while ambient temperature changes are compensated by adjustment of the pump laser current only. Single-emitter diode pumping at 808 nm leads to a compact overall system design and robust setup. Utilization of a microchip cavity approach ensures single-longitudinal mode operation with spectral bandwidths in the picometer regime and results in short laser pulses with pulse durations below 10 ns. Beam quality measurements reveal an almost diffraction-limited beam and enable conclusions concerning the thermal lens, which is essential to stabilize the plane-plane resonator. A 7% output coupler transmissivity is used to generate pulses with energies in the microjoule regime and peak powers of more than 600 W. Long-term pulse duration, pulse energy, central wavelength, and spectral bandwidth measurements emphasize the excellent system stability and facilitate the utilization of this laser in the context of a LiDAR system.

Keywords: diode-pumping, LiDAR system, microchip laser, Nd:YVO4 laser, passively Q-switched

Procedia PDF Downloads 104

749 Airborne CO₂ Lidar Measurements for Atmospheric Carbon and Transport: America (ACT-America) Project and Active Sensing of CO₂ Emissions over Nights, Days, and Seasons 2017-2018 Field Campaigns

Authors: Joel F. Campbell, Bing Lin, Michael Obland, Susan Kooi, Tai-Fang Fan, Byron Meadows, Edward Browell, Wayne Erxleben, Doug McGregor, Jeremy Dobler, Sandip Pal, Christopher O'Dell, Ken Davis

Abstract:

The Active Sensing of CO₂ Emissions over Nights, Days, and Seasons (ASCENDS) CarbonHawk Experiment Simulator (ACES) is a NASA Langley Research Center instrument funded by NASA’s Science Mission Directorate that seeks to advance technologies critical to measuring atmospheric column carbon dioxide (CO₂ ) mixing ratios in support of the NASA ASCENDS mission. The ACES instrument, an Intensity-Modulated Continuous-Wave (IM-CW) lidar, was designed for high-altitude aircraft operations and can be directly applied to space instrumentation to meet the ASCENDS mission requirements. The ACES design demonstrates advanced technologies critical for developing an airborne simulator and spaceborne instrument with lower platform consumption of size, mass, and power, and with improved performance. The Atmospheric Carbon and Transport – America (ACT-America) is an Earth Venture Suborbital -2 (EVS-2) mission sponsored by the Earth Science Division of NASA’s Science Mission Directorate. A major objective is to enhance knowledge of the sources/sinks and transport of atmospheric CO₂ through the application of remote and in situ airborne measurements of CO₂ and other atmospheric properties on spatial and temporal scales. ACT-America consists of five campaigns to measure regional carbon and evaluate transport under various meteorological conditions in three regional areas of the Continental United States. Regional CO₂ distributions of the lower atmosphere were observed from the C-130 aircraft by the Harris Corp. Multi-Frequency Fiber Laser Lidar (MFLL) and the ACES lidar. The airborne lidars provide unique data that complement the more traditional in situ sensors. This presentation shows the applications of CO₂ lidars in support of these science needs.

Keywords: CO₂ measurement, IMCW, CW lidar, laser spectroscopy

Procedia PDF Downloads 132

748 3D Classification Optimization of Low-Density Airborne Light Detection and Ranging Point Cloud by Parameters Selection

Authors: Baha Eddine Aissou, Aichouche Belhadj Aissa

Abstract:

Light detection and ranging (LiDAR) is an active remote sensing technology used for several applications. Airborne LiDAR is becoming an important technology for the acquisition of a highly accurate dense point cloud. A classification of airborne laser scanning (ALS) point cloud is a very important task that still remains a real challenge for many scientists. Support vector machine (SVM) is one of the most used statistical learning algorithms based on kernels. SVM is a non-parametric method, and it is recommended to be used in cases where the data distribution cannot be well modeled by a standard parametric probability density function. Using a kernel, it performs a robust non-linear classification of samples. Often, the data are rarely linearly separable. SVMs are able to map the data into a higher-dimensional space to become linearly separable, which allows performing all the computations in the original space. This is one of the main reasons that SVMs are well suited for high-dimensional classification problems. Only a few training samples, called support vectors, are required. SVM has also shown its potential to cope with uncertainty in data caused by noise and fluctuation, and it is computationally efficient as compared to several other methods. Such properties are particularly suited for remote sensing classification problems and explain their recent adoption. In this poster, the SVM classification of ALS LiDAR data is proposed. Firstly, connected component analysis is applied for clustering the point cloud. Secondly, the resulting clusters are incorporated in the SVM classifier. Radial basic function (RFB) kernel is used due to the few numbers of parameters (C and γ) that needs to be chosen, which decreases the computation time. In order to optimize the classification rates, the parameters selection is explored. It consists to find the parameters (C and γ) leading to the best overall accuracy using grid search and 5-fold cross-validation. The exploited LiDAR point cloud is provided by the German Society for Photogrammetry, Remote Sensing, and Geoinformation. The ALS data used is characterized by a low density (4-6 points/m²) and is covering an urban area located in residential parts of the city Vaihingen in southern Germany. The class ground and three other classes belonging to roof superstructures are considered, i.e., a total of 4 classes. The training and test sets are selected randomly several times. The obtained results demonstrated that a parameters selection can orient the selection in a restricted interval of (C and γ) that can be further explored but does not systematically lead to the optimal rates. The SVM classifier with hyper-parameters is compared with the most used classifiers in literature for LiDAR data, random forest, AdaBoost, and decision tree. The comparison showed the superiority of the SVM classifier using parameters selection for LiDAR data compared to other classifiers.

Keywords: classification, airborne LiDAR, parameters selection, support vector machine

Procedia PDF Downloads 122

747 Fusion of MOLA-based DEMs and HiRISE Images for Large-Scale Mars Mapping

Authors: Ahmed F. Elaksher, Islam Omar

Abstract:

In this project, we used MOLA-based DEMs to orthorectify HiRISE optical images. The MOLA data was interpolated using the kriging interpolation technique. Corresponding tie points were then digitized from both datasets. These points were employed in co-registering both datasets using GIS analysis tools. Different transformation models, including the affine and projective transformation models, were used with different sets and distributions of tie points. Additionally, we evaluated the use of the MOLA elevations in co-registering the MOLA and HiRISE datasets. The planimetric RMSEs achieved for each model are reported. Results suggested the use of 3D-2D transformation models.

Keywords: photogrammetry, Mars, MOLA, HiRISE

Procedia PDF Downloads 43

746 Hazardous Vegetation Detection in Right-Of-Way Power Transmission Lines in Brazil Using Unmanned Aerial Vehicle and Light Detection and Ranging

Authors: Mauricio George Miguel Jardini, Jose Antonio Jardini

Abstract:

Transmission power utilities participate with kilometers of circuits, many with particularities in terms of vegetation growth. To control these rights-of-way, maintenance teams perform ground, and air inspections, and the identification method is subjective (indirect). On a ground inspection, when identifying an irregularity, for example, high vegetation threatening contact with the conductor cable, pruning or suppression is performed immediately. In an aerial inspection, the suppression team is mobilized to the identified point. This work investigates the use of 3D modeling of a transmission line segment using RGB (red, blue, and green) images and LiDAR (Light Detection and Ranging) sensor data. Both sensors are coupled to unmanned aerial vehicle. The goal is the accurate and timely detection of vegetation along the right-of-way that can cause shutdowns.

Keywords: 3D modeling, LiDAR, right-of-way, transmission lines, vegetation

Procedia PDF Downloads 106

745 Shallow Water Lidar System in Measuring Erosion Rate of Coarse-Grained Materials

Authors: Ghada S. Ellithy, John. W. Murphy, Maureen K. Corcoran

Abstract:

Erosion rate of soils during a levee or dam overtopping event is a major component in risk assessment evaluation of breach time and downstream consequences. The mechanism and evolution of dam or levee breach caused by overtopping erosion is a complicated process and difficult to measure during overflow due to accessibility and quickly changing conditions. In this paper, the results of a flume erosion tests are presented and discussed. The tests are conducted on a coarse-grained material with a median grain size D50 of 5 mm in a 1-m (3-ft) wide flume under varying flow rates. Each test is performed by compacting the soil mix r to its near optimum moisture and dry density as determined from standard Proctor test in a box embedded in the flume floor. The box measures 0.45 m wide x 1.2 m long x 0.25 m deep. The material is tested several times at varying hydraulic loading to determine the erosion rate after equal time intervals. The water depth, velocity are measured at each hydraulic loading, and the acting bed shear is calculated. A shallow water lidar (SWL) system was utilized to record the progress of soil erodibility and water depth along the scanned profiles of the tested box. SWL is a non-contact system that transmits laser pulses from above the water and records the time-delay between top and bottom reflections. Results from the SWL scans are compared with before and after manual measurements to determine the erosion rate of the soil mix and other erosion parameters.

Keywords: coarse-grained materials, erosion rate, LIDAR system, soil erosion

Procedia PDF Downloads 89

744 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

Clustering is a well known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can either contain categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms are proposed: the k-means for clustering numeric datasets and the k-modes for categorical datasets. The main encountered problem in data mining applications is clustering categorical dataset so relevant in the datasets. One main issue to achieve the clustering process on categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead the k-modes. In this paper, it is proposed to experiment an approach based on the previous issue by transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previously method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are experimented. The obtained results show that our proposed method outperforms the binary method in all cases.

Keywords: clustering, unsupervised learning, pattern recognition, categorical datasets, knowledge discovery, k-means

Procedia PDF Downloads 232

743 Hate Speech Detection Using Deep Learning and Machine Learning Models

Authors: Nabil Shawkat, Jamil Saquer

Abstract:

Social media has accelerated our ability to engage with others and eliminated many communication barriers. On the other hand, the widespread use of social media resulted in an increase in online hate speech. This has drastic impacts on vulnerable individuals and societies. Therefore, it is critical to detect hate speech to prevent innocent users and vulnerable communities from becoming victims of hate speech. We investigate the performance of different deep learning and machine learning algorithms on three different datasets. Our results show that the BERT model gives the best performance among all the models by achieving an F1-score of 90.6% on one of the datasets and F1-scores of 89.7% and 88.2% on the other two datasets.

Keywords: hate speech, machine learning, deep learning, abusive words, social media, text classification

Procedia PDF Downloads 101

742 Fine Grained Action Recognition of Skateboarding Tricks

Authors: Frederik Calsius, Mirela Popa, Alexia Briassouli

Abstract:

In the field of machine learning, it is common practice to use benchmark datasets to prove the working of a method. The domain of action recognition in videos often uses datasets like Kinet-ics, Something-Something, UCF-101 and HMDB-51 to report results. Considering the properties of the datasets, there are no datasets that focus solely on very short clips (2 to 3 seconds), and on highly-similar fine-grained actions within one specific domain. This paper researches how current state-of-the-art action recognition methods perform on a dataset that consists of highly similar, fine-grained actions. To do so, a dataset of skateboarding tricks was created. The performed analysis highlights both benefits and limitations of state-of-the-art methods, while proposing future research directions in the activity recognition domain. The conducted research shows that the best results are obtained by fusing RGB data with OpenPose data for the Temporal Shift Module.

Keywords: activity recognition, fused deep representations, fine-grained dataset, temporal modeling

Procedia PDF Downloads 199

741 Railway Ballast Volumes Automated Estimation Based on LiDAR Data

Authors: Bahar Salavati Vie Le Sage, Ismaïl Ben Hariz, Flavien Viguier, Sirine Noura Kahil, Audrey Jacquin, Maxime Convert

Abstract:

The ballast layer plays a key role in railroad maintenance and the geometry of the track structure. Ballast also holds the track in place as the trains roll over it. Track ballast is packed between the sleepers and on the sides of railway tracks. An imbalance in ballast volume on the tracks can lead to safety issues as well as a quick degradation of the overall quality of the railway segment. If there is a lack of ballast in the track bed during the summer, there is a risk that the rails will expand and buckle slightly due to the high temperatures. Furthermore, the knowledge of the ballast quantities that will be excavated during renewal works is important for efficient ballast management. The volume of excavated ballast per meter of track can be calculated based on excavation depth, excavation width, volume of track skeleton (sleeper and rail) and sleeper spacing. Since 2012, SNCF has been collecting 3D points cloud data covering its entire railway network by using 3D laser scanning technology (LiDAR). This vast amount of data represents a modelization of the entire railway infrastructure, allowing to conduct various simulations for maintenance purposes. This paper aims to present an automated method for ballast volume estimation based on the processing of LiDAR data. The estimation of abnormal volumes in ballast on the tracks is performed by analyzing the cross-section of the track. Further, since the amount of ballast required varies depending on the track configuration, the knowledge of the ballast profile is required. Prior to track rehabilitation, excess ballast is often present in the ballast shoulders. Based on 3D laser scans, a Digital Terrain Model (DTM) was generated and automatic extraction of the ballast profiles from this data is carried out. The surplus in ballast is then estimated by performing a comparison between this ballast profile obtained empirically, and a geometric modelization of the theoretical ballast profile thresholds as dictated by maintenance standards. Ideally, this excess should be removed prior to renewal works and recycled to optimize the output of the ballast renewal machine. Based on these parameters, an application has been developed to allow the automatic measurement of ballast profiles. We evaluated the method on a 108 kilometers segment of railroad LiDAR scans, and the results show that the proposed algorithm detects ballast surplus that amounts to values close to the total quantities of spoil ballast excavated.

Keywords: ballast, railroad, LiDAR , cloud point, track ballast, 3D point

Procedia PDF Downloads 65

740 Study of Storms on the Javits Center Green Roof

Authors: Alexander Cho, Harsho Sanyal, Joseph Cataldo

Abstract:

A quantitative analysis of the different variables on both the South and North green roofs of the Jacob K. Javits Convention Center was taken to find mathematical relationships between net radiation and evapotranspiration (ET), average outside temperature, and the lysimeter weight. Groups of datasets were analyzed, and the relationships were plotted on linear and semi-log graphs to find consistent relationships. Antecedent conditions for each rainstorm were also recorded and plotted against the volumetric water difference within the lysimeter. The first relation was the inverse parabolic relationship between the lysimeter weight and the net radiation and ET. The peaks and valleys of the lysimeter weight corresponded to valleys and peaks in the net radiation and ET respectively, with the 8/22/15 and 1/22/16 datasets showing this trend. The U-shaped and inverse U-shaped plots of the two variables coincided, indicating an inverse relationship between the two variables. Cross variable relationships were examined through graphs with lysimeter weight as the dependent variable on the y-axis. 10 out of 16 of the plots of lysimeter weight vs. outside temperature plots had R² values > 0.9. Antecedent conditions were also recorded for rainstorms, categorized by the amount of precipitation accumulating during the storm. Plotted against the change in the volumetric water weight difference within the lysimeter, a logarithmic regression was found with large R² values. The datasets were compared using the Mann Whitney U-test to see if the datasets were statistically different, using a significance level of 5%; all datasets compared showed a U test statistic value, proving the null hypothesis of the datasets being different from being true.

Keywords: green roof, green infrastructure, Javits Center, evapotranspiration, net radiation, lysimeter

Procedia PDF Downloads 80

739 Recommender Systems for Technology Enhanced Learning (TEL)

Authors: Hailah Alballaa, Azeddine Chikh

Abstract:

Several challenges impede the adoption of Recommender Systems for Technology Enhanced Learning (TEL): to collect and identify possible datasets; to select between different recommender approaches; to evaluate their performances. The aim is of this paper is twofold: First, it aims to introduce a survey on the most significant work in this area. Second, it aims at identifying possible research directions.

Keywords: datasets, content-based filtering, recommender systems, TEL

Procedia PDF Downloads 216

738 Wind Velocity Mitigation for Conceptual Design: A Spatial Decision (Support Framework)

Authors: Mohamed Khallaf, Hossein M Rizeei

Abstract:

Simulating wind pattern behavior over proposed urban features is critical in the early stage of the conceptual design of both architectural and urban disciplines. However, it is typically not possible for designers to explore the impact of wind flow profiles across new urban developments due to a lack of real data and inaccurate estimation of building parameters. Modeling the details of existing and proposed urban features and testing them against wind flows is the missing part of the conceptual design puzzle where architectural and urban discipline can focus. This research aims to develop a spatial decision-support design method utilizing LiDAR, GIS, and performance-based wind simulation technology to mitigate wind-related hazards on a design by simulating alternative design scenarios at the pedestrian level prior to its implementation in Sydney, Australia. The result of the experiment demonstrates the capability of the proposed framework to improve pedestrian comfort in relation to wind profile.

Keywords: spatial decision-support design, performance-based wind simulation, LiDAR, GIS

Procedia PDF Downloads 88

737 Current Methods for Drug Property Prediction in the Real World

Authors: Jacob Green, Cecilia Cabrera, Maximilian Jakobs, Andrea Dimitracopoulos, Mark van der Wilk, Ryan Greenhalgh

Abstract:

Predicting drug properties is key in drug discovery to enable de-risking of assets before expensive clinical trials and to find highly active compounds faster. Interest from the machine learning community has led to the release of a variety of benchmark datasets and proposed methods. However, it remains unclear for practitioners which method or approach is most suitable, as different papers benchmark on different datasets and methods, leading to varying conclusions that are not easily compared. Our large-scale empirical study links together numerous earlier works on different datasets and methods, thus offering a comprehensive overview of the existing property classes, datasets, and their interactions with different methods. We emphasise the importance of uncertainty quantification and the time and, therefore, cost of applying these methods in the drug development decision-making cycle. To the best of the author's knowledge, it has been observed that the optimal approach varies depending on the dataset and that engineered features with classical machine learning methods often outperform deep learning. Specifically, QSAR datasets are typically best analysed with classical methods such as Gaussian Processes, while ADMET datasets are sometimes better described by Trees or deep learning methods such as Graph Neural Networks or language models. Our work highlights that practitioners do not yet have a straightforward, black-box procedure to rely on and sets a precedent for creating practitioner-relevant benchmarks. Deep learning approaches must be proven on these benchmarks to become the practical method of choice in drug property prediction.

Keywords: activity (QSAR), ADMET, classical methods, drug property prediction, empirical study, machine learning

Procedia PDF Downloads 46

736 Terrestrial Laser Scans to Assess Aerial LiDAR Data

Authors: J. F. Reinoso-Gordo, F. J. Ariza-López, A. Mozas-Calvache, J. L. García-Balboa, S. Eddargani

Abstract:

The DEMs quality may depend on several factors such as data source, capture method, processing type used to derive them, or the cell size of the DEM. The two most important capture methods to produce regional-sized DEMs are photogrammetry and LiDAR; DEMs covering entire countries have been obtained with these methods. The quality of these DEMs has traditionally been evaluated by the national cartographic agencies through punctual sampling that focused on its vertical component. For this type of evaluation there are standards such as NMAS and ASPRS Positional Accuracy Standards for Digital Geospatial Data. However, it seems more appropriate to carry out this evaluation by means of a method that takes into account the superficial nature of the DEM and, therefore, its sampling is superficial and not punctual. This work is part of the Research Project "Functional Quality of Digital Elevation Models in Engineering" where it is necessary to control the quality of a DEM whose data source is an experimental LiDAR flight with a density of 14 points per square meter to which we call Point Cloud Product (PCpro). In the present work it is described the capture data on the ground and the postprocessing tasks until getting the point cloud that will be used as reference (PCref) to evaluate the PCpro quality. Each PCref consists of a patch 50x50 m size coming from a registration of 4 different scan stations. The area studied was the Spanish region of Navarra that covers an area of 10,391 km2; 30 patches homogeneously distributed were necessary to sample the entire surface. The patches have been captured using a Leica BLK360 terrestrial laser scanner mounted on a pole that reached heights of up to 7 meters; the position of the scanner was inverted so that the characteristic shadow circle does not exist when the scanner is in direct position. To ensure that the accuracy of the PCref is greater than that of the PCpro, the georeferencing of the PCref has been carried out with real-time GNSS, and its accuracy positioning was better than 4 cm; this accuracy is much better than the altimetric mean square error estimated for the PCpro (<15 cm); The kind of DEM of interest is the corresponding to the bare earth, so that it was necessary to apply a filter to eliminate vegetation and auxiliary elements such as poles, tripods, etc. After the postprocessing tasks the PCref is ready to be compared with the PCpro using different techniques: cloud to cloud or after a resampling process DEM to DEM.

Keywords: data quality, DEM, LiDAR, terrestrial laser scanner, accuracy

Procedia PDF Downloads 73

735 The Effect of Feature Selection on Pattern Classification

Authors: Chih-Fong Tsai, Ya-Han Hu

Abstract:

The aim of feature selection (or dimensionality reduction) is to filter out unrepresentative features (or variables) making the classifier perform better than the one without feature selection. Since there are many well-known feature selection algorithms, and different classifiers based on different selection results may perform differently, very few studies consider examining the effect of performing different feature selection algorithms on the classification performances by different classifiers over different types of datasets. In this paper, two widely used algorithms, which are the genetic algorithm (GA) and information gain (IG), are used to perform feature selection. On the other hand, three well-known classifiers are constructed, which are the CART decision tree (DT), multi-layer perceptron (MLP) neural network, and support vector machine (SVM). Based on 14 different types of datasets, the experimental results show that in most cases IG is a better feature selection algorithm than GA. In addition, the combinations of IG with DT and IG with SVM perform best and second best for small and large scale datasets.

Keywords: data mining, feature selection, pattern classification, dimensionality reduction

Procedia PDF Downloads 635

734 Meta-Learning for Hierarchical Classification and Applications in Bioinformatics

Authors: Fabio Fabris, Alex A. Freitas

Abstract:

Hierarchical classification is a special type of classification task where the class labels are organised into a hierarchy, with more generic class labels being ancestors of more specific ones. Meta-learning for classification-algorithm recommendation consists of recommending to the user a classification algorithm, from a pool of candidate algorithms, for a dataset, based on the past performance of the candidate algorithms in other datasets. Meta-learning is normally used in conventional, non-hierarchical classification. By contrast, this paper proposes a meta-learning approach for more challenging task of hierarchical classification, and evaluates it in a large number of bioinformatics datasets. Hierarchical classification is especially relevant for bioinformatics problems, as protein and gene functions tend to be organised into a hierarchy of class labels. This work proposes meta-learning approach for recommending the best hierarchical classification algorithm to a hierarchical classification dataset. This work’s contributions are: 1) proposing an algorithm for splitting hierarchical datasets into new datasets to increase the number of meta-instances, 2) proposing meta-features for hierarchical classification, and 3) interpreting decision-tree meta-models for hierarchical classification algorithm recommendation.

Keywords: algorithm recommendation, meta-learning, bioinformatics, hierarchical classification

Procedia PDF Downloads 281

733 Scalable Learning of Tree-Based Models on Sparsely Representable Data

Authors: Fares Hedayatit, Arnauld Joly, Panagiotis Papadimitriou

Abstract:

Many machine learning tasks such as text annotation usually require training over very big datasets, e.g., millions of web documents, that can be represented in a sparse input space. State-of the-art tree-based ensemble algorithms cannot scale to such datasets, since they include operations whose running time is a function of the input space size rather than a function of the non-zero input elements. In this paper, we propose an efficient splitting algorithm to leverage input sparsity within decision tree methods. Our algorithm improves training time over sparse datasets by more than two orders of magnitude and it has been incorporated in the current version of scikit-learn.org, the most popular open source Python machine learning library.

Keywords: big data, sparsely representable data, tree-based models, scalable learning

Procedia PDF Downloads 236

732 Effect of Genuine Missing Data Imputation on Prediction of Urinary Incontinence

Authors: Suzan Arslanturk, Mohammad-Reza Siadat, Theophilus Ogunyemi, Ananias Diokno

Abstract:

Missing data is a common challenge in statistical analyses of most clinical survey datasets. A variety of methods have been developed to enable analysis of survey data to deal with missing values. Imputation is the most commonly used among the above methods. However, in order to minimize the bias introduced due to imputation, one must choose the right imputation technique and apply it to the correct type of missing data. In this paper, we have identified different types of missing values: missing data due to skip pattern (SPMD), undetermined missing data (UMD), and genuine missing data (GMD) and applied rough set imputation on only the GMD portion of the missing data. We have used rough set imputation to evaluate the effect of such imputation on prediction by generating several simulation datasets based on an existing epidemiological dataset (MESA). To measure how well each dataset lends itself to the prediction model (logistic regression), we have used p-values from the Wald test. To evaluate the accuracy of the prediction, we have considered the width of 95% confidence interval for the probability of incontinence. Both imputed and non-imputed simulation datasets were fit to the prediction model, and they both turned out to be significant (p-value < 0.05). However, the Wald score shows a better fit for the imputed compared to non-imputed datasets (28.7 vs. 23.4). The average confidence interval width was decreased by 10.4% when the imputed dataset was used, meaning higher precision. The results show that using the rough set method for missing data imputation on GMD data improve the predictive capability of the logistic regression. Further studies are required to generalize this conclusion to other clinical survey datasets.

Keywords: rough set, imputation, clinical survey data simulation, genuine missing data, predictive index

Procedia PDF Downloads 138

731 Detecting Hate Speech And Cyberbullying Using Natural Language Processing

Authors: Nádia Pereira, Paula Ferreira, Sofia Francisco, Sofia Oliveira, Sidclay Souza, Paula Paulino, Ana Margarida Veiga Simão

Abstract:

Social media has progressed into a platform for hate speech among its users, and thus, there is an increasing need to develop automatic detection classifiers of offense and conflicts to help decrease the prevalence of such incidents. Online communication can be used to intentionally harm someone, which is why such classifiers could be essential in social networks. A possible application of these classifiers is the automatic detection of cyberbullying. Even though identifying the aggressive language used in online interactions could be important to build cyberbullying datasets, there are other criteria that must be considered. Being able to capture the language, which is indicative of the intent to harm others in a specific context of online interaction is fundamental. Offense and hate speech may be the foundation of online conflicts, which have become commonly used in social media and are an emergent research focus in machine learning and natural language processing. This study presents two Portuguese language offense-related datasets which serve as examples for future research and extend the study of the topic. The first is similar to other offense detection related datasets and is entitled Aggressiveness dataset. The second is a novelty because of the use of the history of the interaction between users and is entitled the Conflicts/Attacks dataset. Both datasets were developed in different phases. Firstly, we performed a content analysis of verbal aggression witnessed by adolescents in situations of cyberbullying. Secondly, we computed frequency analyses from the previous phase to gather lexical and linguistic cues used to identify potentially aggressive conflicts and attacks which were posted on Twitter. Thirdly, thorough annotation of real tweets was performed byindependent postgraduate educational psychologists with experience in cyberbullying research. Lastly, we benchmarked these datasets with other machine learning classifiers.

Keywords: aggression, classifiers, cyberbullying, datasets, hate speech, machine learning

Procedia PDF Downloads 195

730 A Comparative Evaluation of the SIR and SEIZ Epidemiological Models to Describe the Diffusion Characteristics of COVID-19 Polarizing Viewpoints on Online

Authors: Maryam Maleki, Esther Mead, Mohammad Arani, Nitin Agarwal

Abstract:

This study is conducted to examine how opposing viewpoints related to COVID-19 were diffused on Twitter. To accomplish this, six datasets using two epidemiological models, SIR (Susceptible, Infected, Recovered) and SEIZ (Susceptible, Exposed, Infected, Skeptics), were analyzed. The six datasets were chosen because they represent opposing viewpoints on the COVID-19 pandemic. Three of the datasets contain anti-subject hashtags, while the other three contain pro-subject hashtags. The time frame for all datasets is three years, starting from January 2020 to December 2022. The findings revealed that while both models were effective in evaluating the propagation trends of these polarizing viewpoints, the SEIZ model was more accurate with a relatively lower error rate (6.7%) compared to the SIR model (17.3%). Additionally, the relative error for both models was lower for anti-subject hashtags compared to pro-subject hashtags. By leveraging epidemiological models, insights into the propagation trends of polarizing viewpoints on Twitter were gained. This study paves the way for the development of methods to prevent the spread of ideas that lack scientific evidence while promoting the dissemination of scientifically backed ideas.

Keywords: mathematical modeling, epidemiological model, seiz model, sir model, covid-19, twitter, social network analysis, social contagion

Procedia PDF Downloads 30

729 Origins: An Interpretive History of MMA Design Studio’s Exhibition for the 2023 Venice Biennale

Authors: Jonathan A. Noble

Abstract:

‘Origins’ is an exhibition designed and installed by MMA Design Studio, at the 2023 Venice Biennale. The instillation formed part of the ‘Dangerous Liaisons’ group exhibition at the Arsenale building. An immersive experience was created for those who visited, where video projection and the bodies of visitors interacted with the scene. Designed by South African architect, Mphethi Morojele – founder and owner of MMA – the primary inspiration for ‘Origins’ was the recent discovery by Professor Karim Sadr in 2019, of a substantial Tswana settlement. Situated in present day Suikerbosrand Nature Reserve, some 45km south of Johannesburg, this precolonial city named Kweneng, has been dated back to the fifteenth century. This remarkable discovery was achieved thanks to advanced aerial, LiDAR scanning technology, which was used to capture the traces of Kweneng, spanning a terrain of some 10km long and 2km wide. Discovered by light (LiDAR) and exhibited through light, Origins presents a simulated experience of Kweneng. The presentation of Kweneng was achieved primarily though video, with a circular projection onto the floor of an animated LiDAR data sequence, and onto the walls a filmed dance sequence choreographed to embody the architectural, spatial and symbolic significance of Kweneng. This paper documents the design process that was involved in the conceptualization, development and final realization of this noteworthy exhibition, with an elucidation upon key social and cultural questions pertaining to precolonial heritage, reimagined histories and postcolonial identity. Periods of change and of social awakening sometimes spark an interest in questions of origin, of cultural lineage and belonging – and which certainly is the case for contemporary, post-Apartheid South Africa. Researching this paper has required primary study of MMA Design Studio’s project archive, including various proposals and other design related documents, conceptual design sketches, architectural drawings and photographs. This material is supported by the authors first-hand interviews with Morejele and others who were involved, especially with respect to the choreography of the interpretive dance, LiDAR visualization techniques and video production that informed the simulated, immersive experience at the exhibition. Presenting a ‘dangerous liaison’ between architecture and dance, Origins looks into the distant past to frame contemporary questions pertaining to intangible heritage, animism and embodiment through architecture and dance – considerations which are required “to survive the future”, says Morojele.

Keywords: architecture and dance, Kweneng, MMA design studio, origins, Venice Biennale

Procedia PDF Downloads 47