Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 25471

Search results for: data retrieval

25291 Arabic Quran Search Tool Based on Ontology

Authors: Mohammad Alqahtani, Eric Atwell

Abstract:

This paper reviews and classifies most of the important types of search techniques that have been applied on the holy Quran. Then, it addresses the limitations in these techniques. Additionally, this paper surveys most existing Quranic ontologies and what are their deficiencies. Finally, it explains a new search tool called: A semantic search tool for Al Quran based on Qur’anic ontologies. This tool will overcome all limitations in the existing Quranic search applications.

Keywords: holy Quran, natural language processing (NLP), semantic search, information retrieval (IR), ontology

Procedia PDF Downloads 574

25290 Low Cost LiDAR-GNSS-UAV Technology Development for PT Garam’s Three Dimensional Stockpile Modeling Needs

Authors: Mohkammad Nur Cahyadi, Imam Wahyu Farid, Ronny Mardianto, Agung Budi Cahyono, Eko Yuli Handoko, Daud Wahyu Imani, Arizal Bawazir, Luki Adi Triawan

Abstract:

Unmanned aerial vehicle (UAV) technology has cost efficiency and data retrieval time advantages. Using technologies such as UAV, GNSS, and LiDAR will later be combined into one of the newest technologies to cover each other's deficiencies. This integration system aims to increase the accuracy of calculating the volume of the land stockpile of PT. Garam (Salt Company). The use of UAV applications to obtain geometric data and capture textures that characterize the structure of objects. This study uses the Taror 650 Iron Man drone with four propellers, which can fly for 15 minutes. LiDAR can classify based on the number of image acquisitions processed in the software, utilizing photogrammetry and structural science principles from Motion point cloud technology. LiDAR can perform data acquisition that enables the creation of point clouds, three-dimensional models, Digital Surface Models, Contours, and orthomosaics with high accuracy. LiDAR has a drawback in the form of coordinate data positions that have local references. Therefore, researchers use GNSS, LiDAR, and drone multi-sensor technology to map the stockpile of salt on open land and warehouses every year, carried out by PT. Garam twice, where the previous process used terrestrial methods and manual calculations with sacks. Research with LiDAR needs to be combined with UAV to overcome data acquisition limitations because it only passes through the right and left sides of the object, mainly when applied to a salt stockpile. The UAV is flown to assist data acquisition with a wide coverage with the help of integration of the 200-gram LiDAR system so that the flying angle taken can be optimal during the flight process. Using LiDAR for low-cost mapping surveys will make it easier for surveyors and academics to obtain pretty accurate data at a more economical price. As a survey tool, LiDAR is included in a tool with a low price, around 999 USD; this device can produce detailed data. Therefore, to minimize the operational costs of using LiDAR, surveyors can use Low-Cost LiDAR, GNSS, and UAV at a price of around 638 USD. The data generated by this sensor is in the form of a visualization of an object shape made in three dimensions. This study aims to combine Low-Cost GPS measurements with Low-Cost LiDAR, which are processed using free user software. GPS Low Cost generates data in the form of position-determining latitude and longitude coordinates. The data generates X, Y, and Z values to help georeferencing process the detected object. This research will also produce LiDAR, which can detect objects, including the height of the entire environment in that location. The results of the data obtained are calibrated with pitch, roll, and yaw to get the vertical height of the existing contours. This study conducted an experimental process on the roof of a building with a radius of approximately 30 meters.

Keywords: LiDAR, unmanned aerial vehicle, low-cost GNSS, contour

Procedia PDF Downloads 100

25289 Towards Improved Public Information on Industrial Emissions in Italy: Concepts and Specific Issues Associated to the Italian Experience in IPPC Permit Licensing

Authors: C. Mazziotti Gomez de Teran, D. Fiore, B. Cola, A. Fardelli

Abstract:

The present paper summarizes the analysis of the request for consultation of information and data on industrial emissions made publicly available on the web site of the Ministry of Environment, Land and Sea on integrated pollution prevention and control from large industrial installations, the so called “AIA Portal”. However, since also local Competent Authorities have been organizing their own web sites on IPPC permits releasing procedures for public consultation purposes, as a result, a huge amount of information on national industrial plants is already available on internet, although it is usually proposed as textual documentation or images. Thus, it is not possible to access all the relevant information through interoperability systems and also to retrieval relevant information for decision making purposes as well as rising of awareness on environmental issue. Moreover, since in Italy the number of institutional and private subjects involved in the management of the public information on industrial emissions is substantial, the access to the information is provided on internet web sites according to different criteria; thus, at present it is not structurally homogeneous and comparable. To overcome the mentioned difficulties in the case of the Coordinating Committee for the implementation of the Agreement for the industrial area in Taranto and Statte, operating before the IPPC permit granting procedures of the relevant installation located in the area, a big effort was devoted to elaborate and to validate data and information on characterization of soil, ground water aquifer and coastal sea at disposal of different subjects to derive a global perspective for decision making purposes. Thus, the present paper also focuses on main outcomes matured during such experience.

Keywords: public information, emissions into atmosphere, IPPC permits, territorial information systems

Procedia PDF Downloads 290

25288 A Method of the Semantic on Image Auto-Annotation

Authors: Lin Huo, Xianwei Liu, Jingxiong Zhou

Abstract:

Recently, due to the existence of semantic gap between image visual features and human concepts, the semantic of image auto-annotation has become an important topic. Firstly, by extract low-level visual features of the image, and the corresponding Hash method, mapping the feature into the corresponding Hash coding, eventually, transformed that into a group of binary string and store it, image auto-annotation by search is a popular method, we can use it to design and implement a method of image semantic auto-annotation. Finally, Through the test based on the Corel image set, and the results show that, this method is effective.

Keywords: image auto-annotation, color correlograms, Hash code, image retrieval

Procedia PDF Downloads 500

25287 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications a Software Testing Approach

Authors: Theertha Chandroth

Abstract:

This paper presents a comparative study on how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its unique syntax and structure. The objective is to explore various techniques and methodologies for validating comparison and integration between JSON data to XML and vice versa. By understanding the process of checking JSON data against XML data, testers, developers and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.

Keywords: XML, JSON, data comparison, integration testing, Python, SQL

Procedia PDF Downloads 145

25286 Using Machine Learning Techniques to Extract Useful Information from Dark Data

Authors: Nigar Hussain

Abstract:

It is a subset of big data. Dark data means those data in which we fail to use for future decisions. There are many issues in existing work, but some need powerful tools for utilizing dark data. It needs sufficient techniques to deal with dark data. That enables users to exploit their excellence, adaptability, speed, less time utilization, execution, and accessibility. Another issue is the way to utilize dark data to extract helpful information to settle on better choices. In this paper, we proposed upgrade strategies to remove the dark side from dark data. Using a supervised model and machine learning techniques, we utilized dark data and achieved an F1 score of 89.48%.

Keywords: big data, dark data, machine learning, heatmap, random forest

Procedia PDF Downloads 35

25285 Prevalence of Visual Impairment among School Children in Ethiopia: A Systematic Review and Meta-Analysis

Authors: Merkineh Markos Lorato, Gedefaw Diress Alene

Abstract:

Introduction: Visual impairment is any condition of the eye or visual system that results in loss/reduction of visual functioning. It significantly influences the academic routine and social activities of children, and the effect is severe for low-income countries like Ethiopia. So, this study aimed to determine the pooled prevalence of visual impairment among school children in Ethiopia. Methods: Databases such as Medical Literature Analysis and Retrieval System Online, Excerpta Medica dataBASE, World Wide Web of Science, and Cochrane Library searched to retrieve eligible articles. In addition, Google Scholar and a reference list of the retrieved eligible articles were addressed. Studies that reported the prevalence of visual impairment were included to estimate the pooled prevalence. Data were extracted using a standardized data extraction format prepared in Microsoft Excel and analysis was held using STATA 11 statistical software. I² was used to assess the heterogeneity. Because of considerable heterogeneity, a random effect meta-analysis model was used to estimate the pooled prevalence of visual impairment among school children in Ethiopia. Results: The result of 9 eligible studies showed that the pooled prevalence of visual impairment among school children in Ethiopia was 7.01% (95% CI: 5.46, 8.56%). In the subgroup analysis, the highest prevalence was reported in South Nations Nationalities and Tigray region together (7.99%; 3.63, 12.35), while the lowest prevalence was reported in Addis Ababa (5.73%; 3.93, 7.53). Conclusion: The prevalence of visual impairment among school children is significantly high in Ethiopia. If it is not detected and intervened early, it will cause a lifetime threat to visually impaired school children, so that school vision screening program plan and its implementation may cure the life quality of future generations in Ethiopia.

Keywords: visual impairment, school children, Ethiopia, prevalence

Procedia PDF Downloads 47

25284 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflects different aspects of people, events and activities. Data generated from various systems are different in form and data source are different because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need fusion together. Data from different sources using different ways of collection raised several issues which need to be resolved. Problem of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of Multi-source data fusion is to form a uniform central database, which includes people data, location data, object data, and institution data, business data and space data. We need to use meta data to be referred to and read when application needs to access, manipulate and display the data. A uniform meta data management ensures effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 397

25283 Private Coded Computation of Matrix Multiplication

Authors: Malihe Aliasgari, Yousef Nejatbakhsh

Abstract:

The era of Big Data and the immensity of real-life datasets compels computation tasks to be performed in a distributed fashion, where the data is dispersed among many servers that operate in parallel. However, massive parallelization leads to computational bottlenecks due to faulty servers and stragglers. Stragglers refer to a few slow or delay-prone processors that can bottleneck the entire computation because one has to wait for all the parallel nodes to finish. The problem of straggling processors, has been well studied in the context of distributed computing. Recently, it has been pointed out that, for the important case of linear functions, it is possible to improve over repetition strategies in terms of the tradeoff between performance and latency by carrying out linear precoding of the data prior to processing. The key idea is that, by employing suitable linear codes operating over fractions of the original data, a function may be completed as soon as enough number of processors, depending on the minimum distance of the code, have completed their operations. The problem of matrix-matrix multiplication in the presence of practically big sized of data sets faced with computational and memory related difficulties, which makes such operations are carried out using distributed computing platforms. In this work, we study the problem of distributed matrix-matrix multiplication W = XY under storage constraints, i.e., when each server is allowed to store a fixed fraction of each of the matrices X and Y, which is a fundamental building of many science and engineering fields such as machine learning, image and signal processing, wireless communication, optimization. Non-secure and secure matrix multiplication are studied. We want to study the setup, in which the identity of the matrix of interest should be kept private from the workers and then obtain the recovery threshold of the colluding model, that is, the number of workers that need to complete their task before the master server can recover the product W. The problem of secure and private distributed matrix multiplication W = XY which the matrix X is confidential, while matrix Y is selected in a private manner from a library of public matrices. We present the best currently known trade-off between communication load and recovery threshold. On the other words, we design an achievable PSGPD scheme for any arbitrary privacy level by trivially concatenating a robust PIR scheme for arbitrary colluding workers and private databases and the proposed SGPD code that provides a smaller computational complexity at the workers.

Keywords: coded distributed computation, private information retrieval, secret sharing, stragglers

Procedia PDF Downloads 127

25282 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad

Abstract:

Nowadays considering human involved in increasing data development some methods such as data mining to extract science are unavoidable. One of the discussions of data mining is inherent distribution of the data usually the bases creating or receiving such data belong to corporate or non-corporate persons and do not give their information freely to others. Yet there is no guarantee to enable someone to mine special data without entering in the owner’s privacy. Sending data and then gathering them by each vertical or horizontal software depends on the type of their preserving type and also executed to improve data privacy. In this study it was attempted to compare comprehensively preserving data methods; also general methods such as random data, coding and strong and weak points of each one are examined.

Keywords: data mining, distributed data mining, privacy protection, privacy preserving

Procedia PDF Downloads 529

25281 The Right to Data Portability and Its Influence on the Development of Digital Services

Authors: Roman Bieda

Abstract:

The General Data Protection Regulation (GDPR) will come into force on 25 May 2018 which will create a new legal framework for the protection of personal data in the European Union. Article 20 of GDPR introduces a right to data portability. This right allows for data subjects to receive the personal data which they have provided to a data controller, in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating transferring personal data between IT environments (e.g.: applications), will also facilitate changing the provider of services (e.g. changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.

Keywords: data portability, digital market, GDPR, personal data

Procedia PDF Downloads 478

25280 Recent Advances in Data Warehouse

Authors: Fahad Hanash Alzahrani

Abstract:

This paper describes some recent advances in a quickly developing area of data storing and processing based on Data Warehouses and Data Mining techniques, which are associated with software, hardware, data mining algorithms and visualisation techniques having common features for any specific problems and tasks of their implementation.

Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing

Procedia PDF Downloads 406

25279 How to Use Big Data in Logistics Issues

Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy

Abstract:

Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does Data. Utilizing massive data sets enable companies to get competitive advantages over their adversaries. Out of many area of Big Data usage, logistics has significance role in both commercial sector and military. This paper lays out what big data is and how it is used in both military and commercial logistics.

Keywords: big data, logistics, operational efficiency, risk management

Procedia PDF Downloads 644

25278 Self-Attention Mechanism for Target Hiding Based on Satellite Images

Authors: Hao Yuan, Yongjian Shen, Xiangjun He, Yuheng Li, Zhouzhou Zhang, Pengyu Zhang, Minkang Cai

Abstract:

Remote sensing data can provide support for decision-making in disaster assessment or disaster relief. The traditional processing methods of sensitive targets in remote sensing mapping are mainly based on manual retrieval and image editing tools, which are inefficient. Methods based on deep learning for sensitive target hiding are faster and more flexible. But these methods have disadvantages in training time and cost of calculation. This paper proposed a target hiding model Self Attention (SA) Deepfill, which used self-attention modules to replace part of gated convolution layers in image inpainting. By this operation, the calculation amount of the model becomes smaller, and the performance is improved. And this paper adds free-form masks to the model’s training to enhance the model’s universal. The experiment on an open remote sensing dataset proved the efficiency of our method. Moreover, through experimental comparison, the proposed method can train for a longer time without over-fitting. Finally, compared with the existing methods, the proposed model has lower computational weight and better performance.

Keywords: remote sensing mapping, image inpainting, self-attention mechanism, target hiding

Procedia PDF Downloads 143

25277 Empirical Study on Factors Influencing SEO

Authors: Pakinee Aimmanee, Phoom Chokratsamesiri

Abstract:

Search engine has become an essential tool nowadays for people to search for their needed information on the internet. In this work, we evaluate the performance of the search engine from three factors: the keyword frequency, the number of inbound links, and the difficulty of the keyword. The evaluations are based on the ranking position and the number of days that Google has seen or detect the webpage. We find that the keyword frequency and the difficulty of the keyword do not affect the Google ranking where the number of inbound links gives remarkable improvement of the ranking position. The optimal number of inbound links found in the experiment is 10.

Keywords: SEO, information retrieval, web search, knowledge technologies

Procedia PDF Downloads 286

25276 Neurophysiology of Domain Specific Execution Costs of Grasping in Working Memory Phases

Authors: Rumeysa Gunduz, Dirk Koester, Thomas Schack

Abstract:

Previous behavioral studies have shown that working memory (WM) and manual actions share limited capacity cognitive resources, which in turn results in execution costs of manual actions in WM. However, to the best of our knowledge, there is no study investigating the neurophysiology of execution costs. The current study aims to fill this research gap investigating the neurophysiology of execution costs of grasping in WM phases (encoding, maintenance, retrieval) considering verbal and visuospatial domains of WM. A WM-grasping dual task paradigm was implemented to examine execution costs. Baseline single task required performing verbal or visuospatial version of a WM task. Dual task required performing the WM task embedded in a high precision grasp to place task. 30 participants were tested in a 2 (single vs. dual task) x 2 (visuo-spatial vs. verbal WM) within subject design. Event related potentials (ERPs) were extracted for each WM phase separately in the single and dual tasks. Memory performance for visuospatial WM, but not for verbal WM, was significantly lower in the dual task compared to the single task. Encoding related ERPs in the single task revealed different ERPs of verbal WM and visuospatial WM at bilateral anterior sites and right posterior site. In the dual task, bilateral anterior difference disappeared due to bilaterally increased anterior negativities for visuospatial WM. Maintenance related ERPs in the dual task revealed different ERPs of verbal WM and visuospatial WM at bilateral posterior sites. There was also anterior negativity for visuospatial WM. Retrieval related ERPs in the single task revealed different ERPs of verbal WM and visuospatial WM at bilateral posterior sites. In the dual task, there was no difference between verbal WM and visuospatial WM. Behavioral and ERP findings suggest that execution of grasping shares cognitive resources only with visuospatial WM, which in turn results in domain specific execution costs. Moreover, ERP findings suggest unique patterns of costs in each WM phase, which supports the idea that each WM phase reflects a separate cognitive process. This study not only contributes to the understanding of cognitive principles of manual action control, but also contributes to the understanding of WM as an entity consisting of separate modalities and cognitive processes.

Keywords: dual task, grasping execution, neurophysiology, working memory domains, working memory phases

Procedia PDF Downloads 429

25275 Structured-Ness and Contextual Retrieval Underlie Language Comprehension

Authors: Yao-Ying Lai, Maria Pinango, Ashwini Deo

Abstract:

While grammatical devices are essential to language processing, how comprehension utilizes cognitive mechanisms is less emphasized. This study addresses this issue by probing the complement coercion phenomenon: an entity-denoting complement following verbs like begin and finish receives an eventive interpretation. For example, (1) “The queen began the book” receives an agentive reading like (2) “The queen began [reading/writing/etc.…] the book.” Such sentences engender additional processing cost in real-time comprehension. The traditional account attributes this cost to an operation that coerces the entity-denoting complement to an event, assuming that these verbs require eventive complements. However, in closer examination, examples like “Chapter 1 began the book” undermine this assumption. An alternative, Structured Individual (SI) hypothesis, proposes that the complement following aspectual verbs (AspV; e.g. begin, finish) is conceptualized as a structured individual, construed as an axis along various dimensions (e.g. spatial, eventive, temporal, informational). The composition of an animate subject and an AspV such as (1) engenders an ambiguity between an agentive reading along the eventive dimension like (2), and a constitutive reading along the informational/spatial dimension like (3) “[The story of the queen] began the book,” in which the subject is interpreted as a subpart of the complement denotation. Comprehenders need to resolve the ambiguity by searching contextual information, resulting in additional cost. To evaluate the SI hypothesis, a questionnaire was employed. Method: Target AspV sentences such as “Shakespeare began the volume.” were preceded by one of the following types of context sentence: (A) Agentive-biasing, in which an event was mentioned (…writers often read…), (C) Constitutive-biasing, in which a constitutive meaning was hinted (Larry owns collections of Renaissance literature.), (N) Neutral context, which allowed both interpretations. Thirty-nine native speakers of English were asked to (i) rate each context-target sentence pair from a 1~5 scale (5=fully understandable), and (ii) choose possible interpretations for the target sentence given the context. The SI hypothesis predicts that comprehension is harder for the Neutral condition, as compared to the biasing conditions because no contextual information is provided to resolve an ambiguity. Also, comprehenders should obtain the specific interpretation corresponding to the context type. Results: (A) Agentive-biasing and (C) Constitutive-biasing were rated higher than (N) Neutral conditions (p< .001), while all conditions were within the acceptable range (> 3.5 on the 1~5 scale). This suggests that when lacking relevant contextual information, semantic ambiguity decreases comprehensibility. The interpretation task shows that the participants selected the biased agentive/constitutive reading for condition (A) and (C) respectively. For the Neutral condition, the agentive and constitutive readings were chosen equally often. Conclusion: These findings support the SI hypothesis: the meaning of AspV sentences is conceptualized as a parthood relation involving structured individuals. We argue that semantic representation makes reference to spatial structured-ness (abstracted axis). To obtain an appropriate interpretation, comprehenders utilize contextual information to enrich the conceptual representation of the sentence in question. This study connects semantic structure to human’s conceptual structure, and provides a processing model that incorporates contextual retrieval.

Keywords: ambiguity resolution, contextual retrieval, spatial structured-ness, structured individual

Procedia PDF Downloads 336

25274 Relevance Feedback within CBIR Systems

Authors: Mawloud Mosbah, Bachir Boucheham

Abstract:

We present here the results for a comparative study of some techniques, available in the literature, related to the relevance feedback mechanism in the case of a short-term learning. Only one method among those considered here is belonging to the data mining field which is the K-Nearest Neighbours Algorithm (KNN) while the rest of the methods is related purely to the information retrieval field and they fall under the purview of the following three major axes: Shifting query, Feature Weighting and the optimization of the parameters of similarity metric. As a contribution, and in addition to the comparative purpose, we propose a new version of the KNN algorithm referred to as an incremental KNN which is distinct from the original version in the sense that besides the influence of the seeds, the rate of the actual target image is influenced also by the images already rated. The results presented here have been obtained after experiments conducted on the Wang database for one iteration and utilizing colour moments on the RGB space. This compact descriptor, Colour Moments, is adequate for the efficiency purposes needed in the case of interactive systems. The results obtained allow us to claim that the proposed algorithm proves good results; it even outperforms a wide range of techniques available in the literature.

Keywords: CBIR, category search, relevance feedback, query point movement, standard Rocchio’s formula, adaptive shifting query, feature weighting, original KNN, incremental KNN

Procedia PDF Downloads 285

25273 Landsat 8-TIRS NEΔT at Kīlauea Volcano and the Active East Rift Zone, Hawaii

Authors: Flora Paganelli

Abstract:

The radiometric performance of remotely sensed images is important for volcanic monitoring. The Thermal Infrared Sensor (TIRS) on-board Landsat 8 was designed with specific requirements in regard to the noise-equivalent change in temperature (NEΔT) at ≤ 0.4 K at 300 K for the two thermal infrared bands B10 and B11. This study investigated the on-orbit NEΔT of the TIRS two bands from a scene-based method using clear-sky images over the volcanic activity of Kīlauea Volcano and the active East Rift Zone (Hawaii), in order to optimize the use of TIRS data. Results showed that the NEΔTs of the two bands exceeded the design specification by an order of magnitude at 300 K. Both separate bands and split window algorithm were examined to estimate the effect of NEΔT on the land surface temperature (LST) retrieval, and NEΔT contribution to the final LST error. These results were also useful in the current efforts to assess the requirements for volcanology research campaign using the Hyperspectral Infrared Imager (HyspIRI) whose airborne prototype MODIS/ASTER instruments is plan to be flown by NASA as a single campaign to the Hawaiian Islands in support of volcanology and coastal area monitoring in 2016.

Keywords: landsat 8, radiometric performance, thermal infrared sensor (TIRS), volcanology

Procedia PDF Downloads 243

25272 An Improved Atmospheric Correction Method with Diurnal Temperature Cycle Model for MSG-SEVIRI TIR Data under Clear Sky Condition

Authors: Caixia Gao, Chuanrong Li, Lingli Tang, Lingling Ma, Yonggang Qian, Ning Wang

Abstract:

Knowledge of land surface temperature (LST) is of crucial important in energy balance studies and environment modeling. Satellite thermal infrared (TIR) imagery is the primary source for retrieving LST at the regional and global scales. Due to the combination of atmosphere and land surface of received radiance by TIR sensors, atmospheric effect correction has to be performed to remove the atmospheric transmittance and upwelling radiance. Spinning Enhanced Visible and Infrared Imager (SEVIRI) onboard Meteosat Second Generation (MSG) provides measurements every 15 minutes in 12 spectral channels covering from visible to infrared spectrum at fixed view angles with 3km pixel size at nadir, offering new and unique capabilities for LST, LSE measurements. However, due to its high temporal resolution, the atmosphere correction could not be performed with radiosonde profiles or reanalysis data since these profiles are not available at all SEVIRI TIR image acquisition times. To solve this problem, a two-part six-parameter semi-empirical diurnal temperature cycle (DTC) model has been applied to the temporal interpolation of ECMWF reanalysis data. Due to the fact that the DTC model is underdetermined with ECMWF data at four synoptic times (UTC times: 00:00, 06:00, 12:00, 18:00) in one day for each location, some approaches are adopted in this study. It is well known that the atmospheric transmittance and upwelling radiance has a relationship with water vapour content (WVC). With the aid of simulated data, the relationship could be determined under each viewing zenith angle for each SEVIRI TIR channel. Thus, the atmospheric transmittance and upwelling radiance are preliminary removed with the aid of instantaneous WVC, which is retrieved from the brightness temperature in the SEVIRI channels 5, 9 and 10, and a group of the brightness temperatures for surface leaving radiance (Tg) are acquired. Subsequently, a group of the six parameters of the DTC model is fitted with these Tg by a Levenberg-Marquardt least squares algorithm (denoted as DTC model 1). Although the retrieval error of WVC and the approximate relationships between WVC and atmospheric parameters would induce some uncertainties, this would not significantly affect the determination of the three parameters, td, ts and β (β is the angular frequency, td is the time where the Tg reaches its maximum, ts is the starting time of attenuation) in DTC model. Furthermore, due to the large fluctuation in temperature and the inaccuracy of the DTC model around sunrise, SEVIRI measurements from two hours before sunrise to two hours after sunrise are excluded. With the knowledge of td , ts, and β, a new DTC model (denoted as DTC model 2) is accurately fitted again with these Tg at UTC times: 05:57, 11:57, 17:57 and 23:57, which is atmospherically corrected with ECMWF data. And then a new group of the six parameters of the DTC model is generated and subsequently, the Tg at any given times are acquired. Finally, this method is applied to SEVIRI data in channel 9 successfully. The result shows that the proposed method could be performed reasonably without assumption and the Tg derived with the improved method is much more consistent with that from radiosonde measurements.

Keywords: atmosphere correction, diurnal temperature cycle model, land surface temperature, SEVIRI

Procedia PDF Downloads 271

25271 Book Recommendation Using Query Expansion and Information Retrieval Methods

Authors: Ritesh Kumar, Rajendra Pamula

Abstract:

In this paper, we present our contribution for book recommendation. In our experiment, we combine the results of Sequential Dependence Model (SDM) and exploitation of book information such as reviews, tags and ratings. This social information is assigned by users. For this, we used CLEF-2016 Social Book Search Track Suggestion task. Finally, our proposed method extensively evaluated on CLEF -2015 Social Book Search datasets, and has better performance (nDCG@10) compared to other state-of-the-art systems. Recently we got the good performance in CLEF-2016.

Keywords: sequential dependence model, social information, social book search, query expansion

Procedia PDF Downloads 291

25270 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, more various information can be extracted. In addition, development and dissemination of boards such as Arduino and Raspberry Pie have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, there are many difficulties in using the board to collect data, and there are many difficulties in using it when the user is not a computer programmer, or when using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 384

25269 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, veracity, and come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, provision of insights to users, and for good governance. Government (Big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services like data storage, hosting services to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationship in the government (big) data ecosystem. We also discuss our research findings. We did not find too much published research articles about the government (big) data ecosystem, including its definition and classification of actors and their roles. Therefore, we lent ideas for the government (big) data ecosystem from numerous areas that include scientific research data, humanitarian data, open government data, industry data, in the literature.

Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review

Procedia PDF Downloads 165

25268 Cloud-Based Multiresolution Geodata Cube for Efficient Raster Data Visualization and Analysis

Authors: Lassi Lehto, Jaakko Kahkonen, Juha Oksanen, Tapani Sarjakoski

Abstract:

The use of raster-formatted data sets in geospatial analysis is increasing rapidly. At the same time, geographic data are being introduced into disciplines outside the traditional domain of geoinformatics, like climate change, intelligent transport, and immigration studies. These developments call for better methods to deliver raster geodata in an efficient and easy-to-use manner. Data cube technologies have traditionally been used in the geospatial domain for managing Earth Observation data sets that have strict requirements for effective handling of time series. The same approach and methodologies can also be applied in managing other types of geospatial data sets. A cloud service-based geodata cube, called GeoCubes Finland, has been developed to support online delivery and analysis of most important geospatial data sets with national coverage. The main target group of the service is the academic research institutes in the country. The most significant aspects of the GeoCubes data repository include the use of multiple resolution levels, cloud-optimized file structure, and a customized, flexible content access API. Input data sets are pre-processed while being ingested into the repository to bring them into a harmonized form in aspects like georeferencing, sampling resolutions, spatial subdivision, and value encoding. All the resolution levels are created using an appropriate generalization method, selected depending on the nature of the source data set. Multiple pre-processed resolutions enable new kinds of online analysis approaches to be introduced. Analysis processes based on interactive visual exploration can be effectively carried out, as the level of resolution most close to the visual scale can always be used. In the same way, statistical analysis can be carried out on resolution levels that best reflect the scale of the phenomenon being studied. Access times remain close to constant, independent of the scale applied in the application. The cloud service-based approach, applied in the GeoCubes Finland repository, enables analysis operations to be performed on the server platform, thus making high-performance computing facilities easily accessible. The developed GeoCubes API supports this kind of approach for online analysis. The use of cloud-optimized file structures in data storage enables the fast extraction of subareas. The access API allows for the use of vector-formatted administrative areas and user-defined polygons as definitions of subareas for data retrieval. Administrative areas of the country in four levels are available readily from the GeoCubes platform. In addition to direct delivery of raster data, the service also supports the so-called virtual file format, in which only a small text file is first downloaded. The text file contains links to the raster content on the service platform. The actual raster data is downloaded on demand, from the spatial area and resolution level required in each stage of the application. By the geodata cube approach, pre-harmonized geospatial data sets are made accessible to new categories of inexperienced users in an easy-to-use manner. At the same time, the multiresolution nature of the GeoCubes repository facilitates expert users to introduce new kinds of interactive online analysis operations.

Keywords: cloud service, geodata cube, multiresolution, raster geodata

Procedia PDF Downloads 144

25267 Linguistic Analysis of Borderline Personality Disorder: Using Language to Predict Maladaptive Thoughts and Behaviours

Authors: Charlotte Entwistle, Ryan Boyd

Abstract:

Recent developments in information retrieval techniques and natural language processing have allowed for greater exploration of psychological and social processes. Linguistic analysis methods for understanding behaviour have provided useful insights within the field of mental health. One area within mental health that has received little attention though, is borderline personality disorder (BPD). BPD is a common mental health disorder characterised by instability of interpersonal relationships, self-image and affect. It also manifests through maladaptive behaviours, such as impulsivity and self-harm. Examination of language patterns associated with BPD could allow for a greater understanding of the disorder and its links to maladaptive thoughts and behaviours. Language analysis methods could also be used in a predictive way, such as by identifying indicators of BPD or predicting maladaptive thoughts, emotions and behaviours. Additionally, associations that are uncovered between language and maladaptive thoughts and behaviours could then be applied at a more general level. This study explores linguistic characteristics of BPD, and their links to maladaptive thoughts and behaviours, through the analysis of social media data. Data were collected from a large corpus of posts from the publicly available social media platform Reddit, namely, from the ‘r/BPD’ subreddit whereby people identify as having BPD. Data were collected using the Python Reddit API Wrapper and included all users which had posted within the BPD subreddit. All posts were manually inspected to ensure that they were not posted by someone who clearly did not have BPD, such as people posting about a loved one with BPD. These users were then tracked across all other subreddits of which they had posted in and data from these subreddits were also collected. Additionally, data were collected from a random control group of Reddit users. Disorder-relevant behaviours, such as self-harming or aggression-related behaviours, outlined within Reddit posts were coded to by expert raters. All posts and comments were aggregated by user and split by subreddit. Language data were then analysed using the Linguistic Inquiry and Word Count (LIWC) 2015 software. LIWC is a text analysis program that identifies and categorises words based on linguistic and paralinguistic dimensions, psychological constructs and personal concern categories. Statistical analyses of linguistic features could then be conducted. Findings revealed distinct linguistic features associated with BPD, based on Reddit posts, which differentiated these users from a control group. Language patterns were also found to be associated with the occurrence of maladaptive thoughts and behaviours. Thus, this study demonstrates that there are indeed linguistic markers of BPD present on social media. It also implies that language could be predictive of maladaptive thoughts and behaviours associated with BPD. These findings are of importance as they suggest potential for clinical interventions to be provided based on the language of people with BPD to try to reduce the likelihood of maladaptive thoughts and behaviours occurring. For example, by social media tracking or engaging people with BPD in expressive writing therapy. Overall, this study has provided a greater understanding of the disorder and how it manifests through language and behaviour.

Keywords: behaviour analysis, borderline personality disorder, natural language processing, social media data

Procedia PDF Downloads 356

25266 Government Big Data Ecosystem: A Systematic Literature Review

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Data that is high in volume, velocity, veracity and comes from a variety of sources is usually generated in all sectors including the government sector. Globally public administrations are pursuing (big) data as new technology and trying to adopt a data-centric architecture for hosting and sharing data. Properly executed, big data and data analytics in the government (big) data ecosystem can be led to data-driven government and have a direct impact on the way policymakers work and citizens interact with governments. In this research paper, we conduct a systematic literature review. The main aims of this paper are to highlight essential aspects of the government (big) data ecosystem and to explore the most critical socio-technical factors that contribute to the successful implementation of government (big) data ecosystem. The essential aspects of government (big) data ecosystem include definition, data types, data lifecycle models, and actors and their roles. We also discuss the potential impact of (big) data in public administration and gaps in the government data ecosystems literature. As this is a new topic, we did not find specific articles on government (big) data ecosystem and therefore focused our research on various relevant areas like humanitarian data, open government data, scientific research data, industry data, etc.

Keywords: applications of big data, big data, big data types. big data ecosystem, critical success factors, data-driven government, egovernment, gaps in data ecosystems, government (big) data, literature review, public administration, systematic review

Procedia PDF Downloads 235

25265 A Machine Learning Decision Support Framework for Industrial Engineering Purposes

Authors: Anli Du Preez, James Bekker

Abstract:

Data is currently one of the most critical and influential emerging technologies. However, the true potential of data is yet to be exploited since, currently, about 1% of generated data are ever actually analyzed for value creation. There is a data gap where data is not explored due to the lack of data analytics infrastructure and the required data analytics skills. This study developed a decision support framework for data analytics by following Jabareen’s framework development methodology. The study focused on machine learning algorithms, which is a subset of data analytics. The developed framework is designed to assist data analysts with little experience, in choosing the appropriate machine learning algorithm given the purpose of their application.

Keywords: Data analytics, Industrial engineering, Machine learning, Value creation

Procedia PDF Downloads 172

25264 Providing Security to Private Cloud Using Advanced Encryption Standard Algorithm

Authors: Annapureddy Srikant Reddy, Atthanti Mahendra, Samala Chinni Krishna, N. Neelima

Abstract:

In our present world, we are generating a lot of data and we, need a specific device to store all these data. Generally, we store data in pen drives, hard drives, etc. Sometimes we may loss the data due to the corruption of devices. To overcome all these issues, we implemented a cloud space for storing the data, and it provides more security to the data. We can access the data with just using the internet from anywhere in the world. We implemented all these with the java using Net beans IDE. Once user uploads the data, he does not have any rights to change the data. Users uploaded files are stored in the cloud with the file name as system time and the directory will be created with some random words. Cloud accepts the data only if the size of the file is less than 2MB.

Keywords: cloud space, AES, FTP, NetBeans IDE

Procedia PDF Downloads 210

25263 Business Intelligence for Profiling of Telecommunication Customer

Authors: Rokhmatul Insani, Hira Laksmiwati Soemitro

Abstract:

Business Intelligence is a methodology that exploits the data to produce information and knowledge systematically, business intelligence can support the decision-making process. Some methods in business intelligence are data warehouse and data mining. A data warehouse can store historical data from transactional data. For data modelling in data warehouse, we apply dimensional modelling by Kimball. While data mining is used to extracting patterns from the data and get insight from the data. Data mining has many techniques, one of which is segmentation. For profiling of telecommunication customer, we use customer segmentation according to customer’s usage of services, customer invoice and customer payment. Customers can be grouped according to their characteristics and can be identified the profitable customers. We apply K-Means Clustering Algorithm for segmentation. The input variable for that algorithm we use RFM (Recency, Frequency and Monetary) model. All process in data mining, we use tools IBM SPSS modeller.

Keywords: business intelligence, customer segmentation, data warehouse, data mining

Procedia PDF Downloads 488

25262 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Saeed Hassan Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analysing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics

Procedia PDF Downloads 575