Search results for: sparsely representable data

24094 Scalable Learning of Tree-Based Models on Sparsely Representable Data

Authors: Fares Hedayatit, Arnauld Joly, Panagiotis Papadimitriou

Abstract:

Many machine learning tasks, such as text annotation, usually require training over very big datasets, e.g., millions of web documents, that can be represented in a sparse input space. State-of-the-art tree-based ensemble algorithms cannot scale to such datasets, since they include operations whose running time is a function of the input space size rather than of the number of non-zero input elements. In this paper, we propose an efficient splitting algorithm to leverage input sparsity within decision tree methods. Our algorithm improves training time over sparse datasets by more than two orders of magnitude and has been incorporated into the current version of scikit-learn, the most popular open source Python machine learning library.
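
As a minimal illustration of the usage pattern this abstract describes (the internal sparsity-aware splitting algorithm itself is not reproduced), a scikit-learn tree ensemble can be trained directly on a sparse text representation; the toy corpus and labels below are purely illustrative.

```python
# Minimal sketch: training a scikit-learn tree ensemble directly on a sparse
# text representation. The toy documents and labels are illustrative only;
# the sparsity-aware splitting algorithm runs inside scikit-learn itself.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

docs = ["scalable learning on sparse text",
        "tree based models for web documents",
        "dense features are not required",
        "sparse input spaces with millions of columns"]
labels = [1, 1, 0, 0]

X = TfidfVectorizer().fit_transform(docs)   # scipy.sparse CSR matrix
print(type(X), X.shape, X.nnz, "non-zero entries")

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, labels)                          # sparse input is consumed as-is
print(clf.predict(X[:2]))
```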

Keywords: big data, sparsely representable data, tree-based models, scalable learning

Procedia PDF Downloads 228
24093 Approximation of Convex Set by Compactly Semidefinite Representable Set

Authors: Anusuya Ghosh, Vishnu Narayanan

Abstract:

The approximation of a convex set by a semidefinite representable set plays an important role in semidefinite programming, especially in modern convex optimization. Optimizing a linear function over a convex set is a hard problem, but optimizing the linear function over the semidefinite representable set that approximates the convex set is easy, as there exist numerous efficient algorithms for solving semidefinite programming problems. Our approximation technique is therefore significant in optimization. We develop a technique to approximate any closed convex set, say K, by a compactly semidefinite representable set. Further, we prove that there exists a sequence of compactly semidefinite representable sets that gives gradually tighter approximations of the closed convex set K, and we discuss the convergence of this sequence to K. The recession cone of K and the recession cone of the compactly semidefinite representable set are equal, so we say that the sequence of compactly semidefinite representable sets converges strongly to the closed convex set. Thus, this approximation technique is a very useful development in semidefinite programming.
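
For reference, and stated in its standard textbook form rather than the authors' compact construction, a set is called semidefinite representable when it is the projection of a spectrahedron:

```latex
% Standard definition of a semidefinite representable set, i.e. the projection
% of a spectrahedron; the compactness refinement developed in the paper is not
% reproduced here.
S \;=\; \Bigl\{\, x \in \mathbb{R}^{n} \;:\; \exists\, u \in \mathbb{R}^{m}
\ \text{such that}\ A_{0} + \sum_{i=1}^{n} x_{i} A_{i} + \sum_{j=1}^{m} u_{j} B_{j} \succeq 0 \,\Bigr\},
```

where the $A_i$ and $B_j$ are fixed real symmetric matrices and $\succeq 0$ denotes positive semidefiniteness; minimizing a linear objective over such a set is exactly a semidefinite program.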

Keywords: semidefinite programming, semidefinite representable set, compactly semidefinite representable set, approximation

Procedia PDF Downloads 347
24092 Second Representation of Modules over Commutative Rings

Authors: Jawad Abuhlail, Hamza Hroub

Abstract:

Let R be a commutative ring. Representation theory studies the representation of R-modules as (possibly finite) sums of special types of R-submodules. Here we are interested in a class of R-modules lying between the class of semisimple R-modules and the class of R-modules that can be written as (possibly finite) sums of secondary R-submodules (every simple R-submodule is secondary). We investigate R-modules that can be written as (possibly finite) sums of second R-submodules; we call such modules second representable. Moreover, we investigate the class of (main) second attached prime ideals related to a module with such a representation. We provide sufficient conditions for an R-module M to have a (minimal) second representation. We also determine the collection of second attached prime ideals for some types of second representable R-modules, in particular within the class of injective R-modules. Since every simple R-submodule is second and every second R-submodule is secondary, the importance of second representable R-modules is apparent.
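
For orientation, the notion of a second submodule referred to above can be stated as follows; this is its standard form in the literature and may differ in minor conventions from the paper's own.

```latex
% Standard definition: a nonzero submodule N of an R-module M is second if
% every homothety on N is either surjective or zero. (Stated for orientation;
% the paper's conventions may differ in detail.)
0 \neq N \le M \ \text{is second} \quad\Longleftrightarrow\quad
\forall\, r \in R:\;\; rN = N \ \ \text{or}\ \ rN = 0 .
```

In that case $\operatorname{Ann}_R(N)$ is a prime ideal of $R$; a module $M$ is then second representable when it is a (possibly finite) sum of second submodules, and the annihilators of the summands give its second attached primes.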

Keywords: lifting modules, second attached prime ideals, second representations, secondary representations, semisimple modules, second submodules

Procedia PDF Downloads 152
24091 Advancement of Computer Science Research in Nigeria: A Bibliometric Analysis of the Past Three Decades

Authors: Temidayo O. Omotehinwa, David O. Oyewola, Friday J. Agbo

Abstract:

This study aims to provide a proper perspective on the development landscape of Computer Science research in Nigeria. To this end, a bibliometric analysis of 4,333 bibliographic records of Computer Science research in Nigeria over the last 31 years (1991-2021) was carried out. The bibliographic data were extracted from the Scopus database and analyzed using VOSviewer and the bibliometrix R package through the biblioshiny web interface. The findings revealed that Computer Science research in Nigeria has a growth rate of 24.19%. The most developed and well-studied research areas in the Computer Science field in Nigeria are machine learning, data mining, and deep learning. The social structure analysis revealed a need for improved international collaboration; the sparsely established collaborations are largely influenced by geographic proximity. The funding analysis showed that Computer Science research in Nigeria is under-funded. The findings of this study will be useful for researchers conducting Computer Science-related research. Experts can gain insights into how to develop a strategic framework that will advance the field in a more impactful manner, and government agencies and policymakers can utilize the outcome of this research to develop strategies for improved funding of Computer Science research.

Keywords: bibliometric analysis, biblioshiny, computer science, Nigeria, science mapping

Procedia PDF Downloads 71
24090 Impact Evaluation of Discriminant Analysis on Epidemic Protocol in Warship Scenarios

Authors: Davi Marinho de Araujo Falcão, Ronaldo Moreira Salles, Paulo Henrique Maranhão

Abstract:

Disruption Tolerant Networks (DTN) are an evolution of Mobile Ad hoc Networks (MANET) and work well in scenarios where nodes are sparsely distributed, with low density and intermittent connections, and where an end-to-end infrastructure cannot be guaranteed. DTNs are therefore recommended for high-latency applications that can last from hours to days. The maritime scenario has mobility characteristics that favour a DTN network approach, but concern with data security is also a relevant aspect in such scenarios. Our previous work evaluated the performance of several DTN protocols (Epidemic, Spray and Wait, and Direct Delivery) in three warship scenarios and proposed applying discriminant analysis, as a classification technique for secure connections, to the Epidemic protocol. Continuing that work, the current article proposes a new analysis of the directional discriminant function with opening angles smaller than 90 degrees, demonstrating that the increase in directivity influences the selection of a greater number of secure connections by the directional discriminant Epidemic protocol.

Keywords: DTN, discriminant function, epidemic protocol, security, tactical messages, warship scenario

Procedia PDF Downloads 157
24089 Mapping Thermal Properties Using Resistivity, Lithology and Thermal Conductivity Measurements

Authors: Riccardo Pasquali, Keith Harlin, Mark Muller

Abstract:

The ShallowTherm project is focussed on developing and applying a methodology for extrapolating relatively sparsely sampled thermal conductivity measurements across Ireland using mapped Litho-Electrical (LE) units. The primary data used consist of electrical resistivities derived from the Geological Survey Ireland Tellus airborne electromagnetic dataset, GIS-based maps of Irish geology, and rock thermal conductivities derived from both the current Irish Ground Thermal Properties (IGTP) database and a new programme of sampling and laboratory measurement. The workflow has been developed across three case-study areas that sample a range of different calcareous, arenaceous, argillaceous, and volcanic lithologies. Statistical analysis of resistivity data from individual geological formations has been assessed and integrated with detailed lithological descriptions to define distinct LE units. Thermal conductivity measurements from core and hand samples have been acquired for every geological formation within each study area. The variability and consistency of thermal conductivity measurements within each LE unit is examined with the aim of defining a characteristic thermal conductivity (or range of thermal conductivities) for each LE unit. Mapping of LE units, coupled with characteristic thermal conductivities, provides a method of defining thermal conductivity properties at a regional scale and facilitating the design of ground source heat pump closed-loop collectors.
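
As an illustration of the final step of such a workflow (not the ShallowTherm implementation itself), a characteristic thermal conductivity per Litho-Electrical unit can be derived by grouping laboratory measurements by mapped LE unit; the column names and sample values below are hypothetical.

```python
# Hypothetical sketch: assigning a characteristic thermal conductivity (and its
# spread) to each mapped Litho-Electrical (LE) unit from laboratory
# measurements. Column names and values are illustrative, not project data.
import pandas as pd

measurements = pd.DataFrame({
    "le_unit": ["LE1", "LE1", "LE1", "LE2", "LE2", "LE3"],
    "thermal_conductivity_W_mK": [2.4, 2.6, 2.5, 3.1, 3.3, 1.9],
})

summary = (measurements
           .groupby("le_unit")["thermal_conductivity_W_mK"]
           .agg(characteristic="median", spread="std", n="count"))
print(summary)
```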

Keywords: thermal conductivity, ground source heat pumps, resistivity, heat exchange, shallow geothermal, Ireland

Procedia PDF Downloads 137
24088 The Shona and isiXhosa Linguistic Matrimony Through Code-Switching in Cape Town

Authors: John Mambambo

Abstract:

Debates on the links between Bantu languages are often epitomized by animated theoretical critiques, including language zoning and groupings. This evaluative, qualitative inquiry moves beyond theoretical critiques to examine the sparsely studied ChiShona and isiXhosa code-switching nexus, a yawning gap in scholarship. Using interviews, questionnaires and observations, data germane to the study were collected from a purposively selected group of Shona speakers who had resided in Xhosa-speaking communities for not less than a year. Deploying Myers-Scotton’s Markedness theory, the paper examines the pragmatic linguistic affinity that is affirmed through Shona-Xhosa code-switching in Cape Town. The assorted social variables motivating bilingual speakers to code-switch in Cape Town are also explored. The study reveals that Shona speakers are motivated to code-switch by the linguistic affinity between ChiShona and isiXhosa; other socio-political justifications also give impetus to this phenomenon. The Matrix Language Frame Model affirms that ChiShona is the matrix language while isiXhosa is the embedded language during code-switching. This paper is a momentous advancement of the extant literature on code-switching and a unique contribution to the nexus between the ChiShona and isiXhosa languages, providing fresh insights into the discourse on African language comparison studies.

Keywords: code-switching, ChiShona, isiXhosa, bilingualism

Procedia PDF Downloads 75
24087 Detecting Memory-Related Gene Modules in sc/snRNA-seq Data by Deep-Learning

Authors: Yong Chen

Abstract:

Understanding the detailed molecular mechanisms of memory formation in engram cells is one of the most fundamental questions in neuroscience. Recent single-cell RNA-seq (scRNA-seq) and single-nucleus RNA-seq (snRNA-seq) techniques have allowed us to explore sparsely activated engram ensembles, enabling access to the molecular mechanisms that underlie experience-dependent memory formation and consolidation. However, the absence of specific and powerful computational methods to detect memory-related genes (modules) and their regulatory relationships in sc/snRNA-seq datasets has strictly limited the analysis of underlying mechanisms and memory coding principles in mammalian brains. Here, we present a deep-learning method named SCENTBOX to detect memory-related gene modules and causal regulatory relationships among them from sc/snRNA-seq datasets. SCENTBOX first constructs a co-differential expression gene network (CEGN) from case-versus-control sc/snRNA-seq datasets. It then detects highly correlated modules of differentially expressed genes (DEGs) in the CEGN. Deep network embedding and attention-based convolutional neural network strategies are employed to precisely detect regulatory relationships among the DEGs in a module. We applied SCENTBOX to scRNA-seq datasets of TRAP;Ai14 mouse neurons with fear memory and detected not only known memory-related genes but also modules and potential causal regulations. Our results revealed novel regulatory relationships within a module of interest, including Arc, Bdnf, Creb, Dusp1, Rgs4, and Btg2. Overall, our method provides a general computational tool for processing sc/snRNA-seq data from case-versus-control studies and a systematic investigation of fear-memory-related gene modules.
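
SCENTBOX itself is not specified beyond this abstract, but its first two stages (a co-expression network over DEGs followed by module detection) can be sketched generically; the expression values, the correlation threshold and the use of connected components below are placeholders, and the deep-learning regulatory inference stage is not reproduced.

```python
# Generic sketch of the first SCENTBOX-like stages described in the abstract:
# build a co-expression network over differentially expressed genes (DEGs) and
# extract highly correlated modules as connected components. The expression
# matrix and the 0.8 threshold are hypothetical placeholders.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
genes = ["Arc", "Bdnf", "Creb", "Dusp1", "Rgs4", "Btg2"]
expr = rng.normal(size=(len(genes), 40))        # genes x cells (placeholder)
expr[1] = expr[0] + 0.1 * rng.normal(size=40)   # force one correlated pair

corr = np.corrcoef(expr)                        # gene-gene correlation matrix
G = nx.Graph()
G.add_nodes_from(genes)
for i in range(len(genes)):
    for j in range(i + 1, len(genes)):
        if abs(corr[i, j]) > 0.8:               # edge = strong co-variation
            G.add_edge(genes[i], genes[j])

modules = [sorted(c) for c in nx.connected_components(G) if len(c) > 1]
print("candidate modules:", modules)
```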

Keywords: sc/snRNA-seq, memory formation, deep learning, gene module, causal inference

Procedia PDF Downloads 81
24086 Speaker Identification by Atomic Decomposition of Learned Features Using Computational Auditory Scene Analysis Principles in Noisy Environments

Authors: Thomas Bryan, Veton Kepuska, Ivica Kostanic

Abstract:

Speaker recognition is performed in high Additive White Gaussian Noise (AWGN) environments using principles of Computational Auditory Scene Analysis (CASA). CASA methods often classify sounds from images in the time-frequency (T-F) plane, using spectrograms or cochleagrams as the image. In this paper, atomic decomposition implemented by matching pursuit performs a transform from time-series speech signals to the T-F plane. The atomic decomposition creates a sparsely populated T-F vector in “weight space”, where each populated T-F position contains an amplitude weight. The weight-space vector, along with the atomic dictionary, represents a denoised, compressed version of the original signal. The arrangement of the atomic indices in the T-F vector is used for classification. Unsupervised feature learning implemented by a sparse autoencoder learns a single dictionary of basis features from a collection of envelope samples from all speakers. The approach is demonstrated using pairs of speakers from the TIMIT data set. Pairs of speakers are selected randomly from a single district. Each speaker has 10 sentences: two are used for training and eight for testing. Atomic index probabilities are created for each training sentence and for each test sentence. Classification is performed by finding the lowest Euclidean distance between the probabilities from the training sentences and the test sentences. Training is done at a 30 dB Signal-to-Noise Ratio (SNR); testing is performed at SNRs of 0 dB, 5 dB, 10 dB and 30 dB. The algorithm has a baseline classification accuracy of ~93% averaged over 10 pairs of speakers from the TIMIT data set. The baseline accuracy is attributable to the short sequences of training and test data as well as the overall simplicity of the classification algorithm. The accuracy is not affected by AWGN and remains ~93% at 0 dB SNR.
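
A minimal numpy sketch of the matching-pursuit step the abstract relies on, decomposing a signal over a dictionary into a sparse vector of atom weights, is shown below; the random dictionary stands in for the learned Gabor/sparse-autoencoder atoms and is purely illustrative.

```python
# Minimal matching-pursuit sketch: decompose a signal over a dictionary into a
# sparse vector of atom weights. The random dictionary stands in for the
# learned Gabor / sparse-autoencoder atoms and is illustrative only.
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_atoms = 64, 256
D = rng.normal(size=(n_samples, n_atoms))
D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms

x = D[:, 10] * 1.5 + D[:, 42] * 0.7             # signal built from two atoms
residual, weights = x.copy(), np.zeros(n_atoms)

for _ in range(5):                              # greedy matching pursuit
    correlations = D.T @ residual
    k = int(np.argmax(np.abs(correlations)))    # best-matching atom
    weights[k] += correlations[k]
    residual -= correlations[k] * D[:, k]

print("non-zero weights at atoms:", np.nonzero(weights)[0])
```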

Keywords: time-frequency plane, atomic decomposition, envelope sampling, Gabor atoms, matching pursuit, sparse dictionary learning, sparse autoencoder

Procedia PDF Downloads 254
24085 Assessment of the Economic Potential of Lead-Contaminated Brownfield for Growth of an Oil-Producing Crop, Helianthus annuus (Sunflower)

Authors: Shahenaz Sidi, S. K. Tank

Abstract:

When sparsely used industrial and commercial facilities are retired or abandoned, one of the biggest issues that arises is what to do with the remaining land. This land, referred to as a ‘Brownfield site’ or simply ‘Brownfield’, is often contaminated with waste and pollutants left behind by the defunct industrial facilities and factories that stand on it. Phytoremediation has proved to be a promising, greener and cleaner technology for remediating such land than chemical or excavation methods. Helianthus annuus is a hyperaccumulator of lead and can be used for remediation procedures in metal-contaminated soils. It is a fast-growing crop that favours soil stabilization, its tough leaves and stems are rarely eaten by animals, and its seeds (actively eaten by birds) have very low concentrations of potentially toxic elements, representing low risk to the food web. The study is conducted to determine the phytoextraction potential of the plant and the eventual seed harvesting and commercial oil production on the remediated soil.

Keywords: Brownfield, phytoextraction, helianthus, oil, commercial

Procedia PDF Downloads 293
24084 Spectral Mapping of Hydrothermal Alteration Minerals for Geothermal Exploration Using Advanced Spaceborne Thermal Emission and Reflection Radiometer Short Wave Infrared Data

Authors: Aliyu J. Abubakar, Mazlan Hashim, Amin B. Pour

Abstract:

Exploiting geothermal resources for power, home heating, spas, greenhouses, industry or tourism requires an initial identification of suitable areas. This can be done cost-effectively using remote sensing satellite imagery, which has the synoptic capability of covering large areas in real time and can identify possible areas of hydrothermal alteration and minerals related to geothermal systems. Earth features and minerals are known to have unique diagnostic spectral reflectance characteristics that can be used to discriminate them. The focus of this paper is to investigate the applicability of mapping hydrothermal alteration in relation to geothermal systems (thermal springs) at Yankari Park, northeastern Nigeria, using Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) satellite data for resource exploration. The ASTER Short Wave Infrared (SWIR) bands are used to highlight and discriminate alteration areas by employing sophisticated digital image processing techniques, including image transformations and spectral mapping methods. Field verifications were conducted at Yankari Park using a handheld Monterra Global Positioning System (GPS) receiver to identify locations of hydrothermal alteration, and rock samples were obtained in the vicinity and surrounding areas of the ‘Mawulgo’ and ‘Wikki’ thermal springs. X-Ray Diffraction (XRD) results of the rock samples obtained from the field validated the hydrothermal alteration through the presence of indicator minerals including dickite, kaolinite, hematite and quartz. The study indicates the applicability of mapping geothermal anomalies for resource exploration in an unmapped, sparsely vegetated savanna environment characterized by subtle surface manifestations such as thermal springs. The results could have implications for geothermal resource exploration, especially at the prefeasibility stage, by narrowing targets for comprehensive surveys, and in unexplored savanna regions where expensive airborne surveys are unaffordable.
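
One of the spectral mapping methods commonly applied to ASTER SWIR bands in such studies is the spectral angle mapper; a generic numpy sketch is given below, with made-up reflectance values rather than Yankari Park data, and is not a description of the paper's exact processing chain.

```python
# Generic spectral angle mapper (SAM) sketch, a spectral mapping method often
# used for alteration mineral mapping with the six ASTER SWIR bands. Reference
# spectrum, pixel values and the 0.1 rad threshold are illustrative only.
import numpy as np

def spectral_angle(pixel, reference):
    """Angle (radians) between a pixel spectrum and a reference spectrum."""
    cos = np.dot(pixel, reference) / (np.linalg.norm(pixel) * np.linalg.norm(reference))
    return np.arccos(np.clip(cos, -1.0, 1.0))

kaolinite_ref = np.array([0.42, 0.38, 0.30, 0.35, 0.40, 0.41])  # 6 SWIR bands (made up)
cube = np.random.default_rng(2).uniform(0.2, 0.5, size=(100, 100, 6))  # fake scene

angles = np.apply_along_axis(spectral_angle, 2, cube, kaolinite_ref)
alteration_mask = angles < 0.10          # small angle = spectrally similar pixel
print("candidate alteration pixels:", int(alteration_mask.sum()))
```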

Keywords: geothermal exploration, image enhancement, minerals, spectral mapping

Procedia PDF Downloads 331
24083 Lake Water Surface Variations and Their Influencing Factors in the Tibetan Plateau in the Recent 10 Years

Authors: Shanlong Lu, Jiming Jin, Xiaochun Wang

Abstract:

The Tibetan Plateau has the largest number of inland lakes at the highest elevation on the planet. These massive lakes are mostly in a natural state and are little affected by human activities; their shrinking or expansion can truly reflect regional climate and environmental changes, and they are sensitive indicators of global climate change. However, because the plateau is sparsely populated and natural conditions are poor, it is difficult to obtain lake change data effectively, which has limited understanding of the temporal and spatial processes of lake water changes and their influencing factors. Using MODIS (Moderate Resolution Imaging Spectroradiometer) MOD09Q1 surface reflectance images as basic data, this study produced an 8-day lake water surface data set of the Tibetan Plateau from 2000 to 2012 at 250 m spatial resolution, with a lake water surface extraction method that combines buffer analysis of lake water surface boundaries with lake-by-lake determination of segmentation thresholds. Based on this dataset, lake water surface variations and their influencing factors were analyzed, using four typical natural geographical zones (Eastern Qinghai and Qilian, Southern Qinghai, Qiangtang, and Southern Tibet) and the watersheds of the top 10 lakes (Qinghai, Siling Co, Namco, Zhari NamCo, Tangra Yumco, Ngoring, UlanUla, Yamdrok Tso, Har and Gyaring) as the analysis units. The accuracy analysis indicates that, compared with water surface data for 134 sample lakes extracted from 30 m Landsat TM (Thematic Mapper) images, the average overall accuracy of the lake water surface data set is 91.81%, with average commission and omission errors of 3.26% and 5.38%; the results also show a strong linear correlation (R2 = 0.9991) with the global MODIS water mask dataset, with an overall accuracy of 86.30%; and the lake area difference between the Second National Lake Survey and this study is only 4.74%. This study therefore provides a reliable dataset for lake change research on the plateau in the recent decade. The trend and influencing factor analysis indicates that the total water surface area of lakes in the plateau increased overall, but only lakes with areas larger than 10 km2 had statistically significant increases; furthermore, lakes with areas larger than 100 km2 experienced an abrupt change in 2005. In addition, the annual average precipitation of Southern Tibet and Southern Qinghai experienced significant increasing and decreasing trends, with corresponding abrupt changes in 2004 and 2006, respectively, and the annual average temperature of Southern Tibet and Qiangtang showed a significant increasing trend with an abrupt change in 2004. The major reason for the lake water surface variation in Eastern Qinghai and Qilian, Southern Qinghai and Southern Tibet is changes in precipitation, while that for Qiangtang is temperature variation.
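
The per-lake segmentation step described above can be sketched generically as thresholding a reflectance band inside a buffered lake boundary; the array values, buffer mask, threshold rule and pixel-to-area conversion below are placeholders rather than the study's actual MOD09Q1 processing chain.

```python
# Generic sketch of per-lake water extraction: threshold a MOD09Q1-like
# reflectance band inside a buffered lake-boundary mask, with the threshold
# chosen lake by lake. Values, mask and threshold rule are placeholders.
import numpy as np

rng = np.random.default_rng(3)
red = rng.uniform(0.0, 0.3, size=(200, 200))             # fake 250 m red-band reflectance
red[60:140, 60:140] = rng.uniform(0.0, 0.05, (80, 80))   # dark "water" patch

buffer_mask = np.zeros_like(red, dtype=bool)             # buffered lake boundary
buffer_mask[40:160, 40:160] = True

inside = red[buffer_mask]
threshold = np.percentile(inside, 45)                    # lake-specific threshold (toy rule)
water = (red < threshold) & buffer_mask

area_km2 = water.sum() * 0.25 * 0.25                     # 250 m pixels -> km^2
print(f"estimated lake surface area: {area_km2:.1f} km^2")
```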

Keywords: lake water surface variation, MODIS MOD09Q1, remote sensing, Tibetan Plateau

Procedia PDF Downloads 201
24082 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi

Abstract:

Big data is an emerging technology that collects data from various sensors for use in many fields. Data retrieval is a major issue, as there is a need to extract exactly the data required. In this paper, a large data set is processed using feature selection. Feature selection helps to choose the data that are actually needed to process and execute the task, and the key value helps to point to the exact data available in the storage space. Here, the available data are streamed, and R-Center is proposed to achieve this task.
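
The proposed R-Center component is not specified in the abstract; the sketch below only illustrates the generic feature-selection idea it builds on (keeping the columns most relevant to the task) using scikit-learn on synthetic data.

```python
# Generic feature-selection sketch: keep only the columns most relevant to the
# target before further processing. This illustrates the idea the abstract
# builds on; the proposed R-Center component is not implemented here and the
# data are synthetic.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=1000, n_features=50,
                           n_informative=5, random_state=0)

selector = SelectKBest(score_func=f_classif, k=5)
X_reduced = selector.fit_transform(X, y)

print("original shape:", X.shape, "-> reduced shape:", X_reduced.shape)
print("kept feature indices:", selector.get_support(indices=True))
```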

Keywords: big data, key value, feature selection, retrieval, performance

Procedia PDF Downloads 299
24081 Improved 3D Structure Prediction of Beta-Barrel Membrane Proteins by Using Evolutionary Coupling Constraints, Reduced State Space and an Empirical Potential Function

Authors: Wei Tian, Jie Liang, Hammad Naveed

Abstract:

Beta-barrel membrane proteins are found in the outer membrane of gram-negative bacteria, mitochondria, and chloroplasts. They carry out diverse biological functions, including pore formation, membrane anchoring, enzyme activity, and bacterial virulence. In addition, beta-barrel membrane proteins increasingly serve as scaffolds for bacterial surface display and nanopore-based DNA sequencing. Due to difficulties in experimental structure determination, they are sparsely represented in the protein structure databank and computational methods can help to understand their biophysical principles. We have developed a novel computational method to predict the 3D structure of beta-barrel membrane proteins using evolutionary coupling (EC) constraints and a reduced state space. Combined with an empirical potential function, we can successfully predict strand register at > 80% accuracy for a set of 49 non-homologous proteins with known structures. This is a significant improvement from previous results using EC alone (44%) and using empirical potential function alone (73%). Our method is general and can be applied to genome-wide structural prediction.

Keywords: beta-barrel membrane proteins, structure prediction, evolutionary constraints, reduced state space

Procedia PDF Downloads 575
24080 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA), which may allow academic institutions to better understand learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitations before jumping on the Big Data bandwagon.

Keywords: big data, learning analytics, analytics, big data in education, Hadoop

Procedia PDF Downloads 375
24079 The Concepts of Urban Sustainable Development and Smart Cities: In the Understanding of Academia and the European Union

Authors: Wolfgang Haupt

Abstract:

When considering the future city, one repeatedly comes across two sometimes sparsely differentiated terms: sustainable and smart. ‘A European Strategy for Smart, Sustainable, and Inclusive Growth’ is how the European Commission named its current growth strategy; thus, Europe should become smarter and more sustainable. Both the smart and the sustainable city represent a positive vision of urban development as well as a subject area for contemporary and future urban policies. However, more clarity is required on what actually lies behind these terms. This paper analyses how the terms are defined academically and how this academic understanding is represented in the funding mechanisms of European urban policies. The theoretical framework is mainly based on sources such as journal articles and policy reports. It becomes clear that, despite some similarities such as the broad field of work or the tendency to operationalize the terms by defining sub-categories, the two ideas are distinctly different in their development history, the main driving forces behind them, and their theoretical scope. Moreover, the significantly more comprehensively defined term sustainability has found its way into the centre of European regional funding policies. By contrast, the smart city vision still lacks terminological and content-related clarity, and as a consequence the corresponding European funding landscape is more small-scale and less customized.

Keywords: European spatial policy, European union, smart city, urban sustainable development

Procedia PDF Downloads 337
24078 Analysis of Big Data

Authors: Sandeep Sharma, Sarabjit Singh

Abstract:

With growing user demand and the rapid growth of large, freely generated data, storage solutions are becoming ever more challenging in terms of protecting, storing and retrieving data. The day is not far off when storage companies and organizations will start saying 'no' to storing our valuable data, or will start charging a huge amount for its storage and protection. On the other hand, environmental conditions make it challenging to establish and maintain new data warehouses and data centers in the face of global warming threats. The challenge of small data is over; the big challenge now is how to manage the exponential growth of data. In this paper, we analyze the growth trend of big data and its future implications. We also focus on the impact of unstructured data on various concerns and suggest some possible remedies to streamline big data.

Keywords: big data, unstructured data, volume, variety, velocity

Procedia PDF Downloads 507
24077 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, WangQun Lin

Abstract:

This paper introduces the concept and principles of data cleaning, analyzes the types and causes of dirty data, and proposes the key steps of a typical cleaning process. It puts forward a data cleaning framework with good scalability and versatility. For data with attribute dependency relations, it designs several violation data discovery algorithms based on formal formulas, which can obtain the data inconsistent with all target columns under a conditional attribute dependency, whether the data are structured (SQL) or unstructured (NoSQL), and it gives six data cleaning methods based on these algorithms.
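
As a concrete illustration of violation data discovery over an attribute dependency (not the paper's own algorithms), the pandas sketch below flags rows that break a hypothetical functional dependency zip -> city in a toy table.

```python
# Illustrative violation-discovery sketch for an attribute dependency
# zip -> city: any zip code mapped to more than one city marks inconsistent
# rows that a cleaning step must repair. Toy data; not the paper's algorithms.
import pandas as pd

df = pd.DataFrame({
    "zip":  ["10001", "10001", "94105", "94105", "60601"],
    "city": ["New York", "New York", "San Francisco", "Oakland", "Chicago"],
})

# The dependency zip -> city holds iff each zip has exactly one distinct city.
cities_per_zip = df.groupby("zip")["city"].nunique()
violating_zips = cities_per_zip[cities_per_zip > 1].index

violations = df[df["zip"].isin(violating_zips)]
print("rows violating zip -> city:")
print(violations)
```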

Keywords: data cleaning, dependency rules, violation data discovery, data repair

Procedia PDF Downloads 525
24076 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Much research is concerned with mining massive amounts of data and big data streams. Mining big data faces many challenges, including scalability, speed, heterogeneity, accuracy, provenance and privacy. In the telecommunication industry, mining big data is like mining for gold: it represents a big opportunity for maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication, and the benefits and opportunities gained from them.

Keywords: mining big data, big data, machine learning, telecommunication

Procedia PDF Downloads 359
24075 Authenticity during Conflict Reporting: The China-India Border Clash in the Indian Press

Authors: Arjun Chatterjee

Abstract:

The India-China border clash in the Galwan valley in June 2020, the first deadly skirmish between the two Asian giants in the Himalayan border area in over four decades, highlighted the need to examine the notion of ‘authenticity’ in journalistic practices. Information emanating from such remotely located, sparsely populated, and poorly demarcated international land borders has limited sources, restricted to official sources, which have their own narrative. Geopolitical goals and ambitions embolden narratives of nationalism in the media, and these often challenge the notion and understanding of authenticity in journalism. The Indian press, contrary to the state-owned Chinese press, is diverse and also confrontational, and narratives of nationalism are differentially interpreted, embedded, and realised within it. This paper examines how authenticity has become a variable, rather than a constant, in conflict reporting of the Sino-Indian border clash, and how authenticity is interpreted similarly or differently in conflict journalism. The paper reports a qualitative textual analysis of two leading English-language newspapers, The Times of India and The Hindu, and two mainstream regional-language newspapers, Amar Ujala (Hindi) and Ananda Bazar Patrika (Bengali), to evaluate the ways in which representations of information function in conflict reporting and recontextualize (and thus change or modify the meaning of) that which they represent, and with what political and cultural implications.

Keywords: India-China, framing, conflict, media narratives, border dispute

Procedia PDF Downloads 55
24074 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications: A Software Testing Approach

Authors: Theertha Chandroth

Abstract:

This paper presents a comparative study on how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its unique syntax and structure. The objective is to explore various techniques and methodologies for validating, comparing and integrating JSON data with XML data and vice versa. By understanding the process of checking JSON data against XML data, testers, developers and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.
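
A minimal standard-library sketch of one such check, converting a flat XML document into a dictionary and asserting that it matches the corresponding JSON payload, is shown below; the payload and element names are invented for illustration.

```python
# Minimal sketch of checking JSON data against XML data using only the Python
# standard library: parse both, normalise the flat XML element tree into a
# dict, and compare. Payload and element names are invented; nested structures
# would need a fuller mapping.
import json
import xml.etree.ElementTree as ET

json_payload = '{"id": "42", "name": "Ada", "role": "tester"}'
xml_payload = "<user><id>42</id><name>Ada</name><role>tester</role></user>"

json_data = json.loads(json_payload)
xml_root = ET.fromstring(xml_payload)
xml_data = {child.tag: child.text for child in xml_root}

assert json_data == xml_data, f"mismatch: {json_data} != {xml_data}"
print("JSON and XML payloads carry the same data")
```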

Keywords: XML, JSON, data comparison, integration testing, Python, SQL

Procedia PDF Downloads 79
24073 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflect different aspects of people, events and activities. Data generated by various systems differ in form and source because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need to be fused together. Data from different sources, collected in different ways, raise several issues that need to be resolved, including data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, and resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining and scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of multi-source data fusion is a uniform central database, which includes people data, location data, object data, institution data, business data and space data. Metadata must be referred to and read when an application needs to access, manipulate and display the data; uniform metadata management ensures the effectiveness and consistency of data in the processes of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.
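
As a toy illustration of the uniform metadata idea above (not a description of any particular city system), a registry can record source, join key and update time for each dataset and be consulted before two sources are fused; all names and values below are hypothetical.

```python
# Toy metadata registry consulted before fusing two sources, illustrating the
# uniform metadata idea: every dataset records its source, join key and last
# update so exchange and comparison stay consistent. All names are
# hypothetical; this is not a description of any particular city system.
from datetime import date

registry = {
    "housing":     {"source": "housing_bureau", "key": "parcel_id", "updated": date(2024, 3, 1)},
    "demographic": {"source": "census_office",  "key": "parcel_id", "updated": date(2024, 1, 15)},
}

def can_fuse(a: str, b: str) -> bool:
    """Two datasets are fusable here only if they share the same join key."""
    return registry[a]["key"] == registry[b]["key"]

print("housing + demographic fusable:", can_fuse("housing", "demographic"))
print("stalest input:", min(registry, key=lambda name: registry[name]["updated"]))
```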

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 342
24072 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad

Abstract:

Nowadays, given the human involvement in ever-increasing data production, methods such as data mining for extracting knowledge are unavoidable. One issue in data mining is the inherent distribution of the data: the databases creating or receiving such data usually belong to corporate or non-corporate persons who do not give their information freely to others, yet there is no guarantee that someone can mine particular data without intruding on the owner’s privacy. Sending data and then gathering them with vertically or horizontally partitioned software depends on the type of privacy preservation used, and is done to improve data privacy. In this study, we attempt a comprehensive comparison of privacy-preserving data methods; general methods such as data randomization and encoding are also examined, along with the strong and weak points of each.
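
One member of the data randomization family compared above can be illustrated with simple additive-noise perturbation: each party perturbs its values before sharing, so aggregate statistics remain usable while individual records are masked. The numbers below are synthetic and the noise scale is arbitrary.

```python
# Simple additive-noise perturbation, illustrating the data randomization
# family of privacy-preserving methods: each party adds noise to its records
# before sharing, so individual values are masked while aggregate statistics
# stay close to the truth. All values are synthetic.
import numpy as np

rng = np.random.default_rng(4)
true_salaries = rng.normal(loc=50_000, scale=8_000, size=10_000)

noise = rng.normal(loc=0.0, scale=5_000, size=true_salaries.shape)
shared = true_salaries + noise                  # what each party releases

print(f"true mean   : {true_salaries.mean():,.0f}")
print(f"shared mean : {shared.mean():,.0f}")    # aggregate survives
print(f"record 0    : true {true_salaries[0]:,.0f} vs shared {shared[0]:,.0f}")
```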

Keywords: data mining, distributed data mining, privacy protection, privacy preserving

Procedia PDF Downloads 482
24071 The Belt and Road Initiative in a Spiderweb of Conflicting Great Power Interests: A Geopolitical Analysis

Authors: Csaba Barnabas Horvath

Abstract:

The Belt and Road Initiative of China is one that can change the face of Eurasia as we know it. Instead of four major, densely populated subcontinents defined by Mackinder (East Asia, Europe, the Indian Subcontinent, and the Middle East) isolated from each other by vast, sparsely populated and underdeveloped regions, Eurasia can at last start to function as a geographic whole, with a sophisticated infrastructure linking its different parts to each other. This initiative, however, happens not in a geopolitical vacuum but in a space of conflicting great power interests. In Central Asia, the influence of China and that of Russia are in a setting of competition where, despite a great degree of cooperation between the two powers, issues causing mutual mistrust emerge repeatedly. In Afghanistan, besides western military presence, even India’s efforts can be added to the picture. In Southeast Asia, a key region for the maritime Silk Road, India’s Act East policy meets China’s Belt and Road, not always in consensus, not to mention US and Japanese interests in the region. The presentation aims to provide an overview of how conflicting great power interests are likely to influence the outcome of the Belt and Road Initiative. The findings show that the overall success of the Belt and Road Initiative may not be as smooth as hoped by China, but at the same time, in a limited number of strategically important countries (such as Pakistan, Laos, and Cambodia), this setting actually favours China, providing at least a selected number of reliable corridors where the initiative is likely to be successful.

Keywords: belt and road initiative, geostrategic corridors, geopolitics, great power rivalry

Procedia PDF Downloads 190
24070 The Right to Data Portability and Its Influence on the Development of Digital Services

Authors: Roman Bieda

Abstract:

The General Data Protection Regulation (GDPR) will come into force on 25 May 2018, creating a new legal framework for the protection of personal data in the European Union. Article 20 of the GDPR introduces a right to data portability. This right allows data subjects to receive the personal data which they have provided to a data controller in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating the transfer of personal data between IT environments (e.g., applications), will also facilitate changing the provider of services (e.g., changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and of the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.

Keywords: data portability, digital market, GDPR, personal data

Procedia PDF Downloads 433
24069 Recent Advances in Data Warehouse

Authors: Fahad Hanash Alzahrani

Abstract:

This paper describes some recent advances in the quickly developing area of data storage and processing based on Data Warehouse and Data Mining techniques, covering the software, hardware, data mining algorithms and visualisation techniques that share common features across the specific problems and tasks of their implementation.

Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing

Procedia PDF Downloads 361
24068 How to Use Big Data in Logistics Issues

Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy

Abstract:

Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does data. Utilizing massive data sets enables companies to gain competitive advantages over their adversaries. Among the many areas of Big Data usage, logistics plays a significant role in both the commercial sector and the military. This paper lays out what big data is and how it is used in both military and commercial logistics.

Keywords: big data, logistics, operational efficiency, risk management

Procedia PDF Downloads 606
24067 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in many fields. These data are advantageous in that they provide real-time information to the users themselves; however, when the data are accumulated and analyzed, much more varied information can be extracted. In addition, the development and dissemination of boards such as Arduino and Raspberry Pi have made it possible to easily test various sensors, and sensor data can be collected directly by using database application tools such as MySQL. These directly collected data can be used for various research purposes and are useful as data for data mining. However, there are many difficulties in using such boards to collect data, especially when the user is not a computer programmer or is using them for the first time. Even when data are collected, a lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.
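
In the spirit of the library described above (the library itself is not shown), the sketch below stores simulated sensor readings in SQLite and clusters them with DBSCAN, one of the algorithms named in the keywords; the table name and readings are invented.

```python
# Sketch in the spirit of the library described above (not the library itself):
# store simulated sensor readings in SQLite and cluster them with DBSCAN, one
# of the algorithms named in the keywords. Table name and readings are invented.
import sqlite3
import numpy as np
from sklearn.cluster import DBSCAN

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (temperature REAL, humidity REAL)")

rng = np.random.default_rng(5)
samples = np.vstack([rng.normal([21.0, 40.0], 0.5, (50, 2)),   # normal room
                     rng.normal([35.0, 20.0], 0.5, (50, 2))])  # hot, dry spot
conn.executemany("INSERT INTO readings VALUES (?, ?)", samples.tolist())

data = np.array(conn.execute("SELECT temperature, humidity FROM readings").fetchall())
labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(data)
print("clusters found:", sorted(set(labels)))
```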

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 336
24066 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity and veracity and come from a variety of sources. Public administrations are using (big) data, implementing base registries, and enforcing data sharing across the entire government in order to deliver (big) data-related integrated services, provide insights to users, and support good governance. Government (big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services such as data storage and hosting to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of the government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of the actors, their roles, and their relationships in the government (big) data ecosystem, and we discuss our research findings. We did not find many published research articles about the government (big) data ecosystem, including its definition and a classification of actors and their roles. Therefore, we borrowed ideas for the government (big) data ecosystem from numerous areas in the literature, including scientific research data, humanitarian data, open government data, and industry data.

Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review

Procedia PDF Downloads 124
24065 Government Big Data Ecosystem: A Systematic Literature Review

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Data that are high in volume, velocity and veracity and come from a variety of sources are generated in all sectors, including the government sector. Globally, public administrations are pursuing (big) data as a new technology and trying to adopt a data-centric architecture for hosting and sharing data. Properly executed, big data and data analytics in the government (big) data ecosystem can lead to data-driven government and have a direct impact on the way policymakers work and citizens interact with governments. In this research paper, we conduct a systematic literature review. The main aims of this paper are to highlight the essential aspects of the government (big) data ecosystem and to explore the most critical socio-technical factors that contribute to its successful implementation. The essential aspects of the government (big) data ecosystem include its definition, data types, data lifecycle models, and actors and their roles. We also discuss the potential impact of (big) data in public administration and gaps in the government data ecosystem literature. As this is a new topic, we did not find specific articles on the government (big) data ecosystem and therefore focused our research on various relevant areas such as humanitarian data, open government data, scientific research data, and industry data.

Keywords: applications of big data, big data, big data types, big data ecosystem, critical success factors, data-driven government, egovernment, gaps in data ecosystems, government (big) data, literature review, public administration, systematic review

Procedia PDF Downloads 178