19331 Building and Tree Detection Using Multiscale Matched Filtering

Authors: Abdullah H. Özcan, Dilara Hisar, Yetkin Sayar, Cem Ünsalan


In this study, an automated building and tree detection method is proposed using DSM data and a true orthophoto image. Multiscale matched filtering is applied to the DSM data. To this end, a watershed transform is first applied. Then, Otsu’s thresholding method is used as an adaptive threshold to segment each watershed region. Detected objects are masked with the normalized difference vegetation index (NDVI) to separate buildings from trees. The proposed method is able to detect buildings and trees without requiring any elevation threshold. We tested our method on the ISPRS semantic labeling dataset and obtained promising results.
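The per-region thresholding and NDVI masking steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the multiscale matched filter and the watershed transform are not reproduced, so the watershed label map is assumed to be given, and the NDVI cut-off of 0.3 is a hypothetical value.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Otsu threshold of a 1-D sample: the split that maximizes between-class variance."""
    values = np.asarray(values, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                       # count at or below each candidate split
    w1 = w0[-1] - w0                           # count above each candidate split
    cum = np.cumsum(hist * centers)
    mu0 = cum / np.maximum(w0, 1)
    mu1 = (cum[-1] - cum) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2       # between-class variance, up to a constant
    return edges[np.argmax(between) + 1]       # values >= this edge form the upper class

def segment_regions(dsm, labels):
    """Threshold the DSM separately inside each region of a precomputed (assumed) watershed label map."""
    objects = np.zeros(dsm.shape, dtype=bool)
    for region in np.unique(labels):
        inside = labels == region
        heights = dsm[inside]
        if np.ptp(heights) == 0:               # flat region: nothing to separate
            continue
        objects[inside] = heights >= otsu_threshold(heights)
    return objects

def split_buildings_trees(objects, nir, red, ndvi_threshold=0.3):
    """Split detected objects with NDVI = (NIR - R) / (NIR + R); the 0.3 cut-off is hypothetical."""
    nir, red = nir.astype(float), red.astype(float)
    ndvi = (nir - red) / np.maximum(nir + red, 1e-9)
    vegetation = ndvi > ndvi_threshold
    return objects & ~vegetation, objects & vegetation   # (buildings, trees)
```

Because Otsu's threshold is recomputed inside every watershed region, each region gets its own height cut-off, which is how an adaptive threshold can replace a single global elevation threshold.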

Keywords: building detection, local maximum filtering, matched filtering, multiscale

19330 Self-Supervised Learning for Hate-Speech Identification

Authors: Shrabani Ghosh


Automatic offensive language detection in social media has become an important task in today's NLP. Manual offensive language detection is tedious and laborious work, so automatic methods based on machine learning are the only practical alternative. Previous works have performed sentiment analysis on social media in different ways, including supervised, semi-supervised, and unsupervised approaches. Semi-supervised domain adaptation, where the source and target domains differ, has also been explored in NLP. In domain adaptation, the source domain usually has a large amount of labeled data, while only a limited amount of labeled data is available in the target domain. Pretrained transformers such as BERT and RoBERTa can be further pretrained on the masked language modeling (MLM) task in an unsupervised manner and then fine-tuned for text classification. In previous work, hate speech detection has been explored on Gab, a free speech platform that has been described as hosting extremists to varying degrees. In that domain adaptation process, Twitter data is used as the source domain and Gab data as the target domain. The performance of domain adaptation also depends on cross-domain similarity. Different distance measures, such as L2 distance, cosine distance, Maximum Mean Discrepancy (MMD), Fisher Linear Discriminant (FLD), and CORAL, have been used to estimate domain similarity. As expected, in-domain distances are small and between-domain distances are large. The findings of the previous work show that an MLM-pretrained model fine-tuned with a mixture of posts from the source and target domains gives higher accuracy. However, the hate classifier's in-domain accuracy on Twitter data is 71.78%, while its out-of-domain accuracy on Gab data drops to 56.53%. Recently, self-supervised learning has received a lot of attention because it is more applicable when labeled data are scarce.
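The distance measures listed above can be illustrated on sentence embeddings from the two domains. The Python sketch below assumes each domain is represented by a NumPy array of embeddings (one row per post); the RBF bandwidth `gamma` is a hypothetical choice, FLD is omitted, and the CORAL distance uses the squared Frobenius norm of the covariance difference scaled by 1/(4d²), as in Deep CORAL.

```python
import numpy as np

def mean_l2(xs, xt):
    """L2 distance between the mean embeddings of two domains."""
    return float(np.linalg.norm(xs.mean(axis=0) - xt.mean(axis=0)))

def mean_cosine(xs, xt):
    """Cosine distance between the mean embeddings of two domains."""
    a, b = xs.mean(axis=0), xt.mean(axis=0)
    return float(1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rbf_mmd2(xs, xt, gamma=1.0):
    """Biased estimate of squared MMD with kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    def k(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq)
    return float(k(xs, xs).mean() + k(xt, xt).mean() - 2 * k(xs, xt).mean())

def coral_distance(xs, xt):
    """Squared Frobenius distance between domain covariances, scaled by 1 / (4 d^2)."""
    d = xs.shape[1]
    cs = np.cov(xs, rowvar=False)
    ct = np.cov(xt, rowvar=False)
    return float(np.sum((cs - ct) ** 2) / (4 * d * d))
```

With real embeddings, comparing a measure on two halves of the same domain against the same measure across domains reproduces the expected pattern of small in-domain and large between-domain distances.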
A few works have already applied self-supervised learning to NLP tasks such as sentiment classification. The self-supervised language representation model ALBERT focuses on modeling inter-sentence coherence and helps downstream tasks with multi-sentence inputs. A self-supervised attention learning approach has shown better performance because it exploits extracted context words during training. In this work, a self-supervised attention mechanism is proposed to detect hate speech on Gab. The framework first classifies the Gab dataset in an attention-based self-supervised manner. In the next step, a semi-supervised classifier is trained on the combination of labeled data from the first step and unlabeled data. The performance of the proposed framework will be compared with the results described earlier and with optimized outcomes obtained from different optimization techniques.

Keywords: attention learning, language model, offensive language detection, self-supervised learning

19329 Constructing White-Box Implementations Based on Threshold Shares and Composite Fields

Authors: Tingting Lin, Manfred von Willich, Dafu Lou, Phil Eisen


A white-box implementation of a cryptographic algorithm is a software implementation intended to resist extraction of the secret key by an adversary. To date, most white-box techniques have been used to protect block cipher implementations. However, a large proportion of white-box implementations have been shown to be vulnerable to affine equivalence attacks and other algebraic attacks, as well as to differential computation analysis (DCA). In this paper, we identify a class of block ciphers for which we propose a method of constructing white-box implementations. Our method is based on threshold implementations and operations in composite fields. The resulting implementations consist of lookup tables and a few exclusive OR operations. All intermediate values (inputs and outputs of the lookup tables) are masked. The threshold implementation makes the distribution of the masked values uniform and independent of the original inputs, and the operations in composite fields reduce the size of the lookup tables. The white-box implementations can provide resistance against algebraic attacks and DCA-like attacks.
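The basic property of Boolean (XOR) sharing that threshold implementations build on can be shown for the linear part of a cipher. This Python sketch is a toy under explicit assumptions, not the paper's construction: `linear_map` is a hypothetical GF(2)-linear byte function, three shares are used, and the non-linear S-box, which needs the composite-field decomposition and non-complete share functions, is not covered.

```python
import secrets

def share(x, n=3):
    """Split byte x into n Boolean shares whose XOR equals x."""
    shares = [secrets.randbelow(256) for _ in range(n - 1)]
    last = x
    for s in shares:
        last ^= s
    return shares + [last]

def unshare(shares):
    """Recombine shares by XOR."""
    out = 0
    for s in shares:
        out ^= s
    return out

def rotl8(x, r):
    """Rotate a byte left by r bits (1 <= r <= 7)."""
    return ((x << r) | (x >> (8 - r))) & 0xFF

def linear_map(x):
    """Hypothetical GF(2)-linear byte map: L(x) = x ^ rotl(x, 1) ^ rotl(x, 3)."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 3)

def shared_linear_map(shares):
    # Linear maps distribute over XOR: L(x1 ^ x2 ^ x3) = L(x1) ^ L(x2) ^ L(x3),
    # so each share is processed on its own and the unmasked value never appears.
    return [linear_map(s) for s in shares]
```

Linear layers are the easy case; the difficulty the paper addresses is keeping the same uniformity for the non-linear parts while keeping lookup tables small.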

Keywords: white-box, block cipher, composite field, threshold implementation

19328 Fexofenadine Hydrochloride Orodispersible Tablets: Formulation and in vitro/in vivo Evaluation in Healthy Human Volunteers

Authors: Soad Ali Yehia, Mohamed Shafik El-Ridi, Mina Ibrahim Tadros, Nolwa Gamal El-Sherif


Fexofenadine hydrochloride (FXD) is a slightly soluble, bitter-tasting drug with an oral bioavailability of 35%. Its maximum plasma concentration is reached 2.6 hours post-dose (Tmax). The current work aimed to develop taste-masked FXD orodispersible tablets (ODTs) to increase the extent of drug absorption and reduce Tmax. Taste masking was achieved via solid dispersion (SD) with chitosan (CS) or sodium alginate (ALG). FT-IR, DSC and XRD were performed to identify physicochemical interactions and assess FXD crystallinity. Taste-masked FXD-ODTs were developed by adding superdisintegrants (croscarmellose sodium or sodium starch glycolate; 5% and 10%, w/w) or sublimable agents (camphor, menthol or thymol; 10% and 20%, w/w) to the FXD-SDs. ODTs were evaluated for weight variation, drug content, friability, wetting time, disintegration time and drug release. The camphor-based (20%, w/w) FXD-ODT (F12) was optimized into formula F23 by incorporating a more hydrophilic lubricant, sodium stearyl fumarate (Pruv®). The topography of F23 was examined by scanning electron microscopy (SEM). FXD pharmacokinetics relative to Allegra® tablets were evaluated in vivo in healthy human volunteers. Based on a gustatory sensation test in healthy volunteers, FXD:CS (1:1) and FXD:ALG (1:0.5) SDs were selected. Taste-masked FXD-ODTs had appropriate physicochemical properties and showed short wetting and disintegration times. The drug release profiles of F23 and the phenylalanine-containing Allegra® ODT were similar (f2 = 96), both showing complete release within two minutes. SEM micrographs revealed pores left by camphor sublimation. Compared with Allegra® tablets, pharmacokinetic studies in healthy volunteers showed that F23 increased the extent of FXD absorption (14%) and reduced Tmax to 1.83 h.
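The similarity factor f2 quoted above compares two dissolution profiles through the mean squared difference of the percentage released at matched time points: f2 = 50·log10(100 / sqrt(1 + (1/n)·Σ(R_t - T_t)²)). A minimal Python sketch follows; the profiles in the checks are hypothetical, not the study's data.

```python
import math

def similarity_factor_f2(reference, test):
    """Similarity factor f2 for two dissolution profiles (% released at the same time points)."""
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must be non-empty and sampled at the same time points")
    n = len(reference)
    msd = sum((r - t) ** 2 for r, t in zip(reference, test)) / n   # mean squared difference
    return 50 * math.log10(100 / math.sqrt(1 + msd))
```

Identical profiles give f2 = 100, and an average difference of about 10 percentage points brings f2 down to about 50, the value commonly used as the lower limit for declaring two profiles similar.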

Keywords: fexofenadine hydrochloride, taste masking, chitosan, orodispersible

19327 A Single-Channel BSS-Based Method for Structural Health Monitoring of Civil Infrastructure under Environmental Variations

Authors: Yanjie Zhu, André Jesus, Irwanda Laory


Structural Health Monitoring (SHM), which involves data acquisition, data interpretation and decision-making systems, aims to continuously monitor the structural performance of civil infrastructure under various in-service conditions. The main purpose of SHM is to identify damage through the data interpretation system. Research on SHM has expanded over the last decades, and a large volume of data is recorded every day owing to dramatic developments in sensor technology and progress in signal processing techniques. However, efficient and reliable data interpretation for damage detection under environmental variations is still a big challenge. Structural damage may be masked because variations in the measured data can result from environmental variations. This research reports a novel method based on single-channel Blind Signal Separation (BSS), which extracts environmental effects directly from measured data without any prior knowledge of the structural loading and environmental conditions. Despite its successful application in audio processing and biomedical research, BSS has never been used to detect damage under varying environmental conditions. The proposed method optimizes and combines Ensemble Empirical Mode Decomposition (EEMD), Principal Component Analysis (PCA) and Independent Component Analysis (ICA) to separate the structural responses due to different loading conditions from a single-channel input signal. ICA is applied to the dimension-reduced output of EEMD. A numerical simulation of a truss bridge, inspired by the New Joban Line Arakawa Railway Bridge, is used to validate the method. All results demonstrate that the single-channel BSS-based method can recover temperature effects from the mixed structural response recorded by a single sensor with convincing accuracy. This will serve as the foundation for further research on direct damage detection under varying environmental conditions.
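The PCA and ICA stages of the pipeline can be sketched in NumPy as follows. This is an illustration under assumptions, not the authors' optimized method: EEMD is not implemented, so the rows of the input matrix stand in for the intrinsic mode functions that EEMD would extract from the single sensor channel, and the ICA step is a standard symmetric FastICA with a tanh nonlinearity.

```python
import numpy as np

def pca_whiten(X, n_components):
    """Center the rows of X (channels x samples), keep the top principal components, whiten them."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)               # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_components]
    d, E = eigval[order], eigvec[:, order]
    K = (E / np.sqrt(d)).T                              # whitening matrix
    return K @ Xc

def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W so that the rows of W are orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fastica(Z, max_iter=500, tol=1e-8, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on whitened data Z; returns estimated sources."""
    n, m = Z.shape
    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(max_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once
        W_new = _sym_decorrelate(G @ Z.T / m - np.diag((1 - G ** 2).mean(axis=1)) @ W)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol
        W = W_new
        if converged:
            break
    return W @ Z
```

Recovered components come back in arbitrary order, sign and scale, which is inherent to ICA, so identifying which component corresponds to the temperature effect requires an additional step.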

Keywords: damage detection, ensemble empirical mode decomposition (EEMD), environmental variations, independent component analysis (ICA), principal component analysis (PCA), structural health monitoring (SHM)

19326 An Event-Related Potential Study of Individual Differences in Word Recognition: The Evidence from Morphological Knowledge of Sino-Korean Prefixes

Authors: Jinwon Kang, Seonghak Jo, Joohee Ahn, Junghye Choi, Sun-Young Lee


Morphological priming has proved important by showing that visual words are segmented into morphemes within a remarkably short time during recognition. Focusing on Sino-Korean prefixes, this study conducted a visual masked priming experiment with a 57 ms stimulus-onset asynchrony (SOA) to examine how individual differences in the amount of morphological knowledge affect morphological priming. The relationship between prime and target words was classified into morphological (e.g., 미개척 migaecheog [unexplored] – 미해결 mihaegyel [unresolved]), semantic (e.g., 친환경 chinhwangyeong [eco-friendly] – 무공해 mugonghae [no-pollution]), and orthographic (e.g., 미용실 miyongsil [beauty shop] – 미확보 mihwagbo [uncertainty]) conditions. We then measured priming by comparing each condition with a control group of unrelated prime-target pairs. In the behavioral data, the group with high morphological knowledge showed facilitatory priming only in the morphological condition. In contrast, the group with low morphological knowledge showed priming only in the orthographic condition. In the event-related potential (ERP) data, the group with high morphological knowledge showed the N250 only in the morphological condition. The findings of this study imply that individual differences in morphological knowledge may significantly influence segmental processing in Korean word recognition.

Keywords: ERP, individual differences, morphological priming, sino-Korean prefixes

19325 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi


Big data is an emerging technology that collects data from various sensors for use in many fields. Data retrieval is a major issue, since exactly the data that is needed must be extracted. In this paper, large datasets are processed using feature selection, which helps choose the data actually needed to process and execute the task. The key value points to the exact data available in the storage space. Here, the available data is streamed, and R-Center is proposed to achieve this task.
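The abstract does not describe how R-Center works, so the Python sketch below shows only a generic, hypothetical filter-style feature selection over a data stream: per-feature variances are updated incrementally with Welford's algorithm, and features whose variance exceeds a chosen threshold are kept. The class name and threshold are illustrative assumptions.

```python
class StreamingVarianceSelector:
    """Hypothetical streaming feature filter: keep features whose running variance is high."""

    def __init__(self, n_features):
        self.n = 0
        self.mean = [0.0] * n_features
        self.m2 = [0.0] * n_features          # running sum of squared deviations

    def update(self, row):
        """Welford's update with one incoming record."""
        self.n += 1
        for j, x in enumerate(row):
            delta = x - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (x - self.mean[j])

    def variances(self):
        """Sample variance of each feature seen so far."""
        if self.n < 2:
            return [0.0] * len(self.mean)
        return [m / (self.n - 1) for m in self.m2]

    def selected(self, threshold):
        """Indices of features whose variance exceeds the threshold."""
        return [j for j, v in enumerate(self.variances()) if v > threshold]
```

Because only a mean and a sum of squared deviations are stored per feature, the selection can be refreshed at any point of the stream without keeping past records.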

Keywords: big data, key value, feature selection, retrieval, performance

19324 Traffic Light Detection Using Image Segmentation

Authors: Vaishnavi Shivde, Shrishti Sinha, Trapti Mishra


Traffic light detection from a moving vehicle is an important technology both for driver safety assistance functions and for autonomous driving in cities. This paper proposes a deep-learning-based traffic light recognition method that combines a pixel-wise image segmentation technique with a fully convolutional network, namely the UNET architecture. A method for detecting the position and recognizing the state of traffic lights in video sequences is presented and evaluated on the Traffic Light Dataset, which contains masked traffic light image data. The first stage is detection, accomplished through image processing (image segmentation) techniques such as image cropping, color transformation and segmentation of possible traffic lights. The second stage is recognition, i.e., identifying the color, or state, of the traffic light, which is achieved using a convolutional neural network (UNET architecture).
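The color-transformation step of the first stage can be illustrated with a classical hue-voting rule on a cropped candidate region. This Python sketch is an assumption, not the paper's UNET recognizer: the crop format, hue ranges and saturation/value cut-offs are hypothetical values, and the standard-library `colorsys.rgb_to_hsv` performs the color conversion.

```python
import colorsys

def classify_light_state(crop, min_saturation=0.5, min_value=0.5):
    """Vote red/yellow/green from the hue of bright, saturated pixels in an RGB crop.

    crop: list of rows, each a list of (R, G, B) tuples with 0-255 channels (assumed format).
    Returns "red", "yellow", "green" or "unknown".
    """
    votes = {"red": 0, "yellow": 0, "green": 0}
    for row in crop:
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            if s < min_saturation or v < min_value:
                continue                      # dark or washed-out pixel: ignore
            hue = h * 360                     # hypothetical hue ranges in degrees below
            if hue < 20 or hue >= 340:
                votes["red"] += 1
            elif 40 <= hue < 70:
                votes["yellow"] += 1
            elif 90 <= hue < 180:
                votes["green"] += 1
    best = max(votes, key=votes.get)
    return best if votes[best] > 0 else "unknown"
```

Hand-set hue ranges like these are sensitive to lighting and camera response, which is one motivation for learning the segmentation with a network instead.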

Keywords: traffic light detection, image segmentation, machine learning, classification, convolutional neural networks

19323 Applications of Big Data in Education

Authors: Faisal Kalota


Big Data and analytics have gained huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA), which may allow academic institutions to better understand learners’ needs and proactively address them. Hence, it is important to understand Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitations before jumping on the Big Data bandwagon.

Keywords: big data, learning analytics, analytics, big data in education, Hadoop

19322 The Anti-Angiogenic Effect of Tectorigenin in a Mouse Model of Retinopathy of Prematurity

Authors: KuiDong Kang, Hye Bin Yim, Su Ah Kim


Purpose: Tectorigenin is an isoflavone derived from the rhizome of Belamcanda chinensis. In this study, oxygen-induced retinopathy was used to characterize the anti-angiogenic properties of tectorigenin in mice. Methods: ICR neonatal mice were exposed to 75% oxygen from postnatal day 7 (P7) until P12 and then returned to room air (21% oxygen) for five days (P12 to P17). Mice received daily intraperitoneal injections of tectorigenin (1 mg/kg or 10 mg/kg) or vehicle from P12 to P17. Retro-orbital injection of FITC-dextran was performed, and retinal flat mounts were viewed by fluorescence microscopy. The central avascular area was quantified from the digital images in a masked fashion using image analysis software (NIH ImageJ). Neovascular tufts were quantified using SWIFT_NV, and neovascular lumens were quantified from histologic sections in a masked fashion. Immunohistochemistry and Western blot analysis were also performed to demonstrate the anti-angiogenic activity of this compound in vivo. Results: In the retinas of mice injected with tectorigenin (10 mg/kg), the central non-perfusion area was significantly decreased compared with the vehicle-injected group (1.76 ± 0.5 mm² vs 2.85 ± 0.6 mm², P<0.05). In the vehicle-injected group, 33.45 ± 5.51% of the total retinal area was avascular, whereas the retinas of pups treated with high-dose (10 mg/kg) tectorigenin showed avascular areas of 21.25 ± 4.34% (P<0.05). High-dose tectorigenin also significantly reduced the number of vascular lumens in the histologic sections. Tectorigenin (10 mg/kg) significantly reduced the expression of vascular endothelial growth factor (VEGF), matrix metalloproteinase-2 (MMP-2), MMP-9, and angiotensin II compared with the vehicle-injected group. Tectorigenin did not affect CD31 abundance at any tested dose. Conclusions: Our results show that tectorigenin possesses powerful anti-angiogenic properties and can attenuate new vessel formation in the retina after systemic administration.
These results imply that this compound can be considered a candidate substance for therapeutic inhibition of retinal angiogenesis.

Keywords: tectorigenin, anti-angiogenic, retinopathy, Belamcanda chinensis

19321 Analysis of Big Data

Authors: Sandeep Sharma, Sarabjit Singh


Given user demand and the growth of large volumes of free data, storage solutions face growing challenges in protecting, storing and retrieving data. The day may not be far off when storage companies and organizations start refusing to store our valuable data or start charging huge amounts for its storage and protection. At the same time, environmental concerns make it challenging to establish and maintain new data warehouses and data centers, given the threat of global warming. The challenge of small data is over; the big challenge now is how to manage the exponential growth of data. In this paper, we analyze the growth trend of big data and its future implications. We also focus on the impact of unstructured data on various concerns and suggest some possible remedies to streamline big data.

Keywords: big data, unstructured data, volume, variety, velocity

19320 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, WangQun Lin


This paper introduces the concept and principles of data cleaning, analyzes the types and causes of dirty data, and proposes the key steps of a typical cleaning process. It puts forward a data cleaning framework with good scalability and versatility. For data with attribute dependency relations, it designs several violation data discovery algorithms using formal formulas, which can find inconsistent data in all target columns with conditional attribute dependencies, whether the data are structured (SQL) or unstructured (NoSQL). Finally, it gives six data cleaning methods based on these algorithms.
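Violation discovery for an attribute dependency can be illustrated with a functional dependency X -> Y: rows that agree on X but disagree on Y are inconsistent. The Python sketch below is a hypothetical illustration rather than one of the paper's algorithms; the `condition` predicate stands in for the condition attributes of a conditional dependency, and majority-vote repair is just one common heuristic.

```python
from collections import Counter, defaultdict

def fd_violations(rows, lhs, rhs, condition=None):
    """Groups violating lhs -> rhs, optionally only on rows where condition(row) holds.

    rows: list of dicts; lhs: tuple of attribute names; rhs: attribute name.
    Returns {lhs_value_tuple: set_of_conflicting_rhs_values}.
    """
    groups = defaultdict(set)
    for row in rows:
        if condition is None or condition(row):
            groups[tuple(row[a] for a in lhs)].add(row[rhs])
    return {key: vals for key, vals in groups.items() if len(vals) > 1}

def repair_by_majority(rows, lhs, rhs, condition=None):
    """Hypothetical repair: set rhs to the most frequent value within each lhs group."""
    counts = defaultdict(Counter)
    for row in rows:
        if condition is None or condition(row):
            counts[tuple(row[a] for a in lhs)][row[rhs]] += 1
    repaired = []
    for row in rows:
        new_row = dict(row)
        if condition is None or condition(row):
            new_row[rhs] = counts[tuple(row[a] for a in lhs)].most_common(1)[0][0]
        repaired.append(new_row)
    return repaired
```

The same grouping logic applies whether the rows come from an SQL table or from NoSQL documents, as long as each record can be read as a mapping from attribute names to values.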

Keywords: data cleaning, dependency rules, violation data discovery, data repair

19319 A Partially Accelerated Life Test Planning with Competing Risks and Linear Degradation Path under Tampered Failure Rate Model

Authors: Fariba Azizi, Firoozeh Haghighi, Viliam Makis


In this paper, we propose a method to model the relationship between failure time and degradation for a simple step-stress test in which the underlying degradation path is linear and different causes of failure are possible. It is assumed that the intensity function depends only on the degradation value. No assumptions are made about the distribution of the failure times. A simple step-stress test is used to shorten the failure times of products, and a tampered failure rate (TFR) model is proposed to describe the effect of the changing stress on the intensities. We assume that, for some of the products that fail during the test, the cause of failure is only known to belong to a certain subset of all possible causes. This case is known as masking. In the presence of masking, the maximum likelihood estimates (MLEs) of the model parameters are obtained through an expectation-maximization (EM) algorithm by treating the causes of failure as missing values. The effect of incomplete information on parameter estimation is studied through a Monte Carlo simulation. Finally, a real example is analyzed to illustrate the application of the proposed methods.
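The idea of treating masked causes of failure as missing values in EM can be shown on a deliberately simplified model. The Python sketch below assumes exponential lifetimes with constant cause-specific hazards and non-informative masking; it does not include the step-stress TFR model or the degradation-dependent intensities of the paper.

```python
def em_masked_exponential(times, causes, n_causes, max_iter=1000, tol=1e-10):
    """EM estimates of constant cause-specific hazards when some failure causes are masked.

    times:  time on test for every unit (failure or censoring time).
    causes: per unit, None if censored, else the set of candidate causes
            (a singleton when the cause is known, a larger set when it is masked).
    """
    total_time = sum(times)
    lam = [1.0] * n_causes                       # starting values (scale is irrelevant)
    for _ in range(max_iter):
        expected = [0.0] * n_causes
        for cand in causes:
            if cand is None:
                continue                         # censored unit: contributes time only
            denom = sum(lam[k] for k in cand)
            for j in cand:
                expected[j] += lam[j] / denom    # E-step: P(cause j | failed, cause in cand)
        new = [e / total_time for e in expected] # M-step: failures per unit time on test
        if max(abs(a - b) for a, b in zip(new, lam)) < tol:
            return new
        lam = new
    return lam
```

In the E-step, a failure masked to a candidate set is split across its candidate causes in proportion to their current hazards; the M-step then applies the complete-data estimate (expected failures per unit time on test).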

Keywords: cause of failure, linear degradation path, reliability function, expectation-maximization algorithm, intensity, masked data

19318 Duration of Isolated Vowels in Infants with Cochlear Implants

Authors: Paris Binos


The present work investigates developmental aspects of the duration of isolated vowels in infants with normal hearing compared with infants who received cochlear implants (CIs) before two years of age. Infants with normal hearing produce shorter vowels, a finding related to more mature production abilities. The first isolated vowels become evident during the protophonic stage as evidence of increased motor and linguistic control. Vowel duration is a crucial factor in the transition from prelexical speech to normal adult speech. Although data for infants with normal hearing are available, more research is needed to unravel the production skills of early-implanted children. Thus, isolated vowel productions by two congenitally hearing-impaired Greek infants (implantation ages 1:4-1:11; post-implant ages 0:6-1:3) were recorded and sampled for six months after implantation with a Nucleus-24. The results were compared with the productions of three normal-hearing infants (chronological ages 0:8-1:1). Vegetative data and vocalizations masked by external noise or sounds were excluded. Participants had no other disabilities, and the etiology of their deafness was unknown. Prior to implantation, the infants had an average unaided hearing loss of 95-110 dB HL, while the post-implantation PTA decreased to 10-38 dB HL. The current research offers a methodology for processing prelinguistic productions based on a combination of acoustic and auditory analyses. Within this methodological framework, duration was measured on wideband spectrograms, from voicing onset to the end of the vowel. The end was marked by two co-occurring events: (1) the onset of aperiodicity with a rapid change in amplitude in the waveform and (2) a loss of formant energy. The significance level was set at 0.05 for all tests.
Bonferroni post hoc tests indicated a significant difference between the mean vowel duration of infants wearing CIs and that of their normal-hearing peers: the mean vowel duration of the CI group was longer (p < 0.001). These longitudinal findings add to the existing data on the performance of children fitted with CIs at a very young age and also enrich the data available for the Greek language. The weakness in CI performance described above is a challenge for future work on speech processing and CI processing strategies.

Keywords: cochlear implant, duration, spectrogram, vowel

19317 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez


Mining big data represents a big challenge nowadays. Much research is concerned with mining massive amounts of data and big data streams. Mining big data faces many challenges, including scalability, speed, heterogeneity, accuracy, provenance and privacy. In the telecommunications industry, mining big data is like mining for gold: it represents a big opportunity to maximize the industry's revenue streams. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunications, and the benefits and opportunities gained from it.

Keywords: mining big data, big data, machine learning, telecommunication

19316 The Critical Relevance of Credit and Debt Data in Household Food Security Analysis: The Risks of Ineffective Response Actions

Authors: Siddharth Krishnaswamy


Problem Statement: Currently, when analyzing household food security, the most commonly studied food access indicators are household income and expenditure. Larger studies do take into account other indices such as credit and employment, but these are baseline studies and by definition are conducted infrequently. Food security analysis of access usually focuses on income and expenditure indicators, and both of these indicators are notoriously inconsistent. Yet these data very often end up being the basis on which household food access is calculated and, by extension, used for decision making. Objectives: This paper argues that, along with income and expenditure, credit and debt information should be collected so that household food security, and in particular food access, can be analyzed accurately. Because this information is not routinely collected and analyzed, the actual situation is often “masked”: a household’s food access and food availability patterns may appear adequate mainly as a result of borrowing and may even reflect a long-term dependency (a debt cycle). In other words, such a household is in reality worse off than it appears, a fact masked by its performance on basic access indicators. Procedures/methodologies/approaches: Existing food security data sets collected in 2005 in Azerbaijan, in 2010 across Myanmar and in 2014-15 across Uganda were used to support the theory that analyzing household (HH) income and expenditure alone, versus analyzing them together with data on credit and borrowing patterns, results in an entirely different picture of household food access. Furthermore, the data analyzed depict food consumption patterns across groups of households and relate them to the extent of dependency on credit, i.e., households borrowing money to meet food needs.
Finally, response options based only on income and expenditure and response options based on income, expenditure, credit, and borrowing, from the same geographical area of operation, are studied and discussed. Results: The purpose of this work was to see whether existing methods of household food security analysis could be improved. It is hoped that food security analysts will collect household-level information on credit and debt and analyze it against income, expenditure and consumption patterns. This will help determine whether a household’s food access and availability depend on unsustainable strategies such as borrowing money for food or taking on sustained debt. Conclusions: The results clearly show how much relevant information is missing from food access analysis if household debt and borrowing are not analyzed along with the typical food access indicators, and the serious repercussions this has on programmatic responses and interventions.

Keywords: analysis, food security indicators, response, resilience analysis

19315 A Long Short-Term Memory Based Deep Learning Model for Corporate Bond Price Predictions

Authors: Vikrant Gupta, Amrit Goswami


The fixed income market forms the basis of the modern financial market. All other assets in financial markets derive their value from the bond market. Owing to their over-the-counter nature, corporate bonds have relatively little publicly available data and are thus researched far less than equities. Bond price prediction is a complex financial time series forecasting problem and is considered crucial in the domain of finance. Bond prices are highly volatile and noisy, which makes it very difficult for traditional statistical time series models to capture the complexity of the series patterns and leads to inefficient forecasts. To overcome the inefficiencies of statistical models, various machine learning techniques were initially used in the literature for more accurate time series forecasting. However, simple machine learning methods such as linear regression, support vector machines and random forests fail to provide efficient results when tested on highly complex sequences such as stock and bond prices. Hence, to capture these intricate sequence patterns, various deep learning methodologies have been discussed in the literature. In this study, a recurrent neural network-based deep learning model using long short-term memory (LSTM) networks is discussed for the prediction of corporate bond prices. LSTM networks have been widely used in the literature for sequence learning tasks in domains such as machine translation and speech recognition. In recent years, various studies have discussed the effectiveness of LSTMs in forecasting complex time series and have shown promising results compared with other methodologies. LSTMs are a special kind of recurrent neural network capable of learning long-term dependencies, which traditional neural networks fail to capture, thanks to their memory function.
In this study, a simple LSTM, a stacked LSTM and a masked LSTM-based model are discussed with respect to varying input sequence lengths (three, seven and 14 days). To facilitate faster learning and gradually decompose the complexity of the bond price sequence, Empirical Mode Decomposition (EMD) was used, which improved the accuracy of the standalone LSTM model. With a variety of technical indicators and EMD-decomposed time series, the masked LSTM outperformed its two counterparts in terms of prediction accuracy. To benchmark the proposed model, the results were compared with traditional time series models (ARIMA), shallow neural networks and the three LSTM models discussed above. In summary, our results show that LSTM models provide more accurate results and should be explored further within the asset management industry.
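Two building blocks mentioned above can be sketched in NumPy: turning a price series into fixed look-back windows (for example 3, 7 or 14 days) and running an LSTM layer that skips masked time steps. The abstract does not define its Masked LSTM, so masking padded steps is an assumption here, and the weight layout and function names are hypothetical; training, EMD and technical indicators are omitted.

```python
import numpy as np

def make_windows(series, lookback):
    """Turn a 1-D price series (longer than lookback) into inputs and next-day targets."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def masked_lstm(x, mask, W, U, b):
    """Run one LSTM layer over x (T, d_in); steps with mask == 0 leave the state unchanged.

    W: (4h, d_in), U: (4h, h), b: (4h,), gates stacked as [input, forget, cell, output].
    Returns the final hidden state.
    """
    h_dim = U.shape[1]
    h, c = np.zeros(h_dim), np.zeros(h_dim)
    for x_t, m_t in zip(x, mask):
        i, f, g, o = np.split(W @ x_t + U @ h + b, 4)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        if m_t:                                # padded (masked) steps skip the update
            h, c = h_new, c_new
    return h
```

Masking lets sequences of different lengths share one padded batch without the padding altering the final state.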

Keywords: bond prices, long short-term memory, time series forecasting, empirical mode decomposition

19314 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua


In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflect different aspects of people, events and activities. Data generated by different systems differ in form, and their sources differ because they may come from different sectors. To reflect one or several facets of an event or rule, data from multiple sources need to be fused. Data from different sources, collected in different ways, raise several issues that need to be resolved. Data fusion problems include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, and resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining and scenario prediction to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of multi-source data fusion is a uniform central database, which includes people data, location data, object data, institution data, business data and spatial data. Metadata must be available for applications to refer to and read when they access, manipulate and display the data. Uniform metadata management ensures the effectiveness and consistency of data during data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.
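Two of the fusion problems listed above, duplicate records and data updates across sources, can be sketched with a simple merge rule in Python. The record schema (an `id` field and an `updated` timestamp), the ID normalization and the latest-update-wins policy are all hypothetical assumptions for illustration.

```python
def fuse_records(sources, key_field="id"):
    """Merge records from several sources into one table keyed on a normalized ID.

    sources: {source_name: [record_dict, ...]}; each record has key_field and an
    "updated" timestamp (hypothetical schema). For every attribute, the most recently
    updated value wins, so duplicate records collapse into one entry.
    """
    fused = {}
    for source_name, records in sources.items():
        for rec in records:
            key = str(rec[key_field]).strip().upper()      # normalize IDs across systems
            entry = fused.setdefault(key, {"_sources": set(), "_updated": {}})
            entry["_sources"].add(source_name)
            for field, value in rec.items():
                if field in (key_field, "updated"):
                    continue
                last = entry["_updated"].get(field)
                if last is None or rec["updated"] > last:
                    entry[field] = value
                    entry["_updated"][field] = rec["updated"]
    return fused
```

Keeping the per-attribute timestamps and source names alongside the fused values is a small example of the metadata that later exchange, cleansing and loading steps can rely on.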

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

19313 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad


Nowadays, given the growing volume of data that humans generate, methods such as data mining for extracting knowledge are unavoidable. One issue in data mining is the inherent distribution of data: the sites that create or receive such data usually belong to corporate or non-corporate entities that do not give their information freely to others. Yet there is no guarantee that someone can mine such data without intruding on the owner’s privacy. How data are sent and then gathered depends on whether they are partitioned vertically or horizontally and on the privacy-preserving technique used. This study comprehensively compares privacy-preserving methods, examining general approaches such as data randomization and coding, together with the strengths and weaknesses of each.
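Two general privacy-preserving techniques can be sketched in Python: additive random perturbation of values, a form of the randomization approach mentioned above, and the ring-based secure sum for horizontally partitioned data, a standard protocol that the abstract does not name. Both functions are hypothetical illustrations; the secure sum assumes non-colluding parties and a true sum smaller than the modulus.

```python
import random
import secrets

def randomize(values, noise_scale, seed=None):
    """Additive perturbation: release each value plus Gaussian noise."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_scale) for v in values]

def secure_sum(local_values, modulus=2**61 - 1):
    """Simulate the ring-based secure sum over horizontally partitioned data.

    The initiator masks its value with a random offset; each following site only
    sees a masked running total, adds its private value, and passes it on.
    """
    r = secrets.randbelow(modulus)               # initiator's secret mask
    running = (r + local_values[0]) % modulus
    for v in local_values[1:]:
        running = (running + v) % modulus        # each site adds its private count
    return (running - r) % modulus               # initiator removes the mask
```

The two approaches trade off differently: perturbation keeps the protocol trivial but blurs individual values, while the secure sum returns an exact aggregate but fails if a site's two neighbors collude.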

Keywords: data mining, distributed data mining, privacy protection, privacy preserving
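
The abstract refers to randomization and coding methods only in general terms. One classic randomization-based technique for horizontally partitioned data is the secure sum protocol, sketched below under the assumptions that the sites are honest but curious, do not collude, and hold non-negative integers whose total is below the modulus. It is shown for illustration and is not presented as one of the specific methods compared in the paper; all sites are simulated in a single process.

```python
import secrets


def secure_sum(local_values: list[int], modulus: int = 2**61 - 1) -> int:
    """Sum private values held by several sites without revealing them.

    Site 0 masks its value with a random number known only to itself; each
    later site adds its own value to the masked running total it receives,
    so no site ever sees another site's raw value or a partial sum in the
    clear. Finally, site 0 removes the mask.
    """
    mask = secrets.randbelow(modulus)              # random mask held by site 0
    running = (mask + local_values[0]) % modulus   # site 0 sends this on
    for value in local_values[1:]:                 # each remaining site in turn
        running = (running + value) % modulus
    return (running - mask) % modulus              # site 0 unmasks the total


print(secure_sum([120, 45, 300]))  # 465
```

The mask hides each intermediate total, but two sites that collude around a third can still recover its value, which is the usual weakness of this protocol.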

19312 The Right to Data Portability and Its Influence on the Development of Digital Services

Authors: Roman Bieda


The General Data Protection Regulation (GDPR), which will apply from 25 May 2018, creates a new legal framework for the protection of personal data in the European Union. Article 20 of the GDPR introduces a right to data portability. This right allows data subjects to receive the personal data that they have provided to a data controller in a structured, commonly used and machine-readable format, and to transmit those data to another controller. By facilitating the transfer of personal data between IT environments (e.g., applications), the right to data portability will also make it easier to change service providers (e.g., changing a bank or a cloud computing service provider). It will therefore contribute to the development of competition and of the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.

Keywords: data portability, digital market, GDPR, personal data

19311 Extraction of Urban Building Damage Using Spectral, Height and Corner Information

Authors: X. Wang


Timely and accurate information on urban building damage caused by earthquakes is an important basis for disaster assessment and emergency relief. Very high resolution (VHR) remotely sensed imagery, which contains abundant fine-scale information, offers a large quantity of data for detecting and assessing urban building damage in the aftermath of earthquake disasters. However, the accuracy obtained using spectral features alone is comparatively low, since damaged buildings, intact buildings and pavements are spectrally similar. It is therefore of great significance to detect urban building damage effectively using multi-source data. Considering that the height or geometric structure of buildings generally changes dramatically in devastated areas, a novel multi-stage urban building damage detection method using bi-temporal spectral, height and corner information was proposed in this study. The pre-event height information was generated from stereo VHR images acquired by two different satellites, while the post-event height information was produced from airborne LiDAR data. The corner information was extracted from pre- and post-event panchromatic images. The proposed method can be summarized as follows. To reduce the classification errors caused by spectral similarity and by errors in the extracted height information, ground surface, shadows and vegetation were first extracted using the post-event VHR image and height data, and were masked out. Two different types of building damage were then extracted from the remaining areas: the height difference between pre- and post-event data was used to detect building damage showing significant height change, and the difference in corner density between pre- and post-event data was used to extract building damage showing drastic change in geometric structure. The initial building damage result was generated by combining these two results. Finally, a post-processing procedure was adopted to refine the initial result. The proposed method was quantitatively evaluated and compared with two existing methods in Port-au-Prince, Haiti, which was heavily hit by an earthquake in January 2010, using a pre-event GeoEye-1 image, a pre-event WorldView-2 image, a post-event QuickBird image and post-event LiDAR data. The results showed that the proposed method significantly outperformed the two comparative methods in terms of urban building damage extraction accuracy. It provides a fast and reliable way to detect urban building collapse and is also applicable to related applications.

Keywords: building damage, corner, earthquake, height, very high resolution (VHR)
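
The two damage types described in the abstract can be combined per pixel as in the NumPy sketch below. It assumes that the height and corner-density rasters are already co-registered on the same grid and that the ground/shadow/vegetation mask has been computed; the thresholds `height_thr` and `corner_thr` are hypothetical placeholders rather than values from the paper, and the post-processing step is omitted.

```python
import numpy as np


def detect_damage(h_pre, h_post, c_pre, c_post, mask_out,
                  height_thr=3.0, corner_thr=0.5):
    """Initial damage map from bi-temporal height and corner-density rasters.

    h_pre, h_post : pre-/post-event height rasters (metres), same shape
    c_pre, c_post : pre-/post-event corner-density rasters
    mask_out      : True where ground, shadow or vegetation was masked out
    height_thr, corner_thr : hypothetical thresholds, not taken from the paper
    """
    h_pre, h_post = np.asarray(h_pre, float), np.asarray(h_post, float)
    c_pre, c_post = np.asarray(c_pre, float), np.asarray(c_post, float)
    keep = ~np.asarray(mask_out, bool)
    # Type 1: damage showing significant height change
    height_damage = np.abs(h_post - h_pre) > height_thr
    # Type 2: damage showing drastic change in geometric structure
    corner_damage = np.abs(c_post - c_pre) > corner_thr
    # Initial result: union of both types within the remaining (unmasked) area
    return keep & (height_damage | corner_damage)


h_pre = np.array([[10.0, 10.0], [0.0, 12.0]])
h_post = np.array([[10.2, 2.0], [0.0, 12.0]])
c_pre = np.array([[0.2, 0.2], [0.0, 0.1]])
c_post = np.array([[0.2, 0.3], [0.0, 0.9]])
ground = np.array([[False, False], [True, False]])
damage = detect_damage(h_pre, h_post, c_pre, c_post, ground)
print(damage.tolist())  # [[False, True], [False, True]]
```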

19310 Recent Advances in Data Warehouse

Authors: Fahad Hanash Alzahrani


This paper describes some recent advances in the rapidly developing area of data storage and processing based on data warehouse and data mining techniques, covering the software, hardware, data mining algorithms and visualisation techniques that share common features across the specific problems and tasks of their implementation.

Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing

19309 How to Use Big Data in Logistics Issues

Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy


Big Data represents today's cutting-edge technology. As the technology becomes widespread, so does data. Utilizing massive data sets enables companies to gain competitive advantages over their rivals. Among the many areas where Big Data is used, logistics plays a significant role in both the commercial sector and the military. This paper explains what big data is and how it is used in both military and commercial logistics.

Keywords: big data, logistics, operational efficiency, risk management

19308 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee


Owing to the development of information technology and wireless Internet technology, a wide variety of data are being generated in many fields. These data have the advantage of providing real-time information to the users themselves, and when they are accumulated and analyzed, even more diverse information can be extracted. In addition, the development and spread of boards such as Arduino and Raspberry Pi have made it easy to test various sensors, and sensor data can be collected directly using database tools such as MySQL. Such directly collected data can be used in a range of research and can serve as input for data mining. However, collecting data with these boards poses many difficulties, particularly for users who are not programmers or who are using them for the first time. Even when data are collected, a lack of expert knowledge or experience may make data analysis and visualization difficult. In this paper, we aim to construct a library for sensor data collection and analysis that overcomes these problems.

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data
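
The paper collects sensor data with MySQL; as a stand-in that needs no server, the sketch below uses Python's built-in `sqlite3` module to store readings and compute simple per-sensor statistics. The table layout and the functions `open_store`, `log_reading` and `summary` are hypothetical and do not reflect the API of the library described in the paper; readings would normally arrive from an Arduino or Raspberry Pi rather than being inserted by hand.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the readings table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "sensor TEXT NOT NULL, ts REAL NOT NULL, value REAL NOT NULL)"
    )
    return conn


def log_reading(conn: sqlite3.Connection, sensor: str, ts: float, value: float) -> None:
    """Insert one reading; parameters are bound, never formatted into SQL."""
    conn.execute("INSERT INTO readings VALUES (?, ?, ?)", (sensor, ts, value))
    conn.commit()


def summary(conn: sqlite3.Connection, sensor: str) -> tuple:
    """Return (count, mean, min, max) of the stored values for one sensor."""
    return conn.execute(
        "SELECT COUNT(*), AVG(value), MIN(value), MAX(value) "
        "FROM readings WHERE sensor = ?",
        (sensor,),
    ).fetchone()


conn = open_store()
for ts, value in enumerate([21.0, 23.0, 22.0]):
    log_reading(conn, "temp-1", float(ts), value)
print(summary(conn, "temp-1"))  # (3, 22.0, 21.0, 23.0)
```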

19307 An Enhanced SAR-Based Tsunami Detection System

Authors: Jean-Pierre Dubois, Jihad S. Daba, H. Karam, J. Abdallah


Tsunami early detection and warning systems have proved to be of the utmost importance, especially after the destructive tsunami that hit Japan in March 2011. Such systems are crucial for informing the authorities of any tsunami risk and its degree of danger, so that they can make the right decisions and notify the public of the actions needed to save lives. The purpose of this research is to enhance existing tsunami detection and warning systems. We first propose an automated and miniaturized model of an early tsunami detection and warning system. Because the required real-life seismic and hydrologic sensors were not available, the operation of the warning system is simulated using the Data Acquisition Toolbox of MATLAB and measurements acquired from specified internet pages, and a graphical user interface is built for the system. In the second phase of this work, we implement various satellite image filtering schemes to enhance synthetic aperture radar (SAR) images of the tsunami-affected region that are corrupted by speckle noise. This enables us to conduct a post-tsunami damage extent study and calculate the percentage of damage. We conclude by proposing improvements to the telecommunication infrastructure of existing tsunami warning systems through migration to IP-based networks and fiber-optic links.

Keywords: detection, GIS, GSN, GTS, GPS, speckle noise, synthetic aperture radar, tsunami, Wiener filter
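
The abstract does not name the specific filtering schemes, but the keywords mention a Wiener filter. The sketch below implements a locally adaptive Wiener filter (the same form used by `scipy.signal.wiener`) in plain NumPy, followed by a simple log-ratio change count as one possible way to express the percentage of damaged pixels. Applying the filter directly to intensity rather than log-intensity data, the 5×5 window, the global noise estimate and the 3 dB change threshold are all assumptions; `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def adaptive_wiener(img, size=5, noise_var=None):
    """Locally adaptive Wiener filter for a speckled SAR intensity image.

    Each pixel is pulled towards its local mean, fully where the local
    variance does not exceed the noise variance (homogeneous areas) and
    less where it is larger (edges and structures).
    """
    img = np.asarray(img, dtype=float)
    if size % 2 == 0:
        raise ValueError("size must be odd")
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    windows = sliding_window_view(padded, (size, size))  # one window per pixel
    local_mean = windows.mean(axis=(-2, -1))
    local_var = windows.var(axis=(-2, -1))
    if noise_var is None:
        noise_var = local_var.mean()  # crude global noise estimate (assumption)
    gain = np.clip((local_var - noise_var) / np.maximum(local_var, 1e-12), 0.0, 1.0)
    return local_mean + gain * (img - local_mean)


def percent_damage(pre, post, db_thr=3.0):
    """Percentage of pixels whose backscatter changed by more than db_thr dB."""
    eps = 1e-12
    pre, post = np.asarray(pre, float), np.asarray(post, float)
    log_ratio = 10.0 * np.log10((post + eps) / (pre + eps))
    return 100.0 * np.mean(np.abs(log_ratio) > db_thr)


rng = np.random.default_rng(0)
pre = rng.exponential(1.0, size=(64, 64))  # synthetic speckled scene
post = pre.copy()
post[:16, :] *= 0.1                        # simulated backscatter drop in one strip
print(percent_damage(adaptive_wiener(pre), adaptive_wiener(post)))
```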

19306 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis


Organizations, including governments, generate (big) data that are high in volume, velocity and veracity and that come from a variety of sources. Public administrations are using (big) data, implementing base registries and enforcing data sharing across the entire government in order to deliver integrated (big) data services, provide insights to users and support good governance. Actors in the government (big) data ecosystem are distinct entities that provide data, consume data, manipulate data to offer paid services, or extend data services such as data storage and hosting to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of the government (big) data ecosystem and a classification of its actors and their roles. We present a graphical view of the actors, their roles and their relationships in the government (big) data ecosystem, and we discuss our research findings. We found few published research articles on the government (big) data ecosystem, including its definition and the classification of its actors and their roles. Therefore, we borrowed ideas for the government (big) data ecosystem from numerous areas in the literature, including scientific research data, humanitarian data, open government data and industry data.

Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review

19305 Government Big Data Ecosystem: A Systematic Literature Review

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis


Data that are high in volume, velocity and veracity and come from a variety of sources are generated in all sectors, including the government sector. Globally, public administrations are pursuing (big) data as a new technology and trying to adopt data-centric architectures for hosting and sharing data. Properly executed, big data and data analytics in the government (big) data ecosystem can lead to data-driven government and have a direct impact on the way policymakers work and citizens interact with governments. In this research paper, we conduct a systematic literature review. The main aims of this paper are to highlight essential aspects of the government (big) data ecosystem and to explore the most critical socio-technical factors that contribute to its successful implementation. The essential aspects of the government (big) data ecosystem include its definition, data types, data lifecycle models, and actors and their roles. We also discuss the potential impact of (big) data in public administration and gaps in the government data ecosystem literature. As this is a new topic, we did not find articles specifically on the government (big) data ecosystem and therefore focused our research on relevant areas such as humanitarian data, open government data, scientific research data and industry data.

Keywords: applications of big data, big data, big data types, big data ecosystem, critical success factors, data-driven government, egovernment, gaps in data ecosystems, government (big) data, literature review, public administration, systematic review

19304 A Machine Learning Decision Support Framework for Industrial Engineering Purposes

Authors: Anli Du Preez, James Bekker


Data is currently one of the most critical and influential emerging technologies. However, the true potential of data has yet to be exploited, since only about 1% of generated data is currently ever analyzed for value creation. This data gap, in which data remain unexplored, exists because of a lack of data analytics infrastructure and of the required data analytics skills. This study developed a decision support framework for data analytics by following Jabareen's framework development methodology. The study focused on machine learning algorithms, which are a subset of data analytics. The developed framework is designed to help data analysts with little experience choose the appropriate machine learning algorithm for the purpose of their application.

Keywords: Data analytics, Industrial engineering, Machine learning, Value creation

19303 Providing Security to Private Cloud Using Advanced Encryption Standard Algorithm

Authors: Annapureddy Srikant Reddy, Atthanti Mahendra, Samala Chinni Krishna, N. Neelima


In our present world, we generate a lot of data and need a specific device to store it all. Generally, we store data on pen drives, hard drives, etc., and we may sometimes lose data when these devices become corrupted. To overcome these issues, we implemented a cloud space for storing data that also provides more security for the data. The data can be accessed from anywhere in the world using just the internet. We implemented the system in Java using the NetBeans IDE. Once a user uploads data, the user has no rights to change it. Uploaded files are stored in the cloud with the system time as the file name, in a directory created with random words as its name. The cloud accepts a file only if its size is less than 2 MB.

Keywords: cloud space, AES, FTP, NetBeans IDE
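
The storage rules stated in the abstract (read-only uploads, a system-time file name, a randomly named directory and a size limit below 2 MB) can be sketched with the Python standard library as below. The title mentions AES, but the standard library has no AES implementation, so encryption before writing is omitted here; a third-party package such as `cryptography` would be needed for it. Using a hex token instead of random words for the directory name, reading 2 MB as 2 × 1024 × 1024 bytes, and using file permissions to block later changes are assumptions, and `store_upload` is a hypothetical name.

```python
import secrets
import time
from pathlib import Path

MAX_BYTES = 2 * 1024 * 1024  # assumption: "2MB" read as 2 MiB


def store_upload(root: Path, data: bytes) -> Path:
    """Store one upload under the rules described in the abstract.

    Encryption is intentionally omitted (no AES in the standard library).
    """
    if len(data) >= MAX_BYTES:                   # accept only files under 2 MB
        raise ValueError("file rejected: it must be smaller than 2 MB")
    directory = root / secrets.token_hex(8)      # randomly named directory
    directory.mkdir(parents=True)
    target = directory / str(time.time_ns())     # file named after the system time
    target.write_bytes(data)
    target.chmod(0o444)                          # read-only: no later modification
    return target
```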

19302 Business Intelligence for Profiling of Telecommunication Customer

Authors: Rokhmatul Insani, Hira Laksmiwati Soemitro


Business intelligence is a methodology that exploits data to produce information and knowledge systematically; it can support the decision-making process. Methods in business intelligence include data warehousing and data mining. A data warehouse can store historical data derived from transactional data; for data modelling in the data warehouse, we apply Kimball's dimensional modelling. Data mining, in turn, is used to extract patterns from data and gain insight. Data mining has many techniques, one of which is segmentation. To profile telecommunication customers, we segment them according to their usage of services, invoices and payments, so that customers can be grouped by their characteristics and the profitable ones identified. We apply the K-Means clustering algorithm for segmentation, using the RFM (Recency, Frequency and Monetary) model for its input variables. All data mining steps are carried out with IBM SPSS Modeler.

Keywords: business intelligence, customer segmentation, data warehouse, data mining
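
The segmentation step can be illustrated with a small NumPy sketch: RFM features are derived from (customer, day, amount) transactions and then clustered with Lloyd's k-means. This is not IBM SPSS Modeler's implementation; standardizing the features and using deterministic farthest-point initialization are choices made here for a stable, reproducible example, and the transaction format and function names are hypothetical.

```python
import numpy as np


def rfm_table(transactions, today):
    """Build per-customer RFM features from (customer_id, day, amount) tuples.

    Recency = days since the last transaction, Frequency = number of
    transactions, Monetary = total amount spent.
    """
    stats = {}
    for cid, day, amount in transactions:
        last, count, total = stats.get(cid, (day, 0, 0.0))
        stats[cid] = (max(last, day), count + 1, total + amount)
    ids = sorted(stats)
    feats = np.array(
        [[today - stats[c][0], stats[c][1], stats[c][2]] for c in ids], dtype=float
    )
    return ids, feats


def kmeans(X, k, iters=100):
    """Lloyd's k-means on z-scored features with farthest-point initialization."""
    std = X.std(axis=0)
    X = (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)
    centers = [X[0]]
    for _ in range(1, k):
        # next centre: the point farthest from all centres chosen so far
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        # assign each customer to the nearest centre
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        # move each centre to the mean of its members (keep it if empty)
        new_centers = np.array(
            [X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
             for j in range(k)]
        )
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


transactions = [
    ("A", 98, 50.0), ("A", 99, 60.0), ("A", 100, 55.0),
    ("B", 97, 40.0), ("B", 99, 70.0), ("B", 100, 65.0),
    ("C", 10, 5.0), ("D", 15, 3.0),
]
ids, feats = rfm_table(transactions, today=100)
print(dict(zip(ids, kmeans(feats, k=2).tolist())))  # {'A': 0, 'B': 0, 'C': 1, 'D': 1}
```

In practice the cluster centres would then be inspected to decide which segment corresponds to the profitable customers.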

