Search results for: Text Data Mining
7219 Identification of Conserved Domains and Motifs for GRF Gene Family
Authors: Jafar Ahmadi, Nafiseh Noormohammadi, Sedigheh Fabriki Ourang
Abstract:
GRF, Growth regulating factor, genes encode a novel class of plant-specific transcription factors. The GRF proteins play a role in the regulation of cell numbers in young and growing tissues and may act as transcription activations in growth and development of plants. Identification of GRF genes and their expression are important in plants to performance of the growth and development of various organs. In this study, to better understanding the structural and functional differences of GRFs family, 45 GRF proteins sequences in A. thaliana, Z. mays, O. sativa, B. napus, B. rapa, H. vulgare and S. bicolor, have been collected and analyzed through bioinformatics data mining. As a result, in secondary structure of GRFs, the number of alpha helices was more than beta sheets and in all of them QLQ domains were completely in the biggest alpha helix. In all GRFs, QLQ and WRC domains were completely protected except in AtGRF9. These proteins have no trans-membrane domain and due to have nuclear localization signals act in nuclear and they are component of unstable proteins in the test tube.
Keywords: Domain, Gene Family, GRF, Motif.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23307218 Photo Mosaic Smartphone Application in Client-Server Based Large-Scale Image Databases
Authors: Sang-Hun Lee, Bum-Soo Kim, Yang-Sae Moon, Jinho Kim
Abstract:
In this paper we present a photo mosaic smartphone application in client-server based large-scale image databases. Photo mosaic is not a new concept, but there are very few smartphone applications especially for a huge number of images in the client-server environment. To support large-scale image databases, we first propose an overall framework working as a client-server model. We then present a concept of image-PAA features to efficiently handle a huge number of images and discuss its lower bounding property. We also present a best-match algorithm that exploits the lower bounding property of image-PAA. We finally implement an efficient Android-based application and demonstrate its feasibility.Keywords: smartphone applications; photo mosaic; similarity search; data mining; large-scale image databases.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16717217 Reading against the Grain: Transcodifying Stimulus Meaning
Authors: Aba-Carina Pârlog
Abstract:
The paper shows that on transferring sense from the SL to the TL, the translator’s reading against the grain determines the creation of a faulty pattern of rendering the original meaning in the receiving culture which reflects the use of misleading transformative codes. In this case, the translator is a writer per se who decides what goes in and out of the book, how the style is to be ciphered and what elements of ideology are to be highlighted. The paper also proves that figurative language must not be flattened for the sake of clarity or naturalness. The missing figurative elements make the translated text less interesting, less challenging and less vivid which reflects poorly on the writer. There is a close connection between style and the writer’s person. If the writer’s style is very much altered in a translation, the translation is useless as the original writer and his / her imaginative world can no longer be discovered. The purpose of the paper is to prove that adaptation is a dangerous tool which leads to variants that sometimes reflect the original less than the reader would wish to. It contradicts the very essence of the process of translation which is that of making an original work available in a foreign language. If the adaptive transformative codes are so flexible that they encourage the translator to repeatedly leave out parts of the original work, then a subversive pattern emerges which changes the entire book. In conclusion, as a result of using adaptation, manipulative or subversive effects are created in the translated work. This is generally achieved by adding new words or connotations, creating new figures of speech or using explicitations. The additional meanings of the original work are neglected and the translator creates new meanings, implications, emphases and contexts. Again s/he turns into a new author who enjoys the freedom of expressing his / her own ideas without the constraints of the original text. Reading against the grain is unadvisable during the process of translation and consequently, following personal common sense becomes essential in the field of translation as well as everywhere else, so that translation should not become a source of fantasy.Keywords: Speculative aesthetics, substance of expression, transformative code, translation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16537216 Query Algebra for Semistuctured Data
Authors: Ei Ei Myat, Ni Lar Thein
Abstract:
With the tremendous growth of World Wide Web (WWW) data, there is an emerging need for effective information retrieval at the document level. Several query languages such as XML-QL, XPath, XQL, Quilt and XQuery are proposed in recent years to provide faster way of querying XML data, but they still lack of generality and efficiency. Our approach towards evolving a framework for querying semistructured documents is based on formal query algebra. Two elements are introduced in the proposed framework: first, a generic and flexible data model for logical representation of semistructured data and second, a set of operators for the manipulation of objects defined in the data model. In additional to accommodating several peculiarities of semistructured data, our model offers novel features such as bidirectional paths for navigational querying and partitions for data transformation that are not available in other proposals.Keywords: Algebra, Semistructured data, Query Algebra.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13777215 Simulation Data Summarization Based on Spatial Histograms
Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura
Abstract:
In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.Keywords: Simulation data, data summarization, spatial histograms, exploration and visualization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7557214 A Comparison and Analysis of Name Matching Algorithms
Authors: Chakkrit Snae
Abstract:
Names are important in many societies, even in technologically oriented ones which use e.g. ID systems to identify individual people. Names such as surnames are the most important as they are used in many processes, such as identifying of people and genealogical research. On the other hand variation of names can be a major problem for the identification and search for people, e.g. web search or security reasons. Name matching presumes a-priori that the recorded name written in one alphabet reflects the phonetic identity of two samples or some transcription error in copying a previously recorded name. We add to this the lode that the two names imply the same person. This paper describes name variations and some basic description of various name matching algorithms developed to overcome name variation and to find reasonable variants of names which can be used to further increasing mismatches for record linkage and name search. The implementation contains algorithms for computing a range of fuzzy matching based on different types of algorithms, e.g. composite and hybrid methods and allowing us to test and measure algorithms for accuracy. NYSIIS, LIG2 and Phonex have been shown to perform well and provided sufficient flexibility to be included in the linkage/matching process for optimising name searching.Keywords: Data mining, name matching algorithm, nominaldata, searching system.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 110917213 Joint Use of Factor Analysis (FA) and Data Envelopment Analysis (DEA) for Ranking of Data Envelopment Analysis
Authors: Reza Nadimi, Fariborz Jolai
Abstract:
This article combines two techniques: data envelopment analysis (DEA) and Factor analysis (FA) to data reduction in decision making units (DMU). Data envelopment analysis (DEA), a popular linear programming technique is useful to rate comparatively operational efficiency of decision making units (DMU) based on their deterministic (not necessarily stochastic) input–output data and factor analysis techniques, have been proposed as data reduction and classification technique, which can be applied in data envelopment analysis (DEA) technique for reduction input – output data. Numerical results reveal that the new approach shows a good consistency in ranking with DEA.Keywords: Effectiveness, Decision Making, Data EnvelopmentAnalysis, Factor Analysis
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24267212 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach
Authors: Rajvir Kaur, Jeewani Anupama Ginige
Abstract:
With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.Keywords: Artificial neural networks, breast cancer, cancer dataset, classifiers, cervical cancer, F-score, logistic regression, machine learning, precision, recall, support vector machine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15557211 Characteristics of Cognitive Functions among Polish Adolescence with Spelling Disorders
Authors: Izabela Pietras
Abstract:
The level of visual abilities, language, memory processes and intellectual functioning development affects the quality of a written text. The following analysis will present the results of diagnostic tests indicating the most common criterion for a group and stating whether a person has been diagnosed with having cognitive developmental level below the group-s average or not.The study-s aim is to determine whether there are specific patterns of cognitive deficits, which can be distinguished among the group of young people with spelling disorders.Keywords: cognitive deficits, cognitive functions, spellingdisorders
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13857210 An Empirical Analysis of the Impact of Selected Macroeconomic Variables on Capital Formation in Libya (1970–2010)
Authors: Khaled Ramadan Elbeydi
Abstract:
This study is carried out to provide an insight into the analysis of the impact of selected macro-economic variables on gross fixed capital formation in Libya using annual data over the period (1970-2010). The importance of this study comes from the ability to show the relative important factors that impact the Libyan gross fixed capital formation. This understanding would give indications to decision makers on which policy they must focus to stimulate the economy. An Autoregressive Distributed Lag (ARDL) modeling process is employed to investigate the impact of the Gross Domestic Product, Monetary Base and Trade Openness on Gross Fixed Capital Formation in Libya. The results of this study reveal that there is an equilibrium relationship between capital formation and its determinants. The results also indicate that GDP and trade openness largely explain the pattern of capital formation in Libya. The findings and recommendations provide vital information relevant for policy formulation and implementation aimed to improve capital formation in Libya.
Keywords: ARDL, Bounds test, capital formation, Cointegration, Libya.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17297209 An Intelligent System for Phish Detection, using Dynamic Analysis and Template Matching
Authors: Chinmay Soman, Hrishikesh Pathak, Vishal Shah, Aniket Padhye, Amey Inamdar
Abstract:
Phishing, or stealing of sensitive information on the web, has dealt a major blow to Internet Security in recent times. Most of the existing anti-phishing solutions fail to handle the fuzziness involved in phish detection, thus leading to a large number of false positives. This fuzziness is attributed to the use of highly flexible and at the same time, highly ambiguous HTML language. We introduce a new perspective against phishing, that tries to systematically prove, whether a given page is phished or not, using the corresponding original page as the basis of the comparison. It analyzes the layout of the pages under consideration to determine the percentage distortion between them, indicative of any form of malicious alteration. The system design represents an intelligent system, employing dynamic assessment which accurately identifies brand new phishing attacks and will prove effective in reducing the number of false positives. This framework could potentially be used as a knowledge base, in educating the internet users against phishing.Keywords: World Wide Web, Phishing, Internet security, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18337208 Digital Preservation in Nigeria Universities Libraries: A Comparison between University of Nigeria Nsukka and Ahmadu Bello University Zaria
Authors: Suleiman Musa, Shuaibu Sidi Safiyanu
Abstract:
This study examined the digital preservation in Nigeria university libraries. A comparison between the university of Nigeria Nsukka (UNN) and Ahmadu Bello University Zaria (ABU, Zaria). The study utilized primary source of data obtained from two selected institution librarians. Finding revealed varying results in terms of skills acquired by librarians before and after digitization of the two institutions. The study reports that journals publication, text book, CD-ROMS, conference papers and proceedings, theses, dissertations and seminar papers are among the information resources available for digitization. The study further documents that copyright issue, power failure, and unavailability of needed materials are among the challenges facing the digitization of library of the institution. On the basis of the finding, the study concluded that digitization of library enhances efficiency in organization and retrieval of information services. The study therefore recommended that software should be upgraded with backup, training of the librarians on digital process, installation of antivirus and enhancement of technical collaboration between the library and MIS.Keywords: Digitalization, preservation, libraries, comparison.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17267207 Adaptive Network Intrusion Detection Learning: Attribute Selection and Classification
Authors: Dewan Md. Farid, Jerome Darmont, Nouria Harbi, Nguyen Huu Hoa, Mohammad Zahidur Rahman
Abstract:
In this paper, a new learning approach for network intrusion detection using naïve Bayesian classifier and ID3 algorithm is presented, which identifies effective attributes from the training dataset, calculates the conditional probabilities for the best attribute values, and then correctly classifies all the examples of training and testing dataset. Most of the current intrusion detection datasets are dynamic, complex and contain large number of attributes. Some of the attributes may be redundant or contribute little for detection making. It has been successfully tested that significant attribute selection is important to design a real world intrusion detection systems (IDS). The purpose of this study is to identify effective attributes from the training dataset to build a classifier for network intrusion detection using data mining algorithms. The experimental results on KDD99 benchmark intrusion detection dataset demonstrate that this new approach achieves high classification rates and reduce false positives using limited computational resources.Keywords: Attributes selection, Conditional probabilities, information gain, network intrusion detection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 26997206 Attacks Classification in Adaptive Intrusion Detection using Decision Tree
Authors: Dewan Md. Farid, Nouria Harbi, Emna Bahri, Mohammad Zahidur Rahman, Chowdhury Mofizur Rahman
Abstract:
Recently, information security has become a key issue in information technology as the number of computer security breaches are exposed to an increasing number of security threats. A variety of intrusion detection systems (IDS) have been employed for protecting computers and networks from malicious network-based or host-based attacks by using traditional statistical methods to new data mining approaches in last decades. However, today's commercially available intrusion detection systems are signature-based that are not capable of detecting unknown attacks. In this paper, we present a new learning algorithm for anomaly based network intrusion detection system using decision tree algorithm that distinguishes attacks from normal behaviors and identifies different types of intrusions. Experimental results on the KDD99 benchmark network intrusion detection dataset demonstrate that the proposed learning algorithm achieved 98% detection rate (DR) in comparison with other existing methods.Keywords: Detection rate, decision tree, intrusion detectionsystem, network security.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 36337205 The Development of the Multi-Agent Classification System (MACS) in Compliance with FIPA Specifications
Authors: Mohamed R. Mhereeg
Abstract:
The paper investigates the feasibility of constructing a software multi-agent based monitoring and classification system and utilizing it to provide an automated and accurate classification of end users developing applications in the spreadsheet domain. The agents function autonomously to provide continuous and periodic monitoring of excels spreadsheet workbooks. Resulting in, the development of the MultiAgent classification System (MACS) that is in compliance with the specifications of the Foundation for Intelligent Physical Agents (FIPA). However, different technologies have been brought together to build MACS. The strength of the system is the integration of the agent technology with the FIPA specifications together with other technologies that are Windows Communication Foundation (WCF) services, Service Oriented Architecture (SOA), and Oracle Data Mining (ODM). The Microsoft's .NET widows service based agents were utilized to develop the monitoring agents of MACS, the .NET WCF services together with SOA approach allowed the distribution and communication between agents over the WWW that is in order to satisfy the monitoring and classification of the multiple developer aspect. ODM was used to automate the classification phase of MACS.
Keywords: Autonomous, Classification, MACS, Multi-Agent, SOA, WCF.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15917204 Privacy Concerns and Law Enforcement Data Collection to Tackle Domestic and Sexual Violence
Authors: Francesca Radice
Abstract:
It has been observed that violent or coercive behaviour has been apparent from initial conversations on dating apps like Tinder. Child pornography, stalking, and coercive control are some criminal offences from dating apps, including women murdered after finding partners through Tinder. Police databases and predictive policing are novel approaches taken to prevent crime before harm is done. This research will investigate how police databases can be used in a privacy-preserving way to characterise users in terms of their potential for violent crime. Using the COPS database of NSW Police, we will explore how the past criminal record can be interpreted to yield a category of potential danger for each dating app user. It is up to the judgement of each subscriber on what degree of the potential danger they are prepared to enter into. Sentiment analysis is an area where research into natural language processing has made great progress over the last decade. This research will investigate how sentiment analysis can be used to interpret interchanges between dating app users to detect manipulative or coercive sentiments. These can be used to alert law enforcement if continued for a defined number of communications. One of the potential problems of this approach is the potential prejudice a categorisation can cause. Another drawback is the possibility of misinterpreting communications and involving law enforcement without reason. The approach will be thoroughly tested with cross-checks by human readers who verify both the level of danger predicted by the interpretation of the criminal record and the sentiment detected from personal messages. Even if only a few violent crimes can be prevented, the approach will have a tangible value for real people.
Keywords: Sentiment Analysis, data mining, predictive policing, virtual manipulation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2647203 Missing Link Data Estimation with Recurrent Neural Network: An Application Using Speed Data of Daegu Metropolitan Area
Authors: JaeHwan Yang, Da-Woon Jeong, Seung-Young Kho, Dong-Kyu Kim
Abstract:
In terms of ITS, information on link characteristic is an essential factor for plan or operation. But in practical cases, not every link has installed sensors on it. The link that does not have data on it is called “Missing Link”. The purpose of this study is to impute data of these missing links. To get these data, this study applies the machine learning method. With the machine learning process, especially for the deep learning process, missing link data can be estimated from present link data. For deep learning process, this study uses “Recurrent Neural Network” to take time-series data of road. As input data, Dedicated Short-range Communications (DSRC) data of Dalgubul-daero of Daegu Metropolitan Area had been fed into the learning process. Neural Network structure has 17 links with present data as input, 2 hidden layers, for 1 missing link data. As a result, forecasted data of target link show about 94% of accuracy compared with actual data.Keywords: Data Estimation, link data, machine learning, road network.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15047202 CNet Module Design of IMCS
Authors: Youkyung Park, SeungYup Kang, SungHo Kim, SimKyun Yook
Abstract:
IMCS is Integrated Monitoring and Control System for thermal power plant. This system consists of mainly two parts; controllers and OIS (Operator Interface System). These two parts are connected by Ethernet-based communication. The controller side of communication is managed by CNet module and OIS side is managed by data server of OIS. CNet module sends the data of controller to data server and receives commend data from data server. To minimizes or balance the load of data server, this module buffers data created by controller at every cycle and send buffered data to data server on request of data server. For multiple data server, this module manages the connection line with each data server and response for each request from multiple data server. CNet module is included in each controller of redundant system. When controller fail-over happens on redundant system, this module can provide data of controller to data sever without loss. This paper presents three main features – separation of get task, usage of ring buffer and monitoring communication status –of CNet module to carry out these functions.Keywords: Ethernet communication, DCS, power plant, ring buffer, data integrity
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15667201 A Computational Model for Resolving Pronominal Anaphora in Turkish Using Hobbs- Naïve Algorithm
Authors: Pınar Tüfekçi, Yılmaz Kılıçaslan
Abstract:
In this paper we present a computational model for pronominal anaphora resolution in Turkish. The model is based on Hobbs’ Naїve Algorithm [4, 5, 6], which exploits only the surface syntax of sentences in a given text.
Keywords: Anaphora Resolution, Pronoun Resolution, Syntax based Algorithms, Naїve Algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 32397200 Big Data: Concepts, Technologies and Applications in the Public Sector
Authors: A. Alexandru, C. A. Alexandru, D. Coardos, E. Tudora
Abstract:
Big Data (BD) is associated with a new generation of technologies and architectures which can harness the value of extremely large volumes of very varied data through real time processing and analysis. It involves changes in (1) data types, (2) accumulation speed, and (3) data volume. This paper presents the main concepts related to the BD paradigm, and introduces architectures and technologies for BD and BD sets. The integration of BD with the Hadoop Framework is also underlined. BD has attracted a lot of attention in the public sector due to the newly emerging technologies that allow the availability of network access. The volume of different types of data has exponentially increased. Some applications of BD in the public sector in Romania are briefly presented.
Keywords: Big data, big data Analytics, Hadoop framework, cloud computing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23237199 The Prevalence of Organized Retail Crime in Riyadh, Saudi Arabia
Authors: Saleh Dabil
Abstract:
This study investigates the level of existence of organized retail crime in supermarkets of Riyadh, Saudi Arabia. The store managers, security managers and general employees were asked about the types of retail crimes occur in the stores. Three independent variables were related to the report of organized retail theft. The independent variables are: 1) the supermarket profile (volume, location, standard and type of the store), 2) the social physical environment of the store (maintenance, cleanness and overall organizational cooperation), 3) the security techniques and loss prevention electronics techniques used. The theoretical framework of this study based on the social disorganization theory. This study concluded that the organized retail theft, in specific, organized theft is moderately apparent in Riyadh stores. The general result showed that the environment of the stores has an effect on the prevalence of organized retail theft with relation to the gender of thieves, age groups, working shift, type of stolen items as well as the number of thieves in one case. Among other reasons, some factors of the organized theft are: economic pressure of customers based on the location of the store. The dealing of theft also was investigated to have a clear picture of stores dealing with organized retail theft. The result showed that mostly, thieves sent without any action and sometimes given written warning. Very few cases dealt with by police. There are other factors in the study can be looked up in the text. This study suggests solving the problem of organized theft; first, is "the well distributing of the duties and responsibilities between the employees especially for security purposes". Second "Installation of strong security system" and "Making well-designed store layout". Third is "giving training for general employees" and "to give periodically security skills training of employees". There are other suggestions in the study can be looked up in the text.
Keywords: Organized Crime, Retail, Theft, Loss prevention, Store environment.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23377198 Strengthening Legal Protection of Personal Data through Technical Protection Regulation in Line with Human Rights
Authors: Tomy Prihananto, Damar Apri Sudarmadi
Abstract:
Indonesia recognizes the right to privacy as a human right. Indonesia provides legal protection against data management activities because the protection of personal data is a part of human rights. This paper aims to describe the arrangement of data management and data management in Indonesia. This paper is a descriptive research with qualitative approach and collecting data from literature study. Results of this paper are comprehensive arrangement of data that have been set up as a technical requirement of data protection by encryption methods. Arrangements on encryption and protection of personal data are mutually reinforcing arrangements in the protection of personal data. Indonesia has two important and immediately enacted laws that provide protection for the privacy of information that is part of human rights.
Keywords: Indonesia, protection, personal data, privacy, human rights, encryption.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9907197 Big Brain: A Single Database System for a Federated Data Warehouse Architecture
Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf
Abstract:
Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build both some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and why a custom federated approach was the notable recommendation to go further with the implementation. Although the data warehouses are logically federated, the implementation uses a single database system which presented many advantages like: cost reduction and improved data access to global users allowing consumers of the data to have a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.Keywords: Data integration, data warehousing, federated architecture, online analytical processing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7117196 An In-Depth Analysis of Open Data Portals as an Emerging Public E-Service
Authors: Martin Lnenicka
Abstract:
Governments collect and produce large amounts of data. Increasingly, governments worldwide have started to implement open data initiatives and also launch open data portals to enable the release of these data in open and reusable formats. Therefore, a large number of open data repositories, catalogues and portals have been emerging in the world. The greater availability of interoperable and linkable open government data catalyzes secondary use of such data, so they can be used for building useful applications which leverage their value, allow insight, provide access to government services, and support transparency. The efficient development of successful open data portals makes it necessary to evaluate them systematic, in order to understand them better and assess the various types of value they generate, and identify the required improvements for increasing this value. Thus, the attention of this paper is directed particularly to the field of open data portals. The main aim of this paper is to compare the selected open data portals on the national level using content analysis and propose a new evaluation framework, which further improves the quality of these portals. It also establishes a set of considerations for involving businesses and citizens to create eservices and applications that leverage on the datasets available from these portals.
Keywords: Big data, content analysis, criteria comparison, data quality, open data, open data portals, public sector.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 30837195 File System-Based Data Protection Approach
Authors: Jaechun No
Abstract:
As data to be stored in storage subsystems tremendously increases, data protection techniques have become more important than ever, to provide data availability and reliability. In this paper, we present the file system-based data protection (WOWSnap) that has been implemented using WORM (Write-Once-Read-Many) scheme. In the WOWSnap, once WORM files have been created, only the privileged read requests to them are allowed to protect data against any intentional/accidental intrusions. Furthermore, all WORM files are related to their protection cycle that is a time period during which WORM files should securely be protected. Once their protection cycle is expired, the WORM files are automatically moved to the general-purpose data section without any user interference. This prevents the WORM data section from being consumed by unnecessary files. We evaluated the performance of WOWSnap on Linux cluster.Keywords: Data protection, Protection cycle, WORM
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16817194 Performance Analysis of Search Medical Imaging Service on Cloud Storage Using Decision Trees
Authors: González A. Julio, Ramírez L. Leonardo, Puerta A. Gabriel
Abstract:
Telemedicine services use a large amount of data, most of which are diagnostic images in Digital Imaging and Communications in Medicine (DICOM) and Health Level Seven (HL7) formats. Metadata is generated from each related image to support their identification. This study presents the use of decision trees for the optimization of information search processes for diagnostic images, hosted on the cloud server. To analyze the performance in the server, the following quality of service (QoS) metrics are evaluated: delay, bandwidth, jitter, latency and throughput in five test scenarios for a total of 26 experiments during the loading and downloading of DICOM images, hosted by the telemedicine group server of the Universidad Militar Nueva Granada, Bogotá, Colombia. By applying decision trees as a data mining technique and comparing it with the sequential search, it was possible to evaluate the search times of diagnostic images in the server. The results show that by using the metadata in decision trees, the search times are substantially improved, the computational resources are optimized and the request management of the telemedicine image service is improved. Based on the experiments carried out, search efficiency increased by 45% in relation to the sequential search, given that, when downloading a diagnostic image, false positives are avoided in management and acquisition processes of said information. It is concluded that, for the diagnostic images services in telemedicine, the technique of decision trees guarantees the accessibility and robustness in the acquisition and manipulation of medical images, in improvement of the diagnoses and medical procedures in patients.
Keywords: Cloud storage, decision trees, diagnostic image, search, telemedicine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9507193 Landscape Data Transformation: Categorical Descriptions to Numerical Descriptors
Authors: Dennis A. Apuan
Abstract:
Categorical data based on description of the agricultural landscape imposed some mathematical and analytical limitations. This problem however can be overcome by data transformation through coding scheme and the use of non-parametric multivariate approach. The present study describes data transformation from qualitative to numerical descriptors. In a collection of 103 random soil samples over a 60 hectare field, categorical data were obtained from the following variables: levels of nitrogen, phosphorus, potassium, pH, hue, chroma, value and data on topography, vegetation type, and the presence of rocks. Categorical data were coded, and Spearman-s rho correlation was then calculated using PAST software ver. 1.78 in which Principal Component Analysis was based. Results revealed successful data transformation, generating 1030 quantitative descriptors. Visualization based on the new set of descriptors showed clear differences among sites, and amount of variation was successfully measured. Possible applications of data transformation are discussed.Keywords: data transformation, numerical descriptors, principalcomponent analysis
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15077192 Lexical Based Method for Opinion Detection on Tripadvisor Collection
Authors: Faiza Belbachir, Thibault Schienhinski
Abstract:
The massive development of online social networks allows users to post and share their opinions on various topics. With this huge volume of opinion, it is interesting to extract and interpret these information for different domains, e.g., product and service benchmarking, politic, system of recommendation. This is why opinion detection is one of the most important research tasks. It consists on differentiating between opinion data and factual data. The difficulty of this task is to determine an approach which returns opinionated document. Generally, there are two approaches used for opinion detection i.e. Lexical based approaches and Machine Learning based approaches. In Lexical based approaches, a dictionary of sentimental words is used, words are associated with weights. The opinion score of document is derived by the occurrence of words from this dictionary. In Machine learning approaches, usually a classifier is trained using a set of annotated document containing sentiment, and features such as n-grams of words, part-of-speech tags, and logical forms. Majority of these works are based on documents text to determine opinion score but dont take into account if these texts are really correct. Thus, it is interesting to exploit other information to improve opinion detection. In our work, we will develop a new way to consider the opinion score. We introduce the notion of trust score. We determine opinionated documents but also if these opinions are really trustable information in relation with topics. For that we use lexical SentiWordNet to calculate opinion and trust scores, we compute different features about users like (numbers of their comments, numbers of their useful comments, Average useful review). After that, we combine opinion score and trust score to obtain a final score. We applied our method to detect trust opinions in TRIPADVISOR collection. Our experimental results report that the combination between opinion score and trust score improves opinion detection.Keywords: Tripadvisor, Opinion detection, SentiWordNet, trust score.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7527191 A Survey of Semantic Integration Approaches in Bioinformatics
Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir
Abstract:
Technological advances of computer science and data analysis are helping to provide continuously huge volumes of biological data, which are available on the web. Such advances involve and require powerful techniques for data integration to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information-integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, we focus on those developed in the ontology community.Keywords: Semantic data integration, biological ontology, linked data, semantic web, OWL, RDF.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18197190 Smartphone Photography in Urban China
Authors: Wen Zhang
Abstract:
The smartphone plays a significant role in media convergence, and smartphone photography is reconstructing the way we communicate and think. This article aims to explore the smartphone photography practices of urban Chinese smartphone users and images produced by smartphones from a techno-cultural perspective. The analysis consists of two types of data: One is a semi-structured interview of 21 participants, and the other consists of the images created by the participants. The findings are organised in two parts. The first part summarises the current tendencies of capturing, editing, sharing and archiving digital images via smartphones. The second part shows that food and selfie/anti-selfie are the preferred subjects of smartphone photographic images from a technical and multi-purpose perspective and demonstrates that screenshots and image texts are new genres of non-photographic images that are frequently made by smartphones, which contributes to improving operational efficiency, disseminating information and sharing knowledge. The analyses illustrate the positive impacts between smartphones and photography enthusiasm and practices based on the diffusion of innovation theory, which also makes us rethink the value of photographs and the practice of ‘photographic seeing’ from the screen itself.
Keywords: Digital photography, photographic-seeing, media convergence, technological innovation, smartphone, selfie/anti-selfie, image-text.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1672