Search results for: source documents
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5495

5465 Precarious ID Cards - Studying Documentary Practices in India through the Lens of Internal Migration

Authors: Ambuja Raj

Abstract:

This research will attempt to understand how documents are materially indispensable civic artifacts for migrants in their encounters with the state. Documents such as ID cards are sites of mediation and bureaucratic manifestation that reveal the inherent dynamics of power between the state and a delocalized people. While ID cards allow the holder to retain a distinct identity and articulate their demands as a citizen, they at the same time transform subjects into ‘objects’ in the exercise of governmental power. The research is based on the study of internal migrants in India, who are ‘visible’ to the state through its host of ID documents such as the ‘Aadhaar card’, electoral IDs, ration cards, and a variety of region-specific documents, without which they are not only unable to access jobs, public goods and services, and accommodation, but are also liable to exploitation by state forces and mediators. Through semi-structured interviews with social actors involved in the documentation and welfare of migrants, as well as in migrant settlements in the state of Kerala in India, the thesis will attempt to understand the salience of documentary practices in the lives of inter-state migrants who move between Indian states in the hope of bettering their economic conditions. The research will trace the material and evolving significance of ID cards in the efforts of states to deal with these ‘illegible’ populations. It will try to bring theories of governmentality, biopolitics, and Weberian bureaucracy to bear on the migrant issue while critically grounding itself in secondary literature by scholars who have worked on South Asian ‘governments of paper’.

Keywords: migration, historiography of documents, anthropology of state, documentary practices

Procedia PDF Downloads 188
5464 Documents Emotions Classification Model Based on TF-IDF Weighting Measure

Authors: Amr Mansour Mohsen, Hesham Ahmed Hassan, Amira M. Idrees

Abstract:

Emotion classification of text documents is applied to reveal whether a document expresses a particular emotion of its writer. While various supervised methods have previously been used for emotion classification of documents, this research presents a novel model that supports the classification algorithms with the TF-IDF weighting measure for more accurate results. Different experiments have been conducted to demonstrate the applicability of the proposed model. The model succeeds in raising accuracy on the chosen metrics (precision, recall, and f-measure) by refining the lexicon, integrating lexicons from different perspectives, and applying the TF-IDF weighting measure to the classifying features. The proposed model has also been compared with other research to demonstrate its competence in improving the accuracy of results.
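
As an illustration of the core idea, here is a minimal sketch (not the authors' exact pipeline, which includes lexicon refinement and integration steps omitted here), assuming scikit-learn: TF-IDF weights the classifying features before a supervised classifier is trained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus with emotion labels (hypothetical data for illustration).
docs = ["I am so happy today", "This is terrifying",
        "What a joyful surprise", "I feel scared and alone"]
labels = ["joy", "fear", "joy", "fear"]

# TF-IDF down-weights terms common to all documents, so the classifier
# focuses on emotionally discriminative terms.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)
print(model.predict(["so happy and joyful"]))  # expected: ['joy']
```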

Keywords: emotion detection, TF-IDF, WEKA tool, classification algorithms

Procedia PDF Downloads 483
5463 An Approach of Computer Modalities for Exploration of Hieroglyphics Substantial in an Investigation

Authors: Aditi Chauhan, Neethu S. Mohan

Abstract:

In the modern era, technological advances and digitalization have transformed crime scene investigation, and rapidly improving investigative techniques have changed the means of identifying suspects. Identification of the person is one of the most significant aspects of forensic work, and personal authentication is the key to security and reliability in society. Since the early 1990s, examiners have relied on comparing handwriting through its class and individual characteristics, but in the 21st century more reliable means of identifying individuals through handwriting are needed. Approaches employing computational modalities have lately proved promising for examining handwriting evidence in an investigation. Various software systems such as FISH, WRITEON, PIKASO, and CEDAR-FOX identify and verify an associated quantitative measure of the similarity between two samples. Research to date has been confined to identifying the authorship of the samples concerned, but computational modalities also hold promise for identifying disguised, forged, altered, or otherwise modified writing. Given the applications of such models, this line of work is likely to attract a wealth of research in the near future. It also has a promising role in national security: documents exchanged among terrorists can be brought under surveillance, revealing their source.
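
To make the notion of a quantitative similarity measure concrete, here is a minimal sketch (not any of the named systems; the feature values and feature set are hypothetical) comparing two handwriting samples via cosine similarity.

```python
import numpy as np

# Hypothetical quantitative handwriting features (slant angle, letter height,
# stroke width, pen pressure) extracted from two writing samples.
sample_a = np.array([12.0, 3.1, 0.8, 0.62])
sample_b = np.array([11.5, 3.0, 0.9, 0.60])

# Cosine similarity as one simple quantitative measure of closeness;
# a score near 1.0 suggests the samples may share a writer.
similarity = sample_a @ sample_b / (np.linalg.norm(sample_a) * np.linalg.norm(sample_b))
print(f"similarity score: {similarity:.3f}")
```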

Keywords: documents, identity, computational system, suspect

Procedia PDF Downloads 176
5462 The Exploration Targets of the Nanpu Sag: Insight from Organic Geochemical Characteristics of Source Rocks and Oils

Authors: Lixin Pei, Zhilong Huang, Wenzhe Gang

Abstract:

The organic geochemistry of source rocks and oils in the Nanpu Sag, Bohai Bay Basin, was studied on the basis of Rock-Eval and biomarker results. The possible source rocks comprise the third member (Es₃) and the first member (Es₁) of the Shahejie Formation and the third member of the Dongying Formation (Ed₃) in the Nanpu Sag. The Es₃, Es₁, and Ed₃ source rock intervals all have high organic-matter richness and are at the hydrocarbon-generating stage, so they are regarded as effective source rocks. The three intervals have different biomarker associations and can be differentiated by gammacerane/αβ C₃₀ hopane, ETR ([C₂₈+C₂₉]/[C₂₈+C₂₉+Ts]), C₂₇ diasterane/sterane, and C₂₇/C₂₉ sterane ratios, which suggests they were deposited in different environments. Based on the oil-source rock correlation, the shallow oils mainly originated from the Es₃ and Es₁ source rocks in the Nanpu Sag. Considering the hydrocarbon generation and expulsion history of the source rocks, the trap development history, and the accumulation history, the shallow oils mainly migrated from paleo-reservoirs in the Es₃ and Es₁ during the Neotectonic period, so the residual paleo-reservoirs in the Es₃ and Es₁ would be the key exploration targets in the Nanpu Sag, Bohai Bay Basin.
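
The discriminating ratios named above are simple arithmetic over measured peak areas; the sketch below computes two of them with hypothetical values (real work would use peak areas from gas chromatography-mass spectrometry).

```python
# Peak areas (hypothetical values) for biomarker compounds used to
# differentiate the source rock intervals.
c28, c29, ts = 150.0, 210.0, 95.0          # C28, C29 tricyclic terpanes; Ts
gammacerane, ab_c30_hopane = 40.0, 320.0

etr = (c28 + c29) / (c28 + c29 + ts)       # ETR = [C28+C29]/[C28+C29+Ts]
gammacerane_index = gammacerane / ab_c30_hopane
print(f"ETR = {etr:.2f}, gammacerane/ab-C30 hopane = {gammacerane_index:.2f}")
```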

Keywords: source rock, biomarker association, Nanpu Sag, Bohai Bay Basin

Procedia PDF Downloads 373
5461 Text Localization in Fixed-Layout Documents Using Convolutional Networks in a Coarse-to-Fine Manner

Authors: Beier Zhu, Rui Zhang, Qi Song

Abstract:

Text contained within fixed-layout documents such as ID cards, invoices, cheques, and passports can be of great semantic value and so requires high localization accuracy. Recently, algorithms based on deep convolutional networks have achieved high performance on text detection tasks. However, for text localization in fixed-layout documents, such algorithms detect word bounding boxes individually and thus ignore the layout information. This paper presents a novel architecture built on convolutional neural networks (CNNs). A global text localization network and a regional bounding-box regression network are introduced to tackle the problem in a coarse-to-fine manner. The text localization network simultaneously locates word bounding points, taking the layout information into account. The bounding-box regression network takes features pooled from arbitrarily sized RoIs as input and refines the localizations. The two networks share their convolutional features and are trained jointly. A typical type of fixed-layout document, the ID card, is selected to evaluate the effectiveness of the proposed system. The networks are trained on data cropped from natural scene images and on synthetic data produced by a synthetic text generation engine. Experiments show that our approach locates word bounding boxes with high accuracy and achieves state-of-the-art performance.
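
A minimal sketch of the coarse-to-fine pattern, assuming PyTorch and torchvision (not the authors' exact architecture, whose heads and training losses differ): a shared backbone feeds a global coarse head and a per-RoI refinement head.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

backbone = nn.Sequential(                      # shared convolutional features
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)
coarse_head = nn.Conv2d(32, 4, 1)              # coarse per-pixel box offsets
refine_head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 7 * 7, 4))

img = torch.randn(1, 3, 128, 256)              # a fixed-layout document image
feat = backbone(img)

# Stage 1 (global): coarse localization over the whole layout.
coarse = coarse_head(feat)                     # shape (1, 4, H, W)

# Stage 2 (regional): refine a candidate word box using features pooled
# from its RoI; both heads share the backbone features.
rois = torch.tensor([[0, 20.0, 30.0, 90.0, 60.0]])   # (batch_idx, x1, y1, x2, y2)
pooled = roi_align(feat, rois, output_size=(7, 7))
deltas = refine_head(pooled)                   # refined box adjustments
print(coarse.shape, deltas.shape)
```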

Keywords: bounding box regression, convolutional networks, fixed-layout documents, text localization

Procedia PDF Downloads 194
5460 Investigation of Topic Modeling-Based Semi-Supervised Interpretable Document Classifier

Authors: Dasom Kim, William Xiu Shun Wong, Yoonjin Hyun, Donghoon Lee, Minji Paek, Sungho Byun, Namgyu Kim

Abstract:

There has been much research on document classification aimed at classifying voluminous document collections automatically. Through document classification, a specific category can be assigned to each unlabeled document on the basis of various machine learning algorithms. However, providing labeled documents manually requires considerable time and effort. To overcome this limitation, semi-supervised learning, which uses unlabeled as well as labeled documents, was introduced. However, traditional document classifiers, whether supervised or semi-supervised, cannot sufficiently explain the reason for or the process of the classification. Thus, in this paper, we propose a methodology to visualize the major topics and class components of each document. We believe that visualizing the topics and classes of each document can enhance the reliability and explanatory power of document classifiers.
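
A minimal sketch of the underlying idea, assuming scikit-learn (not the authors' exact method): topic proportions serve both as classification features and as a per-document explanation of the assigned class.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

docs = ["stock market trading shares", "football match goal team",
        "bank interest rate loan", "tennis player wins title"]
labels = ["finance", "sports", "finance", "sports"]

# Topic features make the classifier's decision inspectable per document.
vec = CountVectorizer()
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(X)                  # document-topic proportions

clf = LogisticRegression().fit(topics, labels)
print("topic mixture of doc 0:", topics[0])    # the 'explanation' for its class
print("predicted:", clf.predict(topics[:1]))
```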

Keywords: data mining, document classifier, text mining, topic modeling

Procedia PDF Downloads 402
5459 The Passive Recipient – How the Pupil Comes across in Local Swedish Health Policy Documents

Authors: Zofia Hammerin, Goran Basic, Disa Bergnehr

Abstract:

Ever since the Ottawa Charter in 1986, health promotion through schools has been stressed across the globe. In both global and national discourse, schools are made responsible not only for providing education but also for working with pupil health and well-being. In Sweden, where the study is set, national directives emphasize that promoting pupil health should be part of school practice. Since the Swedish school system is decentralized, these directives need to be interpreted and recontextualized locally. This study aims to explore how the pupil comes across in Swedish local health policy documents. The data consist of 37 such documents, called student health plans, collected from different high schools throughout Sweden. The analysis was inspired by critical discourse analysis, and tentative results fall into two main themes: the invisible actor and the passive recipient. The pupil is largely invisible in the documents, and the discourse instead focuses on school health service staff and, to some extent, the teachers. When the pupils are visible, they mainly come across as passive recipients of health-promoting actions. Since participation, taking action, and feeling empowered are key aspects of health promotion, the findings could affect pupils' possibilities for health and well-being.

Keywords: health promotion, high school, student, Sweden

Procedia PDF Downloads 101
5458 An Introductory Study on Optimization Algorithm for Movable Sensor Network-Based Odor Source Localization

Authors: Yossiri Ariyakul, Piyakiat Insom, Poonyawat Sangiamkulthavorn, Takamichi Nakamoto

Abstract:

In this paper, an optimization-algorithm-based method for odor source localization using a network of movable sensor nodes is proposed. A sensor node is composed of an odor sensor, an anemometer, and a wireless communication module. The odor intensities measured by the sensor nodes are sent to a processor, which performs the localization with an optimization algorithm and produces an odor source localization map as the result. The map can represent the exact position of the odor source or indicate the direction toward it remotely. The proposed method was experimentally validated by creating the odor source localization map using three, four, and five sensor nodes, from which the accuracy of predicting the position of the odor source can be observed.
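
A minimal sketch of the optimization step, assuming NumPy/SciPy and a simple inverse-square dispersion model (the paper's actual model and algorithm may differ): fit the source position and strength to the node readings by nonlinear least squares.

```python
import numpy as np
from scipy.optimize import least_squares

# Sensor node positions (m) and measured odor intensities (hypothetical values).
nodes = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
readings = np.array([2.3, 0.9, 0.9, 0.45])

# Inverse-square dispersion model: intensity = strength / (distance^2 + eps).
def residuals(p):
    x, y, strength = p
    d2 = ((nodes - [x, y]) ** 2).sum(axis=1)
    return strength / (d2 + 1e-6) - readings

fit = least_squares(residuals, x0=[2.0, 1.0, 1.0])
print("estimated source position:", fit.x[:2])
```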

Keywords: odor sensor, odor source localization, optimization, sensor network

Procedia PDF Downloads 299
5457 Forensic Challenges in Source Device Identification for Digital Videos

Authors: Mustapha Aminu Bagiwa, Ainuddin Wahid Abdul Wahab, Mohd Yamani Idna Idris, Suleman Khan

Abstract:

Video source device identification has become a problem of concern in numerous domains, especially multimedia security and digital investigation, because videos are now used as evidence in legal proceedings. Source device identification aims to identify the device that produced a piece of digital content from the content itself. However, due to affordable processing tools and the influx of digital content-generating devices, source device identification remains a major problem within the digital forensic community. In this paper, we discuss source device identification for digital videos by surveying techniques proposed in the literature for model-level or device-specific identification, with the aim of identifying salient open challenges for future research.

Keywords: video forgery, source camcorder, device identification, forgery detection

Procedia PDF Downloads 631
5456 Authentication of Physical Objects with Dot-Based 2D Code

Authors: Michał Glet, Kamil Kaczyński

Abstract:

Counterfeit goods and documents are a global problem that calls for increasingly sophisticated countermeasures. Existing techniques using watermarking or embedding symbols on objects are not suitable for all use cases. To address those special needs, we created a complete system for authenticating paper documents and physical objects with flat surfaces. Objects are marked with 2D graphic codes, named DotAuth, that are orientation-independent and resistant to camera noise. Based on the identifier stored in the 2D code, the system performs basic authentication and supports more sophisticated analysis methods, e.g., relying on augmented reality and the physical properties of the object. In this paper, we present the complete architecture, algorithms, and applications of the proposed system. A feature comparison of the proposed solution with other products is also presented, pointing to several advantages that increase usability and efficiency in protecting physical objects.

Keywords: anti-forgery, authentication, paper documents, security

Procedia PDF Downloads 133
5455 A Social Identity Analysis of Ottoman and Safavid Architects in the Historical Documents of the 16th to 17th Centuries

Authors: Farzaneh Farrokhfar, Mohammad Khazaie

Abstract:

The 16th and 17th centuries coincide with the classical age of Ottoman art history. Simultaneously, in the eastern neighborhood of the Ottoman state, the Safavid Shiite state emerged, which, despite political and religious differences with the Ottomans, played an important role in cultural and artistic exchanges with Anatolia. The harmony of the arts, including architecture, is one of the most important manifestations of this cultural exchange and shows the intellectual commonalities of the two regions. In parallel with the production of works of art, many historians and biographers recorded the identities of Ottoman and Safavid artists and craftsmen, and some of these records, fortunately, are available to us today and can be evaluated. This research first reads the historical documents and reports related to the architects of the Ottoman state in Anatolia and the Safavid state in Iran in the 16th and 17th centuries, and then examines how architects' information was recorded and where it is located in the two regions. The results reveal the names and identities of some Ottoman and Safavid architects of the 16th and 17th centuries and show the methods of recording information in the documents of the two regions. The research follows a comparative historical method, and its sources were collected through library and documentary research.

Keywords: classical era, Ottoman architecture, Safavid architecture, Central Asian historical documents

Procedia PDF Downloads 129
5454 Using Closed Frequent Itemsets for Hierarchical Document Clustering

Authors: Cheng-Jhe Lee, Chiun-Chieh Hsu

Abstract:

Due to the rapid development of the Internet and the increased availability of digital documents, the excess of information on the Internet has led to the problem of information overload. To enable effective information retrieval, document clustering in text mining has become a popular research topic. Clustering is the unsupervised classification of data items into groups without the need for training data. Many conventional document clustering methods perform inefficiently on large document collections because they were originally designed for relational databases; they are impractical for real-world document clustering and require special handling for high dimensionality and high volume. We build on FIHC (Frequent Itemset-based Hierarchical Clustering), a hierarchical clustering method developed for document clustering whose intuition is that the documents of a cluster share some common words. FIHC uses such words to cluster documents and builds a hierarchical topic tree. In this paper, we combine the FIHC algorithm with an ontology to address the semantic problem and mine the meaning behind the words in documents. Furthermore, we use closed frequent itemsets instead of all frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more accurate than well-known document clustering algorithms.
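
A minimal sketch of the closed-frequent-itemset idea in plain Python (toy data; a real implementation would use an algorithm such as Apriori or FP-growth rather than brute-force enumeration): an itemset is closed when no proper superset has the same support.

```python
from itertools import combinations

# Toy 'documents' represented as sets of terms.
docs = [{"data", "mining", "cluster"}, {"data", "mining"},
        {"data", "cluster"}, {"data", "mining", "cluster"}]
min_support = 2

# Count support for every frequent candidate itemset.
items = sorted(set().union(*docs))
support = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        s = sum(1 for d in docs if set(cand) <= d)
        if s >= min_support:
            support[frozenset(cand)] = s

# Keep only closed itemsets: no proper superset with equal support.
closed = [(set(i), s) for i, s in support.items()
          if not any(i < j and s == t for j, t in support.items())]
print(closed)   # e.g., {'data'}:4, {'data','mining'}:3, {'data','cluster'}:3, ...
```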

Keywords: FIHC, documents clustering, ontology, closed frequent itemset

Procedia PDF Downloads 399
5453 A Quantitative Evaluation of Text Feature Selection Methods

Authors: B. S. Harish, M. B. Revanasiddappa

Abstract:

Due to the rapid growth of text documents in digital form, automated text classification has become an important research area in the last two decades. The major challenges of text document representation are high dimensionality, sparsity, volume, and semantics. Since terms are the only features that can be found in documents, the selection of good terms (features) plays a very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency, and accuracy. In this paper, we present a quantitative analysis of the most widely used feature selection (FS) methods, viz. Term Frequency-Inverse Document Frequency (tf-idf), Mutual Information (MI), Information Gain (IG), Chi-Square (χ²), Term Frequency-Relevance Frequency (tfrf), Term Strength (TS), Ambiguity Measure (AM), and Symbolic Feature Selection (SFS), for classifying text documents. We evaluated all the feature selection methods on standard datasets like 20 Newsgroups, the 4 Universities dataset, and Reuters-21578.
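
As a concrete illustration of two of the criteria above, the sketch below ranks terms by Chi-Square and by mutual information on 20 Newsgroups, assuming scikit-learn (and a network connection to fetch the corpus on first run):

```python
from sklearn.datasets import fetch_20newsgroups   # one of the datasets named above
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

train = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
X = TfidfVectorizer(max_features=5000).fit_transform(train.data)

# Keep the 100 terms that score highest under each selection criterion.
X_chi2 = SelectKBest(chi2, k=100).fit_transform(X, train.target)
X_mi = SelectKBest(mutual_info_classif, k=100).fit_transform(X, train.target)
print(X.shape, "->", X_chi2.shape, X_mi.shape)
```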

Keywords: classifiers, feature selection, text classification

Procedia PDF Downloads 458
5452 Calculation of Detection Efficiency of Horizontal Large Volume Source Using Exvol Code

Authors: M. Y. Kang, Euntaek Yoon, H. D. Choi

Abstract:

To calculate the full-energy (FE) absorption peak efficiency for an arbitrary volume sample, we developed and verified the EXVol (Efficiency calculator for EXtended Voluminous source) code, which is based on the effective solid angle method. EXVol can describe the source as a non-uniform three-dimensional (x, y, z) source and decompose it into several sets of volume units. Users can divide the (x, y, z) coordinate system equally to calculate the detection efficiency at a specific position of a cylindrical volume source. By determining the detection efficiency for differential volume units, the absolute distribution of total radiation and the correction factor of the detection efficiency can be obtained from a nondestructive measurement of the source. To check the performance of the EXVol code, a Si ingot 20 cm in diameter and 50 cm in height was used as the source. The detector was moved in a collimated geometry to calculate the detection efficiency at a specific position, and the results were compared with experimental values. In this study, the performance of the EXVol code was extended to obtain the detection efficiency distribution at a specific position in a large volume source.
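
The voxel-decomposition idea can be illustrated numerically. The sketch below (NumPy, with assumed geometry and attenuation values, and a much cruder model than EXVol's effective solid angle treatment) sums per-voxel contributions over an equally divided cylinder:

```python
import numpy as np

R, H = 10.0, 50.0                   # cylinder radius and height, cm
det = np.array([0.0, 0.0, -15.0])   # detector position below the cylinder, cm
mu = 0.05                           # linear attenuation coefficient, 1/cm (assumed)

# Divide the (x, y, z) coordinate system equally into volume units.
xs = ys = np.linspace(-R, R, 25)
zs = np.linspace(0.0, H, 50)
X, Y, Z = np.meshgrid(xs, ys, zs, indexing="ij")
inside = X**2 + Y**2 <= R**2        # keep voxels inside the cylinder

# Per-voxel contribution: crude self-attenuation (path ~ depth Z) times
# inverse-square geometry toward the detector.
r = np.sqrt((X - det[0])**2 + (Y - det[1])**2 + (Z - det[2])**2)
contrib = np.exp(-mu * Z) / r**2
efficiency = contrib[inside].sum() / inside.sum()
print(f"relative efficiency estimate: {efficiency:.3e}")
```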

Keywords: attenuation, EXVol, detection efficiency, volume source

Procedia PDF Downloads 185
5451 Statistical Discrimination of Blue Ballpoint Pen Inks by Diamond Attenuated Total Reflectance (ATR) FTIR

Authors: Mohamed Izzharif Abdul Halim, Niamh Nic Daeid

Abstract:

Determining the source of the pen inks used on a variety of documents is an important task for forensic document examiners. The examination of inks is often performed to differentiate between them in order to evaluate the authenticity of a document. A ballpoint pen ink consists of synthetic dyes (acidic and/or basic), pigments (organic and/or inorganic), and a range of additives. Inks of similar color may differ in composition and are frequently the subject of forensic examinations. This study focuses on the blue ballpoint pen inks available on the market, since approximately 80% of questioned document analyses reportedly involve ballpoint pen ink. Analytical techniques such as thin layer chromatography, high-performance liquid chromatography, UV-Vis spectroscopy, luminescence spectroscopy, and infrared spectroscopy have been used in the analysis of ink samples. In this study, Diamond Attenuated Total Reflectance (ATR) FTIR is applied; it is straightforward and preferable in forensic science as it requires no sample preparation and minimal analysis time. The data obtained were further analyzed using multivariate chemometric methods, which enable the extraction of more information based on the similarities and differences among samples in a dataset. The results indicate that some pens from the same manufacturer can be similar in composition, whereas distinct types can differ significantly.
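
A minimal sketch of the chemometric step, assuming scikit-learn and synthetic stand-in spectra (real work would use measured ATR-FTIR spectra): PCA projects the spectra into a low-dimensional space where compositionally similar inks cluster.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for ATR-FTIR spectra: 12 inks x 500 wavenumber points.
rng = np.random.default_rng(0)
brand_a = rng.normal(0.5, 0.05, (6, 500))      # six pens of one composition
brand_b = rng.normal(0.7, 0.05, (6, 500))      # six pens of another
spectra = np.vstack([brand_a, brand_b])

# Project the spectra onto two principal components for visual grouping.
scores = PCA(n_components=2).fit_transform(spectra)
print(scores[:3])   # inks with similar composition cluster together in PC space
```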

Keywords: ATR FTIR, ballpoint, multivariate chemometric, PCA

Procedia PDF Downloads 457
5450 Exploring Social Impact of Emerging Technologies from Futuristic Data

Authors: Heeyeul Kwon, Yongtae Park

Abstract:

Despite their highly touted benefits, emerging technologies have unleashed pervasive concerns regarding unintended and unforeseen social impacts. Those wishing to create safe and socially acceptable products therefore need to identify such side effects and mitigate them prior to market proliferation. Various methodologies in the field of technology assessment (TA), namely Delphi, impact assessment, and scenario planning, have been widely employed in such circumstances. However, the literature faces a major limitation in its sole reliance on participatory workshop activities, which misses a massive untapped source of futuristic information flooding through the Internet. This research therefore seeks insight into the use of futuristic data, future-oriented documents from the Internet, as a supplementary method for generating social impact scenarios while capturing the perspectives of experts from a wide variety of disciplines. To this end, network analysis is conducted on the social keywords extracted from the futuristic documents by text mining, and the resulting network is used as a guide to produce a comprehensive set of detailed scenarios. Our proposed approach facilitates harmonized depictions of the possible hazardous consequences of emerging technologies and thereby makes decision makers more aware of, and responsive to, broad qualitative uncertainties.
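
A minimal sketch of the network analysis step, assuming NetworkX and hypothetical extracted keywords: build a keyword co-occurrence graph from the documents and rank candidate scenario seeds by centrality.

```python
import itertools
import networkx as nx

# Social keywords extracted (hypothetically) from future-oriented documents.
doc_keywords = [
    ["privacy", "surveillance", "drones"],
    ["privacy", "unemployment", "automation"],
    ["surveillance", "drones", "regulation"],
]

# Keyword co-occurrence network; edge weight counts shared documents.
G = nx.Graph()
for kws in doc_keywords:
    for a, b in itertools.combinations(sorted(kws), 2):
        w = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)

# Central keywords can seed the detailed social impact scenarios.
print(sorted(nx.degree_centrality(G).items(), key=lambda kv: -kv[1])[:3])
```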

Keywords: emerging technologies, futuristic data, scenario, text mining

Procedia PDF Downloads 491
5449 Sentiment Classification of Documents

Authors: Swarnadip Ghosh

Abstract:

Sentiment analysis is the process of detecting the contextual polarity of text; in other words, it determines whether a piece of writing is positive, negative, or neutral. Sentiment analysis of documents holds great importance in today's world, when vast amounts of information are stored in databases and on the World Wide Web. An efficient algorithm to elicit such information would be beneficial for social, economic, and medical purposes. In this project, we developed an algorithm to classify a document as positive or negative. Using our algorithm, we obtained a feature set from the data and classified the documents based on this feature set. It is important to note that, in the classification, we did not use the independence assumption adopted by many procedures such as Naive Bayes, which makes the algorithm more general in scope. Moreover, because of the sparsity and high dimensionality of such data, we did not use the empirical distribution for estimation but developed a method based on the degree of close clustering of the data points. We applied our algorithm to a movie review data set obtained from IMDb and obtained satisfactory results.
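
Not the authors' clustering-degree method, but a minimal sketch of the same general setting, assuming scikit-learn: a distance-based classifier that, like the proposed approach, uses whole feature vectors jointly rather than assuming feature independence.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

reviews = ["a wonderful, moving film", "dull plot and terrible acting",
           "brilliant performances throughout", "a boring waste of time"]
labels = ["positive", "negative", "positive", "negative"]

# Nearest-neighbor decisions depend on full vector distances, so no
# per-feature independence assumption (unlike Naive Bayes) is required.
vec = TfidfVectorizer()
X = vec.fit_transform(reviews)
clf = KNeighborsClassifier(n_neighbors=1).fit(X, labels)
print(clf.predict(vec.transform(["terrible and boring"])))  # ['negative']
```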

Keywords: sentiment, Run's Test, cross validation, higher dimensional pmf estimation

Procedia PDF Downloads 402
5448 Analysis of Joint Source Channel LDPC Coding for Correlated Sources Transmission over Noisy Channels

Authors: Marwa Ben Abdessalem, Amin Zribi, Ammar Bouallègue

Abstract:

In this paper, a joint source-channel (JSC) coding scheme based on LDPC codes is investigated. We consider two concatenated LDPC codes: one compresses a correlated source and the second protects it against channel degradations. The original information can be reconstructed at the receiver by a joint decoder, in which the source decoder and the channel decoder run in parallel and exchange extrinsic information. We investigate the performance of the JSC LDPC code in terms of Bit Error Rate (BER) for transmission over an Additive White Gaussian Noise (AWGN) channel and for different source and channel rate parameters. We emphasize how JSC LDPC presents a performance tradeoff depending on the channel state and on the source correlation. We show that JSC LDPC is an efficient solution for relatively low Signal-to-Noise Ratio (SNR) channels, especially with highly correlated sources. Finally, a source-channel rate optimization has to be applied to guarantee the best JSC LDPC system performance for a given channel.
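
For readers new to such evaluations, here is a toy BER-versus-SNR Monte Carlo over an AWGN channel in NumPy; a rate-1/3 repetition code with majority-vote decoding stands in for the LDPC stages, which involve far more elaborate belief-propagation decoding.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bits, rep = 100_000, 3

for snr_db in (0, 2, 4):
    bits = rng.integers(0, 2, n_bits)
    coded = np.repeat(bits, rep)                     # channel coding stage
    symbols = 1 - 2 * coded                          # BPSK mapping: 0 -> +1, 1 -> -1
    sigma = np.sqrt(1 / (2 * 10 ** (snr_db / 10)))   # noise level for this SNR
    received = symbols + rng.normal(0, sigma, symbols.size)
    votes = (received < 0).reshape(-1, rep).sum(axis=1)
    decoded = (votes > rep // 2).astype(int)         # majority-vote decoding
    print(f"SNR {snr_db} dB: BER = {np.mean(decoded != bits):.4f}")
```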

Keywords: AWGN channel, belief propagation, joint source channel coding, LDPC codes

Procedia PDF Downloads 357
5447 Topological Sensitivity Analysis for Reconstruction of the Inverse Source Problem from Boundary Measurement

Authors: Maatoug Hassine, Mourad Hrizi

Abstract:

In this paper, we consider a geometric inverse source problem for the heat equation with Dirichlet and Neumann boundary data. We reconstruct the exact form of the unknown source term from additional boundary conditions. Our motivation is to detect the location, size, and shape of the source support. We present a one-shot algorithm based on the Kohn-Vogelius formulation and the topological gradient method. The geometric inverse source problem is formulated as a topology optimization problem, and a topological sensitivity analysis with respect to the source function is derived. We then present a non-iterative numerical method for the geometric reconstruction of the source term with unknown support, using a level curve of the topological gradient. Finally, we give several examples to show the viability of the presented method.
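
To fix ideas, a standard Kohn-Vogelius-type misfit and its topological expansion take the following form (notation assumed here; the paper's exact functional may differ):

```latex
% u_D and u_N solve the forward heat problem with the Dirichlet and the
% Neumann boundary datum, respectively, for a trial source support \omega.
\[
  \mathcal{J}(\omega) = \int_{\Omega} \left| \nabla u_D(\omega) - \nabla u_N(\omega) \right|^2 \,\mathrm{d}x .
\]
% Inserting a small ball B(x,\varepsilon) into the support perturbs the misfit as
\[
  \mathcal{J}\bigl(\omega \cup B(x,\varepsilon)\bigr) - \mathcal{J}(\omega)
  = \rho(\varepsilon)\, g(x) + o\bigl(\rho(\varepsilon)\bigr),
\]
% so the support can be reconstructed non-iteratively from a level curve
% \{ x : g(x) \le c < 0 \} of the topological gradient g.
```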

Keywords: geometric inverse source problem, heat equation, topological optimization, topological sensitivity, Kohn-Vogelius formulation

Procedia PDF Downloads 300
5446 Psychodidactic Strategies to Facilitate Flow of Logical Thinking in Preparation of Academic Documents

Authors: Deni Stincer Gomez, Zuraya Monroy Nasr, Luis Pérez Alvarez

Abstract:

The preparation of academic documents such as theses, articles, and research projects is one of the requirements of higher education. These documents demand logical argumentative thinking, which is exercised and executed with difficulty. To mitigate these difficulties, this study designed a thesis seminar, with which the authors have seven years of experience; it is taught in a graduate program in psychology at the National Autonomous University of Mexico. In this study, the authors use the Toulmin model as a mental heuristic together with a set of psychodidactic strategies that facilitate the development and completion of the thesis. The rate of obtaining the degree in the groups exposed to the seminar has increased to 94%, compared with the 10% in the cohorts that were not exposed to it. In this article, the authors emphasize the psychodidactic strategies used, since the Toulmin model alone does not guarantee the success achieved: a set of teacher actions of a psychological (almost psychotherapeutic) and didactic nature also seems to contribute. These actions derive from an understanding of the psychological, epistemological, and ontogenetic obstacles, and of the most frequent errors into which thought tends to fall when a logical course is demanded of it. The authors group the strategies into three sets: 1) strategies to facilitate logical thinking, 2) strategies to strengthen the scientific self, and 3) strategies to facilitate the act of writing the text, and in this work they delve into each of them.

Keywords: psychodidactic strategies, logical thinking, academic documents, Toulmin model

Procedia PDF Downloads 179
5445 Performance Analysis of Absorption Power Cycle under Different Source Temperatures

Authors: Kyoung Hoon Kim

Abstract:

The absorption power generation cycle based on the ammonia-water mixture has attracted much attention for the efficient recovery of low-grade energy sources. In this paper, a thermodynamic performance analysis is carried out for a Kalina cycle that uses an ammonia-water mixture as the working fluid to convert a low-temperature heat source in the form of sensible energy. The effects of the source temperature on the system performance are investigated extensively using thermodynamic models. The results show that the source temperature, as well as the ammonia mass fraction, greatly affects the thermodynamic performance of the cycle.

Keywords: ammonia-water mixture, Kalina cycle, low-grade heat source, source temperature

Procedia PDF Downloads 458
5444 Methodologies for Deriving Semantic Technical Information Using an Unstructured Patent Text Data

Authors: Jaehyung An, Sungjoo Lee

Abstract:

Patent documents constitute an up-to-date and reliable source of knowledge reflecting technological advances, so patent analysis has been widely used to identify technological trends and formulate technology strategies. However, identifying technological information from patent data manually entails limitations such as high cost, complexity, and inconsistency, because it relies on expert knowledge. To overcome these limitations, researchers have applied quantitative analysis based on keyword techniques, which can extract keywords that indicate the important contents of a patent document. However, simple keyword-frequency counting cannot take into account the semantic relationships among keywords or semantic information such as how a technology is used in its technology area and how it affects other technologies. To analyze unstructured technological information in patents automatically and extract its semantic content, the text should be transformed into an abstracted form that captures the key technological concepts. The specific sentence structure 'SAO' (subject, action, object) has emerged as a representation of such key concepts and can be extracted by natural language processing (NLP). An SAO structure can be organized in a problem-solution format if the action-object (AO) pair states the problem and the subject (S) forms the solution. In this paper, we propose a new methodology that extracts SAO structures through technical-element extraction rules. Although sentences in patent texts have a distinctive format, prior studies have depended on general NLP tools developed for common documents such as newspapers, research papers, and Twitter mentions, and so cannot account for the specific sentence structures of patent documents. To overcome this limitation, we identified the characteristic forms of patent sentences and defined the SAO structures in patent text data. Four types of technical elements are distinguished: technology adoption purpose, application area, tool for technology, and technical components. Each of these four sentence-structure types has its own specific word structure, given by the location or sequence of the parts of speech in the sentence. Finally, we developed algorithms for extracting SAOs; the results offer insight into the technology innovation process by providing different perspectives on technology.
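
A minimal SAO-extraction sketch over a dependency parse, assuming spaCy with its small English model installed (patent text would additionally need the element-specific rules described above):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The coating layer protects the battery electrode from corrosion.")

# For each verb, pair its subject children with its object children to
# form (subject, action, object) triples.
for token in doc:
    if token.pos_ == "VERB":
        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
        for s in subjects:
            for o in objects:
                print((s.text, token.lemma_, o.text))  # ('layer', 'protect', 'electrode')
```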

Keywords: NLP, patent analysis, SAO, semantic-analysis

Procedia PDF Downloads 262
5443 Requirement Engineering Within Open Source Software Development: A Case Study

Authors: Kars Beek, Remco Groeneveld, Sjaak Brinkkemper

Abstract:

Although much literature is available on requirements documentation in traditional software development, few studies have addressed this topic in open source software development. While open source software development is becoming more important, its development processes are often not as structured as corporate ones: papers show that communities creating open source software often lack structure and documentation. However, most studies on this topic are ten or more years old. This research was therefore conducted to determine whether the lack of structure and documentation in requirements engineering is still the situation in these communities. Three open source products were chosen as subjects for the research. The data were gathered through interviews, observations, and analyses of feature proposals and issue tracking tools. In this paper, we present a comparison and analysis of the different methods used for requirements documentation in order to understand current practices in open source software development.

Keywords: case study, open source software, open source software development, requirement elicitation, requirement engineering

Procedia PDF Downloads 103
5442 UNIX Source Code Leak: Evaluation and Feasible Solutions

Authors: Gu Dongxing, Li Yuxuan, Nong Tengxiao, Burra Venkata Durga Kumar

Abstract:

Since computers are widely used in business, more and more companies choose to store important information on computers to improve productivity. However, this information can be compromised in many situations, such as when it is stored locally on the company's computers or transferred between servers and clients. Among such leaks, source code leaks are probably the most costly: because source code often represents the core technology of a company, especially for Internet companies, its leakage may cause the company's core products to lose market competitiveness and may even lead to bankruptcy. In recent years, large companies such as Microsoft and AMD have experienced source code leakage events and suffered huge losses, which reveals the importance and necessity of preventing source code leakage. This paper aims to find ways of preventing source code leakage from the operating system perspective and, given that most companies use Linux or Linux-like systems to interconnect servers and clients, discusses how to reduce the possibility of source code leakage during data transmission.

Keywords: data transmission, Linux, source code, operating system

Procedia PDF Downloads 270
5441 Ideology Shift in Political Translation

Authors: Jingsong Ma

Abstract:

In political translation, ideology plays an important role in conveying implications accurately. Ideological collisions can occur when there are differences between the political environments embedded in the source and target languages of translingual political texts. Reaching an accurate translation requires the translator to understand the ideologies implied in (and often transcending) the texts. This paper explores the conditions, procedure, and purpose of processing ideological collision and its resolution in political translation, elucidated through case studies of translating English and Chinese political texts. First, certain political environments have specific political terminologies whose peculiarities are often determined by ideological elements rather than by syntactic and semantic understanding. The translation of these ideologically loaded terminologies is a process of understanding the ideological context, including the cultural, historical, and political situation; this is explained with characteristic Chinese political terminologies and their renderings in English. Second, when the ideology in the source language fails to match the ideology in the target language, decisions to highlight or disregard these conflicts are shaped by power relations, political engagement, social context, and the like. It is thus necessary to go beyond linguistic analysis of the context by deciphering the ideology in political documents in order to provide a faithful or equivalent rendering of certain messages. Finally, a practical issue concerns equivalence in political translation: redefining the notion of faithfulness and the retention of the source language's ideological messages in translations of political texts. To avoid distortion, the translator should be freed from the grip of literal meaning and instead engage the functional meaning of the text.

Keywords: translation, ideology, politics, society

Procedia PDF Downloads 111
5440 Mapping of Adrenal Gland Diseases Research in Middle East Countries: A Scientometric Analysis, 2007-2013

Authors: Zahra Emami, Mohammad Ebrahim Khamseh, Nahid Hashemi Madani, Iman Kermani

Abstract:

The aim of this study was to map scientific research on adrenal gland diseases in the Middle East countries through the Web of Science database using scientometric analysis. Data were analyzed with Excel, and HistCite was used for mapping the scientific texts. From a total of 268 retrieved records, 1125 authors from 328 institutions published their texts in 138 journals. Among the 17 Middle East countries, Turkey ranked first with 164 documents (61.19%), Israel ranked second with 47 documents (15.53%), and Iran came third with 26 documents. Most of the publications (185 documents, 69.2%) were articles. Among the universities of the Middle East, Istanbul University had the highest scientific production rate (9.7%). The Journal of Clinical Endocrinology & Metabolism had the highest TGCS (243 citations). In the scientific mapping, 7 clusters were formed based on TLCS (Total Local Citation Score) and TGCS (Total Global Citation Score). Considering the study results, establishing scientific connections and collaboration with other countries, and drawing on publications on adrenal gland diseases from high-ranking universities, can help develop this field and promote medical practice in this regard. Moreover, investigation of the formed clusters in relation to congenital hyperplasia and puberty-related disorders could be a research priority for investigators.

Keywords: mapping, scientific research, adrenal gland diseases, scientometric

Procedia PDF Downloads 273
5439 Study on Acoustic Source Detection Performance Improvement of Microphone Array Installed on Drones Using Blind Source Separation

Authors: Youngsun Moon, Yeong-Ju Go, Jong-Soo Choi

Abstract:

Most drones that currently carry out surveillance and reconnaissance missions are equipped with optical equipment, but a microphone array can also be used to estimate the location of an acoustic source, providing additional information when optical equipment is unavailable. The purpose of this study is to estimate the Direction of Arrival (DOA) of an acoustic source from the drone, based on Time Difference of Arrival (TDOA) estimation. The problem is that the target acoustic source cannot be measured cleanly because of the drone's own noise. To overcome this, the drone noise and the target acoustic source are separated using Blind Source Separation (BSS) based on Independent Component Analysis (ICA). ICA can be performed under the assumptions that the drone noise and the target acoustic source are independent and that each signal is non-Gaussian; to maximize the non-Gaussianity of each signal, we use negentropy and kurtosis from probability theory. As a result, TDOA and DOA estimation of the target source in a noisy environment are improved. We simulated the performance of the DOA algorithm with the BSS algorithm applied and demonstrated it through experiments in an anechoic wind tunnel.
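
A minimal end-to-end sketch of the two steps, assuming NumPy and scikit-learn's FastICA (a stand-in for the negentropy/kurtosis-based ICA described above): separate a synthetic two-channel mixture, then estimate a TDOA by cross-correlation.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Synthetic mixture: a tonal target source plus broadband 'drone noise'.
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
target = np.sign(np.sin(2 * np.pi * 440 * t))       # non-Gaussian target
noise = rng.laplace(size=t.size)                    # non-Gaussian drone noise
A = np.array([[1.0, 0.6], [0.4, 1.0]])              # unknown mixing at two mics
mixed = np.c_[target, noise] @ A.T

# ICA recovers the independent components by maximizing non-Gaussianity;
# component 0 is used here for illustration (which one is the target can be
# checked, e.g., by its spectral content).
sources = FastICA(n_components=2, random_state=0).fit_transform(mixed)

# TDOA between two mic channels via cross-correlation of the separated signal.
lagged = np.roll(sources[:, 0], 5)                  # simulate a 5-sample delay
xcorr = np.correlate(lagged, sources[:, 0], mode="full")
tdoa = (np.argmax(xcorr) - (t.size - 1)) / fs
print(f"estimated TDOA: {tdoa * 1e3:.3f} ms")       # ~5/8000 s = 0.625 ms
```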

Keywords: aeroacoustics, acoustic source detection, time difference of arrival, direction of arrival, blind source separation, independent component analysis, drone

Procedia PDF Downloads 162
5438 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis

Authors: C. B. Le, V. N. Pham

Abstract:

In modern data analysis, multi-source data appears more and more often in real applications, and multi-source data clustering has emerged as an important issue in the data mining and machine learning community. Different data sources provide complementary information about the data, so linking multi-source data is essential to improve clustering performance. In practice, however, multi-source data is often heterogeneous, uncertain, and large, which is considered a major challenge of multi-source data. The ensemble is a versatile machine learning model in which learning techniques can work in parallel on big data, and clustering ensembles have been shown to outperform standard clustering algorithms in terms of accuracy and robustness. However, most traditional clustering ensemble approaches are based on a single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis: the fuzzy optimized multi-objective clustering ensemble method, called FOMOCE. First, a clustering ensemble mathematical model based on the structure of the multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, the clustering algorithms, and the base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. Experiments were performed on standard sample data sets, and the results demonstrate the superior performance of the FOMOCE method compared to existing clustering ensemble methods and multi-source clustering methods.
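
As background for how a clustering ensemble combines base clusterings, here is a minimal co-association sketch (an illustrative baseline, not the FOMOCE algorithm), assuming a recent scikit-learn: base partitions vote on pairwise co-membership, and a consensus clustering is derived from the vote matrix.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

# Base clusterings, e.g., one per data source or per random restart, vote
# on whether each pair of points belongs together.
co = np.zeros((len(X), len(X)))
for seed in range(10):
    labels = KMeans(n_clusters=3, n_init=1, random_state=seed).fit_predict(X)
    co += (labels[:, None] == labels[None, :])
co /= 10                                        # fraction of runs agreeing

# Consensus clustering on the co-association matrix (distance = 1 - agreement).
consensus = AgglomerativeClustering(
    n_clusters=3, metric="precomputed", linkage="average"
).fit_predict(1 - co)
print(consensus[:10])
```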

Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering

Procedia PDF Downloads 189
5437 A Methodology for Automatic Diversification of Document Categories

Authors: Dasom Kim, Chen Liu, Myungsu Lim, Su-Hyeon Jeon, ByeoungKug Jeon, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, numerous documents, including unstructured data and text, have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually assigned a specific category for the convenience of users. In the past, this categorization was performed manually, but manual categorization not only cannot guarantee accuracy; it also requires a large amount of time and huge costs. Many studies have therefore been conducted on the automatic creation of categories. Unfortunately, most of these methods cannot be applied to categorizing complex documents with multiple topics, because they assume that one document can be assigned to one category only. To overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, they are also limited in that their learning process requires training on a multi-categorized document set; such methods therefore cannot be applied to the multi-categorization of most documents unless multi-categorized training sets are provided. To remove the requirement of a multi-categorized training set imposed by traditional multi-categorization algorithms, we previously proposed a methodology that can extend the category of a single-categorized document to multiple categories by analyzing the relationships among categories, topics, and documents. In this paper, we design a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.

Keywords: big data analysis, document classification, multi-category, text mining, topic analysis

Procedia PDF Downloads 272
5436 A Bibliometric Analysis of Ukrainian Research Articles on SARS-COV-2 (COVID-19) in Compliance with the Standards of Current Research Information Systems

Authors: Sabina Auhunas

Abstract:

These days in Ukraine, Open Science is developing dramatically for the benefit of scientists of all branches, providing an opportunity to take a closer look at studies by foreign scientists as well as to deliver one's own scientific data to national and international journals. However, when it comes to consolidating data on the scientific activities of Ukrainian scientists, these data are often integrated into E-systems that operate on inconsistent and barely related information sources. To resolve these issues, developed countries productively use E-systems designed to store and manage research data, such as Current Research Information Systems, which enable the combination of uncompiled data obtained from different sources. An algorithm for selecting SARS-CoV-2 research articles was designed, by means of which we collected the set of papers published by Ukrainian scientists and uploaded by August 1, 2020. The resulting metadata (document type, open access status, citation count, h-index, most cited documents, international research funding, author counts, and the bibliographic relationships of journals) were taken from the Scopus and Web of Science databases. The study also considered information on COVID-19/SARS-CoV-2-related documents published from December 2019 to September 2020 by authors with a territorial affiliation to Ukraine. These databases provide the information needed for bibliometric analysis, including details such as copyright that may not be available in other databases (e.g., ScienceDirect). Search criteria and results for each online database were set according to the WHO designations for the virus and the disease it causes, and are represented in Table 1. First, we identified 89 research papers, which provided the final data set after consolidation and removal of duplicates; however, only 56 papers were used for the analysis. The WoS database returned a total of 21,641 documents (48 of them affiliated with Ukraine), and the Scopus database returned 32,478 documents (41 of them affiliated with Ukraine). In the publication activity of Ukrainian scientists, the following areas prevailed: education and educational research (9 documents, 20.58%); social sciences, interdisciplinary (6 documents, 11.76%); and economics (4 documents, 8.82%). The highest publication activity by institution type was reported for the Ministry of Education and Science of Ukraine (36% of the published papers, or 7 documents), followed by Danylo Halytsky Lviv National Medical University (5 documents, 15%) and the P. L. Shupyk National Medical Academy of Postgraduate Education (4 documents, 12%). Research activities by Ukrainian scientists were funded mainly by 5 entities: the Belgian Development Cooperation, the National Institutes of Health (NIH, U.S.), the United States Department of Health & Human Services, a grant from the Whitney and Betty MacMillan Center for International and Area Studies at Yale, and a grant from the Yale Women Faculty Forum. Based on the results of the analysis, we obtained a set of published articles and preprints to be assessed on a variety of features in upcoming studies, including citation count, most cited documents, the bibliographic relationships of journals, and reference linking. Further research on the development of the national scientific E-database continues using brand-new analytical methods.

Keywords: content analysis, COVID-19, scientometrics, text mining

Procedia PDF Downloads 115