Search results for: unstructured text
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 622

412 Comparative Analysis of Diverse Collection of Big Data Analytics Tools

Authors: S. Vidhya, S. Sarumathi, N. Shanthi

Abstract:

In recent years, considerable effort has gone into developing proficient tools for performing various big data tasks. Big data has recently received a great deal of publicity, and for good reason. Because the datasets are large and complex, they are difficult to process with traditional data processing applications, which has made dedicated big data tools a necessity. The main aim of big data analytics is to apply advanced analytic techniques to very large, diverse datasets that range in size from terabytes to zettabytes and in type from structured to unstructured and from batch to streaming. Big data techniques are useful for datasets whose size or type is beyond the capability of traditional relational databases to capture, manage and process with low latency. These challenges have led to the emergence of powerful big data tools. In this survey, a diverse collection of big data tools is described and compared in terms of their salient features.

Keywords: Big data, Big data analytics, Business analytics, Data analysis, Data visualization, Data discovery.

Downloads: 3728
411 Developing OMS in IHL

Authors: Suzana Basaruddin, Haryani Haron, Siti Arpah Noodin

Abstract:

Managing research knowledge is one way to ensure just-in-time information and knowledge to support research strategies and activities. Unfortunately, the researchers found that vital research knowledge in IHL (Institutions of Higher Learning) is scattered, unstructured and unorganized. Aiming to lay conceptual foundations for understanding and developing an OMS (Organizational Memory System) to facilitate research in IHL, this research identified ten factors contributing to the research needs of IHL and seven internal challenges that IHL face in promoting research among their academic members. The study then proposes comprehensive support for managing research knowledge using an OMS. Eight OMS characteristics that support research were identified. Finally, the initial work in designing the OMS is outlined using a knowledge taxonomy. All analysis is derived from pertinent research papers on research in IHL and OMS. Further study can be conducted to validate and verify the results presented.

Keywords: corporate memory, Institutions of Higher Learning, organizational memory system, research

Downloads: 2050
410 Robust Control Synthesis for an Unmanned Underwater Vehicle

Authors: A. Budiyono

Abstract:

The control design for unmanned underwater vehicles (UUVs) is challenging due to the uncertainties in the complex dynamic modeling of the vehicle as well as its unstructured operational environment. To cope with these difficulties, a practical robust control is therefore desirable. The paper deals with the application of coefficient diagram method (CDM) for a robust control design of an autonomous underwater vehicle. The CDM is an algebraic approach in which the characteristic polynomial and the controller are synthesized simultaneously. Particularly, a coefficient diagram (comparable to Bode diagram) is used effectively to convey pertinent design information and as a measure of trade-off between stability, response speed and robustness. In the polynomial ring, Kharitonov polynomials are employed to analyze the robustness of the controller due to parametric uncertainties.
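The robustness check mentioned in the abstract can be illustrated with a minimal sketch. It assumes an interval characteristic polynomial whose coefficient bounds are invented for illustration (they are not taken from the paper), builds the four Kharitonov polynomials, and tests each with a plain Routh array. The function names are hypothetical, and the Routh test simply reports "not stable" when it meets a zero pivot rather than handling that special case.

```python
# Bound-selection patterns by ascending power of s, repeating every four
# coefficients: "l" picks the lower bound, "h" picks the upper bound.
PATTERNS = ("llhh", "hhll", "lhhl", "hllh")


def kharitonov_polynomials(bounds):
    """bounds[i] = (low, high) for the coefficient of s**i.

    Returns the four Kharitonov polynomials as coefficient lists in
    ascending powers of s. Assumes the leading-coefficient interval
    does not contain zero, so the degree is fixed.
    """
    polys = []
    for pattern in PATTERNS:
        coeffs = [lo if pattern[i % 4] == "l" else hi
                  for i, (lo, hi) in enumerate(bounds)]
        polys.append(coeffs)
    return polys


def is_hurwitz(ascending):
    """Routh test: True if every root has a negative real part."""
    desc = list(reversed(ascending))
    degree = len(desc) - 1
    row0 = desc[0::2]
    row1 = desc[1::2]
    width = len(row0)
    row1 = row1 + [0.0] * (width - len(row1))
    first_column = [row0[0]]
    for _ in range(degree):
        pivot = row1[0]
        if pivot == 0:
            return False  # zero pivot: treated as not strictly stable
        first_column.append(pivot)
        nxt = [(pivot * row0[j + 1] - row0[0] * row1[j + 1]) / pivot
               for j in range(width - 1)] + [0.0]
        row0, row1 = row1, nxt
    return (all(c > 0 for c in first_column)
            or all(c < 0 for c in first_column))


def robustly_stable(bounds):
    """Kharitonov's theorem: the whole interval family is Hurwitz
    exactly when all four Kharitonov polynomials are Hurwitz."""
    return all(is_hurwitz(p) for p in kharitonov_polynomials(bounds))


if __name__ == "__main__":
    # Illustrative bounds for a cubic: a0, a1, a2, a3 (made-up numbers).
    example = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (1.0, 1.0)]
    for poly in kharitonov_polynomials(example):
        print(poly, is_hurwitz(poly))
    print("robustly stable:", robustly_stable(example))
```

Checking only four fixed polynomials instead of sampling the whole coefficient box is what makes this test attractive for parametric uncertainty.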

Keywords: coefficient diagram method, robust control, Kharitonov polynomials, unmanned underwater vehicles.

Downloads: 2043
409 Myanmar Character Recognition Using Eight Direction Chain Code Frequency Features

Authors: Kyi Pyar Zaw, Zin Mar Kyu

Abstract:

Character recognition is the process of converting a text image file into an editable and searchable text file. Feature extraction is the heart of any character recognition system, and the recognition rate may be low or high depending on the extracted features. In the proposed paper, 25 features per character are used in character recognition. There are three basic steps in character recognition: character segmentation, feature extraction and classification. In the segmentation step, a horizontal cropping method is used for line segmentation and a vertical cropping method is used for character segmentation. In the feature extraction step, features are extracted in two ways. In the first, 8 features are extracted from the entire input character using eight-direction chain code frequency extraction. In the second, the input character is divided into 16 blocks. For each block, although 8 feature values are obtained through the eight-direction chain code frequency extraction method, we define the sum of these 8 feature values as the single feature for that block. Therefore, 16 features are extracted from the 16 blocks in the second way. We use the number-of-holes feature to cluster similar characters. Using these features, we can recognize almost all common Myanmar characters in various font sizes. All 25 features are used in both the training part and the testing part. In the classification step, characters are classified by matching all the features of the input character against the already trained features of characters.
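As a rough illustration of the eight-direction chain code frequency idea, the sketch below walks an ordered, closed boundary and counts how often each of the eight Freeman directions occurs. The direction numbering, the coordinate convention and the function names are assumptions made for this example; the abstract does not specify them, and the block-wise and hole-count features are not covered here.

```python
# Freeman eight-direction codes, assuming x grows to the right and y grows
# upward: 0 = east, then counter-clockwise in 45-degree steps.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(boundary):
    """Convert an ordered closed boundary (list of (x, y) pixels, each
    8-connected to the next) into a list of direction codes."""
    codes = []
    n = len(boundary)
    for i in range(n):
        x0, y0 = boundary[i]
        x1, y1 = boundary[(i + 1) % n]  # wrap around to close the contour
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels {boundary[i]} and "
                             f"{boundary[(i + 1) % n]} are not neighbours")
        codes.append(DIRECTIONS[step])
    return codes


def chain_code_frequency(boundary):
    """Return 8 counts, one per direction: the frequency feature vector."""
    counts = [0] * 8
    for code in chain_code(boundary):
        counts[code] += 1
    return counts


if __name__ == "__main__":
    # A unit square traced counter-clockwise (illustrative input only).
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(chain_code(square))
    print(chain_code_frequency(square))
```

Summing the eight counts for a sub-region gives a single number per region, which is one way to read the per-block feature described in the abstract.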

Keywords: Chain code frequency, character recognition, feature extraction, features matching, segmentation.

Downloads: 700
408 A New Precautionary Method for Measurement and Improvement the Data Quality

Authors: Seyed Mohammad Hossein Moossavizadeh, Mehran Mohsenzadeh, Nasrin Arshadi

Abstract:

Data quality is a complex and unstructured concept that is of concern to information systems managers. The reason for this attention is the high cost of maintaining and cleaning inefficient data. Beyond the cost of its poor quality, such data leads to wrong statistics, analyses and decisions in organizations. Therefore, managers intend to improve the quality of the data in their information systems. One of the basic steps in quality improvement is measuring it. In this paper, we present a precautionary method whose application gives the data of information systems better quality. Our method covers different dimensions of data quality; therefore it has the necessary integrity. The presented method was tested on three dimensions (accuracy, value-added and believability), and the results confirm the improvement and the integrity of this method.

Keywords: Data quality, precaution, information system, measurement, improvement.

Downloads: 1428
407 Optimization of a New Three-Phase High Voltage Power Supply for Industrial Microwaves Generators with N Magnetrons by Phase (Treated Case N=1)

Authors: M. Bassoui, M. Ferfra, M. Chraygane, M. Ould Ahmedou, N. Elghazal, A. Belhaiba

Abstract:

Currently, the high voltage power supply for microwave generators with one magnetron uses a single-phase transformer with a magnetic shunt. To contribute to technological innovation in the manufacturing of magnetron power supplies for microwave ovens for domestic or industrial use, this original work treats the optimization of a new three-phase high voltage power supply for industrial microwave generators with N magnetrons per phase (treated case N=1), based on its modeling with Matlab-Simulink. The design of this power supply uses three equivalent π quadruple models of the new three-phase transformer with magnetic shunt, one for each phase. Each one supplies at its output a voltage doubler cell composed of a capacitor and a diode, which in turn supplies a single magnetron. In this work we define a strategy that aims to reduce the volume of the transformer and the weight and cost of the entire high voltage power supply system, while respecting the conditions recommended by the manufacturer concerning the current flowing in each magnetron (Imax < 1.2 A, IAv ≈ 300 mA).

Keywords: Optimization, Three-phase transformer, Modeling, power supply, magnetrons, Matlab Simulink, High Voltage

Downloads: 2764
406 Object Identification with Color, Texture, and Object-Correlation in CBIR System

Authors: Awais Adnan, Muhammad Nawaz, Sajid Anwar, Tamleek Ali, Muhammad Ali

Abstract:

The need for efficient information retrieval has increased more than ever in recent years because of the frequent use of digital information in our lives. There has been a lot of work in the area of textual information, but much less progress in multimedia information. In text-based information, new technologies such as data mining and data marts are now in use, having grown out of the basic database concept that emerged around 1960. In image search, and especially in image identification, computerized systems are at a very early stage. Even in the area of image search, progress has not matched that of text-based search techniques. One main reason for this is the widespread roots of image search, in which many areas such as artificial intelligence, statistics, image processing and pattern recognition play a role. Human psychology, perception and cultural diversity also have their share in the design of a good and efficient image recognition and retrieval system. A new object-based search technique is presented in this paper, in which objects in the image are identified on the basis of their geometrical shapes and other features such as color and texture, and object correlation augments the search process. To focus on object identification, simple images are selected for this work to reduce the role of segmentation in the overall process; however, the same technique can also be applied to other images.

Keywords: Object correlation, Geometrical shape, Color, texture, features, contents.

Downloads: 1981
405 Persian/Arabic Document Segmentation Based On Pyramidal Image Structure

Authors: Seyyed Yasser Hashemi, Khalil Monfaredi

Abstract:

Automatic transformation of paper documents into electronic documents requires document segmentation as the first stage. However, parameter restrictions such as variations in character font size, differing text line spacing, and non-uniform document layout structures have for many years made it difficult to design a general-purpose document layout analysis algorithm. Thus most previously reported methods inevitably depend on these parameters. The problem becomes especially acute in Persian/Arabic documents. Since the Persian/Arabic scripts differ considerably from the English scripts, most of the methods proposed for English scripts do not render good results for Persian scripts. In this paper, we present a novel parameter-free method for segmenting Persian/Arabic document images which also works well for English scripts. This method segments the document image into maximal homogeneous regions and identifies them as text or non-text based on a pyramidal image structure. In other words, the proposed method is capable of document segmentation without considering character font sizes, text line spacing, or document layout structures. The algorithm was examined on 150 Arabic/Persian and English documents, and document segmentation was performed successfully for 96 percent of the documents.

Keywords: Persian/Arabic document, document segmentation, Pyramidal Image Structure, skew detection and correction.

Downloads: 1729
404 Decode and Forward Cooperative Protocol Enhancement Using Interference Cancellation

Authors: Siddeeq Y. Ameen, Mohammed K. Yousif

Abstract:

Cooperative communication systems are considered a promising technology for improving system capacity, reliability and performance over fading wireless channels. A cooperative relaying system with a single antenna can attain the advantages of multiple-antenna communication systems. It is ideally suited to distributed communication systems, where the relays can cooperate and form virtual MIMO systems. This paper therefore investigates the possible enhancement of a cooperative system using the decode and forward protocol. In the decode and forward protocol, an attempt is made to cancel or at least reduce the interference instead of increasing the SNR values. This can be achieved by using groups of relays selected according to the channel status from source to relay and from relay to destination, respectively.

In the proposed system, the transmission time is divided into two phases to be used by the decode and forward protocol. The first phase is allocated for the source to transmit its data while the relays and destination nodes are in receiving mode. The second phase is allocated for the first and second groups of relay nodes to relay the data to the destination node. Simulation results show an improvement in performance compared to conventional decode and forward in terms of BER and transmission rate.

Keywords: Cooperative systems, decode and forward, interference cancellation, virtual MIMO.

Downloads: 3688
403 The Name of Thai Muslim Students: The Reflection of Value and Identity of Thai Muslim

Authors: Apichaya Kaewuthai

Abstract:

This study examines the meaning of Muslim names in order to analyze the underlying values and identity of first-year to fourth-year Muslim students at Prince of Songkla University, Hatyai Campus. Questionnaires were employed as the main tool to collect the names of 80 Muslim students across the four years of study. The meanings of the names obtained were then analyzed and summarized based on related documents to uncover the underlying values. The study reveals that male names are derived from the name of the prophet, Nabi Muhammad, and from merit, dignity, origins, leadership and faith in Islam. Female names, on the other hand, are related to virtue and beauty, cleanliness and peace, hope and flowers, which comply with their characteristics. One of the reasons contributing to the principle of naming is the regulation of the Ministry of Culture, which states that a name should represent one's nature and character. The given names reflect the values and identity of Muslims, which can be classified into three categories: 1) values related to belief in Islam, 2) values related to relationships among families and relatives, and 3) values related to the relationship with nature and the environment. All of these vividly reflect Muslim values and identity. The names of Muslim students allow the researcher to perceive the perspectives, beliefs and values involved in naming among Thai Muslims. They also reveal their social conditions and culture. The study can also serve as a basis for studying the meaning of names in other ethnic groups.

Keywords: The naming, Thai Muslim.

Downloads: 1204
402 A Framework for Vacant City-Owned Land to Be Utilised for Urban Agriculture: The Case of Cape Town, South Africa

Authors: P. S. Van Staden, M. M. Campbell

Abstract:

Vacant City of Cape Town-owned land lying unutilized and unproductive could be developed for land uses such as urban agriculture that may improve the livelihoods of low-income families. The new City of Cape Town zoning scheme includes an Urban Agriculture zoning for the first time. Unstructured qualitative interviews among town planners revealed their optimism about this inclusion, as it will provide low-income residents with opportunities to generate an income. An existing farming community at Philippi, located within the municipal boundary of the city, was approached, and empirical data obtained through questionnaires provided proof that urban agriculture could be viable in a coastal metropolitan city such as Cape Town even if farmers only produce for their own households. The lease method proposed for urban agriculture is a usufruct agreement conferring on a party other than the legal owner the right to enjoy the use and advantages of the property.

Keywords: Land uses, urban agriculture.

Downloads: 1957
401 Text-independent Speaker Identification Based on MAP Channel Compensation and Pitch-dependent Features

Authors: Jiqing Han, Rongchun Gao

Abstract:

One major source of performance decline in speaker recognition systems is channel mismatch between training and testing. This paper focuses on improving the channel robustness of a speaker recognition system in two respects: channel compensation technique and channel-robust features. The system is a text-independent speaker identification system based on two-stage recognition. For channel compensation, this paper applies the MAP (Maximum A Posteriori Probability) channel compensation technique, previously used in speech recognition, to the speaker recognition system. For channel-robust features, this paper introduces pitch-dependent features and a pitch-dependent speaker model for the second-stage recognition. After the first-stage recognition of the test speech using a GMM (Gaussian Mixture Model), the system uses the GMM scores to decide whether the speech needs to be recognized again. If so, the system selects a few speakers from all of the speakers who participated in the first-stage recognition for the second-stage recognition. For each selected speaker, the system obtains 3 pitch-dependent results from that speaker's pitch-dependent speaker model, and then uses an ANN (Artificial Neural Network) to combine the 3 pitch-dependent results and 1 GMM score into a fused result. The system makes the second-stage recognition based on these fused results. The experiments show that the correct rate of the two-stage recognition system based on the MAP channel compensation technique and pitch-dependent features is 41.7% better than that of the baseline system in the closed-set test.

Keywords: Channel Compensation, Channel Robustness, MAP, Speaker Identification

Downloads: 1502
400 Information Extraction from Unstructured and Ungrammatical Data Sources for Semantic Annotation

Authors: Quratulain N. Rajput, Sajjad Haider, Nasir Touheed

Abstract:

The internet has become an attractive avenue for global e-business, e-learning, knowledge sharing, etc. Due to continuous increase in the volume of web content, it is not practically possible for a user to extract information by browsing and integrating data from a huge amount of web sources retrieved by the existing search engines. The semantic web technology enables advancement in information extraction by providing a suite of tools to integrate data from different sources. To take full advantage of semantic web, it is necessary to annotate existing web pages into semantic web pages. This research develops a tool, named OWIE (Ontology-based Web Information Extraction), for semantic web annotation using domain specific ontologies. The tool automatically extracts information from html pages with the help of pre-defined ontologies and gives them semantic representation. Two case studies have been conducted to analyze the accuracy of OWIE.

Keywords: Ontology, Semantic Annotation, Wrapper, Information Extraction.

Downloads: 2070
399 A Multi-Phase Methodology for Investigating Localisation Policies within the GCC: The Hotel Industry in the KSA and the UAE

Authors: Areej Azhar, Peter Duncan, David Edgar

Abstract:

Due to a high unemployment rate among local people and a high reliance on expatriate workers, the governments in the Gulf Co-operation Council (GCC) countries have been implementing programmes of localisation (replacing foreign workers with GCC nationals). These programmes have been successful in the public sector but much less so in the private sector. However, there are now insufficient jobs for locals in the public sector and the onus to provide employment has fallen on the private sector. This paper is concerned with a study, which is a work in progress (certain elements are complete but not the whole study), investigating the effective implementation of localisation policies in four- and five-star hotels in the Kingdom of Saudi Arabia (KSA) and the United Arab Emirates (UAE). The purpose of the paper is to identify the research gap, and to present the need for the research. Further, it will explain how this research was conducted. Studies of localisation in the GCC countries are under-represented in scholarly literature. Currently, the hotel sectors in KSA and UAE play an important part in the countries’ economies. However, the total proportion of Saudis working in the hotel sector in KSA is slightly under 8%, and in the UAE, the hotel sector remains highly reliant on expatriates. There is therefore a need for research on strategies to enhance the implementation of the localisation policies in general and in the hotel sector in particular. Further, despite the importance of the hotel sector to their economies, there remains a dearth of research into the implementation of localisation policies in this sector. Indeed, as far as the researchers are aware, there is no study examining localisation in the hotel sector in KSA, and few in the UAE. This represents a considerable research gap. Regarding how the research was carried out, a multiple case study strategy was used. 
The four- and five-star hotel sector in KSA is one of the cases, while the four- and five-star hotel sector in the UAE is the other case. Four- and five-star hotels in KSA and the UAE were chosen as these countries have the longest established localisation policies of all the GCC states and there are more hotels of these classifications in these countries than in any of the other Gulf countries. A literature review was carried out to underpin the research. The empirical data were gathered in three phases. In order to gain a pre-understanding of the issues pertaining to the research context, Phase I involved eight unstructured interviews with officials from the Saudi Commission for Tourism and Antiquities (three interviewees); the Saudi Human Resources Development Fund (one); the Abu Dhabi Tourism and Culture Authority (three); and the Abu Dhabi Development Fund (one).

In Phase II, a questionnaire was administered to 24 managers and 24 employees in four- and five-star hotels in each country to obtain their beliefs, attitudes, opinions, preferences and practices concerning localisation. Unstructured interviews were carried out in Phase III with six managers in each country in order to allow them to express opinions that may not have been explored in sufficient depth in the questionnaire. The interviews in Phases I and III were analysed using thematic analysis and SPSS will be used to analyse the questionnaire data. It is recommended that future research be undertaken on a larger scale, with a larger sample taken from all over KSA and the UAE rather than from only four cities (i.e., Riyadh and Jeddah in KSA and Abu Dhabi and Sharjah in the UAE), as was the case in this research.

Keywords: KSA, UAE, localisation, hotels, Human Resource Management.

Downloads: 2526
398 Mining and Visual Management of XML-Based Image Collections

Authors: Khalil Shihab, Nida Al-Chalabi

Abstract:

This article describes Uruk, the virtual museum of Iraq that we developed for visual exploration and retrieval of image collections. The system largely exploits the loosely structured hierarchy of XML documents, which provides a useful representation for storing semi-structured or unstructured data that does not easily fit into existing databases. The system offers users the capability to mine and manage the XML-based image collections through a web-based Graphical User Interface (GUI). Typically, in an interactive session with the system, the user can browse a visual structural summary of the XML database in order to select interesting elements. Using this intermediate result, queries combining structure and textual references can be composed and presented to the system. After query evaluation, the full set of answers is presented in a visual and structured way.
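The two-step interaction described here (browse a structural summary, then query by structure plus text) can be sketched with Python's standard `xml.etree.ElementTree` module. This is not the Uruk system's implementation: the sample document, its tag names and the helper functions below are all hypothetical and exist only to show the idea.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample collection; tag names and values are placeholders.
SAMPLE = (
    "<museum>"
    "<artifact><title>Vase A</title><period>early</period></artifact>"
    "<artifact><title>Stele B</title></artifact>"
    "</museum>"
)


def iter_paths(elem, prefix=""):
    """Yield (path, element) for elem and all of its descendants,
    where path is the slash-separated chain of tag names."""
    path = f"{prefix}/{elem.tag}"
    yield path, elem
    for child in elem:
        yield from iter_paths(child, path)


def structural_summary(xml_text):
    """Count how many elements occur under each distinct tag path."""
    root = ET.fromstring(xml_text)
    return Counter(path for path, _ in iter_paths(root))


def find_text(xml_text, path, keyword):
    """Structure-plus-text query: return the text of every element at
    `path` whose text contains `keyword` (case-insensitive)."""
    root = ET.fromstring(xml_text)
    keyword = keyword.lower()
    return [elem.text for p, elem in iter_paths(root)
            if p == path and keyword in (elem.text or "").lower()]


if __name__ == "__main__":
    for path, count in sorted(structural_summary(SAMPLE).items()):
        print(path, count)
    print(find_text(SAMPLE, "/museum/artifact/title", "vase"))
```

The summary gives a user the set of paths that actually exist in the collection, so a query can be composed from real paths rather than guessed ones.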

Keywords: Data-centric XML, graphical user interfaces, information retrieval, case-based reasoning, fuzzy sets

Downloads: 1742
397 A BERT-Based Model for Financial Social Media Sentiment Analysis

Authors: Josiel Delgadillo, Johnson Kinyua, Charles Mutigwe

Abstract:

The purpose of sentiment analysis is to determine the sentiment strength (e.g., positive, negative, neutral) from a textual source for good decision-making. Natural Language Processing (NLP) in domains such as financial markets requires knowledge of domain ontology, and pre-trained language models, such as BERT, have made significant breakthroughs in various NLP tasks by training on large-scale unlabeled generic corpora such as Wikipedia. However, sentiment analysis is a strongly domain-dependent task. The rapid growth of social media has given users a platform to share their experiences and views about products, services, and processes, including financial markets. StockTwits and Twitter are social networks that allow the public to express their sentiments in real time. Hence, leveraging the success of unsupervised pre-training and the large amount of financial text available on social media platforms could potentially benefit a wide range of financial applications. This work focuses on sentiment analysis using social media text on platforms such as StockTwits and Twitter. To meet this need, SkyBERT, a domain-specific language model pre-trained and fine-tuned on financial corpora, has been developed. The results show that SkyBERT outperforms current state-of-the-art models in financial sentiment analysis. Extensive experimental results demonstrate the effectiveness and robustness of SkyBERT.

Keywords: BERT, financial markets, Twitter, sentiment analysis.

Downloads: 594
396 Mining User-Generated Contents to Detect Service Failures with Topic Model

Authors: Kyung Bae Park, Sung Ho Ha

Abstract:

Online user-generated contents (UGC) significantly change the way customers behave (e.g., shop, travel), and the pressing need to handle the overwhelming volume and variety of UGC is one of the paramount issues for management. However, current approaches (e.g., sentiment analysis) are often ineffective at leveraging textual information to detect the problems or issues from which a given management suffers. In this paper, we apply Latent Dirichlet Allocation (LDA), a text mining technique, to a popular online review site dedicated to user complaints. We find that LDA efficiently detects customer complaints, and that further inspection with a visualization technique is effective for categorizing the problems or issues. As such, management can identify the issues at stake and prioritize them in a timely manner given limited resources. The findings provide managerial insights into how analytics on social media can help maintain and improve reputation management. Our interdisciplinary approach also highlights several insights gained by applying machine learning techniques in the marketing research domain. On a broader technical note, this paper illustrates in detail how to implement LDA in R from beginning (data collection in R) to end (LDA analysis in R), since such instruction is still largely undocumented. In this regard, it will help lower the barrier for interdisciplinary researchers to conduct related research.

Keywords: Latent Dirichlet allocation, R program, text mining, topic model, user generated contents, visualization.

Downloads: 1174
395 AI-Based Techniques for Online Social Media Network Sentiment Analysis: A Methodical Review

Authors: A. M. John-Otumu, M. M. Rahman, O. C. Nwokonkwo, M. C. Onuoha

Abstract:

Online social media networks have long served as a primary arena for group conversations, gossip, and text-based information sharing and distribution. The use of natural language processing techniques for text classification and unbiased decision making has not been far-fetched. Proper classification of this textual information in a given context has also been very difficult. As a result, a systematic review of previous literature on sentiment classification and AI-based techniques was conducted. The study was done in order to gain a better understanding of the process of designing and developing a robust and more accurate sentiment classifier that could correctly classify social media textual information of a given context between hate speech and inverted compliments with a high level of accuracy, using the knowledge gained from evaluating the different artificial intelligence techniques reviewed. The study evaluated over 250 articles from digital sources such as the ACM Digital Library, Google Scholar, and IEEE Xplore, and whittled the number down to 52 articles. Findings revealed that deep learning approaches such as Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Bidirectional Encoder Representations from Transformers (BERT), and Long Short-Term Memory (LSTM) outperformed various machine learning techniques in terms of accuracy. A large dataset is also required to develop a robust sentiment classifier. Results also revealed that data can be obtained from sources such as Twitter, movie reviews, Kaggle, Stanford Sentiment Treebank (SST), and SemEval Task4, depending on the required domain. Hybrid deep learning techniques such as CNN+LSTM, CNN+Gated Recurrent Unit (GRU), and CNN+BERT outperformed single deep learning techniques and machine learning techniques. The Python programming language outperformed the Java programming language in terms of development simplicity and AI-based library functionality. Finally, the study recommended the findings obtained for building robust sentiment classifiers in the future.

Keywords: Artificial Intelligence, Natural Language Processing, Sentiment Analysis, Social Network, Text.

Downloads: 489
394 Improving Topic Quality of Scripts by Using Scene Similarity Based Word Co-Occurrence

Authors: Yunseok Noh, Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park

Abstract:

Scripts are one of the basic text resources for understanding broadcasting contents. Topic modeling is a method for obtaining a summary of broadcasting contents from their scripts. Generally, scripts represent contents descriptively with directions and speeches, and provide scene segments that can be seen as semantic units. Therefore, a script can be topic modeled by treating a scene segment as a document. However, because scene segments consist mainly of speeches, relatively few co-occurrences among words are observed in them. This inevitably leads to poor-quality topics under statistical learning methods. To tackle this problem, we propose a method to improve topic quality with additional word co-occurrence information obtained using scene similarities. The main idea is that the information that two or more texts are topically related can be useful for learning high-quality topics. In turn, more accurate topical representations yield more accurate information about whether two texts are related. In this paper, we regard two scene segments as related if their topical similarity is high enough. We also consider words to co-occur if they appear together in topically related scene segments. By iteratively inferring topics and determining semantically neighboring scene segments, we draw a topic space that represents broadcasting contents well. In the experiments, we show that the proposed method generates higher-quality topics from Korean drama scripts than the baselines.
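The notion of extending word co-occurrence across related scene segments can be illustrated with a small sketch. It is a simplification under stated assumptions: Jaccard overlap of word sets stands in for the topical similarity the abstract describes, the threshold and toy scenes are invented, there is no iterative topic inference, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Word-set overlap, used here only as a stand-in similarity."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence_counts(scenes, threshold=0.3):
    """Count word pairs that co-occur within a scene, plus pairs that
    span two scenes judged related (similarity >= threshold)."""
    counts = Counter()
    # Ordinary within-scene co-occurrence.
    for scene in scenes:
        for pair in combinations(sorted(set(scene)), 2):
            counts[pair] += 1
    # Extra co-occurrence between words of related scenes.
    for first, second in combinations(scenes, 2):
        if jaccard(first, second) >= threshold:
            pairs = {tuple(sorted((w1, w2)))
                     for w1 in set(first) for w2 in set(second)
                     if w1 != w2}
            for pair in pairs:
                counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Toy scene segments, already tokenized (illustrative only).
    scenes = [
        ["love", "letter", "rain"],
        ["love", "rain", "train"],
        ["bank", "loan"],
    ]
    for pair, count in sorted(cooccurrence_counts(scenes).items()):
        print(pair, count)
```

In this toy input the words "letter" and "train" never share a scene, yet they receive a count because their scenes are related, which is the extra signal short dialogue-heavy segments otherwise lack.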

Keywords: Broadcasting contents, generalized Pólya urn model, scripts, text similarity, topic model.

Downloads 1775
393 Localizing and Recognizing Integral Pitches of Cheque Document Images

Authors: Bremananth R., Veerabadran C. S., Andy W. H. Khong

Abstract:

Automatic reading of handwritten cheques is a computationally complex process, and it plays an important role in financial risk management. Machine vision and learning provide a viable solution to this problem. Research effort has mostly been focused on recognizing diverse pitches of cheques and demand drafts with an identical outline. However, most of these methods employ template matching to localize the pitches, and such schemes could potentially fail when applied to the different types of outline maintained by the bank. In this paper, the so-called outline problem is resolved by a cheque information tree (CIT), which generalizes the localizing method to extract active-region-of-entities. In addition, the weight based density plot (WBDP) is performed to isolate text entities and read complete pitches. Recognition is based on texture features using neural classifiers. The legal amount is subsequently recognized by both texture and perceptual features. A post-processing phase is invoked to detect incorrect readings by Type-2 grammar using the Turing machine. The performance of the proposed system was evaluated using cheques and demand drafts of 22 different banks. The test data consists of a collection of 1540 leaves obtained from 10 different account holders from each bank. Results show that this approach can easily be deployed without significant design amendments.

Keywords: Cheque reading, Connectivity checking, Text localization, Texture analysis, Turing machine, Signature verification.

Downloads 1608
392 Development of Improved Three Dimensional Unstructured Tetrahedral Mesh Generator

Authors: Ng Yee Luon, Mohd Zamri Yusoff, Norshah Hafeez Shuaib

Abstract:

Meshing is the process of discretizing a problem domain into many subdomains before the numerical calculation can be performed. The tetrahedral mesh is one of the most popular mesh types because of its flexibility to fit almost any domain shape. In 2D and 3D domains, triangular and tetrahedral meshes, respectively, can be generated by using Delaunay triangulation. The quality of the mesh is an important factor in performing any Computational Fluid Dynamics (CFD) simulation, as the results are highly affected by the mesh quality. Many efforts have been made to improve the quality of the mesh. The paper describes a mesh generation routine that has been developed and is capable of generating high-quality tetrahedral cells in arbitrarily complex geometry. A few CFD test cases are used to test the mesh generator. The resulting mesh is compared with one generated by commercial software. The results show that no slivers exist in the generated meshes, and the overall quality is acceptable since the percentage of bad tetrahedra is relatively small. Boundary recovery was also performed successfully, with all missing faces rebuilt.
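As a rough illustration of how a sliver can be flagged, the sketch below scores a tetrahedron with one common shape measure, q = 6√2·V / l_rms³, where V is the volume and l_rms is the root-mean-square edge length. The measure equals 1 for a regular tetrahedron and approaches 0 for a nearly flat one. The abstract does not say which quality measure the authors used, so this choice and the function names are assumptions.

```python
import math


def tet_volume(p0, p1, p2, p3):
    """Volume of a tetrahedron from the scalar triple product."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    triple = (
        a[0] * (b[1] * c[2] - b[2] * c[1])
        - a[1] * (b[0] * c[2] - b[2] * c[0])
        + a[2] * (b[0] * c[1] - b[1] * c[0])
    )
    return abs(triple) / 6.0


def tet_quality(p0, p1, p2, p3):
    """Shape measure in [0, 1]: 1 for a regular tetrahedron, near 0 for a sliver."""
    pts = [p0, p1, p2, p3]
    squared_edges = [
        sum((pts[i][k] - pts[j][k]) ** 2 for k in range(3))
        for i in range(4)
        for j in range(i + 1, 4)
    ]
    l_rms = math.sqrt(sum(squared_edges) / 6.0)
    if l_rms == 0.0:
        return 0.0  # all four vertices coincide
    return 6.0 * math.sqrt(2.0) * tet_volume(p0, p1, p2, p3) / l_rms ** 3


regular = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
sliver = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0.001)]
print(tet_quality(*regular), tet_quality(*sliver))
```

A mesh report could then count the cells whose score falls below a chosen cutoff to obtain a percentage of bad tetrahedra. The cutoff itself is application dependent.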

Keywords: Mesh generation, tetrahedral, CFD, Delaunay.

Downloads 1468
391 The Effects of Weather Anomalies on the Quantitative and Qualitative Parameters of Maize Hybrids of Different Genetic Traits in Hungary

Authors: Zs. J. Becze, Á. Krivián, M. Sárvári

Abstract:

Hybrid selection and the application of hybrid-specific production technologies are important for increasing the yield and crop safety of maize. The main explanation for this is climate change, since weather extremes are occurring and seem to be accelerating in Hungary too.

The biological basis, that is, the selection of appropriate hybrids, will be of greater importance in the future. The adaptability of hybrids will become considerably more valued. Good agronomic traits and stress tolerance against climatic factors and agrotechnical elements (e.g. different types of herbicides) will be important. There have been examples of 3-4 consecutive droughty years in the past decades, e.g. 1992-1993-1994 or 2009-2011-2012, which made the results of crop production critical. Irrigation cannot be the solution to the problem since currently only 2% of the arable land is irrigated. Temperatures exceeding the multi-year average are mainly characteristic of July and August in Hungary, which significantly increases soil surface evaporation and thus further aggravates water shortage. In terms of the yield and crop safety of maize, the weather of these two months is crucial, since extremely high temperature in July decreases the viability of the pollen and the pistil of maize, decreases the extent of fertilization and slows grain filling. Consequently, yield and crop safety decrease.

Keywords: Abiotic factors, drought, nutrition content, yield.

Downloads 1865
390 The Emerging Central Business District (CBD) in Lafia Town, Nigeria, and its Related Urban Planning Problems

Authors: Barau Daniel, Bashayi Obadiah

Abstract:

A spatial analysis of a large 20th century urban settlement (town/city) easily presents the celebrated Central Business District (CBD). Theories of Urban Land Economics have easily justified and attempted to explain the existence of such a district activity area within the cityscape. This work examines the gradual emergence and development of the CBD in Lafia Town, Nigeria over 20 years and the attendant urban problems caused by its emergence. Personal knowledge and observation of land use change are the main sources of data for the work, together with unstructured interviews with residents. The results show that the absence of a coordinated land use plan for the town, its multi-nuclei nature, and the regional location of surrounding towns have affected the growth pattern, and hence the CBD. Traffic congestion and dispersed CBD land uses are some of the urban planning problems. The work concludes by advocating the integration of CBD uses.

Keywords: Urban planning, Central Business District (CBD), downtown.

Downloads 4014
389 Performance Analysis of Chrominance Red and Chrominance Blue in JPEG

Authors: Mamta Garg

Abstract:

While compressing text files is useful, compressing still image files is almost a necessity. A typical image takes up much more storage than a typical text message, and without compression images would be extremely cumbersome to store and distribute. The amount of information required to store pictures on modern computers is quite large in relation to the bandwidth commonly available to transmit them over the Internet and applications. Image compression addresses the problem of reducing the amount of data required to represent a digital image. The performance of any image compression method can be evaluated by measuring the root-mean-square error and the peak signal-to-noise ratio. The method analyzed in this paper is based on the lossy JPEG image compression technique, the most popular compression technique for color images. JPEG compression is able to greatly reduce file size with minimal image degradation by throwing away the least “important” information. In JPEG, both color components are downsampled simultaneously, but in this paper we compare the results when compression is done by downsampling a single chroma component. We demonstrate that a higher compression ratio is achieved when the chrominance blue is downsampled than when the chrominance red is downsampled. However, the peak signal-to-noise ratio is higher when the chrominance red is downsampled than when the chrominance blue is downsampled. In particular, we use hats.jpg as a demonstration of JPEG compression using a low-pass filter and show that the image is compressed with barely any visual difference with both methods.
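The two evaluation measures named above can be sketched in a few lines. This assumes 8-bit samples (peak value 255) supplied as flat lists of pixel values, and uses the standard definition PSNR = 20·log10(peak / RMSE). The function names and sample values are illustrative only and are not taken from the paper.

```python
import math


def rmse(original, compressed):
    """Root-mean-square error between two equally sized pixel sequences."""
    if len(original) != len(compressed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)
    return math.sqrt(mse)


def psnr(original, compressed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the inputs are identical."""
    error = rmse(original, compressed)
    if error == 0:
        return math.inf
    return 20 * math.log10(peak / error)


original = [100, 100, 100, 100]
decoded = [102, 98, 102, 98]  # made-up values standing in for a decoded image
print(rmse(original, decoded), psnr(original, decoded))
```

A lower RMSE and a higher PSNR both indicate that the decoded image is closer to the original, which is how the two downsampling variants can be compared.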

Keywords: JPEG, Discrete Cosine Transform, Quantization, Color Space Conversion, Image Compression, Peak Signal to Noise Ratio & Compression Ratio.

Downloads 1633
388 Investigation of the Effect of Grid Size on External Store Separation Trajectory Using CFD

Authors: Alaa A. Osman, Amgad M. Bayoumy, Ismail El baialy, Osama E. Abdellatif, Essam E. Khallil

Abstract:

In this paper, a numerical simulation of a finned store separating from a wing-pylon configuration has been performed and validated. A dynamic unstructured tetrahedral mesh approach is applied with three grid sizes to numerically solve the discretized three-dimensional, inviscid, compressible Euler equations. The method used for computing the separation of an external store assumes quasi-steady flow conditions. Computations of quasi-steady flow have been directly coupled to a six-degree-of-freedom (6DOF) rigid-body motion code to generate store trajectories. The pressure coefficients at four different angular cuts, the time histories of various trajectory parameters, and the wing pressure distribution during the store separation are compared for every grid size with published experimental data.

Keywords: CFD Modelling, Quasi-steady Flow, Moving-body Trajectories, Transonic Store Separation.

Downloads 2936
387 Developing an Audit Quality Model for an Emerging Market

Authors: Bita Mashayekhi, Azadeh Maddahi, Arash Tahriri

Abstract:

The purpose of this paper is to develop a model for audit quality, with regard to the contextual and environmental attributes of the audit profession in Iran. For this purpose, using an exploratory approach, and because of the special attributes of the auditing profession in Iran in terms of the legal environment, regulatory and supervisory mechanisms, audit firm size, etc., we used the grounded theory approach as a qualitative research method. We therefore obtained the opinions of experts in the auditing and capital market areas through unstructured interviews. As a result, the authors revealed the determinants of audit quality and, using these determinants, developed an Integrated Audit Quality Model, including causal conditions, intervening conditions, context, as well as action strategies related to AQ and their consequences. In this research, audit quality is studied using a systemic approach. According to this approach, the quality of the inputs, processes, and outputs of auditing determines the quality of auditing; therefore, the quality of all the different parts of this system is considered.

Keywords: Audit quality, integrated audit quality model, audit supply, demand for audit service, grounded theory.

Downloads 1237
386 Development of Circulating Support Environment of Multilingual Medical Communication using Parallel Texts for Foreign Patients

Authors: Mai Miyabe, Taku Fukushima, Takashi Yoshino, Aguri Shigeno

Abstract:

The need for multilingual communication in Japan has increased due to an increase in the number of foreigners in the country. When people communicate in their nonnative language, the differences in language prevent mutual understanding among the communicating individuals. In the medical field, communication between the hospital staff and patients is a serious problem. Currently, medical translators accompany patients to medical care facilities, and the demand for medical translators is increasing. However, medical translators cannot necessarily provide support, especially in cases in which round-the-clock support is required or in case of emergencies. The medical field has high expectations from information technology. Hence, a system that supports accurate multilingual communication is required. Despite recent advances in machine translation technology, it is very difficult to obtain highly accurate translations. We have developed a support system called M3 for multilingual medical reception. M3 provides support functions that aid foreign patients in the following respects: conversation, questionnaires, reception procedures, and hospital navigation; it also has a Q&A function. Users can operate M3 using a touch screen and receive text-based support. In addition, M3 uses accurate translation tools called parallel texts to facilitate reliable communication through conversations between the hospital staff and the patients. However, if there is no parallel text that expresses what users want to communicate, the users cannot communicate. In this study, we have developed a circulating support environment for multilingual medical communication using parallel texts. The proposed environment can circulate necessary parallel texts through the following procedure: (1) a user provides feedback about the necessary parallel texts, following which (2) these parallel texts are created and evaluated.

Keywords: multilingual medical communication, parallel texts.

Downloads 1441
385 Barriers to the Use of Factoring Accounts Receivables: The Ghanaian Contractor’s Perception

Authors: E. Kissi, V. K. Acheamfour, J. J. Gyimah, T. Adjei-Kumi

Abstract:

Factoring accounts receivable is widely accepted as an alternative financing source and is utilized in almost every industry that sells business-to-business or business-to-government. However, its patronage in the construction industry is very limited, as some barriers hinder its application there. This study aims at assessing the barriers to the use of factoring accounts receivables in the Ghanaian construction industry. The study adopted the sequential exploratory research method, where structured and unstructured questionnaires were conveniently distributed to D1K1 and D2K2 construction firms in Ghana. Data were analyzed using the one-sample t-test and Kendall’s coefficient of concordance. The most severe challenge identified is the high cost of factoring patronage. Other critical challenges identified were low knowledge of factoring processes, inadequate access to information on factoring, and the high risks involved in factoring. Hence, it is recommended that contractors be made aware of the prospects of factoring accounts receivables in the construction industry. This study serves as a basis for further rigorous research into factoring of accounts receivables in the industry.
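The two statistics named above can be sketched with their textbook formulas. The one-sample t statistic is t = (x̄ − μ0) / (s / √n). Kendall's coefficient of concordance for m respondents ranking n items without ties is W = 12·S / (m²·(n³ − n)), where S is the sum of squared deviations of the item rank totals from their mean. The sample data, the test value μ0 and the function names are made up for illustration and are not the study's data.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic for testing whether the sample mean differs from mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean - mu0) / (s / math.sqrt(n))


def kendalls_w(rankings):
    """Kendall's W for m complete rankings of the same n items (no ties)."""
    m = len(rankings)
    n = len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical severity scores for one barrier, tested against a neutral value of 3.
print(one_sample_t([3, 4, 5], mu0=3))
# Three hypothetical respondents ranking four barriers identically: W is 1.
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))
```

W ranges from 0 (no agreement among respondents) to 1 (complete agreement), so it indicates how consistently the respondents ordered the barriers.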

Keywords: Barriers, contractors, factoring accounts receivables, Ghanaian, perception.

Downloads 469
384 A Design for Customer Preferences Model by Cluster Analysis of Geometric Features and Customer Preferences

Authors: Yuan-Jye Tseng, Ching-Yen Chen

Abstract:

In the design cycle, a main design task is to determine the external shape of the product. The external shape of a product is one of the key factors that can affect customers’ preferences and, through them, the motivation to buy the product, especially in the case of a consumer electronic product such as a mobile phone. The relationship between the external shape and the customer preferences needs to be studied to enhance the customer’s purchase desire and action. In this research, a design for customer preferences model is developed for investigating the relationships between the external shape and the customer preferences of a product. In the first stage, the names of the geometric features are collected and evaluated from the data of specified internet web pages using the developed text miner. A geometric feature is considered key if its number of occurrences on the web pages is relatively high. For each key geometric feature, the numerical values are explored using the text miner to collect the internet data from the web pages. In the second stage, a cluster analysis model is developed to evaluate the numerical values of the key geometric features and divide the external shapes into several groups. Several design suggestion cases can be proposed, for example, a large model, a mid-size model, and a mini model, for designing a mobile phone. A customer preference index is developed by evaluating the numerical data of each of the key geometric features of the design suggestion cases. The design suggestion case with the top ranking of the customer preference index can be selected as the final design of the product. In this paper, a notebook computer is presented as an example product. It shows that the external shape of a product can be used to drive customer preferences. The presented design for customer preferences model is useful for determining a suitable external shape of the product to increase customer preferences.
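The second stage, grouping products by the numerical value of a key geometric feature, can be illustrated with a small one-dimensional k-means sketch. The abstract does not name the clustering algorithm, so k-means is a stand-in. The feature (screen size in inches), the values and the function name are hypothetical.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster scalar feature values into k groups with plain k-means."""
    ordered = sorted(values)
    # Deterministic start: pick centers spread evenly through the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Recompute each center as the mean of its group (keep it if the group is empty).
        new_centers = [
            sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups


# Made-up screen sizes collected for one geometric feature.
sizes = [4.0, 4.2, 4.5, 5.5, 5.8, 6.1, 6.5, 6.7, 6.9]
centers, groups = kmeans_1d(sizes, k=3)
print(groups)
```

With three clusters, the groups correspond loosely to the mini, mid-size and large design suggestion cases mentioned in the abstract.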

Keywords: Cluster analysis, customer preferences, design evaluation, design for customer preferences, product design.

Downloads 709
383 A Persian OCR System using Morphological Operators

Authors: M. Salmani Jelodar, M.J. Fadaeieslam, N. Mozayani, M. Fazeli

Abstract:

Optical Character Recognition (OCR) is a long-standing problem of great interest in the pattern recognition field. In this paper we introduce a very powerful approach to recognizing Persian text. We have used morphological operators, especially the Hit/Miss operator, to describe each sub-word, and by using a template matching approach we have tried to classify the generated descriptions. We used just one font in two different sizes to verify our approach. We achieved a very good rate, up to 99.9%.
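For readers unfamiliar with the Hit/Miss operator, the sketch below shows a generic hit-or-miss transform on a binary image. A pattern cell of 1 requires foreground, 0 requires background, and None means "don't care". Pixels outside the image are treated as background. This is a textbook formulation, not the authors' sub-word descriptor, and the function name, pattern and image are illustrative.

```python
def hit_or_miss(image, pattern):
    """Mark every pixel where the pattern, centered there, matches the image."""
    rows, cols = len(image), len(image[0])
    ph, pw = len(pattern), len(pattern[0])
    oy, ox = ph // 2, pw // 2  # pattern origin at its center
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            match = True
            for dy in range(ph):
                for dx in range(pw):
                    want = pattern[dy][dx]
                    if want is None:
                        continue  # don't-care cell
                    yy, xx = y + dy - oy, x + dx - ox
                    inside = 0 <= yy < rows and 0 <= xx < cols
                    pixel = image[yy][xx] if inside else 0
                    if pixel != want:
                        match = False
                        break
                if not match:
                    break
            out[y][x] = 1 if match else 0
    return out


# Pattern for an isolated foreground pixel, such as a detached dot.
isolated = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
for row in hit_or_miss(image, isolated):
    print(row)
```

Only the pixel at row 1, column 1 is marked, because the two adjacent pixels in row 2 each have a foreground neighbor. Different patterns can detect other local shapes such as line ends or corners.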

Keywords: A Persian Optical Character Recognition.

Downloads 2270