Search results for: About four key words or phrases in alphabetical order
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5479

5479 Photovoltaic Array Sizing for PV-Electrolyzer

Authors: Panhathai Buasri

Abstract:

Hydrogen that is used as fuel in fuel cell vehicles can be produced from renewable sources such as wind, solar, and hydro technologies. The PV-electrolyzer is one of the promising methods for producing hydrogen with zero pollution emission. Hydrogen production from a PV-electrolyzer system depends on the efficiency of the electrolyzer and the photovoltaic array, and on the sun irradiance at the site. In this study, the amount of hydrogen is obtained using mathematical equations for different driving distances and sun peak hours. The results show that a minimum of 99 PV modules is needed to generate 1.75 kgH2 per day for two vehicles.
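The abstract does not give its sizing equations, so the following is only a minimal sketch of how such a calculation might be set up. The function name, the simple energy-balance formula, and every numeric input (electrolyzer energy per kilogram, module rating, system efficiency, peak sun hours) are illustrative assumptions, not the paper's values, and the sketch is not expected to reproduce the reported 99 modules.

```python
import math

def modules_needed(h2_kg_per_day, kwh_per_kg_h2, peak_sun_hours,
                   module_kw, system_efficiency):
    """Smallest whole number of PV modules covering the daily hydrogen demand.

    Assumed energy balance (hypothetical, not taken from the paper):
    electricity required = hydrogen mass * electrolyzer energy per kg,
    electricity per module = rated power * peak sun hours * system efficiency.
    """
    daily_energy_kwh = h2_kg_per_day * kwh_per_kg_h2
    energy_per_module_kwh = module_kw * peak_sun_hours * system_efficiency
    return math.ceil(daily_energy_kwh / energy_per_module_kwh)

# Hypothetical inputs chosen only to exercise the function.
n = modules_needed(h2_kg_per_day=1.75, kwh_per_kg_h2=55.0,
                   peak_sun_hours=5.0, module_kw=0.25,
                   system_efficiency=0.85)
print(n)
```

With these assumed inputs, more peak sun hours or a higher module rating lowers the module count, which is the dependence on site irradiance and component efficiency that the abstract describes.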

Keywords: About four key words or phrases in alphabetical order, separated by commas.

Downloads: 1753
5478 From e-Government to e-Democracy Challenges and Opportunities for Development in Montenegro

Authors: Tamara Djurickovic MSc

Abstract:

The Internet today has a huge impact on all aspects of life, including the broader context of democracy, politics and politicians. If democracy is freedom of choice, a number of conditions must be met for that freedom to be achieved and realized in practice. These preconditions must be met regardless of the manner of voting. The key contribution of ICT to achieving freedom of choice is that technology connects citizens and elected representatives better than was possible without the Internet. In this sense, we can say that the Internet and ICT are significantly changing, and potentially improving, the environment in which democratic processes take place. This paper describes trends in the use of ICT in democratic processes and analyzes the challenges for the implementation of e-Democracy in Montenegro.

Keywords: About four key words or phrases in alphabetical order, separated by commas.

Downloads: 1607
5477 Enhancing Retrieval Effectiveness of Malay Documents by Exploiting Implicit Semantic Relationship between Words

Authors: Mohd Pouzi Hamzah, Tengku Mohd Tengku Sembok

Abstract:

Phrases have a long history in information retrieval, particularly in commercial systems. Implicit semantic relationships between words in the form of BaseNP have shown significant improvement in terms of precision in many IR studies. Our research focuses on linguistic phrases, which are language dependent. Our results show that using BaseNP can improve performance even though more than 62% of word formation in the Malay language is based on derivational affixes and suffixes.

Keywords: Information Retrieval, Malay Language, Semantic Relationship, Retrieval Effectiveness, Conceptual Indexing.

Downloads: 1428
5476 Bose-Einstein Condensation in Neutral Many Bosonic System

Authors: M. Al-Sugheir, M. Sakhreya, G. Alna'washi, F. Al-Dweri

Abstract:

In this work, the condensate fraction and transition temperature of a neutral many-bosonic system are studied within the static fluctuation approximation (SFA). The effect of potential parameters such as the strength and range on the condensate fraction was investigated. A model potential consisting of a repulsive step potential and an attractive potential well was used. As the potential strength or the core radius of the repulsive part increases, the condensate fraction decreases at the same temperature. As the potential depth or the range of the attractive part increases, the condensate fraction increases. The transition temperature decreases as the potential strength or the core radius of the repulsive part increases, and it increases as the potential depth or the range of the attractive part increases.

Keywords: About four key words or phrases in alphabetical order, separated by commas

Downloads: 1207
5475 Cloning and Expression of D-Threonine Aldolase from Ensifer arboris NBRC100383

Authors: Sang-Ho Baik

Abstract:

D-erythro-cyclohexylserine (D-erythro-CHS) is a chiral unnatural β-hydroxy amino acid expected to be useful in the synthesis of drugs for AIDS treatment. To develop a continuous bioconversion system with a whole-cell biocatalyst carrying D-threonine aldolase (D-TA) genes for D-erythro-CHS production, the D-threonine aldolase gene was amplified from Ensifer arboris 100383 by direct PCR amplification using two degenerate oligonucleotide primers designed from the genomic sequence of Sinorhizobium meliloti. Sequence analysis of the cloned DNA fragment revealed one open reading frame of 1059 bp and 386 amino acids. This putative D-TA gene was cloned into NdeI and EcoRI (pEnsi His-tag sequence or BamHI (pEnsi-DTA[2]) sequence of the pET21(a) vector. The cloned gene was expressed at a much higher level in E. coli BL21(DE3) transformed with pEnsi-DTA[1] than in E. coli BL21(DE3) transformed with pEnsi-DTA[2]. When the cells expressing the wild used for D-TA enzyme activity, 12 mM glycine was successfully detected in HPLC analysis. Moreover, the whole cells harbouring the recombinant D-TA were able to synthesize D-erythro of 0.6 mg/ml in a batch reaction.

Keywords: About four key words or phrases in alphabetical order, separated by commas.

Downloads: 1743
5474 Fuzzy Set Approach to Study Appositives and Its Impact Due to Positional Alterations

Authors: E. Mike Dison, T. Pathinathan

Abstract:

Computing with Words (CWW) and Possibilistic Relational Universal Fuzzy (PRUF) are two concepts widely used to represent and measure vaguely defined natural phenomena. In this paper, we study how the positional alteration of phrases affects or modifies the impact of a natural language proposition. We observe the gradations in the sensitivity or feeling of a statement that result from positional alterations. We derive the classification and modification of the meaning of words due to positional alteration. We present the results with reference to set theoretic interpretations.

Keywords: Appositive, computing with words, PRUF, semantic sentiment analysis, set theoretic interpretations.

Downloads: 840
5473 The Use of Emoticons in Polite Phrases of Greetings and Thanks

Authors: Zuzana Komrsková

Abstract:

This paper shows the connection between emoticons and politeness in written computer-mediated communication. It examines whether the use of emoticons differs between Czech and English tweets. The assumptions about the use of emoticons were based on the use of greetings and thanks in real, face-to-face situations. The first assumption, that a welcome greeting phrase would be accompanied by a positive emoticon, was correct. For the farewell greeting, however, both positive and negative emoticons are possible. The results show a lower frequency of negative emoticons in this context. Positive and negative emoticons were also quite often found in the same tweet. The expression of gratitude is associated with positive emotions. The results show that emoticons very often accompany polite phrases of greeting and thanks in both Czech and English. The use of emoticons with the studied polite phrases shows that emoticons have become an integral part of these phrases.

Keywords: Computer-mediated communication, emoticons, politeness, Twitter.

Downloads: 2507
5472 Vocal Communication in Sooty-headed Bulbul; Pycnonotus aurigaster

Authors: Surakan Payakkhabut

Abstract:

Studies of vocal communication in the Sooty-headed Bulbul were carried out from January to December 2011. Vocal recordings and behavioral observations were made in the birds' natural habitats at several localities in Lampang, Thailand. After editing, cuts of high-quality recordings were analyzed with Avisoft-SASLab Pro (version 4.40) software. More than one thousand element repertoires in five groups were found within two vocal structures. The two structures were short sounds with a single element and phrases composed of several elements, with frequencies ranging from 1 to 10 kHz. Most phrases were composed of 2 to 5 elements that were often dissimilar in structure; however, these phrases were not as complex as song phrases. The elements and phrases were combined to form many patterns. The species used ten types of calls: alert, alarm, aggressive, begging, contact, courtship, distress, exciting, flying and invitation. Alert and contact calls were used more frequently than other calls. Aggressive, alarm and distress calls could be used for interspecific communication with some other bird species in the same habitats.

Keywords: Vocal communication, Call, Bird, Sooty-headed Bulbul

Downloads: 2631
5471 Words Reordering based on Statistical Language Model

Authors: Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou

Abstract:

There are multiple reasons to expect that detecting word order errors in a text will be a difficult problem, and detection rates reported in the literature are in fact low. Although grammatical rules constructed by computational linguists improve the performance of grammar checkers in word order diagnosis, the repair task is still very difficult. This paper presents an approach for repairing word order errors in English text by reordering the words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. The novelty of this method concerns the use of an efficient confusion matrix technique for reordering the words. The comparative advantage of this method is that it works with a large set of words and avoids the laborious and costly process of collecting word order errors for creating error patterns.

Keywords: Permutations filtering, Statistical language model N-grams, Word order errors

Downloads: 1586
5470 Author's Approach to the Problem of Correctional Speech Therapy with Children Suffering from Alalia

Authors: Е. V. Kutsina, S. A. Tarasova

Abstract:

In this article we present a methodology that enables unlanguaged preschool and primary school children to remember words, phrases and texts with the help of graphic signs: letters, syllables and words. Reading becomes a support for the child's speech development. Teaching is based on the principle "from simple to complex": "a letter - a syllable - a word - a sentence - a text." The availability of multi-level texts allows this methodology to be used with children who have different levels of speech development.

Keywords: Alalia, analytic-synthetic method, development of coherent speech, formation of vocabulary, learning to read, sentence formation, three-level stories, unlanguaged children.

Downloads: 1941
5469 Neuro-Fuzzy Based Model for Phrase Level Emotion Understanding

Authors: Vadivel Ayyasamy

Abstract:

The present approach deals with the identification of emotions and the classification of emotional patterns at phrase level with respect to positive and negative orientation. The proposed approach considers emotion-triggering terms, their co-occurring terms and the associated sentences for recognizing emotions. It uses Part of Speech Tagging and Emotion Actifiers for classification. Sentence patterns are broken into phrases, and a Neuro-Fuzzy model is used for classification, which results in 16 patterns of emotional phrases. Suitable intensities are assigned to capture the degree of emotional content in the semantics of the patterns. These emotional phrases are assigned weights, which help in deciding the positive or negative orientation of emotions. The approach uses web documents for experiments; the proposed classification approach performs well and achieves good F-Scores.

Keywords: Emotions, sentences, phrases, classification, patterns, fuzzy, positive orientation, negative orientation.

Downloads: 1079
5468 A Hybrid Ontology Based Approach for Ranking Documents

Authors: Sarah Motiee, Azadeh Nematzadeh, Mehrnoush Shamsfard

Abstract:

The growing volume of information on the internet increases the need for new (semi)automatic methods for retrieving documents and ranking them according to their relevance to the user query. In this paper, after a brief review of ranking models, a new ontology based approach for ranking HTML documents is proposed and evaluated in various circumstances. Our approach is a combination of conceptual, statistical and linguistic methods. This combination preserves the precision of ranking without losing speed. Our approach exploits natural language processing techniques to extract phrases from documents and the query and to stem words. Then an ontology based conceptual method is used to annotate documents and expand the query. To expand a query, the spread activation algorithm is improved so that the expansion can be done flexibly and in various aspects. The annotated documents and the expanded query are processed to compute the relevance degree using statistical methods. The outstanding features of our approach are (1) combining conceptual, statistical and linguistic features of documents, (2) expanding the query with its related concepts before comparing it to documents, (3) extracting and using both words and phrases to compute the relevance degree, (4) improving the spread activation algorithm to do the expansion based on a weighted combination of different conceptual relationships and (5) allowing variable document vector dimensions. A ranking system called ORank is developed to implement and test the proposed model. The test results are included at the end of the paper.

Keywords: Document ranking, Ontology, Spread activation algorithm, Annotation.

Downloads: 1630
5467 ORank: An Ontology Based System for Ranking Documents

Authors: Mehrnoush Shamsfard, Azadeh Nematzadeh, Sarah Motiee

Abstract:

The growing volume of information on the internet increases the need for new (semi)automatic methods for retrieving documents and ranking them according to their relevance to the user query. In this paper, after a brief review of ranking models, a new ontology based approach for ranking HTML documents is proposed and evaluated in various circumstances. Our approach is a combination of conceptual, statistical and linguistic methods. This combination preserves the precision of ranking without losing speed. Our approach exploits natural language processing techniques for extracting phrases and stemming words. Then an ontology based conceptual method is used to annotate documents and expand the query. To expand a query, the spread activation algorithm is improved so that the expansion can be done in various aspects. The annotated documents and the expanded query are processed to compute the relevance degree using statistical methods. The outstanding features of our approach are (1) combining conceptual, statistical and linguistic features of documents, (2) expanding the query with its related concepts before comparing it to documents, (3) extracting and using both words and phrases to compute the relevance degree, (4) improving the spread activation algorithm to do the expansion based on a weighted combination of different conceptual relationships and (5) allowing variable document vector dimensions. A ranking system called ORank is developed to implement and test the proposed model. The test results are included at the end of the paper.

Keywords: Document ranking, Ontology, Spread activation algorithm, Annotation.

Downloads: 1888
5466 Intensifier as Changed from the Impolite Word in Thai

Authors: Methawee Yuttapongtada

Abstract:

An intensifier is a linguistic device, found in many languages, that enhances the words of a language by adding quantity, quality or emotion. Languages have both similar and dissimilar intensifying devices. Thai in particular uses a wide variety of them, one of which is the use of impolite words, or words that originally meant something negative, as intensifiers. The data for this study were collected from spoken language, selecting intensifiers that are regarded as impolite because, in other contexts, they count as rude words, swear words or words with negative meaning. The words were then traced back through time to examine their historical change. The original meanings and the contexts of use from the past to the present were explained using textual documents and dictionaries from different periods. Regarding the semantic and pragmatic aspects, subjectification was found to be a significant motivation for the change of impolite words into intensifiers, and it explains the pathway of semantic change of these words. Moreover, the use of impolite words, or words that originally meant something negative, as intensifiers tends to increase, a phenomenon commonly found in many languages. The results may support the belief that human language is universal and reflect that humans share the same fundamental way of thinking.

Keywords: Impolite word, intensifier, Thai, semantic change.

Downloads: 1309
5465 N-Grams: A Tool for Repairing Word Order Errors in Ill-formed Texts

Authors: Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou, Konstantinos Mamouras

Abstract:

This paper presents an approach for repairing word order errors in English text by reordering the words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. One possible way of reordering the words is to use all the permutations. The problem is that for a sentence of N words the number of permutations is N!. The novelty of this method concerns the use of an efficient confusion matrix technique for reordering the words. The confusion matrix technique has been designed to reduce the search space among permuted sentences. The search space is limited using the statistical inference of N-grams. The results of this technique are very interesting and show that the number of permuted sentences can be reduced by 98.16%. For experimental purposes a test set of TOEFL sentences was used, and the results show that more than 95% of them can be repaired using the proposed method.
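The trigram-hit criterion can be illustrated with a small sketch. It uses the brute-force baseline the abstract mentions (all N! permutations), not the authors' confusion matrix technique, and the two-sentence corpus and helper names are hypothetical stand-ins for a real language model.

```python
from itertools import permutations

def trigrams(words):
    """All consecutive word triples in a sequence."""
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

def count_hits(words, known_trigrams):
    """Number of trigrams in the sequence that the model has seen."""
    return sum(1 for t in trigrams(words) if t in known_trigrams)

def best_reordering(words, known_trigrams):
    """Brute force over all N! orderings; feasible only for short sentences."""
    return max(permutations(words),
               key=lambda p: count_hits(p, known_trigrams))

# Toy "language model": the set of trigrams seen in a tiny assumed corpus.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
known = {t for sentence in corpus for t in trigrams(sentence.split())}

repaired = best_reordering("cat the sat on mat the".split(), known)
print(" ".join(repaired))
```

Because the loop grows factorially with sentence length, a pruning step such as the one the abstract describes is what makes the idea practical beyond a handful of words.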

Keywords: Permutations filtering, Statistical language model N-grams, Word order errors, TOEFL

Downloads: 1668
5464 Information Filtering using Index Word Selection based on the Topics

Authors: Takeru YOKOI, Hidekazu YANAGIMOTO, Sigeru OMATU

Abstract:

We propose an information filtering system that selects index words from a document set based on the topics the set contains. The method narrows down the particularly characteristic words in a document set, and the topics are obtained by Sparse Non-negative Matrix Factorization. In information filtering, a document is often represented as a vector whose elements correspond to the weights of the index words, and the dimension of the vector grows as the number of documents increases. As a result, words that are useless as index words for information filtering may be included. To address this problem, the dimension needs to be reduced. Our proposal reduces the dimension by selecting index words based on the topics included in a document set. We apply Sparse Non-negative Matrix Factorization to the document set to obtain these topics. The filtering is carried out based on a centroid of the learning document set. The centroid is regarded as the user's interest and is represented as a document vector whose elements are the weights of the selected index words. We confirm the effectiveness of our proposal using the English test collection MEDLINE. The proposed selection improves recommendation accuracy over previous methods when an appropriate number of index words is selected. In addition, we discuss the index words selected by our proposal and find that it was able to select index words covering some minor topics included in the document set.

Keywords: Information Filtering, Sparse NMF, Index Word Selection, User Profile, Chi-squared Measure

Downloads: 1456
5463 Experimental and Numerical Study of The Shock-Accelerated Elliptic Heavy Gas Cylinders

Authors: Jing S. Bai, Li Y. Zou, Tao Wang, Kun Liu, Wen B. Huang, Jin H. Liu, Ping Li, Duo W. Tan, Cang L. Liu

Abstract:

We studied the evolution of an elliptic heavy SF6 gas cylinder surrounded by air when accelerated by a planar Mach 1.25 shock. A multiple dynamics imaging technology was used to obtain one image of the experimental initial conditions and five images of the time evolution of the elliptic cylinder. We compared the width and height of the circular and two kinds of elliptic gas cylinders, and analyzed the vortex strength of the elliptic ones. Simulations are in very good agreement with the experiments, but because of the different initial gas cylinder shapes, the initial density peak and distribution differ somewhat between the circular and elliptic gas cylinders, and the initial state of the latter is more sensitive and harder to describe.

Keywords: About four key words or phrases in alphabetical order, separated by commas.

Downloads: 1512
5462 Component-based Segmentation of Words from Handwritten Arabic Text

Authors: Jawad H AlKhateeb, Jianmin Jiang, Jinchang Ren, Stan S Ipson

Abstract:

Efficient preprocessing is essential for automatic recognition of handwritten documents. In this paper, techniques for segmenting words in handwritten Arabic text are presented. First, connected components (ccs) are extracted, and the distances between different components are analyzed. The statistical distribution of these distances is then obtained to determine an optimal threshold for word segmentation. Meanwhile, an improved projection based method is employed for baseline detection. The proposed method has been successfully tested on the IFN/ENIT database, consisting of 26459 Arabic words handwritten by 411 different writers, and the results were promising, with more accurate detection of the baseline and segmentation of words for further recognition.

Keywords: Arabic OCR, off-line recognition, Baseline estimation, Word segmentation.

Downloads: 2206
5461 Extraction of Significant Phrases from Text

Authors: Yuan J. Lui

Abstract:

Prospective readers can quickly determine whether a document is relevant to their information need if the significant phrases (or keyphrases) in this document are provided. Although keyphrases are useful, not many documents have keyphrases assigned to them, and manually assigning keyphrases to existing documents is costly. Therefore, there is a need for automatic keyphrase extraction. This paper introduces a new domain independent keyphrase extraction algorithm. The algorithm approaches the problem of keyphrase extraction as a classification task, and uses a combination of statistical and computational linguistics techniques, a new set of attributes, and a new machine learning method to distinguish keyphrases from non-keyphrases. The experiments indicate that this algorithm performs better than other keyphrase extraction tools and that it significantly outperforms Microsoft Word 2000's AutoSummarize feature. The domain independence of this algorithm has also been confirmed in our experiments.

Keywords: classification, keyphrase extraction, machine learning, summarization

Downloads: 2051
5460 Text Retrieval Relevance Feedback Techniques for Bag of Words Model in CBIR

Authors: Nhu Van NGUYEN, Jean-Marc OGIER, Salvatore TABBONE, Alain BOUCHER

Abstract:

The state-of-the-art Bag of Words model in Content-Based Image Retrieval has been used for years, but the relevance feedback strategies for this model have not been fully investigated. Inspired by text retrieval, the Bag of Words model can draw on the wealth of knowledge and practices available in that field. We study and experiment with the relevance feedback model from text retrieval in order to adapt it to image retrieval. The experiments show that the techniques from text retrieval give good results for image retrieval and that further improvements are possible.

Keywords: Relevance feedback, bag of words model, probabilistic model, vector space model, image retrieval

Downloads: 2117
5459 The Phonology and Phonetics of Second Language Intonation in Case of “Downstep”

Authors: Tayebeh Norouzi

Abstract:

This study aims to investigate the acquisition process of intonation. It examines the intonation structure of Tokyo Japanese and its realization by Iranian learners of Japanese. Seven Iranian learners of Japanese, differing in fluency, and two Japanese speakers participated in the experiment. Two sentences were used to test the phonological and phonetic characteristics of lexical pitch-accent as well as the intonation patterns produced by the speakers. Both sentences consisted of similar words with the same number of syllables and lexical pitch-accents but different syntactic structure. Speakers were asked to read each sentence three times at normal speed, and the data were analyzed by Praat. The results show that lexical pitch-accent, Accentual Phrase (AP) and AP boundary tone realization vary depending on sentence type. For sentences of type XdeYwo, the lexical pitch-accent is realized properly. However, there is a rise in AP boundary tone regardless of speakers’ level of fluency. In contrast, in sentences of type XnoYwo, the lexical pitch-accent and AP boundary tone vary depending on the speakers’ fluency level. Advanced speakers are better at grouping words into phrases and produce more native-like intonation patterns, though they are not able to realize downstep properly. The non-native speakers tried to realize proper intonation patterns by making changes in lexical accent and boundary tone.

Keywords: Intonation, Iranian learners, Japanese prosody, lexical accent, second language acquisition.

Downloads: 988
5458 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni

Abstract:

The problem of Entity relation discovery in structured data, a well covered topic in literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary, or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations, a cooccurrence graph. Each cooccurrence highlights some degree of semantic correlation between the words, because related words are more commonly found close to each other than on opposite sides of the text. Some authors have used sliding windows for this problem: they count all the occurrences within a sliding window running over the whole text. In this paper we generalise this technique into a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is counted with a weight depending on the distance between the items: a closer distance implies stronger evidence of a relationship. We develop an experiment to support this intuition by applying the technique to a data set consisting of the text of the Bible, split into verses.
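A weighted-distance window of this kind might be sketched as follows. The abstract does not state the weight function, so the inverse-distance weight 1/d used here is an assumption, as are the function name, the window convention, and the toy sentence.

```python
from collections import defaultdict

def weighted_cooccurrences(tokens, entities, window=4):
    """Accumulate a weight for every pair of distinct entities that fall
    inside the same window of `window` tokens.

    Assumed weight: 1 / d, where d is the token distance between the two
    items, so closer pairs contribute more. Returns {(a, b): weight} with
    each pair stored in sorted order.
    """
    weights = defaultdict(float)
    for i, a in enumerate(tokens):
        if a not in entities:
            continue
        # Look ahead at most window - 1 tokens from position i.
        for j in range(i + 1, min(i + window, len(tokens))):
            b = tokens[j]
            if b in entities and b != a:
                weights[tuple(sorted((a, b)))] += 1.0 / (j - i)
    return dict(weights)

# Toy text with hypothetical entity names.
text = "alice met bob and later alice called carol".split()
graph = weighted_cooccurrences(text, {"alice", "bob", "carol"}, window=4)
print(graph)
```

Setting every weight to 1 instead of 1/d recovers the plain sliding-window count that the abstract generalises; the resulting dictionary is the edge list of the cooccurrence graph.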

Keywords: Cooccurrence graph, entity relation graph, unstructured text, weighted distance.

Downloads: 684
5457 A Text Clustering System based on k-means Type Subspace Clustering and Ontology

Authors: Liping Jing, Michael K. Ng, Xinhua Yang, Joshua Zhexue Huang

Abstract:

This paper presents a text clustering system based on a k-means type subspace clustering algorithm for clustering large, high dimensional and sparse text data. In this algorithm, a new step is added to the k-means clustering process to automatically calculate the weights of keywords in each cluster, so that the important words of a cluster can be identified by their weight values. To support understanding and interpretation of the clustering results, a few keywords that best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to WordNet to identify the set of nouns and to consolidate synonyms and hyponyms. Experimental results show that the clustering algorithm is superior to other subspace clustering algorithms, such as PROCLUS and HARP, and to k-means type algorithms, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting words that represent the topics of the clusters.

Keywords: Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology

Downloads: 2462
5456 Determining Senses for Word Sense Disambiguation in Turkish

Authors: Zeynep Orhan, Zeynep Altan

Abstract:

Word sense disambiguation is an important intermediate stage for many natural language processing applications. The senses of an ambiguous word are the classification of usages for that specific word. This paper deals with methodologies for determining the senses of a given word when they cannot be obtained from an already available resource like WordNet. We offer a method that helps us determine the sense boundaries gradually. In this method, we first decide on some features that are thought to affect the senses and divide the instances into two groups; then, according to the results of the evaluations, we continue dividing the instances gradually. In a second method we use pseudo words. We devise artificial words according to some criteria and evaluate classification algorithms on these previously classified words.

Keywords: Word sense disambiguation, sense determination, pseudo words, sense granularity.

Downloads 1410
5455 Development of a Health Literacy Scale for Chinese-Speaking Adults in Taiwan

Authors: Frank C. Pan, Che-Long Su, Ching-Hsuen Chen

Abstract:

Background: Measuring an individual's health literacy is gaining attention, yet no appropriate instrument is available in Taiwan. Measurement tools that were developed and used in western countries may not be appropriate for use in Taiwan due to a different language system. Purpose: This research aimed to develop a health literacy measurement instrument specific to adults in Taiwan. Methods: Several experts, including clinic physicians, healthcare administrators and scholars, identified 125 commonly used health-related Chinese phrases from major medical knowledge sources that are easily accessible to the public. A five-point Likert scale is used to measure the level of understanding of the target population. This measurement is then compared with the correctness of their answers to a health knowledge test for validation. Samples: The samples under study were purposefully taken from four groups of people in northern Pingtung: OPD patients, university students, community residents, and casual visitors to the central park. A health knowledge index of 10 questions is used to screen out false responses. A sample of 686 valid cases out of 776 was then included to construct this scale. An independent t-test was used to examine each individual phrase. The phrases with the highest significance were then identified and retained to compose this scale. Results: A Taiwan Health Literacy Scale (THLS) was finalized with 66 health-related phrases under nine divisions. Cronbach's alpha of each division is at a satisfactory level of 89% and above. Conclusions: In the initial application in this study, the factors that significantly differentiate levels of health literacy are education, female gender, age, being a family member of a stroke victim, experience with patient care, and being a healthcare professional.
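
The abstract reports Cronbach's alpha for each division without showing how it is computed. A minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), is given below. The function name `cronbach_alpha` and the ratings are illustrative assumptions and are not data from the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item ratings."""
    k = len(scores[0])  # number of items in the division
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Made-up five-point Likert ratings: five respondents, three phrases.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.3f}")
```

Values close to 1 indicate that the items of a division vary together; a reported level of 89% corresponds to alpha = 0.89 on this scale.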

Keywords: Health literacy, health knowledge, REALM, THLS.

Downloads 2526
5454 Online Topic Model for Broadcasting Contents Using Semantic Correlation Information

Authors: Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park, Sang-Jo Lee

Abstract:

This paper proposes a method of learning topics for broadcasting contents. There are two kinds of texts related to broadcasting contents. One is the broadcasting script, a series of texts including directions and dialogues. The other is blog posts, which contain relatively abstracted contents, stories, and diverse information about broadcasting contents. Although the two kinds of text cover similar broadcasting contents, the words used in blog posts and in broadcasting scripts differ. When unseen words appear, a method is needed to reflect them in the existing topics. In this paper, we introduce a semantic vocabulary expansion method to reflect unseen words. We expand the topics of the broadcasting script by incorporating the words in blog posts. Each word in the blog posts is added to the most semantically correlated topic. We use word2vec to obtain the semantic correlation between words in blog posts and topics of scripts. The vocabularies of the topics are updated, and posterior inference is then performed to rearrange the topics. In experiments, we verified that the proposed method can discover more salient topics for broadcasting contents.
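
The vocabulary expansion step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' method: the semantic correlation between a word and a topic is taken to be the cosine similarity between the word's vector and the mean vector of the topic's words, and the two-dimensional vectors below are toy stand-ins for trained word2vec embeddings. The names `expand_topics`, `topic_centroid` and `cosine` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def topic_centroid(words, vectors):
    """Mean embedding of the topic words that have a vector."""
    vecs = [vectors[w] for w in words if w in vectors]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]

def expand_topics(topics, new_words, vectors):
    """Add each unseen word to the topic it is most similar to."""
    centroids = {name: topic_centroid(ws, vectors) for name, ws in topics.items()}
    expanded = {name: list(ws) for name, ws in topics.items()}
    for word in new_words:
        if word not in vectors:
            continue  # no embedding available, so the word is skipped
        best = max(centroids, key=lambda name: cosine(vectors[word], centroids[name]))
        expanded[best].append(word)
    return expanded

# Toy embeddings standing in for word2vec output (assumed values).
vectors = {
    "detective": [0.9, 0.1], "murder": [0.8, 0.2],
    "kiss": [0.1, 0.9], "wedding": [0.2, 0.8],
    "alibi": [0.85, 0.15], "proposal": [0.15, 0.9],
}
script_topics = {"crime": ["detective", "murder"], "romance": ["kiss", "wedding"]}
blog_words = ["alibi", "proposal"]

expanded = expand_topics(script_topics, blog_words, vectors)
print(expanded)
```

The subsequent posterior inference that rearranges the topics is outside the scope of this sketch.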

Keywords: Broadcasting script analysis, topic expansion, semantic correlation analysis, word2vec.

Downloads 1760
5453 A Novel Approach to Persian Online Hand Writing Recognition

Authors: Ramin Halavati, Mansour Jamzad, Mahdieh Soleymani

Abstract:

Persian (Farsi) script is totally cursive, and each character is written in several different forms depending on the preceding and following characters in the word. These complexities make automatic handwriting recognition of Persian a very hard problem, and there are few contributions that attempt to solve it. This paper presents a novel practical approach to online recognition of Persian handwriting, based on representing inputs and patterns with very simple visual features and comparing these simple terms. The recognition approach is tested on a set of Persian words. The results were quite acceptable when the possible words were unknown, and almost all were correct when the words were chosen from a prespecified list.

Keywords: Image Processing, Pattern Recognition.

Downloads 1330
5452 Carnatic Music Ragas and Their Role in Music Therapy

Authors: Raghavi Janaswamy, Saraswathi K. Vasudev

Abstract:

Raga, the soul and base of the music system, is a distinctive musical entity with a unique structure in its construction of srutis (musical sounds) and in its application. One of the essential components of the music system is the ‘tala’, which defines the rhythm of a song. There are seven basic swaras (notes), Sa, Ri, Ga, Ma, Pa, Da and Ni, in the Carnatic music system that are analogous to the C, D, E, F, G, A and B of the western system. Carnatic music further builds on the conscious use of microtones, gamakams (oscillations) and rendering styles. It has 72 basic ragas known as melakarta ragas, and a plethora of ragas have been developed from them through permutations and combinations of the basic swaras. Among them, some ragas derived from the same melakarta raga are distinctly different from each other and can evoke a profound difference in the raga bhava (emotion) during rendering. Although these may bear similar arohana and avarohana swaras, the quintessential differences in their gamaka usage and in the srutis present offer varied melodic feelings; variations in the intonation and stress given to certain swara phrases are the root causes. This article sheds light on a group of such allied ragas (AR) from the perspectives of their schema and raga alapana (improvisation), ranjaka prayogas (signature phrases), differences in rendering tempo, gamakas and delicate srutis, along with the range of sancharas (musical phrases). The intricate differences in the sruti frequencies and the use of AR in composing kritis (musical compositions) toward emotive accomplishments such as moods of valor, kindness, love, humor, anger and mercy, to name a few, have also been explored. A brief review of the existing scientific research on music therapy with some of the Carnatic ragas is presented. Studying and comprehending the AR enables music aspirants to gain a thorough knowledge of the subtle nuances among the ragas. Such knowledge helps leave a long-lasting melodic impression on listeners and enables further research on music therapy.

Keywords: Carnatic music, Allied ragas, Raga analysis, Music therapy.

Downloads 1546
5451 Concept Indexing using Ontology and Supervised Machine Learning

Authors: Rossitza M. Setchi, Qiao Tang

Abstract:

Nowadays, ontologies are the only widely accepted paradigm for the management of sharable and reusable knowledge in a way that allows its automatic interpretation. They are collaboratively created across the Web and used to index, search and annotate documents. The vast majority of ontology-based approaches, however, focus on indexing texts at document level. Recently, with the advances in ontological engineering, it has become clear that information indexing can benefit greatly from the use of general-purpose ontologies, which aid the indexing of documents at word level. This paper presents a concept indexing algorithm that adds ontology information to words and phrases and allows full text to be searched, browsed and analyzed at different levels of abstraction. The algorithm uses a general-purpose ontology, OntoRo, and an ontologically tagged corpus, OntoCorp, both developed for the purpose of this research. OntoRo and OntoCorp are used in a two-stage supervised machine learning process aimed at generating ontology tagging rules. The first experimental tests show a tagging accuracy of 78.91%, which is encouraging for the further improvement of the algorithm.

Keywords: Concepts, indexing, machine learning, ontology, tagging.

Downloads 1678
5450 Web Search Engine Based Naming Procedure for Independent Topic

Authors: Takahiro Nishigaki, Takashi Onoda

Abstract:

In recent years, the volume of document data has been increasing with the spread of the Internet. Many methods have been studied for extracting topics from large document data. We proposed Independent Topic Analysis (ITA) to extract mutually independent topics from large document data such as newspaper data. ITA extracts the independent topics from the document data by using Independent Component Analysis. A topic extracted by ITA is represented by a set of words. However, this set of words is quite different from the topics the user imagines. For example, the top five words with high independence for one topic are as follows: Topic1 = {"scor", "game", "lead", "quarter", "rebound"}. Topic 1 is considered to represent the topic "SPORTS", but the topic name "SPORTS" has to be attached by the user, because ITA cannot name topics. Therefore, in this research, we propose a method that uses a web search engine to obtain topic names that are easy for people to understand from the sets of words given by ITA. Specifically, we search for a set of topical words and take the title of the homepage in the search result as the topic name. We also apply the proposed method to some data and verify its effectiveness.
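
The naming procedure can be sketched as follows, assuming the search engine is available as a callable that returns page titles for a query. The names `name_topic` and `fake_search` are hypothetical, and the stub below returns a placeholder title instead of contacting a real search engine; no particular search API is implied.

```python
from typing import Callable, List, Optional, Sequence

def name_topic(words: Sequence[str],
               search: Callable[[str], List[str]]) -> Optional[str]:
    """Use the title of the top search result as the topic name.

    words:  topical words produced by the topic extraction step
    search: assumed interface that maps a query string to a list of page
            titles, best match first
    """
    query = " ".join(words)
    titles = search(query)
    return titles[0] if titles else None

def fake_search(query: str) -> List[str]:
    """Stand-in for a web search engine; returns canned placeholder titles."""
    canned = {
        "scor game lead quarter rebound": ["Basketball scores and highlights"],
    }
    return canned.get(query, [])

topic1 = ["scor", "game", "lead", "quarter", "rebound"]
print(name_topic(topic1, fake_search))
```

Passing the search function as a parameter keeps the sketch independent of any specific engine and makes the procedure easy to test offline.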

Keywords: Independent topic analysis, topic extraction, topic naming, web search engine.

Downloads 500