A Method for Compression of Short Unicode Strings
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 84472
A Method for Compression of Short Unicode Strings

Authors: Masoud Abedi, Abbas Malekpour, Peter Luksch, Mohammad Reza Mojtabaei

Abstract:

The use of short texts in communication has been greatly increasing in recent years. Applying different languages in short texts has led to compulsory use of Unicode strings. These strings need twice the space of common strings, hence, applying algorithms of compression for the purpose of accelerating transmission and reducing cost is worthwhile. Nevertheless, other compression methods like gzip, bzip2 or PAQ due to high overhead data size are not appropriate. The Huffman algorithm is one of the rare algorithms effective in reducing the size of short Unicode strings. In this paper, an algorithm is proposed for compression of very short Unicode strings. At first, every new character to be sent to a destination is inserted in the proposed mapping table. At the beginning, every character is new. In case the character is repeated for the same destination, it is not considered as a new character. Next, the new characters together with the mapping value of repeated characters are arranged through a specific technique and specially formatted to be transmitted. The results obtained from an assessment made on a set of short Persian and Arabic strings indicate that this proposed algorithm outperforms the Huffman algorithm in size reduction.

Keywords: Algorithms, Data Compression, Decoding, Encoding, Huffman Codes, Text Communication

Procedia PDF Downloads 315