Systems and Means of Informatics
2014, Volume 24, Issue 4, pp 124-134
ADJUSTABLE VARIABLE-LENGTH CHARACTER ENCODING SCHEME - ACE
- I. M. Adamovich
- D. V. Zemskov
Abstract
The article describes ACE (Adjustable Character Encoding), a variable-length character encoding scheme capable of encoding the full range of UCS (Universal Coded Character Set, ISO/IEC 10646) code points as sequences of one to four octets (8-bit code units). The main reason for creating this encoding was to increase, in comparison with UTF-8 (Unicode Transformation Format, 8-bit), the number of code points encoded as a one-octet code unit sequence, thus allowing a more compact representation of texts containing characters of a chosen national alphabet, and to make it more likely that the binary representation of that alphabet's encoded characters matches their binary values in a single-byte code table. The encoding retains such properties of UTF-8 as statelessness (the representation of an encoded character does not depend on the values of previous characters), self-synchronization (no valid code sequence can occur inside another one, nor inside adjacent sequences across their boundary), and the possibility of locating the beginning or the end of a code sequence from any position in the encoded text.
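As a rough illustration of the comparison with UTF-8 made in the abstract, the Python sketch below shows two things for UTF-8 only: how much longer a short Cyrillic word is than in a single-byte code table, and how the start of a code sequence can be found from an arbitrary octet. It does not implement ACE, whose code unit layout is not given on this page. The sample word, the choice of Windows-1251 as the single-byte table, and the helper name `utf8_sequence_start` are assumptions made for the example.

```python
# Illustrates UTF-8 properties mentioned in the abstract; this is NOT ACE.

def utf8_sequence_start(data: bytes, pos: int) -> int:
    """Return the index of the first octet of the UTF-8 sequence covering pos.

    Hypothetical helper: assumes `data` is well-formed UTF-8.
    """
    # Continuation octets have the bit pattern 10xxxxxx; step back over them.
    while pos > 0 and (data[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


text = "Привет"                       # sample Cyrillic word (assumption)
utf8 = text.encode("utf-8")
single_byte = text.encode("cp1251")   # one possible single-byte code table

print(len(utf8))         # 12: two octets per Cyrillic letter in UTF-8
print(len(single_byte))  # 6: one octet per letter in the single-byte table

# Octet 3 is the second octet of the second letter; the sequence starts at 2.
start = utf8_sequence_start(utf8, 3)
print(start, utf8[start:start + 2].decode("utf-8"))  # 2 р
```

The doubling in size for such text is the overhead that, according to the abstract, ACE is intended to reduce while keeping the ability to resynchronize from any position.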
References
- ISO/IEC 10646 - Information technology - Universal Coded Character Set (UCS). Available at: http://standards.iso.org/ittf/PubliclyAvailableStandards/c056921_ISO_IEC_10646_2012.zip (accessed September 9, 2014).
- RFC 3629 - UTF-8, a transformation format of ISO 10646. Available at: https://tools.ietf.org/html/3629 (accessed September 9, 2014).
- Unicode Technical Report #17 - Unicode Character Encoding Model. Available at: http://www.unicode.org/reports/tr17/ (accessed September 9, 2014).
About this article
Cover Date
2013-11-30
DOI
10.14357/08696527140408
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
character encoding scheme; UCS; program localization; UTF-8
Authors
I. M. Adamovich and D. V. Zemskov
Author Affiliations
Institute of Informatics Problems, Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation