Japanese Encoding Standards
Currently, there are three major Japanese encoding standards in use to process Japanese characters:
- JIS - Japanese Industrial Standard
- Shift-JIS
- EUC-JP - Extended Unix Code
Japanese encoding standards generally need two bytes per character, as opposed to the encodings used for Western languages, which are based on one byte per character.
In addition to these three, another international standard of growing importance is Unicode, designed by the
Unicode Consortium. It can be used to represent most of the world's languages. Unicode unifies the Chinese
characters (called kanji in Japanese) used in traditional and simplified Chinese as well as in Japanese and Korean.
This unified set of characters is referred to as CJK (Chinese, Japanese, Korean).
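As a rough illustration of the byte counts involved, the following Python sketch encodes the same text with the standard library codecs. The codec names (`iso2022_jp`, `shift_jis`, `euc_jp`, `utf_8`) are Python's own names, and the sample strings and the `byte_lengths` helper are chosen here only for illustration.

```python
# Compare how many bytes the same text needs in each encoding.
CODECS = ["iso2022_jp", "shift_jis", "euc_jp", "utf_8"]

def byte_lengths(text: str) -> dict:
    """Return {codec name: encoded length in bytes} for the given text."""
    return {codec: len(text.encode(codec)) for codec in CODECS}

ascii_sizes = byte_lengths("abc")      # three ASCII letters
kanji_sizes = byte_lengths("日本語")    # three kanji

for codec in CODECS:
    print(f"{codec:12} abc={ascii_sizes[codec]}  kanji={kanji_sizes[codec]}")
```

In this sketch the three kanji take two bytes each in Shift-JIS and EUC-JP, the JIS form additionally carries its escape sequences, and UTF-8 (one of the Unicode encodings) uses three bytes per kanji.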
JIS
The Japanese Industrial Standard encoding uses 7-bit bytes and works with ASCII characters as well as with
escape sequences to delimit Japanese from other character sets. It is mostly used for network transmissions such as sending and
receiving email or network news, since many networks do not preserve the eighth bit of 8-bit bytes. Japanese email clients on
a Japanese operating system will automatically convert messages into JIS and back. While most modern browsers
recognize all three encoding types ("Auto-Detect"), the escape sequences in JIS explicitly signal the browser to switch to
Japanese. ISO-2022-JP (JIS) encoding defines a standard way to send data in multiple character sets when the transmission
medium supports only 7-bit bytes.
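The escape-sequence mechanism can be observed with Python's built-in `iso2022_jp` codec. This is a minimal sketch, not a description of how any particular mail client works; the constant names and the sample string are chosen here for illustration.

```python
ESC_KANJI = b"\x1b$B"   # escape sequence: switch to JIS X 0208 (Japanese)
ESC_ASCII = b"\x1b(B"   # escape sequence: switch back to ASCII

text = "Hello 日本語"
encoded = text.encode("iso2022_jp")

# Every byte fits in 7 bits, so stripping the eighth bit changes nothing.
seven_bit_safe = all(b < 0x80 for b in encoded)
stripped = bytes(b & 0x7F for b in encoded)

print(encoded)
print(seven_bit_safe, stripped.decode("iso2022_jp") == text)
```

The ASCII part is sent unchanged, and the Japanese part is bracketed by the two escape sequences, which is why the data survives a channel that only carries 7 bits per byte.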
Shift-JIS
Originally developed by ASCII Corporation in conjunction with Microsoft, Shift-JIS (also known as SJIS, X-SJIS or MS Kanji)
is mainly used for internal computer coding in PCs and Macs. It uses 8-bit bytes, resulting in double-byte dependencies: a given
byte value may be a single-byte ASCII character meant to stand alone, or it may be the second byte of a 2-byte character,
meant to be read together with the preceding byte. This is especially problematic if the eighth bit has been cut off by the
network, as mentioned above.
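The double-byte dependency can be made concrete with a small Python sketch. The `sjis_byte_roles` function is a hypothetical helper written for this illustration; it uses a simplified rule for lead bytes (0x81 to 0x9F and 0xE0 to 0xFC) and is not a full Shift-JIS parser.

```python
def sjis_byte_roles(data: bytes) -> list:
    """Label each byte as 'single', 'lead' or 'trail' (simplified Shift-JIS scan)."""
    roles = []
    i = 0
    while i < len(data):
        b = data[i]
        # Lead bytes of 2-byte characters fall in 0x81-0x9F or 0xE0-0xFC.
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            roles.append("lead")
            if i + 1 < len(data):
                roles.append("trail")
            i += 2
        else:
            roles.append("single")
            i += 1
    return roles

data = "本{".encode("shift_jis")   # one kanji followed by an ASCII brace
roles = sjis_byte_roles(data)
print(data.hex(" "), roles)

# Bytes 1 and 2 have the same value (0x7B) but play different roles.
same_value = data[1] == data[2] == 0x7B

# Starting in the middle of the kanji silently yields the wrong text.
misread = data[1:].decode("shift_jis")
print(same_value, repr(misread))
```

The same byte value 0x7B appears once as the second half of a kanji and once as a stand-alone ASCII character, so a byte can only be interpreted correctly by scanning from a known character boundary.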
EUC-JP
Extended UNIX Code (formerly also called X-EUC-JP) is commonly used on Japanese UNIX systems.
Web pages that reside on UNIX systems are often encoded in EUC-JP. EUC-JP is very similar to JIS without the escape
sequences and with the 8th bit turned on in the bytes of each Japanese character. It is highly recommended to use EUC-JP
together with PHP and MySQL.
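Last, but not least, XML documents can be encoded in EUC-JP when the encoding is named in the XML declaration and the
XML processor supports it; processors are only required to support UTF-8 and UTF-16.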
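The relationship between JIS and EUC-JP described above can be checked with a short Python sketch. The `jis_kanji_run_to_euc` function is a hypothetical helper written for this illustration; it assumes its input is a single run of kanji wrapped in the usual escape sequences, and compares the result with Python's built-in `euc_jp` codec.

```python
ESC_KANJI = b"\x1b$B"   # switch to JIS X 0208
ESC_ASCII = b"\x1b(B"   # switch back to ASCII

def jis_kanji_run_to_euc(jis: bytes) -> bytes:
    """Convert one escape-delimited run of JIS kanji bytes to EUC-JP."""
    if not (jis.startswith(ESC_KANJI) and jis.endswith(ESC_ASCII)):
        raise ValueError("expected a single ESC $ B ... ESC ( B run")
    # Drop the escape sequences, keeping only the 7-bit kanji bytes.
    payload = jis[len(ESC_KANJI):-len(ESC_ASCII)]
    # Turn on the eighth bit of every remaining byte.
    return bytes(b | 0x80 for b in payload)

text = "日本語"
jis = text.encode("iso2022_jp")
euc_from_jis = jis_kanji_run_to_euc(jis)
matches_codec = euc_from_jis == text.encode("euc_jp")
print(euc_from_jis.hex(" "), matches_codec)
```

Because the eighth bit marks every byte of a Japanese character, EUC-JP needs no escape sequences and ASCII bytes never appear inside a 2-byte character.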
Please consult Ka-Ping Yee's web site for more technical information. Ka-Ping Yee is also the author of Shodouka.