Japan Reference

Japanese Encoding Standards

Three major encoding standards are currently used to process Japanese characters:

  • JIS (ISO-2022-JP) - Japanese Industrial Standard
  • Shift-JIS
  • EUC-JP - Extended Unix Code

In these Japanese encoding standards, Japanese characters generally take two bytes each, as opposed to Western languages, whose characters fit in a single byte. In addition to these three, another international standard of growing importance is Unicode, designed by the Unicode Consortium, which can represent most of the world's languages. Unicode unifies the Chinese characters (kanji) used in traditional and simplified Chinese, Japanese and Korean into a single set, referred to as the CJK (Chinese, Japanese, Korean) Unified Ideographs.
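As a rough illustration of these byte counts, the following sketch uses Python's built-in `shift_jis`, `euc_jp` and `utf-8` codecs (the choice of Python and of these codec names is an assumption of this example) to encode one ASCII letter and one kanji, and `unicodedata.name` to show that 日 sits in the unified CJK block.

```python
import unicodedata

# Built-in Python codecs: two legacy Japanese encodings plus UTF-8 for comparison.
ENCODINGS = ["shift_jis", "euc_jp", "utf-8"]

def byte_lengths(ch: str) -> dict:
    """Return how many bytes the character needs in each encoding."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}

for ch in ["A", "日"]:
    # Code point, official Unicode name, and byte counts per encoding.
    print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch), byte_lengths(ch))
```

The letter A takes one byte everywhere, while 日 (U+65E5, "CJK UNIFIED IDEOGRAPH-65E5") takes two bytes in Shift-JIS and EUC-JP but three in UTF-8, so the two-byte rule describes the legacy Japanese encodings rather than every Unicode encoding form.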

JIS

JIS encoding, formally ISO-2022-JP, defines a standard way to send data in multiple character sets when the transmission medium only supports 7-bit bytes. It combines ASCII characters with escape sequences that mark where Japanese text begins and ends. It is mostly used for network transmissions such as sending and receiving email or network news, since many networks do not preserve the eighth bit of 8-bit bytes. Japanese email clients on a Japanese operating system automatically convert messages into JIS and back. Most modern browsers can recognize all three encodings automatically ("Auto-Detect"), but with JIS the escape sequences themselves tell the browser to switch to Japanese.
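The escape-sequence mechanism can be sketched with Python's built-in `iso2022_jp` codec. The `split_segments` helper below is hypothetical, written only for this illustration: it recognises the four escape sequences listed in RFC 1468 and reports which character set each run of bytes belongs to, without decoding the kanji itself.

```python
# Escape sequences used by ISO-2022-JP (RFC 1468) to switch character sets.
ESCAPES = {
    b"\x1b(B": "ascii",          # ESC ( B
    b"\x1b(J": "jis-roman",      # ESC ( J  (JIS X 0201 Roman)
    b"\x1b$@": "jisx0208-1978",  # ESC $ @
    b"\x1b$B": "jisx0208-1983",  # ESC $ B  (two-byte kanji set)
}

def split_segments(data: bytes) -> list:
    """Split ISO-2022-JP bytes into (charset, raw bytes) runs at each escape sequence."""
    segments = []
    charset = "ascii"  # ISO-2022-JP text starts out in ASCII
    start = i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in ESCAPES:
            if i > start:  # close the run that precedes the escape
                segments.append((charset, data[start:i]))
            charset = ESCAPES[seq]
            i += 3
            start = i
        else:
            i += 1
    if start < len(data):
        segments.append((charset, data[start:]))
    return segments

encoded = "Mail: 日本".encode("iso2022_jp")
assert all(b < 0x80 for b in encoded)  # every byte fits in 7 bits
for charset, raw in split_segments(encoded):
    print(charset, raw)
```

This prints an `ascii` run for `Mail: ` followed by a `jisx0208-1983` run of four bytes (two bytes per kanji); the trailing ESC ( B switches back to ASCII and produces no further run.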

Shift-JIS

Originally developed by ASCII Corporation together with Microsoft, Shift-JIS (also known as SJIS, X-SJIS or MS Kanji) is mainly used for internal text encoding on PCs and Macs. It uses 8-bit bytes, and the second byte of a two-byte character can fall in the same range as single-byte ASCII characters. This creates double-byte dependencies: a given byte may be a single-byte ASCII character meant to stand alone, or it may be the second byte of a two-byte character, meant to be read together with the byte before it. This is especially problematic if the eighth bit has been cut off by the network, as mentioned above.
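The double-byte dependency is easy to reproduce. In this sketch (plain Python with its built-in `shift_jis` codec; the helper names are hypothetical), the katakana ソ encodes to 0x83 0x5C, and 0x5C is also the ASCII backslash, so a naive byte search finds a false match that a lead-byte-aware scan skips. The last lines show how stripping the eighth bit destroys the information the scan relies on.

```python
def is_sjis_lead_byte(b: int) -> bool:
    """True if b starts a two-byte Shift-JIS character (0x81-0x9F or 0xE0-0xFC,
    the upper range including vendor extensions)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def find_single_byte(data: bytes, target: int) -> list:
    """Offsets where `target` occurs as a stand-alone single-byte character,
    skipping the second (trail) byte of every two-byte character."""
    hits = []
    i = 0
    while i < len(data):
        if is_sjis_lead_byte(data[i]):
            i += 2  # consume lead byte and trail byte together
            continue
        if data[i] == target:
            hits.append(i)
        i += 1
    return hits

BACKSLASH = 0x5C
data = "ソ".encode("shift_jis") + b"\\"  # ソ is 0x83 0x5C, then a real backslash
naive = [i for i, b in enumerate(data) if b == BACKSLASH]
print(naive)                              # [1, 2]: offset 1 is the trail byte of ソ
print(find_single_byte(data, BACKSLASH))  # [2]

# If a network strips the eighth bit, the lead byte 0x83 becomes 0x03 and the
# scanner can no longer tell that the first 0x5C belonged to ソ.
stripped = bytes(b & 0x7F for b in data)
print(find_single_byte(stripped, BACKSLASH))  # [1, 2]
```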

EUC-JP

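Extended UNIX Code (formerly also called X-EUC-JP) is commonly used on Japanese UNIX systems, and web pages that reside on UNIX systems are often encoded in EUC-JP. EUC-JP is very similar to JIS, but without the escape sequences and with the eighth bit turned on in every byte of a Japanese character, so those bytes can never be mistaken for ASCII. It is highly recommended to use EUC-JP together with PHP and MySQL. Last, but not least, note that XML processors are only required to support UTF-8 and UTF-16; EUC-JP and the other Japanese encodings can be declared in an XML document, but support for them is optional.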
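To illustrate how close JIS and EUC-JP are, the sketch below (Python with its built-in codecs; `jis_to_eucjp` is a hypothetical helper) takes the JIS bytes for 日本, strips the escape sequences and sets the eighth bit of each remaining byte. It assumes the text contains only JIS X 0208 characters; half-width katakana and JIS X 0212 characters use extra prefix bytes (0x8E and 0x8F) in EUC-JP and are not handled here.

```python
def jis_to_eucjp(jis: bytes) -> bytes:
    """Turn JIS X 0208 byte pairs (each byte 0x21-0x7E) into EUC-JP by setting bit 8."""
    return bytes(b | 0x80 for b in jis)

text = "日本"
iso = text.encode("iso2022_jp")  # ESC $ B, four kanji bytes, ESC ( B
jis = iso[3:-3]                  # drop the 3-byte escape sequences at both ends
euc = jis_to_eucjp(jis)

print(iso)
print(euc, euc == text.encode("euc_jp"))  # the comparison prints True
```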

Please consult Ka-Ping Yee's web site for more technical information. Ka-Ping Yee is also the author of Shodouka.

Copyright © 1999-2008 Japan Reference All Rights Reserved