Advanced Emacs Internationalization HOWTO

Alexander Mikhailian

2001.03.24 19:16

Revision History
Revision 0.1 2001.03.24
Started by Alexander Mikhailian.

This document describes the advanced internationalization capabilities built into Emacs and explains how to make effective use of them on different software platforms. Some knowledge of the Emacs environment and its customization procedures is required. Knowledge of Emacs Lisp is helpful.


Administrivia

Comments

Comments on this HOWTO may be directed to the author at mikhailian@altern.org.


New Versions

The newest version can always be found at the Linux Documentation Project (LDP) site.


Copyright

This document may be reproduced in whole or in part without restrictions.


Installing Emacs and Mule

Internationalization support in Emacs is provided by Mule (MULtilingual Environment), which is seamlessly integrated into Emacs.


Setting up fonts

At startup, Emacs tries to create two fontsets called startup and standard. The startup fontset is supposed to contain fonts that have bold and italic variants, while the standard fontset is geared towards maximum i18n and contains fonts covering as many different encodings as Emacs could find. By default, the startup fontset is active. To enable the standard fontset, go to the menu item Mule -> Set Font/Fontset and select the fontset standard from the list. To make the change permanent, add

        (setq default-fontset "fontset-standard")
      
to the ~/.emacs file. The standard fontset does not always make the best choice of fonts. In that case, a custom fontset can be defined.
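
Depending on the Emacs version, the fontset can also be made permanent through the frame parameters. The following is only a sketch: it assumes that the fontset named fontset-standard was created at startup, and it relies on the font frame parameter, which accepts a fontset name as well as a font name.

```elisp
;; Assumption: the fontset "fontset-standard" was created at startup.
;; default-frame-alist supplies default parameters for frames, so the
;; `font' entry below selects the fontset for frames that Emacs creates.
(add-to-list 'default-frame-alist '(font . "fontset-standard"))
```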

To find out which system fonts are available to Emacs, insert the list of all available fonts into the current buffer by evaluating

        (insert "\n" (prin1-to-string (x-list-fonts "*")))
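
The full list is long and is printed as a single Lisp list. A possible variation, given here as a sketch for a graphical display, narrows the search with a pattern and inserts one name per line. The pattern "*-iso8859-1" is only an illustration and should be replaced with whatever is of interest.

```elisp
;; Insert the matching font names at point, one per line.
;; x-list-fonts returns nil when nothing matches; mapconcat then
;; yields an empty string, so only a newline is inserted.
(insert (mapconcat 'identity (x-list-fonts "*-iso8859-1") "\n") "\n")
```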
      
Fonts from this list can then be used to create a customized fontset, either by setting the variable w32-standard-fontset-spec on Win32 or by calling create-fontset-from-fontset-spec on XFree platforms. The definition assigns a font to each character set. There are two important constraints. The first is that Emacs can use only monospace fonts. These fonts have c or m in the spacing field of their name, which is the fourth field from the end, as in the name of the monospace Courier New:
        -*-Courier New-normal-r-*-*-12-*-*-*-c-*-iso8859-1
      
in contrast with the proportional Helvetica font:
        -*-Helv-normal-r-*-*-12-*-*-*-p-*-iso8859-1
      
The second constraint is that the fontset should contain fonts of similar or identical size, because the display space allocated to each glyph is determined by the largest font in the fontset.
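
The first constraint can be checked mechanically. The sketch below uses two hypothetical helpers, my-xlfd-spacing and my-xlfd-fixed-width-p, which are not part of Emacs. They assume that the font name is a complete X logical font description, so that the spacing field is the fourth dash-separated field counted from the end.

```elisp
;; Hypothetical helpers, not part of Emacs.
(defun my-xlfd-spacing (font-name)
  "Return the spacing field of the XLFD string FONT-NAME, or nil."
  ;; Counting from the end: encoding, registry, average width, spacing.
  (nth 3 (reverse (split-string font-name "-"))))

(defun my-xlfd-fixed-width-p (font-name)
  "Return t if FONT-NAME has \"c\" or \"m\" in its spacing field."
  (let ((spacing (my-xlfd-spacing font-name)))
    (and spacing
         ;; Some fonts spell the field in upper case, so compare
         ;; case-insensitively.
         (member (downcase spacing) '("c" "m"))
         t)))

(my-xlfd-fixed-width-p "-*-Courier New-normal-r-*-*-12-*-*-*-c-*-iso8859-1")
;; => t
(my-xlfd-fixed-width-p "-*-Helv-normal-r-*-*-12-*-*-*-p-*-iso8859-1")
;; => nil
```

Counting from the end rather than from the start keeps the helper independent of empty fields earlier in the name, such as the empty style field in names of the form -misc-fixed-medium-r-normal--16-...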

To get the list of charsets that are available to Emacs and can be used for fontset creation, type M-x list-character-sets.

The procedure for defining hand-crafted fontsets differs between platforms.


XFree

Fontsets are defined using create-fontset-from-fontset-spec. For example, the following fontset handles the Latin-1, Latin-2 and Cyrillic encodings:

          (create-fontset-from-fontset-spec
          "-adobe-courier-medium-r-*-*-14-*-*-*-*-*-fontset-adobe,
          latin-iso8859-1:-adobe-courier-medium-r-*-*-14-*-*-*-*-*-*-1,
          latin-iso8859-2:-adobe-courier-medium-r-*-*-14-*-*-*-*-*-*-2,
          cyrillic-iso8859-5:-adobe-courier-medium-r-*-*-14-*-*-*-*-*-*-5")
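
Defining a fontset does not by itself select it. As a sketch, and assuming that the definition above has already been evaluated and that the running Emacs provides set-frame-font, the new fontset can be applied to the selected frame by the name given in the last two fields of its first line:

```elisp
;; Assumption: the create-fontset-from-fontset-spec form above has been
;; evaluated, so "fontset-adobe" is a known fontset name.
(set-frame-font "fontset-adobe")
```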
        


Win32

Emacs can use either system TTF fonts or BDF fonts that are loaded directly from the file system. Loading BDF fonts directly allows Emacs to handle i18n separately from the operating system.

TTF fonts can be assigned to each Mule character set through the variable w32-standard-fontset-spec. Maximum internationalization on Windows NT 4.0 or Windows 2000 may be obtained with the following fonts:

(setq w32-standard-fontset-spec
 "-*-Courier New-normal-r-*-*-12-*-*-*-c-*-fontset-courier,
 ascii:-*-Courier New-normal-r-*-*-12-*-*-*-c-*-iso8859-1,
 latin-iso8859-1:-*-Courier New-normal-r-*-*-12-*-*-*-c-*-iso8859-1,
 latin-iso8859-2:-*-Courier New-normal-r-*-*-12-*-*-*-c-*-iso8859-2,
 latin-iso8859-3:-*-Courier New-normal-r-*-*-12-*-*-*-c-*-iso8859-3,
 latin-iso8859-4:-*-Courier New-normal-r-*-*-12-*-*-*-c-*-iso8859-4,
 latin-iso8859-9:-*-Courier New-normal-r-*-*-12-*-*-*-c-*-iso8859-9,

 cyrillic-iso8859-5:-*-Courier New-normal-r-*-*-12-*-*-*-c-*-iso8859-5,
 greek-iso8859-7:-*-Courier New-normal-r-*-*-12-*-*-*-c-*-iso8859-7,
 hebrew-iso8859-8:-*-Rod-normal-r-*-*-12-*-*-*-c-*-iso8859-8,

 ipa:-*-Lucida Sans Unicode-normal-r-*-*-12-*-*-*-c-*-muleipa*-*,

 thai-tis620:-*-Tahoma-normal-r-*-*-12-*-*-*-c-*-tis620-*,

 latin-jisx0201:-*-MS Gothic-normal-r-*-*-12-*-*-*-c-*-jisx0208-sjis,
 katakana-jisx0201:-*-MS Gothic-normal-r-*-*-12-*-*-*-c-*-jisx0208-sjis,
 japanese-jisx0208:-*-MS Gothic-normal-r-*-*-12-*-*-*-c-*-jisx0208-sjis,
 japanese-jisx0208-1978:-*-MS Gothic-normal-r-*-*-12-*-*-*-c-*-jisx0208-sjis,
 japanese-jisx0212:-*-MS Gothic-normal-r-*-*-12-*-*-*-c-*-jisx0212-sjis,
 korean-ksc5601:-*-Gulim-normal-r-*-*-12-*-*-*-c-*-ksc5601-*,
 chinese-gb2312:-*-MS Song-normal-r-*-*-12-*-*-*-c-*-gb2312-*,
 chinese-big5-1:-*-MingLiU-normal-r-*-*-12-*-*-*-c-*-big5-*,
 chinese-big5-2:-*-MingLiU-normal-r-*-*-12-*-*-*-c-*-big5-*")
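
Many entries in such a specification differ only in the character set name and the final registry and encoding fields. The sketch below shows one way to generate those entries instead of typing them. The helper my-fontset-spec is hypothetical and not part of Emacs, and it only covers entries that share one base font, so character sets that need another font (Hebrew, Thai and the CJK sets above) would still be written by hand.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-fontset-spec (base-font fontset-name entries)
  "Build a fontset specification string.
BASE-FONT is an XLFD name without its last two fields, FONTSET-NAME
is the part after \"fontset-\", and ENTRIES is a list of
\(CHARSET . REGISTRY-ENCODING) pairs of strings."
  (concat base-font "-fontset-" fontset-name
          (mapconcat (lambda (entry)
                       ;; One ",\n CHARSET:FONT" element per entry.
                       (concat ",\n " (car entry) ":"
                               base-font "-" (cdr entry)))
                     entries "")))

;; Example: a shortened version of the Courier New part of the
;; specification above.
(setq w32-standard-fontset-spec
      (my-fontset-spec "-*-Courier New-normal-r-*-*-12-*-*-*-c-*"
                       "courier"
                       '(("ascii" . "iso8859-1")
                         ("latin-iso8859-1" . "iso8859-1")
                         ("latin-iso8859-2" . "iso8859-2")
                         ("cyrillic-iso8859-5" . "iso8859-5")
                         ("greek-iso8859-7" . "iso8859-7"))))
```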
        

In order to correctly display the HELLO file from the Emacs distribution, it is necessary to use BDF fonts. A simple example of making BDF fonts available to Emacs is shown below:

;; Set up BDF fonts:
(setq bdf-directory-list
      '("c:/intlfonts/Asian"     "c:/intlfonts/Chinese"
      "c:/intlfonts/Chinese.X"   "c:/intlfonts/Ethiopic"
      "c:/intlfonts/European"    "c:/intlfonts/Japanese"
      "c:/intlfonts/Japanese.X"  "c:/intlfonts/Korean.X"
      "c:/intlfonts/Misc/"       "c:/intlfonts/Japanese.BIG"))

;; Read font names from disk
(setq w32-bdf-filename-alist (w32-find-bdf-fonts bdf-directory-list))

;; Some fonts have to be mapped explicitly
(setq font-encoding-alist
      (append '(("MuleTibetan-0"       (tibetan . 0))
                ("GB2312"              (chinese-gb2312 . 0))
                ("JISX0208"            (japanese-jisx0208 . 0))
                ("JISX0212"            (japanese-jisx0212 . 0))
                ("VISCII"              (vietnamese-viscii-lower . 0))
                ("KSC5601"             (korean-ksc5601 . 0))
                ("MuleArabic-0"        (arabic-digit . 0))
                ("MuleArabic-1"        (arabic-1-column . 0))
                ("MuleArabic-2"        (arabic-2-column . 0))
                ("MuleIndian-1"        (indian-1-column . 0))
                ("MuleIndian-2"        (indian-2-column . 0))
                ("is13194-Devanagari"  (indian-is13194 . 0))
                )
              font-encoding-alist))
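
The directory list above is specific to one installation. As a defensive sketch, the list can be filtered so that a mistyped or missing path is not passed on to w32-find-bdf-fonts. The helper my-existing-directories is hypothetical and not part of Emacs.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-existing-directories (dirs)
  "Return the elements of DIRS that name existing directories."
  (delq nil
        (mapcar (lambda (dir)
                  ;; Keep DIR only if it exists and is a directory.
                  (and (file-directory-p dir) dir))
                dirs)))

;; Possible use, placed between the two setq forms above:
;; (setq bdf-directory-list (my-existing-directories bdf-directory-list))
```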
        

Once BDF fonts have been made available to Emacs, they can be used for defining the fontset. The following fontset will display the HELLO file correctly:

;; hand-crafted bdf-only fontset
(create-fontset-from-fontset-spec
 "-*-fixed-medium-r-normal-*-16-*-*-*-c-*-fontset-bdf8,
 japanese-jisx0208:-*-*-medium-r-normal-*-16-*-*-*-c-*-jisx0208.1983-*,
 katakana-jisx0201:-*-*-medium-r-normal-*-16-*-*-*-c-*-jisx0201*-*,
 latin-jisx0201:-*-*-medium-r-normal-*-16-*-*-*-c-*-jisx0201*-*,
 japanese-jisx0208-1978:-*-*-medium-r-normal-*-16-*-*-*-c-*-jisx0208.1978-*,
 thai-tis620:-misc-fixed-medium-r-normal--16-160-72-72-m-80-tis620.2529-1,
 lao:-misc-fixed-medium-r-normal--16-160-72-72-m-80-MuleLao-1,
 tibetan-1-column:-TibMdXA-fixed-medium-r-normal--16-160-72-72-m-80-MuleTibetan-1,
 ethiopic:-Admas-Ethiomx16f-Medium-R-Normal--16-150-100-100-M-160-Ethiopic-Unicode,
 tibetan:-TibMdXA-fixed-medium-r-normal--16-160-72-72-m-160-MuleTibetan-0,
 indian-1-column:-misc-fixed-medium-r-normal--16-160-72-72-c-80-muleindian-1,
 indian-2-column:-misc-fixed-medium-r-normal--16-160-72-72-c-80-muleindian-2,
 indian-is13194:-misc-fixed-medium-r-normal--16-160-72-72-c-80-is13194-devanagari")
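
To try the result, the fontset can be selected and the HELLO file opened in one step. This is a sketch that assumes the fontset above has been created and that the running Emacs provides query-fontset, set-frame-font and view-hello-file.

```elisp
;; Assumption: the create-fontset-from-fontset-spec form above has been
;; evaluated. query-fontset returns nil for an unknown fontset, in
;; which case nothing happens.
(when (query-fontset "fontset-bdf8")
  ;; Switch the selected frame to the BDF fontset ...
  (set-frame-font "fontset-bdf8")
  ;; ... and display the HELLO file (also available as C-h h).
  (view-hello-file))
```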
        


Handling input methods


Adapting to external environment


Advanced setup