Wednesday, November 10, 2010

At last... I have been suffering with XEmacs displaying odd characters instead of the quotation marks used in R help files. This was driving me up the wall because it made the help files (and R output in general) very hard to read. I finally diagnosed the problem: XEmacs was not recognizing UTF-8 encoding. Below is a quote from Marjan Parsa that describes how to set up Emacs and XEmacs to detect UTF-8 files automatically. My quality of life has already improved.
How can I get XEmacs to work with UTF-8 files?

* Set up XEmacs so that it autodetects UTF-8 encoded files.
* In the case of starting a new file in a non-UTF-8 locale, set the file coding system to UTF-8 using C-x RET f.
* If running XEmacs in non-graphical mode in a UTF-8 xterm, set the terminal coding system to UTF-8 using C-x RET t.

If you want XEmacs to load UTF-8 files correctly, add the following lines to your ~/.xemacs/init.el:

(require 'un-define)
(set-coding-priority-list '(utf-8))
(set-coding-category-system 'utf-8 'utf-8)

Note that GNU Emacs does not deal well with these XEmacs-specific additions. If you also run Emacs with the same init file, wrapping them in a test like the following will keep Emacs from complaining:

;; Are we running XEmacs or Emacs?
(defvar running-xemacs (string-match "XEmacs\\|Lucid" emacs-version))

...

(if (not running-xemacs) nil
  ;; enable Mule-UCS
  (require 'un-define)

  ;; by default xemacs does not autodetect Unicode
  (set-coding-priority-list '(utf-8))
  (set-coding-category-system 'utf-8 'utf-8))

These lines make XEmacs load UTF-8 files in UTF-8 mode. It will display a "u" at the bottom left of the status bar (the mode line). If you have already loaded a file and would like to start entering UTF-8 text, use C-x RET f to set the file coding system to UTF-8. You may also have to set the terminal coding system to UTF-8. This seems to be necessary, for example, when XEmacs runs in non-graphical mode inside a UTF-8-enabled xterm. You can set the terminal encoding with C-x RET t.
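In XEmacs (and GNU Emacs), C-x RET f and C-x RET t should run the commands `set-buffer-file-coding-system` and `set-terminal-coding-system`. The fragment below is a sketch that calls them directly, for example via M-: or from an init file. It assumes the coding system is available under the name `utf-8`, as it is once Mule-UCS (`un-define`) has been loaded.

```elisp
;; Sketch: non-interactive equivalents of the two key sequences above.
;; Assumes the `utf-8' coding system exists (e.g. after (require 'un-define)).

;; C-x RET f: save the current buffer as UTF-8 from now on.
(set-buffer-file-coding-system 'utf-8)

;; C-x RET t: treat terminal output as UTF-8.
;; Only useful when running non-graphically, e.g. inside a UTF-8 xterm.
(set-terminal-coding-system 'utf-8)
```

The first call affects only the current buffer. The second affects the terminal, so it mainly matters in tty sessions.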

Caution: I have had problems with XEmacs double-encoding characters when 1) the file contains UTF-8, 2) the file is loaded in non-UTF-8 mode, 3) the user switches to UTF-8 mode (using C-x RET f), 4) enters some text, and 5) saves. In other words, if your file already contains UTF-8 characters, make sure that it is loaded in UTF-8 mode before editing it.
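One way to guarantee that a file is read as UTF-8, whatever autodetection decides, is to bind the variable `coding-system-for-read` while visiting it. The command below is a minimal sketch of that idea. The name `my-find-file-utf-8` is hypothetical and not part of XEmacs or of the setup quoted above.

```elisp
;; Hypothetical helper: visit a file, forcing it to be decoded as UTF-8.
(defun my-find-file-utf-8 (filename)
  "Visit FILENAME, decoding its contents as UTF-8."
  (interactive "FFind file (UTF-8): ")
  ;; `coding-system-for-read' overrides autodetection for reads made
  ;; inside this `let', including the one done by `find-file'.
  (let ((coding-system-for-read 'utf-8))
    (find-file filename)))
```

If the file is already open in a buffer, `find-file` just switches to that buffer without re-reading it. In that case, kill the buffer first (C-x k) and then visit the file again with this command.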