HOWTO Make your system use Unicode/UTF-8
From Gentoo Linux Wiki
The drawback of this approach is that it limits the number of characters that can be represented by the table. As long as the table contains all the characters you need, there are no problems. As soon as you share a file with someone who uses a different character table, however, things start going wrong.
Some tables (such as the ISO-8859-* family) overlap, with the same byte value representing the same character in each. Other characters exist in only one of the tables, and these, naturally, are the main point of contention.
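To see the problem concretely, the sketch below decodes the same single byte, 0xA4, first as ISO-8859-1 and then as ISO-8859-15, using iconv (assumed here to be the GNU libc iconv). The byte is the currency sign ¤ in the first table and the euro sign € in the second, so the same file shows different text depending on which table the reader assumes. decode_byte_as is a hypothetical helper name.

```shell
#!/bin/sh
# Decode one raw byte (0xA4, written in octal as \244) with the given
# character table and print the result as UTF-8.
decode_byte_as() {
    printf '\244' | iconv -f "$1" -t UTF-8
}

printf 'ISO-8859-1:  %s\n' "$(decode_byte_as ISO-8859-1)"   # currency sign
printf 'ISO-8859-15: %s\n' "$(decode_byte_as ISO-8859-15)"  # euro sign
```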
There are two solutions to this problem: either every file that contains text must carry information about the character table it uses, or a single table must incorporate each and every character in the world.
Unicode is an implementation of the latter. It allows users to write and exchange information without compatibility worries, and with falling storage prices it has become very popular. Users only have to make sure that their software supports Unicode and that they have fonts installed that can display all the characters they wish to use (no single font implements all the characters in Unicode).
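UTF-8, the usual way of storing Unicode on Linux, keeps ASCII characters at one byte and uses more bytes for everything else, which is where cheap storage helps. A minimal sketch that counts the bytes of three characters; they are written as octal escapes so the script itself stays plain ASCII, and byte_count is a hypothetical helper name.

```shell
#!/bin/sh
# Print how many bytes a string occupies (wc -c counts bytes, not characters).
# The argument is used as the printf format so that octal escapes are
# expanded; only pass trusted strings that contain no '%'.
byte_count() {
    printf "$1" | wc -c | tr -d ' '
}

echo "U+0061 (a):      $(byte_count 'a') byte"                # ASCII
echo "U+00E4 (a-umlaut): $(byte_count '\303\244') bytes"      # 2 bytes
echo "U+20AC (euro):   $(byte_count '\342\202\254') bytes"    # 3 bytes
```

In ISO-8859-15 each of these characters takes a single byte, but only UTF-8 can also represent characters outside that one table.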
Kernel Stuff
To activate Unicode support in the kernel, set the following options:
| Linux Kernel Configuration: Unicode support |
File systems --->
  Native Language Support --->
    (utf8) Default NLS Option
    <*> NLS UTF8 |
After you recompile your kernel, filesystems that rely on the kernel's NLS layer will use UTF-8 as their default character set for filenames.
If you compiled it as a module, be sure to load it:
modprobe nls_utf8
To avoid doing this at every boot, add nls_utf8 to /etc/modules.autoload.d/kernel-2.6 (or kernel-2.4, depending on your kernel series).
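Adding the entry by hand is fine; if you script it, a small helper that only appends when the name is missing keeps repeated runs from listing the module twice. This is a sketch: add_autoload_module is a hypothetical name, and it assumes the module name sits on a line of its own (without module options) in the autoload file.

```shell
#!/bin/sh
# Append a module name to an autoload file unless it is already listed.
# Usage (as root): add_autoload_module nls_utf8 /etc/modules.autoload.d/kernel-2.6
add_autoload_module() {
    module=$1
    file=$2
    # -x matches the whole line, -F treats the name as a literal string.
    if [ -f "$file" ] && grep -qxF -- "$module" "$file"; then
        return 0    # already present, nothing to do
    fi
    printf '%s\n' "$module" >> "$file"
}
```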
Kernel Bugs
Please note that some Linux kernel versions contain a bug that affects UTF-8 locales using dead keys. The issue has reportedly been fixed as of kernel version 2.6.11.
Installing locales
The system locales come with the glibc package. By default almost all possible locales are installed, though you can choose to install only the locales you need.
- See TIP Specifying only needed locales for instructions.
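Before relying on a locale, you can check that it is actually installed. glibc's locale -a typically prints the codeset in a normalized form (en_US.utf8 rather than en_US.UTF-8), so the sketch below normalizes both sides (lowercase codeset, hyphens removed) before comparing. has_locale and norm_locale are hypothetical helper names.

```shell
#!/bin/sh
# Normalize locale names read from stdin: lowercase the codeset after the
# first '.' and drop hyphens (en_US.UTF-8 -> en_US.utf8). Names without a
# codeset, such as C or POSIX, pass through unchanged.
norm_locale() {
    awk '{
        i = index($0, ".")
        if (i == 0) { print; next }
        cs = tolower(substr($0, i + 1)); gsub(/-/, "", cs)
        print substr($0, 1, i) cs
    }'
}

# Succeed if the given locale appears in the output of locale -a.
has_locale() {
    want=$(printf '%s\n' "$1" | norm_locale)
    locale -a 2>/dev/null | norm_locale | grep -qxF -- "$want"
}

has_locale en_US.UTF-8 && echo "en_US.UTF-8 is installed"
```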
Console setup
In ~root/.bashrc add
| File: ~/.bashrc |
if [[ $TERM = "linux" ]]; then
    unicode_start
fi
|
to put the console into Unicode mode when root logs in (use "unicode_start foo_font" to load a custom font instead).
Because unicode_start requires root privileges, it only helps root; you can instead configure your Gentoo system to default to Unicode consoles for all logins. For this to work, you must have a recent version of sys-apps/baselayout installed (>=sys-apps/baselayout-1.11.9).
First, change the unicode setting in /etc/rc.conf
| File: /etc/rc.conf |
UNICODE="yes" |
Mind the case. UNICODE="YES" will NOT work.
Next, install terminus, a good font for UTF-8 consoles:
| Code: emerge terminus |
emerge -av media-fonts/terminus-font |
Also edit the following files according to their comments:
- /etc/conf.d/consolefont
- /etc/conf.d/keymaps
One example for setting the console font is
| File: /etc/conf.d/consolefont |
CONSOLEFONT="ter-v16b"
#CONSOLETRANSLATION="" |
Now reboot the system; init will automatically enable UTF-8 capability on all console logins. However, an individual console won't actually display UTF-8 until it receives the switch-to-Unicode escape sequence.
The last step is to make the following change so that the switch-to-Unicode escape sequence is sent at each login:
| File: ~/.bash_profile |
if test -t 1 -a -t 2 ; then
    echo -n -e '\033%G'
fi
|
This code sends the switch-to-Unicode sequence whenever both standard output and standard error are attached to a terminal. In fact, this code block is taken directly from the internals of the unicode_start command.
Alternatively, to make the switch to UTF-8 global for all users (which could be problematic), use /etc/profile:
| File: /etc/profile |
if test -t 1 -a -t 2 ; then
    echo -n -e '\033%G'
fi
|
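Because test -t is also true in terminal emulators and SSH sessions, either variant above sends the sequence there too. If you would rather send it only on the Linux virtual consoles, you could additionally check the device name reported by tty. A minimal sketch; is_linux_vt is a hypothetical helper, and the device patterns (/dev/ttyN, or /dev/vc/N under devfs) are assumptions about your /dev layout.

```shell
#!/bin/sh
# Succeed if the given device path looks like a Linux virtual console.
# Assumed naming: /dev/ttyN (static /dev or udev) or /dev/vc/N (devfs).
is_linux_vt() {
    case $1 in
        /dev/tty[0-9]*|/dev/vc/[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Send the switch-to-UTF-8 escape sequence (ESC % G) only on a virtual console.
if test -t 1 && test -t 2 && is_linux_vt "$(tty)"; then
    printf '\033%%G'
fi
```

Serial lines such as /dev/ttyS0 do not match the pattern, so they are skipped as well.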
As a last-ditch alternative, you can use this init.d script to put all consoles into Unicode mode at boot:
| File: /etc/init.d/unicode |
#!/sbin/runscript
conf=/etc/env.d/02locale
# Using devfs?
if [ -e /dev/.devfsd ] || [ -e /dev/.udev -a -d /dev/vc ]; then
    device=/dev/vc/
else
    device=/dev/tty
fi
depend() {
    need localmount
    after keymaps
    before consolefont
}
checkconfig() {
    if [ -r ${conf} ]; then
        . ${conf}
        encoding=
        [ -n "${LC_ALL}" ] && encoding=${LC_ALL#*.} && return 0
        [ -n "${LC_MESSAGES}" ] && encoding=${LC_MESSAGES#*.} && return 0
        [ -n "${LANG}" ] && encoding=${LANG#*.} && return 0
    fi
    eend 1 "Locale is not configured, please fix ${conf}"
    return 1
}
start() {
    ebegin "Setting consoles to UTF-8"
    # Stop if no locale is configured (checkconfig has already called eend)
    checkconfig || return 1
    if [[ "${encoding}" =~ [uU][tT][fF]-?8 ]]; then
        dumpkeys | loadkeys --unicode
        for ((i=1; i <= "${RC_TTY_NUMBER}"; i++)); do
            echo -ne "\033%G" > ${device}${i}
        done
        eend 0
    else
        eend 1 "UTF-8 is not required"
    fi
}
|
| Code: to make script executable |
chmod +x /etc/init.d/unicode |
and then
| Code: add the script |
rc-update add unicode default |
You may also need to set the LC_ALL and LANG environment variables; the Gentoo Linux Localization Guide explains how.
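The init script above decides whether to switch the consoles by taking the codeset from the first of LC_ALL, LC_MESSAGES and LANG that is set. The same logic, pulled out into a standalone sketch, can help you check what your settings resolve to. locale_encoding and is_utf8 are hypothetical names, and the case pattern only approximates the script's regular expression.

```shell
#!/bin/sh
# Print the codeset part of the first non-empty variable among
# LC_ALL, LC_MESSAGES and LANG (the same order as the init script).
locale_encoding() {
    for value in "${LC_ALL-}" "${LC_MESSAGES-}" "${LANG-}"; do
        if [ -n "$value" ]; then
            printf '%s\n' "${value#*.}"
            return 0
        fi
    done
    return 1    # none of the variables is set
}

# Succeed if the codeset names UTF-8, accepting spellings like UTF-8 or utf8.
is_utf8() {
    case $1 in
        [uU][tT][fF]8*|[uU][tT][fF]-8*) return 0 ;;
        *) return 1 ;;
    esac
}

if enc=$(locale_encoding) && is_utf8 "$enc"; then
    echo "UTF-8 locale detected ($enc)"
else
    echo "no UTF-8 locale configured"
fi
```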
Converting old files
Once Unicode support has been added, old files may need to be re-encoded to display properly.
To re-encode the contents of plain-text files, you can choose between iconv, recode and enconv (the latter is part of app-i18n/enca).
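A minimal sketch of re-encoding one file in place with iconv, assuming the file is currently ISO-8859-15 (pass another table name as the second argument). to_utf8 is a hypothetical helper name.

```shell
#!/bin/sh
# Re-encode a text file in place from a legacy character table to UTF-8.
# Usage: to_utf8 FILE [FROM]   (FROM defaults to ISO-8859-15)
to_utf8() {
    file=$1
    from=${2:-ISO-8859-15}
    tmp=$(mktemp "$file.XXXXXX") || return 1
    # Convert into a temporary file first so a failed conversion leaves FILE intact.
    if ! iconv -f "$from" -t UTF-8 "$file" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Copy back with cat so FILE keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Run it only once per file: converting a file that is already UTF-8 a second time would double-encode its non-ASCII characters.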
app-text/convmv is a Perl script that re-encodes filenames, directory names and entire directory trees. Emerge it with
| Code: |
emerge -av app-text/convmv |
To test re-encoding a filename from ISO-8859-15 to UTF-8, try
| Code: |
convmv -f iso-8859-15 -t utf8 file-name-with-ä |
If the command it prints looks sane, add --notest to actually re-encode the name.
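To find out which names need converting in the first place, you could list every path under a directory whose name is not valid UTF-8. This sketch relies on iconv rejecting malformed input when converting from UTF-8 to UTF-8, which GNU libc's iconv does; list_non_utf8_names is a hypothetical name.

```shell
#!/bin/sh
# Print every path under DIR whose name is not valid UTF-8.
# Usage: list_non_utf8_names DIR
list_non_utf8_names() {
    find "$1" -exec sh -c '
        for path do
            # iconv exits non-zero on an invalid UTF-8 byte sequence.
            if ! printf "%s" "$path" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
                printf "%s\n" "$path"
            fi
        done
    ' sh {} +
}
```

The resulting list shows where a convmv run is worth testing.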
Apps
Terminal emulators
xterm
xterm runs in Unicode mode when started in either of these ways:
| Code: |
xterm -u8
uxterm |
urxvt
Urxvt from x11-terms/rxvt-unicode always runs in Unicode mode. If you want it to use UTF-8, you have to set LANG accordingly (e.g. LANG="en_US.UTF-8").
GNU Screen
GNU Screen must be invoked with the -U command line option.
If you use it as a login shell, you will have to write a wrapper that calls screen with the -U option plus the options it normally receives when used as a login shell:
| Code: GNU Screen wrapper |
#!/bin/sh
exec /usr/bin/screen -xRR -U |
If you use screen for irssi and the like, an alias is enough:
| File: ~/.bashrc |
alias screen="screen -U" |
However, if you are running screen from an SSH or RSH session, then editing the screen configuration should be enough.
Add the following to ~/.screenrc
| File: ~/.screenrc |
defutf8 on |
Editors
Vim should work out of the box since version 6.3 or so.
Nano versions prior to 1.3.6 can't handle UTF-8 properly. At the time of writing, keywording a newer version is only needed on the alpha and ppc-macos platforms; for example, on alpha:
| Code: |
echo "=app-editors/nano-1.3.6 ~alpha" >> /etc/portage/package.keywords
emerge -uDav nano |
Emacs, when run in console mode, can be configured to handle unicode by adding the following LISP instructions to its configuration file:
| File: ~/.emacs |
(setq locale-coding-system 'utf-8)
(set-terminal-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)
(set-selection-coding-system 'utf-8)
(prefer-coding-system 'utf-8) |
Note, however, that the console must handle Unicode too.
LaTeX
Merge unicode support for LaTeX with
| Code: |
emerge dev-tex/latex-unicode |
Mutt printing
Mutt should work flawlessly on a Unicode console. If you want to use pretty-printing, however, you need a few tricks, as a2ps does not support UTF-8. Your best bet may be app-misc/muttprint, which seems to work perfectly in both Unicode and single-byte environments and produces very elegant output. However, it requires LaTeX to be installed on your system.
Emerge the package and put this in your ~/.muttrc
| File: ~/.muttrc |
set print_command=muttprint |
Otherwise you may emerge recode and a2ps:
emerge recode a2ps
and use this in
| File: ~/.muttrc |
set print_command="recode UTF-8..Latin-1 | a2ps -1 --portrait --borders=no -X latin1 --pretty-print=mail --strip 1 --highlight-level=heavy -P printername" |
You may also use u2ps from the gnome-u2ps package, which has native Unicode support (it is packaged in Debian as gnome-u2ps; whether it is also available in Gentoo is unknown).
Shells
bash
Bash is Unicode-aware from version 3 on, when used with readline version 5. Both are in Portage.
emerge bash sys-libs/readline
revdep-rebuild --soname libreadline.so.4
rm /lib/libreadline.so.4*
Be sure you know what you are doing before you perform the last step (see the notes printed by the readline ebuild).
You will also need to have the gentoolkit package installed, as it contains the revdep-rebuild tool.
The manual deletion of libreadline.so.4 recommended above needs to be double-checked! When I run:
# qfile /lib/libreadline.so.4
sys-libs/readline (/lib/libreadline.so.4)
# eix -s readline
sys-libs/readline-5.2_p12-r1
it appears that libreadline.so.4 belongs to readline-5*! This can be verified further with:
# qlist readline
I propose a clean-up of this article, since it recommends modifying configuration files where further configuration might not be needed. I too believe a lot of this should already be handled by /etc/rc.conf and the unicode USE flag.
zsh
Zsh handles UTF-8 perfectly since version 4.3.1. Older versions are not Unicode-aware; they still work as long as you don't use Backspace on Unicode characters (this deletes parts of a UTF-8 character byte by byte and confuses zle's assumptions about the cursor position).
mc
Mc must be compiled with the sys-libs/slang library for full unicode support.
emerge gentoolkit
euse -E slang
emerge -avDN mc
X
X usually obeys the LC_* environment variables; however, X is picky about how you spell your locale settings. What works in the console may not work in X. You can find a list of all acceptable locale aliases in /usr/lib/X11/locale/locale.alias. As always, CaSe matters. You should make sure that the locale you choose corresponds to one of the glibc locales listed by "locale -a".
If you're doing advanced troubleshooting you may also be interested in the locale.dir file, in the same directory. It maps locale names to files. Make sure it maps your locale correctly (it usually does).
So to sum it up, the chain goes like this, and all of its links must be intact: LC_* -> locale.alias -> locale.dir -> [X locale definition file]
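The sketch below follows the locale.alias and locale.dir links of that chain for one locale name, which can help spot the broken link. It assumes that each non-comment line of locale.alias holds an alias followed by its target, and each line of locale.dir holds a file name followed by a locale name, with an optional trailing colon on the first field; check your own files first, since this format is an assumption. x_locale_lookup is a hypothetical name, and the directory is passed in so you can point it at /usr/lib/X11/locale or wherever your X installation keeps these files.

```shell
#!/bin/sh
# Follow: locale name -> locale.alias -> locale.dir -> X locale definition file.
# Usage: x_locale_lookup /usr/lib/X11/locale en_US.utf8
x_locale_lookup() {
    xdir=$1
    xname=$2
    # Step 1: resolve the alias; fall back to the name itself if none matches.
    xtarget=$(awk -v n="$xname" '
        /^[ \t]*#/ { next }
        NF >= 2 { k = $1; sub(/:$/, "", k); if (k == n) { print $2; exit } }
    ' "$xdir/locale.alias")
    [ -n "$xtarget" ] || xtarget=$xname
    # Step 2: find the file that locale.dir maps the resolved name to.
    xfile=$(awk -v n="$xtarget" '
        /^[ \t]*#/ { next }
        NF >= 2 && $2 == n { f = $1; sub(/:$/, "", f); print f; exit }
    ' "$xdir/locale.dir")
    if [ -z "$xfile" ]; then
        echo "no locale.dir entry for $xtarget" >&2
        return 1
    fi
    printf '%s -> %s -> %s\n' "$xname" "$xtarget" "$xdir/$xfile"
}
```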
Fluxbox
BUG 1: Fluxbox doesn't fully support Unicode yet. Some of its styles select fonts that are not suitable for Unicode. To fix this, edit Fluxbox's style file(s) in /usr/share/fluxbox/styles and add something like:
| File: /usr/share/fluxbox/styles/$YourStyle |
window.font: -*-*-*-*-*-*-*-*-*-*-*-*-*-u |
to at least fix the window title bug.
Solution by user Holms:
Another solution is to set the locale in ~/.xinitrc. For example, I use Cyrillic most of the time, so I put this in my ~/.xinitrc:
| File: ~/.xinitrc |
export LANG="ru_RU.UTF-8"
export LC_ALL="ru_RU.UTF-8" |
Then all window titles will be in Unicode and your locale will be Russian; set this to your own country. It may be wiser to use an English UTF-8 locale such as en_US.UTF-8 instead, because otherwise all programs will display everything in your language instead of English. The .UTF-8 suffix tells the system which encoding to use by default, so if you ask for Unicode, you get Unicode. Some people also add the same two lines to ~/.bashrc (that didn't help in my case). Don't forget to configure your locales in /etc/locale.gen; if you haven't done so yet, read about locales in the Gentoo Handbook. If this doesn't help, read HOWTO_Xorg_and_Fonts and do everything described in its "Emerging the necessary packages" section; at least that helped me.
BUG 2: Fluxbox takes very long to load in a UTF-8 locale: http://bugs.gentoo.org/show_bug.cgi?id=71747
A patch for fluxbox-0.9.11 is available at http://www.fluxmod.org.ua/
(the patch has been merged into mainline as of 0.9.14)
OpenOffice.org
To force OpenOffice.org to use UTF-8 (without it, you'll have problems entering Unicode characters), set the LANGUAGE variable to an appropriate value:
| File: /etc/env.d/02locale |
LANG="de_DE.UTF-8"
# ...a lot of LC_* variables...
# For OpenOffice.org
LANGUAGE="en_GB:en" |
Don't forget to run env-update && source /etc/profile after changing files in /etc/env.d/. You may need to log in again for the changes to apply to your current environment.
