Enable UTF-8
Introduction
I wrote this how-to while installing my Jinzora jukebox. Having noticed that Jinzora can't handle German umlauts, French accents and other non-English characters, I started tweaking it to get UTF-8 support working. I have also been running all my systems in UTF-8 mode for some years, to be ready for the future. This how-to refers to the Jinzora 2.7 nightly from 25/01/2007 running on Gentoo Linux with a MySQL backend and MPD, but it should be portable to other Linux distributions.
NOTE: For Jinzora 2.8 most of this article is obsolete. All changes to Jinzora described here are included in version 2.8, so you only have to check that your OS is set up correctly for UTF-8.
NOTE (for .m4a): I have installed the latest stable version of Jinzora 2 (2.8), and the problem with international characters persisted, most notably with .m4a files. The MySQL package from Ubuntu Karmic is configured for latin1, so I changed it as explained below. I have also replaced the 'jinzora2/services/services/tagdata/getid3' directory with the latest getid3 (1.7.9) from sourceforge.net, and configured it as explained below. After those changes, international characters are shown correctly. I hope this is useful for somebody else while we wait for the next release.
Prerequisite
I suggest installing Jinzora on an OS with full UTF-8 support, which means console encoding and font and, especially, filesystem encoding. In addition, MySQL and MPD must be set up in UTF-8 mode.
Bad news for Windows users: Windows filesystems (FAT, FAT32, NTFS) do not store filenames as UTF-8. NTFS and FAT long filenames are stored as UTF-16, and programs that use the legacy byte-oriented interfaces see them in a code page such as CP1252 or CP437/CP850 (or whatever suits your language), depending on localisation. So you will probably run into trouble matching filenames from the database to the filesystem if you follow this guide. (Although I've never tried it.)
Windows workaround: Modern Windows versions support UTF-8 on mounted shares. Share your media folder and mount it as a network drive.
Setting up Linux
Setting up Linux for UTF-8 depends heavily on the distribution you are using. The good news is that many distributions are already set up for UTF-8 by default, so if you don't know how to do it, use one of these. Here are just a few things to check:
- The default codepage for the filesystem in the kernel should be set to UTF-8. If you copy files via ssh or ftp (mostly when copying from Windows) and get scrambled names because the filenames were not converted, convmv will help.
- If you have scrambled filenames in your shell and you are sure your filesystem is set to UTF-8, you are probably using a console font and/or encoding without UTF-8 support.
- If you have problems entering accents at the console, you probably need a keymap without nodeadkeys.
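To find out which files convmv would need to fix, the names in a directory can be checked for well-formed UTF-8. The following is only a minimal PHP sketch, assuming PHP 7 or later; the helper names is_valid_utf8 and find_non_utf8_names are made up for this example and are not part of Jinzora.

```php
<?php
// Returns true if $name is a well-formed UTF-8 byte sequence.
// preg_match() with the /u modifier returns false (instead of 0 or 1)
// when the subject is not valid UTF-8.
function is_valid_utf8(string $name): bool
{
    return preg_match('//u', $name) === 1;
}

// Hypothetical helper: list the entries of $dir whose names are not valid UTF-8.
function find_non_utf8_names(string $dir): array
{
    $bad = [];
    foreach (scandir($dir) ?: [] as $entry) {
        if ($entry !== '.' && $entry !== '..' && !is_valid_utf8($entry)) {
            $bad[] = $entry;
        }
    }
    return $bad;
}
```

Calling find_non_utf8_names() on a media folder returns the names that still carry a non-UTF-8 encoding. The check only looks at one directory level; subfolders would have to be visited separately.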
Setting up MPD
Setting up MPD is rather straightforward. All you have to do is set the following in /etc/mpd.conf:
filesystem_charset "UTF-8"
Attention: id3v1_encoding "ISO-8859-1" should not be changed, because ID3v1 doesn't allow other encodings.
Setting up MySQL
In most cases the distributor has already set MySQL up for UTF-8, but you should check for the following lines in /etc/mysql/my.cnf:
character-set-server=utf8
skip-character-set-client-handshake
default-character-set=utf8
init-connect='SET NAMES utf8'
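The check can also be done by a small script that looks for each of these settings in the text of the file. This is just a sketch, assuming PHP 7 or later: missing_utf8_settings is a hypothetical helper, and it only compares whole lines (ignoring spaces around the first "="), so it knows nothing about which section of my.cnf a setting is in.

```php
<?php
// The settings named in this section of the how-to.
const EXPECTED_SETTINGS = [
    'character-set-server=utf8',
    'skip-character-set-client-handshake',
    'default-character-set=utf8',
    "init-connect='SET NAMES utf8'",
];

// Hypothetical helper: return the expected settings that do not appear
// as a line of their own in the given my.cnf text.
function missing_utf8_settings(string $cnfText): array
{
    $lines = [];
    foreach (preg_split('/\R/', $cnfText) as $line) {
        $line = trim($line);
        // Skip blank lines and comment lines.
        if ($line === '' || $line[0] === '#' || $line[0] === ';') {
            continue;
        }
        // Normalise "key = value" to "key=value" (first "=" only).
        $lines[] = preg_replace('/\s*=\s*/', '=', $line, 1);
    }
    return array_values(array_diff(EXPECTED_SETTINGS, $lines));
}
```

Passing the result of file_get_contents('/etc/mysql/my.cnf') to the helper returns the settings that still need to be added; an empty array means all four were found.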
Changing Jinzora sources
There is some work to do here: setting the web pages to UTF-8, patching the display of ID3 tags, and changing the communication between Jinzora and MPD to get UTF-8 support.
Webpages in UTF-8
To support UTF-8 on webpages we have to change a line in ./frontend/display.php and ./templates/slick/header-pre.tpl from
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
to
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
This will announce the delivered page as UTF-8 encoded, so the webbrowser can use the right encoding for display.
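Why the announced charset matters can be seen at the byte level: the same character is a single byte in ISO-8859-1 but two bytes in UTF-8, so a browser told the wrong charset shows each UTF-8 byte as a separate character. A small illustrative PHP sketch (the bytes are written out explicitly so the example does not depend on the encoding of the source file):

```php
<?php
$utf8   = "\xC3\xBC"; // "ü" encoded as UTF-8: two bytes
$latin1 = "\xFC";     // "ü" encoded as ISO-8859-1: one byte

echo strlen($utf8), "\n";   // 2
echo strlen($latin1), "\n"; // 1
echo bin2hex($utf8), "\n";  // c3bc
```

Read as ISO-8859-1, the two bytes C3 and BC are the characters "Ã" and "¼", which is the typical scrambled output seen when a UTF-8 page is announced with the wrong charset.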
UTF-8 aware shortening
This is needed to avoid the nasty effect of items such as artist names being trimmed very short and ending in an unrecognised character plus "...". The cause is that PHP's substr is not UTF-8 aware: it counts bytes and treats every character as a single byte, which is not true for non-ASCII characters in UTF-8 strings, so it can cut a character in half. Change ./lib/general.lib.php around line 1729 as follows:
Find the function returnItemShortName (.... } block and paste the following *before* the block:
/* Make returnItemShortName UTF-8 aware
 * by creating a new substr_utf8 function
 * mod by basOS, func by lmak @php.net
 */
function substr_utf8($str,$from,$len){
    # utf8 substr
    # www.yeap.lv
    return preg_replace('#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,'.$from.'}'.
        '((?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,'.$len.'}).*#s',
        '$1',$str);
}
Then *change* the line saying
return substr($item,0,$length). "...";
inside the returnItemShortName function block to these:
//MODA
//Make Utf-8 aware
return substr_utf8($item,0,$length). "...";
basOS
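The difference between the two calls can be seen with a short test. The sketch below repeats substr_utf8 from this section so that it runs on its own; the sample name and the variable names are only for illustration.

```php
<?php
// Same function as in this how-to, repeated so the example is self-contained.
function substr_utf8($str, $from, $len)
{
    return preg_replace(
        '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,' . $from . '}' .
        '((?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,' . $len . '}).*#s',
        '$1',
        $str
    );
}

$name = "Bj\xC3\xB6rk"; // "Björk" as UTF-8 bytes: the "ö" takes two bytes

$byByte = substr($name, 0, 3);      // counts bytes: cuts the "ö" in half
$byChar = substr_utf8($name, 0, 3); // counts characters: keeps the whole "ö"

echo bin2hex($byByte), "\n"; // 426ac3   ("Bj" plus half a character)
echo bin2hex($byChar), "\n"; // 426ac3b6 ("Bjö")
```

Where the mbstring extension is available, mb_substr($item, 0, $length, 'UTF-8') should give the same result without a custom function; whether it is installed on a given server is something to check first.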
Displaying ID3-Tags right
Now we have to correct the display of ID3-Tags on a UTF-8 webpage. To do so, we have to change the encoding in ./services/services/tagdata/getid3/getid3.php:
var $encoding = 'UTF-8';
Please note that you must not change var $encoding_id3v1 = 'ISO-8859-1';.
Alternative for getid3 version 1.7.9: set the encoding inside this function, as in the following excerpt:
function SERVICE_GET_TAGDATA_getid3($fname, $installer = false) {
    ...
    $getID3 = new getID3;
    if (stristr($fname, "m4a") == false)
        $getID3->encoding = 'UTF-8';
    ...
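Note that stristr() looks for "m4a" anywhere in the path, so a file such as /music/m4a-rips/song.mp3 would also be treated like an .m4a file. If that matters, the test can be restricted to the file extension. A sketch, assuming PHP 7 or later; is_m4a_file is a hypothetical helper name and not part of Jinzora or getID3:

```php
<?php
// Hypothetical helper: true only if the file extension is "m4a" (case-insensitive).
function is_m4a_file(string $fname): bool
{
    return strtolower(pathinfo($fname, PATHINFO_EXTENSION)) === 'm4a';
}
```

In the function above, the condition would then read if (!is_m4a_file($fname)) instead of the stristr() test.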
Making the MPD module UTF-8 aware
After doing this, I ran into the problem that I couldn't play files with UTF-8 characters in their names. So ./jukebox/jukeboxes/mpd.php needs to be patched to get UTF-8 support.
In function PLAdd changing
$filename
to
utf8_decode($filename)
and in function PLAddBulk changing
$trackArray[$i]
to
utf8_decode($trackArray[$i])
should do the trick.
Quicklink in UTF-8
In ./frontend/blocks.php, remove the call to htmlentities() from the function drawBreadcrumbs. The page can now handle UTF-8 characters, so we don't need the HTML entities anymore.
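The reason htmlentities() gets in the way is that, when it assumes ISO-8859-1 (its default before PHP 5.4), it turns each byte of a UTF-8 character into an entity of its own. Dropping the call fixes the display, but it also stops escaping characters such as < and &. A possible alternative, offered here as a sketch and not as part of the original patch, is htmlspecialchars() with an explicit UTF-8 charset:

```php
<?php
$title = "R\xC3\xB6yksopp <live>"; // "Röyksopp <live>" as UTF-8 bytes

// Treating the UTF-8 bytes as ISO-8859-1 converts each byte separately.
$broken = htmlentities($title, ENT_QUOTES, 'ISO-8859-1');

// Escapes only the HTML special characters and leaves the UTF-8 bytes intact.
$safe = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

echo $broken, "\n"; // R&Atilde;&para;yksopp &lt;live&gt;
echo $safe, "\n";   // Röyksopp &lt;live&gt;
```

Whether drawBreadcrumbs needs the escaping at all depends on where its input comes from, which this how-to does not cover.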
Last Thoughts on ID3-Tags
After doing this, you should see all titles on the webpage without scrambled characters and be able to play all these. If you do not, and have followed all instructions correctly, you probably have some issues with the ID3-Tag encoding. Be sure to use ISO-8859-1 for ID3v1-Tags only. For retagging I suggest using EasyTAG.
Known Issues
- Lyrics are still displayed with scrambled characters, probably because the webservice used to fetch the lyrics also has some encoding issues.
--Quasimodo 07:41, 3 September 2007 (CEST)
