Enable UTF-8

From JinzoraWiki

Introduction

I wrote this how-to while installing my Jinzora jukebox. When I noticed that Jinzora can't handle German umlauts, French accents and other non-English characters, I started tweaking it to get UTF-8 support working. My whole system has also been running in UTF-8 mode for some years, to be ready for the future. This how-to refers to the Jinzora 2.7 nightly from 25/01/2007, running on Gentoo Linux with a MySQL backend and MPD. It should be portable to other Linux distributions.

Prerequisite

I suggest installing Jinzora on an OS with full UTF-8 support. That means UTF-8 console encoding and font, and especially UTF-8 filesystem encoding. MySQL and MPD must also be set up in UTF-8 mode.

Bad news for Windows users: Windows filesystems (FAT, FAT32, NTFS) do not store filenames as UTF-8. Depending on the filesystem and localisation, applications see the names in ISO-8859-1 or in CP437/CP850 (or whatever code page suits your language). So if you follow this guide, you will probably run into trouble matching filenames from the database to the filesystem (although I've never tried it).
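To see why names stop matching, compare the bytes of the same name in both encodings. This is a minimal PHP sketch, and the file name is just an example. A UTF-8 path stored in the database is simply a different byte string from the one an ISO-8859-1 filesystem reports:

```php
<?php
// The same file name "Müller.mp3" as raw bytes in two encodings.
$utf8   = "M\xC3\xBCller.mp3"; // UTF-8: "ü" is the two bytes C3 BC
$latin1 = "M\xFCller.mp3";     // ISO-8859-1: "ü" is the single byte FC

// A byte-wise comparison (what a file lookup effectively does) fails:
var_dump($utf8 === $latin1);              // bool(false)
var_dump(strlen($utf8), strlen($latin1)); // int(11), then int(10)
```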

Setting up Linux

Setting up Linux for UTF-8 is very specific to the distribution you are using. The good news is that many distributions are already set up for UTF-8 by default, so if you don't know how to do it, use one of these. Here are just some things to check:

  • The kernel's default codepage for filesystems is set to UTF-8. This NLS setting matters for FAT, NTFS and CD-ROM mounts. If you copy files via ssh or ftp (mostly from Windows) and get scrambled names because the filenames were not converted, convmv will help.
  • If you see scrambled filenames in your shell although you are sure your filesystem is set to UTF-8, you are probably using a console font and/or encoding without UTF-8 support.
  • If you have problems entering accents at the console, you probably need a keymap with dead keys, i.e. not a "nodeadkeys" variant.
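A small PHP sketch can find files that still have non-UTF-8 names, for example before you run convmv. It checks each name in one directory (not recursively). It relies on preg_match() with the /u modifier returning false for malformed UTF-8. The function name and the music path are placeholders:

```php
<?php
/**
 * Hypothetical helper: return the names that are not valid UTF-8.
 * preg_match() with the /u modifier returns false for malformed UTF-8,
 * so an empty pattern is enough to validate a whole string.
 */
function non_utf8_names(array $names): array
{
    return array_values(array_filter($names, function (string $name): bool {
        return preg_match('//u', $name) !== 1;
    }));
}

// Scan one directory (placeholder path, adjust to your music folder).
$dir = '/path/to/music';
$entries = is_dir($dir) ? scandir($dir) : false;
if ($entries !== false) {
    foreach (non_utf8_names(array_diff($entries, ['.', '..'])) as $bad) {
        // bin2hex() shows the raw bytes, since the name cannot be printed safely.
        echo 'Not UTF-8: ', bin2hex($bad), "\n";
    }
}
```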

Setting up MPD

Setting up MPD is rather straightforward. All you have to do is set the following in /etc/mpd.conf:

filesystem_charset "UTF-8"

Attention: do not change id3v1_encoding "ISO-8859-1", because ID3v1 doesn't allow other encodings.
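In essence, id3v1_encoding tells MPD to convert the raw ID3v1 bytes from ISO-8859-1 to UTF-8. MPD itself is written in C, so the PHP sketch below only illustrates that step for one 30-byte title field. It assumes the iconv extension, which is enabled by default in standard PHP builds:

```php
<?php
/**
 * Decode a raw ID3v1 text field (title, artist and album are 30 bytes each).
 * ID3v1 stores ISO-8859-1 text padded with NUL bytes (some taggers use spaces).
 */
function id3v1_field_to_utf8(string $raw): string
{
    $text = rtrim($raw, "\0 ");                 // strip the padding
    return iconv('ISO-8859-1', 'UTF-8', $text); // one byte -> one code point
}

// "Müller" as written by a tagger: "ü" is the single byte FC, then NUL padding.
$raw = str_pad("M\xFCller", 30, "\0");
echo id3v1_field_to_utf8($raw), "\n"; // "Müller" on a UTF-8 terminal
```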

Setting up MySQL

In most cases the distributor sets MySQL up for UTF-8, but you should check for the following lines in /etc/mysql/my.cnf:

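[mysqld]
character-set-server=utf8
init-connect='SET NAMES utf8'

[client]
default-character-set=utf8

The first two options belong in the server section. default-character-set is a client option; as a server option it was deprecated and later removed. Note that init-connect is not executed for accounts with the SUPER privilege (such as root), so let Jinzora connect as an ordinary user.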

Changing Jinzora sources

There is some work to do here: setting the web pages to UTF-8, patching the display of ID3 tags and changing the communication between Jinzora and MPD to get UTF-8 support.

Webpages in UTF-8

To support UTF-8 on the web pages, we have to change a line in ./frontend/display.php and ./templates/slick/header-pre.tpl from

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

to

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

This announces the delivered page as UTF-8 encoded, so the web browser uses the right encoding for display.
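The <meta> tag is only a fallback. A charset sent in the HTTP Content-Type header takes precedence, so if the web server or PHP announces ISO-8859-1 there, the browser will still use it. The hedged PHP sketch below is meant for the top of the page-generating script, before any output. Exactly where to put it in Jinzora is an assumption:

```php
<?php
// PHP appends default_charset to the Content-Type header it sends for HTML.
ini_set('default_charset', 'UTF-8');

// Also send the header explicitly, in case something else set a different one.
// header() only works before the first byte of output has been sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
```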

UTF-8 aware shortening

This avoids the nasty effect of items such as artist names being trimmed very short and ending with an unrecognised character plus "...". The cause is that PHP's substr() is not UTF-8 aware: it counts bytes, not characters. That only works while every character is a single byte, which in UTF-8 holds for ASCII only. Every other character takes two to four bytes, so substr() shortens such names too much and can cut a character in half. Change ./lib/general.lib.php somewhere around line 1729 as follows:

Find the function returnItemShortName(...) { ... } block and paste the following *before* it:

	/* Make returnItemShortName UTF-8 aware
	 * by creating a new substr_utf8 function
	 * mod by basOS, func by lmak @php.net
	 */

	function substr_utf8($str,$from,$len){
	# utf8 substr
	# www.yeap.lv
	# Skip $from characters, keep the next $len characters, drop the rest.
	# A character is one ASCII byte or a lead byte plus its continuation bytes.
	  return preg_replace('#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,'.$from.'}'.
	                       '((?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,'.$len.'}).*#s',
	                       '$1',$str);
	}
	

Then *change* the line saying

return substr($item,0,$length). "..."; 

inside the returnItemShortName function block to:

//MODA
//Make Utf-8 aware
return substr_utf8($item,0,$length). "...";			

basOS
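You can see the difference with a quick test outside Jinzora. The sketch below also shows a shorter alternative to the regex above, letting PCRE's own UTF-8 mode (the /u modifier) count characters. The helper name is hypothetical and not part of Jinzora. Where the mbstring extension is installed, mb_substr($item, 0, $length, 'UTF-8') does the same job:

```php
<?php
/**
 * Hypothetical helper: the first $len characters (not bytes) of a UTF-8 string.
 * With /u, "." matches one complete UTF-8 character; /s lets it match newlines.
 */
function utf8_prefix(string $str, int $len): string
{
    $len = max(0, $len);
    if (preg_match('/^.{0,' . $len . '}/su', $str, $m) !== 1) {
        // preg_match() returns false for malformed UTF-8: fall back to bytes.
        return substr($str, 0, $len);
    }
    return $m[0];
}

$name = "Mot\xC3\xB6rhead"; // "Motörhead" in UTF-8

echo bin2hex(substr($name, 0, 4)), "\n"; // 4d6f74c3: the "ö" is cut in half
echo utf8_prefix($name, 4), "\n";        // Motö
```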

Displaying ID3-Tags right

Now we have to correct the display of ID3 tags on a UTF-8 web page. To do so, change the encoding in ./services/services/tagdata/getid3/getid3.php:

var $encoding = 'UTF-8';

Please note that you should not change var $encoding_id3v1 = 'ISO-8859-1'.

Making the MPD module UTF-8 aware

After doing this, I found that I couldn't play files with UTF-8 characters in their names. So I had to patch ./jukebox/jukeboxes/mpd.php to get UTF-8 support.
In function PLAdd, changing

$filename 

to

utf8_decode($filename)

and in function PLAddBulk changing

$trackArray[$i] 

to

utf8_decode($trackArray[$i])

should do the trick.
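Why should utf8_decode() help when MPD expects UTF-8? My guess is that the path arrives in PLAdd encoded twice: UTF-8 bytes re-encoded as if they were ISO-8859-1. utf8_decode() then simply removes the extra layer. This is an assumption, not verified against the Jinzora sources. The PHP sketch below reproduces the effect with iconv, which performs the same conversions as utf8_encode()/utf8_decode() for these characters:

```php
<?php
$path = "Mot\xC3\xB6rhead/01.mp3"; // correct UTF-8, as stored on disk

// Double encoding: the UTF-8 bytes are treated as ISO-8859-1 and encoded again
// (what utf8_encode() does). "ö" (C3 B6) becomes "Ã¶" (C3 83 C2 B6).
$double = iconv('ISO-8859-1', 'UTF-8', $path);

// Decoding once (what utf8_decode() does) restores the original UTF-8 path.
$fixed = iconv('UTF-8', 'ISO-8859-1', $double);

echo bin2hex($double), "\n";
var_dump($fixed === $path); // bool(true)
```

If your paths are not double-encoded, the same call turns correct UTF-8 into ISO-8859-1 bytes and breaks playback instead, so it is worth checking which case you have.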

Quicklink in UTF-8

In ./frontend/blocks.php, remove the call to htmlentities() from the function drawBreadcrumbs. The page can now handle UTF-8 characters, so we don't need the HTML entities anymore.
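htmlentities() treats its input as ISO-8859-1 by default before PHP 5.4. It therefore turns each byte of a multibyte UTF-8 character into its own wrong entity. If you would rather keep escaping <, > and & in the breadcrumbs, one alternative (not what this how-to does) is htmlspecialchars() with an explicit UTF-8 charset, which leaves UTF-8 characters untouched. A small PHP comparison:

```php
<?php
$crumb = "Mot\xC3\xB6rhead & <Friends>"; // UTF-8 input

// Treating UTF-8 as ISO-8859-1 turns the two bytes of "ö" into two entities:
echo htmlentities($crumb, ENT_QUOTES, 'ISO-8859-1'), "\n";
// Mot&Atilde;&para;rhead &amp; &lt;Friends&gt;

// Escaping only the HTML special characters keeps "ö" as-is:
echo htmlspecialchars($crumb, ENT_QUOTES, 'UTF-8'), "\n";
// Motörhead &amp; &lt;Friends&gt;
```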

Last Thoughts on ID3-Tags

After doing this, you should see all titles on the web page without scrambled characters and be able to play all of them. If you do not, even though you have followed all instructions correctly, you probably have some issues with the ID3 tag encoding. Make sure ISO-8859-1 is used for ID3v1 tags only. For retagging, I suggest EasyTAG.

Known Issues

  • Displaying lyrics still shows scrambled characters, probably because the web service used to fetch the lyrics also has some encoding issues.
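The lyrics service may return ISO-8859-1; that is an assumption I have not checked. If it does, one possible workaround is to convert any text that is not valid UTF-8 before displaying it. The PHP sketch below uses a hypothetical function name. Where to hook it into Jinzora's lyrics code is left open:

```php
<?php
/**
 * Hypothetical helper: return $text as UTF-8, assuming that anything which
 * is not valid UTF-8 was sent as ISO-8859-1 (a heuristic, not a guarantee).
 */
function ensure_utf8(string $text): string
{
    if (preg_match('//u', $text) === 1) {
        return $text;                           // already valid UTF-8
    }
    return iconv('ISO-8859-1', 'UTF-8', $text); // reinterpret as Latin-1
}

echo ensure_utf8("Gr\xFC\xDFe"), "\n";         // Latin-1 input becomes "Grüße"
echo ensure_utf8("Gr\xC3\xBC\xC3\x9Fe"), "\n"; // UTF-8 input is left unchanged
```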


--Quasimodo 07:41, 3 September 2007 (CEST)
