MySql, PHP and UTF8

2 min read >

MySql, PHP and UTF8

Engineering Insights & Web Platforms

Nowadays UTF8 is far from being just “trendy”, it’s the de facto standard for information representation. There are a lot of discussions on the “why locales are bad” theme, and I’m not going to argue with that.

Getting right to the point, how can you use UTF8 in your PHP/MySql web app? First things first: use PHP 5 and MySql 4.1 as minimum requirements. The reason is the same for both of these apps: native/decent UTF8 support.

For MySql, define your tables with UTF8 character set and collation:

CREATE TABLE table_name (

....................

) CHARACTER SET utf8 COLLATE utf8_general_ci;

For PHP, set the UTF8 encoding before extracting your information from MySql

if (!$link = mysql_connect(‘mysql_host’, ‘mysql_user’, ‘mysql_password’)) {

echo ‘Could not connect to mysql’;
exit;
}

if (!mysql_select_db(‘mysql_dbname’, $link)) {

echo ‘Could not select database’;
exit;
}
$sql = “SET NAMES ‘utf8′”;
mysql_query($sql, $link);

//select your UTF8 data
$sql = “SELECT foo FROM table_name”;
$result = mysql_query($sql, $link);if (!$result) {

echo “DB Error, could not query the database\n”;
echo ‘MySQL Error: ‘ . mysql_error();
exit;
}?>

If you need to process your UTF8 strings use the multi-byte versions of the string functions (they have the same name but an “mb_” prepended, e.g. mb_substr) as they work at the character level rather than the byte level.

For omitting the string encoding parameter that can be passed to the multi-byte string functions is useful to set the default encoding to UTF8 by issuing:

mb_internal_encoding(“UTF-8”);