Using Multi-Byte Character Sets in PHP (Unicode, UTF-8, etc)

Wednesday, 15 October 08, 9:15 am
The following list details the PHP string functions that can cause problems when handling multi-byte strings. A multi-byte safe alternative is given where one is available:

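For mail(), try mb_send_mail() instead.

For strlen(), try mb_strlen() instead.

For strpos(), try mb_strpos() instead.

For strrpos(), try mb_strrpos() instead.

For substr(), try mb_substr() instead.

For strtolower(), try mb_strtolower() instead.

For strtoupper(), try mb_strtoupper() instead.

For substr_count(), try mb_substr_count() instead.

For ereg(), try mb_ereg() instead.

For eregi(), try mb_eregi() instead.

For ereg_replace(), try mb_ereg_replace() instead.

For eregi_replace(), try mb_eregi_replace() instead.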

To avoid having to recompile PHP with the PCRE UTF-8 flag enabled, you can add the sequence (*UTF8) at the start of your pattern. For example, '/(*UTF8)[[:alnum:]]/' can match 'é' where '/[[:alnum:]]/' does not. The /u regex option also provides UTF-8 awareness. The preg_* functions are contentious, because careful use can be safe. If you are unsure what to do, see mb_eregi() as a possible replacement.

Please investigate the /u option, as that provides UTF-8 awareness. The preg_* functions are contentious, because careful use can be safe. If you are unsure what to do, see mb_ereg_replace() as a possible replacement.
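As a rough illustration of what the /u option changes, the sketch below counts matches of "." with and without it. It assumes a PCRE build with Unicode support; the variable names are only for illustration, and the test string is written as byte escapes so the result does not depend on how the file is saved.

```php
<?php
$word = "\xC3\xA9"; // "é" encoded as UTF-8 (two bytes)

// Without /u the pattern works on bytes, so "." matches each byte separately.
$byteMatches = preg_match_all('/./', $word);   // 2

// With /u the subject is treated as UTF-8, so "." matches one whole character.
$charMatches = preg_match_all('/./u', $word);  // 1

// \p{L} (any Unicode letter) matches "é" when the /u modifier is set.
$isLetter = preg_match('/^\p{L}$/u', $word) === 1; // true

echo "$byteMatches byte matches, $charMatches character match\n";
```

The same pattern therefore gives different answers depending on the modifier, which is why careful use of the preg_* functions matters.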

Try mb_split() instead.

Try mb_split() instead.

For stripos(), try mb_stripos() instead.

For stristr(), try mb_stristr() instead.

For strrchr(), try mb_strrchr() instead.

For strripos(), try mb_strripos() instead.

For strstr(), try mb_strstr() instead.

View comments for possible workarounds.

View comments for possible workarounds.

No known workarounds yet.

View the comment posted on "11-Feb-2008 04:31" for a possible workaround.

This function is flagged because its companion function (ucfirst) is not safe. However, this function is untested.

May be multi-byte safe if you use UTF-8 only (multi-byte UTF-8 characters contain no byte sequences that resemble white space). Avoid UTF-16 & UTF-32, among others.

It may be multi-byte safe if you use UTF-8 only (multi-byte UTF-8 characters contain no byte sequences that resemble less-than or greater-than symbols). Avoid UTF-16 & UTF-32, among others.
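The UTF-8 point in the two notes above can be checked with a small sketch. It uses trim() purely as an example of a byte-oriented function that strips white space; has_ascii_byte() is a hypothetical helper written for this illustration, and the mbstring extension and PHP 7 or later are assumed.

```php
<?php
// Hypothetical helper: true if any byte of $s falls in the ASCII range 0x00-0x7F.
function has_ascii_byte(string $s): bool {
    return preg_match('/[\x00-\x7F]/', $s) === 1;
}

$utf8  = "\xC4\xA0"; // "Ġ" (U+0120) encoded as UTF-8
$utf16 = mb_convert_encoding($utf8, "UTF-16BE", "UTF-8"); // bytes 0x01 0x20

// Every byte of a multi-byte UTF-8 character is 0x80 or above.
$utf8HasAscii  = has_ascii_byte($utf8);   // false
// The UTF-16 form of the same character contains 0x20, the ASCII space.
$utf16HasAscii = has_ascii_byte($utf16);  // true

// trim() leaves the UTF-8 string alone but strips the 0x20 byte from UTF-16.
$utf8Intact  = trim($utf8) === $utf8;     // true
$utf16Intact = trim($utf16) === $utf16;   // false

var_dump($utf8HasAscii, $utf16HasAscii, $utf8Intact, $utf16Intact);
```

The same reasoning applies to the less-than and greater-than bytes (0x3C and 0x3E), which also cannot appear inside a multi-byte UTF-8 character.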

Try this code instead:
$str = mb_convert_case($str, MB_CASE_TITLE, "UTF-8");
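To see why the byte-oriented functions in the list above misbehave, here is a minimal sketch comparing a few of them with their mb_* counterparts on one UTF-8 string. It assumes the mbstring extension is loaded; the sample string and variable names are made up for the example, and the encoding is passed explicitly rather than relying on the configured internal encoding.

```php
<?php
// Bytes are written as escapes so the result does not depend on file encoding.
$str = "na\xC3\xAFve caf\xC3\xA9"; // "naïve café" encoded as UTF-8

$byteLen = strlen($str);                        // 12: counts bytes
$charLen = mb_strlen($str, "UTF-8");            // 10: counts characters

$bytePos = strpos($str, "c");                   // 7: byte offset
$charPos = mb_strpos($str, "c", 0, "UTF-8");    // 6: character offset

$cut     = substr($str, 0, 3);                  // "na" plus the first byte of "ï"
$safeCut = mb_substr($str, 0, 3, "UTF-8");      // "naï"

// substr() split a character in half, so the result is no longer valid UTF-8.
$cutIsValid  = mb_check_encoding($cut, "UTF-8");     // false
$safeIsValid = mb_check_encoding($safeCut, "UTF-8"); // true

$upper = mb_strtoupper($str, "UTF-8");          // "NAÏVE CAFÉ"

echo "$byteLen bytes, $charLen characters\n";
echo $safeCut, "\n", $upper, "\n";
```

Passing "UTF-8" on every call is a deliberate choice here: it keeps the example independent of mb_internal_encoding() and of php.ini settings.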
