Global Site Tags with multi-byte text support?

Thanks very much for creating such a brilliant plugin, or rather, such a brilliant idea. I installed it on my newly created site, http://zhongwenblog.com, but found that it doesn't work with Chinese tags. I checked through the source code and found that it actually uses multi-byte compatible functions, so any ideas why it's not working?

For example:

All of the tags are correctly displayed at the bottom right corner.

However, no posts are shown for Chinese tags:
http://zhongwenblog.com/tags/%e4%bb%8e%e5%89%8d/

But English tags are working correctly:
http://zhongwenblog.com/tags/lulus/

  • drmike
    • DEV MAN’s Mascot

    Check to see what charset the created database tables are set to. I have a feeling that whatever charset was used, it wasn't utf8.

    No offense to Andrew or any other WordPress developer, but it's a failing on their part not to use the charset that's defined in the wp-config.php file. Many of them overlook it.

    Best bet, if the tables aren't set to utf8 and you can get away with it, is to dump the contents of those tables and reset them to utf8. If not, you'll have to convert them.

    And it's a pain:

    http://www.haidongji.com/2008/11/11/convert-character-set-to-utf8-in-mysql/

  • allansun
    • New Recruit

    Thanks very much, drmike. I just checked the database tables and they seem to be fine.

    However, your post reminded me that there could be something wrong with the URL passed to PHP. I found the cause and got it fixed.

    By default, WordPress stores tags (or terms) in URL-encoded form, as produced by the urlencode function. In global-site-tags.php, on line 174:

    $tag = urldecode( $tag );

    So the tag passed in the URL is decoded first! Then, later in the code (lines 342 and 343):


    $tag_name = $wpdb->get_var("SELECT cat_name FROM " . $wpdb->base_prefix . "sitecategories WHERE category_nicename = '" . $global_site_tags['tag'] . "'");
    $tag_id = $wpdb->get_var("SELECT cat_ID FROM " . $wpdb->base_prefix . "sitecategories WHERE category_nicename = '" . $global_site_tags['tag'] . "'");

    Andrew is using the decoded variable to search against the category_nicename field, which holds the encoded value. That's why it's not working!

    I quickly modified the code by adding the urlencode function:

    $tag_name = $wpdb->get_var("SELECT cat_name FROM " . $wpdb->base_prefix . "sitecategories WHERE category_nicename = '" . urlencode($global_site_tags['tag']) . "'");
    $tag_id = $wpdb->get_var("SELECT cat_ID FROM " . $wpdb->base_prefix . "sitecategories WHERE category_nicename = '" . urlencode($global_site_tags['tag']) . "'");

    And hooray it works now!
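
    For anyone who wants to see the mismatch in isolation, here is a minimal sketch. The variable names and the stored value are hypothetical stand-ins (the stored slug is simply taken from the tag URL above), not code from the plugin:

    ```php
    <?php
    // Hypothetical stand-in for the value stored in category_nicename.
    $stored_nicename = '%e4%bb%8e%e5%89%8d';

    // The plugin decodes the tag taken from the URL (line 174).
    $decoded_tag = urldecode('%e4%bb%8e%e5%89%8d'); // raw UTF-8 bytes

    // The decoded value no longer equals the stored, encoded slug.
    $matches_decoded = ($decoded_tag === $stored_nicename);

    // Re-encoding restores the percent-encoded form. urlencode() emits
    // uppercase hex digits, so the comparison here ignores case.
    $reencoded_tag = urlencode($decoded_tag);
    $matches_reencoded = (strcasecmp($reencoded_tag, $stored_nicename) === 0);

    var_dump($matches_decoded, $matches_reencoded);
    ```

    Note that urlencode() returns uppercase hex while the slug in the URL is lowercase, so the modified query presumably relies on the database comparing strings case-insensitively. That is an assumption about the table collation and is worth checking on your own install.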

    I also noticed that line 405 uses substr, which is not multi-byte safe, so the excerpt doesn't work nicely with Chinese posts, which is what most of my sites' posts are written in. So I just changed it to:


    $content .= mb_substr(strip_tags($post['post_content'],''),0, 250) . ' (' . __('More') . ')';

    Now I'm happy!!
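
    A small sketch of why this matters, assuming UTF-8 text and the mbstring extension; the sample string and variable names are made up for illustration:

    ```php
    <?php
    // Each of these Chinese characters takes 3 bytes in UTF-8.
    $text = '从前有座山';

    // substr() counts bytes: 4 bytes is one full character plus the
    // first byte of the next, which leaves a broken sequence.
    $byte_cut = substr($text, 0, 4);

    // mb_substr() counts characters when given the encoding explicitly.
    $char_cut = mb_substr($text, 0, 4, 'UTF-8');

    var_dump(mb_check_encoding($byte_cut, 'UTF-8')); // is the byte cut still valid UTF-8?
    var_dump($char_cut);
    ```

    Passing 'UTF-8' explicitly avoids depending on the server's internal encoding setting, which the one-line change above leaves at its default.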

    The mb_substr change is a trivial thing that I think most people here can ignore, but I do consider the urlencode issue a bug. I hope Andrew can have a look at this post and fix it in the next version.
