Chinese Character Support

I just installed the multi-db plugin with a 4096-database configuration. Now Chinese characters are displayed as question marks. I believe this is due to the collation being set to utf8_general_ci rather than utf8_unicode_ci.

Is there an easy way to make this change globally across all the databases?

  • drmike
    • DEV MAN’s Mascot

    There's discussion here on the procedure for changing the database charsets from latin to utf in general. That would probably be a good start.

    http://yoonkit.blogspot.com/2006/03/mysql-charset-from-latin1-to-utf8.html

    Be sure to back up first before doing anything, of course.

    And just to throw this in, one of my installs has a rather strict "English only" policy. While I disagree with it, that is an option that I should suggest. (More for spammers though to be honest.)

  • VentureMaker
    • Site Builder, Child of Zeus

    Hello everyone!

    Here are some details on the subject.
    WPMU 2.9.2 is installed and uses a huge (255k+ tables) database.
    MultiDB 2.9.2 has been used to move into 4096 databases (in fact, into 4096 + 1 global).

    After the move all 'foreign' (Arabic, Chinese, etc) characters became '?' characters.

    When looking into the old DB:
    SHOW FULL COLUMNS FROM wp_xxx_posts;
    shows the collation as utf8_general_ci for text/longtext/varchar fields, and NULL for other fields.

    Running exactly the same query against the new DB shows a 100% identical result!
    But when the new DB is used (via MultiDB), '?' characters appear in posts.

    Where's the catch or am I missing something obvious?
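
    One way to see how '?' can appear even though both tables report utf8_general_ci: the column collation describes how the data is stored, while the character conversion happens on the connection. Below is a minimal Python sketch of such a lossy conversion. It assumes the connection charset is latin1, which nothing in this thread confirms, and the helper name through_latin1 is made up for illustration.

```python
# Sketch only: mimics what happens when UTF-8 text is pushed through a
# connection whose charset is latin1. The latin1 default is an assumption.

def through_latin1(text):
    """Convert text to latin1, replacing unrepresentable characters with '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")

samples = ["hello", "café", "中文", "مرحبا"]
for s in samples:
    # Characters that exist in latin1 survive; everything else becomes '?'.
    print(repr(s), "->", repr(through_latin1(s)))
```

    Western European text passes through untouched, which is why this kind of problem tends to show up only on Arabic, Chinese and similar blogs.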

  • VentureMaker
    • Site Builder, Child of Zeus

    Update.

    Examining move-blogs.php closely.

    Line 112 is:

    mysql_select_db($this_blog_new_db, $db) or die("Houston, we have a problem!
    <b> Looks like you need to create your new db's! If you're lucky, this link still works - click me </b>
    Database Error: ".mysql_error());

    Shouldn't the following call come right after the above line?

    mysql_set_charset('utf8', $db);

    What do experts think? :)

  • Barry
    • DEV MAN’s Mascot

    Apols if I'm repeating things you've already answered - using a small screen :)

    Can you check the collation of your tables? Multi-db doesn't touch any of the core wp query functions, etc.; it merely parses the query and connects to the correct db to run the query before passing control back to wp. I note you're using v2.9.2. Can you try the latest version of the multi-db plugin? (Despite what it says in the docs, it should work fine with previous mu versions.)

    Can you check in phpmyadmin and compare the table structure and contents between two of the examples. What is the collation for the table for those that have a null collation, and the default collation for the database?

    As much info as you give us will help.

  • VentureMaker
    • Site Builder, Child of Zeus

    Hey Barry
    Thanks for jumping in :)

    First of all, the move-blogs.php script is identical in the 2.9.2 and 3.0 versions of MultiDB.

    Second thing. I've been reading this http://ua2.php.net/manual/en/function.mysql-set-charset.php and found several interesting things.
    Check http://ua2.php.net/manual/en/function.mysql-set-charset.php#86455
    Sounds similar to the above situation, no?

    @Barry.

    > Can you check the collation of your tables.

    I have checked collation of tables in old DB and in new DBs.
    Let's say, the blog in Arabic language has ID xxxx. This is what I've done.

    Step 1.
    - use old_db
    - SHOW FULL COLUMNS FROM wp_xxxx_posts;
    This shows me collation utf8_general_ci for text, longtext and varchar fields, or NULL for other fields (int, bigint, datetime).

    Step 2.
    - calculate MD5 hash of xxxx to determine suffix in MultiDB and find the database in which all tables of blog xxxx now are
    - use new_db
    - SHOW FULL COLUMNS FROM wp_xxxx_posts;
    This shows me exactly the same results as in Step 1. I can attach screenshots from my console :)
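
    The lookup in Step 2 can be sketched as follows. It assumes the 4096-database layout picks the database from the first three hex characters of the MD5 hash of the blog ID (16^3 = 4096). The new_db_ prefix and the blog ID 1234 are hypothetical, so check your own Multi-DB configuration for the real naming.

```python
import hashlib

def multidb_suffix(blog_id, hex_chars=3):
    """Return the leading hex characters of md5(blog_id).

    Assumption: 3 characters for a 4096-database setup (16**3 == 4096).
    """
    digest = hashlib.md5(str(blog_id).encode("ascii")).hexdigest()
    return digest[:hex_chars]

blog_id = 1234  # hypothetical blog ID
# "new_db_" is a placeholder prefix, not the plugin's actual naming scheme.
print("new_db_" + multidb_suffix(blog_id))
```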

    > Multi-db doesn't touch any of the core wp query functions, etc it merely parses the query and connects to the correct db to run the query before passing control back to wp.

    I know, but have a look at http://ua2.php.net/manual/en/function.mysql-set-charset.php#86455 and another long read at http://ua2.php.net/manual/en/function.mysql-set-charset.php#85932

    > I note you're using v.2.9.2, can you try the latest version of the multi-db plugin (despite what it says in the docs, it should work fine with previous mu versions)

    Yes I can, but move-blogs.php script is identical in both, and I suspect this is where the problem resides.

    > Can you check in phpmyadmin and compare the table structure and contents between two of the examples.

    I checked this from the console. The structure is the same. Can it (should it) be somehow different? I don't think it can/should :)
    Contents:
    mysqldump -c -uaaa -pbbb old_db wp_xxxx_posts > old.sql
    mysqldump -c -uaaa -pbbb new_db wp_xxxx_posts > new.sql
    Then I compared both files. They are identical except for comments with the DB name, creation date, etc.
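
    The comparison above can also be scripted. This is a minimal sketch, assuming the only expected differences are mysqldump's "--" comment lines (database name, dump date). old.sql and new.sql are the files produced by the two commands above; the function names are made up for illustration.

```python
import difflib
from pathlib import Path

def significant_lines(dump_text):
    """Drop blank lines and '--' comment lines from a dump."""
    return [ln for ln in dump_text.splitlines()
            if ln.strip() and not ln.startswith("--")]

def dump_diff(old_text, new_text):
    """Return unified-diff lines for the non-comment parts of two dumps."""
    return list(difflib.unified_diff(
        significant_lines(old_text),
        significant_lines(new_text),
        fromfile="old.sql", tofile="new.sql", lineterm=""))

if __name__ == "__main__":
    old_path, new_path = Path("old.sql"), Path("new.sql")
    if old_path.exists() and new_path.exists():
        # errors="replace" keeps the script from crashing on mis-encoded bytes.
        diff = dump_diff(
            old_path.read_text(encoding="utf-8", errors="replace"),
            new_path.read_text(encoding="utf-8", errors="replace"))
        print("\n".join(diff) or "No differences outside comment lines.")
```

    If this reports no differences, the stored data is probably the same in both databases, which would point toward the connection charset rather than the copy itself.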

    Looks like that's it. Ideas? :)

  • VentureMaker
    • Site Builder, Child of Zeus

    By the way, has anyone launched move-blogs.php script from console (command line)?

    That way it won't depend on the user's browser/connection and won't fail if the user loses the connection or the browser hangs.

    Should one just do something like 'php -q /blah-blah-blah/move-blogs.php table=copy' in console?
    Or? :)

  • drmike
    • DEV MAN’s Mascot

    Have to admit that I've always wondered why wpmu didn't hardcode utf8 or strongly suggest it in the docs. There are too many hosting setups out there (e.g. cPanel) that default to something that doesn't support foreign character sets, which causes trouble down the road.
