utf-8 issue with Multi-DB

Hello, I have an issue with utf-8 and the swedish chars for the main blog.
If I define in wp-config.php: define('DB_CHARSET', 'utf8');
Chars get scrambled (please see screenshot with_utf8.jpg)
However, imported posts from another install running utf-8 and no Multi-DB works fine (imported through sql and not through wordpress export/import).

If I define in wp-config.php: define('DB_CHARSET', ' ');
It's the other way around (please see screenshot without_utf8.jpg)

The only difference from the other install is that it's not running Multi-DB.

Is there a bug in Multi-DB or am I doing something wrong?
I'm running Version: 3.0.3

Many thanks in advance
Tomas

  • Heydar

    Hi drmike, thanks for the response.
    The old install uses utf-8 but I can see the db was created with latin1.
    However, a newly created blog on the multi-db install will have issues if I don't set:

    define('DB_CHARSET', ' ');

    i.e. the chat plugin will not display swedish chars correctly.
    the emails that are sent out when a new blog is created with also have issues with swedish chars

    An wp import works fine though but the chars are scrambled looking directly in the DB tables.

    I'm lost :wink:

  • drmike

    The old install uses utf-8 but I can see the db was created with latin1.

    You're going to have to convert (This is what I;m thinking is happening with the other utf8 thread from yesterday as well) but before we get into that, I would like someone else to weigh in.

    May want to grab another backup of your databases if you don;t have a recent one.

    I really hate how CPanel defaults to latin. Why can;t wp default to utf8 and save everybody the trouble...

  • Heydar

    Ok I have verified it now.
    If I disable Multi-DB and still run multi-site chars is not an issue, and in wp-config.php I can have

    define('DB_CHARSET', 'utf8');

    New and old posts works

    Earlier I tried to run Multi-DB without multi-site and it also worked.
    There are som references in db.php regarding utf8, could that be the issue:

    if ( is_multisite() ) {
                            $this->charset = 'utf8';
                            if ( defined( 'DB_COLLATE' ) && DB_COLLATE )
                                    $this->collate = DB_COLLATE;
                            else
                                    $this->collate = 'utf8_general_ci';
                    } elseif ( defined( 'DB_COLLATE' ) ) {
                            $this->collate = DB_COLLATE;
                    }
    
                    if ( defined( 'DB_CHARSET' ) )
                            $this->charset = DB_CHARSET;

    I now run Multi-DB v3.0.5

  • Heydar

    Ok I have solved it for now by exporting some content in latin1 and using iconv to convert:

    mysqldump -u root -p --default-character-set=latin1 --skip-set-charset dbname wp_commentmeta
    wp_comments wp_postmeta wp_posts wp_terms > dbname_content.sql
    
    iconv -f latin1 -t utf8 < dbname_content.sql > dbname_content_latin1_utf8.sql
    mysql> use new_dbname
    mysql> drop tables wp_commentmeta, wp_comments, wp_postmeta, wp_posts, wp_terms
    mysql> source dbname_content_latin1_utf8.sql

    Just wonder if it will give me problems later on?
    Still can't figure out why I have to set define('DB_CHARSET', ' '); to have it work with Multi-DB/Multi-Site

  • emorling

    I also have this problem and it's causing huge problems. There is a discrepancy between the actual blog tables and the wp_site_posts table, which can be a clue to what is going on.

    For example. The post_title in the wp_1_posts table displays just fine with Swedish characters.

    However, the same post_title in the wp_site_posts table is displayed garbled.

    The tables indeed have the same COLLATION, namely utf8_general_ci, and looking at the data through PhpMyAdmin they look the same....

    This may be a clue to an EXPERT who can help to fix this:

    If I utf8_decode() the post_title from the wp_site_posts it displays fine.

    Somehow the data in the wp_site_posts table is double UTF8 encoded??

    I really need a fix ASAP. Please help

  • emorling

    Intro
    Ok, I have solved it. It took extensive research and testing, approximately 20 hours. I will post my solution here if it helps anyone. If it helps you please share some points so that I can get my free lifetime subscription. :wink:

    The problem

    My problem was a mixture of character sets in the database tables. The tables were set to collation utf8_general_ci but some columns still had latin1 characters. If you want to learn more about these problems you can read this in-depth article about MySql Character Set Hell

    I never located exactly where the mixing of characters occured, and how to supress them. The multi-db plugin should probably be more anal about setting character sets and collation to UTF8. Considering that the tables work without multi-db, there is surely something the plugin can do.

    Not the Solution

    I was tempted to empty the character set like this: define('DB_CHARSET', ' ');

    It actually does not fix the problem. New problems came up and the data in the database is still a mess. So I suggest, against it.

    The Solution

    STEP1. ENSURE UTF8

    Find your MySQL configuration file (on most Linux/BSD systems it’s /etc/my.cnf) and make sure it’s got the following statements under the relevant headers. As described here

    [mysqld]
    default-character-set=utf8
    default-collation=utf8_general_ci
    character-set-server=utf8
    collation-server=utf8_general_ci
    init-connect='SET NAMES utf8'
    
    [client]
    default-character-set=utf8

    Restart MySQL :
    service mysql restart

    STEP2. CONVERT DATATABLES

    If you follow Step 1 then UTF8 is enforced in a way in which it was not being done before by the Multi-plugin. Probably because the Multi-plugin is relying on defaults, which in some cases was latin1 until the prior step was taken.

    The next step is to do something about the messy data already stored in the database, and enforce UTF8 in all datatables.

    I put together a Template. Simply create a page with this template. Set the visibility to Private. Then visit the page and it will loop through all your multi-databases and fix the encoding. After that you can delete the template and all the dumps which you will find residing in your web root.

    <?php
    /*
     * Template Name: Fix Database Characters
    */
    
    // Note: You must have access to the system() command, this does not work in PHP safe mode
    // Note: You may have to set another User and Password that has the required permissions to perform mysqldump
    $dbuser = DB_USER;
    $dbpass = DB_PASSWORD;
    
    // loop through all multi-databases
    global $db_servers;
    foreach($db_servers as $db_server){
    	$dbname  =$db_server[0]['name'];
    	_convert_db($dbuser, $dbpass, $dbname);
    }
    
    function _convert_db($dbuser, $dbpass, $dbname){
    	$command = "mysqldump -h ".DB_HOST." -u $dbuser -p$dbpass --opt  --quote-names  --skip-set-charset --default-character-set=latin1 $dbname > $dbname-dump.sql";
    	system($command);
    
    	$command = "mysql -h ".DB_HOST." -u $dbuser -p$dbpass --default-character-set=utf8 $dbname < $dbname-dump.sql";
    	system($command);
    }
    ?>

    The alternative is to manually convert all databases from the prompt, but that is a lot of work. If this does not work there is more reading material here:
    - Wordpress: Converting Database Character Sets
    - Fixing a MySQL Character Encoding Mismatch, I tried this manual approach and it works but is cumbersome

    Good luck!

Thank NAME, for their help.

Let NAME know exactly why they deserved these points.

Gift a custom amount of points.