33 pointsStarting to get into this DEV thingI'm helpful
Tomas
Member
—
30th January 2011 (1 year ago)
Hello, I have an issue with utf-8 and the swedish chars for the main blog.
If I define in wp-config.php: define('DB_CHARSET', 'utf8');
Chars get scrambled (please see screenshot with_utf8.jpg)
However, imported posts from another install running utf-8 and no Multi-DB works fine (imported through sql and not through wordpress export/import).
If I define in wp-config.php: define('DB_CHARSET', ' ');
It's the other way around (please see screenshot without_utf8.jpg)
The only difference from the other install is that it's not running Multi-DB.
Is there a bug in Multi-DB or am I doing something wrong?
I'm running Version: 3.0.3
Hello, I have an issue with utf-8 and the swedish chars for the main blog.
If I define in wp-config.php: define('DB_CHARSET', 'utf8');
Chars get scrambled (please see screenshot with_utf8.jpg)
However, imported posts from another install running utf-8 and no Multi-DB works fine (imported through sql and not through wordpress export/import).
If I define in wp-config.php: define('DB_CHARSET', ' ');
It's the other way around (please see screenshot without_utf8.jpg)
The only difference from the other install is that it's not running Multi-DB.
Is there a bug in Multi-DB or am I doing something wrong?
I'm running Version: 3.0.3
6593 pointsLike some sort of WPMU DEV GodExceptionally helpfulLifetime member
Keeper of the Dark Chocolate
—
30th January 2011 (1 year ago)
#
Is this for a previous install that has already existed? I;m wondering if the databases are setup for latin8 or another charset and they need to be converted. Or did those databases start with utf8? I know a basic CPanel install assumes you want latin8 and defaults to that.
edit: Had this tab open for an hour now. Finally getting down this far. We've been busy lately.
Hi drmike, thanks for the response.
The old install uses utf-8 but I can see the db was created with latin1.
However, a newly created blog on the multi-db install will have issues if I don't set:
define('DB_CHARSET', ' ');
i.e. the chat plugin will not display swedish chars correctly.
the emails that are sent out when a new blog is created with also have issues with swedish chars
An wp import works fine though but the chars are scrambled looking directly in the DB tables.
Update, I think it has todo with multi-site (along with Multi-DB).
I did a new install and activated only Multi-DB (no use of course but just for test). Chars are working as expected with utf-8 set in wp-config.php
I then activated multi-site and new posts got swedish chars screwed up, also admin messages when new registrations etc...
6593 pointsLike some sort of WPMU DEV GodExceptionally helpfulLifetime member
Keeper of the Dark Chocolate
—
31st January 2011 (1 year ago)
#
The old install uses utf-8 but I can see the db was created with latin1.
You're going to have to convert (This is what I;m thinking is happening with the other utf8 thread from yesterday as well) but before we get into that, I would like someone else to weigh in.
May want to grab another backup of your databases if you don;t have a recent one.
I really hate how CPanel defaults to latin. Why can;t wp default to utf8 and save everybody the trouble...
Hi, I have all set to UTF-8 but if I set latin1 in wp-config it's the same problem.
But again, it worked before switching to multi-site. Strange huh?
Multi-DB w/o multi-site = ok
Multi-DB with multi-site = not ok
If I comment out the define in wp-config.php it will by default be set to utf8.
Still, if I don't worry about any old db's I have to set define('DB_CHARSET', ' ');
Otherwise it will not work with swedish chars on new posts when using multi-db/multi-site
Just wonder if it will give me problems later on?
Still can't figure out why I have to set define('DB_CHARSET', ' '); to have it work with Multi-DB/Multi-Site
utf8 encoding problem using multi db with multi-site in French
I have tried to modify the define('DB_CHARSET,' '); and I can create new content in the admin without the problem... but I can't forgot the actual content.
When I desactivate multi_DB and go back to a unique database, it works well.
I also have this problem and it's causing huge problems. There is a discrepancy between the actual blog tables and the wp_site_posts table, which can be a clue to what is going on.
For example. The post_title in the wp_1_posts table displays just fine with Swedish characters.
However, the same post_title in the wp_site_posts table is displayed garbled.
The tables indeed have the same COLLATION, namely utf8_general_ci, and looking at the data through PhpMyAdmin they look the same....
This may be a clue to an EXPERT who can help to fix this:
If I utf8_decode() the post_title from the wp_site_posts it displays fine.
Somehow the data in the wp_site_posts table is double UTF8 encoded??
Intro
Ok, I have solved it. It took extensive research and testing, approximately 20 hours. I will post my solution here if it helps anyone. If it helps you please share some points so that I can get my free lifetime subscription. ;)
The problem
My problem was a mixture of character sets in the database tables. The tables were set to collation utf8_general_ci but some columns still had latin1 characters. If you want to learn more about these problems you can read this in-depth article about MySql Character Set Hell
I never located exactly where the mixing of characters occured, and how to supress them. The multi-db plugin should probably be more anal about setting character sets and collation to UTF8. Considering that the tables work without multi-db, there is surely something the plugin can do.
Not the Solution
I was tempted to empty the character set like this: define('DB_CHARSET', ' ');
It actually does not fix the problem. New problems came up and the data in the database is still a mess. So I suggest, against it.
The Solution
STEP1. ENSURE UTF8
Find your MySQL configuration file (on most Linux/BSD systems it’s /etc/my.cnf) and make sure it’s got the following statements under the relevant headers. As described here
If you follow Step 1 then UTF8 is enforced in a way in which it was not being done before by the Multi-plugin. Probably because the Multi-plugin is relying on defaults, which in some cases was latin1 until the prior step was taken.
The next step is to do something about the messy data already stored in the database, and enforce UTF8 in all datatables.
I put together a Template. Simply create a page with this template. Set the visibility to Private. Then visit the page and it will loop through all your multi-databases and fix the encoding. After that you can delete the template and all the dumps which you will find residing in your web root.
<?php
/*
* Template Name: Fix Database Characters
*/
// Note: You must have access to the system() command, this does not work in PHP safe mode
// Note: You may have to set another User and Password that has the required permissions to perform mysqldump
$dbuser = DB_USER;
$dbpass = DB_PASSWORD;
// loop through all multi-databases
global $db_servers;
foreach($db_servers as $db_server){
$dbname =$db_server[0]['name'];
_convert_db($dbuser, $dbpass, $dbname);
}
function _convert_db($dbuser, $dbpass, $dbname){
$command = "mysqldump -h ".DB_HOST." -u $dbuser -p$dbpass --opt --quote-names --skip-set-charset --default-character-set=latin1 $dbname > $dbname-dump.sql";
system($command);
$command = "mysql -h ".DB_HOST." -u $dbuser -p$dbpass --default-character-set=utf8 $dbname < $dbname-dump.sql";
system($command);
}
?>
Responses (17)
Keeper of the Dark Chocolate — 30th January 2011 (1 year ago) #
Is this for a previous install that has already existed? I;m wondering if the databases are setup for latin8 or another charset and they need to be converted. Or did those databases start with utf8? I know a basic CPanel install assumes you want latin8 and defaults to that.
edit: Had this tab open for an hour now. Finally getting down this far. We've been busy lately.
Member — 31st January 2011 (1 year ago) #
Hi drmike, thanks for the response.
The old install uses utf-8 but I can see the db was created with latin1.
However, a newly created blog on the multi-db install will have issues if I don't set:
define('DB_CHARSET', ' ');
i.e. the chat plugin will not display swedish chars correctly.
the emails that are sent out when a new blog is created with also have issues with swedish chars
An wp import works fine though but the chars are scrambled looking directly in the DB tables.
I'm lost ;)
Member — 31st January 2011 (1 year ago) #
Update, I think it has todo with multi-site (along with Multi-DB).
I did a new install and activated only Multi-DB (no use of course but just for test). Chars are working as expected with utf-8 set in wp-config.php
I then activated multi-site and new posts got swedish chars screwed up, also admin messages when new registrations etc...
Developer — 31st January 2011 (1 year ago) #
What encoding do you have set for your existing databases and tables? What happens if you switch the encoding in WP to match the database encoding?
Keeper of the Dark Chocolate — 31st January 2011 (1 year ago) #
You're going to have to convert (This is what I;m thinking is happening with the other utf8 thread from yesterday as well) but before we get into that, I would like someone else to weigh in.
May want to grab another backup of your databases if you don;t have a recent one.
I really hate how CPanel defaults to latin. Why can;t wp default to utf8 and save everybody the trouble...
Member — 31st January 2011 (1 year ago) #
Hi, I have all set to UTF-8 but if I set latin1 in wp-config it's the same problem.
But again, it worked before switching to multi-site. Strange huh?
Multi-DB w/o multi-site = ok
Multi-DB with multi-site = not ok
Member — 1st February 2011 (1 year ago) #
Ran through an utf-8 convert of a mysql dump of the old db but result was the same.
What comes in play with multi-site and Multi-DB?
Member — 1st February 2011 (1 year ago) #
Ok I have verified it now.
If I disable Multi-DB and still run multi-site chars is not an issue, and in wp-config.php I can have
define('DB_CHARSET', 'utf8');
New and old posts works
Earlier I tried to run Multi-DB without multi-site and it also worked.
There are som references in db.php regarding utf8, could that be the issue:
I now run Multi-DB v3.0.5
Member — 1st February 2011 (1 year ago) #
Anyone to help out? Can I send you my Multi-DB config to check if something is wrong?
Need to get going with the project :)
Thanks
Lead Developer — 1st February 2011 (1 year ago) #
I don't know what's happening, something to do with the transfer of old to new dbs.
What happens if you comment out the define completely?
Member — 1st February 2011 (1 year ago) #
If I comment out the define in wp-config.php it will by default be set to utf8.
Still, if I don't worry about any old db's I have to set
define('DB_CHARSET', ' ');Otherwise it will not work with swedish chars on new posts when using multi-db/multi-site
Member — 2nd February 2011 (1 year ago) #
Sorry to bump but I really need to resolve this. Anyone with input?
Thank you in advance
Member — 3rd February 2011 (1 year ago) #
Ok I have solved it for now by exporting some content in latin1 and using iconv to convert:
Just wonder if it will give me problems later on?
Still can't figure out why I have to set define('DB_CHARSET', ' '); to have it work with Multi-DB/Multi-Site
Member — 14th February 2011 (1 year ago) #
It seems I have exactly the same problem :
utf8 encoding problem using multi db with multi-site in French
I have tried to modify the define('DB_CHARSET,' '); and I can create new content in the admin without the problem... but I can't forgot the actual content.
When I desactivate multi_DB and go back to a unique database, it works well.
Please help
Wordpress 3.0.5 french pack
Multi-DB 3.0.5
Member — 27th April 2011 (1 year ago) #
Has anyone found a fix for this one?
Member — 28th April 2011 (1 year ago) #
I also have this problem and it's causing huge problems. There is a discrepancy between the actual blog tables and the wp_site_posts table, which can be a clue to what is going on.
For example. The post_title in the wp_1_posts table displays just fine with Swedish characters.
However, the same post_title in the wp_site_posts table is displayed garbled.
The tables indeed have the same COLLATION, namely utf8_general_ci, and looking at the data through PhpMyAdmin they look the same....
This may be a clue to an EXPERT who can help to fix this:
If I utf8_decode() the post_title from the wp_site_posts it displays fine.
Somehow the data in the wp_site_posts table is double UTF8 encoded??
I really need a fix ASAP. Please help
Member — 29th April 2011 (1 year ago) #
Intro
Ok, I have solved it. It took extensive research and testing, approximately 20 hours. I will post my solution here if it helps anyone. If it helps you please share some points so that I can get my free lifetime subscription. ;)
The problem
My problem was a mixture of character sets in the database tables. The tables were set to collation utf8_general_ci but some columns still had latin1 characters. If you want to learn more about these problems you can read this in-depth article about MySql Character Set Hell
I never located exactly where the mixing of characters occured, and how to supress them. The multi-db plugin should probably be more anal about setting character sets and collation to UTF8. Considering that the tables work without multi-db, there is surely something the plugin can do.
Not the Solution
I was tempted to empty the character set like this: define('DB_CHARSET', ' ');
It actually does not fix the problem. New problems came up and the data in the database is still a mess. So I suggest, against it.
The Solution
STEP1. ENSURE UTF8
Find your MySQL configuration file (on most Linux/BSD systems it’s /etc/my.cnf) and make sure it’s got the following statements under the relevant headers. As described here
Restart MySQL :
service mysql restart
STEP2. CONVERT DATATABLES
If you follow Step 1 then UTF8 is enforced in a way in which it was not being done before by the Multi-plugin. Probably because the Multi-plugin is relying on defaults, which in some cases was latin1 until the prior step was taken.
The next step is to do something about the messy data already stored in the database, and enforce UTF8 in all datatables.
I put together a Template. Simply create a page with this template. Set the visibility to Private. Then visit the page and it will loop through all your multi-databases and fix the encoding. After that you can delete the template and all the dumps which you will find residing in your web root.
The alternative is to manually convert all databases from the prompt, but that is a lot of work. If this does not work there is more reading material here:
- Wordpress: Converting Database Character Sets
- Fixing a MySQL Character Encoding Mismatch, I tried this manual approach and it works but is cumbersome
Good luck!
Become a member