Funky characters in imported posts: ?

I have a lot of posts / blogs that now show ? where, for the most part, I believe there should just be ' apostrophes. Is there a way to see which table the individual blogs are in, so I can check what these characters are supposed to be? Or have you come across this before, and do you possibly have a script I could run to search and replace in all posts network-wide?

  • Barry
    • DEV MAN’s Mascot

    Do you have access to phpMyAdmin or some other software to look at the records in your database (not via WP)? If so, can you look at the posts table in your database and see if the invalid characters also show up in that application? If they do, then it seems to be a problem with your data. If not, it looks like a mismatch between the encoding you have WP set to and the encoding set for your database and tables.

  • xlogz
    • Flash Drive

    I think I'm going to have to set up a query to check / fix the posts, orrrr, it could be Quick Cache causing the issue and I may have to rip apart that code a little.
    Is there somewhere I can locate which database my blogs are in? Some kind of map just so I can look up individual posts / blogs for these kind of problems I mean.
    Thanks for the help

  • Mason
    • DEV MAN’s Sidekick

    Hiya xlogz,

    We'll take another look here. I honestly don't know the ins and outs of the multi-db plugin - but can you tell us how many databases you have this configured for?

    We will get back to you shortly on this one. Also, did you happen to check for any discrepancy between the encoding in the database and the encoding set in WordPress? This is the most common reason for this type of error.

    Thanks for your patience.

  • xlogz
    • Flash Drive

    Hey Mason,
    I've got it set up with 256 databases. I used the UTF-8 encoding type when creating the databases, figuring I'd keep it consistent with the install instructions. Here's a SHOW CREATE DATABASE I ran just to be sure:
    CREATE DATABASE database_wpmu_ff /*!40100 DEFAULT CHARACTER SET utf8 */

    I checked inside my wp-config and I'm showing:
    define('DB_CHARSET', 'utf8');

    Is there somewhere else I should set / look for that at?

  • Barry
    • DEV MAN’s Mascot

    Guys, come on, there must be a way to find which blog is in which database.

    You need to md5 the blog id and then depending on the number of databases you have you take the first 1, 2 or 3 characters of the md5 and that is the database the blog is in.
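
As a rough illustration of the lookup described above, here is a minimal Python sketch. The function name `database_suffix` and the mapping of 16, 256 and 4096 databases to 1, 2 and 3 hex characters are assumptions (inferred from md5 output being hexadecimal), not something taken from the plugin's own code.

```python
import hashlib

# Assumed mapping: number of databases -> hex characters taken from the md5
# (16**1 = 16, 16**2 = 256, 16**3 = 4096).
PREFIX_LENGTH = {16: 1, 256: 2, 4096: 3}


def database_suffix(blog_id: int, database_count: int) -> str:
    """Return the md5 prefix that should identify the database holding a blog."""
    digest = hashlib.md5(str(blog_id).encode("ascii")).hexdigest()
    return digest[:PREFIX_LENGTH[database_count]]


if __name__ == "__main__":
    # Print the suffix for the first few blog IDs on a 256-database setup.
    for blog_id in (1, 2, 3):
        print(blog_id, database_suffix(blog_id, 256))
```

The suffix is then appended to whatever database prefix was used when the databases were created.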

  • Barry
    • DEV MAN’s Mascot

    Is there somewhere else I should set / look for that at?

    Ideally the encoding on the database / table and WP should be the same, but it's also important to check the content of the database from outside of WP to make sure it isn't actually storing the bad characters in the database.
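
To make the mismatch concrete, the sketch below shows two ways a curly apostrophe can be damaged when one layer assumes a different encoding than another. It is plain Python with no database involved; the function name `show_mismatch` is hypothetical, and cp1252 / latin-1 merely stand in for whichever non-UTF-8 charset might be in play.

```python
def show_mismatch(text: str) -> dict:
    """Show what happens to text when the encodings on each side disagree."""
    utf8_bytes = text.encode("utf-8")
    return {
        # Character forced into a charset that lacks it: replaced by "?".
        "lossy": text.encode("latin-1", errors="replace").decode("latin-1"),
        # UTF-8 bytes read back as cp1252: multi-character garbage.
        "mojibake": utf8_bytes.decode("cp1252", errors="replace"),
        # Same encoding on both sides: text survives intact.
        "round_trip": utf8_bytes.decode("utf-8"),
    }


if __name__ == "__main__":
    for label, value in show_mismatch("It\u2019s").items():
        print(label, repr(value))
```

If the text stored in the database looks like the "lossy" case, the original character is gone and needs a search and replace. If it looks like the "mojibake" case, the bytes are usually still intact and only the interpretation is wrong.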

  • Shawn
    • The Crimson Coder

    @barry, I'm experiencing the same problem. phpMyAdmin shows the data and fields as all UTF8 with no funky characters. WP with MultiDB 3.0.5 shows funky characters. Specifically with consecutive spaces, left and right quotes and apostrophes, various dashes and content copied from Word. Much of the content exhibiting problems is NOT copy & pasted, though, so it affects content typed directly into the WP editor as well.

    @xlogz, here's a script I use to obtain blog info based on its ID (get the ID from the blogs list on the network dashboard). Change the settings, obviously, then run it on a site that has the same db rights as your WP site. A good place for this is generally 'wp-content/scripts/'

    <?php
    /** safety dance */
    if (function_exists('apache_setenv')) {
        @apache_setenv('no-gzip', '1');
    }
    @ini_set('zlib.output_compression', '0');
    @ini_set('output_buffering', 'off');
    @ini_set('implicit_flush', '1');
    /** settings for WP */
    $DB_NAME     = 'EDIT_ME!';
    $DB_USER     = 'EDIT_ME!';
    $DB_PASSWORD = 'EDIT_ME!';
    $DB_HOST     = 'localhost';
    /** settings for MultiDB */
    $tablepre = 'wp_';        //The table prefix
    $dbpre    = 'EDIT_ME!_';  //The "multi" db prefix
    $hashsize = 3;            //Number of md5 characters used as the database suffix
    /** Begin */
    mysqli_report(MYSQLI_REPORT_OFF);  //Return false on errors instead of throwing
    $wpdb = @mysqli_connect($DB_HOST, $DB_USER, $DB_PASSWORD, $DB_NAME) or die("Invalid credentials for WP.");
    $q = isset($_GET['q']) ? $_GET['q'] : '';
    ?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <title>Multiple Databases Info Tool</title>
    <style type="text/css">
    table.stats {align:center; text-align: center; font-family: Verdana, Geneva, Arial, Helvetica, sans-serif ; font-weight: normal;font-size: 12px;color: #fff;width: 750px;background-color: #666;border: 1px solid #555;border-collapse: collapse;border-spacing: 1px;}
    table.stats td {background-color: #CCC;color: #000;padding: 6px;text-align: left;border: 1px #fff solid;}
    table.stats td.head {background-color: #666;color: #fff;padding: 6px;text-align: center;border-bottom: 2px #fff solid;font-size: 12px;font-weight: bold;}
    </style>
    </head>
    <body>
    <?php
    echo('<form method="get" align="center" style="margin:0 auto; width:300px;">');
    echo('<p>Parse:&nbsp;<input type="text" name="q" value="' . htmlentities($q) . '" />&nbsp;<input type="submit" value="Get it" /></p>');
    echo('</form>');
    if ($q !== '') {
        //Look the blog up by ID if the input is numeric, otherwise by path
        if (is_numeric($q)) {
            $blogs = mysqli_query($wpdb, "SELECT `blog_id`,`domain`,`path` FROM `" . $tablepre . "blogs` WHERE blog_id=" . (int) $q . " ORDER BY `blog_id` ASC;");
        } else {
            $blogs = mysqli_query($wpdb, "SELECT `blog_id`,`domain`,`path` FROM `" . $tablepre . "blogs` WHERE path='/" . mysqli_real_escape_string($wpdb, $q) . "/' ORDER BY `blog_id` ASC;");
        }
        if (!$blogs) {
            echo "<p>Error: " . mysqli_error($wpdb) . "</p>";
        } else {
            while ($row = mysqli_fetch_row($blogs)) {
                $blogid = $row[0];
                $domain = $row[1];
                $path   = $row[2];
                //The first $hashsize characters of md5(blog id) name the database
                $hash = substr(md5($blogid), 0, $hashsize);
                echo "<table class='stats' align='center'>";
                echo "<tr><td class='head' width='25%'>tag</td><td class='head' width='25%'>value</td></tr>";
                echo "<tr><td>URL</td><td><a href='http://$domain$path' target='_blank'>" . $domain . $path . "</a></td></tr>";
                echo "<tr><td>Blog ID</td><td>" . $blogid . "</td></tr>";
                echo "<tr><td>Hash</td><td>" . $hash . "</td></tr>";
                echo "<tr><td>MultiDB</td><td>" . $dbpre . $hash . "." . $tablepre . $blogid . "_*</td></tr>";
                digthis($wpdb, "SELECT count(*) FROM " . $DB_NAME . "." . $tablepre . $blogid . "_posts;", "SingleDB posts", 1);
                digthis($wpdb, "SELECT count(*) FROM " . $dbpre . $hash . "." . $tablepre . $blogid . "_posts;", "MultiDB posts", 1);
                echo "</table>";
            }
        }
    }

    /** Run a query and print the first $count columns of each row as a table row */
    function digthis($connection, $sql, $desc, $count) {
        $return = 0;  //Number of rows printed
        $result = mysqli_query($connection, $sql);
        if (!$result) {
            echo "<tr><td>$desc</td><td>Error: " . mysqli_error($connection) . "</td></tr>";
        } else {
            while ($row = mysqli_fetch_row($result)) {
                echo "<tr><td>$desc</td>";
                for ($i = 0; $i < $count; $i++) {
                    $val = $row[$i];
                    echo "<td>$val</td>";
                }
                echo "</tr>";
                $return++;
            }
        }
        return $return;
    }
    ?>
    </body>
    </html>

  • xlogz
    • Flash Drive

    Thanks Shawn,
    Any chance you can email me the file? It looks like the board chopped up most of your queries to fit to size.
    Really do appreciate you posting that and I'm sorry for being a pain in the ass.

    Guys, everything appears to be utf-8 for me as well. I didn't use a third party dump script. I used 256 databases and went straight from the old database to the new setup with the multidb script. Also I used the database script page linked to from the install instructions to pull the create databases.

  • xlogz
    • Flash Drive

    So there are two separate blog owners, one with a massive blog network and the other with a fairly large one, having the same problem. Neither of us used third-party scripts, plug-ins or editors to port our blog networks over to multi-db. Are you guys going to be able to help out?

  • Shawn
    • The Crimson Coder

    The script I wrote didn't work. Still working to find alternatives.

    I don't get it. The conversion script worked for almost all of the funky chars throughout all the archival content, but not on some. The other issue (more important), is that even *new* posts are getting funky characters inserted into them. If the whole system, at every level, is now using UTF-8, how is that possible?

    Any thoughts, ideas or recommendations of where to look next are very welcome. Thanks!
