Mixed collation now causing errors when trying to migrate site

Got Digital Ocean WordPress server up & running, then realized that the WordPress database (on the live site) for the site I’m trying to move has mixed collations!

When I migrated it to the D.O. WordPress installation, it started actually causing PHP exceptions/unrecoverable errors specifically pointing to incompatible collations.

So, I did a quick check and every table has mixed collations between utf8_general_ci, latin1_swedish_ci, and utf8mb4_unicode_ci. I never touched the encoding from the default and none of the wp-config.php files specify an encoding.

Three questions:

1> Why’now did it this happen?

2> How can I prevent it in the future?

3> How do I resolve this? Do I force collation all to e.g. utf8mb4_unicode_ci or utf8_general_ci and, if so, is there a tool for doing it? I’ve found various SQL two and three liners, but no comprehensive solution and changing the collation in PhpMyAdmin (table by table, the only way it works) would take *forever*.

Thanks,

Steve

  • Vaughan
    • Support/SLS MockingJay

    Hi ssteinerX

    How it happens is usually something doesn’t explicitly set a character set, inwhich case, the tables will be created using the default MySQL character set.

    The following guide should help, https://codex.wordpress.org/Converting_Database_Character_Sets

    It’s not a simple task though, as some characters might look weird and messed up wif you simply just change the charset without actually converting the characters.

    If you’re on windows, you could possibly do this with notepad+

    https://www.lexo.ch/blog/2016/07/howto-convert-a-mysql-mariadb-database-from-any-charset-into-utf-8-with-notepad-in-windows/

    Hope this helps

  • ssteinerX
    • Design Lord, Child of Thor

    Not on Windows.

    What is the “safe” collation to convert *to* given the mix that I have?

    utf8_general_ci

    latin1_swedish_ci

    utf8mb4_unicode_ci

    There is also, now that I’m looking at it, a mixture of InnoDB and MyISAM in the same database.

    What should be the goal, given that I’m getting errors between collations?

    Is it also possible to convert all to using the same storage engine and is that necessary or even worthwhile?

    Thanks,

    Steve

  • ssteinerX
    • Design Lord, Child of Thor

    According to this: https://make.wordpress.org/core/2015/04/02/the-utf8mb4-upgrade/

    > In WordPress 4.2, we’re upgrading tables to utf8mb4[unicode_ci], when we can…

    This only occurs if the tables are using the utf8 character set so my old latin1_swedish_ci tables won’t be (and/or weren’t, when WordPress was upgraded before).

    Shouldn’t I go straight to utf8mb4_unicode_ci? Seems like, if I go half way, the database will be upgraded anyway since it will meet all the conditions.

    I have also seen articles suggesting that all varchar fields have to be changed to varbinary before the translation to avoid mangling multi-byte characters.

    Also, is there any disadvantage to normalizing the storage engine and which one would be considered more “modern” at this point?

    Sorry for all the questions, but I’m moving dozens of sites over the next month or so to get everyone up to date after letting some clients languish and I only want to have to go through this once.

    Thanks,

    Steve

  • Vaughan
    • Support/SLS MockingJay

    Hi,

    Yes, i would convert to utf8mb4_unicode_ci, it’ll save on wp doing it then.

    Yes, you would need to convert the data to binary to preserve characters, otherwise you’ll get weird characters like ??????? or something after conversion.

    There are scripts around which can do this, which are mentioned in the articles i posted above, it’s not a simple task.

    converting the storage engine I would not do, innodb & myisam work differently, and changing myisam to innodb & vice versa could cause all sorts of problems, so leave those as they are. A plugin will use innodb for a reason, indexing and other things.

    First major difference I see is that InnoDB implements row-level lock while MyISAM can do only a table-level lock. You will find better crash recovery in InnoDB. However, it doesn’t have FULLTEXT search indexes until v5.6, as does MyISAM. InnoDB also implements transactions, foreign keys and relationship constraints while MyISAM does not.

    The list can go a bit further. Yet, they both have their unique advantages in their favor and disadvantages against each other. Each of them is more suitable in some scenarios than the other.

    So to summarize (TL;DR):

    InnoDB has row-level locking, MyISAM can only do full table-level locking.

    InnoDB has better crash recovery.

    MyISAM has FULLTEXT search indexes, InnoDB did not until MySQL 5.6 (Feb 2013).

    InnoDB implements transactions, foreign keys and relationship constraints, MyISAM does not.

    Best to leave those as they are.

    Thanks

Thank NAME, for their help.

Let NAME know exactly why they deserved these points.

Gift a custom amount of points.