Autoblog SQL DB size (wp_options, wp_autoblog_log)

@Hoang Ngo

Autoblog (now 4.1, but we've been using it for a while) seems to be responsible for an excessively large database footprint unless the DBA proactively manages it in a way that isn't necessary for most other applications (or plugins, for that matter).

It may be worth reconsidering how Autoblog uses transients, perhaps by using a separate table with an appropriately designed schema. It would also be nice to see an option that deletes log entries older than X days. (If there is that option, I guess I missed it.)

Firstly, it's not the number of posts created by Autoblog.

There's wp_autoblog_log, but there's nothing there that matters overmuch and it can be truncated without undue pain (though manually, for now, it seems, which is annoying).

The main problem seems to be what Autoblog (I think it's Autoblog) is doing to wp_x_options. See below for some data from our WPMS installation, the TL;DR- of which is:

• Site 4 is the one (and only one, of about 40 sites) running Autoblog.

• The DB engine in use is InnoDB, and we have innodb_file_per_table = true.

• Sites 2 and 4 have been around since 2009 and 2014 respectively (reflected in the number of posts and comments on site 2 and also in the size of tables wp_2_{posts,comments}. Although wp_4_posts has 2921 rows (vs 7638 rows in wp_2_posts), most of them are Autoblog posts and are therefore small.)

• There are only 345 rows in wp_4_options, the sum data length of which is about 1.7 MB yet MySQL reports that wp_4_options is consuming 3,251 MB. That's 3.2 gigabytes. That's over 4.6 times the size of the main posts table of a site that has been running for nearly 6 years.

• However, an OPTIMIZE TABLE command shrinks that down to a much more respectable 3.6 MB.

I mention InnoDB and the fact that we use innodb_file_per_table because I'm no MySQL guru (much less InnoDB guru) and it's possible (in fact, AIUI, likely) that optimize trick may not work unless innodb_file_per_table is set true. This is not the default for many installations, and Autoblog, as currently written, is likely adding to a lot of admins' (and WPMUdev customers', who don't have this level of understanding) headaches.

Either way, the upshot is that wp_x_options uses type LONGTEXT for option_value and although that may be useful for serialised PHP data structures, it's not very efficient for frequent INSERT/DELETE, especially with values as long as several hundred kB each.

Data from MySQL:

mysql> USE information_schema;
mysql> SELECT table_name, data_length FROM tables
                WHERE table_schema = 'wpms' AND data_length > 100000000
                ORDER BY data_length;
+---------------+-------------+
| table_name    | data_length |
+---------------+-------------+
| wp_2_comments |   255590400 |
| wp_2_posts    |   699039744 |
| wp_4_options  |  3251863552 |
+---------------+-------------+

mysql> use wpms;
mysql> SELECT count(*) FROM wp_4_options;
+----------+
| count(*) |
+----------+
|      345 |
+----------+

mysql> SELECT count(*) FROM wp_4_posts WHERE post_status = 'publish';
+----------+
| count(*) |
+----------+
|     2921 |
+----------+

mysql> SELECT count(*) FROM wp_2_posts WHERE post_status = 'publish';
+----------+
| count(*) |
+----------+
|     7638 |
+----------+

mysql> SELECT option_name, LENGTH(option_value) FROM wp_4_options
                WHERE LENGTH(option_value) > 100000
                ORDER BY LENGTH(option_value);
+--------------------------------------------------+----------------------+
| option_name                                      | LENGTH(option_value) |
+--------------------------------------------------+----------------------+
| _transient_feed_9cd2f75a9ebb72b18adbb123390a20d6 |               150022 |
| _transient_feed_d117b5738fbd35bd8c0391cda1f2b5d9 |               235282 |
| _transient_feed_da8cde559baa5aef366030cf29057b2e |               292635 |
| _transient_feed_0f2fcbbf42b03f8075711198a37f74ab |               292647 |
+--------------------------------------------------+----------------------+

mysql> SELECT SUM(LENGTH(option_value)) FROM wp_4_options;
+---------------------------+
| SUM(LENGTH(option_value)) |
+---------------------------+
|                   1736655 |
+---------------------------+

mysql> OPTIMIZE TABLE wp_4_options;
+-------------------+----------+----------+-------------------------------------------------------------------+
| Table             | Op       | Msg_type | Msg_text                                                          |
+-------------------+----------+----------+-------------------------------------------------------------------+
| wpms.wp_4_options | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| wpms.wp_4_options | optimize | status   | OK                                                                |
+-------------------+----------+----------+-------------------------------------------------------------------+

mysql> SELECT table_name, data_length FROM tables
                WHERE table_schema = 'wpms' AND data_length > 100000000
                ORDER BY data_length;
+---------------+-------------+
| table_name    | data_length |
+---------------+-------------+
| wp_2_comments |   255590400 |
| wp_2_posts    |   700088320 |
+---------------+-------------+

mysql> SELECT table_name, data_length FROM tables
                WHERE table_schema = 'wpms' AND table_name = 'wp_4_options';
+--------------+-------------+
| table_name   | data_length |
+--------------+-------------+
| wp_4_options |     3686400 |
+--------------+-------------+
    David King

    This is not urgent, and the solution will likely be neither easy nor quick.

    Mainly, I posted the results of my research for the benefit of others. Also maybe Hoang's (or maybe he already knew about this issue).

    At least if people know about the problem, what to look for and how to mitigate (where possible) — especially you fantastically helpful support folk (says he) — then it can be managed until Hoang can provide a more permanent solution.

    Oh, BTW, we're running MySQL v5.5.43 and WP 4.2.2.

    Hoang Ngo

    Hi @David King,

    I hope you are well today and many thanks for your information.

    The most concern case is the wp_4_options table. As the posts or comments contain the content, which use to display on the frontend, but the options table is use for cache.

    Actually, the options variables are not created by AutoBlog, they are created by WordPress, which specific is the function fetch_feed(). The purpose is to reduce the hit to the feed source.

    Btw, I want to run a query like this
    SELECT option_name FROM wp_4_options WHERE option_name LIKE '_transient_timeout%' AND option_value < now();
    This will return a list of expired cache, so you can have more information about them. For clean up, you can use this plugin https://wordpress.org/plugins/artiss-transient-cleaner/, but please make sure we got a backup before do anything massive to the database.

    Thanks again for your valuable information , which is not only benefit for Autoblog users, but also the WordPress users.

    Best regards,
    Hoang

      David King

      @Hoang Ngo

      Thanks for providing some insider's illumination over what's going on. Always the developer knows the internals better than an outsider

      Also, thanks for the points!

      As the posts or comments contain the content, which use to display on the frontend, but the options table is use for cache.

      Right, via the WP Transients API. The main difference is that wp_{posts,postmeta,comments,commentmeta} don't experience a whole lot of INSERT/DELETEs.

      This will return a list of expired cache, so you can have more information about them. For clean up, you can use this plugin https://wordpress.org/plugins/artiss-transient-cleaner/, but please make sure we got a backup before do anything massive to the database.

      These won't solve the problem. As can be seen from the data I provided in my OP, the actual number of rows in, and the aggregate data size of wp_options is relatively small (because WP automatically DELETEs expired transients. Eventually.)

      The problem is that MySQL, at least with InnoDB tables, is not very smart about storage of tables that experience frequent INSERT/DELETE cycles and it doesn't seem to re-use space. MyISAM tables might be better at this, in which case the problem may not have been historically apparent because MyISAM was the default engine until MySQL 5.5, and IIRC WordPress uses the default engine during installation. As MySQL 5.5 installations become more common, the problem of unoptimized InnoDB tables like wp_options (and wp_usermeta, BTW) will become worse and more apparent.

      The problem is exacerbated when the length of option_value is large (>1-10kB).

      Actually, the options variables are not created by AutoBlog, they are created by WordPress, which specific is the function fetch_feed().

      I see. Well, then it's a design fault within WordPress more than AutoBlog. The only thing AutoBlog could do to mitigate is implement its own feed_fetch() that used a different means of caching feed state. Perhaps it could even schedule a cron job to OPTIMIZE it in a manner that will avoid deadlock (if that is possible).

      I might file a ticket over with WordPress to alert them to the problem, if they don't already know about it. Actually, tbh, I can't imagine that WordPress aren't aware of the problem, because something similar happens with wp_usermeta, and there are much larger installations of WPMS out there than ours. But I guess WordPress.com have full-time, dedicated DBAs to manage this.

      One super important detail I have discovered since I wrote my OP, but I can't add to the OP because the edit window has expired — if you can amend my OP, I'd be grateful:

      1. OPTIMIZE TABLE can deadlock in the right circumstances. You are advised to put your site into maintenance mode and wait for any active requests to complete before running this command. IIRC, cPanel and similar admin tools offer an 'repair and optimize tables' feature as well, and I imagine the same applies to this.

      2. Beware of executing processes that take a long time to complete, especially via the web (including phpMyAdmin) because most web servers (and most PHP installations) will kill a process that doesn't complete in 30 seconds. Also, browsers have a timeout also. You can disable PHP's max_execution_time, you can relax timeout limits in the web server (how exactly you do this depends on what web SAPI you're using, which will in large part be dictated by your hosting company), and you can increase your web client's timeout (at least, in FireFox).

      But it's better to do all of these from the command line, if you're able, because it's much easier to kill a deadlocked process and you won't lose control because of a timeout.

      3. Evidently, the problem I outlined in my OP and in this post is is a well-known (although whether Automattic are aware that feed_fetch() spams the options table in quite the way it does is unclear). Readers may be interested in the following:

      https://premium.wpmudev.org/blog/squeaky-clean-database/
      https://premium.wpmudev.org/blog/optimizing-your-wordpress-database-a-complete-guide/
      http://www.wpoptimus.com/611/wordpress-plugins-cleanup-optimize-database-improve-performance/

      As a rule, I prefer to do such maintenance manually, or at least to write my own scripting, that way I know exactly what's going on and when. As mentioned above, OPTIMIZE TABLE can cause deadlock, and I would not want process that could do that to run automatically.

      I don't know how safe those plugins are, but for most WP admins, they are probably the only realistic option.

      The main thing for WP admins is to be aware of the problem and keep an eye on it. You can see the SQL queries that I used (in database information_schema) to find large tables.

      4. Here is a better still query that will show tables with the most wasted space (and only those over 10 MB):

      mysql> use information_schema;
      mysql> SELECT table_name, data_length, data_free FROM tables
                  WHERE table_schema = 'wpms' AND data_free > 10000000
                  ORDER BY data_free;
      +--------------+-------------+------------+
      | table_name   | data_length | data_free  |
      +--------------+-------------+------------+
      | wp_sitemeta  |    39895040 |   13631488 |
      | wp_2_options |    12730368 |   20971520 |
      | wp_usermeta  |    22560768 | 4730126336 |
      +--------------+-------------+------------+

      Yes, that's right, my wp_usermeta table is occupying 4.7 GB for a total data length of 22 MB, a factor of 20,966% larger than it need be!

      This is the table that deadlocked for me when I tried to optimize it. I guess I will have to schedule some downtime to do it.