Performance Issues

Hi,
This is my first thread here at wpmudev. I have read all the newest threads in "Advanced WordPress MU Discussion" as well as "Multi-DB Support", and have searched the forum for topics on "performance".

We are running some of the bigger blog communities here in Europe (blog.de, blog.fr, blog.co.uk) as well as some other sites (blog.ca), and we are considering using WordPress MU + BuddyPress for some of our sites in the future.

We are quite experienced with scalable hardware infrastructure (15+ web servers, 4+ DB servers with 64GB RAM each, a MySQL replication setup, various memcached servers, file servers, centralized storage engines, etc.).

I would like to find out if anyone has experience with large WordPress MU + BuddyPress setups. We are looking at:
- approx. 500,000 blogs (growing fast, so the setup should be able to handle 1 million)
- approx. 400,000 visits per day
- approx. 1,500,000 page impressions (PIs) per day

Mainly, I would like to find someone who has a similar setup in terms of users/blogs/visits/PIs and can tell me about their hardware infrastructure, DB setup, and any changes they made to the caching mechanisms or the database (for example, using InnoDB instead of MyISAM for tables with a high write ratio, to avoid the table-locking problem of MyISAM tables).

Thanks for your help.

Flo

  • florian.wilken

    @Andrew:
    Thanks for your reply.
    I am more than happy to work with you guys at Incsub. However, I would very much like to get a feeling for the hardware resources I would need if I switched to WordPress with such a user and blog base.
    It might turn out that, because of the hardware requirements, it is not viable for me to pursue the WordPress idea further. That is what I would like to find out before diving into more detailed consulting.
    Regards, Flo

  • drmike

    Mmmm, red heads and brownies. :slight_smile:

    That whole thread makes for good reading actually. I've seen too many wpmu site admins start off without a plan. +1 to FW for planning ahead.

    The biggest setup we have with a client is about 40k blogs on 2 servers: the front one runs the WPMU files and memcached, while the off-net box holds the databases and uploads. (I think that's how it goes.)

    edit: Oh, and a Google Mini as a search box, but that's pretty much completely separate.

  • James Farmer

    I'd have to add that the most valuable thing we've ever done at Edublogs was to invest in an excellent SysAdmin.

    At the end of the day, when you get to the level of complexity and load you're talking about, straightforward (and even high-level) hosting support just doesn't cut the mustard... for example, I wouldn't recommend pSek or Peer1 for this setup.

    Instead, an excellent sysadmin and ServerBeach (which, compared to Peer1 pricing, actually meant that we saved money!) have done the job for us.

  • James Farmer

    Michael (the excellent sysadmin I mentioned :slight_smile:) and I had a chat about this, and he's worked up a draft of what we've done, in terms of how you could do it.

    We'd love to be able to flesh this out more and turn it into a wpmu dev guide, so questions / additions / ideas around it are all more than welcome:

    --

    Scaling a large WordPress installation follows the same basic principles as scaling any large site. The key is to truly understand your application, its architecture and the potential areas of contention. For WordPress specifically, the two key points of contention and work are page-generation time and the time spent in the database.

    Database Layer:

    Given the flexibility of WordPress, the database is the storage point not only for the "larger" items, such as users, posts and comments, but also for many little options and details. The nature of WordPress is that it may make many round-trip calls to the database to load these options -- each requiring database and network resources. The first level of "defense" against overloading the database is the MySQL Query Cache. The Query Cache is a nifty little feature in MySQL which stores -- in a dedicated area within main memory -- the results of any query against a table which has not recently changed. That is, assuming a request comes in to retrieve a specific row in a table -- and that table has not recently been modified in any way -- and the cache has not filled up, requiring purging/cleaning -- the query can be satisfied from this cache. The major benefit, of course, is that to satisfy the request the database does not need to go to disk (which is generally the slowest part of the system) and can respond immediately.
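
    As a rough illustration, the query cache is controlled by a handful of settings in my.cnf. The values below are placeholders only -- size them against your own memory and workload:

        [mysqld]
        query_cache_type  = 1      # cache SELECT results for tables that have not changed
        query_cache_size  = 128M   # dedicated area in main memory for cached results
        query_cache_limit = 2M     # skip caching individual results larger than this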

    The other major boost for the database is to keep the working set in memory. The working set is loosely defined as the set of data which will be aggressively referenced in a given period of time. Your database can have 500GB worth of data -- but the working set -- the data actually needed NOW [and in the next N amount of time] -- may only be 5GB. Keeping that 5GB within memory (either using generous key caches & system I/O buffers for MyISAM, or a large Buffer Pool for InnoDB) will of course reduce the required round trips to disk. If the contention in the database is write-related, consider changing the storage engine for the WordPress tables to InnoDB. Depending on the number of tables, this can lead to memory starvation, so approach with caution.
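
    For example, the relevant knobs in my.cnf would look something like the following (the figures are purely illustrative -- the point is to size the caches so the working set fits in memory):

        [mysqld]
        # MyISAM: only indexes are cached here; data blocks rely on the OS I/O buffers
        key_buffer_size         = 4G

        # InnoDB: data and indexes are cached together in the buffer pool
        innodb_buffer_pool_size = 16G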

    The last point on databases is disks. In the event the working set doesn't fit in memory (which is usually most of the time), make the disk subsystem as quick as possible. Trade in those "ultra-fast 3.0Gb/s SATA" disks for high-speed SCSI disks. Consider a striped array (RAID-0) -- but for safety's sake make it RAID-10. Spread the workload over multiple disks: for 150GB of disk space, consider getting several 50GB disks so that a large throughput can be obtained. If you will be doing heavy writes to this disk subsystem, consider a battery-backed write-back cache. The throughput will be a lot higher.

    The really nice "defense mechanism" for the database is to avoid the database altogether. As mentioned earlier, WordPress tends to make many, many database calls per page. If these calls can be drastically reduced or eliminated, the time spent in the database goes down and page generation speeds up. This is usually done with memcached. There are two types of cache: the object cache (loosely defined as things like options, settings, counts, etc.) and the full-page cache. A full-page cache stores a fully-generated page (HTML output and all) in the cache. This type of cache of course virtually eliminates page-generation time altogether.
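
    A minimal sketch of how this gets wired up, assuming the stock memcached object-cache drop-in (object-cache.php in wp-content/) and Batcache (advanced-cache.php in wp-content/) -- the server addresses are placeholders, and you should check each plugin's readme for the exact settings it reads:

        <?php
        // wp-config.php (excerpt)
        define( 'WP_CACHE', true );  // lets WordPress load wp-content/advanced-cache.php

        // memcached pool shared by all web-tier servers (hypothetical IPs)
        $memcached_servers = array( 'default' => array(
            '10.0.0.11:11211',
            '10.0.0.12:11211',
            '10.0.0.13:11211',
        ) );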

    We should not forget to mention MySQL slave replication. If your single database server cannot keep up, consider using MySQL replication together with a plugin like MultiDB or HyperDB to split the reads and the writes. Keep in mind that you will always have to write to a single database -- but you should be able to read from many/any.
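
    As an illustration of the read/write split, a HyperDB-style db-config.php might contain something like the following (hostnames are placeholders; MultiDB uses its own configuration format instead):

        <?php
        // db-config.php (excerpt)
        // Master: receives all writes (and can serve reads as a fallback).
        $wpdb->add_database( array(
            'host'     => 'db-master.internal',
            'user'     => DB_USER,
            'password' => DB_PASSWORD,
            'name'     => DB_NAME,
            'write'    => 1,
            'read'     => 1,
        ) );

        // Replica: read-only, kept in sync via MySQL replication.
        $wpdb->add_database( array(
            'host'     => 'db-slave.internal',
            'user'     => DB_USER,
            'password' => DB_PASSWORD,
            'name'     => DB_NAME,
            'write'    => 0,
            'read'     => 1,
        ) );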

    Page-Generation Time:

    WordPress spends a considerable amount of time compiling and generating the resultant HTML page ultimately served to the client. For many, the typical choice is a server like Apache -- which, with its benefits, also brings some limitations. By default in Apache, PHP is built into the processes serving all pages on the site -- regardless of whether they are PHP or not. By using an alternate web server (e.g. nginx, lighttpd, etc.) you essentially "box in" all PHP requests and send them directly to a PHP worker pool which handles the page-generation part of the request. This leaves the web server free to continue serving static files -- or anything else it needs to. Unlike with Apache, the PHP worker pool does not even need to reside on the same physical server as the web server. The most widely used implementation runs PHP as a FastCGI process (with the php-fpm patches applied).
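
    A bare-bones sketch of that separation in nginx (addresses and paths are placeholders; the FastCGI workers can just as easily listen on other machines):

        # nginx.conf (excerpt)
        upstream php_pool {
            server 10.0.0.11:9000;   # php-fcgi / php-fpm listeners,
            server 10.0.0.12:9000;   # local or on separate servers
            server 10.0.0.13:9000;
        }

        server {
            listen      80;
            server_name example.org *.example.org;
            root        /var/www/wpmu;

            # static files are served directly by nginx
            location / {
                try_files $uri $uri/ /index.php?$args;
            }

            # only PHP requests are handed to the worker pool
            location ~ \.php$ {
                include       fastcgi_params;
                fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
                fastcgi_pass  php_pool;
            }
        }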

    File Storage:

    When using multiple web-tier servers to compile and generate WordPress pages, one of the issues encountered is uploaded multimedia. In a single-server install, the files get placed into the wp-content/blogs.dir folder and we forget about them. If we introduce more than one server, we need to be careful to no longer store these data files locally, as they will not be accessible from the other servers. To work around this issue, consider having a dedicated or semi-dedicated file server running a distributed file-system (NFS, AFS, etc.). When a user uploads a file, write it to the shared storage -- which makes it accessible to all connected web servers. Alternatively, you may opt to upload it to Amazon S3, Rackspace CloudFiles or some other Content Delivery Network. Either way, the key is to make sure the files are not local to a single web server -- because if they are, they will not be known to the other servers.
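
    For the NFS variant, the uploads directory on each web server would simply be a mount of the shared export -- something along these lines (host and paths are placeholders):

        # /etc/fstab on each web-tier server -- only the uploads live on NFS,
        # the PHP files themselves stay on local disk
        fileserver.internal:/export/blogs.dir  /var/www/wpmu/wp-content/blogs.dir  nfs  rw,hard,intr  0  0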

    On a distributed file-system, refrain from -- or better, never -- serving files off this system directly. Place a web server or some other caching service (varnish, squid) in front, which is responsible for reading the data off the shared storage device and returning it to the web server to send back to the client. One advantage of using something like varnish is that you can create a fairly large and efficient cache in front of the shared file system. This allows the file system to focus on serving new files, leaving the highly-requested files for the cache to serve.

    Semi-static requests:

    For requests which can be viewed as semi-static, treat them so. Requests such as RSS feeds -- although technically updated and available immediately following the publishing of a post, comment, etc. -- can be cached for a period of time (5 minutes or so) in a caching proxy such as varnish, squid, etc. This way a high number of requests for things like RSS feeds can be satisfied almost for "free" -- they only need to be generated once and are then served from the cache hundreds or thousands of times.
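
    A rough sketch of the idea in Varnish 2.0-era VCL (later versions rename obj to beresp in vcl_fetch); the backend address is a placeholder for one of the web-tier servers:

        backend web {
            .host = "10.0.0.11";
            .port = "80";
        }

        sub vcl_fetch {
            # treat RSS/Atom feeds as semi-static: cache for 5 minutes
            if (req.url ~ "/feed/?$") {
                remove obj.http.Set-Cookie;
                set obj.ttl = 300s;
            }
        }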

    What we use:

    3x web-tier servers
    2x database servers
    1x file server

    The web-tier servers each run nginx, a php-fcgi pool and a memcached instance. The Edublogs.org name resolves to three IP addresses, each fronted by one of the nginx servers. nginx is configured to distribute the PHP requests to one of the three servers (itself or the other two in the pool).

    The database servers in this case function as a split setup. The heavier traffic (e.g. blog content) is stored on one set of servers and the global data is stored on a separate set. "Global" data can be thought of as options, settings, etc.

    The file server is fronted by a varnish pool and connected via NFS to all three web servers. Each web server has a local copy of the PHP files which comprise the site (no reading off of NFS). When a user uploads a multimedia file, it gets copied over to the NFS mounts. On subsequent requests, the data is served back by varnish (which also caches it for future requests).

    --

    Looking forward to your feedback!

  • florian.wilken

    @James:
    Thanks so much for your long reply. There is some really useful information in there for me.
    I have some resulting questions:

    - Would you set all the tables (global tables and individual blog tables) to InnoDB, or just the global tables?

    - Which of the memcache plugins do you use? Batcache and WP Cache? (My understanding is that WP Super Cache is not suitable for a distributed setup, as it works with static files.) What about W3 Total Cache?

    - Are there any special performance issues I have to keep in mind when using buddypress?

  • James Farmer

    Heya,

    I should note that I'm not the expert here :slight_smile: But from my experience...

    - Would you set all the tables (global tables and individual blog tables) to InnoDB, or just the global tables?

    Just the global ones as they get the most action (by far!)

    - Which of the memcache plugins do you use? Batcache and WP Cache? (My understanding is that WP Super Cache is not suitable for a distributed setup, as it works with static files.) What about W3 Total Cache?

    Yeah, Super Cache shouldn't be used for any complex or large scale deployment (learned that from hard experience!). I'm 95% sure we're on Batcache and WP Cache.

    - Are there any special performance issues I have to keep in mind when using buddypress?

    Nope :slight_smile:

    Hope that helps.

  • Michael

    Just to re-iterate / verify what James mentioned --

    The global tables are InnoDB: there are not that many of them, and they get better performance that way. One of the primary reasons the individual blog tables are not InnoDB is InnoDB's data dictionary issues. With a large number of tables, the dictionary can become too large and exhaust all memory on the system. Though there are patches available to change this behavior, the individual tables are still mostly read-only, which MyISAM handles quite well.
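
    For reference, a sketch of the conversion, assuming the default wp_ table prefix -- these are the WPMU global tables, while the per-blog wp_N_* tables stay on MyISAM:

        ALTER TABLE wp_users            ENGINE=InnoDB;
        ALTER TABLE wp_usermeta         ENGINE=InnoDB;
        ALTER TABLE wp_site             ENGINE=InnoDB;
        ALTER TABLE wp_sitemeta         ENGINE=InnoDB;
        ALTER TABLE wp_blogs            ENGINE=InnoDB;
        ALTER TABLE wp_blog_versions    ENGINE=InnoDB;
        ALTER TABLE wp_signups          ENGINE=InnoDB;
        ALTER TABLE wp_registration_log ENGINE=InnoDB;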

    As for caching: We use the memcached-backed object cache and on top of that we also use Batcache (which utilizes the memcached-backed object cache).

    -Michael

  • nickd32

    bump for drmike

    It looks like WP-Cache hasn't been updated since 2007 and Batcache is now a year old too.

    @drmike - it looks like W3 Total Cache now supports WPMU.

    @james - Super cache shouldn't be used for any complex or large scale deployment (learned that from hard experience!)

    @james - can you elaborate on this? I've been trying (unsuccessfully) to get WP Super Cache to work on my WPMU install, but no joy. Is it because of Multi-DB? Couldn't it be run in "half-on" mode (i.e. using WP-Cache but not the SuperCache part)?

    @michaelk - Is the "memcached-backed object cache" a plugin or a script? Where might we find documentation on how to download/implement it?

  • drmike

    These are the instructions and plugins that I've been following:

    http://www.lullabot.com/articles/how_install_memcache_debian_etch

    http://wordpress.org/extend/plugins/batcache/installation/

    edit: Haven't looked at the wp-cache side of it. Had planned on using wp-super-cache in half mode since that's what we use normally on our wpmu installs.
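
    For anyone following along, the rough sequence behind those links boils down to something like this (a sketch only -- Debian-style package names, and the PHP extension may need to come from PECL instead; paths assume a standard WPMU install):

        # memcached daemon plus the PHP memcache extension
        apt-get install memcached php5-memcache

        # drop-ins from wordpress.org: the memcached object cache first, Batcache on top of it
        cp memcached/object-cache.php   /var/www/wpmu/wp-content/
        cp batcache/advanced-cache.php  /var/www/wpmu/wp-content/

        # then enable the page cache in wp-config.php:  define( 'WP_CACHE', true );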

    We have 12 gigs of memory on our servers and they're underutilized. We've been thinking about rolling out memcache on them. Looking at what we'll need to do on all of our platforms.
