How We Host WordPress: Infinite Scale, Redundancy, and Code Audits
We host millions of sites across our CampusPress and WPMU DEV Hosting platforms. And we’re also behind Edublogs, likely the largest WordPress Multisite network in the world (if you don’t count WordPress.com).
Over the past 11+ years of building and growing these services, we’ve evolved everything that we do. We’ve made mistakes, learned a good deal, and will keep improving our infrastructure and practices to provide an even better service for our customers.
We thought it would be good to share a behind-the-scenes look at our current setup. We’ll also run down how much it would cost to set something like this up on your own. If your site gets decent traffic, can’t afford downtime, or is designed for many users to be logged in at once (like Multisite, BuddyPress, or a membership site), then this guide is for you!
Most of what we’re sharing in this post comes from our CampusPress service, which hosts Multisite networks small and large for schools and universities around the world. For example, we host sites like emergency.cornell.edu, which must be able to handle hundreds of thousands of potential visitors in a short period of time should an awful event or natural disaster occur on campus. Main websites (providence.edu) and news sites (thelantern.com) can also see high load and must stay up and running without any downtime.
Surprisingly, these high-traffic sites aren’t technically the hardest to host, because caching and CDNs can absorb much of the load for static content. However, membership sites, forums, BuddyPress social networks, and any other site with logged-in user activity create a database load that grows significantly even with modest numbers of users. This is where redundancy and separating the database from the web servers come into play. For us, Amazon Web Services has allowed us to create the ideal environment for WordPress and WordPress Multisite.
Why We’ve Moved To AWS
For many years, we leased servers in a data center from Peer1. This worked just fine, as our technical team could remotely manage the servers to build an optimized setup that served us well.
But, like much of the web, we’ve slowly moved all of our infrastructure over to Amazon Web Services (AWS) in the past few years, for a number of reasons. For example:
- Localization – AWS makes it easy to set up in regions all over the world, including the US, Canada, Australia, and the EU. Peer1 also had data centers in multiple countries, but not nearly as many, and it wasn’t as easy to move and manage between them. Many of our customers increasingly require hosting within specific countries due to laws and regulations around data. Others simply want to reduce latency.
- Pay-as-you-go – Our previous arrangement required us to lease servers on a monthly basis, whether or not they were needed, because we had to be ready for traffic spikes at all times. AWS is more of an on-demand model, allowing us to spin up virtual servers almost instantly, so we only pay for what we use. Similarly, when traffic is low, like over the Christmas holidays when many of our customers’ schools are out, our bills go down.
- It’s freakin’ Amazon – AWS has become the most trusted and well-known cloud provider that there is. We’re able to leverage the trust (and security certifications) that AWS has in place to reassure our customers of the quality of the technology behind the sites that we host.
The Virtual Private Cloud
The diagram above shows the basic structure of each Virtual Private Cloud (VPC) cluster that we use. We can host multiple WordPress Multisite networks on each, though some customers need or want their own dedicated cluster. We use similar VPCs to host this blog and Edublogs.org (which has over 4 million sites!).
Let’s look at the VPC in some detail…
The first thing each visitor hits is a Content Delivery Network, or CDN. We are a Cloudflare hosting partner, so most of our customers use Cloudflare, which adds security benefits like a WAF (web application firewall) and DDoS protection. Others choose Amazon CloudFront, and others still use one of the countless other CDN services out there. The CDN serves images and static content from whichever data center is closest to each visitor, which limits the traffic that actually reaches the web servers and can speed up page load times. Check out our review of some of the CDN options here.
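As a rough sketch of the split between what a CDN can and can’t cache, a site might send long-lived Cache-Control headers for static assets and keep dynamic pages out of shared caches. The extension list and the 30-day lifetime below are hypothetical choices for illustration, not our actual configuration.

```python
"""Sketch: choose a Cache-Control header for a request path.

Hypothetical policy: static assets get a long lifetime so CDN edges can
serve them; dynamic WordPress pages are kept out of shared caches.
"""
from pathlib import PurePosixPath

# Hypothetical set of file extensions treated as static assets.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".woff2"}


def cache_control_for(path: str) -> str:
    """Return a Cache-Control value for the given URL path."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in STATIC_EXTENSIONS:
        # Static files: browsers and CDN edges may keep them for 30 days.
        return "public, max-age=2592000"
    # Everything else (PHP-rendered pages): don't let shared caches store it.
    return "private, no-cache"


print(cache_control_for("/wp-content/uploads/2017/05/logo.png"))  # public, max-age=2592000
print(cache_control_for("/news/"))  # private, no-cache
```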
EC2 and Elastic Load Balancing
For the actual web servers, we use at least two C4 Large EC2 instances running Linux with 8GB of memory each. Each AWS region has multiple “availability zones”, which are separate physical data centers, and spreading the instances across them builds in redundancy: should an outage or natural disaster affect one location, the other can take over.
Directing traffic to these EC2 instances is an Elastic Load Balancer that determines which EC2 virtual server should handle each page view or action from a visitor.
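The toy model below illustrates why spreading web servers across availability zones behind a load balancer keeps a site up: requests rotate across healthy instances, and when one zone fails, traffic simply goes to the survivor. It uses plain round-robin for simplicity, which is not necessarily how Elastic Load Balancing routes requests; the instance IDs and zone names are hypothetical.

```python
"""Toy model: round-robin over healthy web servers in two availability zones.

Simplified illustration only, not Elastic Load Balancing's routing algorithm.
"""
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    instance_id: str
    zone: str
    healthy: bool = True


class RoundRobinBalancer:
    def __init__(self, instances):
        self.instances = instances
        self._counter = count()  # ever-increasing request counter

    def pick(self) -> Instance:
        """Return the next healthy instance, skipping any that failed health checks."""
        healthy = [i for i in self.instances if i.healthy]
        if not healthy:
            raise RuntimeError("no healthy instances")
        return healthy[next(self._counter) % len(healthy)]


web = [Instance("web-a", "us-east-1a"), Instance("web-b", "us-east-1b")]
lb = RoundRobinBalancer(web)
print([lb.pick().instance_id for _ in range(4)])  # ['web-a', 'web-b', 'web-a', 'web-b']

web[0].healthy = False  # simulate an outage in us-east-1a
print([lb.pick().instance_id for _ in range(2)])  # ['web-b', 'web-b']
```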
Docker containers keep different WordPress installations separate from each other across the instances.
For the database, which houses the content, comments, and user data, we use two RDS M4 Large instances running MySQL. These are set up in a ‘master/standby’ arrangement, with failover to the standby should something go wrong with the master.
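In Amazon RDS, this kind of arrangement is what the Multi-AZ option provides, and AWS performs the failover itself by repointing the database endpoint at the standby. The toy model below only illustrates the idea: after a hypothetical three consecutive failed health checks, the standby is promoted and the connection target changes. Host names are made up.

```python
"""Toy model of master/standby failover driven by health checks.

Illustration only: the failure threshold and host names are hypothetical,
and with RDS the failover is carried out by AWS, not by application code.
"""
from dataclasses import dataclass, field


@dataclass
class FailoverPair:
    master: str
    standby: str
    max_failures: int = 3  # hypothetical: consecutive failures before failover
    _failures: int = field(default=0, init=False)

    def record_health_check(self, ok: bool) -> None:
        """Track consecutive failed checks; promote the standby at the limit."""
        if ok:
            self._failures = 0
            return
        self._failures += 1
        if self._failures >= self.max_failures:
            # Swap roles: the old master becomes the standby once repaired.
            self.master, self.standby = self.standby, self.master
            self._failures = 0

    @property
    def endpoint(self) -> str:
        """Host that WordPress should connect to right now."""
        return self.master


db = FailoverPair(master="db-primary", standby="db-standby")
for ok in (True, False, False, False):
    db.record_health_check(ok)
print(db.endpoint)  # db-standby
```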
S3 File Storage
Using S3 for user file uploads like images and files was our first experience with AWS – and it is something you can (and should) do even if you are hosting your site somewhere other than Amazon. S3 is fast, redundant, and downright cheap for storage and bandwidth.
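Offloaded uploads usually keep WordPress’s familiar folder structure inside the bucket. This sketch assumes the default Multisite layout (uploads/sites/&lt;blog_id&gt;/&lt;year&gt;/&lt;month&gt;/ for every site except the main one) and builds the matching S3 object key and public URL. The bucket name is hypothetical, and a real offload plugin may use a different layout.

```python
"""Sketch: map a WordPress Multisite upload to an S3 object key and URL.

Assumes WordPress's default year/month uploads layout; the bucket name is
hypothetical and real offload plugins may organize keys differently.
"""
from datetime import date
from urllib.parse import quote


def s3_key_for_upload(blog_id: int, filename: str, uploaded: date) -> str:
    """Build the object key, mirroring WordPress's year/month upload folders."""
    prefix = "wp-content/uploads"
    if blog_id != 1:  # assumption: the main site (ID 1) keeps the un-prefixed layout
        prefix += f"/sites/{blog_id}"
    return f"{prefix}/{uploaded:%Y}/{uploaded:%m}/{filename}"


def s3_url(bucket: str, key: str) -> str:
    """Virtual-hosted-style S3 URL for a publicly readable object."""
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


key = s3_key_for_upload(42, "campus map.png", date(2017, 5, 3))
print(key)  # wp-content/uploads/sites/42/2017/05/campus map.png
print(s3_url("example-uploads", key))
# https://example-uploads.s3.amazonaws.com/wp-content/uploads/sites/42/2017/05/campus%20map.png
```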
Your codebase, including WordPress core, plugins, and themes, needs a home. We’ve become partial to Amazon’s relatively new Elastic File System (EFS) for this. We use Bitbucket for code management and version control, and an in-house deployment application to roll out updates across all of the sites that we host. You could also use GitHub or other code hosting and management services.
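Our in-house deployment tool isn’t described here, but one common pattern for updating a shared code directory, such as one on EFS, is to copy each release into its own folder and then switch a `current` symlink with a single atomic rename, so web servers never read a half-copied release. The sketch below assumes a POSIX filesystem; the directory names are hypothetical.

```python
"""Sketch of an atomic "release folder + symlink" deploy.

A common pattern, not necessarily how any particular deployment tool works.
Web servers would point their document root at <root>/current.
"""
import os
import shutil
import tempfile
from pathlib import Path


def deploy(build_dir: Path, root: Path, release_name: str) -> Path:
    """Copy a build into root/releases/<name>, then atomically repoint root/current."""
    release = root / "releases" / release_name
    shutil.copytree(build_dir, release)  # raises if this release already exists

    tmp_link = root / "current.tmp"
    if tmp_link.is_symlink():  # clean up a leftover from an interrupted deploy
        tmp_link.unlink()
    tmp_link.symlink_to(release, target_is_directory=True)
    # os.replace is an atomic rename on POSIX, so readers see either the old
    # release or the new one, never a partially copied tree.
    os.replace(tmp_link, root / "current")
    return release


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    site = base / "site"
    for version in ("v1", "v2"):
        build = base / f"build-{version}"
        build.mkdir()
        (build / "index.php").write_text(f"<?php // {version}\n")
        deploy(build, site, version)
        print((site / "current" / "index.php").read_text().strip())
# <?php // v1
# <?php // v2
```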
Adding Amazon ElastiCache to the mix means that we can serve cached HTML pages to visitors without requiring any work from the database. Keep in mind that logged-in users usually aren’t served cached content. So if your entire site is private or a membership site, page caching isn’t going to do much for you.
EC2 instances can send emails from WordPress too, such as comment notifications or password resets. But if your site sends a lot of email, especially if you’re using something like Subscribe By Email, you’re better off using a service designed specifically for email, such as Amazon Simple Email Service (SES). If nothing else, SES increases the odds of your emails being delivered (and not flagged as spam).
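Here is a simplified sketch of that rule: anonymous visitors get cached HTML, while anyone carrying a WordPress login cookie (names beginning with `wordpress_logged_in_`) always gets a freshly rendered page. A Python dict stands in for ElastiCache, and the five-minute TTL is a hypothetical choice; real caching plugins check more than this one cookie.

```python
"""Sketch of a full-page cache that bypasses logged-in users.

Illustration only: a dict stands in for ElastiCache, and the TTL is
hypothetical. WordPress sets cookies named wordpress_logged_in_<hash>
for authenticated users.
"""
import time
from typing import Callable, Dict, Tuple

LOGGED_IN_PREFIX = "wordpress_logged_in_"


class PageCache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (stored_at, html)

    def get_page(self, url: str, cookies: Dict[str, str], render: Callable[[], str]) -> str:
        """Serve cached HTML to anonymous visitors; always render for logged-in users."""
        if any(name.startswith(LOGGED_IN_PREFIX) for name in cookies):
            return render()  # personalized page: never read from or write to the cache
        now = time.monotonic()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # cache hit: no PHP or database work
        html = render()  # cache miss: build the page (hits the database)
        self._store[url] = (now, html)
        return html


calls = []


def render_home() -> str:
    calls.append("render")  # record each time the page is actually built
    return "<h1>Home</h1>"


cache = PageCache()
cache.get_page("/", {}, render_home)  # miss: renders
cache.get_page("/", {}, render_home)  # hit: served from cache
cache.get_page("/", {"wordpress_logged_in_abc123": "x"}, render_home)  # bypasses cache
print(len(calls))  # 2
```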
CloudWatch Alarms and Logs
Watching over the entire VPC like a hawk is CloudWatch, which collects logs and monitors resources. CloudWatch alarms can automatically add (or remove) EC2 instances when load warrants it, so you aren’t paying for virtual servers when they aren’t needed, yet you can still scale to handle the highest traffic you can imagine.
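The scaling decision behind such alarms can be sketched as a simple threshold rule. The CPU thresholds, one-instance step, and instance limits below are hypothetical; a real EC2 Auto Scaling policy is configured in AWS rather than written as application code like this.

```python
"""Sketch of a threshold-based scaling decision, similar in spirit to an
EC2 Auto Scaling group driven by CloudWatch CPU alarms.

All thresholds and limits are hypothetical.
"""


def desired_capacity(current: int, avg_cpu: float,
                     scale_out_at: float = 70.0, scale_in_at: float = 30.0,
                     minimum: int = 2, maximum: int = 10) -> int:
    """Return the new instance count given the group's average CPU (percent)."""
    if avg_cpu > scale_out_at:
        target = current + 1  # high-load alarm: add an instance
    elif avg_cpu < scale_in_at:
        target = current - 1  # low-load alarm: remove an instance
    else:
        target = current
    # Keep at least two instances (e.g. one per availability zone) and cap cost.
    return max(minimum, min(maximum, target))


print(desired_capacity(2, 85.0))   # 3
print(desired_capacity(2, 10.0))   # 2
print(desired_capacity(10, 95.0))  # 10
```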
Beyond The Infrastructure
The servers are just one part of hosting high-availability WordPress sites that scale. Sites can go offline for many reasons, including plugin/theme conflicts, user error, an outage at a 3rd party service you rely on, and more. This is why we have strict procedures in place to guard against these problems.
Code Guidelines For Plugins and Themes
For any of the enterprise sites that we host, one of the big differences the average user will notice is that plugins and themes can’t be added directly from the WordPress dashboard.
Over the years, we’ve created a list of functions and code requirements that must be met for any plugin or theme that we host. For those used to being able to just add any and all plugins willy-nilly to their sites, this can sometimes be a point of contention.
But we’re after high performance and secure code. And not all plugins and themes are created equal. So our team of developers manually reviews every single theme and plugin that we host.
Here’s a list of what we look for – all plugins and themes that we support must:
- adhere to the WordPress Theme Guidelines and WordPress Coding Standards;
- not rely on 3rd party services (unless we can ensure it fails gracefully and/or approve otherwise for well-established services);
- not automatically upgrade or modify files;
- not change the timeout of wp_remote_* calls;
- not ever change wp_feed_cache_transient_lifetime (i.e., never hook into that filter);
- not use SHOW TABLES, instead use SHOW TABLES LIKE ‘wp_xyz’;
- not use DESC to describe a table, instead use DESCRIBE;
- not change WP_DEBUG, error_reporting or display_errors;
- not remove default roles (remove_role);
- not flush rewrite rules ($wp_rewrite->flush_rules is not allowed);
- not flush cache (wp_cache_flush is not allowed);
- not contain SQL queries; instead, use WordPress’s built-in functions for fetching posts, pages, attachments, users, and their metadata;
- not create new tables or modify table schema;
- not use filesystem functions listed here;
- not store files in the server file system; if it accepts file uploads, it must always use WordPress attachments.
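Several of these rules are mechanical enough to pre-screen automatically before a developer reviews the code by hand. The hypothetical regex checker below flags a handful of them (SHOW TABLES without LIKE, DESC, debug and error-reporting changes, remove_role, rewrite and cache flushes including the flush_rewrite_rules() wrapper, the feed cache lifetime filter, and table creation). It is a rough first pass that will miss violations and is no substitute for manual review.

```python
"""Sketch of an automated pre-screen for a few of the plugin/theme rules.

Hypothetical regex-based checker: it flags candidates for a human reviewer
and will miss many violations (e.g. it cannot spot every SQL query).
"""
import re

# (rule, compiled pattern) pairs; deliberately simple, so expect false negatives.
RULES = [
    ("SHOW TABLES without LIKE", re.compile(r"\bSHOW\s+TABLES\b(?!\s+LIKE\b)", re.I)),
    ("DESC instead of DESCRIBE", re.compile(r"[\"']\s*DESC\s", re.I)),
    ("changes WP_DEBUG", re.compile(r"define\s*\(\s*[\"']WP_DEBUG[\"']")),
    ("changes error_reporting", re.compile(r"\berror_reporting\s*\(")),
    ("changes display_errors", re.compile(r"ini_set\s*\(\s*[\"']display_errors[\"']")),
    ("removes a default role", re.compile(r"\bremove_role\s*\(")),
    ("flushes rewrite rules", re.compile(r"\bflush_(rewrite_)?rules\s*\(")),
    ("flushes the object cache", re.compile(r"\bwp_cache_flush\s*\(")),
    ("hooks wp_feed_cache_transient_lifetime", re.compile(r"wp_feed_cache_transient_lifetime")),
    ("creates or alters a table", re.compile(r"\b(CREATE|ALTER)\s+TABLE\b", re.I)),
]


def scan_php(source: str):
    """Return (line_number, rule) pairs for every line matching a banned pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in RULES:
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings


sample = '''<?php
$tables = $wpdb->get_col( "SHOW TABLES" );
$wpdb->query( "CREATE TABLE {$wpdb->prefix}xyz (id INT)" );
$ok = $wpdb->get_col( $wpdb->prepare( "SHOW TABLES LIKE %s", 'wp_xyz' ) );
flush_rewrite_rules();
'''
for lineno, rule in scan_php(sample):
    print(lineno, rule)
# 2 SHOW TABLES without LIKE
# 3 creates or alters a table
# 5 flushes rewrite rules
```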
You might be surprised at how many of the plugins and themes we evaluate fail these guidelines. Custom SQL queries are the most common problem that we see.
Every plugin and theme update is checked too, to make sure nothing slips through.
Quality Assurance and Testing
We also turn off auto-updates of WordPress core, plugins, and themes. We want to thoroughly test updates before they go live. For most customers, we run a weekly ‘change management’ cycle where updates are pushed out to each region early on Tuesday mornings. This way, our customers know when to expect updates, and we can plan our team to be around and monitor. There are never any surprises.
Before a change or update can make its way through the process, it must:
- Be manually tested and reviewed fully in local testing environments by at least two developers
- Pass all applicable automated and/or unit tests in multiple development environments
- Pass manual testing by QA/support team in multiple development environments
- Be deployed for a minimum of 72 hours to a small subset of live sites, plus the development/test sites of all customers who opt in to our beta testing program
- Pass a final manual code and performance review by technical team leadership
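The gates above can be expressed as a checklist that reports what still blocks a change. The field names are hypothetical; the thresholds (reviews by at least two developers, at least 72 hours on beta sites) come from the list.

```python
"""Sketch of the release gates as a checklist.

Field names are hypothetical; the thresholds come from the process above.
"""
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Set

MIN_REVIEWERS = 2
MIN_BETA = timedelta(hours=72)


@dataclass
class Change:
    """Hypothetical record of where a change stands in the release process."""
    reviewers: Set[str] = field(default_factory=set)
    automated_tests_passed: bool = False
    qa_passed: bool = False
    beta_started: Optional[datetime] = None
    leadership_signoff: bool = False


def blocking_reasons(change: Change, now: datetime) -> List[str]:
    """Return every gate the change has not yet cleared (empty list means ready)."""
    reasons = []
    if len(change.reviewers) < MIN_REVIEWERS:
        reasons.append("needs review by at least two developers")
    if not change.automated_tests_passed:
        reasons.append("automated/unit tests not passing")
    if not change.qa_passed:
        reasons.append("QA/support testing not passed")
    if change.beta_started is None or now - change.beta_started < MIN_BETA:
        reasons.append("less than 72 hours on beta sites")
    if not change.leadership_signoff:
        reasons.append("no final review by technical leadership")
    return reasons


start = datetime(2017, 5, 1, 9, 0)
change = Change(reviewers={"dev-a", "dev-b"}, automated_tests_passed=True,
                qa_passed=True, beta_started=start, leadership_signoff=True)
print(blocking_reasons(change, start + timedelta(hours=48)))  # ['less than 72 hours on beta sites']
print(blocking_reasons(change, start + timedelta(hours=72)))  # []
```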
Putting It Together – The Costs
When you combine the technical infrastructure of AWS with the strict practice of code management, you get sites where you can expect 99.99% uptime or higher, and that can handle any traffic volume that you can throw at it.
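To put the 99.99% figure in perspective, here is the arithmetic for how much downtime it allows: about 52.6 minutes per year, or roughly 4.3 minutes in a 30-day month.

```python
"""Worked arithmetic: downtime allowed by an uptime percentage."""


def allowed_downtime_minutes(uptime_percent: float, period_hours: float) -> float:
    """Minutes of downtime permitted over a period at the given uptime."""
    return period_hours * 60 * (1 - uptime_percent / 100)


year = allowed_downtime_minutes(99.99, 365 * 24)
month = allowed_downtime_minutes(99.99, 30 * 24)
print(f"{year:.1f} min/year, {month:.1f} min per 30-day month")
# 52.6 min/year, 4.3 min per 30-day month
```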
But everything comes with a price. Just how much are we looking at if you try to set something like this up yourself?
Let’s start with the AWS private cloud cluster. Here is a rundown of current prices for the US-Virginia region:
- Two RDS M4 Large instances for the database – $126.00 each
- Two EC2 C4 Large instances for the web servers – $72.00 each ($144.00 total)
- One ElastiCache M3 Large instance – $131.04
- One Elastic Load Balancer with a minimum of 10GB of data processed monthly – $18.08
- One EFS file system with 100GB of storage – $30.00
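The line items add up as follows. This tally assumes $72.00 per C4 Large web server ($144.00 for the pair), which is the figure that makes the items sum to the $575.12 quoted below. Prices change over time, so treat these as a snapshot.

```python
"""Monthly base cost of the cluster, using the US-Virginia prices listed above.

Assumes $72.00 per C4 Large web server; Decimal avoids float rounding on money.
"""
from decimal import Decimal

# (item, quantity, monthly price per unit in USD)
LINE_ITEMS = [
    ("RDS M4 Large (database)", 2, Decimal("126.00")),
    ("EC2 C4 Large (web servers)", 2, Decimal("72.00")),
    ("ElastiCache M3 Large", 1, Decimal("131.04")),
    ("Elastic Load Balancer (10 GB processed)", 1, Decimal("18.08")),
    ("EFS, 100 GB", 1, Decimal("30.00")),
]

total = sum(qty * price for _, qty, price in LINE_ITEMS)
print(f"${total} per month")  # $575.12 per month
```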
This alone is $575.12 per month, and we have yet to pay for a single visitor, upload file storage, or even 1MB of bandwidth. Depending on your traffic, you could easily add hundreds, if not thousands, of dollars per month.
We also have yet to factor in costs for the multiple developers and DevOps engineers you’d certainly need. Yikes!
Is There Another Way?
For DIY types who were hoping this post would be a little more detailed, check out this post, which walks you step by step through configuring AWS almost identically to what we described here.
For the rest of you, you too can benefit from the same scale and procedures with our enterprise hosting services, for a fraction of the cost of going it alone. ;)