Site maxing out server CPU: not sure where to look

I have a general WP troubleshooting question. First off, we are new to WP and are just now migrating our current sites to the WP platform.

We set up a shared web server; for those familiar with Amazon AWS, it is a Red Hat 6.5 EC2 instance. The server already hosted 3 sites and was running at around 20% CPU.

Once we took this new site out of beta on Friday and made it live for the world, the CPU pegged at 100%. We tried troubleshooting but were not getting very far, so to make it through the weekend we upgraded the EC2 instance from an m1.large to an m1.xlarge. The CPU then dropped to between 40% and 50%. We were satisfied that we could make it through the weekend and planned to continue troubleshooting today.

I checked the CPU periodically over the weekend, and by yesterday it had crept back up to 80 to 85%. Looking again this morning, I see that at some point between yesterday evening and now the site pushed the processor back to 100%.

Can you suggest some methods, plugins, or places to look to determine what about this site is overloading the server? A server of this size should not be taken down by one site like this. We know something is badly wrong but are unsure where to look. The site showed no signs of trouble during its limited-access beta, so we need to leave it live while troubleshooting to know when a change actually affects the server. Methods that disrupt the site for users the least would be best. Thank you for any assistance you can offer!

Jim

  • Jack Kitterhing

    Hi there @OgdenNews,

    Hope you're well today and welcome to WPMU DEV! :slight_smile:

    Could you let me know how many plugins you have in total? I'd highly recommend using something such as New Relic (http://newrelic.com/php/wordpress) to monitor the server and see where the usage spikes occur.

    How many concurrent connections do you have to the server at any one time? Is this mostly an image-based or a text-based site? Could you link me to the site, please, so I can take a look?

    Thanks!

    Kind Regards
    Jack.

  • OgdenNews

    Hi Jack!

    Thanks for responding so quickly. At your request I have set up a New Relic account, and it is up and running, gathering information on the server. Unfortunately it just confirms that the problem is the website (Apache is the CPU hog). If you need any other information from New Relic, let me know.

    There are currently 50 active plugins. It sounds like a lot, but 10 of them are just Google Ad Positions we wrote. These basically just drop JavaScript code on the site using a shortcode plugin we also have installed. If you would like to see the installed plugins, I can provide a list, or I would be glad to enable the Support Access module in the WPMU console.

    We are using an AWS MySQL RDS database. The connection count on it stays fairly steady at around 20, so not a lot of connections.

    The site is http://www.nh.com, and although it has many images, I would say it is mostly a text-based site since there are no image galleries, just story- and event-related images. Images are stored on Amazon's S3 servers.

    Again, I appreciate any input you have! Thanks!

    Jim

  • Timothy Bowers

    Hey there,

    There are a number of things you can look at on the site side of things.

    You mentioned that you have 50 plugins, 10 of which are simple ones. What about the other 40?

    Are any of them heavyweight?

    YSlow gives the site a grade of D (see screenshot).

    Google Analyze also has some suggestions, including some for your images.

    You could also check your logs to see what's being accessed and causing the load. Is it spam bots trying to comment, or trackbacks?
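    As a rough illustration of that log check, here is a minimal sketch that tallies requests per client IP and per path, and flags IPs hitting the comment and trackback endpoints. It assumes Apache's default "combined" log format; the regex, the helper name `summarize`, and the log location mentioned below are illustrative assumptions, not anything from this thread.

```python
import re
from collections import Counter

# Assumes Apache's "combined" LogFormat; adjust the regex if yours differs.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines, top=5):
    """Hypothetical helper: return the busiest IPs, busiest paths, and
    IPs posting to WordPress comment/trackback endpoints."""
    by_ip = Counter()
    by_path = Counter()
    comment_hits = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that don't match the assumed format
        ip, path = m.group("ip"), m.group("path")
        by_ip[ip] += 1
        by_path[path.split("?", 1)[0]] += 1  # ignore query strings
        # Comment posts and trackbacks are common spam-bot targets.
        if path.startswith("/wp-comments-post.php") or "trackback" in path:
            comment_hits[ip] += 1
    return (by_ip.most_common(top),
            by_path.most_common(top),
            comment_hits.most_common(top))
```

    To use it, pass an open log file, e.g. `summarize(open("/var/log/httpd/access_log"))` (the usual location for the Red Hat httpd package, but check your own config). One client dominating the first list, or a single URL dominating the second, is usually the first thread to pull.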

  • OgdenNews

    Jack and Timothy,

    I wanted to thank you both for your efforts in helping track down this issue. I also wanted to update this thread with the solution.

    Jack - New Relic was a nice addition to have. You can never have too much information to look at. And a New Relic account is free with Amazon's AWS so that worked out nicely.

    Timothy, while we did see the YSlow results (which we will work on as well), we knew something else had to be wrong. The server this is running on is very powerful, much too big to be brought down by slow-loading scripts and images.

    I finally got mod_status working (I had been trying to set it up but kept failing). Once it was running, we could easily see the problem. This site was a rebuild of a previous version built by another company, so we had little knowledge of what was tied into the old version. The company had previously set something up with Local.com that would come to the site and grab an RSS feed. Apparently this was set to fire with EVERY page view on Local.com.
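    For anyone following along, mod_status can also be polled from a script. The sketch below assumes the status page is reachable at `http://localhost/server-status` (an assumption; the path depends on your `<Location>` config) and parses the machine-readable `?auto` view, tallying scoreboard states so you can watch how many workers are busy. Note that the per-connection detail (client IP, request, CPU per worker), which is what exposed Local.com here, only appears on the HTML page, and on Apache 2.2 it requires `ExtendedStatus On`.

```python
from collections import Counter
from urllib.request import urlopen

# Scoreboard characters as documented for Apache mod_status.
SCOREBOARD_STATES = {
    "_": "waiting", "S": "starting", "R": "reading", "W": "sending",
    "K": "keepalive", "D": "dns", "C": "closing", "L": "logging",
    "G": "graceful", "I": "idle_cleanup", ".": "open_slot",
}


def parse_auto(text):
    """Parse 'Key: value' lines from server-status?auto into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def scoreboard_summary(fields):
    """Count workers in each scoreboard state."""
    counts = Counter(fields.get("Scoreboard", ""))
    return {SCOREBOARD_STATES.get(ch, ch): n for ch, n in counts.items()}


def fetch_status(url="http://localhost/server-status?auto"):
    # The URL is an assumption; match it to your mod_status <Location>.
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8", "replace")
```

    A quick check would be `scoreboard_summary(parse_auto(fetch_status()))`; a "sending" count that stays near your worker limit points at requests that are slow to finish, which is when the ExtendedStatus HTML view is worth opening.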

    At any given moment we could see 17 to 20 different connections from Local.com trying to pick up that RSS feed, which no longer existed in the new site. But instead of just returning a 404 error and stopping, the site was apparently attempting to use the automatic URL search for links that did not exist.
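    If you lack mod_status, a pattern like this can also be found after the fact in the access log. The sketch below is an illustration under assumptions (combined log format, a hypothetical helper `not_found_report`, documentation-range example IPs and a made-up `/rss.xml` path in the check): it counts 404 responses per client and path, and reports each client's busiest second, which is how a "17 to 20 requests every second" feed poller would stand out.

```python
import re
from collections import Counter, defaultdict

# Assumes Apache's "combined" LogFormat; adjust if yours differs.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"\S+ (?P<path>\S+) \S+" (?P<status>\d{3}) '
)


def not_found_report(lines, top=5):
    """Return the most frequent (ip, path) pairs among 404 responses,
    plus each client's peak number of 404s within one logged second."""
    pairs = Counter()
    per_second = defaultdict(Counter)  # ip -> Counter of timestamps
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("status") != "404":
            continue  # only interested in requests for missing URLs
        ip = m.group("ip")
        pairs[(ip, m.group("path"))] += 1
        per_second[ip][m.group("time")] += 1  # timestamps have 1 s resolution
    peaks = {ip: max(c.values()) for ip, c in per_second.items()}
    return pairs.most_common(top), peaks
```

    Grouping by the raw timestamp string keeps the sketch simple; it relies on Apache logging times to the second, so each distinct string is one second of traffic.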

    Because of this, each of those 17 to 20 connections every second was listed as using 50%+ CPU. We contacted the company, who in turn contacted Local.com to tell them to stop pulling that RSS feed from us. We also blocked their IPs at the server for the time being to make it stop. The server CPU is now running at a cool 15 to 20%, down from the 95 to 100% it was at before.
