Server Crashes: Multisite Or Server Limits?

My server (on a managed services plan) is habitually reaching capacity, specifically in what looks like PHP or RAM.

We have run the gamut of possible causes, from brute-force attacks to "just outgrowing the server."

Tonight we started to see the server slow down (longer site load times) and contacted managed services. One of the guys there whom I respect suspects what he called a "black hole" app: a site that just keeps gradually consuming resources until the server crashes. He admitted it's not "for certain," but he is definitely leaning in that direction.

So I need a way to troubleshoot this and confirm whether or not it's the multisite network. The problem is finding a realistic methodology for troubleshooting a live network with real traffic, trying to detect a subtle consumption of resources, while not even knowing whether our multisite network is the cause.
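One low-impact way to test the "black hole" theory on a live server is to sample PHP memory over time and check whether it climbs steadily. Below is a minimal Python sketch, assuming a Linux server with `/proc` and PHP running as processes whose names start with `php`, `apache2`, or `httpd` (these names are assumptions; adjust them for your stack). It sums each matching process's resident memory (`VmRSS`) and fits a least-squares slope to the samples. Shared pages are counted once per process, so treat the total as a trend indicator, not an exact figure. `total_rss_kb`, `growth_slope`, and `watch` are hypothetical helper names.

```python
import os
import time

# Assumption: PHP runs as php-fpm/php-cgi processes or inside Apache (mod_php).
DEFAULT_PREFIXES = ("php", "apache2", "httpd")


def total_rss_kb(prefixes=DEFAULT_PREFIXES, proc_root="/proc"):
    """Sum VmRSS (in kB) of every process whose name starts with one of prefixes."""
    total = 0
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc_root, pid, "status")) as fh:
                fields = dict(line.split(":", 1) for line in fh if ":" in line)
        except OSError:
            continue  # process exited or is not readable
        name = fields.get("Name", "").strip()
        if name.startswith(prefixes) and "VmRSS" in fields:
            # The VmRSS line looks like "VmRSS:     2048 kB"
            total += int(fields["VmRSS"].split()[0])
    return total


def growth_slope(samples):
    """Least-squares slope of (seconds, kB) samples, in kB per second."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den if den else 0.0


def watch(interval=30, count=20):
    """Take count samples, interval seconds apart, and report the memory trend."""
    samples = []
    start = time.monotonic()
    for i in range(count):
        if i:
            time.sleep(interval)
        samples.append((time.monotonic() - start, total_rss_kb()))
    slope = growth_slope(samples)
    print(f"PHP/web RSS trend: {slope * 60:.1f} kB/min over {len(samples)} samples")
    return slope
```

For example, `watch(interval=60, count=30)` samples for half an hour. A slope that stays clearly positive across several runs while traffic is flat would point toward a leak rather than ordinary load; running it again after network-deactivating a suspect plugin is one way to narrow things down.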

Managed services dug into the error logs and spent time looking over the server, but we can't find any clues. The server seems to sit at around 40% of its resources, and then within a matter of minutes we need to perform a graceful restart of one or more services (LAMP stack). They do say they can see PHP services active.

Note that we run almost 100% developer-grade licensed products, and we just finished trimming old plugins out of the mix.

So I am looking for suggestions to pass on to managed services, and for ways to troubleshoot our multisite network to see if there is some type of plugin or core issue that is slowly eating resources until the server hits its limit.

A great tool that shows which plugins and which sites on the multisite network are consuming the most server resources would be perfect, but I am not crossing my fingers on that one. :slight_smile:
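Short of a dedicated profiler, a rough first pass is to count which sites and URLs receive the most requests, since a single busy endpoint (for example `admin-ajax.php` on one site) can keep PHP workers pinned. This is a minimal Python sketch, assuming Apache writes its access log in the `vhost_combined` format (`%v:%p %h %l %u %t "%r" ...`), so each line starts with the virtual host name. Request counts are only a proxy for CPU and RAM, and on a subdirectory multisite all sites share one vhost, so the path prefix is what tells them apart. `top_requests` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the start of an Apache "vhost_combined" log line, e.g.
# example.com:80 1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET /path HTTP/1.1" ...
LINE_RE = re.compile(
    r'^(?P<vhost>[^:\s]+):\d+ \S+ \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*"'
)


def top_requests(lines, n=10):
    """Return the n busiest vhosts and the n busiest (vhost, path) pairs."""
    by_site = Counter()
    by_path = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines in other formats
        path = m.group("path").split("?", 1)[0]  # drop the query string
        by_site[m.group("vhost")] += 1
        by_path[(m.group("vhost"), path)] += 1
    return by_site.most_common(n), by_path.most_common(n)
```

On Debian/Ubuntu, Apache's default config writes `vhost_combined` lines to `other_vhosts_access.log`, so something like `sites, paths = top_requests(open("/var/log/apache2/other_vhosts_access.log"))` may work as-is; other setups will need the path and log format adjusted.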

As always... thanks...