Monitoring

I have been getting 'website down' alerts for pretty much all my sites recently, multiple times per day, including emails that suggest they've been down for over 24h when they come back online. When I check the sites after receiving the notification, they are still online...

The hosting company say their service has not been down.

Can you shed any light as to why the uptime monitoring is incorrectly flagging my sites?

Thanks folks

  • Adam Czajczyk

    Hi Ramsay

    I hope you're well today and thank you for your question!

    Uptime monitoring attempts to check your site every two minutes. It sends an HTTP request to your site (much as a browser would) and asks it to respond with HTTP headers.

    Two things must then happen for the Uptime monitor to consider the site up and running: it has to respond with a valid HTTP status (e.g. 200 or 3xx), and it has to do so within a specific timeframe, which is, I believe, up to 30 seconds.

    If either of these doesn't happen, the site is considered down and you get a notification.
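
    As a rough illustration of the two conditions above, here is a minimal sketch in Python. It is not the actual Uptime implementation: the names `is_up` and `check_site` are made up for this example, the 30-second limit is only the estimate mentioned above, and the use of a HEAD request is an assumption based on the "respond with HTTP headers" description.

```python
import time
import urllib.error
import urllib.request

TIMEOUT_SECONDS = 30  # assumed limit, taken from the "up to 30 seconds" estimate


def is_up(status, elapsed_seconds, timeout=TIMEOUT_SECONDS):
    """Return True only for a 2xx/3xx status that arrived within the timeout."""
    if status is None:  # no HTTP response at all
        return False
    return 200 <= status < 400 and elapsed_seconds <= timeout


def check_site(url, timeout=TIMEOUT_SECONDS):
    """Send one HEAD request (headers only) and classify the result."""
    request = urllib.request.Request(url, method="HEAD")
    started = time.monotonic()
    try:
        # urllib follows redirects itself, so a 3xx usually ends as a 200 here.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        status = error.code  # e.g. 403 if a firewall rejects the request
    except OSError:
        status = None  # connection refused, DNS failure or timeout
    elapsed = time.monotonic() - started
    return is_up(status, elapsed, timeout)
```

    With this sketch, a firewall that answers the monitor with a 403, or silently drops its connections, would make `check_site` return False even though the same site loads fine in a browser.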

    Apart from the site genuinely being down, a common reason for such "false notifications" is the site temporarily responding very slowly (e.g. due to some server resource limit being unexpectedly exhausted). Another is an additional security layer (a plugin or some tool on the server) that, for example, rejects connections from Uptime monitoring.

    That being said, did you recently configure any security tool on the site? Are you aware of any changes of that kind on the server?

    Could you also get in touch with your host again and ask them:

    - if they made any security-related changes recently, and if they can find out whether connections from the Uptime service were blocked on their end?

    - if they could check the server logs to see whether any server resource limits were reached for your site around the times you got these e-mails?

    Best regards,
    Adam

  • Ramsay

    Hi Adam,
    Since our last update there was a stable spell, but I've since continued getting site down alerts, especially over the weekend there - all sites were apparently down...

    I contacted the host and, after checking their logs, they say the sites have been up the whole time.

    Is there anything else that could be causing this?

    Every time I check to see if the site is actually down, it isn't. The emails are coming in so often that I'm starting to ignore them, which makes them kind of useless!

  • Predrag Dubajic

    Hi Ramsay,

    I had a look at your site and, even though it does load in my browser, there are some serious issues with it. I tested it with multiple ping tools and all the results show your site as down. These are the test results:
    GeoPeeker:

    Uptrends:

    GeoScreenshot:

    There's definitely something going on with your site, and your hosting provider should look at these reports in more depth, because it's not just our Uptime reporting the site as down.

    Best regards,
    Predrag

  • Ramsay

    Their reply:

    We have very extensive firewalls in place on the server, one of which is designed to determine a difference between 'bot' traffic and 'human' traffic, and excessive access from bot traffic (including monitoring tools) may ultimately end up being blocked whereas you will usually find that the sites are absolutely fine when accessed in your browser.

    The firewalls are in place to keep your sites as secure as possible, as 99% of hack attacks come from malicious bots.

    I'm absolutely confident this is the issue here.

    For instance, the IP address 34.196.51.17, which was greylisted in your previous ticket, is now on a blacklist. This is because our firewall has identified that it has made over 4,144 requests to sites on our network since the 17th October, and it was identified as a bot rather than a human, and is not already whitelisted as a known safe IP.

    So, the good news is that I don't believe your websites are actually going down at all.

    In order to resolve this, we will need to ensure that our firewalls whitelist any IP addresses that are used by uptime tools that you are using.

    Would you be able to request a list of all of their check nodes and corresponding IP address so we may get them whitelisted? Presumably they check from multiple locations and there will be a number of nodes that we will need to add to ensure that this issue is resolved once and for all.

    Similarly, regarding their post - the issue there is again almost certainly that those solutions are also firewalled. We are extremely strict on bot activity accessing our network again due to security concerns these pose, and we only whitelist bots that we identify as being safe.

    Just to add that the following have both now been added to our firewall whitelist:

    52.57.5.20 and 34.196.51.17

    These are the IP addresses found here (https://premium.wpmudev.org/forums/topic/need-ip-to-whitelist-uptime-as-my-firewall-do-not-allow-whitelisting-user-agent).

    That post is fairly old now though so if there are any additional addresses that they now use since that post was made, please let us know what these are and we will be able to add them for you.

    Have they got everything covered here now?

  • Adam Czajczyk

    Hello Ramsay

    Thanks for getting back to us with the host's response. I think they did pinpoint the issue, and they also correctly identified our Uptime monitoring IPs, so if they have already whitelisted them permanently, that should be fine. Just to confirm, these two IPs should be whitelisted:

    34.196.51.17
    52.57.5.20

    Uptime monitoring will hit the server fairly frequently: it runs a check every 2 minutes, I believe, which gives 30 (per hour) * 24 = 720 hits per site per day. That assumes the site is up, because when it detects the site as down it might start making more requests in order to detect immediately when it comes back up (and if the requests are blocked, it never sees the site as back up, so it keeps checking over and over again).

    Since they have those "heavy firewalls", I think it would also be good if they could whitelist some additional IPs:
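
    The arithmetic above can be checked with a small sketch. The function name `checks_per_day` is made up for this example, and the 2-minute interval is the figure assumed in this reply, not a confirmed setting. It only covers the normal case where the site is seen as up.

```python
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24


def checks_per_day(interval_minutes, sites=1):
    """Expected number of routine checks per day for a given check interval."""
    minutes_per_day = MINUTES_PER_HOUR * HOURS_PER_DAY
    return minutes_per_day // interval_minutes * sites


print(checks_per_day(2))           # one site, checked every 2 minutes
print(checks_per_day(2, sites=5))  # five monitored sites on the same server
```

    For one site this gives the 720 daily hits mentioned above, and the total grows linearly with the number of monitored sites on the same network, which is how a bot-detecting firewall can end up seeing thousands of requests from a single IP.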

    66.135.60.59
    66.135.49.214
    66.135.60.64
    104.236.238.22
    104.236.50.140

    Those are used for some other services that you might already be using or you may use in future, such as communication with The Hub, managed backups, some Defender, Smush and Hummingbird features that require connection to our API.

    Whitelisting them all should solve both the Uptime monitoring issue and possible WPMU DEV API/Cloud connection issues in future.
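
    For anyone who wants to verify their own allowlist, here is a minimal sketch of checking an incoming address against the IPs listed in this thread. It says nothing about how the host's firewall actually works; the names `ALLOWLIST` and `is_allowlisted` are hypothetical, and the IPs are simply the ones quoted above, which may change over time.

```python
import ipaddress

# IPs quoted in this thread for Uptime monitoring.
UPTIME_IPS = ["34.196.51.17", "52.57.5.20"]

# IPs quoted in this thread for other services (The Hub, backups, API features).
OTHER_SERVICE_IPS = [
    "66.135.60.59",
    "66.135.49.214",
    "66.135.60.64",
    "104.236.238.22",
    "104.236.50.140",
]

ALLOWLIST = frozenset(
    ipaddress.ip_address(ip) for ip in UPTIME_IPS + OTHER_SERVICE_IPS
)


def is_allowlisted(remote_ip):
    """Return True if the given address string is one of the listed IPs."""
    return ipaddress.ip_address(remote_ip) in ALLOWLIST


print(is_allowlisted("34.196.51.17"))  # an Uptime monitoring IP from the thread
print(is_allowlisted("203.0.113.9"))   # a documentation-range IP, not listed
```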

    Best regards,
    Adam
