Replacing 404 pages with 410's automatically

Hi there,

I was wondering if anyone would be able to assist me with the following issue.

A few months ago, whilst our new site was under construction, our current site at the time got hacked. The hack seemed to replace pages with Japanese site alternatives.

It seems the issue is now mostly sorted (I checked Google Webmaster Tools and followed the steps to register the site as no longer hacked, as well as updating sitemaps, etc.).

The problem I am having now is that links are still being indexed on Google, and other search engines, with Japanese titles and are producing 404 Page Not Found errors.

See site:justaccounts.com in Google.

I want to remove these from the index, as we are receiving a lot of crawl errors and a very high bounce rate because of them.

Is it possible to have any new 404 errors we produce be automatically changed to a 410 so Google will un-index it quicker? Doing this for every page manually would be quite a task as we have hundreds of these coming through.

If not, do you have any suggestions on helping with this issue as I want to start looking at reducing the bounce rate and the amount of traffic that comes from Japan.

Thank you in advance!

    Dimitris

    Hey there Peter Rowlands,

    I hope you're doing well, and thanks for reaching out to us!

    Google expects certain steps to be taken before it removes the "This site may be hacked" message. You can find them under the "Remove this message from your site" tab at the following link:
    https://support.google.com/websearch/answer/190597?hl=en

    Having said that, if more pages trouble you, you can try the "410 for WordPress" plugin: https://wordpress.org/plugins/wp-410/
    The problem with serving 410 (content deleted) headers, though, is that Google's support for them is incomplete. Google will remove pages that serve a 410 from its index faster, yet in Google Search Console it reports 410s under "Not found" crawl errors, just like 404s.

    Finally, here's another blog post on how to track 404s and properly redirect them:
    http://www.wpbeginner.com/plugins/how-to-track-404-pages-and-redirect-them-in-wordpress/

    Hope that was some help, let me know if further assistance is required!
    Warm regards,
    Dimitris

    Peter Rowlands

    Hi Dimitris,

    I am doing fine thank you, how about yourself?

    I have gone through the steps to remove the 'This site may be hacked' message, so this seems to be okay now.

    I have also attempted to use the '410 for WordPress' plugin. I set this up a week or so ago and so far haven't noticed any of the 404 pages being switched to 410's.

    Is there any way I can check if this plugin has been set up properly? The function seems really useful and exactly what I am after; I just cannot seem to get it to work.

    Thank you!

    Adam Czajczyk

    Hello Peter Rowlands!

    I checked the plugin and it seems it will only work for posts/pages that are removed while the plugin is already active. The way it works is that it records those deleted URLs and sets a 410 status for them.

    For other posts/pages and content (e.g. attachments/media files), it might be necessary to add all the URLs manually in the plugin's settings. Did you set that up already?

    Best regards,
    Adam
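
To check whether a given URL is actually returning a 410 rather than a 404, one option is to look at the status line of the response directly. The following is a minimal PHP sketch and not part of the plugin: the function names `parse_status_code`, `last_status_code`, and `fetch_status_code` are made up for illustration, and it assumes PHP 7.1 or later run from the command line with `allow_url_fopen` enabled, so that the built-in `get_headers()` can make the request.

```php
<?php
// Extract the numeric code from a status line such as "HTTP/1.1 410 Gone".
// Returns null when the line is not an HTTP status line.
function parse_status_code(string $line): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

// get_headers() follows redirects and returns the headers of every hop,
// so keep the status of the last response in the list.
function last_status_code(array $headerLines): ?int
{
    $code = null;
    foreach ($headerLines as $line) {
        $parsed = is_string($line) ? parse_status_code($line) : null;
        if ($parsed !== null) {
            $code = $parsed;
        }
    }
    return $code;
}

// Request a URL and return its final status code, or null if the request failed.
function fetch_status_code(string $url): ?int
{
    $headers = @get_headers($url);
    return $headers === false ? null : last_status_code($headers);
}

// Check every URL passed on the command line; does nothing without arguments.
foreach (array_slice($_SERVER['argv'] ?? [], 1) as $url) {
    $code = fetch_status_code($url);
    printf("%s => %s\n", $url, $code === null ? 'request failed' : (string) $code);
}
```

Saved as, say, `check-status.php` (a hypothetical file name), it could be run as `php check-status.php https://ourdomain.com/servant_gWG77G`, using one of the URLs reported in the crawl errors. If the output still shows 404 for a URL that was added in the plugin's settings, the plugin is not handling that URL.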

    Peter Rowlands

    Hi Adam,

    Thank you for the explanation, that makes sense!

    I have looked into that, but the problem I'm having is that dud URLs are still being generated pointing to our domain from the hack we encountered a few months ago.
    That hack itself has been cleared, but these links are still continually being indexed by Google.
    For example:
    ourdomain.com/servant_gWG77G
    ourdomain.com/iterate_7GbNnrG7
    ourdomain.com/snoop_nrnOG0bG
    All of these links immediately become 404 errors, because they don't actually go anywhere.

    My aim was to have these immediately become 410's so Google removes them from the index faster and therefore we wouldn't be getting penalised for a high bounce rate.

    Unfortunately, adding them manually isn't a great option, because the links are generated fairly regularly and monitoring them constantly would take a lot of time.

    If it seems like this plugin isn't an option, do you have any other suggestions on how I could have a 410 automatically generated for these links?
    Or do you know of any way to combat the after-effects of this hack and stop the links being generated altogether?

    If you need any more information, please let me know.

    Any assistance with this issue would be massively appreciated, as we are still suffering from the backlash of this happening.

    Thanks!!

    Adam Czajczyk

    Hello Peter!

    Matt Cutts, a former SEO "guru", suggests here to serve 410 "if you know that page is gone for good" but concludes that it's also absolutely fine to serve 404's and that doesn't make that much difference:

    https://searchenginewatch.com/sew/how-to/2340728/matt-cutts-on-how-google-handles-404-410-status-codes

    I gave it another thought and I think 410s might help in your case, but they won't be an "ultimate solution" anyway, since these links come from outside and you probably cannot "force" third-party sites to stop "generating" them.

    "404'd" pages will eventually be removed from the Google index, so the goal here is to speed that up. I'd take a slightly different route if it were my site (or one managed by me):

    1. I'd make sure that sitemaps are always up to date and submitted to Google Webmaster Central.
    2. The robots.txt file is also useful here: I'd put some of those URLs (those that occur most often) there to disallow them from being crawled:

    User-agent: *
    Disallow: /servant_gWG77G
    Disallow: /iterate_7GbNnrG7
    Disallow: /snoop_nrnOG0bG

    You'd want to put the entire list of the most common "broken" URLs in there, in the way shown above.

    3. Finally, on top of that, you could use a little "trick" to make sure that a 410 status is served for the 404 page. It is a bit of a risk, however: if for some reason your WP temporarily generates a 404 Not Found for an existing page, the trick would apply there too and serve Google a 410 instead. As a result, it could slowly de-index your site. It is therefore optional, and I'd only use it if I was 100% sure that my host doesn't suffer "breakdowns" and that my site is proven to be stable.

    The trick is to change the default 404 page template so that it only serves this code:

    <?php header($_SERVER['SERVER_PROTOCOL'].' 410 Gone');

    That alone would unfortunately hurt the user experience, as it would also remove the nice "Not Found" error page, but it should feed Google's crawlers a 410 status for non-existent pages.

    Best regards,
    Adam
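
One way to reduce the risk described in step 3 is to send the 410 only when the missing URL looks like one of the hack-generated links, and to leave every other 404 untouched. The following is a rough sketch under several assumptions, not a tested fix for this site: the pattern is guessed from the three example URLs in this thread (a lowercase word, an underscore, then six or more letters and digits including at least one capital or digit), WordPress is assumed to be installed at the domain root, and the names `HACK_PATH_PATTERN`, `looks_like_hack_path`, and `maybe_send_410` are hypothetical. It relies on the WordPress functions `add_action()`, `is_404()`, and `status_header()`.

```php
<?php
// ASSUMPTION: hack-generated paths look like "/word_RandomString", based only on
// the three examples in this thread. Adjust to match your own crawl error report.
const HACK_PATH_PATTERN = '#^/[a-z]+_(?=[A-Za-z0-9]*[A-Z0-9])[A-Za-z0-9]{6,}/?$#';

// Return true when the path part of the request URI matches the pattern above.
function looks_like_hack_path(string $requestUri): bool
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    return is_string($path) && preg_match(HACK_PATH_PATTERN, $path) === 1;
}

// Replace the 404 status with 410 for matching URLs only. The theme's normal
// "Not Found" template is still rendered, so visitors see the usual error page.
function maybe_send_410(): void
{
    if (is_404() && looks_like_hack_path($_SERVER['REQUEST_URI'] ?? '')) {
        status_header(410);
    }
}

// Register the hook only when running inside WordPress.
if (function_exists('add_action')) {
    add_action('template_redirect', 'maybe_send_410');
}
```

Inside WordPress, this could go in a small custom plugin or in the theme's functions.php (without the opening `<?php` tag if the file already has one). Because a genuine page that temporarily returns 404 would not match the pattern, it would keep its 404 status. If the hack starts producing URLs in a different shape, the pattern would need to be widened, ideally after checking it against the site's real permalinks.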