Problems with www. and robots.txt

Hi WPMU Dev,

I have a strange issue with my WordPress site (not Multisite). I have added the site to Search Console both with and without the www. prefix, and I have set my preferred domain to the non-www version.

In Search Console, I have no errors under the non-www version. But under the www. version, I have an error saying this: "Severe health issues are found in your property. - Check property health."

When I click "Check property health", it says: "Is robots.txt blocking important pages?"

The robots.txt file is exactly the same for both the www. and non-www versions.

One thing that does concern me, though: I can access the robots.txt file from a browser at both http://www.mydomain.com/robots.txt and mydomain.com/robots.txt.

I've never had this problem before and am pretty stuck. I've also waited three days now to see if anything changes, and nothing has.

Does anyone have any idea what this could be?

Thank you in advance.
Paul

  • Adam Czajczyk

    Hey Paul,

    I hope you're well today, and thanks for your question!

    Could you please point me to the site in question? I took a look at one of the domains registered with your WPMU DEV account, but I'm not sure if that was a good shot :slight_smile:

    That said, Google usually gives you this notice to let you know that it had indexed some pages before and those pages are now being blocked. Most likely this isn't a real problem; I'd say Google is just "concerned" about the "health" of your site, but I'd like to take a look anyway.

    Also, could you please let me access your site's dashboard? To do this, please follow the guide here:
    https://premium.wpmudev.org/manuals/wpmu-dev-dashboard-enabling-staff-login/

    Thanks,
    Adam

  • Adam Czajczyk

    Hey Paul,

    Thanks for granting access!

    I took a look at your site and also ran some additional tests.

    I've noticed that there's a "301" (moved permanently) redirect from the "www" version to the "non-www" version. That's good for SEO, as it lets you avoid duplicate content; however, it might be preventing Google from accessing robots.txt on the www address. I wouldn't worry too much about that, though, as Google will know that it should use your "non-www" site (and its robots.txt is accessible).
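
    If you'd like to double-check that redirect yourself, a quick script along these lines will show the status code and Location header a crawler sees (just a rough Python sketch; mydomain.com stands in for your actual domain):

    ```python
    import urllib.error
    import urllib.request

    # Refuse to follow redirects automatically, so we can observe the 301 itself.
    class NoRedirect(urllib.request.HTTPRedirectHandler):
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            return None

    opener = urllib.request.build_opener(NoRedirect)

    # mydomain.com is a placeholder - replace it with your real domain.
    try:
        response = opener.open("http://www.mydomain.com/robots.txt")
        print(response.status, "- served directly, no redirect")
    except urllib.error.HTTPError as e:
        # The 301 surfaces as an HTTPError because we refused to follow it.
        print(e.code, "->", e.headers.get("Location"))
    ```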

    Another thing I noticed, though, is that your sitemap seemed to be broken: validation tools reported an incorrect Content-Type header, and a quick look at the sitemap's source showed that there was no <?xml... declaration at all.
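
    In case you want to re-run that check yourself, this is roughly what I looked at (a rough Python sketch; /sitemap_index.xml is Yoast SEO's usual default path, so adjust it if yours differs):

    ```python
    import urllib.request

    # /sitemap_index.xml is Yoast SEO's usual default; adjust the path if needed.
    url = "http://mydomain.com/sitemap_index.xml"

    with urllib.request.urlopen(url) as response:
        content_type = response.headers.get("Content-Type", "")
        body_start = response.read(200).decode("utf-8", errors="replace")

    # A healthy sitemap is served as XML and opens with an <?xml ...?> declaration.
    print("Content-Type:", content_type)
    print("Starts with <?xml:", body_start.lstrip().startswith("<?xml"))
    ```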

    I'm not sure why that happened, but re-saving Yoast SEO's settings helped (I assume it simply regenerated the sitemap).

    That all being said, here's what I would do:

    1. Re-add sitemaps to your sites
    2. Once these are fetched and processed, try the "Fetch as Google" tool to see if any errors are reported

    I'm not sure if this will work "out of the box", as Google may need some time to go through your site. Over time, though, Google should re-index your sites and these "errors" should go away. I suppose your site was previously indexed twice (once as http://www.... and a second time as non-www), so Google "knows the address" but cannot access it because of the redirect.

    Resubmitting the sitemaps (which are correct now) and giving it some time should solve the issue, in my opinion.

    I hope that helps!

    Cheers,
    Adam
