Smartcrawl sitemap error causing problems

The sitemap shows this error in a browser:

XML Parsing Error: not well-formed
Location: https://MYDOMAIN/sitemap.xml
Line Number 11, Column 39:
<loc>https://MYDOMAIN/?p=1973&preview=true</loc>

Also, when I navigate to the location
/httpdocs/wp-content/uploads/sitemap.xml
neither the sitemap.xml file nor the sitemap.gz.xml file can be seen. They reappear after I request them in a browser, but then of course I get the error above.

I often also see "File does not exist" in logs from Googlebot.

My question, therefore, is why SmartCrawl is producing these errors: the parsing error, the sitemap not being in the uploads directory every time I look, and the files then being missing when Google requests them.

It seems very hit and miss whether or not the sitemap is actually available when Googlebot comes looking.

  • Adam Czajczyk

    Hello Joe,

    I hope you're well today and thanks for your question!

    The sitemap shows this error in a browser

    I checked the sitemap and can confirm that. The error occurs because there's an ampersand ("&") character in the URL, and to my knowledge this should be automatically encoded as either the "&amp;" entity or the "&#38;" character reference.
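For readers unfamiliar with the escaping rule described above, here is a minimal Python sketch (not SmartCrawl's own code, which is PHP) showing why a raw "&" inside <loc> breaks XML parsing and how escaping it to "&amp;" fixes it. The URL uses the placeholder domain example.com in place of the elided MYDOMAIN, and loc_element and is_well_formed are hypothetical helper names.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical URL; the real site domain is elided in the thread.
RAW_URL = "https://example.com/?p=1973&preview=true"

def loc_element(url: str) -> str:
    """Wrap a URL in <loc>, escaping &, < and > for XML."""
    return f"<loc>{escape(url)}</loc>"

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

unescaped = f"<loc>{RAW_URL}</loc>"   # what the buggy sitemap contained
escaped = loc_element(RAW_URL)        # what it should contain

print(escaped)                    # <loc>https://example.com/?p=1973&amp;preview=true</loc>
print(is_well_formed(unescaped))  # False
print(is_well_formed(escaped))    # True
```

A parser reading the escaped form turns "&amp;" back into "&", so the URL that search engines see is unchanged; only its XML representation differs.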

    This certainly needs further investigation, so I've forwarded the issue to our 2nd-line support team. Hopefully they'll find a solution soon; please note, though, that their response time may be a bit longer than ours here on the support forum, as they deal with a lot of complex issues on a daily basis.

    Also when I navigate to the location
    /httpdocs/wp-content/uploads/sitemap.xml
    the sitemap .xml file and also the sitemap.gz.xml file cannot be seen, but they reappear after requesting them in a browser, but of course I have the error above.

    I often also see "File does not exist" in logs from Googlebot.

    The sitemap file reappears after you load it in the browser because SmartCrawl regenerates the sitemap at that point. The physical location of the file doesn't matter to Google as long as the file is accessible from outside your server; a script or an .htaccess redirect serving that file is also good enough to make it work.

    Also, you should be able to change the default path for sitemap file on your site's "Settings -> SmartCrawl -> Sitemaps" page.

    However, what puzzles me is that, as you say, the sitemap "disappears", so Google cannot find it. Could you please go to your SmartCrawl settings page ("Sitemaps" tab) and check whether the "Disable automatic sitemap updates" option is set to "No"?

    You may also want to make sure that the "Automatically notify search engines when my sitemap updates" option is set to "Yes" for Google.

    Finally, I think the "File not found" issue from Google may also be closely related to the error you mentioned: since the browser is not able to read the file, Google most likely cannot load and parse it either. As I've already forwarded this to our 2nd-line support, I hope they'll be able to lend us a hand here.

    Please keep an eye on this thread!

    Best regards,
    Adam

  • Adam Czajczyk

    Hello again Joe!

    So much typing, only to find out I could have typed much less :) I'm sorry if I confused you. I just got a message from the plugin's developer, and it turns out there's a bug in SmartCrawl that causes the ampersand character not to be properly encoded. It rarely affects sites, as most use "pretty permalinks" that don't contain it, but unfortunately it does happen.

    I was told that the fix is ready and should be released with the next plugin update. I think it should also fix the Google "File not found" error.

    The update should be released within the next few days, so please update the plugin as soon as it rolls out and let me know whether it solves the problem.

    Best regards,
    Adam

  • Adam Czajczyk

    Hello Joe!

    As for the update from SmartCrawl 1.7.6 to version 1.7.7: if you are not able to do it via the WPMU DEV Dashboard, you can update manually by downloading the plugin and overwriting its folder inside the /wp-content/plugins/ directory on your server via FTP.

    https://premium.wpmudev.org/project/smartcrawl-wordpress-seo/

    I have just looked at your site's main sitemap. It displayed as a proper, valid XML document, and the W3 XML validator tool returned no errors or warnings either.

    This would mean the sitemap is fine now. Is it possible that it was regenerated after your last post? Did you make sure in Google Console that the old sitemap was removed and the new one submitted?

    Let me know please!
    Best regards,
    Adam

  • joe

    Hello Adam

    I have version 1.7.7.
    I tried regenerating the sitemap yesterday via SmartCrawl, but it reproduced the error.

    The reason you saw a correct sitemap when you visited is that, after regeneration failed to fix the error, I downloaded the sitemap and manually removed the following incorrect entry:
    <url>
    <loc>http://%20%20Add%20Med</loc>
    <lastmod>2016-01-23T23:20:50+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
    </url>

    I then uploaded the corrected sitemap. I've attached a screenshot showing the different timestamps of the sitemap and its .gz compressed version, which would normally be the same.
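Timestamps only hint at whether the two files are in sync; comparing their contents is more direct. The following is a minimal Python sketch, assuming you have copies of the plain sitemap and its gzip-compressed version. gz_matches_plain and check_files are hypothetical helpers, and the byte strings below stand in for real file contents.

```python
import gzip
from pathlib import Path

def gz_matches_plain(plain: bytes, compressed: bytes) -> bool:
    """Return True if the gzip copy decompresses to exactly the plain bytes."""
    return gzip.decompress(compressed) == plain

def check_files(xml_path: Path, gz_path: Path) -> bool:
    """Compare a sitemap file with its compressed copy on disk (paths are hypothetical)."""
    return gz_matches_plain(xml_path.read_bytes(), gz_path.read_bytes())

# In-memory demonstration: the .gz copy was made before the plain file was hand-edited.
before_edit = b"<urlset><url><loc>http://%20%20Add%20Med</loc></url></urlset>"
after_edit = b"<urlset></urlset>"
stale_gz = gzip.compress(before_edit)

print(gz_matches_plain(before_edit, stale_gz))  # True
print(gz_matches_plain(after_edit, stale_gz))   # False
```

If the check returns False after a manual edit, the compressed copy is stale, and any client that fetches the .gz version would still see the old entry.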

    A few moments ago I used SmartCrawl to regenerate the sitemap, and the error has reappeared. Google Console confirms it:
    "This is not a valid URL. Please correct it and resubmit."

    The timestamps of the newly generated sitemap and its .gz version are now the same (around 10:35 am my time), but the error appears to be the same as before:

    <url>
    <loc>http://%20%20Add%20Med</loc>
    <lastmod>2016-01-23T23:20:50+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
    </url>
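To spot entries like this before submitting a sitemap, a rough Python sketch can list every <loc> whose host does not look like an ordinary domain name. This is an approximation under stated assumptions: the hostname regex is a simplified check of our own, not the rule Google Console applies, and invalid_locs, HOST_RE and the sample sitemap (using example.com) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Simplified (hypothetical) host check: dot-separated labels of a-z, 0-9 and hyphens.
HOST_RE = re.compile(
    r"^(?=.{1,253}$)[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?"
    r"(\.[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?)+$"
)

def invalid_locs(sitemap_xml: bytes) -> list[str]:
    """Return every <loc> value whose scheme or host looks wrong."""
    root = ET.fromstring(sitemap_xml)
    bad = []
    for loc in root.iterfind("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        parts = urlsplit(url)
        host = parts.hostname or ""  # urlsplit lowercases the hostname
        if parts.scheme not in ("http", "https") or not HOST_RE.match(host):
            bad.append(url)
    return bad

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/some-post/</loc></url>
  <url><loc>http://%20%20Add%20Med</loc></url>
</urlset>"""

print(invalid_locs(SAMPLE))  # ['http://%20%20Add%20Med']
```

This only detects the symptom; it says nothing about why the plugin emitted the entry in the first place.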

  • joe

    Thank you Predrag, I very much appreciate your assistance.

    The sitemap is still generating this rubbish...
    http://%20%20Add%20Med

    I'm trying to stay loyal to WPMU, but when I can find another sitemap plugin in less than 10 seconds that has been downloaded 10 million times and has more than 1 million active installs, I have to question whether SmartCrawl is the right product for me.

    I realise SmartCrawl has other features in addition to sitemap generation, but I have now seen other plugins with extensive SEO features, and I now feel obliged to start testing them with a view to reducing my reliance on WPMU plugins.

    Once again Predrag, thank you for your help.
    It will be interesting to see if the developer responds in a timely fashion.

    [EDIT]
    I forgot to ask, is it possible to turn off the Sitemap section of Smartcrawl?

  • joe

    Hi Predrag

    I want to turn off the sitemap functionality of SmartCrawl and install a standalone sitemap plugin instead.

    I would also like to know whether this will affect the page titles, descriptions, index/noindex, follow/nofollow settings, etc. on existing pages, and whether these options will still be available in SmartCrawl if the sitemap is turned off.

    One more question regarding Smartcrawl...

    On Step 4 of the SmartCrawl wizard ("Moz"): when I installed the plugin, I chose not to bother with that option.
    The button is still there, but when I click it I receive this message:

    You've chosen not to set up 'Moz', please move onto next step.

    I have no way to change my mind and activate the Moz settings.

  • Adam Czajczyk

    Hello Joe,

    I hope you're well!

    I can of course understand your decision. I hoped the latest release would solve the issues, but it turned out it didn't. As my colleague @Predrag Dubajic said, he has already talked to the plugin's developer about it. In case you'd like to give it another try, or to use it on other sites of yours, we'll let you know once the developer responds with a solution.

    Best regards,
    Adam
