[SmartCrawl] Sitemap is not being generated

Hi!

I’m having a problem with SmartCrawl on my customer’s website. I have the automatic sitemap enabled, but I get a 404 error when I visit the sitemap file. I also noticed that the crawl doesn’t start: I get a success message saying the crawl started, but nothing happens.

I checked the server logs, but there’s nothing in them that points to this issue.

I have an Ubuntu server running nginx and PHP 7.3. The website is routed through Cloudflare.

  • Adam Czajczyk
    • Support Gorilla

    Hello Thomas

    I hope you’re well today!

    It seems no sitemap has been created at all.

    SmartCrawl never creates a domain.com/sitemap.xml file; WordPress always rewrites that URL to the domain.com/wp-content/uploads/sitemap.xml file. That file, however, also returns a 404 status, so this is not a rewrite issue.

    I think the key issue is that the crawl isn’t running, so the file is never generated.

    I’d like to take a closer look, so could you please enable support access to the site? To do this, go to the “WPMU DEV -> Support -> Support Access” page in the site’s back-end and click the “Grant support access” button, then let me know here once it’s done, as I won’t be notified automatically.

    I’ll then access the site and investigate the issue so we can find a solution.

    Best regards,

    Adam

  • Thomas
    • New Recruit

    Hi Adam, thanks for the quick response!

    The crawl is running, but somehow neither the Hub nor the dashboard picks it up.

    I have enabled support access for you. Hopefully you can find something I’ve missed :smiley:

    Best regards, Thomas

  • Adam Czajczyk
    • Support Gorilla

    Hi Thomas

    Thank you for granting access!

    I checked the site and noticed that the crawler doesn’t actually seem to be running. It reports that it starts but never actually does any work, which would explain why the sitemap isn’t created. Still, I don’t see anything “wrong” in the site configuration that could break the crawl.

    I’m wondering, though: you mentioned the server runs nginx. Is there any server-side cache and/or security/firewall tool active there?

    Do you have access to the server’s access log? If so, could you check whether any requests are being rejected or failing?

    Best regards,

    Adam

    • Thomas
      • New Recruit

      Hi Adam,

      I tried adding the plugin via the dashboard, where it didn’t show up even though it was installed, and now at least the crawler works. The sitemap still isn’t generated, though. I created an empty XML file as the www-data user, but it stays empty.

      I also checked the server’s access logs as you suggested: nothing out of the ordinary there either. There are still no errors in the nginx or PHP logs.

      There’s no local caching on the server. I use Hummingbird’s static file cache and Cloudflare caching, which shouldn’t interfere; at the very least, I would see a difference when I look at the website’s root folder via SSH…

      Best regards,

      Thomas

  • Adam Czajczyk
    • Support Gorilla

    Hello Thomas

    The crawler does indeed seem to work now, so that’s fine. About the file you created: did you put it in the root folder of the WP install (so yoursite.com/sitemap.xml)?

    If so, let’s try it a slightly different way: remove the file from the root folder and instead create it in the /wp-content/uploads/ folder, where SmartCrawl would normally put it, making sure it has write permissions.

    If that still doesn’t work, we’ll take a different route, but please try this first and let me know.

    Best regards,

    Adam

  • Thomas
    • New Recruit

    Hi Adam,

    So I’ve created the sitemap.xml file under wp-content/uploads as you suggested, and now SmartCrawl reports that 21 of the 21 crawled URLs are missing from the sitemap. Still, it’s not adding any links to the sitemap. The www-data user has write permissions for the file.

    Slowly we are getting there :smiley:

    Best regards,

    Thomas

  • Adam Czajczyk
    • Support Gorilla

    Hi Thomas

    Yeah, we are slowly getting there, but I think we’re now at the one critical step: getting the crawl results actually written into the file :slight_smile:

    Based on your test, I’d say there’s still something “not quite right” with permissions because:

    – the file exists and is in the right location

    – the crawler works and discovers URLs

    – the plugin detects that the sitemap is there and doesn’t “complain” about it

    – it even behaved properly when I told it to add those “missing URLs” to the sitemap

    – and yet the sitemap is still empty

    Let’s just do one last test/check then:

    1) Please go to the /wp-content/uploads/ folder, open any of the image folders and double-check

    – the owner and group names of any image file (so we can confirm exactly which owner/group is used for files uploaded via WordPress)

    – and its current permissions

    2) if any of those differ for the sitemap.xml file, adjust them to match for the sitemap.xml file (the one in the /wp-content/uploads folder; there should be no “sitemap.xml” file in the root folder)

    3) enable WP debugging by editing the “wp-config.php” file and adding the following lines

    define( 'WP_DEBUG', true );
    define( 'WP_DEBUG_LOG', true );
    define( 'WP_DEBUG_DISPLAY', false );

    right above the “/* That’s all, stop editing */” line there

    4) after that, run the crawl again and check the sitemap

    If the sitemap is finally generated, that would mean adjusting the ownership/permissions fixed the issue (and you can also undo step 3 above).

    If the sitemap is still empty, please go to the “/wp-content/uploads” folder and:

    – confirm whether the sitemap.xml file is indeed empty (I mean physically, the file on the server)

    – and if there’s a file named “debug.log”, please download it, rename it to “debug.txt” and attach it to your response here.

    Note: if you can’t attach the file to your post (it might be too big), please upload it to a file storage service of yours (like Google Drive or Dropbox) instead and share a link to it with me.

    Best regards,

    Adam
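    Steps 1 and 2 above can also be checked from the command line. The PHP script below is a minimal sketch for illustration, not part of SmartCrawl or WordPress: it compares the numeric owner, group and permission bits of a file WordPress uploaded with those of sitemap.xml and reports which of them differ. The script name compare-perms.php and its two command-line arguments are hypothetical.

```php
<?php
// Hypothetical helper script (compare-perms.php), not part of SmartCrawl:
// compare owner, group and permission bits of an uploaded image with sitemap.xml.

/**
 * Return the numeric owner (uid), group (gid) and permission bits of a file,
 * or null if the file cannot be stat'ed (e.g. it does not exist).
 */
function file_identity(string $path): ?array
{
    clearstatcache(true, $path); // avoid stale cached stat() results
    $stat = @stat($path);
    if ($stat === false) {
        return null;
    }
    return [
        'uid'  => $stat['uid'],
        'gid'  => $stat['gid'],
        'mode' => $stat['mode'] & 0777, // keep only the rwx bits, e.g. 0644
    ];
}

/**
 * Return the attribute names ('uid', 'gid', 'mode') in which
 * $candidate differs from $reference.
 */
function identity_diff(array $reference, array $candidate): array
{
    $diff = [];
    foreach (['uid', 'gid', 'mode'] as $key) {
        if ($reference[$key] !== $candidate[$key]) {
            $diff[] = $key;
        }
    }
    return $diff;
}

// CLI usage: php compare-perms.php <reference-image> <sitemap.xml>
if (PHP_SAPI === 'cli' && isset($argv[2])) {
    $reference = file_identity($argv[1]);
    $sitemap   = file_identity($argv[2]);
    if ($reference === null || $sitemap === null) {
        fwrite(STDERR, "Could not read one of the files.\n");
        exit(1);
    }
    printf("reference: uid=%d gid=%d mode=%o\n", $reference['uid'], $reference['gid'], $reference['mode']);
    printf("sitemap:   uid=%d gid=%d mode=%o\n", $sitemap['uid'], $sitemap['gid'], $sitemap['mode']);
    $diff = identity_diff($reference, $sitemap);
    echo $diff === [] ? "No differences.\n" : 'Differs in: ' . implode(', ', $diff) . "\n";
    // is_writable() answers only for the user running this script.
    echo 'sitemap writable by current user: ' . (is_writable($argv[2]) ? 'yes' : 'no') . "\n";
}
```

    Example with hypothetical paths: sudo -u www-data php compare-perms.php /path/to/wp-content/uploads/<year>/<month>/image.jpg /path/to/wp-content/uploads/sitemap.xml. Running it as www-data (the user mentioned in this thread) assumes that is the user PHP runs as, since the writability check only reflects the user executing the script.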

  • Thomas
    • New Recruit

    Hi Adam,

    So it appears it must be something to do with permissions. I followed the steps, and now there’s neither a sitemap.xml file nor a debug.log file. I can upload files to the media library, every image in there belongs to www-data, and that user has read and write permissions for the uploads folder. I’m at a loss here :disappointed:

    Best regards,

    Thomas

    • Thomas
      • New Recruit

      OK, a minor correction: I noticed the debug file is being created, but directly in the wp-content directory; I thought it would be created in the uploads directory. However, there were no errors in it that could have anything to do with the sitemap not being created. I’m currently trying to get a completely fresh log from the crawler and the sitemap generation, but the crawler is timing out at the moment…

  • Thomas
    • New Recruit

    Hi Adam,

    Finally, here’s the log file. I noticed a database error because a WooCommerce table was missing… well, that’s what I get for taking over projects, I guess :smiley:

    But still: the crawler now seems to get stuck after a while and exits on its own without any log entry. Also, there’s still no sitemap.xml file… :disappointed:

  • Thomas
    • New Recruit

    Hi,

    Looks like I finally figured it out. Apparently some paths in the options table were mismatched after the migration: they had not been updated correctly. Maybe there should be some kind of check to prevent something like this :smiley:

    Anyhow: thanks for the quick help.

    Best regards,

    Thomas
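    For anyone hitting the same problem after a migration: a minimal sketch, assuming you already know the old server path, of how stale paths in the options table could be spotted. It is not a SmartCrawl or WordPress feature; find_stale_path_options() is a hypothetical helper that takes an array of option name => value pairs and returns the names whose value still contains the old path. The thread does not say which options were affected, so this only reports candidates.

```php
<?php
// Hypothetical helper, not part of SmartCrawl or WordPress: list option names
// whose stored value still mentions an old path (e.g. the previous server's
// document root), which you must supply yourself.

/**
 * Return the names of options whose value contains $oldPath.
 * Array/object values are checked in serialized form, the way WordPress
 * stores them in the options table.
 */
function find_stale_path_options(array $options, string $oldPath): array
{
    $hits = [];
    foreach ($options as $name => $value) {
        // Scalars are compared as strings; everything else is serialized first.
        $haystack = is_scalar($value) ? (string) $value : serialize($value);
        if (strpos($haystack, $oldPath) !== false) {
            $hits[] = $name;
        }
    }
    return $hits;
}
```

    Inside WordPress you could pass it the result of wp_load_alloptions(), which only covers autoloaded options. To actually rewrite the paths, a serialization-aware tool such as WP-CLI’s wp search-replace is safer than a plain string replace: PHP-serialized strings store their length, so replacing a path with one of a different length corrupts the stored value.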
