Why do the number of URLs on sitemaps out of Smartcrawl

Why does the number of URLs in sitemaps generated by Smartcrawl often equal 1001? Is there a setting somewhere that keeps the whole site from being included? I can't find one.

Thank you

  • Michael
    • The Crimson Coder

    Hello Milan,

    Thank you for this.

    I would actually like to include everything but it seems that some of the sitemap.xml files get chopped off at 1,001 URLs even though there are thousands more.

    I tried to find a setting which may be causing this but could not.

    Cheers,
    Mike

  • Predrag Dubajic
    • Support

    Hi Michael,

    To avoid performance issues, SmartCrawl limits the sitemap to 1000 posts by default. To increase this, add the following to your wp-config.php file and adjust the value to your liking:
    define( 'WDS_SITEMAP_POST_LIMIT', 1000 );
    Note that raising the limit can cause performance issues, and how many posts it can handle also depends on your site's timeout limit.
    If you're going for a really high number you may want to include this in wp-config.php as it can improve performance by not generating sitemap in admin section:
    define( 'WDS_SITEMAP_SKIP_ADMIN_UPDATE', true );

    Let us know how it goes :slight_smile:

    Best regards,
    Predrag

  • Michael
    • The Crimson Coder

    Hello Predrag,

    Thank you for letting me know.

    Is there any particular location within wp-config.php where I should specify, or doesn't matter?

    Would you be able to tell me how many minutes it would take to generate a sitemap of 200,000 posts? That would be the critical answer.

    Is Smartcrawl having the capability to pause processing after so many minutes and then pick up again later like a trigger/execution toggle?

    What does this mean exactly:
    'not generating sitemap in admin section'?

    All posts on the site are created by a cron job that runs WP All Import on a .txt file, so I'm not sure which user they get assigned to (if applicable). I think it may be admin, but these are definitely the most vital posts.

    I don't let people sign on and leave posts or comments .. the site is purely from imported data and that is where I must have the sitemap.xml generated from.

    Thank you,
    Mike

    Kind regards,
    Mike

  • Adam Czajczyk
    • Support Gorilla

    Hello Michael!

    Is there any particular location within wp-config.php where I should specify, or doesn't matter?

    No, there's no particular location, as long as you put it somewhere after the opening

    <?php

    tag and before the

    /* That's all, stop editing...

    line. I'd suggest adding it right above that line :slight_smile:
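
    As a minimal sketch of that placement, using the two constants mentioned earlier in this thread (the value 250000 is only an illustrative assumption taken from later in the discussion, not a recommendation, and the surrounding comments stand in for whatever your own wp-config.php already contains):

    ```php
    <?php
    // ... your existing wp-config.php settings stay above this point ...

    // Raise the SmartCrawl sitemap post limit (1000 unless overridden, per this thread).
    // 250000 is an illustrative value only; choose one your server can handle.
    define( 'WDS_SITEMAP_POST_LIMIT', 250000 );

    // Optional: do not regenerate the sitemap while loading the admin dashboard.
    define( 'WDS_SITEMAP_SKIP_ADMIN_UPDATE', true );

    // ... the existing "That's all, stop editing" line stays below this point ...
    ```

    Each constant should be defined only once in the file, since PHP emits a warning when a constant is redefined.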

    Would you be able to tell me how many minutes it would take to generate a sitemap of 200,000 posts? That would be the critical answer.

    That's a lot of posts. I realize how important this information is to you, but it's very difficult to estimate. The time needed to create a sitemap depends on many factors, including (among others) the current performance of your WordPress install, the amount of memory allowed for WP and PHP, and server performance. As a result, the time may vary a lot between setups, and I'd say the best (if not the only) way to find out is to give it a spin.

    Is Smartcrawl having the capability to pause processing after so many minutes and then pick up again later like a trigger/execution toggle?

    No, there's no such setting currently.

    What does this mean exactly:
    'not generating sitemap in admin section'?

    SmartCrawl updates sitemaps when it's loaded, and this setting disables that feature for the admin dashboard, since on some sites (especially those containing a huge number of posts) it could slow down the site. You will then be able to regenerate sitemaps manually using the Dashboard widget.

    Best regards,
    Adam

  • Michael
    • The Crimson Coder

    Thanks for all the great info Adam.

    Sorry I am still not getting the section regarding:

    "If you're going for a really high number you may want to include this in wp-config.php as it can improve performance by not generating sitemap in admin section:
    define( 'WDS_SITEMAP_SKIP_ADMIN_UPDATE', true );"

    Do you think I still need it for 200,000 URLs?

    Does it still generate the sitemap to [domain]/sitemap.xml?

    I guess I am still a bit unclear what it is skipping.

    Kind regards,
    Mike

  • Adam Czajczyk
    • Support Gorilla

    Hello Michael!

    Do you think I still need it for 200,000 URLs?

    It's difficult to say. The simplest way to check would be to skip adding this setting and see whether site or plugin performance is affected. If it isn't, the setting won't be necessary. If, however, you feel that everything gets significantly slower or the sitemaps are broken, it's worth giving it a try.

    Does it still generate the sitemap to [domain]/sitemap.xml?

    SmartCrawl will still generate sitemaps in the same form and location.

    P.S. Also, is it possible to choose the time when Smartcrawl actually runs on the server? P.P.S. Can Smartcrawl be set to run weekly as well?

    Currently SmartCrawl doesn't work "periodically", so it's not possible to set it up this way. Instead, it generates/updates sitemaps "on the fly".

    Best regards,
    Adam

  • Michael
    • The Crimson Coder

    Thanks very much Adam,

    I made the changes, but I can't really tell whether Smartcrawl has run yet. Is there any way to know when the next run will be? I know you said it's not periodic, which I assume means at the exact same time each day ... but does it run once per 24 hour period at least?

    Kind regards,

  • Adam Czajczyk
    • Support Gorilla

    Hello Michael!

    I know you said it's not periodic, which I assume means at the exact same time each day ... but does it run once per 24 hour period at least?

    It's a bit different than this :slight_smile: The initial sitemap is created after you install the plugin and set the Sitemap options. It then gets updated every time you publish or delete a post/page. This way the plugin keeps the sitemap file up to date "in real time".

    Best regards,
    Adam

  • Michael
    • The Crimson Coder

    Hi Adam,

    Ah I see. That makes sense. I wish I realised about the 1000 limitation originally.

    Does that mean that, because I've changed the limit from 1000 to 250000, all URLs will be generated the next time a single post is updated?

    Kind regards,
    Mike

  • Michael
    • The Crimson Coder

    Thanks Milan,

    But in the case where the 1000-link limit was in place, will it pick up all the missing pages now that the limit has been increased and put them on the sitemap the next time it runs?

    (That is critical)

    I am a bit confused.

    Regards

  • Milan
    • WordPress Wizard

    Hello again Michael ,

    As far as I know, the plugin should pick up all the missing pages' links, but I think we should also get confirmation from the developer, so I have pinged him about this. He is not online currently, but as soon as he updates me, I will update you. In the meantime, we appreciate your patience.

    Cheers,
    Milan

  • Milan
    • WordPress Wizard

    Hello Mike,

    Hope you are well today :slight_smile:

    Our developer just pinged me and explained how the plugin works.

    The plugin always regenerates sitemaps from scratch. Since they're regenerated from scratch, the previously omitted posts/pages will definitely be included the next time the sitemap is generated. Sorry for guessing that it would check against the existing generated maps. :slight_smile:

    Hope this clarifies things for you :slight_smile:

    Cheers,
    Milan
