SmartCrawl sitemap creates multiple entries of the same URL

I have several sites with WPML and SmartCrawl Pro installed. For some pages, the SmartCrawl sitemap contains multiple entries.

  • Adam Czajczyk
    • Support Gorilla

    Hi drzivil

    I hope you’re well today and thank you for reporting this.

    I checked your site. The “same” URL showing up multiple times in the sitemap would be expected if each entry contained a language indicator (as there’s WPML and multiple language versions), but that is not the case here, so I must admit this is quite surprising.

    Usually, if anything, we’re dealing with missing URLs rather than additional ones. I’ve checked the site configuration and tried to find out why this is happening, but I admit I didn’t manage to. Therefore, I’ve asked our developers for some help on this.

    I’ve passed all the information, along with access credentials, to them. They’ll check the site again and investigate the issue, and I’m sure we’ll find the reason and a solution.

    Please keep an eye on this ticket for further information.

    Best regards,
    Adam

  • Adam Czajczyk
    • Support Gorilla

    Hi drzivil

    I’m not sure about Yoast, to be honest, as I don’t really know how, technically speaking, it “crawls” the site, but it might be related to caching, for example. In any case, it wouldn’t be good to keep Yoast and SmartCrawl enabled on the site at the same time :slight_smile:

    I just noticed, though, that you’re also trying the “Simple XML Sitemap Generator” plugin on the staging (WPMU DEV hosted) site. What about that sitemap? Is it adding URLs properly, or are some URLs missing or duplicated there too?
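
    One way to answer that for any generated sitemap is to count how often each <loc> value occurs. Below is a minimal sketch in plain PHP, assuming the sitemap is a single <urlset> file rather than a sitemap index. The function name find_duplicate_sitemap_urls and the example.com URLs are made up for illustration and are not part of SmartCrawl, WPML, or any sitemap plugin.

    ```php
    <?php
    // Hypothetical helper: returns every <loc> value that occurs more than
    // once in a sitemap XML string, mapped to its number of occurrences.
    function find_duplicate_sitemap_urls(string $xml): array
    {
        // Collect parse errors internally instead of emitting PHP warnings.
        $previous = libxml_use_internal_errors(true);
        $doc = simplexml_load_string($xml);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        if ($doc === false) {
            return array(); // input could not be parsed as XML
        }

        $counts = array();
        foreach ($doc->url as $url) {
            $loc = trim((string) $url->loc);
            if ($loc === '') {
                continue; // skip <url> entries without a location
            }
            $counts[$loc] = ($counts[$loc] ?? 0) + 1;
        }

        // Keep only the URLs that were listed more than once.
        return array_filter($counts, function ($n) {
            return $n > 1;
        });
    }

    // Usage with a small inline sitemap (made-up URLs).
    $xml = '<?xml version="1.0" encoding="UTF-8"?>'
        . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        . '<url><loc>https://example.com/a/</loc></url>'
        . '<url><loc>https://example.com/b/</loc></url>'
        . '<url><loc>https://example.com/a/</loc></url>'
        . '</urlset>';
    print_r(find_duplicate_sitemap_urls($xml));
    ```

    An empty result means no URL was listed twice. Missing URLs would need a separate comparison against the list of published pages.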

    Best regards,
    Adam

  • drzivil
    • WPMU DEV Initiate

    Hi Adam,

    I don't know exactly what caused the problem.

    All the sitemap plugins I tried had different problems.

    In the end I customized “Simple XML Sitemap Generator” for my needs (WPML + custom post types).

    Maybe I will polish up my version and share it with the community via the plugin repository when I have a quiet minute. For now, here is my code for everybody to use. :wink:

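    function sg_create_sitemap() {
        global $sitepress;

        // $sitepress is WPML's global object; without WPML there is nothing to do.
        if ( ! isset( $sitepress ) ) {
            return;
        }

        $sitemap  = '<?xml version="1.0" encoding="UTF-8"?>';
        $sitemap .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

        // Language codes used on this site; adjust to your WPML setup.
        $langs = array( 'de', 'tr', 'en' );

        // Remember the active language so it can be restored afterwards.
        $current_lang = $sitepress->get_current_language();

        foreach ( $langs as $lang ) {
            // Switch WPML to this language so get_posts() returns its posts.
            $sitepress->switch_lang( $lang );

            $posts_for_sitemap = get_posts( array(
                'numberposts'      => -1,
                'orderby'          => 'modified',
                'post_type'        => array( 'page', 'product' ),
                'order'            => 'DESC',
                'suppress_filters' => 0, // keep WPML's language filter active
            ) );

            foreach ( $posts_for_sitemap as $post ) {
                // setup_postdata() is not needed: only $post fields are read.
                $postdate  = explode( ' ', $post->post_modified );
                $permalink = get_the_permalink( $post->ID );

                // URLs that contain product categories are built with the
                // language code of the language the WP hook was fired from!
                $sitemap .= '<url>' .
                    '<loc>' . esc_url( $permalink ) . '</loc>' .
                    '<lastmod>' . $postdate[0] . '</lastmod>' .
                    '<changefreq>daily</changefreq>' .
                    '<priority>0.8</priority>' .
                '</url>';
            }
        }

        // Restore the language that was active before the loop.
        $sitepress->switch_lang( $current_lang );

        $sitemap .= '</urlset>';

        // Write the sitemap to the WordPress root; bail out if it is not writable.
        $fp = fopen( ABSPATH . 'sitemap.xml', 'w' );
        if ( false === $fp ) {
            return;
        }
        fwrite( $fp, $sitemap );
        fclose( $fp );
    }

    // Regenerate the sitemap whenever a page or product is published.
    add_action( 'publish_page', 'sg_create_sitemap' );
    add_action( 'publish_product', 'sg_create_sitemap' );
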
  • Adam Czajczyk
    • Support Gorilla

    Hi drzivil

    Thanks for sharing the code!

    I understand that this customized code finally does what you need, right? I’d still like to find out what caused the initial issue, though. Since I’ve already asked our developers for help, I think it would be good to hear their opinion on this.

    I did, however, update them to let them know that you’re using this customized code now, so they won’t remove or change it :slight_smile:

    Best regards,
    Adam

  • Panos
    • SLS

    Hi drzivil !

    I temporarily activated SmartCrawl and ran a quick scan on your site. For some reason, 141 URLs could not be crawled. In the past, similar cases were related to some invalid formatting. I can’t be sure yet, so I have asked the devs to have a look at the crawl logs on your site, in case they contain any information that could point to the root of this issue.

    Until we have news, I could run new scans with some plugins deactivated, and I might need to switch the theme temporarily. Let me know if you agree to that :slight_smile:

    Kind regards!

  • Panos
    • SLS

    Hi again drzivil ,

    The dev pointed out the robots.txt, which I totally missed. It contains:

    
    User-agent: *
    Disallow: /
    

    which doesn’t allow SmartCrawl’s crawler to crawl any pages. You can try changing that in SmartCrawl’s options to:

    
    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
    

    as an example.

    Then run another crawl and see whether those URLs still can’t be crawled.
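
    To see why the first file blocks everything and the second does not, here is a rough sketch in plain PHP of how prefix rules are evaluated. It assumes the crawler follows the "User-agent: *" group and that rules are plain path prefixes (no "*" or "$" wildcards); the longest matching rule wins and Allow wins a tie. The function name robots_allows is hypothetical and is not part of SmartCrawl or WordPress.

    ```php
    <?php
    // Simplified sketch: may $path be crawled under the "User-agent: *" group?
    function robots_allows(string $robotsTxt, string $path): bool
    {
        $inStarGroup  = false; // currently inside the "*" group?
        $lastWasAgent = false; // previous meaningful line was a User-agent line?
        $bestLen      = -1;    // length of the longest matching rule so far
        $allowed      = true;  // no matching rule means the path is allowed

        foreach (preg_split('/\R/', $robotsTxt) as $line) {
            $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
            if ($line === '' || strpos($line, ':') === false) {
                continue;
            }
            list($field, $value) = array_map('trim', explode(':', $line, 2));
            $field = strtolower($field);

            if ($field === 'user-agent') {
                // Consecutive User-agent lines share one group of rules.
                $inStarGroup  = ($lastWasAgent && $inStarGroup) || $value === '*';
                $lastWasAgent = true;
                continue;
            }
            $lastWasAgent = false;

            $isRule = ($field === 'allow' || $field === 'disallow');
            if (!$inStarGroup || !$isRule || $value === '') {
                continue; // other groups, other fields, and empty rules are ignored
            }

            if (strpos($path, $value) === 0) { // rule is a prefix of the path
                $len     = strlen($value);
                $isAllow = ($field === 'allow');
                if ($len > $bestLen || ($len === $bestLen && $isAllow)) {
                    $bestLen = $len;
                    $allowed = $isAllow;
                }
            }
        }

        return $allowed;
    }

    $blocking = "User-agent: *\nDisallow: /";
    $relaxed  = "User-agent: *\nDisallow: /wp-admin/\nAllow: /wp-admin/admin-ajax.php";

    var_dump(robots_allows($blocking, '/some-page/'));             // bool(false)
    var_dump(robots_allows($relaxed, '/some-page/'));              // bool(true)
    var_dump(robots_allows($relaxed, '/wp-admin/admin-ajax.php')); // bool(true)
    ```

    With "Disallow: /" every path starts with the rule, so nothing can be crawled. With the second file only paths under /wp-admin/ are blocked.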

    Kind regards!
