Many things to cover: mainly crawl errors and BuddyPress problems

Please, if anyone can provide me with some steps to resolve these issues, I will be eternally grateful.

Crawl Errors: My recent visit to Google Webmaster Tools (GWT) showed 31 sitemap errors, 1,223 404 errors, and 11 unreachable errors. I discovered that a ton of non-existent pages are being crawled: either they no longer exist or the page URL has changed because I changed my permalink settings. UGHHHH! This is completely overwhelming and frustrating.

BuddyPress: Apparently, Google is indexing ALL BuddyPress data: Activity, Groups, Members. I cannot tell you how many spam registrations I have had, yet those spammers somehow manage to get into my sitemap and get crawled. Also, I have no desire to have Google or any other robot crawl my BuddyPress content. This content is restricted anyway. I only want Google indexing my regular posts and pages. BuddyPress really made a mess for me with regard to Google indexing.

Duplicate Title Tags: They are everywhere because of BuddyPress and the changed permalink settings.

Sitemap: It appears that my sitemap is outdated. Is there a way to refresh it? Is it normal for tags to be pulled into the sitemap?

For now, I have deactivated BuddyPress. Almost immediately after doing so, I received an email saying that someone was trying to register on my site to make a book purchase. They cannot sign up because the signup page is messed up :(

HELLLLLP!

  • Philip John

    Hiya!

    Crawl Errors
    Okay, permalink changes should just be handled and redirected appropriately by WordPress, so 404s on those URLs suggest the actual content is gone. Are the URLs with errors of a similar nature? I.e. posts, pages or something else?

    BuddyPress
    That should be easy to fix. You could just add certain areas of your BP site to the robots.txt. For example, adding this line will stop member profiles getting crawled:
    Disallow: /members/*
    Similarly, you could add rules for other areas such as activity or groups if you wanted.
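
As a rough sketch of the robots.txt approach described above, the following Python snippet checks a draft rule set before it is uploaded. The /activity/ and /groups/ paths and the sample URLs are assumptions taken from the areas named in this thread and may differ on a given site. Python's standard-library parser does plain prefix matching and does not interpret `*` wildcards, so the sketch uses the trailing-slash form rather than `/members/*`.

```python
from urllib.robotparser import RobotFileParser

# Draft rules: block the BuddyPress areas, leave posts and pages crawlable.
# The exact slugs are assumptions; check them against your own site.
DRAFT_ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /activity/
Disallow: /groups/
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(DRAFT_ROBOTS_TXT)

# Hypothetical paths used only to exercise the rules.
for path in ["/members/some-user/", "/activity/p/4/", "/2011/09/some-post/"]:
    verdict = "crawlable" if parser.can_fetch("Googlebot", path) else "blocked"
    print(f"{path}: {verdict}")
```

Running a check like this before uploading the file can catch a mistyped slug, such as /members/ on a site that actually uses /members-2/.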

    Duplicate Titles
    That's not a huge issue to be honest, but what kind of duplicates are you seeing?

    Sitemap
    What are you using to create your sitemap? If it's WPMU DEV SEO, the new version has a manual refresh option.

    Honestly, having a bunch of errors in GWT isn't the end of the world. Can users sign up and do what they want? If they can, that's the most important thing.

    Phil

  • tutuology

    Crawl Errors
    Permalink changes should just be handled and redirected appropriately by WordPress, so 404s on those URLs suggest the actual content is gone. Are the URLs with errors of a similar nature? I.e. posts, pages or something else?

    Yes, mostly of a similar nature. For example, both the old permalinks and the new permalinks (which replace the old) are being crawled. The old ones are producing the crawl errors. Also, old and new spam registrations are being crawled and are producing 404s.

    OK, this I just figured out and will perhaps give it a go. I'm also thinking about just using Communities & Forums instead:

    BuddyPress
    That should be easy to fix. You could just add certain areas of your BP site to the robots.txt. For example, adding this line will stop member profiles getting crawled:
    Disallow: /members/*

    Similarly, you could add rules for other areas such as activity or groups if you wanted.

    So, I am on it.

    Duplicate Titles
    That's not a huge issue to be honest, but what kind of duplicates are you seeing?

    For example:
    VONETTA | Activity | Tutuology – Business Development for Tutu Girly Girl Fairy Princess Couture
    /activity/p/4/

    /members-2/twinkils11/
    /members-2/twinkils11/activity/
    /members-2/twinkils11/activity/friends/
    /members-2/twinkils11/activity/groups/
    /members-2/twinkils11/activity/just-me/

    and,
    Tutu.ology |

    /classifieds/my-classifieds/activity/activity/profile/friends/
    /classifieds/my-classifieds/activity/activity/profile/groups/
    /classifieds/my-classifieds/activity/classifieds/friends/groups/
    /classifieds/my-classifieds/shops/

    and,
    Book Updates for hair bows, tutus & more!
    /2011/09/24/book-update-for-september-2011/
    /2011/09/book-update-for-september-2011/

    Sitemap
    What are you using to create your sitemap? If it's WPMU DEV SEO, the new version has a manual refresh option.

    I am using WPMU DEV SEO version 1.2.1 and don't see a manual refresh option. But while we are on the subject, I see that I can exclude taxonomies from the sitemap. I previously did not understand this. I do now, so I will make use of it. It will definitely help.

    So, I agree with you, Phil, about GWT. I am working with an SEO freelancer (oDesk) who is insisting we clear some of these things up. I think I am in "crawl shock"...lol

  • tutuology

    Phil, is the asterisk (*) necessary, i.e. /members/*? I am assuming that instructs robots to ignore all members?

    Here is my file:
    User-agent: *
    Disallow: /activity/*
    Disallow: /members/*
    Disallow: /checkout/
    Disallow: /forum-2/*
    Disallow: /marketplace/*
    Disallow: /groups/*
    Disallow: /store/*
    Disallow: /prices-fees/
    Disallow: /tags/*
    Allow: /
    Disallow: /members-2/*
    Disallow: /wp-signup.php/*
    Disallow: /wp-signup.php?

    The fourth line from the bottom shows Allow: /
    Is this correct? I believe it allows the whole site to be crawled, while the Disallow: lines shield specific paths from robots. Am I on the right track?

  • tutuology

    My sitemap is not current. There are many non-existent URLs being captured in my sitemap. Also, certain pages, such as checkout and cart, are captured but I don't want them to be. My first concern is finding out how to update my sitemap; my second is figuring out how to skip those two mentioned pages from being captured. My WPMU DEV SEO plugin (version 1.2.1) does not offer a "refresh sitemaps" option.

    Please help.

  • Philip John

    VONETTA | Activity | Tutuology – Business Development for Tutu Girly Girl Fairy Princess Couture
    /activity/p/4/
    /members-2/twinkils11/
    /members-2/twinkils11/activity/
    /members-2/twinkils11/activity/friends/
    /members-2/twinkils11/activity/groups/
    /members-2/twinkils11/activity/just-me/

    Hmm that's a BuddyPress title-naming thing. I.e. it doesn't provide separate titles for each of those. It's not a major issue, as I said, but you might want to look for something that will allow you to change it.

    Tutu.ology |
    /classifieds/my-classifieds/activity/activity/profile/friends/
    /classifieds/my-classifieds/activity/activity/profile/groups/
    /classifieds/my-classifieds/activity/classifieds/friends/groups/
    /classifieds/my-classifieds/shops/

    That's definitely not ideal. Can you start a new thread about the Classifieds plugin for this and we'll look at getting that sorted?

    Book Updates for hair bows, tutus & more!
    /2011/09/24/book-update-for-september-2011/
    /2011/09/book-update-for-september-2011/

    That certainly shouldn't happen - what are your permalinks settings?

    Phil, is the asterisk (*) necessary, i.e. /members/*? I am assuming that instructs robots to ignore all members?

    No, sorry, the asterisk isn't required. Blocking an entire directory is as simple as:
    Disallow: /members/

    the 4th from the bottom shows Allow: /
    Is this correct?

    Syntax-wise, yes, but it's unnecessary, so you can remove it.
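
To illustrate why the Allow: / line can go: under the longest-match rule that Google documents for robots.txt (the most specific matching path wins), a bare Allow: / never outranks a longer Disallow path. The sketch below is a simplified, hypothetical evaluator (plain prefixes only, no wildcards), not Google's actual parser, and the rule list is a shortened copy of the file posted above.

```python
def is_allowed(rules, path):
    """Longest-match evaluation: the longest matching prefix decides.

    Ties go to 'allow'; a path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


# Shortened version of the posted file, in the same order.
rules_with_allow = [
    ("disallow", "/activity/"),
    ("disallow", "/members/"),
    ("allow", "/"),
    ("disallow", "/members-2/"),
]
rules_without_allow = [r for r in rules_with_allow if r != ("allow", "/")]

sample_paths = [
    "/members-2/twinkils11/",
    "/activity/p/4/",
    "/2011/09/book-update-for-september-2011/",
]
for path in sample_paths:
    with_allow = is_allowed(rules_with_allow, path)
    without_allow = is_allowed(rules_without_allow, path)
    print(path, with_allow, without_allow)
```

Both rule sets give the same answer for every path here, which is why the line is redundant. Some parsers, such as the one in Python's standard library, apply rules in file order instead, so a Disallow placed after Allow: / could be ignored by them; that is one more reason to drop the line.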

    My first concern is finding out how to update my sitemap

    Sorry, should have been clearer that the new version *will* have a manual refresh option when it's released (should be this week).

    my second is figuring out how to skip those two mentioned pages from being captured

    For the time being you could just add those into your robots.txt and Google will stop crawling them.
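
One way to see which sitemap entries those robots.txt rules would cover is to parse the sitemap and flag URLs under the blocked paths. This is a sketch only: the example.com URLs, the inline sitemap, and the /cart/ and /checkout/ paths are made up for illustration, and it assumes the standard sitemaps.org XML namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; in practice, read your real sitemap file.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/checkout/</loc></url>
  <url><loc>http://example.com/cart/</loc></url>
  <url><loc>http://example.com/2011/09/book-update-for-september-2011/</loc></url>
</urlset>
"""


def blocked_sitemap_urls(sitemap_xml, blocked_prefixes):
    """Return sitemap URLs whose path starts with any blocked prefix."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        path = urlparse(url).path
        if any(path.startswith(prefix) for prefix in blocked_prefixes):
            flagged.append(url)
    return flagged


flagged = blocked_sitemap_urls(SAMPLE_SITEMAP, ["/checkout/", "/cart/"])
for url in flagged:
    print("listed in sitemap but blocked in robots.txt:", url)
```

Anything flagged this way is still listed in the sitemap, so it is probably worth excluding there as well once the plugin offers a way to do so.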

    Phil

  • tutuology

    Ok. I eliminated the asterisk from my robots.txt file a few days back, so I think we're good there. However, I see that Google is still crawling some of these in spite of the "disallow" :(

    Tutu.ology |
    /classifieds/my-classifieds/activity/activity/profile/friends/
    /classifieds/my-classifieds/activity/activity/profile/groups/
    /classifieds/my-classifieds/activity/classifieds/friends/groups/
    /classifieds/my-classifieds/shops/
    That's definitely not ideal. Can you start a new thread about the Classifieds plugin for this and we'll look at getting that sorted?

    yes, I will start a new thread for this

    Book Updates for hair bows, tutus & more!
    /2011/09/24/book-update-for-september-2011/
    /2011/09/book-update-for-september-2011/
    That certainly shouldn't happen - what are your permalinks settings?

    month and name, see screenshot

  • Philip John

    You will need to allow Google time to update its copy of your robots.txt. It may still keep checking those pages for a while until it realises.

    As for those permalinks, it does seem as though there is an issue with your install there. When I use month and name permalinks in my test install, the equivalent year/month/day permalink is redirected.
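
For reference, the redirect described here amounts to dropping the day segment from the old URL. Below is a small sketch of that mapping, which could be used to check a list of old URLs from the crawl report against where they should land. The function name and the regex are illustrative assumptions; this is not WordPress's own redirect code.

```python
import re

# Matches day-and-name permalinks such as /2011/09/24/some-post/
DAY_AND_NAME = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/([^/]+)/$")


def month_and_name(path):
    """Map a day-and-name permalink to its month-and-name equivalent.

    Paths that do not look like day-and-name permalinks are returned as-is.
    """
    match = DAY_AND_NAME.match(path)
    if match is None:
        return path
    year, month, _day, slug = match.groups()
    return f"/{year}/{month}/{slug}/"


old_path = "/2011/09/24/book-update-for-september-2011/"
print(old_path, "->", month_and_name(old_path))
```

If an old URL from the crawl report does not end up at the path this produces, that points to the redirect not firing rather than to missing content.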

    Can you check that your .htaccess is using the standard WordPress rules please?

    Phil

  • tutuology

    Ugh, not sure why this is happening, but my pages are all 404ing :(
    My .htaccess file:

    # BEGIN WordPress
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]

    # uploaded files
    RewriteRule ^([_0-9a-zA-Z-]+/)?files/(.+) wp-includes/ms-files.php?file=$2 [L]

    # add a trailing slash to /wp-admin
    RewriteRule ^([_0-9a-zA-Z-]+/)?wp-admin$ $1wp-admin/ [R=301,L]

    RewriteCond %{REQUEST_FILENAME} -f [OR]
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteRule ^ - [L]
    RewriteRule ^([_0-9a-zA-Z-]+/)?(wp-(content|admin|includes).*) $2 [L]
    RewriteRule ^([_0-9a-zA-Z-]+/)?(.*\.php)$ $2 [L]
    RewriteRule . index.php [L]
    # END Wordpress

    If you'd like me to send over a private email with my wp-config.php file, lemme know.

  • tutuology

    Phillabuster, where r u??????

    So, here is what I've done: stripped my .htaccess file. It's bare-bones. The only way my pages load the actual content is with the ugly default permalinks.

    Please please please, someone help me fix this :) My original goal here was to fix the scores of issues I am having with Google crawl stats; now I am dealing with a crawl error/permalink/.htaccess conundrum. I need a bottle of wine!

  • Philip John

    Okay, so do you have a .htaccess at all now, or is it blank?

    I'd suggest removing it (I hope you have backups!) and then turning pretty permalinks on again. WP should either just do it, or prompt you to create a .htaccess.

    Do you have any caching plugins installed? If so, make sure you flush the cache.

    If you still get 404s, reset to default permalinks again, delete your .htaccess, deactivate all plugins and then try pretty permalinks again.

    Something is fundamentally messing with permalinks - you just need to find out what...

    Phil
