I know there are several robots.txt plugins out there that you could add to the plugins collection of your MU install, and they work very well. They allow each blog to customize its own robots.txt file. But in my opinion, if you are managing an MU install, there are several pitfalls to this method.
1. You have to activate the plugin
Let’s say you manage all of the blogs on your MU install. Yes, I know you could use Plugin Commander to auto-activate, or to activate all or activate none, but I still find this a somewhat manual job, and anything I can do to automate my job makes my life that much easier.
2. You have to modify the plugin to customize the robots.txt file
If you want to customize the robots.txt file, you either have to go into each individual blog or modify the plugin so that the customization is already there when it is activated. Neither method is ideal.
So why would you want to create a global robots.txt file for your MU install?
1. Same directory structure for every blog
Because you are running multiple blogs off the same install, the directory structure for every blog will be exactly the same.
2. Many of your users might not have a clue as to what a robots.txt file is
Again, if you are managing the MU install and want to make sure you have the best platform for your users, why not help them out by having a robots.txt file already in place when their blog is created?
3. Better protect yourself
I see a lot of wp-login.php pages appearing in the search engines. Not that this is extremely bad, but as the main admin, you can feel more secure knowing that every blog is blocking the spiders from crawling pages that you don’t want crawled.
4. The main site’s SEO affects everyone’s SEO
Whether you use subdomains or subfolders for your domain structure, the SEO authority of your main blogs will affect every other blog’s SEO authority. Look at wordpress.com. You can sign up for a wordpress.com blog and instantly have a better SEO start than someone who starts up on their own. This is because a domain name like jason.wordpress.com, just by being associated with the wp.com domain, already has some authority.
So how can you do this?
First create a file called global.php. This is going to be placed in your mu-plugins directory and we are going to add all of our global settings to this file.
Next we actually want to create our robots.txt function.
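// The function name global_robots_txt is a placeholder; any unique name works.
function global_robots_txt() {
    global $wpdb; // make the WordPress database object visible in here

    // ID of the blog being requested, used for its upload path below.
    $blog = $wpdb->blogid;

    echo "Disallow: /wp-admin\n";
    echo "Disallow: /wp-includes\n";
    echo "Disallow: /wp-login.php\n";
    echo "Disallow: /wp-content/plugins\n";
    echo "Disallow: /wp-content/cache\n";
    echo "Disallow: /wp-content/themes\n";
    echo "Disallow: /trackback\n";
    echo "Disallow: /comments\n";
    echo "Disallow: */trackback\n";
    echo "Disallow: */comments\n";
    echo "Disallow: /*?*\n";
    echo "Disallow: /*?\n";

    // Let spiders reach this blog's uploaded files, then point to its sitemap.
    echo "Allow: /wp-content/blogs.dir/" . $blog . "/files/*\n\n";
    echo "Sitemap: " . get_bloginfo('url') . "/sitemap.xml";
}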
We start by grabbing the blog ID so that we direct the spiders to the correct upload path for that blog’s images.
And finally we grab the blog’s URL so that we direct the spiders to the correct sitemap location.
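To see those two per-blog lines in isolation, here is a small sketch that builds them from a blog ID and a URL. The helper name build_blog_robots_tail and the example values are made up for illustration; in the real function the ID would come from $wpdb->blogid and the URL from get_bloginfo('url').

```php
<?php
/**
 * Build the per-blog tail of the robots.txt output.
 *
 * Hypothetical helper for illustration only: it takes the blog ID and
 * site URL as plain arguments so it can run outside WordPress.
 */
function build_blog_robots_tail(int $blog_id, string $site_url): string
{
    $lines = array(
        // Each blog's uploads live under blogs.dir/<blog id>/files.
        'Allow: /wp-content/blogs.dir/' . $blog_id . '/files/*',
        // Blank line between the rules and the sitemap pointer.
        '',
        // rtrim() is an extra safeguard against a doubled slash
        // if the URL is passed in with a trailing "/".
        'Sitemap: ' . rtrim($site_url, '/') . '/sitemap.xml',
    );

    return implode("\n", $lines);
}

// Example values only: blog 7 on a made-up subdomain.
echo build_blog_robots_tail(7, 'http://jason.example.com');
```

Run for blog 7, this prints the Allow line for blogs.dir/7, a blank line, and then the Sitemap line for that blog’s URL, which is the same shape the last two echo statements produce.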
Now, to tie everything together, we are going to hook into the WordPress do_robots action.
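Here is a minimal sketch of what that hook could look like in global.php. It assumes the robots.txt function is called global_robots_txt, which is a placeholder name. It also assumes that WordPress’s own do_robots handler still runs on the same action and prints the User-agent line before our rules, which is worth checking on your install by loading /robots.txt after you add it.

```php
<?php
// Stand-in for the robots.txt function described in this post, so that
// this snippet runs on its own. The name is a placeholder (assumption).
if (!function_exists('global_robots_txt')) {
    function global_robots_txt()
    {
        echo "Disallow: /wp-admin\n";
    }
}

// add_action() only exists once WordPress has loaded. The guard lets the
// file be run or linted outside WordPress without a fatal error.
if (function_exists('add_action')) {
    // Run our function whenever WordPress serves a blog's robots.txt.
    add_action('do_robots', 'global_robots_txt');
}
```

Because files in mu-plugins are loaded for every blog automatically, this one add_action call covers the whole install with nothing to activate per blog.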
Now, anytime a spider requests the robots.txt file of any of your blogs, the file will already be set up and blocking it from crawling any content you do not want crawled.
Here are some live examples:
Some information on robot files:
What say you?