If you have videos hosted on Vimeo and have tried adding them to your video sitemap and submitting that sitemap in Google Webmaster Tools, you may have run into a problem where the video URL you are trying to add is blocked by robots.txt. Fortunately, this is easily resolved.
Error Messages
You will usually get an error message along the lines of the following:
Url Blocked by Robots.txt
Sitemap contains URLs which are blocked by robots.txt
A Quick & Easy Fix
The fix for this is quick and simple (for a change): you just need to update the paths you use to point to the video file.
The standard link you will have is something like:
http://player.vimeo.com/video/12345678
You want to change these so they look like this:
http://www.vimeo.com/moogaloop.swf?clip_id=12345678
So, you should end up with your video:player_loc tag looking something like this:
<video:player_loc allow_embed="yes">http://www.vimeo.com/moogaloop.swf?clip_id=VIDEO_ID</video:player_loc>
To state the obvious, just in case: replace the VIDEO_ID value with the numeric ID of your video.
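For a little more context, the player_loc tag sits inside a video:video element alongside Google's other required video sitemap tags, and the urlset needs the video namespace declared. Here is a minimal sketch of a full entry; the page URL, title, description and thumbnail path are placeholder values, so swap in your own:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>http://www.example.com/videos/my-video-page.html</loc>
    <video:video>
      <!-- Placeholder values below: use your own page, title, description and thumbnail -->
      <video:thumbnail_loc>http://www.example.com/thumbs/VIDEO_ID.jpg</video:thumbnail_loc>
      <video:title>My video title</video:title>
      <video:description>A short description of the video.</video:description>
      <video:player_loc allow_embed="yes">http://www.vimeo.com/moogaloop.swf?clip_id=VIDEO_ID</video:player_loc>
    </video:video>
  </url>
</urlset>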
Why this happens
Ah, the inquisitive type. Congratulations, you are part of the 1% that reads past the quick fix and wants to know why you can't add Vimeo videos to the video sitemap in the first place.
RTFEM – Read The F***ing Error Message
As is often the case, the error message comes wrapped in the kind of signalling that turns off the brains of around 90% of people. If we can dig past the harsh, cryptic exterior of the message and this obvious mental trickery, we can get to the heart of the problem.
Url Blocked by Robots.txt
This is exactly what it says on the tin: the URL you are trying to submit is blocked by a robots.txt file. The URL is not on your site, so the robots.txt on the player.vimeo.com domain must have a rule preventing access by search engine spiders.
Sitemap contains URLs which are blocked by robots.txt
This is more of the same and again points us to the robots.txt file on the player.vimeo.com site. So, if we take a look at the robots.txt on the domain for the original link:
http://player.vimeo.com/robots.txt
Then you can see it has a brief and comprehensive "go away" for all and sundry:
User-agent: *
Disallow: /
If we look at the robots.txt file on the alternative address, we can see there are no such restrictions:
http://vimeo.com/robots.txt
So, with our entry unbarred, our friendly neighbourhood web spiders from Google, Bing, etc. can crawl and index the URL happily.
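If you want to check this for yourself, here is a quick sketch using Python's standard urllib.robotparser module to test a URL against a site's robots.txt rules. The URLs are the examples from above, and the results reflect the robots.txt files as they stood at the time of writing (Vimeo can, of course, change them):

import urllib.robotparser

def is_crawlable(robots_url, page_url, agent="Googlebot"):
    # Fetch and parse the robots.txt, then test the URL against its rules
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(robots_url)
    parser.read()
    return parser.can_fetch(agent, page_url)

# player.vimeo.com disallows everything, so this prints False
print(is_crawlable("http://player.vimeo.com/robots.txt",
                   "http://player.vimeo.com/video/12345678"))

# vimeo.com had no such rule at the time of writing, so this prints True
print(is_crawlable("http://vimeo.com/robots.txt",
                   "http://www.vimeo.com/moogaloop.swf?clip_id=12345678"))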
That’s a wrap!
If you have any video indexing questions then drop a comment below or give me a shout on Twitter, and please remember to be sociable and share this on your favourite social network via the sharing icons below. 🙂
8 Responses
That was a great, concise and helpful description. It helped me keep my stuff clean after hours of pulling out my hair. Never realized it was the robots.txt from player.vimeo.com that was causing all the pain. Thanks for this useful article.
Hey Pankaj, glad it helped. Marcus
I've been stuck on this for two days! I saw that the robots.txt was disallowing everything. Just couldn't figure out if/how I could work around it to include my video in a sitemap. Your quick fix worked perfectly! Thanks for writing up this awesome solution and explanation.
Hey Matt, my pleasure!
Hey Marcus,
Good answer. I was wondering, since they will be doing away with the old embed code (which I think is where the http://www.vimeo.com/moogaloop.swf?clip_id=12345678 comes from), will this still work once they trash it? Or are we screwed? Also, if I were to create a CNAME for a subdomain, let's say video.theurl.com, that points to http://player.vimeo.com/external/ (which is the URL I'm getting the error with), would that override the robots.txt file? Or would it still be the same problem, just a different path?
I just set up a subdomain using Vimeo Portfolio. I am not a computer whiz kid. I am getting a message from Google that there is a 100% error for Googlebot to crawl the site. Of course I want the spider to come to my new subdomain. How can I fix this?
Hey Emily. It should just be a case of using the correct link structure as detailed in the article. Happy to take a quick look though and feedback if you want to drop me an email via the contact form. Happy to help! 🙂
Hi Marcus,
Great article! Very clear and concise. I'm in the process of implementing video sitemaps for my site. I was wondering, has https://vimeo.com/robots.txt changed since the writing of this article?
It appears that vimeo.com allows Google's Mediapartners (AdSense) bot to crawl:
User-agent: Mediapartners-Google
Disallow:
But for everything else, including Googlebot for search, it disallows /moogaloop/, which is where the player location points:
User-agent: *
Disallow: */format:thumbnail
Disallow: /download/
Disallow: /*/download?
Disallow: /moogaloop/ <——— blocks it here
…
Am I missing something? If not, any ideas on how to get around this?
Also, what is your opinion on storing the video thumbnail on the server vs. just pointing the sitemap to the thumbnail URL stored on Vimeo?
Thanks in advance for your help. Cheers.