Search engine friendly cloaking - removing session ids
Motivation - or: why should you do search engine friendly cloaking and remove the session ID?
It's not that hard to help search engines index your website. Sebastian has written a nice article about search engine friendly cloaking.
I needed something that handles URL parameters directly when a request hits the web server. So my first idea was to use .htaccess and mod_rewrite to remove session IDs from URLs.
Scenario
You might use session IDs to track your users, or your CMS might use them. Sessions and their IDs are nearly everywhere. Normally, though, the result is ugly URLs in the search engine result pages, and they hurt your ranking too: instead of one URL, the search engine now sees 100 URLs that split between them the position that should belong to the "only one".
You can improve this by redirecting (301) the search engine to the URL without the session ID every time it visits your website with such a "broken" URL.
Solution for removing session ids - search engine friendly
Hopefully you're running Apache with mod_rewrite enabled. If not, hmmm: this solution will not work for you. But maybe you know your web server well and it offers something similar. If you get something working, drop me a note.
The .htaccess file
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "Google" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Slurp" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSNBOT" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "teoma" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ia_archiver" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Scooter" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Mercator" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "FAST" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MantraAgent" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Lycos" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ZyBorg" [NC]
RewriteCond %{QUERY_STRING} SESSIONID
RewriteRule ^(.*)$ $1? [L,R=301]
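So what does this .htaccess do? It looks for search engine bots and spiders by their user-agent names (see the names in the RewriteCond lines); [NC] makes these matches case-insensitive. The [OR] flags chain the user-agent checks together. The last one has no [OR], so the whole group is combined with the QUERY_STRING condition by a logical AND. That condition checks whether the query string of the requested URL contains "SESSIONID". If both are true, the trailing "?" in the RewriteRule discards the query string, and the search engine receives a 301 redirect to the same URL without it.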
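The chained user-agent conditions can also be written as a single regular expression alternation. The sketch below assumes Apache 2.4 or later, where the QSD flag discards the query string instead of the trailing "?", and it redirects to %{REQUEST_URI}, which already includes the leading slash and any subdirectory. It is meant to behave like the rule above; on Apache 2.2, keep the original form.

```apache
RewriteEngine On
# One alternation replaces the [OR]-chained user-agent conditions
RewriteCond %{HTTP_USER_AGENT} (Google|Slurp|MSNBOT|teoma|ia_archiver|Scooter|Mercator|FAST|MantraAgent|Lycos|ZyBorg) [NC]
# Only act when the query string contains SESSIONID
RewriteCond %{QUERY_STRING} SESSIONID
# QSD (Apache 2.4+) drops the query string; R=301 sends a permanent redirect
RewriteRule ^ %{REQUEST_URI} [QSD,L,R=301]
```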
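If you read carefully, you've spotted a problem: the whole query string is removed, not just the session ID. Whether that matters depends on your website, but if other parameters select the content (for example a page number), the bot is redirected to a different page than the one it asked for.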
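If other parameters must survive, one option is to cut out only the SESSIONID=... pair and redirect to what is left. This is a sketch, not tested against any particular CMS. It assumes the parameter is literally named SESSIONID and that parameters are separated by "&". It needs two rules because SESSIONID can be the first parameter or a later one. The user-agent list is shortened here, so extend it with the names from the block above, and use these rules instead of the original rule, not in addition to it. In mod_rewrite, %1 and %2 refer to groups in the last matched RewriteCond, which is why the QUERY_STRING check comes last. If your parameter values contain percent-encoded characters, check whether you need the NE flag so they are not encoded a second time.

```apache
RewriteEngine On

# Case 1: SESSIONID is the first parameter, e.g. ?SESSIONID=abc&page=2 -> ?page=2
RewriteCond %{HTTP_USER_AGENT} (Google|Slurp|MSNBOT|teoma|ia_archiver) [NC]
RewriteCond %{QUERY_STRING} ^SESSIONID=[^&]*&?(.*)$
RewriteRule ^ %{REQUEST_URI}?%1 [L,R=301]

# Case 2: SESSIONID follows another parameter, e.g. ?page=2&SESSIONID=abc&x=1 -> ?page=2&x=1
RewriteCond %{HTTP_USER_AGENT} (Google|Slurp|MSNBOT|teoma|ia_archiver) [NC]
RewriteCond %{QUERY_STRING} ^(.*)&SESSIONID=[^&]*(.*)$
RewriteRule ^ %{REQUEST_URI}?%1%2 [L,R=301]
```

When SESSIONID was the only parameter, case 1 leaves nothing after the "?", so the redirect target has no query string at all.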
But, as Sebastian writes, you shouldn't give spiders any URL with a session ID in it in the first place. So you need to implement something like what he describes in SmartIT consulting: Steering and Supporting Search Engine Crawling, or simply stop PHP from appending the session ID to URLs.
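If PHP's own session handling is what appends the ID, you can tell it to stop. The sketch below assumes PHP runs as an Apache module (mod_php), because only then does .htaccess accept php_flag; with CGI or PHP-FPM, set the same two options in php.ini instead. Note that PHP's default parameter name is PHPSESSID rather than SESSIONID, and a CMS may bring its own session handling that ignores these settings.

```apache
# Requires mod_php; with CGI/FPM put these options in php.ini instead
# Do not rewrite links and forms to carry the session ID
php_flag session.use_trans_sid off
# Only accept the session ID from a cookie, never from the URL
php_flag session.use_only_cookies on
```

Recent PHP versions already default to these values, so it is worth checking whether your host or CMS overrides them. The trade-off: visitors who block cookies lose their session, which is exactly the case URL-based session IDs were meant to cover.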
Discuss search engine friendly URL cleaning
If you have ideas, code, questions or problems, leave them in the enarion.net forum.

