If you ask any SEO what their worst nightmare is, most will say ‘the entire site getting blocked in robots.txt’. While this is a very real possibility, the chances of it happening are pretty low: most developers know the implications of such a change, and many of those who don’t simply wouldn’t have a reason to make it.
This doesn’t mean, however, that just because you’re not blocking the entire site, your robots.txt file can’t do you any harm. Certain rules within these files, even ones set up with good intentions, could inadvertently stymie your visibility for certain keywords.
A great example of this may currently be hampering Vapouriz, a site selling vaping products (you probably guessed that from the name). Let’s take a look at their robots.txt file (or at least the offending section of it);
All looks fine, so better go back to learning Python? Not so fast. The ‘?’ in these URLs marks the start of parameters, i.e. the facets and filters that dynamically change the URL, and sometimes the content on the page, based on things like colour, size, material and, in the world of vaping, all the stuff above.
Standard procedure is usually to block these filters, because they can be crawled and mixed and matched with each other (e.g. you could have a page being crawled that’s filtered by case material, nicotine strength, flavour, battery size and wire gauge all at once), creating an almost infinite number of URL combinations. Here’s one I created just by clicking around;
What’s the problem with that? Well, Google could spend way too much time crawling these useless URLs instead of the URLs that you really want it to crawl.
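To put a rough number on ‘almost infinite’, here’s a back-of-the-envelope sketch in Python. The filter names are the ones mentioned above, but the option counts are invented purely for illustration and are not Vapouriz’s real figures. It also assumes each filter is either unset or set to a single value, and it ignores parameter order.

```python
from math import prod

# Hypothetical number of options per filter (illustrative, not real Vapouriz data).
filter_options = {
    "case_material": 5,
    "nicotine_strength": 6,
    "flavour": 12,
    "battery_size": 8,
    "wire_gauge": 7,
}


def count_filtered_urls(options):
    """Count URLs where each filter is either unset or set to one option.

    Each filter contributes (options + 1) choices; subtract 1 to leave out
    the unfiltered page itself. Parameter order is ignored.
    """
    return prod(n + 1 for n in options.values()) - 1


def count_single_filter_urls(options):
    """Count URLs with exactly one filter applied."""
    return sum(options.values())


total = count_filtered_urls(filter_options)
single = count_single_filter_urls(filter_options)
print(f"All filter combinations: {total}")
print(f"Single-filter URLs only: {single}")
```

With these made-up counts that’s 39,311 filtered URLs, versus just 38 with a single filter applied. Allow multi-select filters or different parameter orders and the number climbs much further.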
On the other hand, SOME of these filters might contain content that you actually want to be indexed and rank, because it has the potential to get traffic. Filtered/parameterised URLs can indeed rank well if they’re handled in the right way. Filters are common for e-commerce sites, and fashion sites often make liberal use of filters. Let’s take a look at the search results for [black jeans] (22,200 ASV).
Both River Island and H&M have parameterised URLs on page #1, so it can absolutely be done. Next, the fashion retailer, have the best solution because the H1 changes based on the filter used, see below;
So we’ve established that parameterised URLs can rank (and rank well), but what does this mean for Vapouriz? Let’s have a look at one of their landing pages. This is the E-Liquids page, which has plenty of potential to rank for generic ‘top level’ keywords, but we’re more interested in the long tail for the purposes of this investigation.
I’ve highlighted the filtering options on the left and opened the ‘flavour profile’ one. Look at all those delicious flavour options… Mmmmm confectionery. There’s also some delicious search volume to capitalise upon here. For example, the ‘tobacco’ filter URL could be used to target [tobacco e liquid] (480 ASV) and other related keywords.
Similarly, the ‘nicotine strength’ filtered URLs could be used to target these kinds of terms;
So why don’t Vapouriz just remove all the parameter strings from the robots.txt file and allow these URLs to be indexed and potentially rank? Well, when you start to combine filters, you can get many different variations of URLs that Google could spend a lot of time crawling (as mentioned above). URLs like this can be created;
You don’t really want a potentially endless number of these URLs being indexed, so the best thing to do would be to disallow the ampersand symbol in the robots.txt file. This would mean that only URLs with a single filter applied could ever be crawled.
This means that any URL containing an ampersand would not be crawled (I think). Vapouriz would get the benefit of new URLs to target long-tail queries e.g. [tobacco e liquid], but avoid the disadvantage of spending crawl budget on useless URLs.
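Given the ‘(I think)’ above, here’s a quick way to sanity check the idea. It’s a minimal sketch, assuming Google-style pattern matching where `*` matches any run of characters and `$` anchors the end of the URL. The rule `/*&` and the example paths are hypothetical illustrations, not Vapouriz’s real file or URLs. As far as I know, Python’s built-in `urllib.robotparser` only does plain prefix matching, so the sketch uses a tiny hand-rolled matcher instead.

```python
import re


def rule_to_regex(rule):
    """Turn a robots.txt path pattern into a regex (Google-style wildcards)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal chunk, then join the chunks with 'match anything'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """True if any Disallow pattern matches from the start of the path.

    Simplification: Allow rules and longest-match precedence are ignored.
    """
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


# Hypothetical rule: block any URL that contains an ampersand.
rules = ["/*&"]

# Hypothetical paths, for illustration only.
one_filter = "/e-liquids?flavour=tobacco"
two_filters = "/e-liquids?flavour=tobacco&strength=6mg"

print(is_blocked(one_filter, rules))   # False: a single filter stays crawlable
print(is_blocked(two_filters, rules))  # True: combined filters are blocked
```

Bear in mind this only helps once the existing parameter rules have been removed from the file; otherwise the single-filter URLs stay blocked as well.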
ALTERNATIVELY, Vapouriz could set up static URLs to target these longer tail keywords, for example;
This is definitely worth doing if there is significant enough search volume to target. In theory, static URLs would probably have a slightly better chance of ranking, as all the metadata and content could be specifically tailored to target the desired keyword. It would also mean internal links could be implemented with keyword-rich anchor text.
You’re welcome, Vapouriz.