About Barry Adams
Barry Adams has been building and ranking websites since 1998. Over the years he’s honed his skills in a wide range of businesses, from small agencies to Fortune 500 multinationals. In 2014 Barry founded specialised consultancy Polemic Digital, delivering specialised SEO services to clients such as the UK’s most-read newspaper The Sun and America’s most popular news channel Fox News. In 2016 his work was recognised with two UK Search Awards, including Best Small SEO Agency.
In addition to his consulting work, Barry lectures on SEO and digital marketing for Ulster University, Queen’s University Belfast, and the Digital Marketing Institute. He travels the world to speak at digital conferences in London, Amsterdam, Dublin, Paris, Milan, New York, and Las Vegas. Barry also serves as Co-Chief Editor at award-winning European digital marketing blog State of Digital, and is on the judging panel for the annual European and US Search Awards.
In his Learn Inbound talk, Barry shows five real-life examples (or six, or maybe seven, who knows) of websites that have totally fucked up their technical SEO without even realising it. From redirect cockups to URL shenanigans, from crawl traps to server configurations, Barry is going to show some of the worst examples of technical SEO tomfoolery he’s come across.
- A successful website is composed of good aesthetics (web design), usability (UX), and performance (marketing)
- Technical SEO needs web developer input and should be built-in from the start
- One piece of content should have one URL, and duplications should be fixed with 301-redirects or ‘rel=canonical’ tags
- Respect Google’s crawl budget: optimise your load speed with time to first byte, total page weight and optimal transfer
Yeah, finally, I made it. It took me, what, seven or eight times. I finally got onstage so here we go. This might be a little bit sweary, I apologize.
No, I don't apologize actually. Fuck that. I've been doing SEO in one way, shape, or form since 1998 and as a result, I've lost quite a lot of my sanity and I have no social filter left. So this is basically just me ranting for about half an hour, or until they drag me off-stage, about all the fuck-ups I've seen over the years where technical SEO things go wrong on websites.
I have massive beef with that because it's just not good enough anymore. First, I'll talk a little bit about why things like technical SEO go wrong so often, because there are some really powerful underlying causes there, primarily because web developers don't see it as their remit. A good, successful website, in my view, has three main components.
First of all, it has to look good. It has to be a nice, pretty, aesthetically-pleasing website because that sends the right trust signals to your audience, it makes sure that your audience believes you are a trustworthy supplier or whatever it is that you're selling. Of course, it also needs to be useable. Being pretty alone is not enough. It needs to be a functional website. People need to be able to click the right buttons, they need to be able to navigate it effectively, and find what they're looking for in a very short amount of time.
And lastly, it needs to perform and by that I mean it needs to be able to get traffic because, "Build it and they will come" is simply not true online. You need to earn that traffic. You need to get traffic to that, which is exactly why we're here. Technical SEO is part of that last aspect. It is a performance marketing-based element and most web developers, if you've got a good one, you'll have one who looks at both the aesthetics and the usability of it, although most of them won't even do the second step.
They'll just look at the web design of it and make sure that it's built exactly as the web designer wants it to look. Digital marketing is seen as an afterthought or outside of the remit of what a web developer is supposed to do. And because technical SEO is part of that digital marketing remit, most web developers pay fuck-all attention to it. And that's where people like me come in.
I come in, I do audits on websites and then I have to tell web developers that they fucked up. And not just in little ways, in quite massive ways. And sometimes, in little details that they really genuinely think aren't much of an issue but they really are. It's little oversights that cause massive, massive problems.
And I'm gonna show you five examples of things I've seen in the wild. Now, I won't name any names, I won't name any websites, but these are all examples of things I've seen in the wild that have gone horrendously wrong and have caused all kinds of problems for these websites when they try to rank in Google search results. The first one is one of those ones that you might not necessarily think is a big issue because it's not something that's on people's radar an awful lot.
This is about duplicate URLs. You take a website with a normal address and a normal page and you get something like that, www.website.com/page. Looks perfectly normal. However, this website might also accept that same URL with a slash at the end.
So you've got two versions there already, both of them serve the exact same content with and without a trailing slash. And then that same website might also be configured to accept versions without the www subdomain. So that's version number three. But, of course, with and without trailing slash so that gives you four versions right there. And now with all this kerfuffle about adopting HTTPS, adopting encryption, we have the version with HTTPS there with and without a trailing slash, and of course with and without the subdomain.
One page, eight fucking versions. This happens, people, way too often, and people don't even realize it's a problem. Most people don't realize that this can be any sort of issue. Search engines will treat a URL as different if it's even one character different.
So with and without a trailing slash, technically two pages, www subdomain or without it, two pages. HTTP, HTTPS, two pages, although Google generally is fairly good at solving that one and they prefer the HTTPS version. But this is all extra work for search engines. This is extra work to solve the problem, to untangle the knot, and then they need to decide which one of those versions they're actually going to rank.
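To make the count concrete, here is a minimal Python sketch that enumerates the combinations described above. The function name `url_variants` is made up for this illustration, and `website.com/page` is the placeholder address from the talk.

```python
from itertools import product

def url_variants(host: str, path: str) -> list[str]:
    """List every scheme / www / trailing-slash combination for one page."""
    schemes = ("http", "https")
    hosts = (host, f"www.{host}")
    bare = path.rstrip("/")
    paths = (bare, bare + "/")
    return [f"{s}://{h}{p}" for s, h, p in product(schemes, hosts, paths)]

variants = url_variants("website.com", "/page")
for v in variants:
    print(v)
print(len(variants), "URLs for one piece of content")
```

Two schemes times two hostnames times two path endings gives the eight versions, before default documents such as index.php are even considered.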
Now, you'd see a lot of websites being very inconsistent as well with which version they use when they link internally to different pages, which one they put into the canonical tag, etc. etc. etc. And then, of course, there's another problem that you sometimes get in that you can add index.php or default.aspx to the end of the URL to get the same page. Default.aspx especially is a real pain in the ass because it means you run on a Microsoft IIS website.
Who here runs their website on Microsoft IIS or .NET? Come on, don't be shy. Because I really pity you. As a general rule, when I quote for an SEO audit on a website, I do a quick check to see what it runs on, and if it's .NET or IIS, I double the quote. I kid you not, I double the quote because it means it'll be a clusterfuck from the start and it just takes a lot more effort to untangle that particular knot.
So how do you solve this? Well, as a general rule, one piece of content, one page, should have one URL, one canonical URL, as we call that. You should use 301 redirects for things like the trailing slashes, the www subdomains, HTTP and HTTPS so that whenever a user or a Google crawler tries to access the non-canonical version, you just 301 redirect to the canonical version.
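The redirect rule can be sketched in Python as follows. This is an illustration only, not a web server configuration: the policy (HTTPS, www subdomain, no trailing slash) and the names `canonical_url`, `redirect_for` and `DEFAULT_DOCS` are assumptions made for the example, and a real site would pick its own canonical form.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed policy for this sketch: HTTPS, www subdomain, no trailing slash.
DEFAULT_DOCS = ("index.php", "default.aspx")

def canonical_url(url: str) -> str:
    """Map any variant of a page URL to its single canonical form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path
    # Drop default document names such as /index.php or /Default.aspx.
    last = path.rsplit("/", 1)[-1]
    if last.lower() in DEFAULT_DOCS:
        path = path[: len(path) - len(last)]
    path = path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, parts.query, ""))

def redirect_for(url: str):
    """Return (301, target) for a non-canonical URL, or None if it is canonical."""
    target = canonical_url(url)
    return None if target == url else (301, target)

print(redirect_for("http://website.com/page/"))
print(redirect_for("https://www.website.com/page"))
```

Because the target is computed in one step, every variant reaches the canonical URL in a single hop rather than through a chain of separate rules.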
And then, of course, have a rel=canonical tag in your meta tags to make sure that any other inadvertent duplication is caught and doesn't cause any potential problems. You've gotta be careful with the canonical tag too, by the way, because what I sometimes see go wrong is that they put the wrong URL in the canonical tag. For example, all the URLs default to the version with the trailing slash but then the canonical tag has the one in it without the trailing slash.
It's those little things that lazy web developers tend to miss that I'm like "Come on, people. This isn't rocket science." There's a tool called DeepCrawl, which I'm a big fan of, which you can use to find these sorts of problems. This one is the dashboard of the DeepCrawl account of a website that has lots of problems, URL duplication being only one of them.
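A mismatch like that is easy to check for. The sketch below uses only Python's standard library; the class and function names are invented for this example and the strict string comparison is a simplification.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")

def canonical_matches(page_url: str, html: str) -> bool:
    """True if the page's canonical tag points at the URL it is served on."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical == page_url

html = '<html><head><link rel="canonical" href="https://www.website.com/page"></head></html>'
print(canonical_matches("https://www.website.com/page/", html))  # trailing-slash mismatch
```

Run against the URL that the site actually serves after redirects, this flags the case where the redirect rule and the canonical tag disagree about the trailing slash.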
A good website report in DeepCrawl, by the way, is mostly green, a nice little green circle. You can barely even see the fucking green part in this circle. 288,000 duplicate pages, 289,000 non-200 pages. Those non-200 ones, by the way, we'll come onto that later.
But the duplicate pages, that's the URL duplication I was talking about. Same content, different URLs. You don't want to see that. That's terrible. So this website required quite a lot of untying of knots. The second problem I come across is one of my favorite ones because most people, again, don't realize it's actually a problem.
Take, for example, a website that sells shoes. This is quite a useable website, you can select what kind of brand of shoes you want, you can select what kind of type of shoe you want, you can select by sizes, etc. etc. That one on the left-hand side goes down a lot more. There's a lot more options but for the purpose of this example, I'll stick to these because it's bad enough with these examples.
So we have 12 types of shoes, we have 30 different brands, and 20 different sizes. That gives us 7,200 different filter options. Each of those is a separate page that Google can crawl and index. That's just one category, 7,200 pages. But that's just the start of it because the category itself has 108 pages of products.
That's a lot of shoes. So we end up with 720,000 potential pages, more or less, for Google to crawl. That gets worse, doesn't it? Oh no, we're not done yet. You can also sort it in all kinds of different ways. By title, by price, the most recent new additions, etc. etc. You've got 11 different ways to sort those 100 pages of products, in addition to all the filters we can apply to them.
So I end up with 7.9 million pages for Google to crawl in one single category on a website that sells shoes. Do you really think Google's going to do that? I wouldn't. I've got better things to do with my time. Google has what it calls "crawl budgets" set aside for every different website, which is a time-based metric and we'll talk about that a little bit more later.
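The arithmetic behind those numbers, written out as a short Python sketch. The talk rounds the 108 pages of products down to 100; the exact figure is shown alongside for comparison.

```python
shoe_types, brands, sizes = 12, 30, 20
sort_orders = 11

filter_combinations = shoe_types * brands * sizes          # 7,200
paginated_rounded = filter_combinations * 100              # 720,000
with_sorting_rounded = paginated_rounded * sort_orders     # 7,920,000

# Using the full 108 pages instead of the rounded 100:
with_sorting_exact = filter_combinations * 108 * sort_orders   # 8,553,600

print(f"{filter_combinations:,} filter combinations")
print(f"{with_sorting_rounded:,} URLs (rounded), {with_sorting_exact:,} URLs (exact)")
```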
Crawl budget basically, in brief, is Google will spend half an hour, 40 minutes, whatever it is, trying to crawl your website and then it just says "Fuck it" and goes and does something else. Whether or not it has crawled every page on your website is entirely irrelevant. It just does its best in the allotted time and then goes on and does something else. So you can imagine, in 8 million pages in just one category, Google is not going to have enough time to crawl all of those.
So the problem there is when you make updates to your product pages, which are beneath the category pages, Google can't see that in any reasonable amount of time. It might take weeks or months for search engines to crawl those updated product pages, see your updated content, and be able to translate that into improved rankings.
So you need to make sure Google doesn't waste time crawling those pages. As a general rule, I'm a fan of using what I call the "hard approach" to these sorts of crawl traps by just blocking them in your robots.txt file. Identify the patterns in the URLs that make up those little filters and just make sure Google never even sees them in the first place so that the only thing Google sees is the standard category page that they would see if you would click on the navigation at the top and then follows the 108 pages underneath there.
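As a rough sketch of that approach, the Python below generates robots.txt rules for a set of filter and sort parameters and checks sample paths against them. The parameter names (`brand`, `type`, `size`, `sort`) and the `page` parameter are hypothetical; a real site would use whatever its URLs actually contain. The matcher is a simplified take on the `*` and `$` wildcards that Google supports in robots.txt and ignores `Allow` rules and user-agent groups.

```python
import re

# Hypothetical filter and sort parameter names; use the ones your site emits.
TRAP_PARAMS = ("brand", "type", "size", "sort")

def robots_rules(params=TRAP_PARAMS) -> str:
    """Build Disallow rules that block any URL carrying a filter parameter."""
    lines = ["User-agent: *"]
    for p in params:
        lines.append(f"Disallow: /*?{p}=")
        lines.append(f"Disallow: /*&{p}=")
    return "\n".join(lines)

def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path with * and $ wildcards into a regex."""
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))

def is_blocked(url_path: str, rules_text: str) -> bool:
    """True if any Disallow rule matches the path plus query string."""
    prefix = "Disallow: "
    for line in rules_text.splitlines():
        if line.startswith(prefix) and rule_to_regex(line[len(prefix):]).match(url_path):
            return True
    return False

rules = robots_rules()
print(rules)
print(is_blocked("/shoes?brand=acme&size=9", rules))
print(is_blocked("/shoes?page=2", rules))
```

In this sketch the plain category page and its paginated pages stay crawlable, while any URL with a filter or sort parameter is blocked.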
And again, you want to use pagination metatags to tell Google that it is looking at a series of paginated pages. So again, it will treat it in the right way and give those URLs the proper priority when it crawls them. Again, you can use DeepCrawl to identify whether or not you actually have this problem.
This is a graph, again, you don't want in the DeepCrawl report because that's an exponential growth of pages. Now, I'm lucky enough, I don't have a paid DeepCrawl account. They like me a lot so they give me a free account. I had to stop this crawl at 12 million URLs because it was just killing their servers.
That graph just kept on going and going and going six levels deep and with five and a half million pages already. That's not good, that's not good. That's not the sort of graph you want to see. Most of your pages, by the way, should be two or three levels deep anyway so this has, again, a lot more problems than just the faceted navigation issues. Make sure that you only allow Google to crawl those pages that you actually want it to do something with.
All those different filter options, all those different sorting options, there's no need for search engines to see those. They're great for users, they're great for usability, but search engines have absolutely fuck-all use for them so make sure they don't waste their time crawling them. Send those search engine crawlers directly to the right pages. Next one is one I came across recently which really cracked me up.
This is when internal redirects go wrong. Everybody knows the standard redirects. You have an old page, you have a 301 redirect to the new page. There's roughly two kinds of redirects. There's a 301, there's 302. 301 indicates a permanent redirect. Basically, you're telling Google "This old page is gone. I want you to permanently replace it with this page in your index."
Your 302 redirect is a temporary redirect where you tell Google "Yes, I'm sending you to this page but I want you to keep the old page in the index." Most of the time, you'll want to use the 301 permanent redirect but there are some use cases for using the 302 temporary redirect and I'll show you one of those later on. So this is a fairly okay redirect.
Old page redirects to the new page. Ideally, when you change pages on your own website, you also want to make sure you update all your navigation links and all your internal links so that you don't have any internal redirects, that only links from external websites end up in redirects, because every redirect loses a little bit of link value. It's roughly 15%.
It's the PageRank damping effect. Long story, won't go into that, but as a general rule, you want to minimize the number of redirects that both users and search engines have to hop through before they end up on your final destination page. This one's okay. This one's not so okay. It's two hops right there. You go from old page to new page but then the new page also has an HTTPS redirect. So that's an extra hop.
That's not good because then you lose double the link value that you would have with a single redirect. And then it can get worse. You can have a 302 redirect from the old page to the HTTPS version of the old page, then have a 301 redirect to the new page and then another redirect to the HTTPS version of the new page. You think "Really? Does that happen?" Yes, it happens just by lazy web developers putting lazy redirect rules into the web server configuration.
And then I'm going to show you one which I've actually seen recently that made me want to cry. I think I might actually have shed a few tears when I saw this one. And I had to present this to the client as part of the audit and I put it up on a slide and they just stared at it with this look of like... And they realized it was bad. They didn't know what the problems with redirects were but they realized it was bad.
It was "Yeah, that's not good." Search engines don't bother crawling these. After about three or four hops, a search engine is going to be like "You know what? I don't have a good feeling about this. I'm going to stop this. I might be in a redirect crawl, a redirect loop." So about three or four hops, at max, that's when a search engine says "Yeah, fuck it. I'm going to go." They never reach the final destination page so that page might as well not exist.
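The crawler behaviour described here can be modelled with a small Python sketch. The hop limit of four is an assumption taken from the "three or four hops" in the talk, not a documented figure, and the redirect map and function name are invented for the example.

```python
def follow_redirects(start: str, redirects: dict[str, str], max_hops: int = 4):
    """Follow a redirect map.

    Returns (final_url, hops), or (None, hops) if the crawler gives up
    because of too many hops or a redirect loop.
    """
    seen = {start}
    url, hops = start, 0
    while url in redirects:
        if hops == max_hops:
            return None, hops      # too many hops: give up
        url = redirects[url]
        hops += 1
        if url in seen:
            return None, hops      # redirect loop
        seen.add(url)
    return url, hops

chain = {
    "http://site/old": "https://site/old",
    "https://site/old": "http://site/new",
    "http://site/new": "https://site/new",
}
print(follow_redirects("http://site/old", chain))
```

With the three-hop chain above the final page is still reached; add a couple more hops and the function returns `None`, which is the "might as well not exist" case.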
You want to make sure to minimize your redirects. Every time you update your website, every time you do a new design, you change pages, etc. etc., what you actually also want to do is go back to those historic versions and the historic redirects that you might have had from old versions of your website. Yes, I know it's a pain in the ass because your website might have been running for 15 years or longer but those old redirects still have some value to them.
Revisit them once in a while, make sure that they point to the newest version of that content in one hop. Don't make them go through multiple hops because you lose link value, you lose ranking power. Again, DeepCrawl can help you with that. That's the bottom half of that particular nice little pie chart. 289,000 non-200 pages. These are even worse, these are 302 redirects that it found.
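One way to do that housekeeping is to flatten the redirect map so every old URL points straight at its final destination. A minimal Python sketch, assuming the redirects are available as a simple source-to-target dictionary (the paths here are made up):

```python
def flatten_redirects(redirects: dict[str, str]) -> dict[str, str]:
    """Point every old URL straight at its final destination (one hop)."""
    flat = {}
    for source in redirects:
        target, seen = redirects[source], {source}
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {source}")
            seen.add(target)
            target = redirects[target]
        flat[source] = target
    return flat

history = {
    "/2005/shoes": "/2010/shoes",
    "/2010/shoes": "/2015/shoes",
    "/2015/shoes": "/shoes",
}
print(flatten_redirects(history))
```

Every entry in the flattened map resolves in a single hop, so the oldest links no longer pass through each intermediate version of the site.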
Nearly 300,000 on a single website. That's not good. So if you see a graph like that when you do a crawl on your website, you know you've got serious issues. When I do an audit on your website and I show up with graphs like that, you probably want to start hiding because it's not going to end in a good way.
Next up, load speed. This is one of my favorite ones as well. All of them are favorite ones but this one especially is close to my heart because I have constant discussions with my clients about optimizing for load speed. We talked about crawl budget before. Crawl budget is a time-based metric. Google spends a certain amount of time crawling your website before it goes on and does something else.
If you have a very fast-loading website, it means that crawl budget can be used more effectively. Search engines will spend less time crawling each page so they can crawl more pages in the same amount of time. There's a tool that I like to use to analyze websites for load speed which is WebPageTest.
I like it because it gives you graphs like this. Nice little waterfall overview of exactly all the different elements that go onto a page and how fast they all load. This is a webpage that has a total load time of 4.2 seconds which is not perfect but it's quite good, 2.8 seconds until the document object model loads, which is basically the HTML outline of a document which is what search engines look at when they try to understand what a web page is actually about.
So four seconds, this is pretty much done. Not too shabby. This is a client of mine and I have constant discussions with this client about optimizing for load speed and I get an incredible amount of pushback from them and I haven't even hidden who they are because they keep pissing me off about this.
It's the Belfast Telegraph. Yes, ladies and gentlemen. And this is a constant debate because most of the load speed is advertising slots. For them, it's commercial suicide to reduce the amount of ads on a page and for me, it's common fucking sense. You're killing your website by throwing so many ads on it. You're turning your audience away. They're like "No, we can't reduce the ad slots because it means we get less income."
Yes, fine, on the short-term, you take a hit. On the long term, your audience will come back and read more of your content so you will increase your advertising revenue by decreasing your ad slots and optimizing your load speed. Talking in circles. That's what happens when you let commercial people make decisions about your website.
Don't make that mistake. A fast-loading website is not even optional anymore. Google is now more or less forcing you to either go really, really fast or to adopt Accelerated Mobile Pages (AMP), which is Google's own new standard for mobile-optimized content, and you don't want to go down the AMP route because you can show a lot fewer ads on an AMP page than you can on a normal web page.
And it's a lot of work, as well, to implement AMP on your website. So you've got to optimize your load speed. Roughly speaking, there are a couple of major elements to optimizing your load speed. Time to first byte, that is the amount of time that elapses between Google or a browser making a request and getting any sort of response back from your web server. That is very important because if there's a slow time to first byte, Google will decrease the rate at which it crawls your website.
Again, we're talking about crawl budget here. If there's a slow time to first byte, Google thinks your website might be struggling to fulfill all the requests that are being made of it and it doesn't want to crash your website so it scales down the crawl rate. If you have a very fast time to first byte, Google knows that your website is functioning at an optimal level and will therefore crawl your website at a much faster rate because it knows it's not overloading your website.
So that time to first byte is quite important and you can see on the graphs, by the way, this one, time to first byte is about 294 milliseconds, which sounds okay. It's actually a little bit on the slow side. You want to aim for about 100 milliseconds or less. This one, I can't even read it on here, 675 milliseconds. So you can imagine Google thinks that website's just about to crash.
Next, of course, the total page weight. Make sure there's as little data as possible that has to be sent over the line so you have to minify your code. Make sure your HTML code isn't massively bloated. Again, this is why I have a problem with ASP.NET websites, because they tend to throw view state code in the actual source code. I don't know if you've actually been on a Microsoft website and you look at the source code.
There's this huge block of total gibberish code in there. It's called the "view state." It's Microsoft's way of storing session information about that particular visit. The largest one I've ever seen is 400 kilobytes, 400 kilobytes of view state code in every single page on a website, that's 400 kilobytes of data that needs to be transmitted over the line for every single page.
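For a rough idea of how such a measurement works, here is a Python sketch using the standard library. It times the gap between sending a request and receiving the response headers on an already-open connection, so DNS lookup and connection setup are excluded; tools such as WebPageTest may count those differently. The function names are invented for this example and the 100 ms target is the figure from the talk, not an official threshold.

```python
import http.client
import time

def measure_ttfb_ms(host: str, path: str = "/", use_tls: bool = True) -> float:
    """Milliseconds between sending a GET request and receiving the response headers."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, timeout=10)
    try:
        conn.connect()                  # connection setup is not part of the timing
        start = time.perf_counter()
        conn.request("GET", path)
        conn.getresponse()              # returns once the status line and headers arrive
        return (time.perf_counter() - start) * 1000
    finally:
        conn.close()

def rate_ttfb(ms: float, target_ms: float = 100) -> str:
    """Compare a measurement against the target mentioned in the talk."""
    return "ok" if ms <= target_ms else "slow"

# Example with the two readings from the slides:
print(rate_ttfb(294), rate_ttfb(675))
```

A single reading is noisy, so in practice you would take several measurements from more than one location before drawing conclusions.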
Please. Just makes me despair, it really does. Images as well. You all like these big banner images on websites but make sure to optimize for the screen that someone is using to load it. You don't want to have a three or four second wait before your background image loads. And that's in a good scenario.
There's no need for that. And, for fuck's sake, get rid of those carousels. Nobody looks at those carousels of images on your website. Nobody. I know it's a good way to keep the HiPPOs pleased, to keep the managers happy. "Yes, my section, my part of the business is on the carousel." So they feel important. But for fuck's sake, nobody looks at them.
If you have social sharing functionality on your website, try not to use third party plugins. Anybody here use the AddThis plugin? AddThis is quite a popular plugin to put social sharing functionality on your website like "Tweet this link," "Share it on Facebook." AddThis is spyware, basically.
They will track all your visitors and sell all the data to advertisers and other sorts of data mining companies so don't use AddThis. It's terrible and it's very slow. It's tedious to load. It's just, yeah, it's not a good plugin. And enable compression. When data is compressed, it takes up a lot less space and therefore will be transmitted across the line a lot faster.
Also, if you can, try to use a content delivery network because it means the hops between the visitor and where your website is being stored or served from are a lot fewer and that tends to help with load speed of course as well. Lastly, I want to talk about a fun one that I came across recently again.
It's the second time I've come across this so it's not that common but it is probably one you want to keep an eye on. Automatic country redirects. What do I mean by that? Well, you take a website, internationalsite.com, and someone visits it from Ireland and they get redirected with a 302 redirect to the Irish, English language version, which is perfectly fine.
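The effect of compression is easy to see with Python's built-in gzip module. The markup below is made-up, deliberately repetitive sample data, so the ratio it produces is an illustration rather than what a real page would achieve. On a live site the web server applies the compression and signals it with a `Content-Encoding` header.

```python
import gzip

# Made-up, repetitive markup standing in for a product listing page.
html = ("<div class='product'><span>Shoe</span></div>\n" * 500).encode("utf-8")

compressed = gzip.compress(html)
ratio = len(compressed) / len(html)
print(f"{len(html)} bytes -> {len(compressed)} bytes ({ratio:.1%} of original)")
```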
This is one of those cases where a 302 redirect is the right option because you want that main website to be the one that Google shows in search results. That's why you serve the 302 redirect, because only the Irish visitors will be sent on to the Irish version of your website and the U.S. visitors will also be 302 redirected to the U.S. version of the website.
So far, so good. Your website is automatically configured to do this based on the IP address of the visitor and IP addresses can be geospecified. Google knows exactly which IP address belongs to which country. Your website will know exactly which IP address belongs to which country, and therefore this is a fairly good way to make sure your visitors are looking at the right content immediately without having to click on a flag or a country selector.
The problem is that Google crawls from American IP addresses, mostly. So when Google comes and visits this website, it gets the same country redirect and looks at the US version. If you've implemented the country redirect in such a way that this happens when any page is requested on your website, which lazy web developers tend to do, you end up with the scenario that Google can only see your U.S. website and it doesn't see any other country section of your website.
If it can't see it, it can't rank it. That's a problem. You can understand. Not too hard to get around. You can just make a very simple exception for Googlebot user agents. Google has a huge list of different crawlers that it uses that all have their own user agent string. Just make sure there's an exception in there for Googlebot user agents or, alternatively, your country redirect only happens once and then there's maybe a language selector in your top right-hand corner or wherever it ends up being, and then users and search engines as well can follow that and look at other pages without being subjected to that country redirect.
It's little things like this, little things like this that can have a huge impact on how your website is being crawled and therefore, how it's being ranked in search engines. These are by far not the only scary things that can go wrong. I've seen some things in my life you wouldn't believe, the things that can go wrong with websites.
404 pages. Another one of my favorite problems, to put it mildly. This is when a web developer thinks they're being clever: any time a page is requested that doesn't exist, the site redirects with a 302 status code to a nice, friendly 404 page. It's not really a 404 page because it returns the 200 OK status code, so Google thinks it's just a normal page that needs to be indexed.
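Both workarounds can be combined in one decision function. The Python sketch below is an illustration under stated assumptions: the country sections, the function name, and the `already_redirected` flag (which a real site might back with a cookie) are all hypothetical, and the user agent check is a plain substring match rather than a verified crawler lookup.

```python
# Hypothetical country sections; real ones depend on the site.
COUNTRY_SECTIONS = {"IE": "/ie/", "US": "/us/"}

def country_redirect(path: str, country: str, user_agent: str,
                     already_redirected: bool = False):
    """Return (302, location) for a first visit to the homepage, else None."""
    if "googlebot" in user_agent.lower():
        return None                     # exception for Googlebot user agents
    if already_redirected or path != "/":
        return None                     # redirect once, and only from the homepage
    section = COUNTRY_SECTIONS.get(country)
    return (302, section) if section else None

googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(country_redirect("/", "IE", "Mozilla/5.0"))
print(country_redirect("/", "US", googlebot))
print(country_redirect("/ie/products", "US", "Mozilla/5.0"))
```

Because deeper pages are never redirected, a crawler arriving from a U.S. address can still reach the Irish section by following links or the language selector.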
Fortunately, Google is fairly smart and will identify it as what they call a "soft 404." And you'll see that in Google Search Console. Structured data implemented the wrong way. Internal site search pages that are crawlable by search engines. Hreflang meta tags that point to the wrong version, etc. etc. etc.
I could stand here literally all fucking day talking about all these problems but I think I've bored you enough with the things that I encounter in my daily life that sometimes make me lose the will to live. But on the other hand, it keeps me employed as well so I shouldn't be too bothered by it. But basically, to wrap it up, technical SEO is not something you sprinkle over a website afterwards.
It should be something that you build in on a foundational level from the start because if you don't, you get someone like me to come in afterwards and I'm gonna be shouting at your web developers until they cry like babies. I like that a lot. It gives me a lot of pleasure. But I imagine they won't enjoy it as much so make sure you never get into that situation. Thank you very much.
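For the soft 404 case described above, a quick check is to request a URL that cannot exist and look at the status codes that come back. The Python sketch below only classifies a list of status codes that you have already collected; the function name and labels are made up for this example.

```python
def classify_missing_page(status_chain: list[int]) -> str:
    """Classify the responses seen when requesting a URL that does not exist.

    status_chain holds each status code in order, e.g. [302, 200] for a
    redirect to a friendly error page that answers 200 OK.
    """
    final = status_chain[-1]
    if final in (404, 410):
        return "hard 404"
    if final == 200:
        return "soft 404"
    return "other"

print(classify_missing_page([302, 200]))
print(classify_missing_page([404]))
```

The friendly error page itself is fine; what matters is that the response for the missing URL carries a 404 status instead of a redirect followed by 200 OK.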