Technical SEO for Regular People: Part II: All About URLs
In this second article of our “Technical SEO for Regular People” series, we’re starting our deep dive into the "how" of technical SEO, beginning with URLs. Not sure what a URL is or why it matters? Or are you a seasoned vet with an elevator pitch for your favorite protocol? Either way, this article is for you.
Technical SEO for Regular People
- Part I: The What, Why, and How of SEO
- Part II: All About URLs
- Part III: Making Sense of the Markup (Markup, Metadata, and Structured Data)
- Part IV: Sending the Right Signals (Redirects and Status Codes)
- Part V: Mobile or Bust (Mobile-first indexing, Responsive Web Design, and Page Speed)
- Part VI: Security—It’s Not Just for Banks (HTTPS, SSL, and Malware)
URLs—What Are They and Why Do They Matter for SEO?
Now that we’re all on the same page about SEO—what it is and why it’s important for your organization—it’s time to dig into the finer details of technical SEO. We're going to start where all journeys on the web start—the URL.
What is a URL?
URL is an abbreviation for Uniform Resource Locator: a structured, human-readable address that tells a computer where to find a page or file on the web. Think of URLs as the street addresses of the internet, allowing websites to be found by people and search engines alike. It’s the thing you type into the address bar of your browser and the thing that Google organizes your entire site around, and it looks something like this: https://www.google.com
Parts of a URL
URLs have four key parts: a protocol, a domain, a top-level domain, and a path:
protocol://domain-name.top-level-domain/path
- The protocol tells your computer how it should retrieve the specified website: unsecured http://, secured https://, etc. In our example above, the protocol is the https:// bit at the start.
- The domain name (or hostname, for you Google Analytics buffs) is the memorable, often branded, part of the URL. It’s the Coke in https://www.coke.com, the Toyota in https://www.toyota.com, or, in the above example, the Google in https://www.google.com.
- The top-level domain (abbreviated TLD) is an organizational structure for a larger group of domains. There’s .com for commercial sites, .edu for educational sites, etc. It has a similar function to zip codes: there are many 1600 Pennsylvania Avenues, but only one in the 20500 zip code. Same goes for domains: there can be many Google websites, but only one at google.com. Oftentimes, organizations and businesses will buy the other TLD counterparts to their primary domain to prevent competitors (and spammers) from stealing their branded traffic (e.g. google.co, google.net, etc.).
- The path is the set of folders and subfolders (aka directories) on your site. It’s similar to the way your computer houses files within folders and subfolders, but on the internet. Parameters for things like campaign URLs, session IDs, etc., are appended right after the path, following a question mark (e.g. /path?utm_source=newsletter).
URL Structure
Poorly structured URLs make your site harder for both people and search engines to navigate, understand, and index. Here are a few tips to ensure you’re getting the most out of your URLs:
- Keep them simple and relevant; avoid overly dynamic URLs (long strings of parameters and IDs);
- Keep them user-readable—use lowercase with hyphens as separators, and avoid special characters;
- They should reflect your site’s information architecture;
- They can include target keywords, but avoid keyword stuffing.
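To make the readability tips concrete, here's a minimal sketch of turning a page title into a URL-friendly path segment (often called a slug). The function name slugify is hypothetical, and the rules it applies (ASCII only, lowercase, a single hyphen in place of anything else) are one reasonable reading of the tips above; most content management systems ship their own version.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a page title into a lowercase, hyphen-separated URL path segment."""
    # Split accented letters into letter + accent, then drop the accents (é -> e)
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    text = text.lower()
    # Replace every run of characters that aren't letters or digits with one hyphen
    text = re.sub(r"[^a-z0-9]+", "-", text)
    # Trim any hyphens left over at either end
    return text.strip("-")


print(slugify("Technical SEO for Regular People: Part II!"))
# technical-seo-for-regular-people-part-ii
```

Collapsing runs of punctuation and spaces into a single hyphen keeps the slug free of special characters and double hyphens, which is what makes it easy to read and type.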
And for our advanced readers, be sure to add this listicle on SEO best practices for structuring URLs to your reading list this week.
Canonical URLs
Domain names aren’t case-sensitive, but paths are, so Google and other search engines can treat URLs that differ only in the case of their path as separate copies of a page (e.g. https://www.example.com/Shoes as one version, https://www.example.com/shoes as another version, https://www.example.com/SHOES as yet another version, etc.). Search engines also treat URLs with different parameters as separate copies of the same page. To prevent duplicates of your page from competing with each other on Google and splitting your ranking signals between them, it’s vital to indicate which of the copies is the “correct” or primary version by specifying a canonical URL.
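Strictly speaking, the protocol and domain name of a URL are case-insensitive; it's differences in the path and in the parameters that produce genuinely distinct URLs. One quick way to audit your own URL list for duplicates is to normalize each URL and see which ones collapse together. This is a minimal sketch, not how Google decides what counts as a duplicate, and it assumes that utm_ campaign parameters and a hypothetical sessionid parameter never change what the page shows.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: on this hypothetical site, these parameters never change page content
IGNORED_PARAMS = {"sessionid"}


def normalize(url: str) -> str:
    """Collapse URL variants that point at the same content (under the assumptions above)."""
    parts = urlsplit(url)
    # The protocol (scheme) and domain are case-insensitive, so lowercase them;
    # the path is case-sensitive, so it is kept exactly as-is
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    # Drop campaign (utm_*) and session parameters; sort the rest for a stable order
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if not key.startswith("utm_") and key not in IGNORED_PARAMS
    )
    # The fragment (#...) is never sent to the server, so it is dropped as well
    return urlunsplit((scheme, host, parts.path, urlencode(kept), ""))


variants = [
    "https://www.example.com/shoes?utm_source=email",
    "https://WWW.Example.com/shoes?sessionid=abc123",
    "https://www.example.com/shoes",
    "https://www.example.com/Shoes",
]
print(sorted({normalize(u) for u in variants}))
# ['https://www.example.com/Shoes', 'https://www.example.com/shoes']
```

Four variants collapse to two: the case of the domain and the tracking parameters don't matter, but /Shoes and /shoes remain distinct URLs, which is exactly the kind of duplicate a canonical URL resolves.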
Canonical URLs, specified using a <link rel="canonical"> element in the head of your web page’s code, are a way to tell Google and other search engines which copy of a specific page is the primary version. Google treats this as a strong hint rather than a strict rule. To ensure the correct versions of your pages are being indexed by Google, follow these best practices:
- Add a canonical URL to every page, every time (except in non-production environments);
- Don’t rely solely on 301 redirects to prevent duplicate content;
- Use absolute URLs that specify a protocol (e.g. https://), as Google treats http:// and https:// as separate versions;
- Canonical URLs should be subdomain-specific, as Google treats www/non-www as separate.
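Putting these best practices together, here's a minimal sketch of generating the canonical element on the server, assuming a hypothetical site that serves everything over HTTPS on the www subdomain. The function name and both settings are illustrative, and the sketch assumes query parameters never change page content, so it drops all of them.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumptions for a hypothetical site: HTTPS only, and www is the preferred subdomain
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"


def canonical_link_tag(request_url: str) -> str:
    """Return a <link rel="canonical"> element for the page at request_url."""
    path = urlsplit(request_url).path or "/"
    # Always emit an absolute URL with an explicit protocol and subdomain;
    # the query string and fragment are dropped (assumes they don't change content)
    canonical = urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, "", ""))
    return f'<link rel="canonical" href="{escape(canonical)}">'


print(canonical_link_tag("http://example.com/blog/technical-seo?utm_source=twitter"))
# <link rel="canonical" href="https://www.example.com/blog/technical-seo">
```

If parameters on your site do select different content (pagination or product variants, for instance), keep those in the canonical URL instead of discarding them.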
For ecommerce and other complex sites, canonicalization is even harder to do correctly than it is to pronounce.
Sitemaps
A sitemap is literally a map of your website, with each asset (pages, documents, etc.) listed out in a search engine-friendly format. Sitemaps are key to helping search engines find all the content on your site. Google discovers pages mainly by following links, so without a sitemap, pages that are poorly linked or buried several levels deep may be found late or not at all. And if your site is new or was recently redesigned, submitting a sitemap to Google will help jumpstart indexation. Sitemaps are fairly straightforward, but here are a few key points to keep in mind:
- Follow the XML sitemap format (and validate it!);
- Keep it to 50,000 URLs (and 50MB uncompressed) per sitemap;
- If you need more than one sitemap, add a sitemap index file (commonly named sitemap_index.xml) at the root directory of your site that links to all your other sitemaps;
- If you have multiple subdomains, you’ll need a separate sitemap for each.
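For larger sites, the splitting and indexing rules above can be sketched with Python's built-in xml.etree.ElementTree. This is a minimal illustration assuming a flat list of absolute page URLs; the file names sitemap-1.xml, sitemap-2.xml, etc. are hypothetical, and it enforces only the 50,000-URL limit, not the file-size limit.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-sitemap URL limit from the sitemap protocol


def build_urlset(urls):
    """Return one <urlset> sitemap document (as UTF-8 bytes) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=NS)
    for url in urls:
        # ElementTree escapes characters such as & automatically
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def build_sitemaps(urls, base_url):
    """Split urls into sitemaps of at most MAX_URLS; add an index file if needed.

    Returns a dict mapping each (hypothetical) file name to its XML bytes.
    """
    chunks = [urls[i:i + MAX_URLS] for i in range(0, len(urls), MAX_URLS)]
    if len(chunks) <= 1:
        return {"sitemap.xml": build_urlset(urls)}

    files = {}
    index = ET.Element("sitemapindex", xmlns=NS)
    for n, chunk in enumerate(chunks, start=1):
        name = f"sitemap-{n}.xml"
        files[name] = build_urlset(chunk)
        # The index lists the absolute URL of every child sitemap
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = f"{base_url}/{name}"
    files["sitemap_index.xml"] = ET.tostring(index, encoding="utf-8", xml_declaration=True)
    return files


pages = [f"https://www.example.com/products/{i}" for i in range(120_000)]
files = build_sitemaps(pages, "https://www.example.com")
print(sorted(files))
# ['sitemap-1.xml', 'sitemap-2.xml', 'sitemap-3.xml', 'sitemap_index.xml']
```

Each generated file still needs to be uploaded to the root of your site and validated, and the index (or the single sitemap) submitted to Google.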
Up Next: Making Sense of the Markup
In the next article of our “Technical SEO for Regular People” series, we’ll help you make sense of the markup behind your site—and how keeping it neat and tidy makes for better rankings.