How URL Shorteners Work
Have you ever clicked a short link and wondered how it knows where to take you? Short URLs, like those from Bitly or TinyURL, seem simple, but they hide a lot of work. Let’s look at how URL shorteners work.
What URL Shorteners Do
A URL shortener does one main thing: it takes a long URL and gives you a much shorter one that points to the same place. For example, a long Amazon link can turn into something much smaller. When you click the short link, it takes you to the same place as the long one.
Handling Many URLs
Imagine you are building your own URL shortener. You need to handle millions of URLs every day. Think about these numbers:
- 100 million new URLs each day
- Over 1,000 new short links every second
- Over 10,000 clicks per second
Storage
Over 10 years, you would need to keep track of 365 billion URLs. Just storing these URLs would take a lot of space.
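The numbers above can be checked with a few lines of arithmetic. This is only a rough sketch: the 500 bytes per record is an assumed average (short code, long URL, and some metadata), not a figure from any real service.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

new_urls_per_day = 100_000_000
writes_per_second = new_urls_per_day / SECONDS_PER_DAY

years = 10
total_urls = new_urls_per_day * 365 * years

# Assumption for illustration only: average size of one stored record.
ASSUMED_BYTES_PER_RECORD = 500
total_bytes = total_urls * ASSUMED_BYTES_PER_RECORD

print(f"{writes_per_second:,.0f} new URLs per second")
print(f"{total_urls:,} URLs over {years} years")
print(f"{total_bytes / 1e12:,.1f} TB at {ASSUMED_BYTES_PER_RECORD} bytes per record")
```

Changing the assumed record size changes the storage estimate in direct proportion, so treat the last line as an order-of-magnitude guide.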
How to Make Short URLs
How do you make these short URLs? Short URLs use both numbers and letters. There are 62 possible characters:
- Numbers 0-9
- Lowercase letters a-z
- Uppercase letters A-Z
With these characters, how short can we make the URLs? To figure this out, let’s work backward. We need enough combinations to cover all the URLs.
- One character gives 62 options.
- Two characters give about 3,800 options (62 squared).
- Three characters give about 238,000 options (62 cubed).
- Seven characters give over 3 trillion options (62 to the 7th power).
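Seven characters are enough. Six characters give only about 57 billion options, which is less than the 365 billion URLs we need to cover, while seven give about 3.5 trillion.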
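Working backward like this can also be done in code. The sketch below finds the shortest length whose number of combinations covers a target count; the function name `min_code_length` is made up for this example.

```python
ALPHABET_SIZE = 62               # 0-9, a-z, A-Z
URLS_TO_COVER = 365_000_000_000  # 100 million per day for 10 years

def min_code_length(total, base=ALPHABET_SIZE):
    """Return the smallest length n such that base ** n >= total."""
    length = 1
    while base ** length < total:
        length += 1
    return length

# Show how quickly the number of combinations grows.
for n in range(1, 8):
    print(f"{n} characters: {ALPHABET_SIZE ** n:,} options")

print("Shortest length that fits:", min_code_length(URLS_TO_COVER))
```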
Two Ways to Create Short URLs
There are two main ways to make short URLs:
- Hashing the long URL
- Counting
Hashing
The first way is to use a hash function. A hash function turns the long URL into a fixed-length string of characters that looks random. But this string is usually too long. A common hash such as MD5, written in hexadecimal, gives you 32 characters, and we only want seven. So, you could take the first seven characters.
The problem? What if two different URLs give you the same first seven characters? This is called a collision. If this happens, you have to try again until you find seven characters that are not being used. This means checking a database every time you create a short URL, which can take a lot of time.
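Here is a minimal sketch of the hash-and-retry idea, assuming MD5 as the hash and a plain dictionary standing in for the database. The function name and the way retries are salted (appending `#1`, `#2`, and so on) are choices made for this example, not a standard. Note that hexadecimal output only uses 16 different characters, so seven hex characters give far fewer combinations than seven base 62 characters.

```python
import hashlib

# In-memory stand-in for the database table: short code -> long URL.
store = {}

def shorten_with_hash(long_url, store, length=7):
    """Hash the URL, keep the first `length` characters, retry on collision."""
    attempt = 0
    while True:
        # On a retry, add a salt so the hash produces a different candidate.
        salted = long_url if attempt == 0 else f"{long_url}#{attempt}"
        code = hashlib.md5(salted.encode("utf-8")).hexdigest()[:length]

        existing = store.get(code)  # one lookup per attempt
        if existing is None:
            store[code] = long_url  # code is free, so claim it
            return code
        if existing == long_url:
            return code             # this URL was already shortened
        attempt += 1                # collision with another URL: try again

print(shorten_with_hash("https://example.com/a/very/long/path", store))
```

Every call does at least one lookup, and each collision adds another, which is the cost described above.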
Counting
The second way is simpler. Instead of hashing, you just count. Each time someone wants to shorten a URL, you give it the next number in order: URL number one, number two, number three, and so on. Then, you turn that number into what’s called base 62.
Here’s how base 62 works:
Let’s say you’re on URL number 11,157. To convert it to base 62:
- Divide 11,157 by 62. You get 179 with a remainder of 59.
- Divide 179 by 62. You get 2 with a remainder of 55.
- Divide 2 by 62. You get 0 with a remainder of 2.
Now, read the remainders backward: 2, 55, 59.
In base 62, with the characters in the order 0-9, then A-Z, then a-z:
- 2 stays as 2
- 55 becomes t
- 59 becomes x
So, URL number 11,157 becomes 2tx. Your short URL is tinyurl.com/2tx.
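The same repeated division can be written as a short function. This sketch assumes the character order 0-9, then A-Z, then a-z, which is the order that turns 55 into t and 59 into x; a different order would give different short codes.

```python
import string

# Assumed digit order: 0-9 (values 0-9), A-Z (10-35), a-z (36-61).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

def to_base62(number):
    """Convert a non-negative integer to a base 62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        # divmod gives the quotient and the remainder in one step.
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    # Remainders come out last digit first, so read them backward.
    return "".join(reversed(digits))

print(to_base62(11157))  # prints 2tx
```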
Benefits of Counting:
- No collisions
- No database lookups to check for collisions
Downsides of Counting:
- Need a way to make unique numbers across many servers
- Security risk: Someone could guess the next short URL
Still, for most cases, counting is the better way.
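One possible way to get unique numbers across many servers is to hand each server its own block of numbers. The sketch below is only an illustration under that assumption: `RangeAllocator` and `ServerCounter` are hypothetical names, and in a real system the allocator would be a shared service or database row rather than an object in memory.

```python
import threading

class RangeAllocator:
    """Hands out non-overlapping blocks of numbers (stand-in for a shared service)."""

    def __init__(self, block_size=1000):
        self._next_start = 1
        self._block_size = block_size
        self._lock = threading.Lock()

    def next_block(self):
        # The lock ensures two callers never receive the same block.
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return start, start + self._block_size  # half-open range [start, end)

class ServerCounter:
    """Counts through its own block and asks for a new one when it runs out."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._current, self._end = allocator.next_block()

    def next_id(self):
        if self._current >= self._end:
            self._current, self._end = self._allocator.next_block()
        value = self._current
        self._current += 1
        return value

allocator = RangeAllocator(block_size=3)
server_a = ServerCounter(allocator)
server_b = ServerCounter(allocator)
print([server_a.next_id() for _ in range(4)])  # prints [1, 2, 3, 7]
print([server_b.next_id() for _ in range(4)])  # prints [4, 5, 6, 10]
```

With this approach the numbers are unique but no longer strictly in order across servers, and a server that stops early leaves part of its block unused.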
What Happens When You Click a Short URL?
Making the short URL is only half the problem. The other half is what happens when someone clicks it. When you click a short URL, the system needs to find the original URL and send you there. This happens a lot more often than making new short URLs, so it needs to be fast.
Here’s how it works:
1. Check the cache: The system first checks if it has seen this short URL before. If it has, it sends you to the original URL right away.
2. Check the database: If the short URL is not in the cache, the system looks it up in the database.
3. Cache the result: The system saves the result in the cache for next time.
4. Redirect: The system sends you to the original URL.
The redirect uses a 301 status code. This tells your browser that the URL has moved permanently. Your browser remembers this, so it might skip the URL shortener next time.
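The four steps above can be sketched as a single function. This is a simplified illustration: two dictionaries stand in for the cache and the database, the function name `resolve` is made up, and returning 404 for an unknown short URL is an assumption of this sketch.

```python
# Stand-ins for a real cache and database.
cache = {}
database = {"2tx": "https://example.com/some/very/long/path"}

def resolve(short_code, cache, database):
    """Return an HTTP status code and headers for a clicked short URL."""
    # 1. Check the cache.
    long_url = cache.get(short_code)
    if long_url is None:
        # 2. Not cached, so check the database.
        long_url = database.get(short_code)
        if long_url is None:
            return 404, {}  # unknown short URL (assumed behavior)
        # 3. Cache the result for next time.
        cache[short_code] = long_url
    # 4. Redirect: 301 tells the browser the move is permanent.
    return 301, {"Location": long_url}

print(resolve("2tx", cache, database))  # first call reads the database
print(resolve("2tx", cache, database))  # second call is served from the cache
```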
The Scale of Things
A single database might handle around 10,000 lookups per second, depending on the hardware and setup. To handle more, you add read-only copies of the database, called replicas. Eventually, you need to split the data across multiple databases. This is called sharding.
Sharding involves:
- Distributing data evenly
- Routing requests to the right database
- Handling database failures
- Rebalancing when you add more servers
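To make routing and rebalancing concrete, here is a minimal sketch that picks a database by hashing the short code and taking the remainder. Using CRC32 and a simple remainder is an assumption chosen for brevity; the generated codes are made-up test data. The last lines show why rebalancing is hard with this simple scheme: going from four databases to five changes the home of many codes.

```python
import zlib

def shard_for(short_code, shard_count):
    """Pick a database number from 0 to shard_count - 1 for a short code."""
    # CRC32 gives the same result on every server, unlike Python's built-in hash().
    return zlib.crc32(short_code.encode("utf-8")) % shard_count

# Made-up short codes, for illustration only.
codes = [f"code{i}" for i in range(10_000)]

# Distributing data: count how many codes land on each of four databases.
counts = [0] * 4
for code in codes:
    counts[shard_for(code, 4)] += 1
print("codes per database:", counts)

# Rebalancing: count how many codes would move if a fifth database were added.
moved = sum(1 for code in codes if shard_for(code, 4) != shard_for(code, 5))
print(f"{moved:,} of {len(codes):,} codes would move to a different database")
```

Schemes such as consistent hashing exist to reduce how many codes have to move, at the cost of more complicated routing.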
And that’s not all. In the real world, you also need to think about:
- Limiting how many links someone can shorten to stop spam
- Tracking how many people click each link
- Blocking