How do URL shortening scripts work?

I have been quite intrigued by how URL shortener scripts work. Surprisingly, most of them use a simple but ingenious trick to turn a long URL into a short one.

http://example.com/fe45 -----> https://corpocrat.com/blah/page.htm

The answer is base36 encoding. Why base36? Because its output can use the 26 letters of the alphabet plus the 10 digits. It is a surprisingly simple way of producing a short URL.

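Base 36 conversion is nothing more than this: keep dividing the number by 36, collect the remainders (the number modulo 36), and map each remainder to the corresponding letter or digit (0-9 map to themselves, 10-35 map to a-z). The remainders, read in reverse order, form the result. See the base 36 table.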
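The division-and-remainder procedure can be sketched by hand as follows. This is only an illustration: toBase36 is a hypothetical helper name, not a built-in PHP function, and in practice base_convert does the same job.

```php
<?php
// Hypothetical helper (not part of PHP): converts a non-negative integer
// to base 36 by repeated division, collecting the remainders.
function toBase36(int $n): string
{
    if ($n < 0) {
        throw new InvalidArgumentException('Only non-negative IDs are supported');
    }
    $digits = '0123456789abcdefghijklmnopqrstuvwxyz'; // 10 digits + 26 letters
    if ($n === 0) {
        return '0';
    }
    $out = '';
    while ($n > 0) {
        $out = $digits[$n % 36] . $out; // the remainder picks the next character
        $n = intdiv($n, 36);            // the quotient goes into the next round
    }
    return $out;
}

// 10099 = 7*36^2 + 28*36 + 19, which maps to "7", "s", "j"
echo toBase36(10099), "\n"; // prints 7sj
```

Each new character is prepended rather than appended, because the first remainder is the last character of the result.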

This is how most URL shortener scripts work:

1. First, insert the long URL into the database and get the ID of the new row. The ID should be a unique primary key; in most cases it is an auto-increment column.

URLs are stored in the database with a unique ID.


-----------------------------------------------
ID      URL
-----------------------------------------------
10099   https://corpocrat.com/
14566   https://corpocrat.com/blah/page.htm

2. Now take the ID stored in the database for that URL. Since the ID contains only the digits 0-9 (base 10), convert it to base 36 with the PHP function base_convert, from base 10 to base 36.

<?php
    $id = "10099";
    // Convert the decimal ID to base 36; prints 7sj
    echo base_convert($id, 10, 36);
?>

Output for a few sample IDs:

1099 -----> uj
10099 -----> 7sj
100099 -----> 258j

As the output shows, smaller numbers produce fewer characters, and even the millionth ID needs only 4 characters, a mix of letters and digits.
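To see how far each code length goes, here is a small sketch. A code of n characters covers the IDs from 0 up to 36^n - 1; maxIdForLength is a hypothetical helper name used only for this illustration.

```php
<?php
// Hypothetical helper: the largest ID that still fits in $length
// base-36 characters, i.e. 36^length - 1.
function maxIdForLength(int $length): int
{
    return 36 ** $length - 1;
}

foreach ([1, 2, 3, 4, 5] as $length) {
    $maxId = maxIdForLength($length);
    // base_convert turns the largest ID into "z", "zz", "zzz", ...
    printf("%d chars: IDs up to %d (%s)\n", $length, $maxId, base_convert((string) $maxId, 10, 36));
}
```

Four characters cover IDs up to 1679615, which is why the millionth URL still gets a 4-character code.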

[Image: base361.PNG]

3. Store the base36 output in the database, in a separate field of the row with that ID. The shortened URL then becomes:

http://example.com/7sj

for the URL https://corpocrat.com with ID 10099 stored in the database.
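Here is a minimal sketch of steps 1 to 3 put together. Everything database-related in it is an assumption made for illustration: an in-memory SQLite database accessed through PDO (so the pdo_sqlite extension is needed), a table named urls with the columns id, url and code, and a helper named shortenUrl. A real script would use its own database and schema.

```php
<?php
// Assumed setup for illustration: in-memory SQLite via PDO and a
// hypothetical table `urls` (id, url, code).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE urls (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    url  TEXT NOT NULL,
    code TEXT UNIQUE
)');

// Hypothetical helper: stores a long URL and returns its base-36 code.
function shortenUrl(PDO $pdo, string $longUrl): string
{
    // Step 1: insert the long URL and read back its auto-increment ID.
    $insert = $pdo->prepare('INSERT INTO urls (url) VALUES (:url)');
    $insert->execute([':url' => $longUrl]);
    $id = (int) $pdo->lastInsertId();

    // Step 2: convert the ID from base 10 to base 36.
    $code = base_convert((string) $id, 10, 36);

    // Step 3: store the code in a separate field of the same row.
    $update = $pdo->prepare('UPDATE urls SET code = :code WHERE id = :id');
    $update->execute([':code' => $code, ':id' => $id]);

    return $code;
}

$code = shortenUrl($pdo, 'https://corpocrat.com/blah/page.htm');
echo 'http://example.com/' . $code, "\n";
```

In a fresh table the first ID is 1, so the first code is simply "1"; the codes get longer only as the table grows.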

The short URL http://example.com/7sj is rewritten (using mod_rewrite) to the PHP page

http://example.com/7sj -----> http://example.com/short.php?baseid=7sj

which looks up the destination URL in the database by the stored base36 value and then redirects to it.
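What short.php does could look roughly like the sketch below. It is not the code of any particular script: the urls table, the in-memory SQLite database (via PDO) and the resolveCode helper are assumptions made for illustration, and the rewrite rule in the comment is just one plausible way to write it.

```php
<?php
// A rewrite rule along these lines (an assumption, not taken from any
// particular script) would map /7sj to short.php?baseid=7sj:
//   RewriteRule ^([0-9a-z]+)$ short.php?baseid=$1 [L]

// Hypothetical lookup: returns the long URL for a code, or null if unknown.
function resolveCode(PDO $pdo, string $code): ?string
{
    // Accept only lowercase base-36 characters before querying.
    if (!preg_match('/^[0-9a-z]{1,10}$/', $code)) {
        return null;
    }
    $stmt = $pdo->prepare('SELECT url FROM urls WHERE code = :code');
    $stmt->execute([':code' => $code]);
    $url = $stmt->fetchColumn();
    return $url === false ? null : (string) $url;
}

// Demo data: an in-memory SQLite table holding the example row.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT NOT NULL, code TEXT UNIQUE)');
$pdo->exec("INSERT INTO urls (id, url, code) VALUES (10099, 'https://corpocrat.com/', '7sj')");

// In short.php the code would come from $_GET['baseid'].
$target = resolveCode($pdo, '7sj');

// The real page would then redirect, for example:
//   if ($target !== null) { header('Location: ' . $target, true, 301); exit; }
//   http_response_code(404);
echo $target ?? 'not found', "\n";
```

Using a prepared statement and validating the code first keeps arbitrary input from the URL out of the SQL query.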

This is a very simple technique. There are many more techniques for URL shortening, for which I recommend the resources referenced below.

Enjoy!

Useful References

* URL compression for huge sets of URLs, gigabytes in size – http://anres.cpe.ku.ac.th….ncsec.pdf
* URL shortening hashes – http://www.codinghorror.com/blog/archives/000935.html

Note:
Base 64 can also be used for URL shortening, since it offers a wider range of characters.
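As a rough sketch of that idea, the same division method works with a 64-character URL-safe alphabet (A-Z, a-z, 0-9, '-' and '_'). The helper name toBase64Id and the exact order of the alphabet are assumptions for illustration. With 64 symbols, four characters cover about 16.8 million IDs instead of about 1.68 million in base 36.

```php
<?php
// Hypothetical helper: encodes a non-negative integer ID using a
// 64-character URL-safe alphabet instead of the 36 used by base 36.
function toBase64Id(int $n): string
{
    if ($n < 0) {
        throw new InvalidArgumentException('Only non-negative IDs are supported');
    }
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_';
    if ($n === 0) {
        return $alphabet[0];
    }
    $out = '';
    while ($n > 0) {
        $out = $alphabet[$n % 64] . $out; // the remainder picks the character
        $n = intdiv($n, 64);
    }
    return $out;
}

echo toBase64Id(10099), "\n";                        // prints Cdz
echo strlen(toBase64Id(100000000)), "\n";            // 5 characters
echo strlen(base_convert('100000000', 10, 36)), "\n"; // 6 characters in base 36
```

Note that these codes are case-sensitive, so the column that stores them and the rewrite rule both have to preserve case.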

Balakrishnan Prabhu

Mr. Balakrishnan Prabhu is the founder of Corpocrat magazine. He is also the founder of Best Citizenships (BC), assisting wealthy individuals with global citizenship and residency programs in Europe. His other interests are Linux, machine learning, WordPress, etc.

  • You can actually improve on this by using a URL-safe base 64 conversion. This also uses the uppercase letters, and the '+' and '/' characters of standard Base64 are replaced by '-' and '_' respectively. Because of the 28 extra characters, you can generate far more short URLs for the same number of digits in the ID. I actually have a very early version of this working on Google App Engine at minrlz.com.

  • Nice article. I am thinking about developing a URL shortening page for my website and this article is very useful.

  • Nice article. Though you should also consider the base64 method described by Steve, or check out how WordPress generates random passwords; the same method can be used to generate short URLs.

    Check this article: "How to generate random password like WordPress using PHP?" http://tinyurl.com/kuupv5.

    -Abhinav

  • Thanks a lot!
    This will help me in developing my scripts further and expand my business.

    Oliver

  • Nice post. Thanks for sharing base36 encoding for URL shortening.

  • nice article, thanks

  • Thanks a lot, I just started reading about URL shortening and this was a wonderful start.
    Keep up the good work.

  • Is there any benefit to adding backward references to URLs which contain exactly the same string?

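    i.e.

    -----------------------------------------------
    ID      URL
    -----------------------------------------------
    10099   https://corpocrat.com/
    14566   https://corpocrat.com/blah/page.htm

    could become:

    -----------------------------------------------
    ID      URL
    -----------------------------------------------
    10099   https://corpocrat.com/
    14566   10099/blah/page.htm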

    A translation function would then expand every entry that does not begin with http:// or https:// by replacing the leading ID with the URL of the referenced row.

    This can be further improved for speed by creating a separate field for the base URL.

    Such a technique might improve storage, but I wonder whether there is a way it can be used to make the URLs shorter as well?

    • As the baseXX url is generated by converting the link ID (and not its URL), the only benefit you’ll have is, as you said, the DB storage size.

      But it will mean more processing (two DB queries instead of one), and some issues if, for example, URL 10099 changes.
