6. PBKDF-2

The advent of GPUs

When Nvidia launched the world's first Graphics Processing Unit in 1999 -- GeForce 256, they set the stage for future growth of parallel processing.

Post 2008 to today's 2020s, the processing capabilities of these GPUs have grown exponentially. Plus their cost have significantly decreased over time.

Back in early 2000s, if one had to brute force a password, they would use a CPU which had a few cores. So, only a few parallel password and its hash combination could be calculated at a time.

The modern GPUs have thousands of cores. If we take a look at Nvidia RTX 4090, it can calculate around 19 billion SHA-256 hashes per second. Attackers realised this parallel processing power, they could now brute-force passwords at astonishingly fast rate!

If the attacker has to crack a salted password, he takes the salt from the db, and then computes around 100 - 150 billions of SHA-256(password + salt) combinations per second because attackers have multiple GPUs in their password cracking rig. In a few days, most of the passwords will be cracked except for the larger and complex ones like passwords that are greater than 8 characters.

This is a serious security risk. If a hacker with 5 to 10 high-end GPUs can cause this much damage, imagine if a very well-funded entity like a large cyber terrorist group gets access of hundreds of powerful GPUs, the damage can be disastrous!

PBKDF-2!

Password based key derivation function 2 (PBKDF-2) is a key derivation function designed for storing user passwords safely.

A key derivation function (KDF) was made to derive one or more keys from a master key. The derived keys can differ in their key length.

                Let's imagine our master key is "staplehorse" and we want to derive 
                two keys from this master key.
    
                The first key we will use it as our database's password.

                The second key we will use for our PayPal account.
    
    
                    master key     message    key length in bytes  
                        👇           👇          👇
                KDF("staplehorse", "db password", 8) = 120d50e36d1f290c
    
                
                    master key     message    key length in bytes  
                        👇            👇              👇
                KDF("staplehorse", "paypal password", 18) = 9r0f50e46d2xc90ci8317fd38et2az23qw11

The key derivation function uses a psuedorandom function (PRF) internally to generate a new key when given the master key and the message as inputs.

One of the most popular psuedorandom function is -- HMAC (hash based message authentication code). Internally the HMAC algorithm uses a cryptographically secure hash function like SHA-256. The HMAC algorithm is known as a keyed hash function because it utilizes a hash function with two inputs, first the message and second the key. If the HMAC uses SHA-256, it will be called HMAC-SHA-256.

            Let's see how a KDF (key derivation function) will work.

                    master key     message    key length in bytes  
                        👇           👇          👇
                KDF("staplehorse", "db password", 8) = 120d50e36d1f290c
                 |
                 |
                 |-------> HMAC-SHA-256("staplehorse", "db password")
    
                The KDF passes master key and message to the HMAC-SHA-256.
                
                Then the returned value of HMAC-SHA-256 will be trimmed to match the input key length
                which is 8 bytes in this case.

Let us do a quick recap! A key derivation function uses a psuedorandom function to derive many keys from a master key.

Now, PBKDF-2 is a key derivation function that is specifically designed for the use case of handling user passwords. It takes the user's password, salt and iterations as an input. If we don't provide the salt by ourselves, the algorithm generates a salt by its own. Then it passes the user's password and the salt to the underlying psuedorandom function.

We told that PBKDF-2 accepted as an input iterations. Iterations is the number of times the input password and the salt goes through the underlying psuedorandom function.

The number of iterations is said to be the work factor -- higher the number of iterations, more work is required to be done by the processor before giving the final output.

Let us look at a practical example to understand how all this saves us from attackers using GPUs.

            In Django, User.objects.create_user is used to create a user.

            User.objects.create_user("john", "lennon@thebeatles.com", "johnpassword")
            will create a user named "john", with the email "lennon@thebeatles.com"
            and password "johnpassword".

            The above password gets sent to the make_password() function which generates
            a salt and passes the password and salt to the pbkdf2 function which 
            uses HMAC-SHA-256 as its psuedorandom function.

            Django saves the password as a string to db in the following form:
            <algorithm>$<iterations>$<salt>$<hash>

            hash is the final output of PBKDF-2.

            An example password would be (added space between $ for readability):
            pbkdf2_sha256 $ 600000 $ BHz2LaFrMrXt $ VYlU5EBfFTqErm7fo/rjd67mMiYtmAdFoXDR7ueOUxM=

The iterations count in the above example is 600,000. It means that when pbkdf2 function was called, it ran the underlying psuedorandom function that is HMAC-SHA-256, 600,000 times. This results in a dramatic slow down for the brute-force process of password crackers using GPUs.

Let us take a situation where an attacker tries to brute-force a password using Nvidia RTX 4090. We will compare SHA-256 salted password hash vs PBKDF-2.

        Let us assume we are cracking a 11 character password.
        The attacker as always has access to the salt. 
        He now just needs to brute-force all possible password combinations. 
        
        Let us assume our character set to be just 26 lowercase characters.
        Total possible password combinations will be 26 to the power 11: 
        26^11 = 3,670,344,486,987,776 possible password combinations.



        A NVIDIA RTX 4090 can calculate for SHA-256: 19 billion hashes per second
        
        Attacker will run:  sha-256("possible-password-1" + "the-salt")
                            sha-256("possible-password-2" + "the-salt")
                            sha-256("possible-password-3" + "the-salt") and so on...
        
        Time to crack for SHA-256 on NVIDIA RTX 4090: 
        3,670,344,486,987,776 ÷ 19 billion ≈ 193,176 seconds ≈ 2 days.



        A NVIDIA RTX 4090 can calculate for PBKDF2 (600K iterations): 14,000 hashes per second

        Attacker will run:  PBKDF-2("possible-password-1", "the-salt", 600000)
                            PBKDF-2("possible-password-2", "the-salt", 600000)
                            PBKDF-2("possible-password-3", "the-salt", 600000) and so on...

        Time to crack for PBKDF2 (600K iterations): 
        3,670,344,486,987,776 ÷ 14000
        ≈ 2.6 * 10^11 seconds
        ≈ 8,245 years!

To crack greatapples if we just used SHA-256 hashing, attacker could crack it in around 2 days. But with PBKDF-2, it'll take 8,245 years. That's the power of PBKDF-2 (the power of its underlying psuedorandom function's iterations). So, PBKDF-2 ultimately helped us defend against powerful GPU-based brute-force attacks!

Go to next lesson: Diffie–Hellman–Merkle key exchange