Passwords: Cracking, Hashing, Salting

Motivation

Unfortunately, websites get hacked quite frequently. This can leak the authentication data of millions of users. Because of this, one must take precautions regarding how we store this authentication data in the first place.

We will largely be using this phenomenally written article as a resource: https://crackstation.net/hashing-security.htm

This guide will serve as an interactive tutorial to see for yourself the effects of different representations of account passwords.

Storage

First, install pipenv if you have not done so already. Clone https://github.com/codethechange/password-storage-workshop.git, and then run pipenv install followed by pipenv shell. Next, generate a passwords file by running python password-geneartor.py. That should create a passwords.json file. This is a simulated dump of username-password pairs, consisting of a fair proprtion of the most popular passwords coupled with passwords comprised of concatenated dictionary words as well as random strings. Imagine as an adversary that this is a dump of passwords you acquired.

Plaintext

Alas, if you store your passwords in plaintext, it is automatically game over. If the wrong eyes reach this password dump, they can appear as any user on your platform. Let’s try to avoid this.

Hashing

Run python hash-passwords.py to generate hashed-passwords.json. Imagine that this is the dump you, as an adversary, maliciously fetched from some poor web platform. To log in, the server hashes your provided password and see if the resulting hash matches the entry in the database corresponding to your username. This may seem secure out first glance, but we can easily concoct a lookup table. Run python hash-dictionary.py to generate a lookup table for all passwords that are dictionary words. Lookup tables cover many other variations of passwords, like common passwords in general and words with common substitutions. Next, run python crack-passwords.py to find passwords for many users! Also note that most lookup tables are too large to fit in memory, so rainbow tables are used instead for a time/space complexity tradeoff.

Salting

How could we prevent this lookup table? Well, we have to essentially require a unique lookup table for each password. We can produce a unique, random salt for each username-password pair.

Write this code in hash-salt-passwords.py:

import pandas as pd
from hashlib import sha256
import hashlib, binascii
import json
from random import choice
with open('passwords.json') as f:
    d = json.load(f)
from string import printable
chars = [c for c in printable if c.isalnum()]
for i in range(len(d['users'])):
    salt = ''.join([choice(chars) for _ in range(8)])
    m = sha256()
    to_hash = salt + '||' + d['passwords'][i]
    m.update(to_hash.encode("utf-8"))
    d['passwords'][i] = salt + '$' + m.hexdigest()
with open('salted-passwords.json', 'w') as f:
    json.dump(d,f)

Then we will have the passwords stored in a more secure dump: salted-passwords.json.

Now, try cracking this password with crack-salts.py:

# Load table
words = []
with open('dictionary.txt') as f:
    for line in f:
        line = str(line).replace('\n','')
        words.append(line)
# See if we can crack any passwords
import json
with open('salted-passwords.json') as f:
    d = json.load(f)
cracked_users = []
from hashlib import sha256
for i in range(len(d['users'])):
    salt, hash_val = tuple(d['passwords'][i].split('$'))
    if i % 10 == 0:
        print(i)
    # Generate salted hash for each word
    for word in words:
        m = sha256()
        m.update(str(salt + word).encode('utf-8'))
        if m.hexdigest() == d['passwords'][i]: # the hash exists! We have found a collision
            cracked_users.append((d['users'][i], word))
            break
print('Cracked ' + str(len(cracked_users)) + ' passwords!')
print(cracked_users[-10:])

Not that this takes significantly more time to crack than with no salt.

There are two quick improvements to our salting: a key derivation function, where we can control the computational difficulty of each resulting hash value, and a cryptographic PRG.

import pandas as pd
from hashlib import sha256
import hashlib, binascii
import json
from secrets import choice
with open('passwords.json') as f:
    d = json.load(f)
from string import printable
chars = [c for c in printable if c.isalnum()]
for i in range(len(d['users'])):
    salt = ''.join([choice(chars) for _ in range(8)])
    val = binascii.hexlify(hashlib.hmac('sha256', d['passwords'][i].encode('ascii'), salt.encode('ascii'), 1000000))
    d['passwords'][i] = salt + '$' + val.decode('ascii')
    print(d['passwords'][i])
with open('salted-passwords.json', 'w') as f:
    json.dump(d,f)

With that being said, check out this article for more info about the preferred key derivation functions.