Quicker Breach Data Queries

Last month, 4iQ found a massive password list containing 1.4 billion usernames and passwords from previous breaches. The data is broken up into directories and files according to the first few letters of the username to allow for quicker searching using the included query.sh script. While this makes searching for specific users very easy, it is difficult to search the 41GB data dump by domain name for users from an entire organization.

To make it quicker to search for users of an org I was pen testing, I decided to load the data into AWS Athena to see whether it could query the data more quickly. The process was quick and easy, and well worth taking the time to set up.

1) Create a new S3 bucket

I’m not going to go into detail on how to do that here, but I recommend you create a new user that only has permissions for this bucket. For this guide, let’s say you name the bucket s3://breach-wordlists. You can use this same bucket to add more breach files in the future, so I’d give it a generic name.

2) Download the data

The magnet link for the breach data can be found easily so I’m not going to list it here. The data took about 20 minutes to download on my AWS instance.

3) Sync to S3

If you don’t already have the AWS CLI tools installed, you’ll have to do that first:

pip install awscli --upgrade --user

Next, configure your access key id and secret access key for the new user:

brkr19@kali:~$ aws configure
AWS Access Key ID [None]:  **********
AWS Secret Access Key [None]:  ********** 
Default region name [None]:
Default output format [None]:

Finally, change into your data directory and run the sync command. This took about 15 minutes for me.

brkr19@kali:/mnt/wordlists/BreachCompilation/data/$ aws s3 sync . s3://breach-wordlists

4) Configure Athena

Step 1

Create a new database, let’s call it breach_database, and a new table called breach_compilation, with a location of s3://breach-wordlists/.

Step 2

Choose Text File with Custom Delimiters and set the Field Terminator to a colon (:).

Step 3

Create two string columns: username and password.
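
One thing worth checking with a two-column, colon-delimited table is what happens to passwords that themselves contain a colon. The Python sketch below is only an emulation, and it rests on an assumption: that the delimited text format splits on every field terminator and drops any fields beyond the declared columns, which is how Hive-style delimited tables generally behave. The function names are hypothetical, so confirm the behaviour against your own table before relying on it.

```python
def split_like_delimited_table(line, delimiter=":", columns=2):
    """Emulate the ASSUMED table behaviour: split on every delimiter
    and keep only the first `columns` fields."""
    fields = line.rstrip("\n").split(delimiter)
    # Pad short rows so missing fields show up as None (NULL in SQL terms).
    fields += [None] * (columns - len(fields))
    return tuple(fields[:columns])


def split_on_first_delimiter(line, delimiter=":"):
    """Split on the first delimiter only, keeping the rest as the password."""
    username, _, password = line.rstrip("\n").partition(delimiter)
    return username, password


sample = "user@example.org:pass:word"
print(split_like_delimited_table(sample))  # ('user@example.org', 'pass')
print(split_on_first_delimiter(sample))    # ('user@example.org', 'pass:word')
```

If the assumption holds, a password such as pass:word would appear truncated in query results, so it may be worth checking the original file for any hit that looks cut short.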

Step 4

Continue on without adding a partition, and click Create Table.
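
If you would rather skip the wizard, the same table can probably be created by running DDL in the Athena query editor. The Python sketch below just builds those statements as strings, using the names from this guide. The helper name is hypothetical, and the statement the wizard actually generates may differ (for example, it may spell out the SerDe explicitly), so treat this as a rough equivalent rather than the exact output.

```python
def build_create_table_ddl(database="breach_database",
                           table="breach_compilation",
                           location="s3://breach-wordlists/",
                           delimiter=":"):
    """Build a CREATE EXTERNAL TABLE statement for colon-delimited text files."""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        "  username string,\n"
        "  password string\n"
        ")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"LOCATION '{location}'"
    )


# Run each statement separately in the query editor.
print("CREATE DATABASE IF NOT EXISTS breach_database")
print()
print(build_create_table_ddl())
```

Because the table is external, the statement only records where the files live and how to parse them; the data itself stays in the S3 bucket.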

5) Run your queries!

You can now query the flat files as if they were a database with standard ANSI SQL. The vast majority of queries I’ve used gave me full results within 10-15 seconds!

Find passwords for a user

SELECT * FROM breach_compilation WHERE username = 'e_mail_address@example.org'

Find passwords for a domain

SELECT * FROM breach_compilation WHERE username LIKE '%@example.org'
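
String matching with LIKE is case-sensitive in the Presto engine that Athena is built on, so the query above would miss addresses stored as, say, User@Example.ORG. The Python sketch below builds a case-insensitive variant of the domain query; the helper name is hypothetical, and it assumes the table and column names used in this guide.

```python
def build_domain_query(domain, table="breach_compilation"):
    """Build a case-insensitive domain search for the breach table."""
    # Lower-case the domain and double any single quotes so the value
    # stays inside the SQL string literal.
    safe_domain = domain.strip().lower().replace("'", "''")
    return (
        f"SELECT * FROM {table} "
        f"WHERE lower(username) LIKE '%@{safe_domain}'"
    )


print(build_domain_query("Example.org"))
# SELECT * FROM breach_compilation WHERE lower(username) LIKE '%@example.org'
```

Paste the printed statement into the Athena query editor as you would any of the other queries here.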

Find usernames for a password

(might be useful for complex passwords to find other related email addresses)

SELECT * FROM breach_compilation WHERE password = 'ucsennemon'

Let us know what you think

Contact the author directly at @brkr19 if you have any questions or comments about this post!
