Search Breach Data Quicker with AWS Athena

Last month, 4iQ found a massive password list containing 1.4 billion usernames and passwords from previous breaches. The data is broken up into directories and files according to the first few letters of the username to allow for quicker searching using the included query.sh script. While this makes searching for specific users very easy, it is difficult to search the 41GB data dump by domain name for users from an entire organization.

To speed up searching for users of an org I was pen testing, I decided to throw the data into AWS Athena to see whether it could query the data faster. Setup was quick and easy, and well worth the time.

1) Create a new S3 bucket

I’m not going to go into detail on how to do that here, but I recommend creating a new user that only has permissions for this bucket. For this guide, let’s say you name the bucket s3://breach-wordlists. You can use the same bucket to add more breach files in the future, so I’d give it a generic name.
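
For anyone who wants a starting point, a bucket-scoped user could be set up roughly like this. It is only a sketch, not the setup used for this post: the user name breach-sync, the policy name breach-bucket-only, and the policy file name are made up for illustration, and the policy covers only what aws s3 sync needs. Querying from Athena needs additional permissions that are not shown here.

```shell
# Write an IAM policy that only allows access to the breach-wordlists bucket.
# The file name is arbitrary; the bucket name matches the one used in this guide.
cat > breach-bucket-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::breach-wordlists"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::breach-wordlists/*"
    }
  ]
}
EOF

# With admin credentials configured, the bucket and user could then be created
# with (breach-sync and breach-bucket-only are hypothetical names):
#   aws s3 mb s3://breach-wordlists
#   aws iam create-user --user-name breach-sync
#   aws iam put-user-policy --user-name breach-sync \
#       --policy-name breach-bucket-only \
#       --policy-document file://breach-bucket-policy.json
#   aws iam create-access-key --user-name breach-sync
```

The access key printed by the last command is what you would enter when running aws configure.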

2) Download the data

The magnet link for the breach data can be found easily so I’m not going to list it here. The data took about 20 minutes to download on my AWS instance.

3) Sync to S3

If you don’t already have the AWS CLI tools installed, you’ll have to do that first:

pip install awscli --upgrade --user

Next, configure your access key id and secret access key for the new user:

brkr19@kali:~$ aws configure
AWS Access Key ID [None]:  **********
AWS Secret Access Key [None]:  ********** 
Default region name [None]:
Default output format [None]:

Finally, change into your data directory and run the sync command. This took about 15 minutes for me.

brkr19@kali:/mnt/wordlists/BreachCompilation/data/$ aws s3 sync . s3://breach-wordlists
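
If you want to double-check the upload, one rough way is to compare the number of local files with the number of objects in the bucket. The verify_sync helper below is a hypothetical function of my own, not part of the AWS CLI, and it assumes the bucket holds nothing but this data set.

```shell
# Hypothetical helper: compare the number of local files with the number of
# objects in the bucket after a sync.
verify_sync() {
  local dir="$1" bucket="$2" local_count remote_count
  local_count=$(find "$dir" -type f | wc -l | tr -d ' ')
  # "aws s3 ls --recursive" prints one line per object.
  remote_count=$(aws s3 ls "$bucket" --recursive | wc -l | tr -d ' ')
  echo "local files: $local_count, S3 objects: $remote_count"
  # Succeed only when both counts match.
  [ "$local_count" -eq "$remote_count" ]
}

# Example (run from the data directory):
#   verify_sync . s3://breach-wordlists
```

A mismatch usually just means the sync was interrupted; running aws s3 sync again only uploads what is missing.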

4) Configure Athena

Step 1

Create a new database (let’s call it breach_database) and a new table called breach_compilation, with a location of s3://breach-wordlists/.

Step 2

Choose Text File with Custom Delimiters and set the Field Terminator to a colon (:).

Step 3

Create two string columns: username and password.

Step 4

Continue on without adding a partition, and click Create Table.
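
If you would rather skip the wizard, Steps 1 through 4 correspond roughly to a DDL statement like the one below. This is my approximation of the same settings, not the exact statement the wizard generates. It uses the database, table, column, and bucket names from this guide; the file name create_breach_table.sql is arbitrary.

```shell
# Write the table definition to a file so it can be reviewed or reused.
cat > create_breach_table.sql <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS breach_database.breach_compilation (
  username string,
  password string
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ':'
STORED AS TEXTFILE
LOCATION 's3://breach-wordlists/'
EOF

# The statement can be pasted into the Athena query editor once the database
# exists (CREATE DATABASE breach_database).
```

One thing to keep in mind with this layout: each line is split on every colon and only two columns are defined, so a password that itself contains a colon should show up cut off at that colon.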

5) Run your queries!

You can now query the flat files as if they were a database with standard ANSI SQL. The vast majority of queries I’ve used gave me full results within 10-15 seconds!

Find passwords for a user

SELECT * FROM breach_compilation WHERE username = 'e_mail_address@example.org'

Find passwords for a domain

SELECT * FROM breach_compilation WHERE username LIKE '%@example.org'
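
As far as I know, LIKE comparisons in Athena are case-sensitive, so the query above would miss addresses stored as user@Example.org. A small helper can build a case-insensitive version of the query for any domain. The domain_query function is a hypothetical convenience of my own; it only prints the SQL and does not talk to AWS.

```shell
# Hypothetical helper: print a case-insensitive domain query for Athena.
domain_query() {
  local domain
  # Lower-case the domain and reject anything that is not a plain hostname,
  # so the value can be embedded in the SQL string safely.
  domain=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$domain" in
    ''|*[!a-z0-9.-]*) echo "invalid domain: $1" >&2; return 1 ;;
  esac
  printf "SELECT username, password FROM breach_compilation WHERE lower(username) LIKE '%%@%s'\n" "$domain"
}

# Example:
#   domain_query example.org
```

Applying lower() to the username column means a differently capitalized address still matches, at the cost of a little extra work per row.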

Find usernames for a password

(This can be useful with a complex, distinctive password to find other email addresses that share it.)

SELECT * FROM breach_compilation WHERE password = 'ucsennemon'
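
The queries above can also be run from the command line instead of the console. The following is a minimal sketch using the AWS CLI's athena commands. The run_athena_query function and the results bucket name are assumptions of mine. Athena needs an S3 location for query output, and it should not be inside s3://breach-wordlists/, because everything under the table's location is read as table data.

```shell
# Assumed name for a separate bucket that receives Athena query output.
RESULTS_BUCKET="${RESULTS_BUCKET:-s3://breach-wordlists-results/}"

# Hypothetical helper: submit a query, wait for it to finish, print the rows
# as tab-separated text (the first row is the column header).
run_athena_query() {
  local sql="$1" id state
  id=$(aws athena start-query-execution \
        --query-string "$sql" \
        --query-execution-context Database=breach_database \
        --result-configuration "OutputLocation=$RESULTS_BUCKET" \
        --query 'QueryExecutionId' --output text) || return 1

  # Poll until the query leaves the QUEUED/RUNNING states.
  while :; do
    state=$(aws athena get-query-execution --query-execution-id "$id" \
              --query 'QueryExecution.Status.State' --output text) || return 1
    case "$state" in
      SUCCEEDED) break ;;
      FAILED|CANCELLED) echo "query $id ended in state $state" >&2; return 1 ;;
    esac
    sleep 2
  done

  aws athena get-query-results --query-execution-id "$id" \
    --query 'ResultSet.Rows[*].Data[*].VarCharValue' --output text
}

# Example:
#   run_athena_query "SELECT * FROM breach_compilation WHERE username LIKE '%@example.org'"
```

For large result sets it is probably easier to download the CSV that Athena writes to the results bucket than to page through the output this way.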
