Last month, 4iQ found a massive password list containing 1.4 billion usernames and passwords from previous breaches. The data is broken up into directories and files according to the first few letters of the username to allow for quicker searching using the included
query.sh script. While this makes searching for specific users very easy, it is difficult to search the 41GB data dump by domain name for users from an entire organization.
To speed up searching for users of an org I was pen testing, I decided to load the data into AWS Athena to see whether it could query the data faster. Setup was quick and easy, and well worth the time.
1) Create a new S3 bucket
I’m not going to go into detail on how to do that here, but I recommend you create a new user that only has permissions for this bucket. For this guide, let’s say you name it
s3://breach-wordlists. You can use this same bucket to add more breach files in the future, so I’d name it something generic.
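As a rough illustration of "a user that only has permissions for this bucket", the sketch below writes an IAM policy document limited to the breach-wordlists bucket. The file name breach-bucket-policy.json is a hypothetical choice, and the action list is an assumption about what the sync in step 3 needs (list, read, write); adjust it to your own setup.

```shell
#!/bin/sh
# Assumed values: the bucket name used in this guide and an illustrative file name.
BUCKET="breach-wordlists"
POLICY_FILE="breach-bucket-policy.json"

# Write a policy that allows listing the bucket and reading/writing its objects,
# and nothing outside of it.
cat > "$POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::${BUCKET}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    }
  ]
}
EOF
```

The file could then be attached to the new user with something like the following, where the user and policy names are placeholders:
aws iam put-user-policy --user-name breach-uploader --policy-name breach-bucket-only --policy-document file://breach-bucket-policy.json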
2) Download the data
The magnet link for the breach data can be found easily so I’m not going to list it here. The data took about 20 minutes to download on my AWS instance.
3) Sync to S3
If you don’t already have the AWS CLI tools installed, you’ll have to do that first:
pip install awscli --upgrade --user
Next, configure your access key id and secret access key for the new user:
brkr19@kali:~$ aws configure
AWS Access Key ID [None]: **********
AWS Secret Access Key [None]: **********
Default region name [None]:
Default output format [None]:
Finally, change into your data directory and run the sync command. This took about 15 minutes for me.
brkr19@kali:/mnt/wordlists/BreachCompilation/data/$ aws s3 sync . s3://breach-wordlists
4) Configure Athena
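Create a new database, let’s call it
breach_database, and a new table called
breach_compilation, with a location of
s3://breach-wordlists/ (the bucket from step 1). For the data format, choose
Text File with Custom Delimiters and set the Field Terminator to the character that separates the username from the password in the dump files.
Create two string columns:
username and password (the names used in the queries below).
Continue on without adding a partition, and create the table.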
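If you would rather skip the wizard, the same table can probably be defined with a DDL statement pasted into the Athena query editor. The sketch below prints such a statement. The colon delimiter is an assumption on my part (check a few lines of the dump before relying on it), athena_ddl is a hypothetical helper name, and the statement expects breach_database to exist already.

```shell
#!/bin/sh
# Print an Athena CREATE TABLE statement for the files synced in step 3.
# $1 = field delimiter; defaults to ':' which is an assumption about the dump format.
athena_ddl() {
  delim="${1:-:}"
  cat <<EOF
CREATE EXTERNAL TABLE IF NOT EXISTS breach_database.breach_compilation (
  username string,
  password string
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '${delim}'
LOCATION 's3://breach-wordlists/'
EOF
}

# Emit the statement with the assumed delimiter.
athena_ddl ':'
```

Passing a different character, for example athena_ddl ';', changes only the FIELDS TERMINATED BY clause, which makes it easy to try another delimiter if the first one leaves the password column empty.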
5) Run your queries!
You can now query the flat files as if they were a database with standard ANSI SQL. The vast majority of queries I’ve used gave me full results within 10-15 seconds!
Find passwords for a user
SELECT * FROM breach_compilation WHERE username = 'email@example.com'
Find passwords for a domain
SELECT * FROM breach_compilation WHERE username LIKE '%@example.org'
Find usernames for a password
(might be useful for complex passwords to find other related email addresses)
SELECT * FROM breach_compilation WHERE password = 'ucsennemon'
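The queries above can also be run from the shell instead of the console. As a minimal sketch, the hypothetical helper domain_query below builds the domain search for whatever domain you pass in, doubling any single quotes so the value stays inside the SQL string literal.

```shell
#!/bin/sh
# Build the "find passwords for a domain" query for the domain given in $1.
# Single quotes in the input are doubled so they cannot end the SQL string early.
domain_query() {
  domain=$(printf '%s' "$1" | sed "s/'/''/g")
  printf "SELECT * FROM breach_compilation WHERE username LIKE '%%@%s'\n" "$domain"
}

# Print the query for an example domain.
domain_query example.org
```

The output could then be handed to the AWS CLI, assuming you have a bucket for Athena query results (the results location below is a placeholder, not something from this guide):
aws athena start-query-execution --query-string "$(domain_query example.org)" --query-execution-context Database=breach_database --result-configuration OutputLocation=s3://your-results-bucket/
The command returns a query execution ID, which aws athena get-query-results --query-execution-id accepts once the query has finished.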
Contact the author directly at @brkr19 if you have any questions or comments about this post!