Commonspeak 2: Generating evolutionary wordlists

12 Aug 2018

Commonspeak2 was released at CRESTCon Asia and DEFCON Recon Village 2018.

Wordlists

Generated wordlists from Commonspeak are available at https://wordlists.assetnote.io.

Summary

Commonspeak2 leverages publicly available datasets from Google BigQuery to generate content discovery and subdomain wordlists.
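
To make the idea concrete, here is a minimal sketch of the final step of such a pipeline: turning a batch of hostnames into a frequency-ranked subdomain wordlist. It is not Commonspeak2's actual code. It assumes the hostnames have already been exported from a dataset query, and the function names (`extract_subdomain_labels`, `build_wordlist`) and the naive "last two labels are the root domain" rule are illustrative assumptions only.

```python
from collections import Counter


def extract_subdomain_labels(hostname, root_label_count=2):
    """Return the labels to the left of the root domain.

    Assumption: the root domain is the last `root_label_count` labels.
    This is a simplification and mishandles suffixes such as "co.uk".
    """
    labels = hostname.strip().lower().rstrip(".").split(".")
    return [label for label in labels[:-root_label_count] if label]


def build_wordlist(hostnames, min_count=1):
    """Rank subdomain labels by how often they occur across all hostnames."""
    counts = Counter()
    for hostname in hostnames:
        counts.update(extract_subdomain_labels(hostname))
    # Most frequent first; ties are broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, count in ranked if count >= min_count]


if __name__ == "__main__":
    # Hypothetical sample input standing in for rows returned by a query.
    sample = [
        "api.example.com",
        "dev.api.example.org",
        "mail.example.net",
        "example.com",
    ]
    for word in build_wordlist(sample):
        print(word)
```

Ranking by frequency means the most commonly seen labels come first, which is usually what you want when a wordlist is truncated to a fixed size. Raising `min_count` drops rare, one-off labels.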

You can obtain Commonspeak2 by visiting the following URL: https://github.com/assetnote/commonspeak2

Documentation

Documentation for Commonspeak2 can be found here: https://github.com/assetnote/commonspeak2/blob/master/README.md

Contributing

If you’ve got any ideas on how we can use BigQuery to obtain better content discovery or security-related datasets, we’d love to hear them. Pull requests are also very welcome.

You can submit any ideas, suggestions, or bug reports via GitHub Issues on the repository.

What about version one?

The first revision of Commonspeak was released in late 2017 as a proof of concept for continuously generating wordlists using BigQuery. The research was published at the time on the pentester.io blog. That revision was written in Bash and, unfortunately, was difficult to extend, especially for large-scale data processing.

Since then, we’ve rewritten Commonspeak in Golang, and we are currently working on better techniques to extract and compile useful wordlists from the publicly available, regularly updated datasets in BigQuery.