Commonspeak2 leverages publicly available datasets from Google BigQuery to generate content discovery and subdomain wordlists.
You can obtain Commonspeak2 by visiting the following URL: https://github.com/assetnote/commonspeak2
Documentation for Commonspeak2 can be found here: https://github.com/assetnote/commonspeak2/blob/master/README.md
If you’ve got any ideas on how we can use BigQuery to obtain better content-discovery or security-related datasets, we’d love to hear them. Pull requests are also very welcome.
You can submit any ideas, suggestions, or bugs via GitHub Issues on the repository linked above.
What about version one?
The first revision of Commonspeak was released in late 2017 as a proof of concept for continuously generating wordlists using BigQuery. The accompanying research was published at the time on the pentester.io blog. That revision was written in Bash and, unfortunately, was difficult to extend, especially for large-scale data processing.
Since then, we’ve rewritten Commonspeak in Go, and we are currently working on better techniques to extract and compile useful wordlists from the publicly available, regularly updated datasets in BigQuery.
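To give a rough idea of what compiling a subdomain wordlist from such a dataset can involve, here is a minimal sketch in Python. It is not Commonspeak2's actual code or its BigQuery queries. It assumes you have already exported a list of hostnames from some dataset. The function name build_subdomain_wordlist and the base_labels parameter are hypothetical, and the sketch naively treats the last two labels of each hostname as the registered domain, which is wrong for suffixes such as co.uk.

```python
from collections import Counter


def build_subdomain_wordlist(hostnames, base_labels=2):
    """Return subdomain labels seen in `hostnames`, most frequent first.

    Assumption: the last `base_labels` labels of each hostname form the
    registered domain (naive; a real tool would consult a public suffix list).
    """
    counts = Counter()
    for host in hostnames:
        # Normalize case and drop surrounding whitespace and a trailing dot.
        labels = host.strip().lower().rstrip(".").split(".")
        # Everything left of the assumed registered domain is a candidate word.
        for label in labels[:-base_labels]:
            if label and label != "*":  # skip empty labels and wildcards
                counts[label] += 1
    # Sort by descending frequency, then alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked]
```

Calling build_subdomain_wordlist on the exported hostnames returns a frequency-ranked list that can be written out one word per line for use with a subdomain brute-forcing tool.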