Website Scans
The Website Scans data source contains historical website scans performed on urlscan.io and urlscan Pro by free users, customers, and automated systems. The dataset goes back to December 2016, and no data ages out. It also tracks the brand and phishing detections that we generate.
When to use
Query this dataset to find historical scans of a specific domain, IP, or URL, or scans matching any other attribute relevant to your investigation. Do not use this dataset to discover new hostnames or domains; for that use case, refer to the Hostnames dataset.
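As a rough illustration of such a lookup, the sketch below builds a search URL for one attribute. It assumes the public search endpoint at https://urlscan.io/api/v1/search/ and field names such as page.domain, page.ip, and page.url. None of these are stated on this page, so check them against the API reference before relying on them. The helper name build_search_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed search endpoint; not stated on this page.
SEARCH_ENDPOINT = "https://urlscan.io/api/v1/search/"


def build_search_url(field: str, value: str, size: int = 100) -> str:
    """Return a search URL for scans where `field` matches `value`.

    `field` is an assumed attribute name such as "page.domain" or "page.ip".
    """
    # Quote the value so characters like ':' or '/' are treated literally.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    query = f'{field}:"{escaped}"'
    # urlencode percent-encodes the query for use in a URL.
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query, 'size': size})}"


if __name__ == "__main__":
    print(build_search_url("page.domain", "example.com"))
    print(build_search_url("page.ip", "192.0.2.1", size=10))
```

The function only constructs the URL, so it can be passed to any HTTP client, together with whatever authentication your account requires.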
Constituting data sources
The majority of scans performed on our platform are driven by the community and our customer base. These data sources are the exception:
- certstream-suspicious: Our own real-time process for discovering hostnames and domains with suspicious keywords from Certificate Transparency logs.
- urlscan-observe: Scans triggered by our Incidents as part of urlscan Observe.
- openphish: URLs from the free OpenPhish URL sample, submitted every few hours.
- phishtank: URLs from the free PhishTank URL feed, submitted multiple times per hour.
- urlhaus: URLhaus Malware URL exchange by Abuse.ch.
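When working through results, it can help to separate scans submitted by these automated sources from community and customer scans. The sketch below does this on the client side. It assumes each scan record is a dictionary carrying the source name under task.source; that field path is an assumption, not something this page documents. The helper name split_by_origin is hypothetical.

```python
# Source names taken from the list above.
AUTOMATED_SOURCES = frozenset({
    "certstream-suspicious",
    "urlscan-observe",
    "openphish",
    "phishtank",
    "urlhaus",
})


def split_by_origin(results):
    """Partition scan records into (automated, community) lists.

    Assumes the source name lives at record["task"]["source"]; records
    without that field are treated as community scans.
    """
    automated, community = [], []
    for record in results:
        source = record.get("task", {}).get("source")
        if source in AUTOMATED_SOURCES:
            automated.append(record)
        else:
            community.append(record)
    return automated, community


if __name__ == "__main__":
    sample = [
        {"task": {"source": "urlhaus"}},
        {"task": {"source": "api"}},
    ]
    automated, community = split_by_origin(sample)
    print(len(automated), len(community))
```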
Concepts / FAQ / Gotchas
- Scans performed by our platform are stored as immutable results, including their artifacts (screenshots, DOM snapshot).
- Brand detections can be added to existing scans retroactively.
- The task.tags are set by the submitter during submission. They can't be changed later.