API Use Best Practices

This is general advice we have collected over the years. Please stick to it; it makes life a lot easier for all of us.

  • Do not attempt to mirror or scrape our data wholesale. Please work with us if you have specific requirements.
  • Remove PII from URLs before submitting them (e.g., an email address embedded in the URL), or submit such scans as Unlisted.
  • Certain JSON properties in API responses might occasionally be missing. Make sure you handle this gracefully.
  • Use your API key for all API requests (submit, search, retrieve); otherwise you are subject to the quotas for unauthenticated users.
  • Use the API-Key HTTP header and not any other header name (e.g., x-api-key).
  • Any API endpoint not documented on this page is not guaranteed to be stable, or even to remain available in the future.
  • Make sure to follow HTTP redirects (HTTP 301 and HTTP 302) sent by urlscan.io.
  • Use exponential backoff and limit concurrency for all types of requests. Respect HTTP 429 response codes!
  • Use a work queue with backoffs and retries for API actions such as scans, results, and DOM or response downloads.
  • Consider using out-of-band mechanisms to determine whether the URL you want to submit will actually deliver content.
  • Consider searching for a domain or URL before submitting it for another scan.
  • Search: Limit your searches by date if possible, e.g., query just the last 24 hours or seven days.
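
As an illustration of the header and backoff advice above, here is a minimal Python sketch using only the standard library. The function names (build_request, backoff_delays, fetch_with_backoff), the retry count, and the delay values are assumptions chosen for this example and are not part of the urlscan.io API. It does not limit concurrency; that would typically be handled by the work queue that calls it.

```python
import random
import time
import urllib.error
import urllib.request

API_KEY = "YOUR-API-KEY"  # placeholder, substitute your own key


def build_request(url, api_key=API_KEY):
    """Attach the key using the API-Key header name (not x-api-key)."""
    return urllib.request.Request(url, headers={"API-Key": api_key})


def backoff_delays(retries=5, base=1.0, cap=60.0):
    """Yield delays of base, 2*base, 4*base, ... seconds, capped at `cap`."""
    for attempt in range(retries):
        yield min(cap, base * 2 ** attempt)


def fetch_with_backoff(url, opener=urllib.request.urlopen,
                       sleep=time.sleep, retries=5):
    """GET `url`, retrying on HTTP 429 with exponential backoff plus jitter.

    urllib's default opener follows HTTP 301 and 302 redirects on its own,
    so no extra redirect handling is needed here.
    """
    for delay in backoff_delays(retries):
        try:
            with opener(build_request(url)) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # only rate limiting is retried in this sketch
            # Random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError(f"still rate-limited after {retries} attempts: {url}")
```

The opener and sleep parameters exist only so the retry logic can be exercised without network access; in normal use the defaults apply.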

When developing an integration with urlscan, make sure to also follow these best practices:

  • Integrations: Use a custom HTTP user-agent string for your library/integration. Include a software version if applicable.
  • Integrations: Expose HTTP status codes and error messages to your users.
  • Integrations: Expect properties to be added to any JSON response object at any point in time. Handle this gracefully.
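
The integration advice above could look roughly like the following Python sketch. The user-agent string, the ApiError class, the helper names, and the assumption that error bodies are JSON objects with a "message" field are all illustrative. The "page", "url", and "domain" property names in summarize are likewise used only as examples of reading fields defensively.

```python
import json
import urllib.error
import urllib.request

# Hypothetical integration name and version; substitute your own.
USER_AGENT = "example-integration/1.2.0"


class ApiError(Exception):
    """Carries the HTTP status code and the server's message to the caller."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def error_from_response(status, body):
    """Build an ApiError from a status code and a raw response body (bytes)."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = None
    # Assumption: error bodies are JSON objects with a "message" field.
    # Otherwise fall back to the raw response text.
    if isinstance(payload, dict) and "message" in payload:
        message = payload["message"]
    else:
        message = body.decode("utf-8", errors="replace")
    return ApiError(status, message)


def get_json(url, api_key, opener=urllib.request.urlopen):
    """GET `url` with a custom user agent; surface HTTP errors as ApiError."""
    headers = {"API-Key": api_key, "User-Agent": USER_AGENT}
    req = urllib.request.Request(url, headers=headers)
    try:
        with opener(req) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        raise error_from_response(err.code, err.read()) from err


def summarize(result):
    """Read only the needed fields; tolerate missing and unknown properties."""
    page = result.get("page") or {}
    return {"url": page.get("url"), "domain": page.get("domain")}
```

Because summarize uses dict.get and ignores keys it does not know, it keeps working both when a property is absent and when new properties appear in the response.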