Does Orsa render JavaScript?

Yes. Scrape calls run through a real headless browser path so SPAs and client-rendered content are included by default.

How does Orsa handle Cloudflare and bot detection?

We use patched browser fingerprints and tiered proxy escalation so most targets succeed without you operating infrastructure.

Can I tune aggressiveness per call?

Yes. You can opt into stricter behavior when you need it; robots.txt is respected by default.

What are the default size limits?

Large pages are supported with sensible defaults; higher limits are available for long-form content when your workload needs them.

WEB EXTRACTION

Every URL on
any domain.

Discover the full shape of a website in one call. Recursive index handling, gzip, and malformed XML are handled silently.
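
For a sense of what recursive index handling and gzip involve, here is a minimal sketch using only the Python standard library. It is an illustration of the general technique, not Orsa's implementation: `collect_urls` and the `fetch` callable are hypothetical names, and this version simply skips malformed files instead of recovering them.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set


def _local(tag: str) -> str:
    """Strip the XML namespace, e.g. '{ns}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def collect_urls(
    sitemap_url: str,
    fetch: Callable[[str], bytes],  # hypothetical: returns the raw body for a URL
    seen: Optional[Set[str]] = None,
) -> List[str]:
    """Walk a sitemap or sitemap index and return every page URL found."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against indexes that reference themselves
        return []
    seen.add(sitemap_url)

    raw = fetch(sitemap_url)
    if raw[:2] == b"\x1f\x8b":  # gzip magic bytes
        raw = gzip.decompress(raw)

    try:
        root = ET.fromstring(raw)
    except ET.ParseError:
        return []  # this sketch skips malformed XML rather than repairing it

    # Each entry is a <sitemap> (in an index) or a <url> (in a urlset).
    locs = [
        child.text.strip()
        for entry in root
        for child in entry
        if _local(child.tag) == "loc" and child.text
    ]

    if _local(root.tag) == "sitemapindex":
        urls: List[str] = []
        for nested in locs:  # recurse into each nested sitemap
            urls.extend(collect_urls(nested, fetch, seen))
        return urls
    return locs
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; a real fetcher would also need timeouts, retries, and proxy handling, which is the part the endpoint is meant to absorb.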

Discover all URLs from a website's sitemap.

Start free, no card required. Most teams ship a first integration in under ten minutes.

Why this endpoint exists

Not another JSON page.
A production workflow in one call

One call in. The full URL graph out.

Before Orsa

A browser worker, proxy account, parser, retry queue, and schema contract for one endpoint.
Product teams wait while platform teams debug website-specific failures.
Every new data field becomes another brittle scraper branch.

After Orsa

One call in, the full URL graph out, with retries, rendering, and validation handled behind the API.
The response shape is typed, documented, and ready for product code.
Teams combine it with adjacent Orsa endpoints without adding vendors.

How it works

Teach the workflow,
then show the endpoint

Point Orsa at a URL, let the platform handle the web work, and receive a response your product can trust.

Input 01

Send a URL

nytimes.com

Orsa runtime 02

Render, retry, enrich

Orsa handles browser execution, proxy escalation, parsing, validation, caching, and typed response shaping behind the API.

Output 03

One call in. The full URL graph out.

Use the result in crawl planning without owning the extraction stack.

REST path: /api/v1/web/scrape/sitemap. Same endpoint used by the SDK examples below.
Input shape: URL (example: nytimes.com).
SDKs: TypeScript, Python, cURL. Start with TypeScript, Python, or direct cURL.
Best first use: Crawl planning. Seed crawls with the real surface area of a site, not just the homepage.

Production numbers

Performance your product
can actually plan around

Every endpoint page should answer the practical buying question: will this hold up once it leaves the demo?

p50 latency: 410 ms. Measured as production API latency, not a static mock.

p99 latency: 1.9 s. Built for the long tail of real websites and crawler paths.

Quality bar: High recall on standard and nested sitemap indexes.

Credits per call: 1 key. Self-serve usage with predictable metering and no scraping infrastructure to own.

Feature layer

More than the response.
The operating layer behind it

Stripe pages teach the system around the API: inputs, retries, observability, adjacent products, and the code path. This section does the same for Orsa endpoints.

Typed response contracts for product code and AI tools.
Browser, proxy, cache, and validation logic handled by Orsa.
Direct fit for crawl planning and SEO ops.

Endpoint

/api/v1/web/scrape/sitemap

Example input

nytimes.com

Promise

One call in. The full URL graph out.

Pairs with

Crawl Website, Scrape Markdown, Scrape HTML

Built for the job

What teams ship with
every URL on any domain.

Each product page now speaks to the real workflow behind the endpoint, with concrete jobs instead of a generic feature list.

Crawl planning

Seed crawls with the real surface area of a site, not just the homepage.

SEO ops

Diff sitemaps over time to catch indexing regressions early.

Docs mirrors

Pull every docs path before you snapshot content to Markdown.
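
The SEO ops job above, diffing sitemaps over time, comes down to a set comparison between two snapshots of the `urls` list. A minimal sketch, assuming you store the previous list yourself; `SitemapDiff` and `diff_sitemaps` are illustrative names, not part of any Orsa SDK.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SitemapDiff:
    added: List[str]    # URLs present now but not in the previous snapshot
    removed: List[str]  # URLs that dropped out since the previous snapshot


def diff_sitemaps(previous: Iterable[str], current: Iterable[str]) -> SitemapDiff:
    """Compare two URL snapshots; results are sorted for stable reports."""
    prev, curr = set(previous), set(current)
    return SitemapDiff(added=sorted(curr - prev), removed=sorted(prev - curr))


yesterday = ["https://www.nytimes.com/", "https://www.nytimes.com/section/world"]
today = ["https://www.nytimes.com/", "https://www.nytimes.com/section/science"]
report = diff_sitemaps(yesterday, today)
# report.removed now lists URLs that left the sitemap, a common early sign
# of an indexing regression worth alerting on.
```

A non-empty `removed` list is usually the signal to investigate first, since pages that leave the sitemap tend to leave the index next.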

Implementation

Keep the code small.
Let Orsa do the messy part

Use the endpoint directly, then combine it with adjacent Orsa APIs as the workflow grows.

Response (typed JSON)

{
  "domain": "nytimes.com",
  "urls": [
    "https://www.nytimes.com/",
    "https://www.nytimes.com/section/world"
  ],
  "sources": ["https://www.nytimes.com/sitemap.xml"]
}
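
A minimal Python sketch of calling the endpoint and reading the response shape shown above. Only the path `/api/v1/web/scrape/sitemap` and the three response fields come from this page. The host `api.orsa.example`, the GET method, the `url` query parameter, and the Bearer header are assumptions for illustration, so check the API reference for the real values.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import List

BASE_URL = "https://api.orsa.example"        # ASSUMPTION: placeholder host
SITEMAP_PATH = "/api/v1/web/scrape/sitemap"  # path as documented on this page


@dataclass
class SitemapResult:
    domain: str
    urls: List[str]
    sources: List[str]


def build_request(api_key: str, url: str) -> urllib.request.Request:
    """Build (but do not send) the request.

    ASSUMPTION: GET with a `url` query parameter and a Bearer token header.
    """
    query = urllib.parse.urlencode({"url": url})
    return urllib.request.Request(
        f"{BASE_URL}{SITEMAP_PATH}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def parse_result(body: str) -> SitemapResult:
    """Validate the documented fields and return a typed result."""
    data = json.loads(body)
    for key in ("domain", "urls", "sources"):
        if key not in data:
            raise ValueError(f"missing field: {key}")
    return SitemapResult(
        domain=str(data["domain"]),
        urls=list(data["urls"]),
        sources=list(data["sources"]),
    )
```

To send the request, pass the result of `build_request` to `urllib.request.urlopen` and feed the decoded body to `parse_result`. Keeping the two steps separate makes the parsing easy to test against a saved response.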

Combine with

Build the full workflow,
not another point solution

The best product integrations usually combine two or three Orsa endpoints behind one customer experience.

Crawl Website, Scrape Markdown, Scrape HTML

FAQ

The questions teams ask
before shipping

Short answers for the practical details: rendering, limits, freshness, and how this fits into production.

Get started

Put this endpoint
in your product today

Try the live endpoint, then wire the same response into your app with one API key.

Try this endpoint

One API key for every Orsa endpoint · No card required to start.
