Does Orsa render JavaScript?

Yes. Markdown extraction runs through a real headless browser by default.

How does Orsa handle Cloudflare and bot detection?

Orsa uses patched browser fingerprints and tiered proxy escalation, so most calls succeed on the first try.

Which Markdown flavor do you return?

GitHub-flavored Markdown by default; talk to us if you need CommonMark-only output.

What's the longest page you can extract?

Very long pages are supported with defaults suited to LLM workflows; limits can be raised on Pro and Scale.

WEB EXTRACTION

Any URL,
clean Markdown.

Orsa's Scrape Markdown endpoint turns any webpage into LLM-ready Markdown fast. We handle proxy escalation, JS rendering, and HTML-to-Markdown conversion — you get text you can paste into prompts, vector stores, or docs.

Scrape any webpage and get clean Markdown content.

Start free with no card required. Most teams ship a first integration in under ten minutes.

Why this endpoint exists

Not another JSON page.
A production workflow in one call

One call in. Markdown out.

Before Orsa

A browser worker, proxy account, parser, retry queue, and schema contract for one endpoint.
Product teams wait while platform teams debug website-specific failures.
Every new data field becomes another brittle scraper branch.

After Orsa

One call in, Markdown out, with retries, rendering, and validation handled behind the API.
The response shape is typed, documented, and ready for product code.
Teams combine it with adjacent Orsa endpoints without adding vendors.

How it works

Teach the workflow,
then show the endpoint

Point Orsa at a URL, let the platform handle the web work, and receive a response your product can trust.

Input 01

Send a URL

https://notion.com/blog/introducing-projects

Orsa runtime 02

Render, retry, enrich

Orsa handles browser execution, proxy escalation, parsing, validation, caching, and typed response shaping behind the API.

Output 03

One call in. Markdown out.

Use the result in RAG knowledge bases without owning the extraction stack.

REST path: /api/v1/web/scrape/markdown. Same endpoint used by the SDK examples below.
Input shape: URL, for example https://notion.com/blog/introducing-projects
SDKs: start with TypeScript, Python, or direct cURL.
Best first use: RAG knowledge bases. Crawl a docs site, convert every page to Markdown, chunk it, embed it. Orsa handles capture and cleanup.
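
As a rough illustration of the call described above, here is a minimal Python sketch using only the standard library. The page lists the REST path but not the host, HTTP method, auth scheme, or request body, so `BASE_URL`, the POST method, the Bearer header, and the `{"url": ...}` body are all assumptions made for this example; check the API reference for the real values.

```python
import json
import urllib.request

# Hypothetical host: the page gives only the path, not the base URL.
BASE_URL = "https://api.orsa.example"
ENDPOINT_PATH = "/api/v1/web/scrape/markdown"  # path as listed on this page


def build_scrape_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request for the Scrape Markdown endpoint.

    Assumptions: POST with a JSON body {"url": ...} and Bearer-token auth.
    """
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENDPOINT_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def scrape_markdown(page_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_scrape_request(page_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the call easy to inspect or unit-test without touching the network.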

Production numbers

Performance your product
can actually plan around

Every endpoint page should answer the practical buying question: will this hold up once it leaves the demo?

p50 latency: 890 ms. Measured as production API latency, not a static mock.

p99 latency: 3.2 s. Built for the long tail of real websites and crawler paths.

Quality bar: 98.6% extraction cleanliness (production sample).

Credits per call: 1 key. Self-serve usage with predictable metering and no scraping infrastructure to own.

Feature layer

More than the response.
The operating layer behind it

Stripe pages teach the system around the API: inputs, retries, observability, adjacent products, and the code path. This section does the same for Orsa endpoints.

Typed response contracts for product code and AI tools.
Browser, proxy, cache, and validation logic handled by Orsa.
Direct fit for RAG knowledge bases and AI agent context.

Endpoint

/api/v1/web/scrape/markdown

Example input

https://notion.com/blog/introducing-projects

Promise

One call in. Markdown out.

Pairs with

Scrape Sitemap, Crawl Website, Scrape HTML

Built for the job

What teams ship with
any URL, clean Markdown.

Each product page now speaks to the real workflow behind the endpoint, with concrete jobs instead of a generic feature list.

RAG knowledge bases

Crawl a docs site, convert every page to Markdown, chunk it, embed it — Orsa handles capture and cleanup.

AI agent context

When your agent needs to read a webpage, Markdown is the format that actually works with LLMs.

Content migration

Point Orsa at your sitemap and get clean Markdown for every post without maintaining a scraper.
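
For the RAG knowledge base workflow above, the chunking step might look something like the sketch below. It is not Orsa's method: `chunk_markdown` and its `max_chars` default are hypothetical, and the approach (split at ATX headings, then cap each section by length) is one simple strategy among many. Embedding is left out because it depends on the model you use.

```python
import re

# Matches the start of an ATX heading line ("# " through "###### ").
HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)


def chunk_markdown(markdown: str, max_chars: int = 2000) -> list[str]:
    """Split Markdown at headings, then cap each section at max_chars."""
    starts = [match.start() for match in HEADING.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that precedes the first heading
    bounds = starts + [len(markdown)]

    chunks = []
    for begin, end in zip(bounds, bounds[1:]):
        section = markdown[begin:end].strip()
        # Oversized sections are cut into fixed-size pieces.
        for offset in range(0, len(section), max_chars):
            chunks.append(section[offset:offset + max_chars])
    return chunks
```

One known limitation: a line starting with `# ` inside a fenced code block is treated as a heading, so code-heavy pages may need a fence-aware splitter.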

Implementation

Keep the code small.
Let Orsa do the messy part

Use the endpoint directly, then combine it with adjacent Orsa APIs as the workflow grows.

Response (typed JSON)

{
  "url": "https://notion.com/blog/introducing-projects",
  "title": "Introducing Projects",
  "markdown": "# Introducing Projects\n\nProjects is the new way...",
  "word_count": 1247,
  "reading_time_seconds": 312,
  "published_at": "2026-01-14T09:00:00Z",
  "language": "en"
}
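
To show what a typed contract could look like in client code, here is a minimal Python sketch that maps the sample response above onto a dataclass. The field names come from the sample; the class name `ScrapeMarkdownResult`, the helper `parse_result`, and the assumption that all seven fields are always present are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ScrapeMarkdownResult:
    """Hypothetical client-side type mirroring the sample response."""
    url: str
    title: str
    markdown: str
    word_count: int
    reading_time_seconds: int
    published_at: datetime
    language: str


def parse_result(payload: dict) -> ScrapeMarkdownResult:
    """Convert a decoded JSON payload into a typed result.

    Raises KeyError if a field is missing (assumes every field is required).
    """
    # Python versions before 3.11 reject a trailing "Z" in fromisoformat,
    # so normalize it to an explicit UTC offset first.
    published = datetime.fromisoformat(
        payload["published_at"].replace("Z", "+00:00")
    )
    return ScrapeMarkdownResult(
        url=payload["url"],
        title=payload["title"],
        markdown=payload["markdown"],
        word_count=int(payload["word_count"]),
        reading_time_seconds=int(payload["reading_time_seconds"]),
        published_at=published,
        language=payload["language"],
    )
```

Parsing at the boundary like this means a changed or missing field fails loudly in one place rather than deep inside product code.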

Combine with

Build the full workflow,
not another point solution

The best product integrations usually combine two or three Orsa endpoints behind one customer experience.

Scrape Sitemap, Crawl Website, Scrape HTML

FAQ

The questions teams ask
before shipping

Short answers for the practical details: rendering, limits, freshness, and how this fits into production.

Get started

Put this endpoint
in your product today

Try the live endpoint, then wire the same response into your app with one API key.

One API key for every Orsa endpoint · No card required to start.
