★ featured recipe · 8 min read · intermediate

Scrape with Firecrawl, stream to Instadash

We'll crawl news.ycombinator.com, extract a typed Story per page with Firecrawl's extract API, then push the rows into a versioned Instadash grid you can query, share, and let agents cite. End to end in ~50 lines of TypeScript.

The pipeline

  • stage 01 · firecrawl: Walks the domain. Returns Markdown plus a Zod-typed object per page via the extract API. Skips robots-disallowed paths automatically.
  • in: https://news.ycombinator.com
  • out: ~30 pages · structured JSON
  • time to ship: ~5 min, including npm i
  • lines of code: ~50, one file, zero glue
  • rows landed: ~28, with limit: 30
  • cost: $0.06, firecrawl free tier

Why this combination

Web scraping is the cheapest data source going, but the output is messy: raw HTML, half-baked structure, no schema, no versioning. Firecrawl solves the first half — it crawls a domain and returns LLM-friendly Markdown plus a structured object per page when you give it a Zod schema. Instadash solves the second — typed, queryable, versioned, mesh-indexed the second the rows land.

Wired together they replace the usual Python script + S3 + Postgres + dashboard build with one ~50-line TypeScript pipeline that hits a single HTTP endpoint to ship. The result is a live grid, a public URL, an MCP-callable endpoint, and a row in the mesh that any agent can find.

01 · Install & auth

Two keys — Firecrawl (free tier is enough for the demo) and Instadash. No SDK package required for the Instadash side; the recipe uses plain fetch.

# install
npm i @mendable/firecrawl-js zod
npm i -D tsx @types/node
 
# auth — both keys read from env at runtime
export FIRECRAWL_API_KEY="fc-..."
export INSTADASH_KEY="sk_..."        # mint at https://instadash.io/get-started
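
Both keys are read from the environment at runtime, so a missing export only shows up as an authentication error deep inside the run. A minimal sketch of a fail-fast check follows; requireEnv is a hypothetical helper written for this recipe, not part of either SDK.

```typescript
// Hypothetical helper: return a required environment variable or fail with a clear message.
// `env` is passed in explicitly so the function is easy to test; pass process.env in the script.
function requireEnv(
  env: Record<string, string | undefined>,
  name: string,
): string {
  const value = env[name]?.trim()
  if (!value) {
    throw new Error(`Missing environment variable ${name}; export it before running the script.`)
  }
  return value
}
```

In the script this would be called once per key, for example requireEnv(process.env, 'FIRECRAWL_API_KEY'), before any network call is made.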

02 · Define the target schema

Instadash infers schema server-side on every push, but giving Firecrawl an explicit Zod shape produces cleaner rows and removes "why is this column sometimes null" surprises. Nullable fields stay nullable; required fields stay required.

import { z } from 'zod'
 
const StorySchema = z.object({
  title:     z.string(),
  url:       z.string().url(),
  author:    z.string().nullable().optional(),
  points:    z.number().nullable().optional(),
  comments:  z.number().nullable().optional(),
  posted_at: z.string().nullable().optional().describe('ISO 8601 timestamp'),
  domain:    z.string().nullable().optional(),
})
 
type Story = z.infer<typeof StorySchema>

03 · Crawl with Firecrawl

Calling crawl with an extract block returns one structured object for each page that matches the include path. We cap the crawl at 30 pages for the demo; raise limit to whatever your plan allows.

import FirecrawlApp from '@mendable/firecrawl-js'
 
const fc = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! })
 
const job = await fc.crawl('https://news.ycombinator.com', {
  limit: 30,
  includePaths: ['/item.*'],
  scrapeOptions: {
    formats: ['extract'],
    extract: {
      schema: StorySchema,
      prompt: 'Extract one Hacker News story per page. Skip jobs and ads.',
    },
  },
})
 
const rows: Story[] = job.data
  .map(d => d.extract)
  .filter((s): s is Story => !!s && 'title' in s && 'url' in s)
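
Extracted rows can still contain duplicates or an empty domain, since every optional field may come back null. The sketch below shows one way to tidy them before pushing. normalizeStories and the StoryRow interface are illustrative names, with StoryRow standing in for the Story type so the block compiles on its own.

```typescript
// Local stand-in for the recipe's `Story` type (same fields as StorySchema).
interface StoryRow {
  title: string
  url: string
  author?: string | null
  points?: number | null
  comments?: number | null
  posted_at?: string | null
  domain?: string | null
}

// Hypothetical helper: drop duplicate URLs and fill a missing `domain` from `url`.
function normalizeStories(rows: StoryRow[]): StoryRow[] {
  const seen = new Set<string>()
  const out: StoryRow[] = []
  for (const row of rows) {
    if (seen.has(row.url)) continue // keep only the first row for each URL
    seen.add(row.url)

    let domain = row.domain ?? null
    if (!domain) {
      try {
        domain = new URL(row.url).hostname.replace(/^www\./, '')
      } catch {
        domain = null // leave it empty when the URL does not parse
      }
    }
    out.push({ ...row, domain })
  }
  return out
}
```

Keeping this as a pure function means it can run between the crawl and the push without touching either API.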

04 · Push to an Instadash grid

One HTTP call. X-Grid-Name is the only required X-Grid-* header; the others are optional metadata that ends up in the mesh.

const res = await fetch('https://instadash.io/ingest', {
  method: 'POST',
  headers: {
    Authorization:        `Bearer ${process.env.INSTADASH_KEY}`,
    'Content-Type':       'application/json',
    'X-Grid-Name':        'hn-top',
    'X-Grid-Title':       'Hacker News - top stories',  // header values must be Latin-1; fetch rejects an em dash
    'X-Grid-Description': 'Refreshed via Firecrawl.',
    'X-Grid-Tags':        'hn,news,firecrawl',
    'X-Grid-Visibility':  'public',
  },
  body: JSON.stringify(rows),
})
 
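// Fail loudly on a non-2xx response instead of trying to parse an error body as a grid.
if (!res.ok) throw new Error(`Instadash ingest failed: ${res.status} ${await res.text()}`)
 
const { grid_url, version } = await res.json()
console.log(grid_url)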
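
If you push to more than one grid, building the request in a small function keeps the header logic in one place. The sketch below assumes only the endpoint and header names used in this recipe. buildIngestRequest and GridOptions are hypothetical names, and the Latin-1 guard reflects how fetch validates header values, not a documented Instadash rule.

```typescript
// Hypothetical options object; each field maps to one X-Grid-* header from the recipe.
interface GridOptions {
  name: string
  title?: string
  description?: string
  tags?: string[]
  visibility?: string
}

// Build the URL and fetch init for an ingest call without sending anything.
function buildIngestRequest(apiKey: string, rows: unknown[], grid: GridOptions) {
  if (!grid.name) throw new Error('grid.name is required (sent as X-Grid-Name)')

  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
    'X-Grid-Name': grid.name,
  }
  if (grid.title) headers['X-Grid-Title'] = grid.title
  if (grid.description) headers['X-Grid-Description'] = grid.description
  if (grid.tags?.length) headers['X-Grid-Tags'] = grid.tags.join(',')
  if (grid.visibility) headers['X-Grid-Visibility'] = grid.visibility

  // fetch() throws on header values with characters above U+00FF, so catch them early.
  for (const [key, value] of Object.entries(headers)) {
    if (/[^\x00-\xff]/.test(value)) {
      throw new Error(`Header ${key} contains a character outside Latin-1`)
    }
  }

  return {
    url: 'https://instadash.io/ingest',
    init: { method: 'POST', headers, body: JSON.stringify(rows) },
  }
}
```

The result plugs straight into fetch: const { url, init } = buildIngestRequest(key, rows, { name: 'hn-top' }) followed by await fetch(url, init).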

The push is atomic. Each call creates a new version snapshot — re-run the script tomorrow and you get v2 with a diff view, no migration to write.
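
To make the idea of a version diff concrete, here is a rough local sketch of what comparing two snapshots keyed by url could look like. It is an illustration only, not Instadash's implementation; diffByKey and RowDiff are hypothetical names.

```typescript
type Row = Record<string, unknown>

interface RowDiff {
  added: Row[]
  removed: Row[]
  changed: { before: Row; after: Row }[]
}

// Compare two snapshots of rows, matching rows by the value of `key`.
function diffByKey(previous: Row[], current: Row[], key: string): RowDiff {
  const before = new Map(previous.map(r => [String(r[key]), r] as const))
  const after = new Map(current.map(r => [String(r[key]), r] as const))
  const diff: RowDiff = { added: [], removed: [], changed: [] }

  for (const [k, row] of after) {
    const old = before.get(k)
    if (!old) {
      diff.added.push(row)
    } else if (JSON.stringify(old) !== JSON.stringify(row)) {
      // Simple comparison; it treats a different property order as a change.
      diff.changed.push({ before: old, after: row })
    }
  }
  for (const [k, row] of before) {
    if (!after.has(k)) diff.removed.push(row)
  }
  return diff
}
```

For Hacker News rows, url is a natural key: a story whose points moved shows up under changed, and one that dropped off the crawl shows up under removed.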

05 · You now have a live grid

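The script writes progress to stderr and the final grid URL to stdout, so running open "$(npm start --silent)" from a shell captures only the URL and opens it in the browser. The --silent flag stops npm from printing its own script banner to stdout; open is the macOS command, and xdg-open is the usual Linux equivalent.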
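
The stdout/stderr split is what makes the script composable in a shell. A minimal sketch of that convention follows, assuming Node, where console.error writes to stderr and console.log writes to stdout. Output, consoleOutput, and reportRun are illustrative names, not taken from the full source.

```typescript
// Two channels: human-readable progress, and the single machine-readable result.
interface Output {
  progress: (message: string) => void
  result: (message: string) => void
}

// Default wiring for a CLI run.
const consoleOutput: Output = {
  progress: message => console.error(message), // stderr: visible in the terminal, ignored by $(...)
  result: message => console.log(message),     // stdout: the only thing a shell substitution captures
}

// Report a finished run: status lines go to progress, the grid URL goes to result.
function reportRun(out: Output, rowCount: number, gridName: string, gridUrl: string): void {
  out.progress(`extracted ${rowCount} rows`)
  out.progress(`instadash: pushed to grid ${gridName}`)
  out.result(gridUrl)
}
```

Calling reportRun(consoleOutput, rows.length, 'hn-top', grid_url) at the end of the script keeps stdout limited to one line, the URL.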

Going further

Drop the script into a GitHub Action to refresh nightly. Switch to a Cloudflare Worker cron trigger if you need tighter timing. Add an action column with X-Grid-Actions and a human can mark which stories to summarise next — see the LangGraph HITL recipe for the read-back pattern.

$ npm start
firecrawl: starting crawl of news.ycombinator.com
extracted 28 rows
instadash: pushing to grid hn-top
✓ done · v1
https://instadash.io/your-handle/hn-top

What you got for free

  • versioned snapshots: Every push creates an immutable, time-travelable version. Diff any two from the dashboard.
  • public mesh entry: X-Grid-Visibility: public registers the grid. Agents cite it via instamesh_search.
  • schema inferred: Server-side type inference on every push. Filters, sparklines, action columns auto-wired.
  • edge-served reads: GET /<handle>/<slug>/rows is read-cached on Cloudflare's edge, so reads are fast and rate-limit-friendly.
  • MCP-callable: Drop the MCP server in Claude/Cursor and ask "what changed since yesterday?" against your own grid.

Full source on GitHub

The complete runnable file, including the package.json and .env.example.
