Scrape with Firecrawl, stream to Instadash

Why this combination

Web scraping is the cheapest data source going, but the output is messy: raw HTML, half-baked structure, no schema, no versioning. Firecrawl solves the first half: it crawls a domain and returns LLM-friendly Markdown, plus a structured object per page when you give it a Zod schema. Instadash solves the second: rows are typed, queryable, versioned, and mesh-indexed the second they land.

Wired together they replace the usual Python script + S3 + Postgres + dashboard build with one ~50-line TypeScript pipeline that hits a single HTTP endpoint to ship. The result is a live grid, a public URL, an MCP-callable endpoint, and a row in the mesh that any agent can find.

01 · Install & auth

You need two API keys: one for Firecrawl (the free tier is enough for the demo) and one for Instadash. The Instadash side needs no SDK package; the recipe uses plain fetch.

# install
npm i @mendable/firecrawl-js zod
npm i -D tsx @types/node
 
# auth — both keys read from env at runtime
export FIRECRAWL_API_KEY="fc-..."
export INSTADASH_KEY="sk_..."        # mint at https://instadash.io/get-started
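
Because both keys are read from the environment at runtime, a missing variable would otherwise surface only as a rejected request later in the pipeline. The following is a minimal sketch of a fail-fast check; requireEnv is a hypothetical helper name and is not part of either service's tooling.

```typescript
// Hypothetical helper: returns the named variables, or throws listing every one that is missing.
type Env = Record<string, string | undefined>

function requireEnv<K extends string>(env: Env, names: readonly K[]): Record<K, string> {
  const missing = names.filter(name => !env[name])
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`)
  }
  const out = {} as Record<K, string>
  for (const name of names) out[name] = env[name] as string
  return out
}

// Usage at the top of the script:
// const { FIRECRAWL_API_KEY, INSTADASH_KEY } =
//   requireEnv(process.env, ['FIRECRAWL_API_KEY', 'INSTADASH_KEY'] as const)
```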

02 · Define the target schema

Instadash infers schema server-side on every push, but giving Firecrawl an explicit Zod shape produces cleaner rows and removes "why is this column sometimes null" surprises. Nullable fields stay nullable; required fields stay required.

import { z } from 'zod'
 
const StorySchema = z.object({
  title:     z.string(),
  url:       z.string().url(),
  author:    z.string().nullable().optional(),
  points:    z.number().nullable().optional(),
  comments:  z.number().nullable().optional(),
  posted_at: z.string().nullable().optional().describe('ISO 8601 timestamp'),
  domain:    z.string().nullable().optional(),
})
 
type Story = z.infer<typeof StorySchema>
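
Because every optional field in StorySchema is declared .nullable().optional(), an extracted object may carry a field as null or omit it entirely. If uniform columns matter, one option is to fill the gaps before pushing. The sketch below assumes that explicit nulls are acceptable to the ingest endpoint, which the recipe does not state. StoryRow and withNulls are illustrative names; StoryRow is written out by hand to mirror the inferred type so the block needs no imports.

```typescript
// Hand-written mirror of the type inferred from StorySchema (assumption: kept in sync manually).
type StoryRow = {
  title: string
  url: string
  author?: string | null
  points?: number | null
  comments?: number | null
  posted_at?: string | null
  domain?: string | null
}

const OPTIONAL_KEYS = ['author', 'points', 'comments', 'posted_at', 'domain'] as const

// Hypothetical helper: fill absent optional fields with null so every row has the same keys.
function withNulls(story: StoryRow): Required<StoryRow> {
  const filled: Record<string, unknown> = { ...story }
  for (const key of OPTIONAL_KEYS) {
    if (filled[key] === undefined) filled[key] = null
  }
  return filled as Required<StoryRow>
}
```

Applied as rows.map(withNulls) just before the push, this leaves required fields untouched and turns "sometimes missing" into "always present, sometimes null".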

03 · Crawl with Firecrawl

Calling crawl with an extract block returns one structured object per page whose URL matches includePaths. The demo caps the crawl at 30 pages; set limit to whatever your plan allows.

import FirecrawlApp from '@mendable/firecrawl-js'
 
const fc = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! })
 
const job = await fc.crawl('https://news.ycombinator.com', {
  limit: 30,
  includePaths: ['/item.*'],
  scrapeOptions: {
    formats: ['extract'],
    extract: {
      schema: StorySchema,
      prompt: 'Extract one Hacker News story per page. Skip jobs and ads.',
    },
  },
})
 
const rows: Story[] = job.data
  .map(d => d.extract)
  .filter((s): s is Story => !!s && 'title' in s && 'url' in s)
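
The filter above checks only that the title and url keys are present, so an object whose title is a number or whose url is not a URL would still pass. A stricter guard can be sketched as follows. It is written without Zod so it stands alone, and MinimalStory and isMinimalStory are illustrative names; in the real script, StorySchema.safeParse would apply the full schema instead.

```typescript
// Only the two required fields are checked here; optional fields are left alone.
type MinimalStory = { title: string; url: string }

function isMinimalStory(value: unknown): value is MinimalStory {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  if (typeof v.title !== 'string' || v.title.length === 0) return false
  // Deliberately loose URL check: an http(s) scheme followed by at least one character.
  return typeof v.url === 'string' && /^https?:\/\/\S+$/.test(v.url)
}

// Usage: const rows = job.data.map(d => d.extract).filter(isMinimalStory)
```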

04 · Push to an Instadash grid

One HTTP call. Of the X-Grid-* headers, only X-Grid-Name is required; the rest are optional metadata that ends up in the mesh.

const res = await fetch('https://instadash.io/ingest', {
  method: 'POST',
  headers: {
    Authorization:        `Bearer ${process.env.INSTADASH_KEY}`,
    'Content-Type':       'application/json',
    'X-Grid-Name':        'hn-top',
    'X-Grid-Title':       'Hacker News — top stories',
    'X-Grid-Description': 'Refreshed via Firecrawl.',
    'X-Grid-Tags':        'hn,news,firecrawl',
    'X-Grid-Visibility':  'public',
  },
  body: JSON.stringify(rows),
})
 
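// Fail loudly on a rejected push instead of destructuring an error body.
if (!res.ok) throw new Error(`Instadash ingest failed: HTTP ${res.status}`)
const { grid_url, version } = await res.json()
console.error(`pushed ${rows.length} rows as version ${version}`)  // progress goes to stderr
console.log(grid_url)                                               // stdout carries only the URL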

The push is atomic, and each call creates a new version snapshot: re-run the script tomorrow and you get v2 with a diff view, with no migration to write.
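
For scheduled runs it can help to wrap the push in a function that rejects on a non-2xx status, so a failed ingest stops the job. The sketch below reuses the endpoint, headers, and grid_url/version response fields shown above; pushRows and the small HttpFetch type are illustrative, and the fetch function is passed in so the block compiles without DOM or Node typings.

```typescript
// Minimal structural types so the sketch does not depend on global fetch typings.
type HttpResponse = { ok: boolean; status: number; json(): Promise<unknown>; text(): Promise<string> }
type HttpFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<HttpResponse>

// Response fields taken from the snippet above; anything else in the body is ignored.
type IngestResult = { grid_url: string; version: unknown }

async function pushRows(
  doFetch: HttpFetch,
  key: string,
  gridName: string,
  rows: unknown[],
): Promise<IngestResult> {
  if (rows.length === 0) throw new Error('Nothing to push: the crawl produced no rows')
  const res = await doFetch('https://instadash.io/ingest', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${key}`,
      'Content-Type': 'application/json',
      'X-Grid-Name': gridName,
    },
    body: JSON.stringify(rows),
  })
  if (!res.ok) {
    throw new Error(`Ingest failed with HTTP ${res.status}: ${await res.text()}`)
  }
  return (await res.json()) as IngestResult
}

// Usage: const { grid_url } = await pushRows(fetch, process.env.INSTADASH_KEY!, 'hn-top', rows)
```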

05 · You now have a live grid

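The script writes progress to stderr and the final grid URL to stdout, so, assuming the start script in package.json runs the pipeline, you can run open "$(npm start --silent)" from a shell and have the grid land in the browser. The --silent flag suppresses npm's own banner so that only the URL is captured; open is the macOS command, and xdg-open is the usual Linux equivalent.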
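
The stderr/stdout split is what makes the command substitution work: only the URL may reach stdout. One way to keep that discipline is a small reporter object, sketched here with the illustrative names makeReporter and Writer; the streams are passed in so the block stays free of Node typings.

```typescript
// Anything with a write method qualifies, including process.stdout and process.stderr.
type Writer = { write(chunk: string): unknown }

function makeReporter(out: Writer, err: Writer) {
  return {
    // Human-readable status lines; never captured by $(...).
    progress: (msg: string) => { err.write(`[progress] ${msg}\n`) },
    // The single machine-readable line the shell captures.
    result: (url: string) => { out.write(`${url}\n`) },
  }
}

// Usage: const report = makeReporter(process.stdout, process.stderr)
//        report.progress('crawling'); report.result(grid_url)
```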

Going further

Drop the script into a GitHub Action to refresh nightly. Switch to a Cloudflare Worker cron trigger if you need tighter timing. Add an action column with X-Grid-Actions and a human can mark which stories to summarise next; see the LangGraph human-in-the-loop (HITL) recipe for the read-back pattern.
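
For the Cloudflare Worker route, the cron trigger invokes the Worker's scheduled handler. A minimal sketch of that wiring follows. It assumes the crawl-and-push steps have been wrapped in a single function, here the hypothetical runPipeline, which returns the grid URL; whether the Firecrawl SDK runs unmodified inside a Worker is not covered by this recipe. The WorkerEnv and ExecutionCtx types are hand-written stand-ins for the Workers runtime typings.

```typescript
// Stand-ins for the Workers runtime types; the two keys would be configured as Worker secrets.
type WorkerEnv = { FIRECRAWL_API_KEY: string; INSTADASH_KEY: string }
type ExecutionCtx = { waitUntil(promise: Promise<unknown>): void }

// Hypothetical: the crawl and push from this recipe, returning the grid URL.
type Pipeline = (env: WorkerEnv) => Promise<string>

function makeWorker(runPipeline: Pipeline) {
  return {
    // Called by the runtime on each cron tick configured for the Worker.
    async scheduled(_event: unknown, env: WorkerEnv, ctx: ExecutionCtx): Promise<void> {
      // waitUntil keeps the invocation alive until the pipeline settles.
      ctx.waitUntil(
        runPipeline(env).then(
          url => console.log(`refreshed ${url}`),
          err => console.error('refresh failed', err),
        ),
      )
    },
  }
}

// In the Worker entry module: export default makeWorker(runPipeline)
```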