How to Build Micro-Apps That Scrape and Summarize Answers for Non-Technical Teams
Build tiny scrape-and-summarize micro-apps for sales/marketing using headless browsers, lightweight APIs and LLMs—ship fast and stay compliant.
Hook: Stop waiting on engineering — give reps bite-sized answers, not links
Marketing and sales teams waste hours opening pages, hunting for a single fact, and pasting it into pitch decks or outreach. Engineering teams can’t prioritize every tiny data request. The micro-app pattern solves this: tiny, focused services that scrape one answer and return a concise summary — fast, auditable and safe for non-technical users.
Why micro-apps for scrape-and-summarize matter in 2026
In 2026 the game has shifted: headless browsers and lightweight server runtimes are cheap, and small-model inference, local or via fast APIs, is ubiquitous. Trends to lean on:
- Raspberry Pi AI HAT+ 2 and tiny LLMs let teams run summarization locally when required for privacy (ZDNET coverage, late 2025).
- Tabular & structured models make it practical to turn scraped text into clean rows for direct ingestion into CRMs and spreadsheets.
- Edge compute & serverless — tiny containers and edge functions make micro-apps globally available with low cost.
What you’ll build (cookbook overview)
This cookbook builds a minimal micro-app that takes a URL and a short question, fetches the page with a headless browser, extracts relevant text, and returns a short LLM-generated answer with sources. It’s optimized for sales/marketing reps who need a single accurate paragraph with citations.
Architecture (minimal)
- Client: Slack slash command / Notion button / simple web UI
- API: Lightweight HTTP service exposing a single endpoint, /answer
- Scraper: Playwright (headless browser) or Puppeteer for JS-heavy pages
- Extractor: Readability + CSS/XPath fallbacks + regex
- Summarizer: LLM API (cloud or local) with prompt that enforces citations
- Cache & Store: Redis + optional vector DB for reuse (Pinecone, Weaviate)
Step 1 — Design the API
Keep it tiny. One endpoint that returns structured JSON is all you need.
POST /answer
Content-Type: application/json

{
  "url": "https://example.com/product-page",
  "question": "What's the latest pricing plan for enterprise?",
  "max_age_seconds": 3600
}
Response shape
{
  "summary": "Enterprise plan is $X/user/month with Y features.",
  "highlights": ["Feature A: ...", "Feature B: ..."],
  "sources": [{"href": "...", "text_snippet": "..."}],
  "cached": false
}
Step 2 — Scraping with a headless browser (Playwright example)
Use Playwright for resilience on JS-heavy pages. Keep the browser context short-lived and run in a pool.
# Python async example (Playwright)
from playwright.async_api import async_playwright

async def fetch_page(url: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        try:
            await page.goto(url, wait_until="networkidle")
            html = await page.content()
        finally:
            await browser.close()
        return html
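One simple way to run fetches "in a pool" as suggested above is an asyncio semaphore that caps concurrent headless pages; the limit of 4 is an assumption you should tune to your worker's memory:

```python
import asyncio

# Cap concurrent headless pages so a burst of requests
# can't exhaust worker memory; the limit is illustrative.
MAX_PAGES = 4
_page_slots = asyncio.Semaphore(MAX_PAGES)

async def fetch_with_limit(fetch, url: str) -> str:
    """Run a fetch coroutine (e.g. fetch_page above) under the pool limit."""
    async with _page_slots:
        return await fetch(url)
```

Callers simply await fetch_with_limit(fetch_page, url); excess requests queue instead of spawning more browsers.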
Extraction strategy — robust and layered
Don’t rely on one method. Combine:
- Readability / Mercury-like extraction for main content
- CSS/XPath selectors for targeted fields (price, version, date)
- Heuristics + regex for specific patterns (email, phone, $ currency)
- DOM proximity: find the paragraph(s) closest to headings matching the question
# pseudo-code: pick the best extractor
if css_selector_provided:
    result = select_css(html, selector)
elif readability_success:
    result = readability_extract(html)
else:
    result = fallback_text_snippets(html, query_keywords)
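The last fallback layer can be sketched with only the standard library: strip tags, then keep chunks that match the question's keywords. This is a crude illustration, not a replacement for a proper Readability pass:

```python
import re
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Strip tags and collect visible text chunks (crude fallback layer)."""
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def fallback_text_snippets(html: str, keywords: list[str]) -> list[str]:
    """Return text chunks that mention any of the query keywords."""
    parser = TextCollector()
    parser.feed(html)
    pattern = re.compile("|".join(map(re.escape, keywords)), re.IGNORECASE)
    return [c for c in parser.chunks if pattern.search(c)]

html = "<h2>Pricing</h2><p>Enterprise: $40/user/month</p><p>About us</p>"
print(fallback_text_snippets(html, ["pricing", "$"]))
```

In practice you would run this only after the selector and Readability layers fail, since keyword matching alone misses context.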
Step 3 — Summarize safely with an LLM
Use a controlled prompt that instructs the model to cite exact snippets and link to the source. Prefer model responses in JSON to make parsing deterministic.
Prompt:
You are an assistant that returns a short answer (1-3 sentences) to a user's question using only the provided page snippets.
Return JSON: {"answer":"...","highlights":["..."],"sources":[{"href":"...","snippet":"..."}]}
Snippets:
1) [text snippet A] (url: ...)
2) [text snippet B] (url: ...)
Question: What is the enterprise pricing?
Example: calling OpenAI-style API (pseudocode)
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "system", "content": prompt}],
    temperature=0.0,
    max_tokens=200,
)
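Since the prompt demands JSON, parse the model's reply defensively: models occasionally wrap output in markdown fences or omit optional keys. A hedged parsing sketch (the fence-stripping regex and defaults are assumptions, not a library API):

```python
import json
import re

def parse_llm_json(raw: str) -> dict:
    """Parse the model's JSON answer, tolerating markdown code fences."""
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    data = json.loads(cleaned)
    # Backfill optional keys so downstream code can rely on the shape.
    data.setdefault("answer", "")
    data.setdefault("highlights", [])
    data.setdefault("sources", [])
    return data

raw = '```json\n{"answer": "Enterprise is $40/user/month.", "sources": []}\n```'
print(parse_llm_json(raw)["answer"])
```

If json.loads still fails, treat it as a soft error and fall back to a plain-text snippet rather than surfacing a stack trace to a rep.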
Local inference option
For privacy or cost control, run small summarization models locally (on-device or in your VPC). In 2026 more capable small models and hardware like the Raspberry Pi AI HAT+ 2 can handle short summarizations — good for internal-only micro-apps or offline sites.
Step 4 — Make it friendly for non-technical teams
Wrap the endpoint with connectors they already use:
- Slack slash command: /answer https://... — replies with the summary and a “view sources” button
- Notion button or Zapier webhook that inserts summaries into CRM notes
- Browser extension that sends the current URL to the micro-app
Slack example (slash command)
slash command: /scrape-summary https://example.com/product What is price?
-> Micro-app responds with the JSON fields turned into a Slack block (summary + link)
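Turning the JSON fields into a Slack message is a small mapping to Block Kit. A minimal sketch (the helper name and block layout are illustrative; section and context are standard Block Kit types):

```python
def to_slack_blocks(result: dict) -> dict:
    """Convert the micro-app's JSON answer into a Slack Block Kit payload."""
    blocks = [
        {"type": "section", "text": {"type": "mrkdwn", "text": result["summary"]}},
    ]
    if result.get("sources"):
        href = result["sources"][0]["href"]
        blocks.append({
            "type": "context",
            "elements": [{"type": "mrkdwn", "text": f"<{href}|View source>"}],
        })
    # "ephemeral" shows the reply only to the rep who ran the command.
    return {"response_type": "ephemeral", "blocks": blocks}

payload = to_slack_blocks({
    "summary": "Enterprise is $40/user/month.",
    "sources": [{"href": "https://example.com/pricing"}],
})
print(len(payload["blocks"]))
```

Keeping the source link in a context block satisfies the transparency requirement from the ethics section below.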
Step 5 — Reliability: caching, rate limits, proxies
Caching is crucial. Cache both raw page HTML and the final summary. Use TTLs tuned to the content type (news vs docs).
- Short TTL (1–10 minutes) for rapidly changing pages
- Longer TTL (1–24 hours) for docs or product pages
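In production you would implement these TTLs with Redis (SETEX or SET with EX). The logic is simple enough to show with an in-process stand-in, which also works for local testing; the class is a sketch, not a Redis client:

```python
import time

class TTLCache:
    """In-process stand-in for the Redis cache: get/set with a TTL."""
    def __init__(self):
        self._store = {}

    def set(self, key: str, value, ttl_seconds: float) -> None:
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            del self._store[key]  # lazily evict expired entries
            return None
        return value

cache = TTLCache()
cache.set("html:https://example.com", "<html>...</html>", ttl_seconds=600)
print(cache.get("html:https://example.com") is not None)
```

Key the cache on URL for raw HTML and on (URL, question) for summaries, so the same page can serve many different questions.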
Rate limiting protects your micro-app and remote sites. Implement token-bucket limits per user and global concurrency limits for Playwright instances.
Proxy strategy: For scale and to avoid IP blocks, use rotating residential or datacenter proxies, or managed scraping APIs. Use a single proxy layer to keep auditability and rotate at the worker level.
Step 6 — Observability and cost control
Track these metrics:
- Requests per user and per URL
- Average time per scrape (headless browser time)
- LLM tokens per request and per user
- Error reasons: 4xx/5xx, DOM-not-found, blocked by anti-bot
Use these to set budgets and soft-fail behaviors (e.g., return cached answer if live scrape cost exceeds budget).
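The soft-fail behavior can be gated by a trivial budget check before each live LLM call; the daily cap here is an arbitrary illustration:

```python
def within_budget(tokens_used_today: int, requested_tokens: int,
                  daily_cap: int = 50_000) -> bool:
    """Soft-fail guard: only pay for a live LLM call under the daily cap."""
    return tokens_used_today + requested_tokens <= daily_cap

# Over budget: serve the cached answer (or a best-effort snippet) instead.
print(within_budget(tokens_used_today=49_000, requested_tokens=500))
```

Track tokens_used_today per user in Redis alongside the rate-limit counters so one heavy user cannot drain the whole team's budget.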
Step 7 — Anti-blocking & ethics
Hard reality: scraping can trigger anti-bot defenses. Avoid escalation and legal risk.
- Respect robots.txt and site terms for production workloads — if a site disallows scraping, route requests to manual review.
- Use standard headers, randomized timeouts, and HEAD request checks before full navigation.
- Avoid scraping login-protected content unless you have explicit permission.
- Log all scraped URLs and user requests for auditability; display a link to the original source in outputs for transparency.
If in doubt, ask legal. Sales shortcuts that ignore TOS can create downstream legal and brand risk.
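The robots.txt check is easy to automate with the standard library's urllib.robotparser. In production you would fetch /robots.txt from the target host (and cache it); this sketch evaluates already-fetched rules:

```python
from urllib.robotparser import RobotFileParser

def allowed(rules: str, user_agent: str, url: str) -> bool:
    """Evaluate robots.txt rules (fetched separately) for one URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)

rules = "User-agent: *\nDisallow: /private/"
print(allowed(rules, "micro-app-bot", "https://example.com/docs"))
```

Run this check before launching a browser; if it returns False, route the request to manual review as described above.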
Step 8 — Example full stack (summary)
Minimal deployment stack that scales:
- FastAPI container with async Playwright and an LLM client
- Redis for HTML cache and rate-limits
- Small vector DB for storing previously extracted facts (optional)
- Managed proxy provider if you need scale
- CI/CD to build small container images and deploy to Cloud Run / AWS Lambda (via Lambda SnapStart for warm Playwright) or edge
Sample FastAPI route (simplified)
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
app = FastAPI()
class Req(BaseModel):
url: str
question: str
@app.post('/answer')
async def answer(req: Req):
html = await fetch_page(req.url) # Playwright
snippets = extract_snippets(html, req.question)
summary = await llm_summarize(snippets, req.question)
return {"summary": summary, "sources": snippets}
Advanced patterns — structured extraction & tabular outputs
For product sheets, pricing tables, or features lists, return structured rows instead of free text. In 2026, specialized small models and tabular foundation models are widely available to convert text-to-table reliably.
- Use a two-step pipeline: extract raw table HTML → normalize to rows → run an LLM or table model to validate/clean.
- Return JSON tables for direct ingestion into CRMs or spreadsheets.
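The "extract raw table HTML → normalize to rows" step can be done with the standard library before any model sees the data. A simplified sketch that flattens a table into JSON-ready dicts (it ignores nested tags and colspans, which a real pipeline must handle):

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Flatten an HTML table into a list of cell-text rows."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False
    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False
    def handle_data(self, data):
        if self._in_cell and self._row is not None and data.strip():
            self._row.append(data.strip())

html = ("<table><tr><th>Plan</th><th>Price</th></tr>"
        "<tr><td>Enterprise</td><td>$40</td></tr></table>")
p = TableRows()
p.feed(html)
header, *rows = p.rows
print([dict(zip(header, r)) for r in rows])
```

The resulting list of dicts drops straight into a CRM import or a spreadsheet API; the validation model then only has to clean values, not reconstruct structure.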
Security & privacy checklist
- Mask or redact PII before sending to third-party LLMs.
- Use VPC endpoints and private connectors for cloud LLM APIs if required.
- Log requests with user ID and retention policy mapped to corporate compliance.
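A first-pass redaction layer for the PII item above can be regex-based; these two patterns are deliberately simple illustrations, and real deployments need broader, locale-aware rules (or a dedicated PII-detection service):

```python
import re

# Illustrative patterns only: production needs locale-aware coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text: str) -> str:
    """Mask obvious PII before a snippet leaves your trust boundary."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

print(redact("Contact jane.doe@example.com or +1 (555) 123-4567."))
```

Apply redact() to snippets immediately before the third-party LLM call, and log the unredacted originals only in your compliant internal store.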
Cost-saving tips
- Cache aggressively and return cached answers for repeated questions.
- Use deterministic low-temp prompts to minimize token usage.
- Batch LLM calls when possible (summarize multiple snippets in one request).
- Consider triggering full LLM summarization only when confidence from local rules is low.
Case study (hypothetical)
A B2B sales team built a micro-app to answer “Does competitor X support SSO?” The micro-app checks the competitor’s docs, extracts SSO-related headings, and returns a one-liner with links. Adoption: reps used it on 60% of qualifying calls, and the company saved ~40 engineer-hours/month previously spent gathering competitor intelligence. They later switched to a hybrid model: a local small-model for on-demand summaries plus a periodic cloud model to produce longer reports.
Future-proofing & 2026 predictions
Expect these shifts:
- More capable edge LLMs: On-device summarization will reduce costs and improve privacy for internal tools.
- Structured-first extraction: Companies will prefer table outputs for immediate ingestion into analytics stacks—tabular models will power that flow.
- Regulatory clarity: Tighter enforcement and clearer TOS patterns will force micro-apps to be both auditable and permission-aware.
Quick troubleshooting guide
- No content extracted: enable a screenshot capture to debug DOM changes.
- Anti-bot blocks: switch to a smaller browser footprint, increase wait times, or fall back to a scraping API with higher trust.
- Expensive LLM calls: return a best-effort short snippet and queue a detailed summary for asynchronous delivery.
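The "queue a detailed summary for asynchronous delivery" idea from the last item is a producer/consumer pattern. A minimal asyncio sketch, where full_summary stands in for the expensive LLM call plus delivery (all names are illustrative):

```python
import asyncio

async def worker(jobs: asyncio.Queue, process) -> None:
    """Drain queued (url, question) jobs and run the expensive summary."""
    while True:
        url, question = await jobs.get()
        await process(url, question)
        jobs.task_done()

async def demo() -> list:
    jobs: asyncio.Queue = asyncio.Queue()
    delivered = []

    async def full_summary(url, question):
        delivered.append((url, question))  # stand-in for LLM call + Slack DM

    task = asyncio.create_task(worker(jobs, full_summary))
    # The request handler returns a best-effort snippet immediately,
    # then enqueues the full summarization for later delivery.
    await jobs.put(("https://example.com", "enterprise pricing?"))
    await jobs.join()
    task.cancel()
    return delivered

print(asyncio.run(demo()))
```

In a real deployment a durable queue (Redis streams, SQS) replaces the in-process asyncio.Queue so jobs survive worker restarts.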
Actionable checklist to ship in a week
- Prototype a FastAPI endpoint and Playwright fetcher (2 days).
- Wire an LLM prompt that enforces citations (1 day).
- Build caching + rate-limits + Slack connector (2 days).
- Run a pilot with a small group of reps and collect feedback (1–2 days).
Final takeaways
- Micro-apps = single-purpose + fast feedback. They remove friction for reps and keep engineering overhead low.
- Layer your extraction: readability → selectors → regex → LLM. Each layer reduces cost and increases reliability.
- Respect legal and privacy constraints — logging, consent, and redaction matter as much as engineering.
Call to action
Ready to ship a scrape-and-summarize micro-app for your reps? Start with the one-endpoint FastAPI + Playwright prototype above. If you want a ready-made starter repo, developer-friendly examples for Slack or Notion, and a vetted prompt library for citation-first summarization, grab our micro-app boilerplate and step-by-step CI/CD guide — deploy a working Slack-integrated micro-app in under a day.