Use this guide when user code needs internet access, but you still need strict control over which destinations can be reached.

Diagram: Policy-enforced scraping flow

Start with filtered mode

Allow only approved hosts first, then run your scraping logic.
import { DockerIsol8 } from "@isol8/core";

const engine = new DockerIsol8({
  mode: "ephemeral",
  network: "filtered",
  networkFilter: {
    whitelist: [
      "^api\\.github\\.com$",
      "^en\\.wikipedia\\.org$",
    ],
    blacklist: ["^169\\.254\\."],
  },
  timeoutMs: 30000,
  memoryLimit: "512m",
});

await engine.start();
In filtered mode, blacklist rules take precedence over whitelist rules.
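The precedence rule can be pictured as a small decision function. This is a conceptual sketch of the documented behavior, not the library's internal code: a host is reachable only if it matches some whitelist pattern and no blacklist pattern.

```typescript
// Conceptual sketch of filtered-mode precedence (not the library's implementation):
// blacklist is checked first, so a host matching both lists is blocked.
function isAllowed(host: string, whitelist: string[], blacklist: string[]): boolean {
  if (blacklist.some((p) => new RegExp(p).test(host))) return false; // blacklist wins
  return whitelist.some((p) => new RegExp(p).test(host));
}

console.log(isAllowed("api.github.com", ["^api\\.github\\.com$"], ["^169\\.254\\."])); // true
console.log(isAllowed("169.254.169.254", ["^169\\."], ["^169\\.254\\."])); // false, despite the whitelist match
```

The second call shows why link-local metadata addresses stay blocked even under a permissive whitelist.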

Pattern 1: approved API fetch

const result = await engine.execute({
  runtime: "python",
  code: `
import urllib.request, json

url = "https://api.github.com/repos/Illusion47586/isol8"
resp = urllib.request.urlopen(url)
data = json.loads(resp.read())
print(json.dumps({
  "repo": data["full_name"],
  "stars": data["stargazers_count"]
}))
`,
});

console.log(result.stdout);

Pattern 2: graceful handling for blocked hosts

const result = await engine.execute({
  runtime: "python",
  code: `
import urllib.request

targets = [
  "https://api.github.com",
  "https://example-blocked-domain.invalid"
]

for url in targets:
  try:
    urllib.request.urlopen(url, timeout=5)
    print(f"ALLOW {url}")
  except Exception as e:
    print(f"BLOCK {url}: {e}")
`,
});

Pattern 3: scraping HTML with packages

For richer parsing, install parser libraries:
const result = await engine.execute({
  runtime: "python",
  installPackages: ["requests", "beautifulsoup4"],
  code: `
import requests
from bs4 import BeautifulSoup

html = requests.get("https://en.wikipedia.org/wiki/Docker_(software)", timeout=10).text
soup = BeautifulSoup(html, "html.parser")
first_p = soup.select_one(".mw-parser-output > p:not(.mw-empty-elt)")
print(first_p.get_text(strip=True)[:300])
`,
});

Authenticated API calls with secrets

When scraping private APIs, inject credentials through the secrets option rather than hard-coding them.
const secured = new DockerIsol8({
  mode: "ephemeral",
  network: "filtered",
  networkFilter: {
    whitelist: ["^api\\.example\\.com$"],
    blacklist: [],
  },
  secrets: {
    API_TOKEN: process.env.API_TOKEN!,
  },
});

const result = await secured.execute({
  runtime: "python",
  code: `
import os, urllib.request, json

req = urllib.request.Request(
  "https://api.example.com/data",
  headers={"Authorization": f"Bearer {os.environ['API_TOKEN']}"}
)
resp = urllib.request.urlopen(req)
print(resp.status)
`,
});
Secret masking applies to stdout/stderr text. If a script writes secrets to files, those file contents are not automatically redacted.
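The limitation follows from how this kind of masking works. Here is a hypothetical illustration of the concept (not the library's actual redaction code): masking is a literal string replacement over captured stdout/stderr, so secret values that leave the sandbox by any other channel are untouched.

```typescript
// Hypothetical sketch of output masking: replace every literal occurrence
// of each secret value in the captured text with a placeholder.
function maskSecrets(text: string, secrets: Record<string, string>): string {
  let out = text;
  for (const value of Object.values(secrets)) {
    out = out.split(value).join("****"); // literal replacement of every occurrence
  }
  return out;
}

const secrets = { API_TOKEN: "tok-12345" };
console.log(maskSecrets("Bearer tok-12345 accepted", secrets)); // "Bearer **** accepted"
```

Because only the captured text stream passes through this step, a token written to a file inside the container never reaches the redaction pass.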

Observe network behavior during scraping

Enable network request logs for filtered runs:
isol8 run scraper.py \
  --net filtered \
  --allow "^api\.github\.com$" \
  --log-network \
  --no-stream
In non-stream mode, the CLI prints collected network log entries when available.

Remote scraping workers

For centralized scraping infrastructure, run the remote server and connect with RemoteIsol8.
import { RemoteIsol8 } from "@isol8/core";

const remote = new RemoteIsol8(
  {
    host: "http://localhost:3000",
    apiKey: process.env.ISOL8_API_KEY!,
    sessionId: "scrape-job-001",
  },
  {
    network: "filtered",
    networkFilter: {
      whitelist: ["^api\\.github\\.com$"],
      blacklist: [],
    },
    timeoutMs: 30000,
  }
);

await remote.start();
const res = await remote.execute({
  runtime: "python",
  code: "print('remote scrape run')",
});
await remote.stop();

Safer scraping design patterns

  • whitelist exact hostnames instead of broad wildcards
  • keep timeouts short for external requests
  • parse to structured output (JSON) rather than raw HTML dumps
  • separate fetch and parse stages to isolate failures
  • pre-bake stable dependencies to avoid per-run install overhead
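The first pattern above deserves care: hand-writing regexes for a whitelist makes it easy to forget that an unescaped "." matches any character. A small helper (hypothetical, not part of the library) can build exact-match patterns from plain hostnames:

```typescript
// Hypothetical helper: turn a plain hostname into an exact-match whitelist
// entry, escaping regex metacharacters so "." only matches a literal dot.
function exactHostPattern(host: string): string {
  const escaped = host.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return `^${escaped}$`;
}

const whitelist = ["api.github.com", "en.wikipedia.org"].map(exactHostPattern);
console.log(whitelist); // patterns suitable for networkFilter.whitelist
```

With escaping, a lookalike host such as "apiXgithub.com" no longer slips through a pattern meant for "api.github.com".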