Use this guide when users submit data transformations or analysis scripts and you need strong execution isolation with predictable resource controls.
Diagram: Typical data-processing flow
## Choose the right execution pattern

| Pattern | Best for | Core fields |
|---|---|---|
| stdin transform | small/medium JSON payloads | `stdin`, `code`, `runtime` |
| file-based run | CSV/images/multi-file artifacts | `files`, `outputPaths` |
| persistent pipeline | multi-step transformations with shared state | `mode: "persistent"`, `putFile`/`getFile`, or remote `sessionId` |
## Baseline configuration

```ts
import { DockerIsol8 } from "@isol8/core";

const engine = new DockerIsol8({
  mode: "ephemeral",
  network: "none",
  timeoutMs: 60000,
  memoryLimit: "1g",
  sandboxSize: "512m",
});
await engine.start();
```
## Pattern 1: stdin JSON transforms

Great for row-wise mapping, filtering, and aggregation where input and output are JSON.

```ts
const transformCode = `
import json, sys

data = json.load(sys.stdin)
result = [
    {**row, "total": row["price"] * row["quantity"]}
    for row in data
]
json.dump(result, sys.stdout)
`;

const input = JSON.stringify([
  { item: "Widget", price: 9.99, quantity: 100 },
  { item: "Gadget", price: 24.99, quantity: 50 },
]);

const result = await engine.execute({
  runtime: "python",
  code: transformCode,
  stdin: input,
});

const transformed = JSON.parse(result.stdout);
```
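`JSON.parse(result.stdout)` throws if the sandboxed script printed warnings or partial output instead of clean JSON. A minimal defensive parse helper, written in plain TypeScript with no isol8-specific API assumed, fails fast with a readable message:

```typescript
// Sandbox stdout is untrusted: it may be empty, truncated, or contain
// stray prints. Validate before treating it as a row array.
type Row = Record<string, unknown>;

function parseTransformOutput(stdout: string): Row[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    throw new Error(`sandbox returned non-JSON output: ${stdout.slice(0, 200)}`);
  }
  if (!Array.isArray(parsed)) {
    throw new Error("expected a JSON array of rows from the transform");
  }
  return parsed as Row[];
}
```

Truncating the echoed output keeps error messages useful without dumping a large payload into logs.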
The same transform via the HTTP API:

```bash
curl -X POST http://localhost:3000/execute \
  -H "Authorization: Bearer $ISOL8_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "request": {
      "runtime": "python",
      "code": "import json,sys; d=json.load(sys.stdin); print(len(d))",
      "stdin": "[{\"x\":1},{\"x\":2}]"
    },
    "options": { "network": "none", "timeoutMs": 30000 }
  }'
```
And via the CLI:

```bash
isol8 run -e "import json,sys; d=json.load(sys.stdin); print(len(d))" \
  --runtime python \
  --stdin '[{"x":1},{"x":2}]'
```
## Pattern 2: file-based processing and artifact retrieval
Use this when jobs need multiple inputs or non-text outputs (CSV, PNG, parquet, etc.).
```ts
const result = await engine.execute({
  runtime: "python",
  code: `
import pandas as pd

df = pd.read_csv("/sandbox/data.csv")
out = df.groupby("category")["revenue"].sum().reset_index()
out.to_csv("/sandbox/summary.csv", index=False)
print("summary ready")
`,
  installPackages: ["pandas"],
  files: {
    "/sandbox/data.csv": csvContent,
  },
  outputPaths: ["/sandbox/summary.csv"],
});

const summaryCsvBase64 = result.files?.["/sandbox/summary.csv"];
```
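As the variable name above suggests, retrieved files arrive base64-encoded. Decoding a text artifact in Node.js is a one-liner; this sketch assumes base64 payloads and UTF-8 content:

```typescript
// Decode a base64-encoded artifact returned in result.files (Node.js Buffer).
function decodeArtifact(base64: string): string {
  return Buffer.from(base64, "base64").toString("utf-8");
}
```

For binary artifacts (PNG, parquet), keep the `Buffer` instead of converting to a string.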
The equivalent API request body:

```json
{
  "request": {
    "runtime": "python",
    "code": "print('process files')",
    "files": {
      "/sandbox/data.csv": "Y2F0ZWdvcnkscmV2ZW51ZQphLDEwCg=="
    },
    "outputPaths": ["/sandbox/summary.csv"]
  },
  "options": {
    "network": "none",
    "timeoutMs": 60000
  }
}
```
The CLI `run` command does not expose the full `files`/`outputPaths` request fields; use the library or HTTP API for file injection and retrieval workflows.
## Pattern 3: multi-step persistent pipelines
Use persistent mode when later steps must reuse files from earlier ones.
```ts
const session = new DockerIsol8({ mode: "persistent", timeoutMs: 60000 });
await session.start();

await session.putFile("/sandbox/raw.json", JSON.stringify(rawData));

await session.execute({
  runtime: "python",
  code: `
import json

raw = json.load(open("/sandbox/raw.json"))
clean = [r for r in raw if r.get("value") is not None]
json.dump(clean, open("/sandbox/clean.json", "w"))
print(len(clean))
`,
});

await session.execute({
  runtime: "python",
  code: `
import json

clean = json.load(open("/sandbox/clean.json"))
report = {"count": len(clean)}
json.dump(report, open("/sandbox/report.json", "w"))
`,
});

const report = await session.getFile("/sandbox/report.json");
console.log(report.toString("utf-8"));

await session.stop();
```
Persistent containers are runtime-bound: do not switch from Python to Node.js within the same persistent container.
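If a middle step throws, the code above never reaches `session.stop()` and the container leaks. A small `try`/`finally` wrapper guarantees cleanup; this is a generic sketch whose `start`/`stop` shape simply mirrors the session methods shown above:

```typescript
// Run pipeline steps against a session, always stopping the container,
// even when a step throws. The error is rethrown after cleanup.
async function withSession<
  S extends { start(): Promise<void>; stop(): Promise<void> },
  T
>(session: S, steps: (s: S) => Promise<T>): Promise<T> {
  await session.start();
  try {
    return await steps(session);
  } finally {
    await session.stop();
  }
}
```

Usage: `await withSession(session, async (s) => { await s.execute({...}); })` keeps the happy path and the failure path symmetrical.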
### Remote pipelines with shared sessions

For distributed workers, use a remote `sessionId` instead of keeping local engine state.
```ts
import { RemoteIsol8 } from "@isol8/core";

const remote = new RemoteIsol8(
  {
    host: "http://localhost:3000",
    apiKey: process.env.ISOL8_API_KEY!,
    sessionId: "data-job-123",
  },
  {
    network: "none",
    timeoutMs: 120000,
  }
);

await remote.start();
await remote.putFile("/sandbox/input.json", JSON.stringify(rawData));
await remote.execute({ runtime: "python", code: "print('step 1')" });
await remote.stop();
```
## Package strategy for stable throughput

`installPackages` is powerful but adds latency. For production data workloads:

- move stable dependencies into `isol8.config.json` `dependencies.*`
- run `isol8 setup` to pre-build custom images
- keep per-request installs for truly dynamic packages only
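The exact `isol8.config.json` schema is not shown in this guide; as a purely hypothetical illustration (the runtime-keyed shape below is an assumption inferred from the `dependencies.*` notation), a pre-baked dependency list might look like:

```json
{
  "dependencies": {
    "python": ["pandas", "matplotlib"]
  }
}
```

Check the project's configuration reference for the authoritative schema before relying on this shape.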
## Parallel batch processing

You can schedule many independent jobs with `Promise.all`; engine semaphores and `maxConcurrent` bound the number of active containers.
```ts
const jobs = datasets.map((dataset) =>
  engine.execute({
    runtime: "python",
    code: "import json,sys; d=json.load(sys.stdin); print(len(d))",
    stdin: JSON.stringify(dataset),
  })
);
const results = await Promise.all(jobs);
```
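Note that `Promise.all` rejects the entire batch as soon as one dataset fails. For large batches where partial results are still useful, `Promise.allSettled` (plain TypeScript, independent of the engine API) keeps the successes and collects the failures:

```typescript
// Partition a batch of job promises into successes and failures instead
// of letting one rejection discard the whole batch.
async function runBatch<T>(
  jobs: Promise<T>[]
): Promise<{ ok: T[]; failed: unknown[] }> {
  const settled = await Promise.allSettled(jobs);
  const ok: T[] = [];
  const failed: unknown[] = [];
  for (const r of settled) {
    if (r.status === "fulfilled") ok.push(r.value);
    else failed.push(r.reason);
  }
  return { ok, failed };
}
```

The `failed` list can then be retried or logged per dataset rather than aborting the run.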
## Returning chart/image artifacts

For visual outputs, write image files into `/sandbox` and retrieve them via `outputPaths`.
```ts
const result = await engine.execute({
  runtime: "python",
  code: `
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [2, 4, 8])
plt.savefig("/sandbox/chart.png", dpi=150)
print("chart ready")
`,
  installPackages: ["matplotlib"],
  outputPaths: ["/sandbox/chart.png"],
});

const chartBase64 = result.files?.["/sandbox/chart.png"];
```