The Python SDK is designed so that a TruLayer ingest outage never becomes an application outage. This page documents the default behavior and the one opt-in knob for teams who deliberately want the opposite tradeoff.

Default — drop and warn

When the ingest API is unreachable (network error) or returns a transient status (5xx), the SDK:
  1. Retries the batch with exponential backoff (500 ms, 1 s, 2 s).
  2. On the third failure, drops the in-memory batch and emits a single warning via warnings.warn(...).
  3. Suppresses warnings for subsequent failures within a 60-second window to avoid log flooding — a fresh warning is emitted once the window rolls over.
User code never blocks on network I/O and never sees a batch failure surface as an exception. Trace capture runs on the caller thread; transport runs on a background flush thread that owns the retry logic. This is the right default for almost every production service. A dead ingest endpoint should degrade observability, not customer-facing behavior.
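The retry, drop, and rate-limited warning behavior described above can be sketched roughly as follows. This is an illustrative stand-in, not the SDK's implementation: `DropWarner`, `send_with_retries`, and the injected `send`, `sleep`, and `clock` callables are hypothetical names. Only the delays (500 ms, 1 s, 2 s) and the 60-second window come from the text. The exact attempt count is also an assumption: the sketch makes one initial attempt and then one retry after each delay.

```python
import time
import warnings
from typing import Callable, Optional


class DropWarner:
    """Emit at most one warning per window (hypothetical helper)."""

    def __init__(self, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._window_s = window_s
        self._clock = clock
        self._last_warned: Optional[float] = None

    def warn(self, message: str) -> bool:
        now = self._clock()
        if (self._last_warned is not None
                and now - self._last_warned < self._window_s):
            return False  # still inside the suppression window
        self._last_warned = now
        warnings.warn(message, RuntimeWarning, stacklevel=2)
        return True


def send_with_retries(batch, send, warner,
                      delays=(0.5, 1.0, 2.0), sleep=time.sleep) -> bool:
    """Return True if the batch was delivered, False if it was dropped."""
    # Assumption: one initial attempt, then one retry after each delay.
    for delay in (*delays, None):
        try:
            send(batch)
            return True
        except (ConnectionError, TimeoutError):
            if delay is None:
                break  # retries exhausted
            sleep(delay)
    # Drop the batch instead of raising, so the caller is never interrupted.
    warner.warn(f"dropping batch of {len(batch)} traces after retries")
    return False
```

In the SDK this logic would run on the background flush thread, so the `sleep` calls never delay the caller thread.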

Opt-in — TRULAYER_FAIL_MODE=block

Set TRULAYER_FAIL_MODE=block to make client.shutdown() (and the flush that runs during shutdown) raise a typed TruLayerFlushError when a batch exhausts its retries.
import os
import trulayer
from trulayer.errors import TruLayerFlushError

os.environ["TRULAYER_FAIL_MODE"] = "block"

trulayer.init(
    api_key=os.environ["TRULAYER_API_KEY"],
    project_name="critical-eval-pipeline",
)

try:
    with trulayer.trace("nightly-eval") as trace:
        ...
    trulayer.shutdown()
except TruLayerFlushError as err:
    # Alert, mark the run as failed, or abort the job deliberately.
    print(f"ingest failed: {err} (batch size {err.batch_size})")
    raise
TruLayerFlushError exposes two fields:
  • batch_size: int — number of traces in the failed batch.
  • __cause__ — the underlying network or HTTP error (standard Python exception chaining).
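For readers less familiar with exception chaining, the sketch below shows how a `__cause__` ends up on an error and how a handler might read it alongside `batch_size`. `DemoFlushError`, `flush`, and `describe` are hypothetical stand-ins written for this example. They mirror the two documented fields but are not the SDK's classes.

```python
class DemoFlushError(Exception):
    """Hypothetical stand-in mirroring the two documented fields."""

    def __init__(self, message: str, batch_size: int):
        super().__init__(message)
        self.batch_size = batch_size


def flush(batch):
    try:
        # Simulated transport failure.
        raise ConnectionError("ingest unreachable")
    except ConnectionError as exc:
        # `raise ... from exc` is what populates __cause__.
        raise DemoFlushError("batch exhausted its retries", len(batch)) from exc


def describe(err: DemoFlushError) -> str:
    cause = err.__cause__
    return (f"{err} (batch size {err.batch_size}, "
            f"caused by {type(cause).__name__}: {cause})")
```

A handler can branch on `type(err.__cause__)` to distinguish, for example, a connection failure from an HTTP error before deciding whether to retry the whole job.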

When to use block mode

Block mode is a niche tool. Reach for it only when:
  • The workload is a batch job whose entire value depends on TruLayer receiving the output (eval pipelines, backfills, scheduled quality runs).
  • Silently losing traces is materially worse than surfacing an error to the operator.
  • The caller is prepared to handle TruLayerFlushError — typically by failing the job and retrying the whole run.
Do not use block mode for:
  • User-facing request handlers (ASGI/WSGI apps). A transient ingest outage will cascade into customer-visible failures.
  • Background services that must survive observability outages (payment processors, auth flows, webhooks).

Zero-network — TRULAYER_MODE=local

For CI and offline development, set TRULAYER_MODE=local. The SDK swaps the HTTP sender for an in-memory LocalBatchSender that stores every trace for inspection, never touches the network, and never warns.
TRULAYER_MODE=local pytest
Combine with the trulayer.testing helpers for assertions on captured traces.
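When the variable cannot be set on the command line, one option is to scope it to a single test from Python. The helper below is a minimal sketch using only the standard library. `trulayer_local_mode` is a hypothetical name, and the sketch assumes the SDK reads TRULAYER_MODE when `trulayer.init(...)` is called, which this page does not state explicitly.

```python
import os
from contextlib import contextmanager
from unittest import mock


@contextmanager
def trulayer_local_mode():
    """Set TRULAYER_MODE=local inside the block, then restore the old value."""
    # patch.dict restores os.environ on exit, even if the body raises.
    with mock.patch.dict(os.environ, {"TRULAYER_MODE": "local"}):
        yield
```

A test would call `trulayer.init(...)` inside `with trulayer_local_mode():` so that the client is created while the variable is set.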

Replay — TRULAYER_MODE=replay

Set TRULAYER_MODE=replay together with TRULAYER_REPLAY_FILE=<path> to load a previously captured JSONL file on init(). Useful for golden-file regression tests and reproducing a production trace locally.
TRULAYER_MODE=replay \
TRULAYER_REPLAY_FILE=fixtures/golden.jsonl \
  pytest
TRULAYER_MODE=replay implies local — replayed traces never escape to the live API, because they were produced by a previous capture and would double-count in the dashboard. Malformed JSONL lines are skipped with a warning.
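The "skip malformed lines with a warning" behavior can be illustrated with a small standard-library loader. This is a sketch of the general technique, not the SDK's loader: `load_jsonl` is a hypothetical name, and the schema of a captured trace record is not specified on this page, so the example treats each line as arbitrary JSON.

```python
import json
import warnings
from typing import Iterable


def load_jsonl(lines: Iterable[str]) -> list:
    """Parse JSONL, skipping malformed lines with a warning."""
    records = []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # blank lines are ignored silently (an assumption)
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            warnings.warn(
                f"skipping malformed JSONL line {lineno}: {exc.msg}",
                stacklevel=2,
            )
            continue
        records.append(record)
    return records
```

Because a file object iterates line by line, `load_jsonl(open_file)` works on a capture such as fixtures/golden.jsonl without reading the whole file into memory first.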

Decision guide

| Scenario | Recommended mode |
| --- | --- |
| Production HTTP service | Default (drop + warn) |
| Background worker with SLO on ingest | Default (drop + warn) |
| Nightly eval / backfill job | TRULAYER_FAIL_MODE=block |
| CI unit tests | TRULAYER_MODE=local |
| CI integration tests against a golden capture | TRULAYER_MODE=replay + TRULAYER_REPLAY_FILE |
| Local development without an API key | TRULAYER_MODE=local |

Archived project

HTTP 403 responses with code: "error.project.archived" are treated differently from other errors. They indicate that the project associated with your API key has been archived — a deliberate configuration change, not a transient failure. When the SDK receives this response:
  1. It logs an ERROR-level message via the standard Python logging module (logger name trulayer):
    ERROR trulayer: Ingest permanently disabled — the project associated with this API key has been archived.
    Unarchive the project at https://app.trulayer.ai/projects to resume, then restart the process or create a new client.
    
  2. The exporter is permanently disabled for that client instance. Subsequent flush attempts are no-ops and produce no further log output.
  3. Your application continues running normally — only TruLayer observability is suspended.
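The three steps above amount to a one-way latch: the first archived-project response disables the exporter, and every later flush returns immediately. A minimal sketch of that pattern follows. `DemoExporter` and its injected `post` callable are hypothetical, and the shape of the response body (a dict with a `code` key) is an assumption based on the error code quoted above. Only the logger name `trulayer` and the code string come from the text.

```python
import logging

logger = logging.getLogger("trulayer")  # logger name taken from the text

ARCHIVED_CODE = "error.project.archived"


class DemoExporter:
    """Hypothetical exporter that latches off after an archived-project 403."""

    def __init__(self, post):
        # `post` is a callable returning (status_code, body_dict).
        self._post = post
        self.disabled = False

    def flush(self, batch) -> bool:
        if self.disabled:
            return False  # no-op: no network call, no further log output
        status, body = self._post(batch)
        if status == 403 and body.get("code") == ARCHIVED_CODE:
            self.disabled = True  # never cleared for this instance
            logger.error("Ingest permanently disabled: project archived.")
            return False
        return 200 <= status < 300
```

Because the flag lives on the instance and is never reset, resuming requires a new instance, which matches the restart-or-reinitialize guidance on this page.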

Why the exporter does not retry

A 403 is an authoritative refusal. Retrying would produce noise without any possibility of success. The SDK treats this the same way a browser treats an HTTP 403: stop, log, and do not retry.

Resuming after unarchiving

Unarchiving the project (from Projects settings at app.trulayer.ai/projects) restores ingest immediately — no key rotation needed. However, any already-running client instance that received the 403 will not automatically resume. You must either:
  • Restart the process — the new process starts a fresh client that will send normally.
  • Create a new client — call trulayer.init(...) (or instantiate a new TruLayerClient) again with the same API key. The new client is independent of the disabled one.

Detecting the error programmatically

In block mode (TRULAYER_FAIL_MODE=block), the TruLayerFlushError raised on permanent 403s carries a status_code attribute you can inspect:
import trulayer
from trulayer.errors import TruLayerFlushError

try:
    trulayer.shutdown()
except TruLayerFlushError as err:
    if getattr(err, "status_code", None) == 403:
        print("Project may be archived — check app.trulayer.ai/projects")
    raise
See Project lifecycle for full details on archiving, unarchiving, and the one-active-project constraint.

See also