A trace is one end-to-end unit of work — typically a single user request, a single agent turn, or one invocation of a background job. A trace is made up of one or more spans, each representing a step within that unit of work.

When to create a trace

Create a new trace whenever a unit of work starts. Common trace boundaries:
  • One HTTP request to your app
  • One message from a user in a chat session
  • One iteration of an agent’s reasoning loop
  • One cron run or queue-message handler
A trace should not span multiple user requests. If a user sends three messages in a row, that’s three traces, grouped into one session.
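The trace-per-message rule can be pictured with plain data structures. This is an illustrative sketch, not the TruLayer API: the `Trace` class and `session_id` field here are hypothetical stand-ins for however the SDK groups traces into sessions.

```python
from dataclasses import dataclass, field
import uuid

@dataclass
class Trace:
    name: str
    session_id: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

# One chat session, three user messages: three separate traces
# that share a session_id so they can be grouped in the UI.
session_id = uuid.uuid4().hex
traces = [Trace(name=f"user_message_{i}", session_id=session_id) for i in range(1, 4)]

assert len({t.trace_id for t in traces}) == 3    # three distinct traces
assert len({t.session_id for t in traces}) == 1  # one shared session
```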

When to create a span

Create a span for any discrete step inside a trace whose latency, input, output, or errors you want to see separately. Common span types:
Span type   What it captures
llm         A call to a language model — prompt, response, tokens, model name
retrieval   A vector search or lookup — query, top-k results, latency
tool        A tool/function call — arguments, return value
custom      Any other code block — business logic, parsing, validation
Spans can be nested — an agent loop might have an outer custom span for the whole iteration, with llm and tool spans inside.
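The parent/child structure that nested spans produce can be sketched in a few lines of plain Python. This is not the TruLayer SDK, just a minimal model of how a stack of open spans turns `with` blocks into a tree:

```python
from contextlib import contextmanager

class Span:
    def __init__(self, name, span_type):
        self.name, self.span_type, self.children = name, span_type, []

class Tracer:
    def __init__(self):
        self.root = Span("trace", "trace")
        self._stack = [self.root]  # innermost open span is last

    @contextmanager
    def span(self, name, span_type="custom"):
        s = Span(name, span_type)
        self._stack[-1].children.append(s)  # attach to current parent
        self._stack.append(s)
        try:
            yield s
        finally:
            self._stack.pop()

t = Tracer()
with t.span("agent_iteration", "custom"):   # outer custom span
    with t.span("plan", "llm"):              # nested llm span
        pass
    with t.span("search_web", "tool"):       # nested tool span
        pass

outer = t.root.children[0]
assert [c.span_type for c in outer.children] == ["llm", "tool"]
```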

Example: manual instrumentation

import trulayer
from openai import OpenAI

trulayer.init(api_key="...", project_name="rag-app")

client = OpenAI()
question = "What is a trace?"  # the incoming user question
# vector_store is whatever retrieval layer you already use

with trulayer.trace("answer_question") as trace:
    trace.set_input({"question": question})

    with trace.span("retrieve", span_type="retrieval") as span:
        docs = vector_store.search(question, k=5)
        span.set_output({"doc_count": len(docs)})

    with trace.span("generate", span_type="llm") as span:
        span.set_model("gpt-4o-mini")
        response = client.chat.completions.create(...)
        span.set_output(response.choices[0].message.content)
        span.set_tokens(
            prompt_tokens=response.usage.prompt_tokens,
            completion_tokens=response.usage.completion_tokens,
        )

    trace.set_output({"answer": response.choices[0].message.content})

Example: auto-instrumentation

For OpenAI, Anthropic, LangChain, and a growing list of frameworks, you don’t need to wrap calls manually — one call to instrument_*() patches the provider client and every subsequent call becomes a span automatically.
import trulayer
from openai import OpenAI

trulayer.init(api_key="...", project_name="rag-app")

client = OpenAI()
trulayer.instrument_openai(client)

# Every client.chat.completions.create() call is now traced automatically.
Auto-instrumented spans can be combined with manual ones — the SDK keeps track of the active trace via async-local context and nests them correctly.

What gets captured

Every span captures:
  • Input and output — redacted via your scrub_fn if configured
  • Latency — wall-clock time from span start to end
  • Model — for llm spans, the model name
  • Tokens — prompt / completion / total for llm spans
  • Error — if the span raised an exception, the message and type
  • Metadata — any custom key-value pairs you attach
See the SDK reference for every method.
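A scrub function for redacting inputs and outputs might look like the sketch below. The exact signature TruLayer expects is an assumption here; only the idea of masking sensitive keys before spans leave the process comes from the doc.

```python
SENSITIVE_KEYS = {"api_key", "password", "email", "ssn"}

def scrub_fn(payload):
    """Recursively mask sensitive keys in span inputs/outputs.

    Illustrative only: the signature the SDK actually expects may differ.
    """
    if isinstance(payload, dict):
        return {
            k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else scrub_fn(v)
            for k, v in payload.items()
        }
    if isinstance(payload, list):
        return [scrub_fn(v) for v in payload]
    return payload

scrubbed = scrub_fn({"question": "hi", "user": {"email": "a@b.com"}})
assert scrubbed == {"question": "hi", "user": {"email": "[REDACTED]"}}
```

You would presumably pass such a function at setup time (for example via `trulayer.init(scrub_fn=...)`, if the SDK supports that parameter).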

Performance

The SDK is non-blocking: spans are buffered in-process and flushed to the TruLayer ingest API in a background thread (Python) or via queueMicrotask (TypeScript). Your request path is never blocked on network I/O. Default batch size is 50 spans or 2 seconds, whichever comes first. Tune via configuration.
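The size-or-time batching described above can be sketched as follows. This is a simplified model, not the SDK's implementation: flushing here just hands the batch to a callback, where the real SDK would POST it to the ingest API from its background thread.

```python
import threading
import time

class SpanBuffer:
    """Buffer spans; flush at max_batch spans or max_delay seconds."""

    def __init__(self, flush, max_batch=50, max_delay=2.0):
        self._flush, self._max_batch, self._max_delay = flush, max_batch, max_delay
        self._buf, self._lock = [], threading.Lock()
        self._deadline = None  # when the oldest buffered span must be flushed

    def add(self, span):
        with self._lock:
            self._buf.append(span)
            if self._deadline is None:
                self._deadline = time.monotonic() + self._max_delay
            if len(self._buf) >= self._max_batch:
                self._drain()

    def tick(self):
        # A background thread would call this periodically to enforce max_delay.
        with self._lock:
            if self._deadline is not None and time.monotonic() >= self._deadline:
                self._drain()

    def _drain(self):
        batch, self._buf = self._buf, []
        self._deadline = None
        self._flush(batch)

flushed = []
buf = SpanBuffer(flushed.append, max_batch=50)
for i in range(120):
    buf.add(i)

assert [len(b) for b in flushed] == [50, 50]  # two full batches; 20 spans still buffered
```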