Transforms

Transform your data before insert.

A transform changes the shape of data while RawTree inserts it.

Many telemetry and cloud services send one large JSON object that contains the real events inside an array. That wrapper object is useful for delivery, but it is not always the shape you want to query. A transform tells RawTree which array contains the real rows and which wrapper fields should be copied onto each row.

Without a transform, RawTree stores the JSON object you send. With a transform, RawTree reads the JSON object, extracts the nested records, and stores each extracted record as its own row.

Example

CloudWatch Logs sends payloads shaped like this:

{
  "owner": "123456789012",
  "logGroup": "/aws/lambda/api",
  "logStream": "2024/01/15/[$LATEST]abcdef",
  "logEvents": [
    { "id": "1", "timestamp": 1705312800000, "message": "started" },
    { "id": "2", "timestamp": 1705312800100, "message": "finished" }
  ]
}

If you insert that JSON without a transform, RawTree inserts one row: the wrapper object with a nested logEvents array.

If you insert it with --transform cloudwatch-logs, RawTree inserts two rows, one for each log event:

{
  "id": "1",
  "timestamp": 1705312800000,
  "message": "started",
  "owner": "123456789012",
  "logGroup": "/aws/lambda/api",
  "logStream": "2024/01/15/[$LATEST]abcdef"
}
{
  "id": "2",
  "timestamp": 1705312800100,
  "message": "finished",
  "owner": "123456789012",
  "logGroup": "/aws/lambda/api",
  "logStream": "2024/01/15/[$LATEST]abcdef"
}

Now you can query the log events directly:

SELECT timestamp, message, logGroup
FROM cloudwatch_logs
ORDER BY timestamp
LIMIT 10;

When to use transforms

Use a transform when all of these are true:

  • Your source format is one of the supported formats below.
  • One JSON object contains many records inside a nested array.
  • You want to query those records as table rows.

Do not use a transform when your data is already one event per JSON object or one event per JSONL line. Insert that data normally.

RawTree supports built-in transforms only. Custom transform code is not currently supported; transform the data before insert if you need a custom shape.
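
When no built-in transform matches your format, you can reshape the payload yourself and insert the result normally. The following is a minimal Python sketch of that idea. The payload shape (`source`, `events`) and the helper names `flatten_wrapper` and `to_jsonl` are hypothetical and exist only for illustration; they are not part of RawTree.

```python
import json


def flatten_wrapper(payload, array_key, wrapper_keys):
    """Return one dict per nested record, with selected wrapper fields copied on."""
    rows = []
    for record in payload.get(array_key, []):
        row = dict(record)  # start from the nested record
        for key in wrapper_keys:
            if key in payload:
                row[key] = payload[key]  # copy the wrapper field onto the row
        rows.append(row)
    return rows


def to_jsonl(rows):
    """Serialize rows as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)


# Hypothetical payload shape, used only for illustration.
payload = {
    "source": "billing",
    "events": [{"id": 1, "kind": "charge"}, {"id": 2, "kind": "refund"}],
}
rows = flatten_wrapper(payload, "events", ["source"])
print(to_jsonl(rows))
```

The resulting JSONL is already one event per line, so it can be inserted without --transform.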

How to insert with a transform

Use --transform with inline JSON, JSON files, or JSONL files.

rtree insert --table traces \
  --data '{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"api"}}]},"scopeSpans":[{"scope":{"name":"http"},"spans":[{"name":"GET /health","spanId":"abc"}]}]}' \
  --transform otlp-traces

The API accepts the transform name as a query parameter on a JSON body insert.

curl -X POST "https://api.rawtree.com/v1/tables/traces?transform=otlp-traces" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"api"}}]},"scopeSpans":[{"scope":{"name":"http"},"spans":[{"name":"GET /health","spanId":"abc"}]}]}'
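
The same request can be built from application code. The sketch below uses only the Python standard library and mirrors the curl call above; the helper name `build_transform_insert` and the placeholder API key are assumptions made for illustration. It builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request


def build_transform_insert(base_url, table, transform, api_key, payload):
    """Build (but do not send) a POST that inserts a JSON body with a transform."""
    query = urllib.parse.urlencode({"transform": transform})
    url = f"{base_url}/v1/tables/{urllib.parse.quote(table)}?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same single resource span object as in the curl example.
payload = {
    "resource": {
        "attributes": [{"key": "service.name", "value": {"stringValue": "api"}}]
    },
    "scopeSpans": [
        {"scope": {"name": "http"}, "spans": [{"name": "GET /health", "spanId": "abc"}]}
    ],
}
request = build_transform_insert(
    "https://api.rawtree.com", "traces", "otlp-traces", "YOUR_API_KEY", payload
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```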

After insert, query the destination table as usual.

rtree query "SELECT * FROM traces LIMIT 10"

Native OpenTelemetry endpoints

OpenTelemetry SDKs and collectors can send native OTLP directly to RawTree without adding a transform query parameter. RawTree supports OTLP/HTTP under /otlp/v1/* and OTLP/gRPC on the standard collector services. Both protocols apply the same built-in transforms and write to the default signal tables:

| Signal | OTLP/HTTP endpoint | OTLP/gRPC service | Transform | Destination table |
| --- | --- | --- | --- | --- |
| Traces | POST /otlp/v1/traces | opentelemetry.proto.collector.trace.v1.TraceService/Export | otlp-traces | traces |
| Logs | POST /otlp/v1/logs | opentelemetry.proto.collector.logs.v1.LogsService/Export | otlp-logs | logs |
| Metrics | POST /otlp/v1/metrics | opentelemetry.proto.collector.metrics.v1.MetricsService/Export | otlp-metrics | metrics |

RawTree accepts OTLP/HTTP JSON (application/json), OTLP/HTTP protobuf (application/x-protobuf), and OTLP/gRPC protobuf. Configure SDKs with your project API key as the bearer token. If RawTree accepts an export but drops invalid signal records, the OTLP response includes partialSuccess with the signal-specific rejected count.
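
A client that posts OTLP/HTTP JSON itself may want to check for dropped records. The sketch below reads the rejected count from a JSON response body. The field names (`rejectedSpans`, `rejectedLogRecords`, `rejectedDataPoints`, `errorMessage`) come from the OTLP export response messages; the exact body RawTree returns is not shown in this guide, so treat the sample strings as assumptions. The helper name `rejected_count` is hypothetical.

```python
import json

# Rejected-count field per signal, as named in the OTLP export response messages.
REJECTED_FIELD = {
    "traces": "rejectedSpans",
    "logs": "rejectedLogRecords",
    "metrics": "rejectedDataPoints",
}


def rejected_count(signal, response_body):
    """Return (rejected, message) from an OTLP/HTTP JSON export response body."""
    body = json.loads(response_body) if response_body else {}
    partial = body.get("partialSuccess") or {}
    # OTLP JSON may encode 64-bit integers as strings, so accept "2" and 2.
    rejected = int(partial.get(REJECTED_FIELD[signal], 0))
    return rejected, partial.get("errorMessage", "")


# Sample bodies, written by hand for illustration.
sample = '{"partialSuccess": {"rejectedSpans": "2", "errorMessage": "missing spanId"}}'
print(rejected_count("traces", sample))
print(rejected_count("logs", "{}"))
```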

Native OTLP endpoints write to default signal tables unless you provide a signal-specific table header:

| Signal | Custom table header |
| --- | --- |
| Traces | x-rawtree-traces-table |
| Logs | x-rawtree-logs-table |
| Metrics | x-rawtree-metrics-table |

The API key selects the project, and the table header selects the destination table.

curl -X POST "https://api.rawtree.com/otlp/v1/traces" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"api"}}]},"scopeSpans":[{"spans":[{"name":"GET /health","spanId":"abc"}]}]}]}'

For SDKs that use environment variables, set the protocol and endpoint explicitly:

# Option 1: OTLP/HTTP
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.rawtree.com/otlp
export OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer%20$API_KEY,x-rawtree-traces-table=my_traces"

# Option 2: OTLP/gRPC (use instead of option 1, not in addition to it)
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.rawtree.com
export OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer%20$API_KEY,x-rawtree-traces-table=my_traces"

When running the local Docker Compose stack, use http://localhost:4317 for OTLP/gRPC.

Transforms are not supported with URL inserts. If you use ?url=, transform the data before hosting it.

Supported transforms

| Transform | Input shape | Output rows |
| --- | --- | --- |
| otlp-traces | OTLP export with resourceSpans, or one resource span object with resource and scopeSpans | One row per span |
| otlp-logs | OTLP export with resourceLogs, or one resource log object with resource and scopeLogs | One row per log record |
| otlp-metrics | OTLP export with resourceMetrics, or one resource metric object with resource and scopeMetrics | One row per data point |
| cloudwatch-logs | CloudWatch Logs payload with logEvents | One row per log event |
| cloudtrail | CloudTrail payload with Records | One row per CloudTrail record |
| firehose | AWS Firehose HTTP endpoint payload with records[].data | One row per decoded record |

Unknown transform names return a 400 response with the list of supported names.

OpenTelemetry output

The OpenTelemetry transforms merge resource attributes into each emitted row. Attribute values such as stringValue, intValue, boolValue, and doubleValue are unwrapped into their inner values. Complex attribute values such as arrayValue and kvlistValue keep their OTLP wrapper object.
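
The unwrapping rule can be sketched in a few lines of Python. This is an illustration of the behavior described above, not RawTree's implementation; the function names `unwrap_value` and `flatten_attributes` are hypothetical, and the sketch returns inner values as-is without any type conversion.

```python
SCALAR_KEYS = ("stringValue", "intValue", "boolValue", "doubleValue")


def unwrap_value(value):
    """Unwrap scalar OTLP value wrappers; leave complex wrappers untouched."""
    for key in SCALAR_KEYS:
        if key in value:
            return value[key]
    return value  # arrayValue, kvlistValue, and anything else keep their wrapper


def flatten_attributes(attributes):
    """Turn an OTLP attribute list into a flat {key: value} dict."""
    return {attr["key"]: unwrap_value(attr.get("value", {})) for attr in attributes}


attributes = [
    {"key": "service.name", "value": {"stringValue": "api"}},
    {"key": "debug", "value": {"boolValue": False}},
    {"key": "tags", "value": {"arrayValue": {"values": [{"stringValue": "a"}]}}},
]
print(flatten_attributes(attributes))
```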

For traces and logs, RawTree also adds scope.name when the source scope includes a name.

For metrics, RawTree emits one row per data point and adds metric metadata:

| Field | Description |
| --- | --- |
| metric.name | The OTLP metric name |
| metric.type | The OTLP metric type, such as gauge, sum, histogram, summary, or exponentialHistogram |
| metric.unit | The metric unit when present |
| metric.description | The metric description when present |
| scope.name | The instrumentation scope name when present |

Result shapes

The exact output columns depend on the source fields. Each transform starts with the nested record it unwraps, then merges selected wrapper fields into the row.

otlp-traces

Input can be a full OTLP export with resourceSpans, or a single resource span object with resource and scopeSpans.

For each item in scopeSpans[].spans[], RawTree inserts one row. The row starts with the span object itself, then RawTree adds resource attributes such as service.name, and adds scope.name when the source scope has a name.

Example output row from a span named GET /health:

{
  "spanId": "abc",
  "name": "GET /health",
  "service.name": "api",
  "scope.name": "http"
}
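
To make the row construction concrete, here is a minimal Python sketch of the otlp-traces flattening as this section describes it. It is an approximation written from the description, not RawTree's code, and the function names `unwrap` and `flatten_traces` are hypothetical.

```python
SCALAR_KEYS = ("stringValue", "intValue", "boolValue", "doubleValue")


def unwrap(value):
    """Return the inner value of a scalar OTLP wrapper, else the wrapper itself."""
    return next((value[k] for k in SCALAR_KEYS if k in value), value)


def flatten_traces(payload):
    """Emit one row per span, with resource attributes and scope.name merged in."""
    rows = []
    # Accept a full export (resourceSpans) or a single resource span object.
    for resource_spans in payload.get("resourceSpans", [payload]):
        attrs = resource_spans.get("resource", {}).get("attributes", [])
        resource_fields = {a["key"]: unwrap(a.get("value", {})) for a in attrs}
        for scope_spans in resource_spans.get("scopeSpans", []):
            scope_name = scope_spans.get("scope", {}).get("name")
            for span in scope_spans.get("spans", []):
                row = dict(span)             # the row starts with the span itself
                row.update(resource_fields)  # then resource attributes are added
                if scope_name:
                    row["scope.name"] = scope_name
                rows.append(row)
    return rows


single = {
    "resource": {
        "attributes": [{"key": "service.name", "value": {"stringValue": "api"}}]
    },
    "scopeSpans": [
        {"scope": {"name": "http"}, "spans": [{"name": "GET /health", "spanId": "abc"}]}
    ],
}
print(flatten_traces(single))
print(flatten_traces({"resourceSpans": [single]}))
```

Both calls produce the same single row, because the sketch treats a payload without resourceSpans as one resource span object.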

otlp-logs

Input can be a full OTLP export with resourceLogs, or a single resource log object with resource and scopeLogs.

For each item in scopeLogs[].logRecords[], RawTree inserts one row. The row starts with the log record object itself, then RawTree adds resource attributes such as service.name, and adds scope.name when the source scope has a name.

Example output row from an INFO log record:

{
  "timeUnixNano": "1700000000000000000",
  "severityText": "INFO",
  "body": { "stringValue": "request started" },
  "service.name": "api",
  "scope.name": "logger"
}

otlp-metrics

Input can be a full OTLP export with resourceMetrics, or a single resource metric object with resource and scopeMetrics.

For every metric data point under gauge, sum, histogram, summary, or exponentialHistogram, RawTree inserts one row. The row starts with the data point object itself, then RawTree adds resource attributes, optional scope.name, and metric fields such as metric.name, metric.type, metric.unit, and metric.description.

Example output row from a cpu.usage gauge data point:

{
  "timeUnixNano": "1700000000000000000",
  "asDouble": 42.5,
  "host.name": "node-1",
  "scope.name": "runtime",
  "metric.name": "cpu.usage",
  "metric.type": "gauge",
  "metric.unit": "%"
}
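
The per-data-point fan-out can be sketched the same way. The code below is an illustrative approximation of the otlp-metrics behavior described in this section, with hypothetical function names; it assumes each metric type object carries its points in a dataPoints array, as in the OTLP JSON encoding.

```python
METRIC_TYPES = ("gauge", "sum", "histogram", "summary", "exponentialHistogram")
SCALAR_KEYS = ("stringValue", "intValue", "boolValue", "doubleValue")


def unwrap(value):
    """Return the inner value of a scalar OTLP wrapper, else the wrapper itself."""
    return next((value[k] for k in SCALAR_KEYS if k in value), value)


def flatten_metrics(payload):
    """Emit one row per data point, with resource, scope, and metric fields added."""
    rows = []
    # Accept a full export (resourceMetrics) or a single resource metric object.
    for resource_metrics in payload.get("resourceMetrics", [payload]):
        attrs = resource_metrics.get("resource", {}).get("attributes", [])
        resource_fields = {a["key"]: unwrap(a.get("value", {})) for a in attrs}
        for scope_metrics in resource_metrics.get("scopeMetrics", []):
            scope_name = scope_metrics.get("scope", {}).get("name")
            for metric in scope_metrics.get("metrics", []):
                for metric_type in METRIC_TYPES:
                    points = metric.get(metric_type, {}).get("dataPoints", [])
                    for point in points:
                        row = dict(point)  # the row starts with the data point
                        row.update(resource_fields)
                        if scope_name:
                            row["scope.name"] = scope_name
                        row["metric.name"] = metric.get("name")
                        row["metric.type"] = metric_type
                        for field in ("unit", "description"):
                            if metric.get(field):  # only when present
                                row[f"metric.{field}"] = metric[field]
                        rows.append(row)
    return rows


export = {
    "resourceMetrics": [{
        "resource": {
            "attributes": [{"key": "host.name", "value": {"stringValue": "node-1"}}]
        },
        "scopeMetrics": [{
            "scope": {"name": "runtime"},
            "metrics": [{
                "name": "cpu.usage",
                "unit": "%",
                "gauge": {"dataPoints": [
                    {"timeUnixNano": "1700000000000000000", "asDouble": 42.5}
                ]},
            }],
        }],
    }]
}
print(flatten_metrics(export))
```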

cloudwatch-logs

Input must contain a top-level logEvents array.

For each item in logEvents[], RawTree inserts one row. The row starts with the log event object itself, then RawTree adds top-level CloudWatch fields such as owner, logGroup, logStream, and messageType.

Example output row from a CloudWatch log event:

{
  "id": "37199704885633600550210639780",
  "timestamp": 1705312800000,
  "message": "START RequestId: abc-123",
  "owner": "123456789012",
  "logGroup": "/aws/lambda/my-function",
  "logStream": "2024/01/15/[$LATEST]abcdef",
  "messageType": "DATA_MESSAGE"
}

cloudtrail

Input must contain a top-level Records array.

For each item in Records[], RawTree inserts the CloudTrail record object itself. RawTree does not add wrapper fields for this transform.

Example output row from a CloudTrail GetObject record:

{
  "eventVersion": "1.08",
  "eventTime": "2024-01-15T10:30:00Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "GetObject",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.42",
  "userIdentity": { "type": "IAMUser", "userName": "alice" }
}

If the unwrapped record already contains a field with the same name as a merged wrapper field, the wrapper value overwrites the record's value, and the wrapper value is what RawTree inserts for that column. For example, OpenTelemetry scope.name comes from the scope wrapper, and CloudWatch logGroup comes from the top-level wrapper.
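
This precedence rule behaves like a dictionary merge in which wrapper fields are applied last. A minimal sketch, with a hypothetical helper name and a hand-written log event that happens to carry its own logGroup:

```python
def merge_row(record, wrapper_fields):
    """Start from the unwrapped record, then let wrapper fields overwrite it."""
    row = dict(record)
    row.update(wrapper_fields)  # same-named record fields are replaced
    return row


# Hypothetical event that already contains a logGroup field.
event = {"id": "1", "message": "started", "logGroup": "from-event"}
wrapper = {"owner": "123456789012", "logGroup": "/aws/lambda/api"}
print(merge_row(event, wrapper))
```

In the merged row, logGroup holds the wrapper value `/aws/lambda/api`; the event's own value is not kept.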

AWS output

The cloudwatch-logs transform unwraps logEvents. Each emitted row keeps the log event fields and also includes these wrapper fields when present:

| Field | Description |
| --- | --- |
| owner | AWS account owner from the wrapper payload |
| logGroup | CloudWatch log group |
| logStream | CloudWatch log stream |
| messageType | CloudWatch message type |

The cloudtrail transform unwraps the top-level Records array. CloudTrail records are inserted as the emitted rows without additional wrapper fields.

firehose

Use firehose when AWS Firehose delivers to RawTree through an HTTP endpoint.

Configure the Firehose endpoint URL with the target table and transform:

https://api.rawtree.com/v1/tables/events?transform=firehose

Set the Firehose access key to your RawTree project API key. RawTree reads that key from the X-Amz-Firehose-Access-Key header.

Each Firehose records[].data value must be base64-encoded data. RawTree handles the decoded value according to its format:

  • A JSON object: RawTree inserts it as one row.
  • An array of JSON objects: RawTree inserts one row per object.
  • UTF-8 TSV text with columns present: RawTree inserts one row per TSV line using those column names.
  • UTF-8 TSV text without columns: RawTree inserts one row with the decoded text in a data field.
  • Any other decoded format: RawTree returns 400.
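
The decoding rules above can be approximated in Python as follows. This is a sketch under stated assumptions, not RawTree's implementation: the guide does not say how `columns` is supplied or how TSV text is told apart from other formats, so the sketch takes `columns` as a function argument, treats any UTF-8 text that is not a JSON object or array of objects as TSV, and raises `ValueError` where RawTree would return 400. The function name is hypothetical.

```python
import base64
import json


def decode_firehose_record(data_b64, columns=None):
    """Decode one records[].data value into a list of row dicts."""
    raw = base64.b64decode(data_b64)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Stand-in for the 400 response described in the guide.
        raise ValueError("unsupported decoded format") from None

    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None

    if isinstance(parsed, dict):
        return [parsed]  # one JSON object, one row
    if isinstance(parsed, list) and all(isinstance(item, dict) for item in parsed):
        return parsed  # one row per object

    # Assumption: everything else that decoded as UTF-8 is treated as TSV text.
    if columns:
        lines = [line for line in text.splitlines() if line]
        return [dict(zip(columns, line.split("\t"))) for line in lines]
    return [{"data": text}]


def b64(text):
    """Helper for the examples: base64-encode a UTF-8 string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


print(decode_firehose_record(b64('{"a": 1}')))
print(decode_firehose_record(b64('[{"a": 1}, {"a": 2}]')))
print(decode_firehose_record(b64("1\talice\n2\tbob\n"), columns=["id", "name"]))
print(decode_firehose_record(b64("1\talice\n2\tbob\n")))
```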

RawTree returns the Firehose response shape:

{
  "requestId": "firehose-request-id",
  "timestamp": 1710000000000
}

Empty results

A transform can emit zero rows if the input JSON does not contain the expected wrapper arrays. For example, otlp-traces requires spans under resourceSpans or scopeSpans, and cloudtrail requires a Records array.

If an insert succeeds but inserts fewer rows than expected, check the source shape and query the table with SELECT * FROM <table> LIMIT 10.
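
Before inserting, a quick local check can catch the most common cause of zero rows: a payload that lacks the wrapper array the transform expects. The sketch below uses the top-level keys listed in the supported transforms table; the helper name `has_expected_wrapper` is hypothetical, and the check only inspects the top level, so it does not guarantee that nested arrays are non-empty.

```python
# Top-level keys each transform looks for, taken from the supported transforms table.
EXPECTED_KEYS = {
    "otlp-traces": ("resourceSpans", "scopeSpans"),
    "otlp-logs": ("resourceLogs", "scopeLogs"),
    "otlp-metrics": ("resourceMetrics", "scopeMetrics"),
    "cloudwatch-logs": ("logEvents",),
    "cloudtrail": ("Records",),
    "firehose": ("records",),
}


def has_expected_wrapper(transform, payload):
    """Return True if the payload has a non-empty array under an expected key."""
    return any(
        isinstance(payload.get(key), list) and len(payload[key]) > 0
        for key in EXPECTED_KEYS[transform]
    )


print(has_expected_wrapper("cloudtrail", {"Records": [{"eventName": "GetObject"}]}))
# Key names are case-sensitive: a lowercase "records" key does not match cloudtrail.
print(has_expected_wrapper("cloudtrail", {"records": [{"eventName": "GetObject"}]}))
```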