Thundra

Thundra: Serverless Observability for AWS Lambda

The black-box nature of AWS Lambda and other serverless environments makes identifying and fixing performance issues difficult and time-consuming. Built for straightforward debugging, monitoring, and observability, Thundra provides deep insight into your entire serverless environment. Thundra collects and correlates all your metrics, logs, and traces, so you can quickly identify problematic invocations, and it also analyzes the external services associated with each function. With Thundra’s zero overhead and automated instrumentation capabilities, your developers are free to write code without worrying about bulking up their Lambdas or wasting time chasing black-box problems.


Trace data provides end-to-end visibility into requests throughout the entire chain. Traces can be used to identify which parts of the system have performance bottlenecks, to detect which components of the system lead to errors, and to debug the whole request flow for domain-level bugs. Trace data has the following fields in addition to the monitor base fields.

Since trace data can span multiple systems and applications that participate in the flow, the base fields of the trace are assigned by the system/application that starts the trace. For example, the application-related fields (such as applicationName, applicationId, …, applicationTags) are the properties of the starter system/application of the trace.

Moreover, there is no explicit traceId field here: the id field is inherited from the base fields, and this inherited id field represents the id of the trace.

Base Trace Fields

  • rootSpanId | string
    Id of the root span in the trace.

  • startTimestamp | long
    Start time of the trace as UNIX Epoch time in milliseconds.

  • finishTimestamp | long
    End time of the trace as UNIX Epoch time in milliseconds. This field is optional and might be empty for the following reasons:

    The trace has not finished yet.
    The end of the trace cannot be detected (or is very hard to detect) because the trace is distributed across many systems, including external ones that don’t have tracing support.

  • duration | long
    Duration of the trace in milliseconds. Even if the trace has not finished yet (no finishTimestamp is set), this field must be set to the interval between the start time of the trace (startTimestamp) and the time at which the trace data is collected.

  • tags | map<string, *>
    Tags of the trace in key-value format. For example:

    <customerId, 1234567890>
    <customerName, “John Doe”>
    <isEnterpriseCustomer, true>
    <customerInfo, {
        “country”: “Turkey”,
        “city”: “Ankara”
    }>
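The duration rule for unfinished traces described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `trace_duration_ms` and the `now_ms` parameter are hypothetical and are not part of any Thundra API; only the field names startTimestamp and finishTimestamp come from this page.

```python
import time
from typing import Optional


def trace_duration_ms(start_timestamp: int,
                      finish_timestamp: Optional[int] = None,
                      now_ms: Optional[int] = None) -> int:
    """Return the trace duration in milliseconds (hypothetical helper).

    If finish_timestamp is missing (the trace has not finished, or its end
    cannot be detected), fall back to the time the data is collected.
    """
    if finish_timestamp is not None:
        return finish_timestamp - start_timestamp
    if now_ms is None:
        # Assumption: the local wall clock stands in for the collection time.
        now_ms = int(time.time() * 1000)
    return now_ms - start_timestamp


# Finished trace: duration is finish time minus start time.
print(trace_duration_ms(1_000, finish_timestamp=1_250))  # 250
# Unfinished trace collected at t = 1_400 ms: duration so far.
print(trace_duration_ms(1_000, now_ms=1_400))            # 400
```

Passing the collection time explicitly keeps the calculation deterministic, which makes it easier to test than reading the clock inside the function.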

Notes on `tags`

Note 1:
For labeling support, a tag in <string, boolean> format can be used. Label-based searches can then query for the tag whose key is the label name and whose value is true.
Note 2:
Only tags with string, number, or boolean values can be queried.
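To make the two notes concrete, here is a minimal Python sketch of how a consumer of this data might separate queryable tags from non-queryable ones and check for a label. The helper names `is_queryable`, `queryable_tags`, and `has_label` are hypothetical and only illustrate the rules stated above; they do not describe how Thundra implements its search.

```python
from typing import Any, Dict


def is_queryable(value: Any) -> bool:
    # Per Note 2: only string, number, or boolean values can be queried.
    return isinstance(value, (str, int, float, bool))


def queryable_tags(tags: Dict[str, Any]) -> Dict[str, Any]:
    # Keep only the tags that a search could match on.
    return {key: value for key, value in tags.items() if is_queryable(value)}


def has_label(tags: Dict[str, Any], label: str) -> bool:
    # Per Note 1: a label is a tag whose value is the boolean true.
    return tags.get(label) is True


tags = {
    "customerId": 1234567890,
    "customerName": "John Doe",
    "isEnterpriseCustomer": True,
    "customerInfo": {"country": "Turkey", "city": "Ankara"},
}

print(sorted(queryable_tags(tags)))
# ['customerId', 'customerName', 'isEnterpriseCustomer']
print(has_label(tags, "isEnterpriseCustomer"))  # True
print(has_label(tags, "customerId"))            # False
```

The nested customerInfo map is still a valid tag value; it is simply excluded from anything that can be searched.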

Span Fields

Span data represents a unit of work performed. A span has the following fields in addition to the monitor base fields. There is no explicit spanId field here: the id field is inherited from the base fields, and this inherited id field represents the id of the span.

  • traceId | string
    The id of the owner trace.

  • transactionId | string
    The id of the owner transaction which this span belongs to. An empty value means that the span is not connected with any transaction.

  • parentSpanId | string
    The id of the parent span. This field is empty if the span is the root span, because a root span has no parent.

  • spanOrder | long
    Order of the span among its siblings, that is, the spans that share the same parent span. If there is a “happens-before” relation between sibling spans (for example, the spans are laid out sequentially), the flow between them can be reconstructed from the spanOrder property. This field is optional; the default value is -1, which means no order is specified. The value must be monotonically increasing, but it is not required to increase by exactly one.

  • domainName | string
    The domain of the context to which the span belongs. For example: API, DB, Cache, etc.

  • className | string
    Class of the context to which the span belongs. For example:

    AWS-Lambda (for API domain)
    AWS-DynamoDB (for DB domain)
    Redis (for Cache domain)
    ...

  • serviceName | string
    Identifies the service the span is from.

  • operationName | string
    Name of the operation. For example:

    user-get (AWS Lambda function name)
    Users (AWS DynamoDB table name)
    redis.mycompany.com (Redis host name)
    ...

  • startTimestamp | long
    Start time of the span as UNIX Epoch time in milliseconds.

  • finishTimestamp | long
    End time of the span as UNIX Epoch time in milliseconds.

  • duration | long
    Duration of the span in milliseconds.

  • tags | map<string, *>
    Tags of the span in key-value format. For example:

    <customerId, 1234567890>
    <customerName, “John Doe”>
    <isEnterpriseCustomer, true>
    <customerInfo, {
        “country”: “Turkey”,
        “city”: “Ankara”
    }>

Notes on `tags`

Note 1:
For labeling support, a tag in <string, boolean> format can be used. Label-based searches can then query for the tag whose key is the label name and whose value is true.
Note 2:
Only tags with string, number, or boolean values can be queried.

  • logs | map<string, spanLog>
    Span logs represent events that occurred in the context of the span, together with their occurrence times. Each span log has the following format:
    • name | string
      Name of the log event. This field is optional.
    • value | *
      Value of the log event.
    • timestamp | long
      Occurrence time of the log event as UNIX Epoch time in milliseconds.
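As an illustration of how the span fields fit together, the following Python sketch groups spans under their parent via parentSpanId, orders siblings by spanOrder, and lists a span's log events by timestamp. It is a sketch under assumptions, not Thundra's implementation: spans are modeled as plain dictionaries, the helper names `children_in_order` and `logs_by_time` are hypothetical, the keys of the logs map ("a", "b") are arbitrary placeholders, and placing spans with the default spanOrder of -1 before ordered siblings is simply a choice made here.

```python
from collections import defaultdict
from typing import Any, Dict, List

Span = Dict[str, Any]


def children_in_order(spans: List[Span]) -> Dict[str, List[Span]]:
    """Group spans by parentSpanId and sort each sibling group by spanOrder."""
    by_parent: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        # An empty parentSpanId marks the root span; it is grouped under "".
        by_parent[span.get("parentSpanId") or ""].append(span)
    for siblings in by_parent.values():
        # Missing spanOrder is treated as the documented default of -1.
        siblings.sort(key=lambda s: s.get("spanOrder", -1))
    return dict(by_parent)


def logs_by_time(span: Span) -> List[Dict[str, Any]]:
    """Return the span's log events sorted by their occurrence time."""
    return sorted(span.get("logs", {}).values(), key=lambda log: log["timestamp"])


spans = [
    {"id": "s1", "parentSpanId": "", "operationName": "user-get"},
    {"id": "s3", "parentSpanId": "s1", "spanOrder": 20, "operationName": "Users"},
    {"id": "s2", "parentSpanId": "s1", "spanOrder": 10,
     "operationName": "redis.mycompany.com",
     "logs": {
         "b": {"name": "cache-miss", "value": "user:42", "timestamp": 1_050},
         "a": {"name": "connect", "value": True, "timestamp": 1_010},
     }},
]

tree = children_in_order(spans)
print([s["id"] for s in tree["s1"]])                     # ['s2', 's3']
print([log["name"] for log in logs_by_time(spans[2])])   # ['connect', 'cache-miss']
```

Because spanOrder only has to increase monotonically, the sketch sorts on the value rather than assuming consecutive numbers such as 0, 1, 2.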