Thundra: Serverless Observability for AWS Lambda

The black-box nature of AWS Lambda and other serverless environments makes identifying and fixing performance issues difficult and time-consuming. Built for straightforward debugging, monitoring, and observability, Thundra provides deep insight into your entire serverless environment. Thundra collects and correlates all your metrics, logs, and traces, so you can quickly identify problematic invocations, and it also analyzes the external services associated with each function. With Thundra’s zero-overhead, automated instrumentation, your developers are free to write code without worrying about bulking up their Lambdas or wasting time chasing black-box problems.

Agent Specific CPU Metrics

Different runtimes (Java, Node.js, Python, Go, …) may provide their own additional CPU and memory metrics along with the base metrics below. See the agent pages listed under this page for detailed information on agent-specific metrics.

Metric data represents measured or calculated statistics about a particular process or activity in the system over intervals of time; in other words, a time series. A metric can be application- or environment-specific (CPU metrics, memory metrics), module- or layer-specific (cache metrics, DynamoDB metrics), or domain-specific (user metrics). Metrics have the following fields in addition to the monitor base fields.

General Metrics

  • traceId | string
    If the metric is collected within a trace, this field holds the ID of the owning trace. This field is optional; an empty value means that the metric is not connected with any trace.

  • transactionId | string
    If the metric is collected within a transaction, this field holds the ID of the owning transaction. This field is optional; an empty value means that the metric is not connected with any transaction.

  • spanId | string
    If the metric is collected within a span, this field holds the ID of the owning span. This field is optional; an empty value means that the metric is not connected with any span.

  • metricName | string
    Name of the metric. For example: "CPUMetric", "MemoryMetric", etc.

  • metricTimestamp | long
    The time when the metric was collected, as UNIX epoch time in milliseconds.

  • metrics | map<string,string|number|boolean>
    Metric values in key-value format. For example: <cacheName, “users”>.

  • tags | map<string, *>
    Tags of the metric in key-value format. For example:

    <customerId, 1234567890>
    <customerName, “John Doe”>
    <isEnterpriseCustomer, true>
    <customerInfo, {
        “country”: “Turkey”,
        “city”: “Ankara”
    }>
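
As a rough illustration of how the fields above fit together, the following sketch models a metric record in Python. The `Metric` class and the `now_millis` helper are hypothetical names chosen for this example; they are not part of any Thundra agent API, and the sample values are made up.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Union

# Value types allowed in the "metrics" map, per the field list above.
MetricValue = Union[str, int, float, bool]


@dataclass
class Metric:
    """Hypothetical container mirroring the documented metric fields."""
    metricName: str
    metricTimestamp: int  # UNIX epoch time in milliseconds
    metrics: Dict[str, MetricValue] = field(default_factory=dict)
    tags: Dict[str, Any] = field(default_factory=dict)
    # Optional IDs: an empty string means "not connected".
    traceId: str = ""
    transactionId: str = ""
    spanId: str = ""

    def is_traced(self) -> bool:
        """True when the metric is connected with a trace."""
        return bool(self.traceId)


def now_millis() -> int:
    """Current time as UNIX epoch milliseconds."""
    return int(time.time() * 1000)


# Illustrative record; the values are invented for this example.
example = Metric(
    metricName="CPUMetric",
    metricTimestamp=now_millis(),
    metrics={"app.cpuLoad": 0.5, "sys.cpuLoad": 0.75},
    tags={"customerId": 1234567890, "isEnterpriseCustomer": True},
)
```

Modeling the optional IDs as empty strings follows the "empty value" wording above; a real agent may represent absent IDs differently.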

Note on `tags`

Note 1:
For labeling support, a tag in <string, boolean> format can be used. For label-based searches, query for the tag whose key is the label name and whose value is true.
Note 2:
Only tags with string, number, or boolean values can be queried.
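
The two notes can be sketched in a few lines of Python. Both function names below (`queryable_tags`, `has_label`) are hypothetical helpers written for this illustration, and how the Thundra backend actually evaluates tag queries is not specified here.

```python
from typing import Any, Dict


def queryable_tags(tags: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only tags whose values are string, number, or boolean (Note 2)."""
    return {
        key: value
        for key, value in tags.items()
        if isinstance(value, (str, int, float, bool))
    }


def has_label(tags: Dict[str, Any], label: str) -> bool:
    """A label is a tag whose value is exactly the boolean true (Note 1)."""
    return tags.get(label) is True


tags = {
    "customerId": 1234567890,
    "customerName": "John Doe",
    "isEnterpriseCustomer": True,
    "customerInfo": {"country": "Turkey", "city": "Ankara"},
}

# The nested customerInfo object is dropped because it is not queryable.
print(sorted(queryable_tags(tags)))
# Only the boolean tag counts as a label.
print(has_label(tags, "isEnterpriseCustomer"))
```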

Memory Metrics

A memory metric is a kind of “Metric” that has a metricName field with the value MemoryMetric and carries the following metrics in its metrics field:

  • app.usedMemory | long
    Memory used by the application itself, in bytes.

  • app.maxMemory | long
    Maximum memory the application can have, in bytes.

  • sys.usedMemory | long
    Memory used by the system, in bytes (there might be multiple applications running in the instance).

  • sys.maxMemory | long
    Maximum memory the system can have, in bytes (there might be multiple applications running in the instance).

Unavailable Memory Metrics

If any of the Memory metrics are not available, their value should be -1.0.
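
When consuming memory metrics, the -1.0 sentinel has to be checked before doing any arithmetic. The sketch below shows one way to derive a usage ratio from the metrics map; `memory_usage_ratio` is a hypothetical helper, and treating a missing key the same as an unavailable value is an assumption made for this example.

```python
from typing import Dict, Optional, Union

UNAVAILABLE = -1.0  # sentinel documented above for unavailable metrics


def memory_usage_ratio(
    metrics: Dict[str, Union[int, float]], scope: str = "app"
) -> Optional[float]:
    """Return used/max memory for "app" or "sys", or None if unavailable."""
    # Assumption: a missing key is treated like an unavailable metric.
    used = metrics.get(f"{scope}.usedMemory", UNAVAILABLE)
    maximum = metrics.get(f"{scope}.maxMemory", UNAVAILABLE)
    if used == UNAVAILABLE or maximum == UNAVAILABLE or maximum <= 0:
        return None
    return used / maximum


sample = {
    "app.usedMemory": 64 * 1024 * 1024,   # 64 MiB, made-up value
    "app.maxMemory": 256 * 1024 * 1024,   # 256 MiB, made-up value
    "sys.usedMemory": -1.0,               # unavailable
    "sys.maxMemory": -1.0,                # unavailable
}

print(memory_usage_ratio(sample, "app"))
print(memory_usage_ratio(sample, "sys"))
```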

CPU Metrics

A CPU metric is a kind of “Metric” that has a metricName field with the value CPUMetric and carries the following metrics in its metrics field:

  • app.cpuLoad | double
    CPU load of the application, as a value between 0.0 and 1.0. Multiply by 100 to get a percentage; for example, 0.0 is 0% and 0.5 is 50%.

  • sys.cpuLoad | double
    CPU load of the system (there might be multiple applications running in the instance), as a value between 0.0 and 1.0. Multiply by 100 to get a percentage; for example, 0.0 is 0% and 0.5 is 50%.

Unavailable CPU Metrics

If any of the CPU metrics are not available, the value should be -1.0.
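
The conversion from a CPU load value to a percentage, including the -1.0 sentinel, might look like the following. `cpu_load_percent` is a hypothetical helper for illustration; rejecting values outside the documented 0.0 to 1.0 range is a design choice of this sketch, not documented agent behavior.

```python
from typing import Optional

UNAVAILABLE = -1.0  # sentinel documented above for unavailable metrics


def cpu_load_percent(load: float) -> Optional[float]:
    """Convert a load in [0.0, 1.0] to a percentage, or None if unavailable."""
    if load == UNAVAILABLE:
        return None
    if not 0.0 <= load <= 1.0:
        # Assumption: anything outside the documented range is an error.
        raise ValueError(f"CPU load out of range: {load}")
    return load * 100.0


print(cpu_load_percent(0.5))   # 50.0
print(cpu_load_percent(0.0))   # 0.0
print(cpu_load_percent(-1.0))  # None
```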