Root Cause Analysis of Outliers

Thundra helps you monitor serverless applications, which are the epitome of highly distributed small black boxes. Finding and troubleshooting problems in your functions is hard. Thundra provides the details of each Lambda function and displays its invocation metrics so that you can pinpoint problems. With the Performance Analysis page, you can detect outlier invocations and check their downstream services in two clicks.

To analyze your monitoring data, first select the function you want to inspect for outliers on the Functions page, then navigate to the Performance Analysis tab.

The Performance Analysis tab presents information about your Lambda function invocations as heatmaps and graphs, which make the invocation data easier to read and explore. From these charts, you can detect problematic invocations, dive into their performance, and isolate the issue.

The heatmap visualizes the invocations of a specific function, plotted as time against duration. Each invocation falls into a square according to the time interval and the duration interval it belongs to. As the legend shows, darker colors indicate a higher invocation count within that interval.
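
The bucketing behind such a heatmap can be sketched as follows. This is only an illustration of the idea, not Thundra's implementation: the function name `heatmap_cells`, the bucket sizes, and the sample data are all assumptions.

```python
from collections import Counter

def heatmap_cells(invocations, time_bucket_s=60, duration_bucket_ms=100):
    """Count invocations per (time bucket, duration bucket) cell.

    `invocations` is an iterable of (start_time_s, duration_ms) pairs.
    The bucket sizes are hypothetical defaults chosen for illustration.
    """
    cells = Counter()
    for start_s, duration_ms in invocations:
        cell = (int(start_s // time_bucket_s),
                int(duration_ms // duration_bucket_ms))
        cells[cell] += 1
    return cells

# Hypothetical sample: (start time in seconds, duration in milliseconds)
sample = [(5, 120), (30, 180), (70, 950), (75, 130), (110, 160)]
cells = heatmap_cells(sample)

# A higher count corresponds to a darker square on the heatmap.
for cell, count in sorted(cells.items()):
    print(cell, count)
```

In this sample, the invocation that lasts 950 ms ends up alone in a cell far from the others on the duration axis, which is how an outlier stands out visually.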

Heatmap Selection

The selection mode of the heatmap lets you select an area of invocations to get more insight into them. When you select an area, the charts below show data for the selected invocations only. If an area looks problematic, you can select it and dive further into those invocations.
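
Conceptually, selecting an area amounts to filtering invocations by a time range and a duration range. A minimal sketch, assuming invocations are plain dictionaries with hypothetical `start_s` and `duration_ms` fields (these names are not part of Thundra's data model):

```python
def select_area(invocations, t_min, t_max, d_min, d_max):
    """Return the invocations inside the selected rectangle.

    The rectangle covers start times in [t_min, t_max) seconds and
    durations in [d_min, d_max) milliseconds.
    """
    return [
        inv for inv in invocations
        if t_min <= inv["start_s"] < t_max
        and d_min <= inv["duration_ms"] < d_max
    ]

# Hypothetical invocation records
sample = [
    {"id": "a", "start_s": 5, "duration_ms": 120},
    {"id": "b", "start_s": 70, "duration_ms": 950},
    {"id": "c", "start_s": 75, "duration_ms": 130},
    {"id": "d", "start_s": 110, "duration_ms": 1200},
]

# Select the slow invocations in the second minute.
selected = select_area(sample, t_min=60, t_max=120, d_min=900, d_max=1500)
print([inv["id"] for inv in selected])
```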

The resource chart displays all the services used by the selected invocations, along with the usage of each service. Using the resource chart, you can see the breakdown of service usage and which services consume the most time relative to the total time.
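
The kind of breakdown a resource chart conveys can be approximated as each service's share of the total time spent in services. The sketch below is an assumption about how such a share could be computed, not a description of Thundra's internals; the service names and durations are made up.

```python
from collections import defaultdict

def resource_breakdown(spans):
    """Return each service's share of the total time spent in services.

    `spans` is an iterable of (service_name, duration_ms) pairs collected
    from the selected invocations.
    """
    totals = defaultdict(float)
    for service, duration_ms in spans:
        totals[service] += duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {service: total / grand_total for service, total in totals.items()}

# Hypothetical service calls made by the selected invocations
spans = [("DynamoDB", 300), ("S3", 100), ("DynamoDB", 500), ("HTTP", 100)]
shares = resource_breakdown(spans)

# List services from the largest share to the smallest.
for service, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{service}: {share:.0%}")
```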

The Duration and Count chart visualizes the types of invocations in the selected area. Hover over the graph to get more information about the invocations in that time interval, such as:

  • Average duration

  • Duration of the 99th percentile

  • Duration of the 95th percentile

  • Error

  • Cold Start

  • Health
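
To make the figures in this list concrete, here is a rough sketch of how the average, the 95th and 99th percentile durations, and the error and cold start counts could be derived for one time interval. It uses the nearest-rank percentile method, which is an assumption; Thundra may compute percentiles differently. The field names are hypothetical, and health is left out because its definition is not given here.

```python
def percentile(sorted_values, pct):
    """Nearest-rank percentile of a non-empty, ascending list (pct is an int 1..100)."""
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]

def summarize(invocations):
    """Summarize the invocations that fall in one time interval."""
    durations = sorted(inv["duration_ms"] for inv in invocations)
    return {
        "avg_ms": sum(durations) / len(durations),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
        "errors": sum(1 for inv in invocations if inv["error"]),
        "cold_starts": sum(1 for inv in invocations if inv["cold_start"]),
    }

# Hypothetical interval: 18 fast invocations, one cold start, one slow error
interval = [{"duration_ms": 100, "error": False, "cold_start": False}
            for _ in range(18)]
interval.append({"duration_ms": 400, "error": False, "cold_start": True})
interval.append({"duration_ms": 2000, "error": True, "cold_start": False})

summary = summarize(interval)
print(summary)
```

In this sample the average is pulled up by a single slow invocation, while the 95th and 99th percentiles point directly at the slow tail, which is why percentiles are useful for spotting outliers.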

All the invocations in the selected area are listed at the bottom. To dive deeper into an individual invocation and analyze outliers, click on it. You can then see trace data and logs, among other detailed information about the selected invocation.

How to configure your serverless system for metrics?

To configure metrics for each runtime, see our documentation pages for Java, Python, Node.js, .NET, and Go.