Twitter Uses Rezolus Tool to Spot Abnormalities On Fine-Grain Timescale

Spotting performance abnormalities is one of the most important parts of keeping a system consistent. Conventional monitoring is adequate most of the time, but Twitter recently faced usage spikes on a timescale so short that its usual anomaly-detection tooling did not register them.

To explain why Twitter needed Rezolus and how effective it is, site reliability engineer Brian Martin described the tool as a high-resolution systems performance telemetry agent, which is now available as open source on GitHub. Twitter was already using it to help quantify its workloads, so that fluctuations in usage are detected. It also provides data that helps diagnose runtime performance issues under load.

Rezolus was developed mainly because Twitter needed to track system performance on a smaller timescale. While working on high-throughput synthetic benchmarks, the team discovered abnormalities that lasted only a very brief moment. Most telemetry systems sample too coarsely to catch an abnormality that lasts only a brief moment, for example 10 seconds. With a low sampling rate, such spikes go unnoticed unless the anomaly lasts longer. This made it difficult for the team not only to understand the issue but even to see that anything had changed, so they began using Rezolus as a solution.
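A small Python sketch can make the sampling problem concrete. It does not use Rezolus or any of its APIs; the function names and the workload numbers (a baseline of 100 requests per second with a 10-second burst of 1,000) are assumptions chosen purely for illustration. It compares what a once-a-minute average would report with what per-second samples reveal.

```python
# Illustrative only: a hypothetical workload, not real Twitter or Rezolus data.

def per_second_load(baseline=100, burst=1000, burst_start=20,
                    burst_len=10, duration=60):
    """Return one sample per second, with a short burst inside the minute."""
    return [
        burst if burst_start <= t < burst_start + burst_len else baseline
        for t in range(duration)
    ]

def minutely_average(samples):
    """What a coarse, once-a-minute collector would report."""
    return sum(samples) / len(samples)

def peak_with_offset(samples):
    """What fine-grained sampling reveals: the peak and the second it began."""
    peak = max(samples)
    return peak, samples.index(peak)

samples = per_second_load()
average = minutely_average(samples)
peak, offset = peak_with_offset(samples)

print(f"minutely average: {average:.0f} req/s")
print(f"per-second peak:  {peak} req/s, first seen at second {offset}")
```

In this made-up minute the coarse average comes out at 250 requests per second, while the per-second view shows a peak of 1,000 starting at second 20. The average alone hides both how large the burst was and when it happened.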

To illustrate its use, Brian described an incident in which several services experienced recurring drops in success rate, each lasting just a few minutes. The team could not find the cause because the existing telemetry showed nothing unusual. Because throttling decisions were made on a finer timescale than regular telemetry collection, the team eventually began to suspect sub-minute bursts. Later, the bursts behind this throttling were detected with the help of Rezolus. With this tool, Twitter can see not only that an issue occurred but also when it occurred, which makes it easier for the team to resolve.

This is how Twitter's Rezolus Tool Helps It Detect Usage Spikes
Photo: AP
