The aim of bot detection is to distinguish bot data from human data. A human is unlikely to hit exactly the same position on a screen multiple times, especially very quickly, with constant duration, or while tapping or clicking somewhere else in between.
Detecting bots, however, is harder than it may seem, particularly since they are constantly evolving to circumvent many forms of protection. But every problem has a solution. Let's examine our approach to assisting developers in identifying bots with confidence.
Why is transparent evidence crucial in bot detection?
Obscuring botting by applying randomness to positions, durations, and delays is difficult: a large number of additional characteristics need to remain consistent, and human behavior is not fully random either. If no suspicious behavior can be identified by simple rules with sufficient confidence, artificial intelligence can assist.
In any case, merely reporting a player's bot usage to the game developers or publishers is not sufficient. A solid, traceable reason to ban a player is necessary, both to avoid false positives that could irritate players and to present in the event of a dispute. It is therefore desirable to support claims of bot usage with textual and, most importantly, visual explanations and to make these available to the game owners.
What does the evidence generation process look like?
Denuvo Unbotify provides an automated infrastructure to generate botting evidence for all game sessions in which bot usage was detected and reported. Additionally, evidence plots can help analyze specific behavior or check the correctness of bot predictions, making them a useful tool for avoiding false positives.
Our process can be summarized as follows: Upon request via our dashboard or via a REST API call, we automatically fetch prepared information to generate plots illustrating the identified bot behavior, accompanied by textual explanations.
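To give a rough idea, such a request could look like the following sketch; the endpoint, session identifier, and authentication shown here are placeholders, not the actual Denuvo Unbotify API.

```python
# Hypothetical sketch of fetching botting evidence via a REST API.
# The base URL, path, and token below are placeholders, not the real API.
import requests

API_BASE = "https://unbotify.example.com/api"   # placeholder base URL
SESSION_ID = "session-123"                      # placeholder game-session identifier

response = requests.get(
    f"{API_BASE}/evidence/{SESSION_ID}",
    headers={"Authorization": "Bearer <your-api-token>"},
    timeout=30,
)
response.raise_for_status()
evidence = response.json()  # e.g., links to evidence plots and textual explanations
```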
This is most obvious for detections by heuristic rules, which search for specific patterns in player behavior, but less obvious for detections by machine learning models, as they are expected to find relations in data that may not be immediately evident to the human eye. However, we can still identify which feature of the machine learning model contributed most to a bot detection, for example using SHapley Additive exPlanations (SHAP, a game-theoretic approach to computing Shapley values), and then produce a suitable visualization.
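As a rough illustration of that last step, the open-source shap package can be used to find the feature with the largest contribution for a single flagged session; the model and feature names below are toy stand-ins, not our production setup.

```python
# A minimal sketch, not Denuvo's internal code: finding the feature that
# contributed most to one detection with the open-source `shap` package.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Toy stand-ins for per-session features and labels (hypothetical names).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "top_position_ratio": rng.random(200),
    "tap_duration_iqr_ms": rng.random(200) * 40,
    "mean_delay_ms": rng.random(200) * 500,
})
y = (X["top_position_ratio"] > 0.5).astype(int)

model = GradientBoostingClassifier().fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one contribution per session and feature

session_idx = 0  # index of the session flagged as a bot
top_feature = X.columns[np.argmax(np.abs(shap_values[session_idx]))]
print(f"Feature contributing most to this detection: {top_feature}")
```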
An example of how botting evidence is generated
Before looking at concrete botting evidence visualizations, we consider two observations in Figures 1 and 2. Both plots show apparently similar touch patterns of players on mobile phones, where x and y correspond to the screen coordinates of a particular phone and the second plot's z-axis shows the time of each touch.
The former pattern was generated by a bot, while the latter reflects purely human behavior. This shows that looking only at simple 2- or 3-dimensional plots of touch data will not always reveal whether a bot detection is correct.
Figure 1: Touch positions generated by a bot
Figure 2: Touch positions created by a human
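For reference, plots like Figures 1 and 2 can be produced from raw touch data along these lines; the data here is synthetic and only illustrates the plot layout.

```python
# Illustrative sketch of a 3D touch plot (x, y, time) with synthetic data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.integers(0, 1080, 100)        # screen x coordinate in pixels
y = rng.integers(0, 1920, 100)        # screen y coordinate in pixels
t = np.sort(rng.uniform(0, 60, 100))  # touch time in seconds

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(x, y, t)
ax.set_xlabel("x (px)")
ax.set_ylabel("y (px)")
ax.set_zlabel("time (s)")
plt.show()
```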
Nevertheless, we can identify multiple significant characteristics of bot behavior when taking a closer look at the data depicted in Figure 1.
1. Bot characteristic 1: Repetitions of touch-down positions
Figure 3 shows evidence for a bot detection due to a significant number of repetitions of specific touch-down positions, each starting a stroke, i.e., a movement of a finger on the screen. As these repetitions are interleaved with more human-like tap behavior at other positions, the bot pattern may not be immediately obvious. Consecutive strokes starting at the same position are not considered, in order to avoid false detections: such repetitions can occur when humans play through an emulator.
However, when a significant number of strokes start at exactly repeated positions while other positions in the surrounding regions show no such repetitions, the detection is solid. The underlying rule determines the 25% of positions that are tapped most frequently and computes their share among all taps, which here is about 53.8%. For human data, this percentage is expected to be much closer to 25%.
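A minimal sketch of this rule, assuming taps are given as (x, y) pixel coordinates (the exact thresholds used in production are not shown here):

```python
# Sketch of the repetition rule: share of all taps that land on the
# top 25% most frequently tapped positions. Thresholds are illustrative.
from collections import Counter

def top_position_ratio(taps: list[tuple[int, int]], top_fraction: float = 0.25) -> float:
    """Fraction of taps falling on the `top_fraction` most frequent positions."""
    counts = Counter(taps)
    n_top = max(1, int(len(counts) * top_fraction))
    return sum(c for _, c in counts.most_common(n_top)) / len(taps)

# With no repeated positions, the ratio stays near `top_fraction` (25%);
# the bot session in Figure 3 reaches roughly 53.8%.
```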
While the first plot depicts the positions on the screen overlaid with a heat map, along with histograms of the individual coordinates, the second plot makes these repetitions clear: there, four pillars grow significantly taller than the others.
Figure 3: Evidence for a significant number of repetitions of touch positions
In contrast, a similar plot of the human data is presented in Figure 4. If many taps happen in a small region, it is likely that some positions occur more than once. Note, however, that the histogram bins span multiple pixels in both directions for better visualization, and that there are no repetitions here. Note also the very different ranges of the bin counts in the bot and human plots: the maximum value is 25 in the former and 5 in the latter.
Figure 4: Touch down position repetitions for human data
2. Bot characteristic 2: Too little variation in tap durations
Figure 5 presents another bot characteristic that we can identify in the data depicted in Figure 1, this time concerning tap durations. While the bot successfully mimics a typical human tap duration (around 100 ms), it does not account well enough for natural deviations.
The first plot compares the actual interquartile range of tap durations (in yellow) with a minimum expected range (in red), which are 3 ms and 5 ms, respectively. This means that a human is not expected to have 50% of their tap durations fall within a range as narrow as 5 ms, for example between 97 ms and 102 ms.
The second plot gives an overview of all tap durations and their positions. The positions are projected onto both coordinate axes to show which of them lie within the (expected or actual) interquartile range.
Figure 5: Evidence for an overly small deviation in tap durations
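The underlying check can be sketched as follows; durations are in milliseconds, and the 5 ms minimum mirrors the example above, while the exact production threshold is an assumption here.

```python
# Sketch of the tap-duration spread check: flag sessions whose middle 50%
# of tap durations span less than a minimum expected interquartile range.
import numpy as np

MIN_EXPECTED_IQR_MS = 5.0  # illustrative threshold, as in the example above

def tap_duration_iqr(durations_ms: list[float]) -> float:
    q1, q3 = np.percentile(durations_ms, [25, 75])
    return q3 - q1

def too_uniform(durations_ms: list[float]) -> bool:
    return tap_duration_iqr(durations_ms) < MIN_EXPECTED_IQR_MS

# The bot session in Figure 5 has an IQR of about 3 ms (flagged), while the
# human session in Figure 6 has an IQR of about 36 ms (not flagged).
```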
Finally, Figure 6 shows a comparable plot for the human data depicted in Figure 2, where the tap durations show a significantly larger variation, with an interquartile range of 36 ms.
Figure 6: Interquartile range of tap durations for human data
Bots are becoming more sophisticated, and your solution should too!
While combating bots is becoming more difficult, you can still maintain control if you choose the right weapon to protect your game.
Get in touch with us to find out more!