
Fighting bots at scale: Identifying data processing bottlenecks and best practices


Right now, bots are plaguing your mobile game without you knowing it! Bots have a dangerous knock-on effect on the gaming experience, user retention and in-app purchases, eventually ruining your business model and damaging your studio’s reputation.

The proliferation of bots and their increasing sophistication are making it difficult for developers to come up with a proper response. 

Here comes a fresh approach to fighting bots  

The biggest challenge faced by outdated bot prevention methods is the lack of information that can reliably distinguish humans from bots. These methods tend to focus on "server interactions", the requests that bots make to your server. Fraudsters are aware of this and have learnt how to spoof various application programming interface (API) calls to the servers, circumventing many forms of bot detection and prevention.

This is why a more sophisticated approach has been devised that does not necessitate tight integration with particular game logic and makes use of sensor data – information that is difficult to fake.  

A baseline of authentic user behavior on specific flows within an app is built from sensor data (touch pressure and position, accelerometer, gyroscope, light sensors) that tracks everything that happens to a device or its peripherals. The technology then monitors all interactions with the mobile game through several elements, such as how fast the clicks happen, where they take place on the screen, the touch areas, and the device’s position and movement.

By comparing those interactions with what is considered normal for that specific game, the technology detects cheaters whose bots operate outside the expected parameters.
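The comparison against a baseline can be pictured with a deliberately simple sketch. It is not Denuvo Unbotify's detection logic: the single feature (time between taps), the function names and both thresholds are assumptions chosen only for illustration.

```python
from statistics import mean, stdev

def build_baseline(human_intervals_ms):
    """Summarize tap intervals (ms) from sessions assumed to be human."""
    return mean(human_intervals_ms), stdev(human_intervals_ms)

def is_outside_baseline(session_intervals_ms, baseline,
                        max_z=3.0, min_spread_ratio=0.1):
    """Flag a session whose tap timing falls outside the baseline.

    max_z and min_spread_ratio are hypothetical thresholds.
    """
    base_mean, base_std = baseline
    session_mean = mean(session_intervals_ms)
    session_std = stdev(session_intervals_ms)
    # Average tapping speed far from what humans produce in this game.
    too_far = abs(session_mean - base_mean) / base_std > max_z
    # Timing that is far more regular than human timing.
    too_regular = session_std < min_spread_ratio * base_std
    return too_far or too_regular

baseline = build_baseline([180, 240, 310, 205, 420, 275, 350, 230])
human_like = [200, 300, 260, 390, 220]
bot_like = [100, 100, 101, 100, 99, 100]
print(is_outside_baseline(human_like, baseline))  # False
print(is_outside_baseline(bot_like, baseline))    # True
```

In this sketch the bot-like session is flagged because its timing is almost perfectly regular, not because it is fast. A real system would combine many sensor-derived features instead of one.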

The importance of data processing efficiency in detecting bots  

To make the bot detection technology work effectively, the behavioral data is collected on the client side and transmitted to our servers for further analysis. This is where Denuvo Unbotify analyzes these fine-grained data points.  

The amount of data is highly game-dependent. It is especially rich on mobile devices because there are more sensors available to gather data from. The result can be a constant stream of one data packet per second per player, capturing the interaction details at millisecond precision. This can add up to terabytes per day even for a game with a modest 50,000 concurrent players.
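A quick back-of-the-envelope calculation shows how these figures reach the terabyte range. The packet size is not stated here, so the 1 KiB used below is purely an assumption, as is holding concurrency constant for the whole day.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def daily_volume(concurrent_players, packets_per_second=1, packet_bytes=1024):
    """Return (packets per day, bytes per day) for a constant player count.

    packet_bytes=1024 is an assumed average packet size, not a measured one.
    """
    packets = concurrent_players * packets_per_second * SECONDS_PER_DAY
    return packets, packets * packet_bytes

packets, total_bytes = daily_volume(50_000)
print(f"{packets:,} packets/day, {total_bytes / 1e12:.2f} TB/day")
# 4,320,000,000 packets/day, 4.42 TB/day
```

Under these assumptions, 50,000 concurrent players already generate more than four billion packets per day, so even small per-packet savings matter.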

Detecting botting activity and cheating with high accuracy and a swift response therefore requires ingesting and managing vast amounts of data, which in turn maintains the integrity of the player community. Efficient data processing is crucial for us to keep an eye on more players more frequently. Otherwise, we could focus on only a portion of players and important interactions could be overlooked.

Data processing efficiency is one of the keys for us to fight bots 

Unbotify filters through the large amounts of player data with the help of cloud-based resources, using Databricks to orchestrate the data processing. This also allows us to scale according to the developer’s demand. How can this be done effectively and efficiently? 


Read the full version of our analysis here!

 

1. Bottlenecks in traditional data partition management 

With the proliferation of large datasets in cloud storage solutions such as Amazon S3, ensuring efficient data loading and processing becomes critical for maintaining performance and scalability in modern data platforms like Databricks. A common challenge arises when working with partitioned data over long periods, as managing file system paths for specific date-hour ranges or other dimensions can introduce significant overhead and bottlenecks. 
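To make the path-management overhead concrete, here is a minimal sketch of enumerating hourly partition paths by hand. The bucket name and the `date=`/`hour=` directory layout are hypothetical and are not taken from the actual pipeline.

```python
from datetime import datetime, timedelta

def hourly_partition_paths(base, start, end):
    """List one path per hour in [start, end), assuming a date=/hour= layout."""
    paths = []
    current = start
    while current < end:
        paths.append(f"{base}/date={current:%Y-%m-%d}/hour={current:%H}")
        current += timedelta(hours=1)
    return paths

start = datetime(2024, 1, 1)
paths = hourly_partition_paths(
    "s3://example-bucket/events",  # hypothetical location
    start,
    start + timedelta(days=90),
)
print(len(paths))  # 2160
print(paths[0])    # s3://example-bucket/events/date=2024-01-01/hour=00
```

A 90-day range already yields 2,160 explicit paths that have to be built, checked for existence and handed to the reader. Every additional partition dimension multiplies that list.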

We addressed these issues by leveraging Spark optimizations on Databricks to handle large-scale data efficiently. Our new approach enhances performance in two ways: it eliminates manual path management, and it explicitly defines schemas for JSON data during the DataFrame creation process.

As a result, PySpark’s optimizations, such as lazy evaluation, partition pruning, and predicate pushdown, can be applied at scale in our case, and Spark now dynamically loads only the necessary data while optimizing I/O and computation across the cluster.

 Read the full analysis here! 

2. Optimizing data loading workflows 

After dealing with the performance problems in the data loading and management procedure, we determined through experiments that our existing Databricks job workflow design also caused performance bottlenecks when dealing with ever-increasing volumes of data.

We, therefore, examine how we transitioned from a traditional, resource-intensive pipeline to a more efficient and structured workflow in Databricks. Our new approach improved data processing, storage and I/O performance by addressing specific bottlenecks encountered with large-scale data operations. 

Read the full analysis here!

3. Window functions as a bottleneck in data aggregation 

We also focus on the processing task of our Databricks job. This task is the most CPU-intensive and time-consuming component of the data pipeline, which makes our underlying infrastructure more vulnerable as data volumes increase. We concentrate on how a shift from traditional row-based processing methods to more advanced array-based operations has helped to mitigate bottlenecks in partitioning and processing large datasets.

The improvements have shifted execution to leverage SQL transformations on arrays rather than relying on computationally expensive window functions. 
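The array-based idea can be shown in plain Python, independent of Spark. This is a conceptual sketch only: the event shape `(player_id, timestamp_ms)` and the inter-event interval metric are assumptions, and which computation the actual pipeline performs is not stated here.

```python
from collections import defaultdict

def intervals_per_player(events):
    """Compute gaps between consecutive events, one array per player.

    events: iterable of (player_id, timestamp_ms) rows in any order.
    """
    by_player = defaultdict(list)
    for player_id, ts in events:
        by_player[player_id].append(ts)  # gather each player's rows into one array

    result = {}
    for player_id, stamps in by_player.items():
        stamps.sort()
        # One pass over the array replaces a per-row "previous value" lookup.
        result[player_id] = [b - a for a, b in zip(stamps, stamps[1:])]
    return result

events = [("a", 30), ("b", 5), ("a", 10), ("a", 25), ("b", 9)]
print(intervals_per_player(events))  # {'a': [15, 5], 'b': [4]}
```

In Spark terms, the row-based version would typically use a function such as `lag` over a window partitioned by player. The array-based version would collect each player's values into a single array column and apply higher-order SQL functions such as `transform` or `zip_with` to it.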

Read the full analysis here!

4. Transitioning from user-defined functions to scalable Spark-based solutions

Previously, one of our data aggregation routines was implemented as a monolithic Python script within a Spark User-Defined Function (UDF) that processed observations sequentially, applying multiple pieces of detection logic and utility calculations within a single execution flow.

While effective for smaller datasets, this approach created several performance bottlenecks and scalability issues as our observation volumes grew. The use of a UDF meant that each observation was processed individually and our entire analysis was performed in-memory. This led to long execution times and high memory consumption, making the approach impractical for large-scale, real-time data processing.

Our aim was to refactor and translate the aggregation logic to leverage Spark’s native operations and distributed computing capabilities, replacing the UDF-based approach. The new design breaks down the monolithic script into distinct processing stages, utilizing Spark’s built-in functions and window operations to manage and optimize each stage efficiently. This refactoring allows our logic to aggregate larger datasets and higher observation volumes in a structured and scalable manner. 

Read the full analysis here!

5. Reduction in DataFrame operations 

As our organization increasingly depends on Databricks for large-scale data processing, identifying and eliminating performance bottlenecks became essential to run efficient workloads. Databricks excels in distributed computing and scalability. However, by overlooking inefficient operations, we suffered major performance degradation.  

We therefore showcase a specific scenario in our daily operational workflow where we eliminated a small but critical common pitfall and cut our task’s processing time in half.

Read the full analysis here!

A fair gaming environment needs cutting-edge bot protection

Denuvo Unbotify has developed a robust platform for handling large-scale data workloads with the help of Databricks, allowing us to process and analyze game sensor data more effectively and scalably. By pinpointing and resolving the key performance issues described in these case studies, we can improve performance and boost flexibility in handling diverse and intricate workloads. This enables future expansion, with the capacity to adjust to growing data volumes from concurrent players with minimal effect on expenses or performance.

Large data processing capacity is essential for improving a game's bot prevention. Scalable data ingestion enables us to provide developers with precise bot detection. Furthermore, data processing efficiency is crucial for ensuring game transparency in the fight against cheating in general and bots in particular: developers and publishers receive as much insight as possible into why certain players are flagged as bots or cheaters.

Thanks to efficient data processing, online reports and feedback are issued in real time. Developers also have access to graphical evidence to visually verify botting behavior or carry out basic investigations in response to players’ complaints. This helps them avoid false positives, keeping bot detection effective and ensuring a healthy gaming environment for players.

Want to protect your mobile game? 

Contact us now if you need a hand with fighting bots in your mobile game! 
