Dev Letter: Server Performance Improvements
The questionable server performance and a whole ton of unsolved issues with the regular skin release have been pushing the community away for quite a while already, and even the FIX PUBG campaign has so far produced only vociferous statements: no notable improvements have been released yet. To show the community that it still cares, the PUBG Corp. team has published another vast press release explaining how the server side currently works.
In today’s Dev Letter, we’d like to share some of the progress we’ve made towards significantly improving server performance, as well as additional improvements we’re currently working towards.
The version of Unreal Engine that PUBG currently uses is based on a Client-Server model, and therefore the status of each actor (objects placed in levels that represent characters, buildings, backdrops, cameras, etc.) has to be updated through the server for each player.
Server performance is usually indicated by server tick-rate or Frame Time. As server performance increases, the time per frame will decrease. As time per frame decreases, the Server Response Time improves as well.
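The relationship between tick-rate and frame time is a simple reciprocal. The short Python sketch below converts between the two; the function names are illustrative and do not come from the game's code, and the sample rates are the figures quoted elsewhere in this letter.

```python
def frame_time_ms(tick_rate_hz: float) -> float:
    """Average time the server spends on one frame, in milliseconds."""
    return 1000.0 / tick_rate_hz


def tick_rate_hz(frame_time_in_ms: float) -> float:
    """Server ticks per second for a given average frame time."""
    return 1000.0 / frame_time_in_ms


# Tick rates mentioned in this letter: before and after Update #19, and the 30 Hz goal.
for rate in (18.5, 22.9, 30.0):
    print(f"{rate:>5.1f} ticks/s -> {frame_time_ms(rate):.1f} ms per frame")
```

A server that must hold 30 ticks per second therefore has roughly 33.3 ms to finish all the work in each frame.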
The quicker the server responds, the faster your actions and movements are updated for other players. One example is how quickly you disappear from your opponent’s screen after ducking behind a wall on yours. If we want to reduce what is commonly known as de-sync, Server Response Time needs to be improved.
Improvements through Update #14
The network process structure before Update #14 and takeaways
Before Update #14, the network was processed on the server in the Unreal Engine as shown below.
Let us first explain the above network process flow. In the “Net Dispatch” stage, the packets received from the clients are processed on the server. These carry incoming information such as gunfire, movement, etc. The things processed during this step are usually passed on to other clients in two forms: RPCs (Remote Procedure Calls) and Replication. After this, game logic such as physics simulation is processed during the “Simulate & Render” stage, and the result is delivered to all clients via “Net Flush”.
However, when RPCs are sent as a result of the Net Dispatch process, they are not sent immediately, but queued in the buffer. The many things stored in the buffer are sent to all clients during the “Net Flush” stage and the buffer is then emptied.
In this structure, an RPC has to wait through the “Simulate & Render” stage to get from “Net Dispatch” to “Net Flush”, which causes a delay. Perhaps Unreal Engine was structured this way to minimize the number of packets sent over UDP. Regardless, when the number of packets is minimized, the network works more efficiently.
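To make the buffering behaviour concrete, here is a minimal Python sketch of the pre-Update #14 tick as described above. It is not Unreal Engine code: the class, its method names, and the 20 ms simulation time are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LegacyServerTick:
    """Toy model of one tick: Net Dispatch -> Simulate & Render -> Net Flush."""
    simulate_ms: float                                # assumed cost of Simulate & Render
    clock_ms: float = 0.0                             # time elapsed inside the tick
    send_buffer: list = field(default_factory=list)   # queued (rpc, queued_at)
    sent: list = field(default_factory=list)          # (rpc, queued_at, sent_at)

    def net_dispatch(self, incoming):
        # RPCs produced here are queued in the buffer, not sent immediately.
        for rpc in incoming:
            self.send_buffer.append((rpc, self.clock_ms))

    def simulate_and_render(self):
        # Game logic such as physics runs while the queued RPCs wait.
        self.clock_ms += self.simulate_ms

    def net_flush(self):
        # Everything in the buffer goes out at once, then the buffer is emptied.
        for rpc, queued_at in self.send_buffer:
            self.sent.append((rpc, queued_at, self.clock_ms))
        self.send_buffer.clear()

    def run_tick(self, incoming):
        self.net_dispatch(incoming)
        self.simulate_and_render()
        self.net_flush()


server = LegacyServerTick(simulate_ms=20.0)
server.run_tick(["gunfire"])
rpc, queued_at, sent_at = server.sent[0]
print(f"{rpc} waited {sent_at - queued_at:.1f} ms before leaving the server")
```

In this model the wait equals the full duration of the simulation stage, which is the delay the paragraph above describes.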
The new network process structure and improvements
We decided that improving the server response time was more important than decreasing the number of packets for PUBG. Therefore, the team redesigned the structure as shown below in Update #14. The team added a “Net Send Flush” stage before “Simulate & Render” stage.
In the “Net Send Flush” stage, all UDP data stored in the buffer is sent out and the buffer is emptied. With this new flow, queued RPCs no longer have to wait for the “Simulate & Render” stage to finish, which shortens the delay. The “Net Send Flush” stage performs no extensive calculations for actor replication; it only flushes pending UDP data.
As there are two network updates, “Net Send Flush” and “Net Flush,” in the new structure, the network update rate doubled after Update #14, which caused some people to assume that the server tick-rate had increased. However, it was not the server tick-rate but the network update rate that jumped to 60, because a second network update is now delivered during each server tick.
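The difference between tick-rate and network update rate can be sketched by counting flush stages per tick. The Python below is an illustrative model only: the stage names follow the letter, while the function names, the 20 ms simulation cost, and the 30 Hz tick-rate are assumptions.

```python
OLD_FLOW = ["net_dispatch", "simulate_and_render", "net_flush"]
NEW_FLOW = ["net_dispatch", "net_send_flush", "simulate_and_render", "net_flush"]


def run_tick(stages, simulate_ms, incoming):
    """Run one tick; return (delay in ms of each queued RPC, number of flushes)."""
    clock, buffer, delays, flushes = 0.0, [], [], 0
    for stage in stages:
        if stage == "net_dispatch":
            buffer.extend((rpc, clock) for rpc in incoming)
        elif stage == "simulate_and_render":
            clock += simulate_ms
        elif stage in ("net_send_flush", "net_flush"):
            # Both flush stages send pending data and empty the buffer.
            delays.extend(clock - queued_at for _, queued_at in buffer)
            buffer.clear()
            flushes += 1
    return delays, flushes


def network_updates_per_second(tick_rate_hz, flushes_per_tick):
    return tick_rate_hz * flushes_per_tick


for name, flow in (("old", OLD_FLOW), ("new", NEW_FLOW)):
    delays, flushes = run_tick(flow, simulate_ms=20.0, incoming=["gunfire"])
    rate = network_updates_per_second(30, flushes)
    print(f"{name}: RPC delay {delays[0]:.1f} ms, {rate} network updates/s at 30 ticks/s")
```

The tick-rate passed in is the same for both flows; only the number of sends per tick changes, which is why the measured update rate doubled without the server ticking any faster.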
These results can be found in Battle(Non)Sense's Update #14 Netcode Analysis.
As you can see in the table below, when 40 people are alive, the average delay for gunfire is reduced from 94.5msec to 77msec (18% decrease).
Improvements for Update #19
Profiling results before Update #19 and a new hypothesis
Profile data from before PC Update #19, measured on June 25, 2018 with 90 players alive, is as follows.
The Net Flush time is 43.2msec, 41% of the total frame time. Much of this time is spent on “serializing” in order to replicate each actor to the client. “Serializing” is the process of writing data into memory in a defined order so that actor status can be delivered to the client over the network.
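As a rough illustration of what serializing an actor involves, the Python sketch below packs a few fields into a fixed byte layout with the standard `struct` module. The layout (actor id, position, health) is hypothetical and far simpler than anything Unreal Engine actually sends; it only shows the idea of writing fields in a fixed order.

```python
import struct

# Hypothetical wire layout, little-endian without padding:
# actor id (uint32), x, y, z position (float32 each), health (uint8).
ACTOR_FORMAT = "<IfffB"


def serialize_actor(actor_id, x, y, z, health) -> bytes:
    """Write the actor's fields into a byte string in a fixed order."""
    return struct.pack(ACTOR_FORMAT, actor_id, x, y, z, health)


def deserialize_actor(payload: bytes):
    """Read the fields back in the same order on the receiving side."""
    return struct.unpack(ACTOR_FORMAT, payload)


payload = serialize_actor(7, 120.5, -33.25, 4.0, 100)
print(len(payload), "bytes:", deserialize_actor(payload))

# If every character had to be serialized for every other client each frame,
# the work would grow with players * (players - 1).
players = 90
print(players * (players - 1), "character serializations per frame in that worst case")
```

The second print is the worst case under that assumption, and it suggests why the number of replicated actors, rather than the size of any single one, dominates the cost.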
As we were searching for an optimization method based on the above profiling result, we thought, “if we are able to reduce the number of replicated actors, especially characters, the total Net Flush time will drop significantly.”
Unlike other games that use a dedicated server on Unreal Engine, PUBG has up to 100 players in a single game at the same time, which means the number of actors is significantly higher. The large data size of actors is one issue, but the sheer volume of actors is the bigger problem. While we were thinking of ways to reduce the total number of actors, we thought replicating distant characters at a lower frequency would help. Since faraway characters are less relevant to the player at that distance, the number of actors that are serialized can be greatly reduced without affecting gameplay, and as a result, Net Flush time can be reduced.
Development process: Replication Interleaving System
Starting from the above idea, we decided to implement a system that skips some replication requests, lowering the replication frequency to a more appropriate level based on the distance between the client and the actor. We named this the “Replication Interleaving” system. First, we pulled out the section where actors are replicated, and lowered the replication frequencies of faraway characters. Then we analyzed the resulting issues and the types of visual changes.
Once we were able to resolve the issues that occurred when replication frequency was lowered, we tested how far we could go and reached the conclusion that lowering the replication frequencies to ¼ of the original level still had no impact on gameplay.
The completed Replication Interleaving system was implemented as follows:
(Note: This is the status as of today, and this value may change in the future for better server performance and smoother movements)
- Step 1: Skip 1 frame on the characters that are located further than 70m
- Step 2: Skip 2 frames on the characters that are located further than 400m
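The two steps above can be sketched as a per-frame decision for each character. This is a minimal Python sketch under stated assumptions, not PUBG's implementation: it reads "skip N frames" as "replicate once every N + 1 frames", and staggering actors by their id so that distant characters do not all update on the same frame is one guess at what "interleaving" means here. All names are hypothetical.

```python
NEAR_LIMIT_M = 70.0    # beyond this distance, skip 1 frame (Step 1)
FAR_LIMIT_M = 400.0    # beyond this distance, skip 2 frames (Step 2)


def frames_to_skip(distance_m: float) -> int:
    """Number of frames skipped between two replications of a character."""
    if distance_m > FAR_LIMIT_M:
        return 2
    if distance_m > NEAR_LIMIT_M:
        return 1
    return 0


def should_replicate(frame: int, actor_id: int, distance_m: float) -> bool:
    """Decide whether this character is replicated to the client on this frame."""
    period = frames_to_skip(distance_m) + 1
    # Assumption: offsetting by actor_id spreads distant actors across frames.
    return (frame + actor_id) % period == 0


for actor_id, distance in enumerate((50.0, 200.0, 500.0)):
    sent = sum(should_replicate(f, actor_id, distance) for f in range(6))
    print(f"actor {actor_id} at {distance:.0f} m: replicated on {sent} of 6 frames")
```

Under this reading, a character within 70m is replicated every frame, one between 70m and 400m every second frame, and one beyond 400m every third frame.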
Result of Improvement
Server performance increased by 20% after the new system was implemented. The diagram below tracks the frame rate of an NA region server when 85 players were alive. After the update, the server tick-rate increased by 22%, from 18.5 to 22.9. Other regions also showed an average frame rate increase of over 20%.
The change in response time is even better. Please refer to YouTuber Battle(non)Sense’s video on Update #19.
In the above table, you can see that when 85 players are alive, the average delay time for gunfire dropped by 58%, from 149.4msec to 61.6msec, which indicates that the issue known as de-sync significantly improved.
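The quoted reductions in gunfire delay can be checked with a few lines of Python. The helper name is illustrative; the figures are the ones reported in this letter for Update #14 (40 players alive) and Update #19 (85 players alive).

```python
def percent_reduction(before_ms: float, after_ms: float) -> float:
    """Relative drop from before_ms to after_ms, as a percentage."""
    return (before_ms - after_ms) / before_ms * 100.0


print(f"Update #14: {percent_reduction(94.5, 77.0):.1f}% lower gunfire delay")
print(f"Update #19: {percent_reduction(149.4, 61.6):.1f}% lower gunfire delay")
```

The results come out at about 18.5% and 58.8%, which the letter rounds down to 18% and 58%.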
Through other improvements, in addition to the Replication Interleaving, the server tick-rate increased by 20% and network delay dropped by 50% when more than 80 players are alive.
Improving the server tick-rate has been an ongoing priority for the team since the launch of PUBG. In addition to solving software issues, improvements have also been made to hardware. However, we know that players saw no clearly noticeable improvements in the few months preceding Update #19.
During FIX PUBG, we have doubled down on improving server performance and continue to research and experiment with various ideas, but this is a time-consuming process.
In order to implement a single function, preliminary research must be done, and after the function is implemented, a large volume of analysis, verification, and testing is required. It is difficult to solve all the problems in a short period of time because effort and time must be constantly invested in each problem. Wrong implementation of new features can cause bigger problems. Therefore, new features must be implemented and applied as carefully as possible.
That said, after applying the improvements we’ve already talked about, we’re now working on optimizing the "Net Dispatch" stage of the process. According to our analysis, most of the time is consumed on character move processing, and we have pinpointed some opportunities for optimizing it. The movement of characters has a high impact on PUBG game play. Therefore, this task requires a lot of careful attention to ensure that any improvements made do not affect how characters move in an abnormal way, such as the jittering we described above.
We are already experimenting with some ideas, and we anticipate that the time required for the "Net Dispatch" stage will drop by more than 50% from the current 41.8msec, provided these ideas do not have to be modified based on what we find during the testing process. Stabilizing the feature after implementing the new ideas is expected to take more than a month, but we will continue to work quickly and implement this as soon as possible.
The ultimate goal is to always keep the server tick-rate at 30, from 100 players to the very last bullet. We’ll keep working hard to achieve this goal so that we can continue to provide the very best Battle Royale experience possible for you all.
Head of Development, PUBG Amsterdam