NetEase GrowthEase provided a complete, customized AI game bot solution for Dianhun's BarbarQ2, covering four areas: integration, bot training, bot deployment, and bot iteration. Specifically: integration — gameplay-experience and integration-method design; bot training — SAR design, model network design, and large-scale distributed training; bot deployment — multi-difficulty AI bot interfaces, multi-style AI bot interfaces, an in-match dynamic difficulty adjustment interface, and private deployment support; bot iteration — for major version updates, additional model training or model-structure adjustments to keep the bots performing as required.
I am delighted to be here at GDC to give this speech to all our partners in the gaming industry. The main topic is the challenges we faced, and the solutions we came up with, when applying reinforcement-learning game bots in our game, BarbarQ2. The game is expected to launch officially in the second half of the year. We have completed online testing, and the performance of the AI game bots throughout testing was excellent, far beyond our expectations. I would like to give special thanks to our partner NetEase GrowthEase, who provided us with a complete AI game bot solution, helped us integrate the service, and also implemented the training and deployment of the AI game bots.
Here is the framework of the speech. First, I will introduce our game; then give a brief overview of reinforcement learning; next, discuss the challenges we encountered in applying reinforcement learning and our solutions; and finally, draw some conclusions.
Here I would like to give a brief introduction to the game, BarbarQ2, and its core gameplay mode, Mushroom Melee, to which the AI game bots are applied. Mushroom Melee is a 3v3v3 mode: players keep collecting mushrooms, either by killing others or by picking them up from the ground, to continuously upgrade their equipment and scores. A player becomes the demon king by accumulating enough score or by killing the current demon king, and the team that holds the demon king at the end of the match wins. Here is a video showing the process from the start of a match to becoming the demon king.
Next, I want to talk about our goal in applying AI game bots: we want highly human-like AI game bots that can be matched with or against human players, giving them a much better experience while also reducing matchmaking wait time.
In a real game environment, human players' behaviors are complex. Traditional game bot implementations (such as finite state machines, behavior trees, etc.) often struggle to imitate all of these behaviors, which makes traditional bots' action patterns fixed and monotonous. Players can spot a traditional bot at a glance; moreover, as the number of battles increases, players easily learn the bot's behavior rules, and high-end players can beat it trivially, which greatly reduces the sense of accomplishment even when players win. To solve this problem, we trained our game bots with reinforcement learning; we call them AI game bots. Reinforcement learning is a distinctive and important model-training method in artificial intelligence. Unlike conventional supervised or unsupervised learning, which requires a large amount of training data, it requires a real environment. The model to be trained (usually called the agent) continuously interacts with that environment, and the environment returns feedback for each interaction; guided by this feedback, the agent adapts to the environment better and better. Concretely, the agent observes the environment's state at the current moment and outputs an action to execute in the environment; after the action is executed, the agent receives a reward from the environment. By cycling through this process, the agent learns an interaction strategy that is well adapted to the environment. Because this learning process resembles how humans learn, reinforcement learning is also considered a feasible route toward general artificial intelligence, so it holds a high status in the field of artificial intelligence.
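The observe/act/reward loop just described can be sketched in a few lines. This is a minimal toy example, not the BBQ2 setup: the environment, agent, and learning rule here are all illustrative stand-ins.

```python
# Minimal sketch of the observe -> act -> reward loop described above.
# `Env` and `Agent` are hypothetical stand-ins, not the BBQ2 interfaces.
import random

class Env:
    """Toy environment: the agent is rewarded for echoing the observed state."""
    def reset(self):
        self.state = random.randint(0, 4)
        return self.state
    def step(self, action):
        reward = 1.0 if action == self.state else 0.0
        self.state = random.randint(0, 4)
        return self.state, reward

class Agent:
    """Tabular agent that learns a state -> best-action mapping from rewards."""
    def __init__(self, n_states=5, n_actions=5, eps=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.eps = eps
    def act(self, state):
        if random.random() < self.eps:            # explore occasionally
            return random.randrange(len(self.q[state]))
        row = self.q[state]                       # otherwise exploit
        return row.index(max(row))
    def learn(self, state, action, reward):
        # Move the value estimate toward the observed reward.
        self.q[state][action] += 0.5 * (reward - self.q[state][action])

random.seed(0)
env, agent = Env(), Agent()
state = env.reset()
total = 0.0
for _ in range(2000):
    action = agent.act(state)
    next_state, reward = env.step(action)
    agent.learn(state, action, reward)
    state, total = next_state, total + reward
```

After a few hundred interactions the agent's strategy adapts to the environment and its reward rate rises sharply, which is the core dynamic the paragraph above describes.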
Because our AI game bots are trained with reinforcement learning, they outperform traditional game bots in terms of both human-likeness and strength.
In order to reduce the cost of training AI game bots and improve training efficiency, we developed a distributed reinforcement learning training framework called RLEase. The core of the framework consists of two modules: Worker and Learner. The Worker's main job is to interact with the game environment, collect the states, actions, and rewards generated by that interaction, regularly send these data to the Learner, and regularly synchronize the model from the Learner side; the Learner's main job is to train the model on the data collected by the Workers. During actual training there are many Workers and only one Learner. The other parts of the framework provide auxiliary functions for training tasks. The Stat module tracks basic training data, such as the training loss and the winning rates between models; the Model Manager module handles model management, including model I/O and model scheduling, where scheduling selects suitable opponent models for the current model according to the winning rates between models, in order to achieve diverse self-play.
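The Worker/Learner data flow can be sketched as below. This is a single-process illustration of the protocol only; class and method names are hypothetical, not the actual RLEase API, and the real system runs many Workers in parallel across machines.

```python
# Single-process sketch of the Worker/Learner split described above.
# Names are illustrative, not the actual RLEase API.

class Learner:
    def __init__(self):
        self.model_version = 0
        self.samples_seen = 0
    def train(self, batch):
        # Consume a batch of experience and produce a new model version.
        self.samples_seen += len(batch)
        self.model_version += 1          # pretend one update per batch
        return self.model_version

class Worker:
    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.model_version = 0
    def rollout(self, n_steps):
        # Interact with the game env, collecting (state, action, reward) tuples.
        return [(self.worker_id, step, 0.0) for step in range(n_steps)]
    def sync(self, version):
        self.model_version = version     # pull the latest model from the Learner

learner = Learner()
workers = [Worker(i) for i in range(4)]  # many Workers, a single Learner
for _ in range(3):                       # three collect/train/sync rounds
    for w in workers:
        batch = w.rollout(n_steps=8)
        version = learner.train(batch)
        w.sync(version)
```

In the real framework the batches travel over the network and model synchronization happens on a timer, but the division of labor is the same: Workers generate data, the Learner consumes it.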
We designed the following network structure for training the BBQ2 AI game bot model. Both the input and the output of the network are multi-head structures. The game states fed into the network consist of four parts: character states, team states, opponent states, and map states. The character states are multi-dimensional information including the character's position, health, skills, buffs, and so on. The team states include each teammate's character state and the team's points. The opponent states include the opponents' character states and the enemy teams' points. The map states represent basic surroundings such as the items, obstacles, and mushrooms around the character. Since the reinforcement learning algorithm we applied follows the Actor-Critic paradigm, the network output contains two parts: action and value. The action output is the action the AI game bot should perform in the environment, while the value output evaluates how good that action is. The value output is used only during training; when the model is deployed online, it is discarded and only the action output is kept.
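The multi-head layout can be sketched with plain numpy as follows. All dimensions, layer sizes, and activations here are illustrative assumptions; only the overall shape (four input heads, a shared trunk, separate action and value heads) reflects the description above.

```python
# Numpy sketch of the multi-head input / action-value output layout described
# above. Dimensions and layer sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    return x @ w + b

# One small encoder per state group (assumed input widths).
dims = {"character": 32, "team": 16, "opponent": 16, "map": 64}
enc = {k: (rng.normal(size=(d, 8)) * 0.1, np.zeros(8)) for k, d in dims.items()}

# Shared trunk, then separate action and value heads (Actor-Critic).
trunk_w, trunk_b = rng.normal(size=(32, 16)) * 0.1, np.zeros(16)
action_w, action_b = rng.normal(size=(16, 990)) * 0.1, np.zeros(990)
value_w, value_b = rng.normal(size=(16, 1)) * 0.1, np.zeros(1)

def forward(states):
    heads = [np.tanh(linear(states[k], *enc[k])) for k in dims]
    h = np.tanh(linear(np.concatenate(heads), trunk_w, trunk_b))
    logits = linear(h, action_w, action_b)      # action head: one logit per choice
    value = linear(h, value_w, value_b)[0]      # value head: used only in training
    return logits, value

states = {k: rng.normal(size=d) for k, d in dims.items()}
logits, value = forward(states)
```

At deployment time one would keep only the `logits` path and drop the `value` head, exactly as described above.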
Considering cost, we faced many restrictions while developing the AI game bots. The total computing resources for the entire project had to stay under 3,000 CPU cores; and because the time before the game's online test was very short, we had only three weeks to develop the AI game bots, while also needing to update the model frequently to keep up with the game's fast update pace.
The first challenge: in most cases, game developers have to learn reinforcement-learning knowledge if they want to apply AI game bots in their games, which turns out to be a heavy burden for them; meanwhile, AI engineers usually do not understand video game development.
To solve this problem, we developed a middleware called AIBridge. As the name suggests, AIBridge builds a bridge between game developers and AI engineers. Its main purpose is to encapsulate the complex logic of the AI server: game developers only need to call its exposed API to use the AI game bot's functions. To game developers, the AI bot's internal logic is completely transparent.
There are two main classes in AIBridge: AIGamePlay and AIAgent. AIGamePlay maintains a set of sessions with the game logic, and AIAgent corresponds to a specific role in the game. The AIGamePlay class provides game developers with three interfaces: GameStart, Tick, and GameEnd. Developers call GameStart after the game starts to establish a session between the game and the AI service, then call Tick as needed to have the AIAgents act, and call GameEnd when the game ends to close the session. This makes it very easy for a game to access the AI game bot service.
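The GameStart/Tick/GameEnd call pattern can be sketched like this. The class bodies below are hypothetical fillers; only the three-interface calling convention comes from the description above, and the real AIBridge surface may differ.

```python
# Hypothetical sketch of the AIGamePlay / AIAgent call pattern described
# above; the real AIBridge API surface may differ.

class AIAgent:
    """Stands in for one AI-controlled role in the match."""
    def __init__(self, role_id):
        self.role_id = role_id
    def decide(self, game_state):
        return {"role": self.role_id, "action": "idle"}   # placeholder decision

class AIGamePlay:
    """Maintains the session between the game and the AI service."""
    def __init__(self):
        self.session_open = False
        self.agents = []
    def GameStart(self, role_ids):
        self.session_open = True                 # establish the session
        self.agents = [AIAgent(r) for r in role_ids]
    def Tick(self, game_state):
        assert self.session_open, "call GameStart first"
        return [a.decide(game_state) for a in self.agents]
    def GameEnd(self):
        self.session_open = False                # tear down the session

play = AIGamePlay()
play.GameStart(role_ids=[101, 102, 103])
actions = play.Tick(game_state={"frame": 1})
play.GameEnd()
```

From the game developer's point of view this is the whole integration: start a session, tick it each frame that bot decisions are needed, and end it.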
The second challenge: due to a video game's complex logic and huge amount of code, the game inevitably contains many unknown bugs, and some of them are fatal to model training. For example, during actual training we encountered a bug where a character got the maximum movement speed after releasing its ultimate, so other characters could never catch up and kill it. The states generated under this condition were definitely wrong, and the AI game bots behaved weirdly because they were trained on wrong data.
To solve this problem, we performed a series of real-time statistical analyses on the training runs, and judged whether training was proceeding normally by checking whether the distributions of various indicators stayed within reasonable ranges, thereby detecting unknown game bugs.
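A range check of this kind can be sketched very simply. The indicator names and bands below are hypothetical; the point is only the mechanism of flagging indicators that drift outside expected ranges.

```python
# Illustrative sketch of the range checks described above: flag a training
# run as suspect when an indicator drifts outside its expected band.
EXPECTED = {                       # hypothetical indicator bands
    "move_speed_mean": (0.0, 10.0),
    "kills_per_game": (0.0, 30.0),
    "episode_reward": (-500.0, 500.0),
}

def find_anomalies(stats):
    out = []
    for name, value in stats.items():
        lo, hi = EXPECTED[name]
        if not (lo <= value <= hi):   # outside its reasonable range
            out.append(name)
    return out

# A run hit by the max-speed bug would show an impossible movement speed.
suspect = find_anomalies({"move_speed_mean": 99.0,
                          "kills_per_game": 6.2,
                          "episode_reward": 42.0})
```

In the max-speed bug described above, a check like this would immediately single out the movement-speed indicator, pointing straight at the faulty game mechanic.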
At the same time, we also reviewed the replay videos generated by training tasks suspected of containing bugs, to analyze the problems.
Abstracting the complex game systems was our biggest challenge in AI game bot development. As of our AI game bots' launch, BBQ2 included 6 heroes and 3 maps. Each match is a battle among 3 teams, and duplicate heroes are allowed. Heroes can pick up and use items that appear randomly; in addition, BBQ2 includes complex out-of-game progression systems such as star power and equipment. We had to abstract all of this into the state space fed to the model. And since the game keeps updating with more heroes, items, and equipment, we also had to consider the model's adaptation to these changes.
In addition to the input information, we also had to design the AI game bots' action space to ensure they can perform any action as smoothly as human players. Getting these designs right is crucial to the AI game bots' performance.
To address the excessively large state space, we implemented some very effective feature engineering. For instance, to better capture the mushrooms around a character, we drew a circle centered on the character, divided it into 8 equal sectors, normalized the number of mushrooms in each sector, and limited the AI game bot's perception range. Each character perceived the actual locations of the other visible characters during training. Experiments showed this to be very effective in speeding up model training. To handle the state-space changes caused by game updates, we replaced all the one-hot encodings used to represent item or skill states with embedding forms.
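The 8-sector mushroom feature can be sketched as below. The function name and perception-range value are illustrative; the bucketing and normalization follow the description above.

```python
# Sketch of the 8-sector mushroom feature described above: bucket each
# visible mushroom by its angle around the character, then normalize.
import math

def mushroom_sectors(character_pos, mushrooms, perception_range=10.0):
    counts = [0] * 8
    cx, cy = character_pos
    for mx, my in mushrooms:
        dx, dy = mx - cx, my - cy
        if math.hypot(dx, dy) > perception_range:   # outside perception range
            continue
        angle = math.atan2(dy, dx) % (2 * math.pi)
        counts[int(angle // (math.pi / 4))] += 1    # 8 equal 45-degree sectors
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Two mushrooms ahead, one behind-left, one far beyond perception range.
feat = mushroom_sectors((0.0, 0.0),
                        [(1.0, 0.1), (2.0, 0.5), (-1.0, 0.9), (50.0, 0.0)])
```

This compresses an unbounded set of mushroom positions into a fixed-length 8-dimensional feature, which is what makes it cheap for the model to consume.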
We abstracted the in-game actions into 11 different types, abstracted the action direction into 8 types, and added 10 skill-release targets: 9 heroes and one boss. So the agent has 990 choices every time it needs to take an action.
To reduce the training difficulty caused by the game's complexity, we designed a curriculum learning scheme. In the early stage of training, we used a fixed line-up of heroes and fixed equipment. After the AI game bots reached a relatively high level, we introduced more heroes and equipment, until in the final stage all heroes and equipment were randomized. This method greatly reduced the training difficulty, and the model achieved very good results even when heroes and equipment were all randomly assigned.
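The staged curriculum can be sketched as a schedule that widens the hero pool and enables randomization as the bot gets stronger. The stage definitions, hero names, and win-rate thresholds below are all hypothetical.

```python
# Sketch of the curriculum schedule described above: a fixed line-up first,
# then progressively more heroes and randomized equipment.
# Stage contents and thresholds are assumptions.
import random

STAGES = [
    {"name": "fixed",  "heroes": ["A", "B", "C"],                "random_equipment": False},
    {"name": "wider",  "heroes": ["A", "B", "C", "D", "E"],      "random_equipment": False},
    {"name": "random", "heroes": ["A", "B", "C", "D", "E", "F"], "random_equipment": True},
]

def pick_stage(win_rate_vs_baseline):
    # Advance to a harder stage once the bot is strong enough.
    if win_rate_vs_baseline < 0.55:
        return STAGES[0]
    if win_rate_vs_baseline < 0.70:
        return STAGES[1]
    return STAGES[2]

def sample_lineup(stage, rng):
    if stage["name"] == "fixed":
        return list(stage["heroes"])          # same line-up every match
    return [rng.choice(stage["heroes"]) for _ in range(3)]

stage = pick_stage(0.80)
lineup = sample_lineup(stage, random.Random(0))
```

The key design choice is that the environment only becomes fully random once the bot can already play competently, so early training is never swamped by variance.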
Reward shaping can also speed up model training. Drawing on past experience, we carried out very detailed reward shaping on the BBQ2 project to ensure the AI game bots behave like human players. For example, we designed rewards that encourage other characters to kill the demon king, while the character who becomes the demon king receives a higher death penalty; as a result, the AI game bots learn to siege the demon king, and the demon king learns to do its best to avoid the other characters' attacks.
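The demon-king shaping can be sketched as a small reward table. All weights below are hypothetical; only the structure (extra reward for killing the demon king, heavier death penalty while being the demon king) comes from the description above.

```python
# Illustrative reward-shaping sketch for the demon-king example above.
# All weights are hypothetical.
REWARDS = {
    "kill": 1.0,
    "kill_demon_king": 3.0,        # encourage sieging the demon king
    "death": -1.0,
    "death_as_demon_king": -3.0,   # push the demon king to stay alive
}

def shaped_reward(events, is_demon_king):
    r = 0.0
    for ev in events:
        if ev == "kill_demon_king":
            r += REWARDS["kill_demon_king"]
        elif ev == "kill":
            r += REWARDS["kill"]
        elif ev == "death":
            r += REWARDS["death_as_demon_king"] if is_demon_king else REWARDS["death"]
    return r

r_siege = shaped_reward(["kill_demon_king"], is_demon_king=False)
r_king_dies = shaped_reward(["death"], is_demon_king=True)
```

The asymmetry in the table is what produces the two emergent behaviors described above: everyone chases the demon king, and the demon king flees.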
Making game bots behave more like human players was our original intention in developing AI game bots. To accomplish this, we designed AI game bots with different styles and difficulty levels.
To achieve diverse styles, we relied on reward shaping. We implemented three different styles: aggressive, assisted, and cautious. For the aggressive style, starting from the basic bot model we added extra rewards for damage and kills, and reduced the penalties for taking damage and dying. The assisted style added rewards for assists and healing, and reduced the rewards for damage. The cautious style added rewards for objectives and survival, increased the penalty for taking damage, and reduced the rewards for damage and kills.
Here we use gifs to intuitively show the behavior of the different styles. The left gif shows the cautious style: when its health is low, it tends to stay away from the opponent and poke with ranged items. The middle one is the aggressive style: even at low health, it actively attacks and kills enemies. The right one is the assisted style: it tends to cover and heal teammates when their health is low. We also collected some basic statistics for the different styles and display them as a radar chart, whose axes are Assist, Death, Kill, SR (Score Rate), TFT (Team Fight Tendency), EDK (Effective Devil Kill), DK (Devil Kill), TD (Team Fight Damage), and AT (Attack Tendency). The chart shows that the cautious style prefers team fights (high TFT) and has relatively low death and kill counts; the aggressive style has a strong desire to attack (high AT), with high kill and death counts; and the assisted style is the most distinctive, with very high assists, score rate, and team-fight tendency.
To achieve diverse difficulty levels, we used a fake-state method that gradually weakens the model starting from the strongest level. Specifically, we trained the strongest model and then added different fake states to it one by one, resulting in 6 difficulty levels.
The so-called fake state gives the AI game bot a wrong perception, which leads to wrong decisions. In this example, the purple character's equipment and health are actually worse than the green character's, but we deliberately alter the state input to the purple character so that it mistakenly believes its equipment and health are better than its opponent's. The purple character, which should be running away, instead tends to attack the opponent actively; since the reality is the opposite, the purple character gets killed.
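The fake-state weakening can be sketched as a perturbation applied to the state before inference. The field names, the linear distortion formula, and the mapping from difficulty level to distortion strength below are all assumptions for illustration.

```python
# Sketch of the fake-state weakening described above: distort the bot's
# perception of relative strength before inference. Field names and the
# distortion formula are assumed.

def apply_fake_state(state, difficulty):
    """Lower difficulty -> stronger distortion of the opponent's stats."""
    distortion = (6 - difficulty) / 6.0          # 6 difficulty levels
    faked = dict(state)
    # Make opponents look weaker than they really are, so the bot
    # overcommits to fights it should avoid.
    faked["opponent_hp"] = state["opponent_hp"] * (1.0 - distortion)
    faked["opponent_equip_score"] = state["opponent_equip_score"] * (1.0 - distortion)
    return faked

real = {"opponent_hp": 800.0, "opponent_equip_score": 50.0}
hardest = apply_fake_state(real, difficulty=6)   # no distortion: full strength
easiest = apply_fake_state(real, difficulty=1)   # heavy distortion: weakest bot
```

A nice property of this approach is that only one strong model needs to be trained and maintained; every difficulty level is derived from it at inference time.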
Now let's relax and watch a video of a human player fighting against an AI game bot.
To conclude: in the BBQ2 Singapore test, we created over 500,000 AI game bots, which played more than 100,000 matches. Real players' average matchmaking wait time dropped by 64%, and the number of matches started increased 3.5-fold.
Through this project, we have seen that introducing RL-based AI game bots into a game can indeed enhance players' gaming experience. Throughout the project we encountered numerous challenges, but we also came up with corresponding solutions, such as the AIBridge middleware, training-stat monitoring, replay analysis, feature engineering, reward shaping, and curriculum learning.
Here, we would like to express our special thanks to our business partner, NetEase GrowthEase. If you are considering applying reinforcement-learning game bots to your game, choosing the right service provider can minimize risk and maximize returns. Our partner GrowthEase provided us with a complete AI bot solution, including service integration, model training, model deployment, and upgrades, which helped us successfully apply RL-based AI game bots to our game.