Original link: https://www.xiejingyang.com/2022/12/14/%E7%94%A8js%E5%86%99%E5%8D%A1%E7%89%8C%E6%B8%B8%E6%88%8F%EF%BC%88%E5%85%AB%EF%BC%89/
Foreword
Long time no see. It has been a long, long while since my last post. A lot has changed in the meantime: changing jobs, resigning, starting a business, and generally being buried in other work. But the real reason this series went quiet for so long is that one crucial feature had me completely stuck, and without it I felt the project could not move forward: the card AI.
Online address: http://cardgame.xiejingyang.com
GitHub: https://github.com/xieisabug/card-game
I also recommend watching the accompanying video alongside this article; it makes things more intuitive: https://www.bilibili.com/video/BV18e411P74S/
Background
Why build the AI at this stage? As you can probably tell, this game is modeled on Hearthstone. Hearthstone's single-player experience has always appealed to me, not only because of its excellent level and numerical design, but also because the computer opponents behave with rich randomness. When the AI top-decks the perfect card or gets stuck with a bad hand, it feels just like playing against a real person; you can sense that the AI is genuinely playing cards. A scripted opponent cannot achieve that effect, and it is precisely this that motivates players to keep challenging a level again and again.
The first few levels of my story mode are hand-written scripts. When designing the tenth level, I wanted the player to start fighting an AI, but sticking with scripts would mean writing a lot of repetitive, hard-to-maintain code. That is when I had the idea of building an AI that can play cards on its own, so I can spend my time and energy on value and level design instead. As a result, I prioritized the AI ahead of many other game features, and when new game mechanics are added later, the system can also be designed with the AI in mind.
Some inconsequential rambling
This simple AI is now complete, and the key technique behind it is an algorithm called Monte Carlo tree search. Below I will walk through the steps I took: find a suitable technique ⇒ design the algorithm ⇒ code the AI, and briefly introduce Monte Carlo tree search along the way.
Finding the right technique
While searching for a technical approach I tried two ideas. The first was to define a priority order for actions, simulate each candidate action in that order, and check whether the scene score went up or down: if it went up, keep the action; if it went down, roll it back. This is essentially a greedy algorithm, and unfortunately a greedy algorithm cannot find the optimal play in this card game. The order of actions and the synergy between cards matter a great deal; games are not won by any single action but by how the whole turn is orchestrated. Still, this approach already had the shadow of a Monte Carlo tree in it, and it provided many ideas when I later switched to one.
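For reference, here is a minimal sketch of that first, greedy idea. The names (greedyTurn, simulateAction, evaluateScene) are hypothetical illustrations, not code from the actual project:

function greedyTurn(state, prioritizedActions, simulateAction, evaluateScene) {
    // Try actions in a fixed priority order; keep an action only if it improves the scene score
    let currentState = state;
    let chosen = [];
    for (const action of prioritizedActions) {
        const nextState = simulateAction(currentState, action); // returns a new state
        if (evaluateScene(nextState) > evaluateScene(currentState)) {
            // Score improved: keep the action
            currentState = nextState;
            chosen.push(action);
        }
        // Otherwise fall back: discard the action and keep the old state
    }
    return chosen;
}

As the text above notes, this per-action greediness cannot capture the value of the whole turn, which is why the idea was dropped.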
The second idea was to use machine learning, with something like a random forest to pick the best action: feed enough games into the model for training, generate many decision trees, simulate actions through them, and have the trees vote on the best move. But as I just said, this requires enough games for training, and I had no training set; on top of that, quantifying the game's actions is quite difficult, so this plan also fell through.
Just when I was at a loss, I was idly browsing Bilibili one day and came across a popular-science video ( https://www.bilibili.com/video/BV1qR4y1J7ht ) that mentioned the Monte Carlo tree. I learned that this algorithm is commonly used to build AI for board games and games with asymmetric information, and that part of AlphaGo is built on it.
A brief introduction to Monte Carlo trees: Monte Carlo is a famous gambling city, and legend has it that the Monte Carlo method was invented while a mathematician was playing cards. The idea is to run simulations, gather statistics over them, and feed those statistics back into the framework to influence which simulation paths are explored next, achieving something more efficient and accurate than plain random simulation.
Let's use our card game as an example to make this more concrete.
Designing the algorithm
In the current scene, the actions available to us are: playing a card, attacking a card, and attacking the hero. The Monte Carlo tree starts from a root node, expands branches of the root for each of these actions, simulates battles along every branch, and records the results. The same operation is then repeated on the expanded branches until some stopping condition is met, for example 10,000 simulations or 200 ms of thinking time. Finally, the statistics from the simulated battles are compared and the most suitable set of actions is chosen.
Sounds simple, right? The difficulty of Monte Carlo tree search lies in converting the whole game into the steps of its framework, which is divided into four stages: selection, expansion, simulation, and backpropagation. The search keeps repeating these four stages to simulate the game; our job is simply to translate the process of obtaining a result into these four steps.
Coding the AI
Since our game is a Hearthstone-like card game, using a Monte Carlo tree here differs from using one in a symmetric-information game such as chess. In a Hearthstone-like game you can perform many actions in a row within a single turn, and the opponent's cards are unknown. In chess, the Monte Carlo tree can simulate both sides playing against each other, reproducing how a human thinks about what the opponent might do; in a Hearthstone-like card game, the tree can only simulate our own thinking about the current scene, that is, only our own actions. So we map the whole algorithm onto the multiple actions of the current turn, and use the simulations to pick whichever action is currently best.
First, define the functions corresponding to the four Monte Carlo stages: select, expand, simulate, and backPropagation, plus an external entry point getBestAction; the Monte Carlo tree search process will then be filled into that entry point.
class MonteCarloTreeSearch {
    // Return the best action based on the current game state.
    getBestAction(myTableCard, otherTableCard, myHandCard, myRemainingCard, fee, myLife, otherLife) { }

    // Select the next node to explore using the Monte Carlo tree search algorithm.
    select(state) { }

    // Expand the given node by adding its possible children to the tree.
    expand(node) { }

    // Simulate a random game from a given node to estimate its potential value.
    simulate(node) { }

    // Back-propagate the value and visit count of a given node based on the results of the simulation.
    backPropagation(node, win) { }
}
To keep the AI responsive, we need to limit how long it thinks each time, so first cap the AI's simulation time within a controllable range:
getBestAction(myTableCard, otherTableCard, myHandCard, myRemainingCard, fee, myLife, otherLife) {
    // Initialize cacheTree
    this.cacheTree = {};
    // Initialize state
    let state = { myTableCard, otherTableCard, myHandCard, myRemainingCard, fee, myLife, otherLife };
    // Calculate the cut-off time
    let end = Date.now() + this.maxTime * 1000;
    let win;
    // Keep simulating until the time limit is reached
    while (Date.now() < end) {
        // Monte Carlo tree search algorithm...
    }
}
Following the Monte Carlo tree framework, the loop becomes: select the node to simulate based on the current battle state, expand that node, run a simulation from the expanded node, and finally back-propagate the simulation result:
// Keep simulating until the time limit is reached
while (Date.now() < end) {
    // Select the node to simulate
    let node = this.select(state);
    if (node && !node.isLeaf() && !this.isWin(node.state)) {
        // Expand the node
        node = this.expand(node);
        // Simulate from the expanded node
        win = this.simulate(node);
    }
    // Back-propagate the simulation result
    this.backPropagation(node, win);
}
“Selection” is really about finding the node that most needs to be simulated right now. The strategy is: if the current node can still be expanded, select it; otherwise pick its best child, then check whether that child can be expanded, and repeat until an expandable node or a leaf node is found.
You may have noticed a special word in that description: “best”. This is the first secret that makes the Monte Carlo tree work well. I use the UCB algorithm here, which you can think of as producing a score from the number of plays and the number of wins; the higher the score, the better the node is rated. For the details of how UCB works, there are explanations online that do a better job than I could.
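For intuition, here is the UCB1 formula as a stand-alone function, matching the getUCB1 method that appears in the node class below; it balances exploitation (win rate) against exploration (nodes that have been tried less often):

// UCB1: exploitation (win rate) plus an exploration bonus for rarely-visited nodes
function ucb1(wins, plays, parentPlays, biasParam) {
    return (wins / plays) + Math.sqrt(biasParam * Math.log(parentPlays) / plays);
}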
Now let's flesh out the select function; most of the explanation is in the comments:
select(state) {
    // Maximum depth, used to prevent searching downward forever
    let maxLevel = 0;
    // Take the node corresponding to the current state out of the cache
    let node = this.cacheTree[hashState(state)];
    while (node.isFullyExpanded() && !node.isLeaf()) {
        maxLevel++;
        // Get all the actions this node can take
        let actions = node.allActions();
        let bestAction;
        let bestUCB1 = -Infinity;
        // Traverse all actions and find the one with the best evaluation
        actions.forEach(a => {
            let action = a.action;
            let child = node.childNode(action);
            if (child) {
                // Score the node with the UCB algorithm
                let childUCB1 = child.getUCB1(this.UCB1);
                if (childUCB1 > bestUCB1) {
                    bestAction = action;
                    bestUCB1 = childUCB1;
                }
            }
        });
        if (bestAction) {
            node = node.childNode(bestAction);
        }
        if (maxLevel > this.maxDeep) {
            break;
        }
    }
    return node;
}
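select relies on a hashState helper from utils whose implementation is not shown in this article. As a rough illustration only (my assumption, not the project's actual code), such a helper just needs to turn a game state into a deterministic key:

// Hypothetical sketch of hashState: a deterministic string key for a game state.
// The real helper in the project may well differ.
function hashState(state) {
    return JSON.stringify([
        state.myTableCard,
        state.otherTableCard,
        state.myHandCard,
        state.myRemainingCard,
        state.fee,
        state.myLife,
        state.otherLife
    ]);
}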
In the code above we assumed quite a few functions on node, so next we create the node class and implement them (this top-down way of designing is also my favorite way to program, as it keeps my thinking clear):
const { range, hashState } = require('../../utils');

class MonteCarloTreeSearchNode {
    /**
     * Constructor, initialize the node.
     * Records the node's parent, the action that produced it, and the current state.
     *
     * For better performance we cache all child nodes in advance, which is what cacheAllChildren does.
     */
    constructor(parent, action, state) {
        this.parent = parent;
        this.action = action;
        this.state = state;
        // Used to cache child nodes
        this.children = {};
        // Number of games played
        this.n_plays = 0;
        // Number of wins
        this.n_wins = 0;
        // The hash of the current state, also pre-computed for performance
        this.hash = hashState(this.state);
        // Cache the child nodes for every action the current node can take
        this.cacheAllChildren();
    }

    /**
     * Cache the child nodes for every action the current node can take
     */
    cacheAllChildren() {
        if (this.action && this.action.event === "END_MY_TURN") {
            return;
        }
        let action;
        let myHandCard = this.state.myHandCard,
            otherTableCard = this.state.otherTableCard,
            myTableCard = this.state.myTableCard,
            fee = this.state.fee;

        // Cache the end-turn node
        action = { event: "END_MY_TURN" };
        this.children[this.actionHash(action)] = { action, node: null };

        // Card-playing check: every card that can be played within the available cost
        for (let i = 0; i < myHandCard.length; i++) {
            if (myHandCard[i].cost <= fee) {
                action = { event: "OUT_CARD", card: Object.assign({}, myHandCard[i]), i };
                this.children[this.actionHash(action)] = { action, node: null };
            }
        }

        // If there is a taunt, it must be attacked first; if not, attack anything
        let dedicationIndexList = []; // indexes of the taunt cards
        let attackCardList = [];      // the card metadata corresponding to those indexes
        // Find all taunts
        otherTableCard.forEach((i, index) => {
            if (i.isDedication) {
                dedicationIndexList.push(index);
                attackCardList.push(i);
            }
        });
        let hasDedication = true;
        // If there is no taunt, everything can be attacked; for convenience,
        // put them all directly into attackCardList and dedicationIndexList
        if (attackCardList.length === 0) {
            hasDedication = false;
            attackCardList = otherTableCard.slice();
            dedicationIndexList = range(otherTableCard.length)
        }

        // Attack checks
        for (let i = 0; i < myTableCard.length; i++) {
            if (!myTableCard[i].isActionable) continue; // cards that have already acted are skipped
            let attackCard = Object.assign({}, myTableCard[i]);
            // If there is no taunt, the opponent's hero can be attacked
            if (!hasDedication) {
                // Attack the opponent's hero
                action = { event: "ATTACK_HERO", k: attackCard.k };
                this.children[this.actionHash(action)] = { action, node: null }
            }
            // Attack every attackable card
            for (let j = 0; j < attackCardList.length; j++) {
                let beAttackCard = Object.assign({}, attackCardList[j]);
                action = {
                    event: "ATTACK_CARD",
                    card: attackCard,
                    attackCard: beAttackCard,
                    myK: attackCard.k,
                    attackK: beAttackCard.k,
                    i,
                    j: dedicationIndexList[j]
                };
                this.children[this.actionHash(action)] = { action, node: null }
            }
        }
    }

    /**
     * Get all executable actions of the current node
     * @returns all executable actions
     */
    allActions() {
        return Object.values(this.children);
    }

    /**
     * Expand the node
     * @returns child node
     */
    expand(action, childState) {
        if (Object.keys(this.children).indexOf(this.actionHash(action)) == -1) {
            throw new Error("No such play!");
        }
        let childNode = new MonteCarloTreeSearchNode(this, action, childState)
        this.children[this.actionHash(action)] = { action, node: childNode }
        return childNode
    }

    /**
     * Get the child node corresponding to the action
     * @returns child node
     */
    childNode(action) {
        let hash = this.actionHash(action);
        if (!this.children[hash]) {
            return null;
        }
        return this.children[hash].node;
    }

    /**
     * Get the actions whose child nodes have not been expanded yet
     * @returns the unexpanded actions
     */
    unexpandedActions() {
        let ret = [];
        Object.values(this.children).forEach(child => {
            if (child.node === null) {
                ret.push(child.action);
            }
        });
        return ret;
    }

    /**
     * Update the play/win statistics of the current node according to whether it won
     */
    update(win) {
        this.n_plays += 1;
        this.n_wins += win ? 1 : 0;
    }

    /**
     * Action hash
     * @returns hash value of the action
     */
    actionHash(action) {
        switch (action.event) {
            case "OUT_CARD":
                return `${action.event}_${action.i}`;
            case "ATTACK_HERO":
                return `${action.event}_${action.k}`;
            case "ATTACK_CARD":
                return `${action.event}_${action.i}_${action.j}`;
            default:
                return action.event;
        }
    }

    /**
     * Whether this is a leaf node
     */
    isLeaf() {
        return this.allActions().length <= 1; // can only end the turn
    }

    /**
     * UCB algorithm
     */
    getUCB1(biasParam) {
        return (this.n_wins / this.n_plays) + Math.sqrt(biasParam * Math.log(this.parent.n_plays) / this.n_plays);
    }

    /**
     * Whether all child nodes are fully expanded
     */
    isFullyExpanded() {
        return this.unexpandedActions().length === 0 ||
            (this.unexpandedActions().length === 1 && this.unexpandedActions()[0].event === "END_MY_TURN");
    }
}

module.exports = MonteCarloTreeSearchNode;
With all of the node's functions in place, we can continue filling in the Monte Carlo tree framework. The second step, expand, is easier to understand than select: it advances the currently selected node to a next state by randomly picking an action that has not been tried yet, applying that action to produce a new state, and creating a node from that state. That completes the expand operation:
expand(node) {
    // Get the node's unexpanded actions
    let actions = node.unexpandedActions();
    // Pick an action at random
    let index = Math.floor(Math.random() * actions.length);
    let action = actions[index];
    // Expand the node
    let childState = nextStateByAction(node.state, action);
    let childNode = node.expand(action, childState);
    // If the action is END_MY_TURN the state does not change, so caching it would corrupt the tree
    if (action.event !== 'END_MY_TURN') {
        this.cacheTree[hashState(childState)] = childNode;
    }
    return childNode
}
Note that our game has one special action: ending the turn directly. It matters a lot, because even with many actions available, sometimes the best move is to do nothing. The Monte Carlo tree algorithm depends heavily on the state, and ending the turn does not change the state, so this action has to be treated separately from the others.
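The expand step calls nextStateByAction from utils, which applies an action to a state and returns the resulting state. Its real implementation is not shown in this article; the following is only a simplified sketch under my own assumptions (it ignores battlecries, deathrattles, and other special effects) to illustrate what such a helper has to do:

// Hypothetical, simplified sketch of nextStateByAction; the project's real helper
// also has to handle card effects, which are omitted here.
function nextStateByAction(state, action) {
    // Work on copies so the original state stays untouched
    let next = {
        ...state,
        myTableCard: state.myTableCard.map(c => ({ ...c })),
        otherTableCard: state.otherTableCard.map(c => ({ ...c })),
        myHandCard: state.myHandCard.map(c => ({ ...c }))
    };
    switch (action.event) {
        case "OUT_CARD": {
            // Move the card from hand to table and pay its cost
            let [card] = next.myHandCard.splice(action.i, 1);
            next.myTableCard.push(card);
            next.fee -= card.cost;
            break;
        }
        case "ATTACK_HERO": {
            // Reduce the opposing hero's life by the attacker's attack
            let attacker = next.myTableCard.find(c => c.k === action.k);
            next.otherLife -= attacker.attack;
            attacker.isActionable = false;
            break;
        }
        case "ATTACK_CARD": {
            // Both cards damage each other; dead cards leave the table
            let attacker = next.myTableCard[action.i];
            let target = next.otherTableCard[action.j];
            attacker.life -= target.attack;
            target.life -= attacker.attack;
            attacker.isActionable = false;
            next.myTableCard = next.myTableCard.filter(c => c.life > 0);
            next.otherTableCard = next.otherTableCard.filter(c => c.life > 0);
            break;
        }
        case "END_MY_TURN":
        default:
            // Ending the turn changes nothing in this simplified model
            break;
    }
    return next;
}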
The next step is to simulate the rest of the turn, which is somewhat similar to expand: keep picking actions at random and pushing forward until no actions remain. The difference is that nothing here is truly expanded or cached; only the final result matters.
Of course, at the end of the simulated turn we need to judge whether this series of actions has had a positive effect. This is the most important part of the whole algorithm: scene value evaluation.
/**
 * Simulate the process of playing out the rest of the turn
 * @param {*} node the node to simulate from
 * @returns whether the result counts as a win
 */
simulate(node) {
    let state = node.state;
    let tempNode = node;
    // Keep simulating while the game is not won and the turn has not ended
    while (!this.isWin(state) && !this.isEndMyTurn(state)) {
        let actions = tempNode.allActions();
        let index = Math.floor(Math.random() * actions.length);
        let action = actions[index].action;
        let childState = nextStateByAction(tempNode.state, action);
        tempNode = new MonteCarloTreeSearchNode(tempNode, action, childState)
        state = childState;
    }
    // Judge whether the result is a success
    return this.checkValue(state.myTableCard, state.otherTableCard, state.myHandCard, state.myRemainingCard, state.fee, state.myLife, state.otherLife) > this.currentValue;
}
checkValue is the scene-evaluation function, and currentValue, the value the result is compared against, is the scene value evaluated at the state the AI started from.
First, write the code that computes currentValue and initializes the root node:
getBestAction(myTableCard, otherTableCard, myHandCard, myRemainingCard, fee, myLife, otherLife) {
    // Initialize the current scene value
    this.currentValue = this.checkValue(myTableCard, otherTableCard, myHandCard, myRemainingCard, fee, myLife, otherLife);
    // Initialize cacheTree
    this.cacheTree = {};
    // Initialize state
    let state = { myTableCard, otherTableCard, myHandCard, myRemainingCard, fee, myLife, otherLife };
    // Initialize the root node
    this.cacheTree[hashState(state)] = new MonteCarloTreeSearchNode(null, null, state);
    // Calculate the cut-off time
    let end = Date.now() + this.maxTime * 1000;
    let win;
    // Keep simulating until the time limit is reached
    while (Date.now() < end) {
        let node = this.select(state);
        if (node && !node.isLeaf() && !this.isWin(node.state)) {
            node = this.expand(node);
            win = this.simulate(node);
        }
        this.backPropagation(node, win);
    }
}
Next, let's work out how checkValue should be implemented.
Based on my understanding of card games, the scene value should account for the health and attack of the minions on both sides of the board, plus any special affixes or effects they carry. Your own minions contribute positively, the enemy's minions contribute negatively, and the two sides offset each other to give the final scene value.
Beyond these basic calculations, there are other factors worth considering: if the cards on the table can combo with each other, their evaluation should go up; if there are unplayed combo cards in hand, the evaluation should also go up; and if powerful cards are still sitting undrawn in the deck, that can raise the evaluation too. In other words, checkValue can keep being refined. Here is the checkValue I am currently using, for reference:
/**
 * Calculate the value of the current scene
 * @param {*} myTableCard my cards on the table
 * @param {*} otherTableCard the opponent's cards on the table
 * @param {*} myHandCard my hand
 * @param {*} myRemainingCard my remaining deck
 * @returns the value of the current scene
 */
checkValue(myTableCard, otherTableCard, myHandCard, myRemainingCard, fee, myLife, otherLife) {
    // Algorithm: my board value + my hand value + my deck value - the opponent's board value
    // Goal: play more combos, and find the order of plays that maximizes my side's value
    //
    // Note: special minion attribute values = taunt 2, battlecry 0, deathrattle 1, per-turn effect 5,
    //       divine shield 1, lifesteal 2, charge 1
    //       fixed-combo value = the combo's configured value + the original card value
    //
    // My board value       = my board attack + my minions' life + my special minion attributes + combo value * 1.5
    //                        (the combo value is scaled up so the AI prefers playing combos)
    // Opponent board value = opponent board attack + opponent minions' life + opponent special minion attributes + combo value * 1.5
    // My deck value        = combo value
    // My hand value        = combo value
    return (myTableCard.length ? myTableCard.reduce((pre, current) => {
            return pre + current.attack + current.life + this.calSpecialValue(current)
        }, 0) : 0)
        + this.comboCardValue(myTableCard)
        + this.calHandCardValue(myHandCard)
        + this.calRemainingCardValue(myHandCard, myRemainingCard)
        + (myLife - otherLife)
        - (otherTableCard.length ? otherTableCard.reduce((pre, current) => {
            return pre + current.attack + current.life + this.calSpecialValue(current)
        }, 0) : 0)
}

/**
 * Calculate the value of special minion attributes
 * Taunt 2, battlecry 0, deathrattle 1, per-turn effect 5, divine shield 1, lifesteal 2, charge 1
 * @param {*} card the card
 * @returns the special value
 */
calSpecialValue(card) {
    let value = 0;
    value += (card.isDedication ? 2 : 0);
    value += (card.isStrong ? 1 : 0);
    value += (card.isFullOfEnergy ? 1 : 0);
    value += (card.onEnd ? 1 : 0);
    value += (card.onMyTurnEnd ? 5 : 0);
    value += (card.onMyTurnStart ? 5 : 0);
    // specialValue can be negative, so only an explicit undefined check works here
    value += (card.specialValue != undefined ? card.specialValue : 0);
    return value
}

/**
 * Value of combos among the given cards (not implemented yet)
 * @param {*} cardList
 */
comboCardValue(cardList) {
    return 0
}

calHandCardValue(myHandCard) {
    myHandCard.forEach(card => {
        // Traverse the hand to check for special combinations
    })
    return 0;
}

calRemainingCardValue(handCard, remainingCard) {
    return 0
}
With that, the simulation step is complete, leaving only the last and relatively simple step: backpropagation. The same idea appears in some machine learning algorithms; here it feeds the result of each simulation back up the tree so that subsequent simulations become more accurate:
/**
 * Backpropagation: update the play/win statistics upward through the parent nodes
 * @param {*} node the simulated node
 * @param {*} win whether the simulation was a win
 */
backPropagation(node, win) {
    let tempNode = node;
    // Keep passing the result upward until the root node is reached
    while (tempNode) {
        tempNode.update(win);
        tempNode = tempNode.parent;
    }
}
At this point the search phase of the Monte Carlo tree is finished, and all that remains is to read out the result. Append the following code to the end of getBestAction, so that calling getBestAction directly returns the best action found by the Monte Carlo tree search:
// If the root node is not fully expanded, the simulation did not complete
if (this.cacheTree[hashState(state)].isFullyExpanded() === false) throw new Error("Failed to simulate completion")
let node = this.cacheTree[hashState({ myTableCard, otherTableCard, myHandCard, myRemainingCard, fee, myLife, otherLife })]
let allActions = node.allActions()
let bestAction
// If there is only one choice and it is ending the turn, return it directly without any calculation
if (allActions.length === 1 && allActions[0].action.event === 'END_MY_TURN') {
    return allActions[0].action
}
// Traverse all actions and compute each action's win rate; the action with the highest win rate is the best action
let max = -Infinity
for (let a of allActions) {
    let action = a.action;
    // Ending the turn is treated as a 0.5 win rate: only actions that win more than half the time beat it,
    // otherwise the turn simply ends
    if (action.event === 'END_MY_TURN') {
        if (max < 0.5) {
            bestAction = action
            max = 0.5
        }
    } else {
        let childNode = node.childNode(action)
        if (!childNode) {
            console.error("childNode is null", node.children, action);
        }
        let ratio = childNode.n_wins / childNode.n_plays
        if (ratio > max) {
            bestAction = action
            max = ratio
        }
    }
}
return bestAction
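To tie it together, here is a rough sketch of how an AI turn might drive getBestAction in a loop. The applyAction helper and the way the game state is passed around are my assumptions for illustration, not necessarily the project's actual interface:

// Hypothetical usage sketch: keep asking the search for the best action and applying it
// until it decides to end the turn. applyAction is assumed to update the real game state.
const mcts = new MonteCarloTreeSearch();

function runAiTurn(game) {
    while (true) {
        const action = mcts.getBestAction(
            game.myTableCard,
            game.otherTableCard,
            game.myHandCard,
            game.myRemainingCard,
            game.fee,
            game.myLife,
            game.otherLife
        );
        if (action.event === 'END_MY_TURN') break;
        applyAction(game, action); // apply the chosen action to the actual game state
    }
}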
Summary
If the subsequent levels involve battles, I will use this algorithm for AI battles.
There is still plenty of room for improvement: checkValue could be made more principled, and many branches of the Monte Carlo tree could be pruned or re-weighted so the search spends its time on more useful branches, and so on. You are welcome to tweak these parts yourself and make the AI even smarter.
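As one example of pruning (my own illustration, not code from the project), candidate actions could be ranked by a cheap heuristic and only the best few kept before they are cached as children:

// Hypothetical pruning idea: rank candidate actions with a cheap heuristic and keep
// only the best few, so the tree spends its simulations on more promising branches.
function pruneActions(actions, scoreAction, keep = 5) {
    return actions
        .map(a => ({ a, score: scoreAction(a) })) // score each candidate action
        .sort((x, y) => y.score - x.score)        // best first
        .slice(0, keep)                           // keep only the top few
        .map(x => x.a);
}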